
Horizontal Pod Autoscaling (HPA) spins up additional pods when a microservice exceeds its CPU or memory threshold or, for the runtime microservice, when the number of messages in the queue exceeds the message count threshold. The additional pods are deleted when resource utilization and the message count fall back below their threshold values.

...

  • The autoscaling of runtime pods happens based on the threshold values for Message Queue, CPU, and memory that you set in the global values.yaml file. To use KEDA, first enable it by setting the type variable to keda under the global > config > autoscaling section in the values.yaml file, as shown in the following screenshot. To set the other relevant parameters, for example, the threshold number of messages in the Message Queue, refer to this section.

    Tip
    For a dedicated runtime (Deployment) pod, you need to set the threshold values for Message Queue, CPU, and memory while creating the Deployment. For more details, refer to this page.


  • The autoscaling of other microservices' pods happens based only on the threshold values for CPU and memory you set in the global values.yaml file. For more details, refer to this section.

When you use Kubernetes' HPA,

  • The autoscaling of runtime pods happens based only on the threshold values for CPU and memory that you set in the global values.yaml file. To set these threshold values, refer to this section.

    Tip
    Ensure that the type variable under the global > config > autoscaling section in the values.yaml file is set to hpa.


    Tip
    For a dedicated runtime (Deployment) pod, you need to set the threshold values for CPU and memory while creating the Deployment. For more details, refer to this page.


  • The autoscaling of the other microservices' pods happens based only on the threshold values for CPU and memory you set in the global values.yaml file. To set the relevant parameters, refer to this section.
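The autoscaler selection described above can be sketched as a values.yaml fragment. The nesting follows the global > config > autoscaling path named in the text; sibling keys are omitted, and the exact layout of your file may differ, so treat this as an illustration rather than a complete configuration.

```yaml
# values.yaml (fragment; surrounding keys omitted)
global:
  config:
    autoscaling:
      # keda: runtime pods scale on Message Queue, CPU, and memory thresholds;
      #       other microservices scale on CPU and memory thresholds only.
      # hpa:  all pods scale on CPU and memory thresholds only.
      type: keda
```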


Configuring ... autoscaling for runtime microservice

Like the pods of other microservices, the runtime microservice pods are scaled up or down based on two metrics: CPU utilization and memory utilization. However, the parameters for configuring autoscaling for the runtime microservice differ slightly from those for the rest of the microservices.

The following table describes the autoscaling parameters for the runtime microservice. You can find these parameters in the runtimeImage: section of the global values.yaml file.

Parameter | Description | Default value
--- | --- | ---
RUNTIME_AUTOSCALING_ENABLED: | Parameter to enable HPA by setting its value to true. | true
RUNTIME_MIN_POD: | Minimum number of pods. | 1
RUNTIME_MAX_POD: | The maximum number of pods the runtime microservice can scale up to. | 1
RUNTIME_AUTOSCALING_TYPE | Parameter to define whether autoscaling happens based on CPU, memory, or both. The possible values are cpu, memory, and cpu-memory. | cpu
RUNTIME_AUTOSCALING_CRITERIA_MESSAGE_COUNT: true | Applicable only when KEDA is enabled. |
RUNTIME_AUTOSCALING_CRITERIA_CPU: true | Applicable only when KEDA is enabled. |
RUNTIME_AUTOSCALING_CRITERIA_MEMORY: false | Applicable only when KEDA is enabled. |
RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: | CPU utilization, as a percentage of the CPU requests set in the global values.yaml file for the runtime pods, at which the HPA spins up a new pod. | 400
RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: | Memory utilization, as a percentage of the memory requests set in the global values.yaml file for the runtime pods, at which the HPA spins up a new pod. | 400
RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT: 5 | |
RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: | The duration (in seconds) for which the application watches for spikes in resource utilization by the currently running pods. This helps determine whether scaling up is required. | 300
RUNTIME_MAX_POD_TO_SCALE_UP: | The maximum number of pods the runtime microservice can scale up by at a time. | 1
RUNTIME_SCALE_UP_PERIOD_SECONDS: | The time duration (in seconds) that sets how frequently spikes in resource utilization by the currently running pods are tracked. | 60
RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: | The duration (in seconds) for which the application watches for a drop in resource utilization by the currently running pods. This helps determine whether scaling down is required. | 300
RUNTIME_MAX_POD_TO_SCALE_DOWN: | The maximum number of pods the runtime microservice can scale down by at a time. | 1
RUNTIME_SCALE_DOWN_PERIOD_SECONDS: | The time duration (in seconds) that sets how frequently drops in resource utilization by the currently running pods are tracked. | 60
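
As a rough illustration, the runtime parameters described above might appear in the values.yaml file as follows. The values are the ones listed in this section. Placing the parameters directly under runtimeImage: is an assumption based on the section name given above, so check your own file for the exact nesting.

```yaml
# values.yaml (fragment; other keys omitted)
# Assumption: the parameters sit directly under runtimeImage:.
runtimeImage:
  RUNTIME_AUTOSCALING_ENABLED: true
  RUNTIME_MIN_POD: 1
  RUNTIME_MAX_POD: 1   # with the listed default, min and max are equal, so no extra pods are added
  RUNTIME_AUTOSCALING_TYPE: cpu                      # cpu, memory, or cpu-memory
  # The three criteria flags apply only when keda is enabled.
  RUNTIME_AUTOSCALING_CRITERIA_MESSAGE_COUNT: true
  RUNTIME_AUTOSCALING_CRITERIA_CPU: true
  RUNTIME_AUTOSCALING_CRITERIA_MEMORY: false
  RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: 400
  RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: 400
  RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT: 5
  RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: 300
  RUNTIME_MAX_POD_TO_SCALE_UP: 1
  RUNTIME_SCALE_UP_PERIOD_SECONDS: 60
  RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: 300
  RUNTIME_MAX_POD_TO_SCALE_DOWN: 1
  RUNTIME_SCALE_DOWN_PERIOD_SECONDS: 60
```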


...

...


Configuring ... autoscaling for other microservices ... (excluding runtime)

To enable HPA, you need to set the parameters as described below for each of the microservices individually. You can find these parameters in the respective section of each microservice in the global values.yaml file.

Parameter | Description | Default value
--- | --- | ---
autoscaling: | |
enabled: | Parameter to enable HPA by setting its value to true. | true
type: | Parameter to define whether autoscaling happens based on CPU, memory, or both. The possible values are cpu, memory, and cpu-memory. | cpu
criteria: (cpu: true, memory: false) | Applicable only when KEDA is enabled. |
minReplicas: | Minimum number of pods for a microservice. | 1
maxReplicas: | The maximum number of pods a microservice can scale up to. | 1
targetCPUUtilizationPercentage: | CPU utilization, as a percentage of the CPU requests set in the global values.yaml file for the pods, at which the HPA spins up a new pod. | 400
targetMemoryUtilizationPercentage: | Memory utilization, as a percentage of the memory requests set in the global values.yaml file for the pods, at which the HPA spins up a new pod. | 400
behavior > scaleUp > stabilizationWindowSeconds: | The duration (in seconds) for which the application watches for spikes in resource utilization by the currently running pods. This helps determine whether scaling up is required. | 300
behavior > scaleUp > maxPodToScaleUp: | The maximum number of pods a microservice can scale up by at a time. | 1
behavior > scaleUp > periodSeconds: | The time duration (in seconds) that sets how frequently spikes in resource utilization by the currently running pods are tracked. | 60
behavior > scaleDown > stabilizationWindowSeconds: | The duration (in seconds) for which the application watches for a drop in resource utilization by the currently running pods. This helps determine whether scaling down is required. | 300
behavior > scaleDown > maxPodToScaleDown: | The maximum number of pods a microservice can scale down by at a time. | 1
behavior > scaleDown > periodSeconds: | The time duration (in seconds) that sets how frequently drops in resource utilization by the currently running pods are tracked. | 60
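
A minimal sketch of how the parameters described above might be laid out for a single microservice, using the values listed in this section. The key exampleService is a hypothetical placeholder for the microservice's own section name, and the placement of criteria under autoscaling is an assumption, so compare against your values.yaml file before copying anything.

```yaml
# values.yaml (fragment; other keys omitted)
# "exampleService" is a hypothetical placeholder, not a real section name.
exampleService:
  autoscaling:
    enabled: true
    type: cpu                  # cpu, memory, or cpu-memory
    criteria:                  # applicable only when keda is enabled
      cpu: true
      memory: false
    minReplicas: 1
    maxReplicas: 1             # with the listed default, min and max are equal, so no extra pods are added
    targetCPUUtilizationPercentage: 400
    targetMemoryUtilizationPercentage: 400
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 300
        maxPodToScaleUp: 1
        periodSeconds: 60
      scaleDown:
        stabilizationWindowSeconds: 300
        maxPodToScaleDown: 1
        periodSeconds: 60
```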


Load balancing among the runtime pods 

Kubernetes internally handles the load balancing of requests from a Queue to the runtime pods of the corresponding Deployment. The runtime pods process two types of requests: synchronous and asynchronous.

...

When all three runtime pods are fully occupied, the remaining messages in the queue are prioritized and routed to a runtime pod as soon as it has free capacity.

...



...

Related topic

Creating a Deployment

...