Horizontal Pod Autoscaling (HPA) spins up additional pods when a microservice exhausts its existing CPU or memory resources, or when the message count in the queue exceeds its threshold (runtime microservice). The additional pods are deleted when resource utilization and the message count fall back below their threshold values.
In Adeptia Connect, you can configure and use either Kubernetes HPA (the default) or Kubernetes Event-driven Autoscaler (KEDA) to autoscale the microservices' pods. To autoscale the runtime pods based on the Message Queue in addition to CPU and memory, you must use KEDA.
When you use KEDA:

- The runtime pods autoscale based on the threshold values for Message Queue, CPU, and memory that you set in the global values.yaml file. To enable KEDA, refer to this section. For more details, refer to this section.

  Tip: For a dedicated runtime (Deployment) pod, you need to set the threshold values for Message Queue, CPU, and memory while creating the Deployment. For more details, refer to this page.

- The other microservices' pods autoscale based only on the threshold values for CPU and memory that you set in the global values.yaml file. For more details, refer to this section.

To install KEDA, refer to this page.
When you use Kubernetes' HPA,
...
Configuring HPA for microservices (excluding runtime)
...
(The parameters below are the same for KEDA and Kubernetes' HPA.)
...
To enable HPA, set the parameters described below for each microservice individually. You can find these parameters in the respective section of each microservice in the global values.yaml file.
Parameter | Description | Default value
---|---|---
autoscaling: | Parent section that holds the autoscaling parameters of the microservice. |
enabled: | Enables HPA for the microservice when set to true. | true
criteria: | Scaling criteria, applicable only when KEDA is enabled. Set cpu and/or memory to true to scale on that resource. | cpu: true, memory: false
minReplicas: | Minimum number of pods for the microservice. | 1
maxReplicas: | Maximum number of pods the microservice can scale up to. | 1
targetCPUUtilizationPercentage: | CPU utilization, as a percentage of the CPU requests set in the global values.yaml for the pods, at which the HPA spins up a new pod. | 400
targetMemoryUtilizationPercentage: | Memory utilization, as a percentage of the memory requests set in the global values.yaml for the pods, at which the HPA spins up a new pod. | 400
behavior: | Parent section that holds the scale-up and scale-down behavior parameters. |
scaleUp: | Parent section that holds the scale-up parameters. |
stabilizationWindowSeconds: | Duration (in seconds) for which the application watches for spikes in resource utilization by the currently running pods, to determine whether scaling up is required. | 300
maxPodToScaleUp: | Maximum number of pods the microservice can add at a time. | 1
periodSeconds: | Frequency (in seconds) at which spikes in resource utilization by the currently running pods are tracked. | 60
scaleDown: | Parent section that holds the scale-down parameters. |
stabilizationWindowSeconds: | Duration (in seconds) for which the application watches for drops in resource utilization by the currently running pods, to determine whether scaling down is required. | 300
maxPodToScaleDown: | Maximum number of pods the microservice can remove at a time. | 1
periodSeconds: | Frequency (in seconds) at which drops in resource utilization by the currently running pods are tracked. | 60
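As an illustration, the parameters above might appear in a microservice's section of the global values.yaml as in the following sketch. The nesting of behavior under autoscaling is an assumption based on the parameter grouping in the table; match the structure to your actual file.

```yaml
# Hypothetical excerpt from a microservice's section of the global values.yaml.
# Key names and defaults come from the table above; the nesting is an assumption.
autoscaling:
  enabled: true            # turn autoscaling on for this microservice
  criteria:                # applicable only when KEDA is enabled
    cpu: true
    memory: false
  minReplicas: 1
  maxReplicas: 3           # raise from the default of 1 to allow actual scaling
  targetCPUUtilizationPercentage: 400
  targetMemoryUtilizationPercentage: 400
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 300
      maxPodToScaleUp: 1
      periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      maxPodToScaleDown: 1
      periodSeconds: 60
```

Note that with the default maxReplicas of 1, no scale-up can occur even when autoscaling is enabled; set maxReplicas higher than minReplicas for autoscaling to have any effect.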
Configuring Kubernetes' HPA for runtime microservice
...
Parameter | Description | Default value
---|---|---
RUNTIME_AUTOSCALING_ENABLED: | Enables HPA for the runtime microservice when set to true. | true
RUNTIME_MIN_POD: | Minimum number of runtime pods. | 1
RUNTIME_MAX_POD: | Maximum number of pods the runtime microservice can scale up to. | 1
RUNTIME_AUTOSCALING_CRITERIA_MESSAGE_COUNT: | Enables scaling of runtime pods based on the message count in the queue. | true
RUNTIME_AUTOSCALING_CRITERIA_CPU: | Enables scaling of runtime pods based on CPU utilization. | true
RUNTIME_AUTOSCALING_CRITERIA_MEMORY: | Enables scaling of runtime pods based on memory utilization. | false
RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: | CPU utilization, as a percentage of the CPU requests set in the global values.yaml for the runtime pods, at which the HPA spins up a new pod. | 400
RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: | Memory utilization, as a percentage of the memory requests set in the global values.yaml for the runtime pods, at which the HPA spins up a new pod. | 400
RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT: | Number of messages in the queue at which a new runtime pod is spun up. | 5
RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: | Duration (in seconds) for which the application watches for spikes in resource utilization by the currently running pods, to determine whether scaling up is required. | 300
RUNTIME_MAX_POD_TO_SCALE_UP: | Maximum number of runtime pods that can be added at a time. | 1
RUNTIME_SCALE_UP_PERIOD_SECONDS: | Frequency (in seconds) at which spikes in resource utilization by the currently running pods are tracked. | 60
RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: | Duration (in seconds) for which the application watches for drops in resource utilization by the currently running pods, to determine whether scaling down is required. | 300
RUNTIME_MAX_POD_TO_SCALE_DOWN: | Maximum number of runtime pods that can be removed at a time. | 1
RUNTIME_SCALE_DOWN_PERIOD_SECONDS: | Frequency (in seconds) at which drops in resource utilization by the currently running pods are tracked. | 60
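For illustration, the runtime parameters above might be set in the global values.yaml as in this sketch. The parameter names and defaults come from the table; their exact location within the file is an assumption, so place them where your values.yaml defines the runtime microservice.

```yaml
# Hypothetical excerpt: runtime autoscaling parameters from the table above.
RUNTIME_AUTOSCALING_ENABLED: true
RUNTIME_MIN_POD: 1
RUNTIME_MAX_POD: 3                                   # default is 1; raise to allow scaling
RUNTIME_AUTOSCALING_CRITERIA_MESSAGE_COUNT: true
RUNTIME_AUTOSCALING_CRITERIA_CPU: true
RUNTIME_AUTOSCALING_CRITERIA_MEMORY: false
RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: 400
RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: 400
RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT: 5           # scale up when queue depth exceeds 5
RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: 300
RUNTIME_MAX_POD_TO_SCALE_UP: 1
RUNTIME_SCALE_UP_PERIOD_SECONDS: 60
RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: 300
RUNTIME_MAX_POD_TO_SCALE_DOWN: 1
RUNTIME_SCALE_DOWN_PERIOD_SECONDS: 60
```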
...
When all three runtime pods are fully occupied, the remaining messages wait in the queue and are routed to a runtime pod as soon as one becomes free and has capacity.
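The queue-based scaling above can be sketched numerically. Assuming the scaler follows the standard HPA formula for external metrics, desired replicas = ceil(message count / threshold) clamped between the minimum and maximum pod counts (an assumption about Adeptia's exact behavior, not confirmed by this page), the pod count for a given queue depth works out as:

```python
import math

def desired_runtime_pods(message_count: int, threshold: int = 5,
                         min_pods: int = 1, max_pods: int = 3) -> int:
    """Sketch of a queue-based replica calculation:
    ceil(messages / threshold), clamped to [min_pods, max_pods]."""
    desired = math.ceil(message_count / threshold)
    return max(min_pods, min(max_pods, desired))

# With RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT = 5 and RUNTIME_MAX_POD = 3:
print(desired_runtime_pods(0))    # 1 (never below RUNTIME_MIN_POD)
print(desired_runtime_pods(12))   # 3 (ceil(12/5) = 3)
print(desired_runtime_pods(40))   # 3 (capped at RUNTIME_MAX_POD; extra messages wait in the queue)
```

The last case illustrates the paragraph above: once the pod count is capped at the maximum, additional messages simply wait in the queue until a pod frees up.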
Configuring KEDA for runtime microservice
...
Related topic
...