Horizontal Pod Autoscaling (HPA) spins up additional pods for a microservice when the resource utilization (CPU, memory, or both) of its existing pods exceeds the configured target. It removes the additional pods when utilization drops again. In Adeptia Connect, autoscaling is enabled by default. You can configure HPA in Adeptia Connect by setting the required parameters in the global values.yaml file.
...
Parameter | Description | Default value
---|---|---
autoscaling: | |
enabled: | Enables HPA when set to true. | true
type: | The resource metric that triggers autoscaling. Possible values: cpu, memory, and cpu-memory (both). | cpu
minReplicas: | The minimum number of pods for a microservice. | 1
maxReplicas: | The maximum number of pods a microservice can scale up to. | 1
targetCPUUtilizationPercentage: | The average CPU utilization, as a percentage of the CPU requests set for the pods in the global values.yaml, at which HPA spins up a new pod. | 400
targetMemoryUtilizationPercentage: | The average memory utilization, as a percentage of the memory requests set for the pods in the global values.yaml, at which HPA spins up a new pod. | 400
behavior: | |
scaleUp: | |
stabilizationWindowSeconds: | The duration (in seconds) over which HPA observes spikes in the resource utilization of the running pods to determine whether scaling up is required. | 300
maxPodToScaleUp: | The maximum number of pods a microservice can add at a time when scaling up. | 1
periodSeconds: | The interval (in seconds) at which spikes in the resource utilization of the running pods are tracked. | 60
scaleDown: | |
stabilizationWindowSeconds: | The duration (in seconds) over which HPA observes drops in the resource utilization of the running pods to determine whether scaling down is required. | 300
maxPodToScaleDown: | The maximum number of pods a microservice can remove at a time when scaling down. | 1
periodSeconds: | The interval (in seconds) at which drops in the resource utilization of the running pods are tracked. | 60
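To show roughly how these parameters interact, the sketch below applies the standard Kubernetes HPA formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), skips changes inside Kubernetes' default 10% tolerance, caps each step with maxPodToScaleUp or maxPodToScaleDown, and clamps the result to minReplicas..maxReplicas. Because utilization is measured against the pod's resource request, a target of 400 means the pods must average four times their requested CPU (or memory) before a pod is added. The function name is hypothetical, the stabilization windows are ignored, and treating maxPodToScaleUp/maxPodToScaleDown as a per-step cap is an assumption about how Adeptia Connect maps them onto HPA scaling policies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,  # average usage across pods, as % of the request
    target_utilization: float,   # e.g. targetCPUUtilizationPercentage
    min_replicas: int = 1,
    max_replicas: int = 1,
    max_pod_to_scale_up: int = 1,
    max_pod_to_scale_down: int = 1,
    tolerance: float = 0.1,      # Kubernetes' default HPA tolerance (10%)
) -> int:
    """One simplified HPA evaluation; stabilization windows are ignored."""
    ratio = current_utilization / target_utilization

    # Inside the tolerance band, HPA keeps the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        proposed = current_replicas
    else:
        # Standard HPA formula: ceil(currentReplicas * current / target).
        proposed = math.ceil(current_replicas * ratio)

    # Assumed mapping: maxPodToScaleUp / maxPodToScaleDown cap how many pods
    # are added or removed in a single step.
    proposed = min(proposed, current_replicas + max_pod_to_scale_up)
    proposed = max(proposed, current_replicas - max_pod_to_scale_down)

    # Keep the result within minReplicas..maxReplicas.
    return max(min_replicas, min(proposed, max_replicas))


if __name__ == "__main__":
    # Two pods averaging 600% of their CPU request against a target of 400.
    print(desired_replicas(2, 600, 400, max_replicas=5))
```

With two pods averaging 600% of their CPU request, the ratio is 1.5 and the sketch proposes three pods when maxReplicas is 5. With the default maxReplicas of 1, the result never exceeds one pod, so maxReplicas must be raised before any scaling can happen.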
...
Parameter | Description | Default value
---|---|---
RUNTIME_AUTOSCALING_ENABLED: | Enables HPA for the runtime microservice when set to true. | true
RUNTIME_MIN_POD: | The minimum number of runtime pods. | 1
RUNTIME_MAX_POD: | The maximum number of pods the runtime microservice can scale up to. | 1
RUNTIME_AUTOSCALING_TYPE: | The resource metric that triggers autoscaling. Possible values: cpu, memory, and cpu-memory (both). | cpu
RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: | The average CPU utilization, as a percentage of the CPU requests set for the runtime pods in the global values.yaml, at which HPA spins up a new pod. | 400
RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: | The average memory utilization, as a percentage of the memory requests set for the runtime pods in the global values.yaml, at which HPA spins up a new pod. | 400
RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: | The duration (in seconds) over which HPA observes spikes in the resource utilization of the running pods to determine whether scaling up is required. | 300
RUNTIME_MAX_POD_TO_SCALE_UP: | The maximum number of pods the runtime microservice can add at a time when scaling up. | 1
RUNTIME_SCALE_UP_PERIOD_SECONDS: | The interval (in seconds) at which spikes in the resource utilization of the running pods are tracked. | 60
RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: | The duration (in seconds) over which HPA observes drops in the resource utilization of the running pods to determine whether scaling down is required. | 300
RUNTIME_MAX_POD_TO_SCALE_DOWN: | The maximum number of pods the runtime microservice can remove at a time when scaling down. | 1
RUNTIME_SCALE_DOWN_PERIOD_SECONDS: | The interval (in seconds) at which drops in the resource utilization of the running pods are tracked. | 60
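If the RUNTIME_* parameters reach a process as environment variables (an assumption; this document does not say how they are consumed), a small loader can apply the documented defaults and catch obvious mistakes, such as an unsupported autoscaling type or a minimum above the maximum. The function `load_runtime_autoscaling` and the `RuntimeAutoscaling` class are hypothetical helpers, not part of Adeptia Connect.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Possible values for RUNTIME_AUTOSCALING_TYPE.
VALID_TYPES = {"cpu", "memory", "cpu-memory"}


@dataclass
class RuntimeAutoscaling:
    enabled: bool
    min_pod: int
    max_pod: int
    scaling_type: str
    target_cpu_percentage: int
    target_memory_percentage: int
    scale_up_window_seconds: int
    max_pod_to_scale_up: int
    scale_up_period_seconds: int
    scale_down_window_seconds: int
    max_pod_to_scale_down: int
    scale_down_period_seconds: int


def load_runtime_autoscaling(env: Optional[Mapping[str, str]] = None) -> RuntimeAutoscaling:
    """Read the RUNTIME_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    def get_int(name: str, default: int) -> int:
        return int(env.get(name, default))

    cfg = RuntimeAutoscaling(
        enabled=env.get("RUNTIME_AUTOSCALING_ENABLED", "true").strip().lower() == "true",
        min_pod=get_int("RUNTIME_MIN_POD", 1),
        max_pod=get_int("RUNTIME_MAX_POD", 1),
        scaling_type=env.get("RUNTIME_AUTOSCALING_TYPE", "cpu").strip().lower(),
        target_cpu_percentage=get_int("RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE", 400),
        target_memory_percentage=get_int("RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE", 400),
        scale_up_window_seconds=get_int("RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS", 300),
        max_pod_to_scale_up=get_int("RUNTIME_MAX_POD_TO_SCALE_UP", 1),
        scale_up_period_seconds=get_int("RUNTIME_SCALE_UP_PERIOD_SECONDS", 60),
        scale_down_window_seconds=get_int("RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS", 300),
        max_pod_to_scale_down=get_int("RUNTIME_MAX_POD_TO_SCALE_DOWN", 1),
        scale_down_period_seconds=get_int("RUNTIME_SCALE_DOWN_PERIOD_SECONDS", 60),
    )

    # Basic sanity checks before the values are used to configure HPA.
    if cfg.scaling_type not in VALID_TYPES:
        raise ValueError(f"RUNTIME_AUTOSCALING_TYPE must be one of {sorted(VALID_TYPES)}")
    if cfg.min_pod > cfg.max_pod:
        raise ValueError("RUNTIME_MIN_POD must not exceed RUNTIME_MAX_POD")
    return cfg


if __name__ == "__main__":
    print(load_runtime_autoscaling())
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.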
Load balancing among the runtime pods
Kubernetes handles the load balancing of requests to the runtime pods of the corresponding Deployment. The runtime pods process two types of requests, synchronous and asynchronous, and each type is distributed differently.
Synchronous requests are processed by a runtime pod that the Kubernetes Service selects at random when kube-proxy runs in its default iptables proxy mode.
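The random selection can be illustrated with a small model of the rule chain that kube-proxy writes in iptables mode: for n pods, rule i matches with probability 1/(n − i) and the last rule catches whatever is left, which gives every pod the same 1/n chance. In iptables mode the choice is made per new connection, not per individual request on an existing connection. The function names and pod names below are hypothetical; this is a simulation, not kube-proxy code.

```python
import random
from collections import Counter
from fractions import Fraction


def pick_endpoint(endpoints, rng=random):
    """Walk the rules in order, like kube-proxy's iptables chain for a Service."""
    n = len(endpoints)
    for i, endpoint in enumerate(endpoints[:-1]):
        # Rule i jumps to its endpoint with probability 1/(n - i).
        if rng.random() < 1.0 / (n - i):
            return endpoint
    # The last rule has no probability match, so it catches everything left.
    return endpoints[-1]


def selection_probabilities(n):
    """Exact chance that each of n endpoints is chosen by the rule chain."""
    probs = []
    not_yet_matched = Fraction(1)
    for i in range(n):
        p_rule = Fraction(1, n - i)  # equals 1 for the last rule
        probs.append(not_yet_matched * p_rule)
        not_yet_matched *= 1 - p_rule
    return probs


if __name__ == "__main__":
    pods = ["runtime-a", "runtime-b", "runtime-c"]  # hypothetical pod names
    rng = random.Random(42)
    print(Counter(pick_endpoint(pods, rng) for _ in range(30000)))
    print(selection_probabilities(len(pods)))
```

The decreasing per-rule probabilities compensate for the rules already skipped, so the overall distribution stays uniform however many runtime pods HPA has created.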
Asynchronous requests are processed based on the concurrency level you set for the runtime pods of the Deployment. For example, if there are three (3) runtime pods (each with a concurrency of 5) and eight (8) messages in the Queue, the messages are routed as follows:
- The first runtime pod will take up five (5) of the eight (8) messages.
- The second runtime pod will take the rest of the three (3) messages.
- The third runtime pod will remain unoccupied until there are more than ten (10) messages at a time.
When all three runtime pods are fully occupied, the remaining messages stay in the queue and are routed, in order of priority, to a runtime pod as soon as it has a free slot.
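The routing in this example can be modelled as an in-order fill: each pod takes queued messages until it reaches its concurrency limit, and anything left waits in the queue until a slot frees up. The function `route_messages` and the in-order fill are simplifying assumptions for illustration, not Adeptia Connect's implementation.

```python
def route_messages(queued, in_flight, concurrency):
    """Assign queued messages to pods in order, up to each pod's free slots.

    queued: number of messages waiting in the queue
    in_flight: messages each pod is already processing (one entry per pod)
    concurrency: maximum messages one pod processes at a time
    Returns (new messages per pod, messages still waiting in the queue).
    """
    assigned = []
    for busy in in_flight:
        free = max(concurrency - busy, 0)
        take = min(free, queued)
        assigned.append(take)
        queued -= take
    return assigned, queued


if __name__ == "__main__":
    # Three idle pods with a concurrency of 5 and eight queued messages.
    print(route_messages(8, [0, 0, 0], 5))
```

With eight messages the model gives five to the first pod, three to the second, and none to the third, matching the example. With seventeen messages all three pods fill up and two messages wait; if the second pod later has two free slots, those two messages go to it.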