Horizontal Pod Autoscaling (HPA) adds pods to a microservice when the utilization of its existing resources (CPU and memory) rises above the configured target, and removes the additional pods once utilization drops again. In Adeptia Connect, autoscaling is enabled by default. You configure HPA by setting the required parameters in the global values.yaml file.
To configure HPA, set the parameters described below individually for each microservice. You can find these parameters in each microservice's own section of the global values.yaml file.
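As a rough illustration of how a target percentage turns into a pod count, the sketch below applies the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current utilization / target utilization). It assumes utilization is averaged across the pods and measured against the pod resource requests. The function names are hypothetical and are not Adeptia Connect parameters; the minimum/maximum replica bounds and the scale-down stabilization window that Kubernetes also applies are left out.

```python
import math


def utilization_percent(usage: float, request: float) -> float:
    """Resource usage expressed as a percentage of the pod's resource request."""
    return usage / request * 100.0


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Kubernetes HPA scaling rule for a single utilization metric.

    current_utilization is the average utilization across the current pods.
    The replica count is left unchanged while the ratio stays within the
    tolerance band (0.1 is the Kubernetes default tolerance).
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Two pods, each using 900m of a 1000m CPU request, with a 60% target.
avg = utilization_percent(usage=900, request=1000)
print(desired_replicas(current_replicas=2, current_utilization=avg,
                       target_utilization=60))  # prints 3
```

The same rule also scales down: with four pods averaging 20% against a 60% target, the sketch proposes ceil(4 × 20 / 60) = 2 pods.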
...
Target CPU utilization, as a percentage of the CPU request set for the pods in the global values.yaml file, at which HPA spins up a new pod.
...
Configuring HPA for runtime microservice
Like the other microservices, the runtime microservice pods are scaled up or down based on two metrics: CPU utilization and memory utilization. However, the parameters for configuring autoscaling for the runtime microservice differ slightly from those for the other microservices.
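Because the runtime microservice scales on both CPU and memory, it may help to see how Kubernetes combines two metrics. The sketch below follows the documented HPA behavior of computing a replica count for each metric separately and keeping the largest, then clamping it to the minimum and maximum replica counts. The function and parameter names are hypothetical stand-ins for the values you set in the global values.yaml file, and the tolerance and stabilization logic are omitted for brevity.

```python
import math


def replicas_for_metric(current_replicas: int, avg_utilization: float,
                        target_utilization: float) -> int:
    # Single-metric HPA rule (tolerance band omitted in this sketch).
    return math.ceil(current_replicas * avg_utilization / target_utilization)


def desired_replicas_multi(current_replicas: int,
                           cpu_util: float, cpu_target: float,
                           mem_util: float, mem_target: float,
                           min_replicas: int, max_replicas: int) -> int:
    # HPA evaluates each metric on its own and keeps the largest proposal,
    # so whichever resource is under more pressure drives scale-up.
    proposals = [
        replicas_for_metric(current_replicas, cpu_util, cpu_target),
        replicas_for_metric(current_replicas, mem_util, mem_target),
    ]
    # The result is then clamped to the configured replica range.
    return max(min_replicas, min(max_replicas, max(proposals)))


# CPU is below target (40% vs 70%) but memory is above it (90% vs 60%).
print(desired_replicas_multi(2, 40, 70, 90, 60, min_replicas=1, max_replicas=5))
```

In this example the CPU metric alone would keep two pods, but the memory metric proposes three, so three pods are requested.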
The following table describes the autoscaling parameters for the runtime microservice. You can find these parameters in the runtimeImage: section in the global values.yaml file.
...
Load balancing among the runtime pods
Kubernetes internally handles the load balancing of requests from a Queue to the runtime pods of the corresponding Deployment. The runtime pods process two types of requests: Synchronous and Asynchronous.
Synchronous requests are processed by a runtime pod selected at random by the Kubernetes Service when kube-proxy runs in its default iptables proxy mode.
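The random selection described above can be approximated with the small simulation below. It assumes each request arrives on a new connection (kube-proxy balances at the connection level, so requests that reuse a keep-alive connection stay on the same pod) and that every listed pod is a ready endpoint. The pod names and the function are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional


def route_sync_requests(ready_pods: List[str], n_requests: int,
                        seed: Optional[int] = None) -> Counter:
    """Pick a backend uniformly at random for each request.

    This mimics kube-proxy in iptables mode, which chooses one of the
    Service's ready endpoints at random for each new connection.
    """
    rng = random.Random(seed)
    return Counter(rng.choice(ready_pods) for _ in range(n_requests))


pods = ["runtime-0", "runtime-1", "runtime-2"]
print(route_sync_requests(pods, 9000, seed=1))
```

Over many requests each pod receives roughly a third of the traffic, but random selection does not take a pod's current load into account, which is why the readiness check described next matters.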
For Synchronous API requests, you can have Adeptia Connect first check whether a runtime pod has enough CPU, memory, or both available to be considered ready to accept more requests. The application performs this check by using the following two configurable threshold properties for runtime.
Tip |
---|
This readiness check is enabled by default for CPU and disabled by default for memory. You can choose to perform or skip this check based on CPU, memory, or both. |
- readiness.probe.cpu.threshold
- readiness.probe.memory.threshold
Property | Description |
---|---|
readiness.probe.cpu.threshold | Threshold value for CPU utilization, in percentage of the allocated CPU request. If the CPU utilization of the pod goes beyond this value, the pod does not accept any more requests to process. Ensure that this threshold value is always greater than the HPA target CPU utilization percentage set through the environment variable RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE in the global values.yaml file. Leave the property value blank to skip this threshold check. |
readiness.probe.memory.threshold | Threshold value for application memory utilization, in percentage of the minimum JVM memory (Xms). If the application memory utilization goes beyond this value, the pod does not accept any more requests to process. Ensure that this threshold value is always greater than the HPA target memory utilization percentage set through the environment variable RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE in the global values.yaml file. Leave the property value blank to skip this threshold check. |
You can find these properties at Account > Settings > Microservice Settings > Runtime > Readiness Probe Configuration. For details on configuring these properties, refer to the page Configuring the application properties.
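The readiness rules in the table above can be modeled with the sketch below. It is not Adeptia Connect's implementation: the function names are hypothetical, None stands in for a blank property value, and the utilization inputs are assumed to be already expressed in the percentages the table describes (CPU against the CPU request, memory against Xms).

```python
from typing import List, Optional


def is_ready(cpu_util_pct: float, mem_util_pct: float,
             cpu_threshold: Optional[float],
             mem_threshold: Optional[float]) -> bool:
    """Return True if the pod should accept more Synchronous requests.

    cpu_util_pct: CPU usage as a percentage of the allocated CPU request.
    mem_util_pct: application memory usage as a percentage of the JVM Xms.
    A threshold of None models a blank property value, which skips that check.
    """
    if cpu_threshold is not None and cpu_util_pct > cpu_threshold:
        return False
    if mem_threshold is not None and mem_util_pct > mem_threshold:
        return False
    return True


def threshold_warnings(cpu_threshold: Optional[float],
                       mem_threshold: Optional[float],
                       hpa_cpu_target: float,
                       hpa_mem_target: float) -> List[str]:
    """Flag readiness thresholds that are not greater than the HPA targets."""
    warnings = []
    if cpu_threshold is not None and cpu_threshold <= hpa_cpu_target:
        warnings.append("CPU readiness threshold must exceed the HPA CPU target")
    if mem_threshold is not None and mem_threshold <= hpa_mem_target:
        warnings.append("memory readiness threshold must exceed the HPA memory target")
    return warnings


# Default behavior: CPU check on, memory check off (blank), so high memory is ignored.
print(is_ready(cpu_util_pct=85, mem_util_pct=120,
               cpu_threshold=90, mem_threshold=None))  # prints True
```

The ordering rule checked by threshold_warnings presumably exists so that HPA starts adding pods before the existing pods stop accepting requests; if the readiness threshold were lower than the HPA target, pods could refuse traffic without ever triggering a scale-up.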
Asynchronous requests are processed based on the concurrency level you set for the runtime pods of the Deployment. For example, if there are three (3) runtime pods, each with a concurrency of 5, and eight (8) messages in the Queue, the messages are routed as follows:
- The first runtime pod will take up five (5) of the eight (8) messages.
- The second runtime pod will take the rest of the three (3) messages.
- The third runtime pod will remain unoccupied until there are more than ten (10) messages at a time.
When all three runtime pods are fully occupied, the remaining messages in the queue are prioritized and routed to a runtime pod as soon as it frees up a slot.
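The routing in the example above can be reproduced with the simple fill-first model below, which assumes each pod takes messages up to its concurrency limit before the next pod receives any. The function is hypothetical and is only a snapshot at one moment: it does not model message priority or pods freeing slots over time.

```python
from typing import List, Tuple


def route_async(n_messages: int, n_pods: int,
                concurrency: int) -> Tuple[List[int], int]:
    """Fill each pod up to its concurrency before moving to the next pod.

    Returns (messages assigned to each pod, messages left waiting in the queue).
    """
    assigned = []
    remaining = n_messages
    for _ in range(n_pods):
        take = min(concurrency, remaining)
        assigned.append(take)
        remaining -= take
    return assigned, remaining


print(route_async(n_messages=8, n_pods=3, concurrency=5))   # prints ([5, 3, 0], 0)
print(route_async(n_messages=18, n_pods=3, concurrency=5))  # prints ([5, 5, 5], 3)
```

The second call shows the fully occupied case: with 18 messages, all 15 slots are taken and 3 messages wait in the queue until a pod has a free slot.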
Related Topics