...
- The autoscaling of runtime pods can happen based on the threshold values for Message Queue, CPU, or memory, or any combination of these three parameters. You can make these configurations in the global values.yaml file.
  To use KEDA, you first need to enable it by setting the value of the type variable to keda under the global > config > autoscaling section in the values.yaml file, as shown in the following screenshot. To set the other relevant parameters, for example, the threshold number of messages in the Message Queue, refer to this section.
  Tip: For a dedicated runtime (Deployment) pod, you need to set the threshold values for Message Queue, CPU, and memory while creating the Deployment. For more details, refer to this page.
- The autoscaling of the other microservices' pods can happen based on the threshold values for CPU or memory, or both. You can make these configurations in the global values.yaml file. For more details, refer to this section.
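
As a minimal sketch of the KEDA setting described above, assuming the keys nest exactly as the path global > config > autoscaling suggests, the relevant fragment of the values.yaml file might look like this:

```yaml
# Sketch only: key nesting is inferred from the path "global > config > autoscaling".
global:
  config:
    autoscaling:
      type: keda   # selects KEDA as the autoscaler
```

Other keys that may exist under these sections are omitted here.
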
When you use Kubernetes' HPA,
- The autoscaling of runtime pods can happen based on the threshold values for CPU or memory, or both. You can make these configurations in the global values.yaml file. To set the relevant parameters in the values.yaml file, refer to this section.
  Tip: Ensure that the value of the type variable under the global > config > autoscaling section in the values.yaml file is set to hpa.
  Tip: For a dedicated runtime (Deployment) pod, you need to set the threshold values for CPU and memory while creating the Deployment. For more details, refer to this page.
- The autoscaling of the other microservices' pods can happen based on the threshold values for CPU or memory, or both. You can make these configurations in the global values.yaml file. To set the relevant parameters in the values.yaml file, refer to this section.
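
For the HPA case, the type setting mentioned in the tip could be written as follows. This is an illustrative fragment only; the nesting is assumed from the path global > config > autoscaling.

```yaml
# Sketch only: key nesting is inferred from the path "global > config > autoscaling".
global:
  config:
    autoscaling:
      type: hpa   # selects the Kubernetes Horizontal Pod Autoscaler
```
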
...
The parameters for configuring autoscaling of the runtime microservice differ slightly from those for the rest of the microservices. The following table describes the autoscaling parameters for the runtime microservice. You can find these parameters in the runtimeImage: section in the global values.yaml file.
Parameter | Description | Default value
---|---|---
RUNTIME_AUTOSCALING_ENABLED: | Enables autoscaling when set to true. | true
RUNTIME_MIN_POD: | The minimum number of pods. | 1
RUNTIME_MAX_POD: | The maximum number of pods the runtime microservice can scale up to. | 1
RUNTIME_AUTOSCALING_CRITERIA_MESSAGE_COUNT: | Defines whether autoscaling is based on the Message Queue count. When set to true, the runtime pod autoscales based on the number of messages in the queued state in the Message Queue. | true
RUNTIME_AUTOSCALING_CRITERIA_CPU: | Defines whether autoscaling is based on CPU usage. When set to true, the runtime pod autoscales based on CPU usage. | true
RUNTIME_AUTOSCALING_CRITERIA_MEMORY: | Defines whether autoscaling is based on memory usage. When set to true, the runtime pod autoscales based on memory usage. | false
RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT: | The threshold number of messages in the queued state in the Message Queue at which KEDA spins up a new pod. | 5
RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: | The CPU usage, as a percentage of the CPU requests set for the runtime pods in the global values.yaml file, at which a new pod spins up. | 400
RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: | The memory usage, as a percentage of the memory requests set for the runtime pods in the global values.yaml file, at which a new pod spins up. | 400
RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: | The duration (in seconds) for which the application watches for spikes in the resource utilization of the currently running pods. This helps determine whether scaling up is required. | 300
RUNTIME_MAX_POD_TO_SCALE_UP: | The maximum number of pods the runtime microservice can scale up by at a time. | 1
RUNTIME_SCALE_UP_PERIOD_SECONDS: | The interval (in seconds) at which spikes in the resource utilization of the currently running pods are tracked. | 60
RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: | The duration (in seconds) for which the application watches for a drop in the resource utilization of the currently running pods. This helps determine whether scaling down is required. | 300
RUNTIME_MAX_POD_TO_SCALE_DOWN: | The maximum number of pods the runtime microservice can scale down by at a time. | 1
RUNTIME_SCALE_DOWN_PERIOD_SECONDS: | The interval (in seconds) at which a drop in the resource utilization of the currently running pods is tracked. | 60
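
The runtime parameters above could be combined as in the following sketch. It assumes that the parameters are plain keys placed directly under runtimeImage: in the global values.yaml file; the exact nesting is not confirmed here. All values are the documented defaults except RUNTIME_MAX_POD, which is raised to an illustrative 3.

```yaml
# Sketch only: assumes these keys sit directly under runtimeImage:.
runtimeImage:
  RUNTIME_AUTOSCALING_ENABLED: true
  RUNTIME_MIN_POD: 1
  RUNTIME_MAX_POD: 3   # illustrative value; the documented default is 1
  # Scaling criteria
  RUNTIME_AUTOSCALING_CRITERIA_MESSAGE_COUNT: true
  RUNTIME_AUTOSCALING_CRITERIA_CPU: true
  RUNTIME_AUTOSCALING_CRITERIA_MEMORY: false
  # Thresholds
  RUNTIME_AUTOSCALING_QUEUE_MESSAGE_COUNT: 5
  RUNTIME_AUTOSCALING_TARGETCPUUTILIZATIONPERCENTAGE: 400
  RUNTIME_AUTOSCALING_TARGETMEMORYUTILIZATIONPERCENTAGE: 400
  # Scale-up behavior
  RUNTIME_SCALE_UP_STABILIZATION_WINDOW_SECONDS: 300
  RUNTIME_MAX_POD_TO_SCALE_UP: 1
  RUNTIME_SCALE_UP_PERIOD_SECONDS: 60
  # Scale-down behavior
  RUNTIME_SCALE_DOWN_STABILIZATION_WINDOW_SECONDS: 300
  RUNTIME_MAX_POD_TO_SCALE_DOWN: 1
  RUNTIME_SCALE_DOWN_PERIOD_SECONDS: 60
```

With RUNTIME_MAX_POD left at its default of 1, the maximum equals RUNTIME_MIN_POD, so the runtime microservice presumably cannot add pods. That is why the sketch sets a higher maximum.
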
...
To enable HPA, you need to set the parameters as described below for each of the microservices individually. You can find these parameters in the respective section of each microservice in the global values.yaml file.
Parameter | Description | Default value
---|---|---
autoscaling: | |
enabled: | Enables autoscaling when set to true. | true
criteria: | |
cpu: | Defines whether autoscaling is based on CPU usage. When set to true, the microservice's pods autoscale based on CPU usage. | true
memory: | Defines whether autoscaling is based on memory usage. When set to true, the microservice's pods autoscale based on memory usage. | false
minReplicas: | The minimum number of pods for a microservice. | 1
maxReplicas: | The maximum number of pods a microservice can scale up to. | 1
targetCPUUtilizationPercentage: | The CPU usage, as a percentage of the CPU requests set for the pods in the global values.yaml file, at which the HPA spins up a new pod. | 400
targetMemoryUtilizationPercentage: | The memory usage, as a percentage of the memory requests set for the pods in the global values.yaml file, at which the HPA spins up a new pod. | 400
behavior: | |
scaleUp: | |
stabilizationWindowSeconds: | The duration (in seconds) for which the application watches for spikes in the resource utilization of the currently running pods. This helps determine whether scaling up is required. | 300
maxPodToScaleUp: | The maximum number of pods a microservice can scale up by at a time. | 1
periodSeconds: | The interval (in seconds) at which spikes in the resource utilization of the currently running pods are tracked. | 60
scaleDown: | |
stabilizationWindowSeconds: | The duration (in seconds) for which the application watches for a drop in the resource utilization of the currently running pods. This helps determine whether scaling down is required. | 300
maxPodToScaleDown: | The maximum number of pods a microservice can scale down by at a time. | 1
periodSeconds: | The interval (in seconds) at which a drop in the resource utilization of the currently running pods is tracked. | 60
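
The following sketch shows how the HPA parameters above might fit together for one microservice. The section name exampleMicroservice is hypothetical, and the nesting of the keys is an assumption inferred from the order of the rows in the table. Check both against your own values.yaml file. The values are the documented defaults except maxReplicas, which is an illustrative 3.

```yaml
# Sketch only: "exampleMicroservice" is a hypothetical section name,
# and the key nesting is inferred from the parameter table.
exampleMicroservice:
  autoscaling:
    enabled: true
    criteria:
      cpu: true
      memory: false
    minReplicas: 1
    maxReplicas: 3   # illustrative value; the documented default is 1
    targetCPUUtilizationPercentage: 400
    targetMemoryUtilizationPercentage: 400
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 300
        maxPodToScaleUp: 1
        periodSeconds: 60
      scaleDown:
        stabilizationWindowSeconds: 300
        maxPodToScaleDown: 1
        periodSeconds: 60
```
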
Load balancing among the runtime pods
...