
Autoscaling

Knative Serving provides automatic scaling, or autoscaling, for applications to match incoming demand. By default, this is handled by the Knative Pod Autoscaler (KPA).
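
The autoscaler class can be selected per revision with an annotation on the revision template. The following is a minimal sketch that makes the default KPA choice explicit; the Service name and container image are placeholders, not values from this page.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello            # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Select the Knative Pod Autoscaler; this is also the default class.
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest   # illustrative image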

For example, if an application is receiving no traffic and scale to zero is enabled, Knative Serving scales the application down to zero replicas. If scale to zero is disabled, the application is scaled down to the minimum number of replicas configured for applications on the cluster. If traffic to the application increases, replicas are scaled up to meet demand.

You can enable or disable scale to zero for your cluster if you have cluster administrator permissions. See Configuring scale to zero - https://knative.dev/docs/serving/autoscaling/scale-to-zero/.
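
Scale to zero is controlled cluster-wide through the config-autoscaler ConfigMap. The sketch below assumes a default installation where Knative Serving runs in the knative-serving namespace; it shows how an administrator might disable scale to zero.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # "false" keeps at least one replica running even when there is no traffic;
  # the default is "true", which allows scaling down to zero replicas.
  enable-scale-to-zero: "false"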

To use autoscaling for your application when it is enabled on your cluster, you must configure concurrency - https://knative.dev/docs/serving/autoscaling/concurrency/ and scale bounds - https://knative.dev/docs/serving/autoscaling/scale-bounds/. An example of both settings on a single Service follows.
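
The sketch below sets a per-revision concurrency target and scale bounds using the standard Knative annotations; the numeric values and the Service name and image are placeholders chosen for illustration.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello            # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Soft concurrency target: the number of in-flight requests per
        # replica that the autoscaler aims for.
        autoscaling.knative.dev/target: "100"
        # Scale bounds: keep between 1 and 10 replicas.
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: ghcr.io/knative/helloworld-go:latest   # illustrative image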


References

[1] About autoscaling - Knative - https://knative.dev/docs/serving/autoscaling/

[2] Home - Knative - https://knative.dev/docs/

[3] Configuring scale to zero - https://knative.dev/docs/serving/autoscaling/scale-to-zero/

[4] Configuring concurrency - https://knative.dev/docs/serving/autoscaling/concurrency/

[5] Configuring scale bounds - https://knative.dev/docs/serving/autoscaling/scale-bounds/

[6] Install optional Serving extensions - https://knative.dev/docs/install/serving/install-serving-with-yaml/#install-optional-serving-extensions

[7] Horizontal Pod Autoscaling | Kubernetes - https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/