Rollout Strategy Guide
Overview
KServe supports configurable rollout strategies for Standard Deployment Mode, allowing you to control how new versions of your models are deployed. Rollout strategies can be configured through the ConfigMap defaults or by setting Kubernetes DeploymentStrategy directly in the component extension spec.
Configuration Priority
The rollout strategy is applied with the following precedence:
- User-defined DeploymentStrategy (highest priority) - directly specified in component extension spec
- ConfigMap rollout strategy (fallback) - applies only when defaultDeploymentMode is "Standard"
ConfigMap Configuration
When using ConfigMap configuration, you can specify maxSurge and maxUnavailable values directly. These values are applied to the Kubernetes deployment strategy when defaultDeploymentMode is set to "Standard".
Configuration
Method 1: Direct DeploymentStrategy Configuration (Recommended)
You can configure Kubernetes deployment strategy directly at the component level:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: deployment-strategy-example
  namespace: default
  annotations:
    serving.kserve.io/deploymentMode: "Standard"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "s3://my-bucket/model"
    # Direct Kubernetes deployment strategy configuration
    deploymentStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: "0" # High availability
        maxSurge: "1" # Allow one extra pod
  transformer:
    custom:
      container:
        image: my-transformer:latest
    # Resource-efficient deployment strategy
    deploymentStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: "0" # Resource efficient
        maxUnavailable: "1" # Allow one pod to be unavailable
Method 2: ConfigMap Default Configuration
Configure defaults in the KServe ConfigMap that apply when no user-defined deployment strategy is specified and defaultDeploymentMode is set to "Standard":
apiVersion: v1
kind: ConfigMap
metadata:
  name: inferenceservice-config
  namespace: kserve
data:
  deploy: |-
    {
      "defaultDeploymentMode": "Standard",
      "deploymentRolloutStrategy": {
        "defaultRollout": {
          "mode": "Availability",
          "maxSurge": "1",
          "maxUnavailable": "1"
        }
      }
    }
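The deploy entry is a JSON string, so it can be inspected with any JSON parser, for example in a pre-deployment check. The following is a minimal sketch that assumes the JSON shape shown above; the helper name read_default_rollout is hypothetical and is not part of KServe.

```python
import json

# Hypothetical helper (not a KServe API): extract the rollout defaults from
# the JSON string stored under the "deploy" key of inferenceservice-config.
def read_default_rollout(deploy_json: str):
    config = json.loads(deploy_json)
    # Per this guide, ConfigMap rollout defaults apply only in Standard mode.
    if config.get("defaultDeploymentMode") != "Standard":
        return None
    strategy = config.get("deploymentRolloutStrategy") or {}
    return strategy.get("defaultRollout")

DEPLOY = """
{
  "defaultDeploymentMode": "Standard",
  "deploymentRolloutStrategy": {
    "defaultRollout": {
      "mode": "Availability",
      "maxSurge": "1",
      "maxUnavailable": "1"
    }
  }
}
"""

rollout = read_default_rollout(DEPLOY)
print(rollout)
```

The function returns None when the defaults would not apply, which keeps the "Standard mode only" condition in one place.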
Rollout Strategy Modes
KServe supports two main rollout strategy approaches that you can configure either globally via ConfigMap or per-service via deploymentStrategy:
Availability Mode (Zero Downtime)
- Purpose: Ensures high availability during deployments by launching new pods first
- Configuration: Set maxUnavailable: "0" and maxSurge to desired value/percentage
- Behavior: New pods are created before old pods are terminated
- Use Case: Production environments where downtime is not acceptable
ResourceAware Mode (Resource Efficient)
- Purpose: Optimizes resource usage during deployments by terminating old pods first
- Configuration: Set maxSurge: "0" and maxUnavailable to desired value/percentage
- Behavior: Old pods are terminated before new pods are created
- Use Case: Resource-constrained environments or cost optimization
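The two modes differ only in which side of the rolling update is pinned to zero. A minimal sketch of that mapping follows; the function name effective_rolling_update is hypothetical, and the code models the behavior described in this guide rather than KServe's actual implementation.

```python
# Hypothetical model of how a ConfigMap mode maps to rolling-update values,
# following the descriptions in this guide (not KServe source code).
def effective_rolling_update(mode: str, max_surge: str, max_unavailable: str) -> dict:
    if mode == "Availability":
        # New pods first: nothing may become unavailable, surge as configured.
        return {"maxSurge": max_surge, "maxUnavailable": "0"}
    if mode == "ResourceAware":
        # Old pods first: no extra pods, unavailability as configured.
        return {"maxSurge": "0", "maxUnavailable": max_unavailable}
    raise ValueError(f"unknown mode: {mode!r}")

print(effective_rolling_update("Availability", "25%", "25%"))
print(effective_rolling_update("ResourceAware", "25%", "25%"))
```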
Configuration Parameters
For both direct deploymentStrategy and ConfigMap configuration:
- maxSurge: Maximum number of pods that can be created above the desired replica count (e.g., "1", "25%")
- maxUnavailable: Maximum number of pods that can be unavailable during update (e.g., "1", "25%")
KServe can configure default maxSurge and maxUnavailable values globally for all InferenceServices via ConfigMap. When users do not specify anything in the deploymentStrategy section of their InferenceService, the service will pick up these default values from the ConfigMap when defaultDeploymentMode is "Standard".
For direct DeploymentStrategy configuration:
- type: Should be "RollingUpdate"
- rollingUpdate.maxSurge: Same as above
- rollingUpdate.maxUnavailable: Same as above
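Percentage values are resolved against the desired replica count. In Kubernetes rolling updates, a maxSurge percentage is rounded up and a maxUnavailable percentage is rounded down. The sketch below illustrates that arithmetic; the function names resolve_count and rollout_bounds are hypothetical and exist only for this example.

```python
# Illustrative arithmetic only; function names are hypothetical.
def resolve_count(value: str, replicas: int, round_up: bool) -> int:
    """Turn "1" or "25%" into an absolute pod count."""
    if value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer ceiling or floor of scaled / 100.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)

def rollout_bounds(replicas: int, max_surge: str, max_unavailable: str):
    """Return (maximum total pods, minimum available pods) during a rollout."""
    surge = resolve_count(max_surge, replicas, round_up=True)
    unavailable = resolve_count(max_unavailable, replicas, round_up=False)
    return replicas + surge, replicas - unavailable

print(rollout_bounds(4, "25%", "25%"))
print(rollout_bounds(3, "25%", "25%"))
```

With 3 replicas, 25% resolves to one surge pod but zero unavailable pods, so small deployments can behave differently from what the percentages suggest at first glance.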
KServe Defaults (When No ConfigMap Defaults)
If neither the InferenceService spec nor the ConfigMap defines rollout strategy values, KServe applies its own defaults:
- maxUnavailable: 25%
- maxSurge: 25%
Special Case: Multinode Deployments
For multinode deployments (Ray workloads), KServe automatically overrides ALL rollout strategy configurations to ensure high availability:
- maxUnavailable: 0% (no pods are taken down during rollout)
- maxSurge: 100% (can have up to double the number of pods during rollout)
Important: This override takes precedence over ALL other configurations, including:
- User-defined deploymentStrategy in the component spec
- ConfigMap rollout strategy defaults
- KServe default values
This behavior is triggered automatically when the RAY_NODE_COUNT environment variable is detected in the inference service configuration. It ensures that original pods remain available until new pods are ready, which is critical for maintaining distributed Ray cluster stability during updates.
Priority Order
The final rollout strategy values are determined by this priority order:
- Multinode deployment override (HIGHEST priority) - automatic for Ray workloads with the RAY_NODE_COUNT environment variable
- User-defined DeploymentStrategy (high priority) - specified in component extension spec
- ConfigMap rollout strategy (fallback) - only applies when defaultDeploymentMode is "Standard"
- KServe default values (if no configuration is provided)
Important: The ConfigMap rollout strategy only applies when:
- No user-defined deploymentStrategy is specified in the component spec
- The defaultDeploymentMode in the ConfigMap is set to "Standard"
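The priority order above can be modeled as a short chain of checks. This is a sketch under the assumptions stated in this guide, not KServe's controller code; resolve_strategy and its parameters are hypothetical names introduced for illustration.

```python
# Hypothetical model of the priority order described in this guide.
KSERVE_DEFAULT = {"maxSurge": "25%", "maxUnavailable": "25%"}
MULTINODE_OVERRIDE = {"maxSurge": "100%", "maxUnavailable": "0%"}

def resolve_strategy(env, user_strategy=None, configmap_rollout=None,
                     default_deployment_mode=None):
    """Return (rolling-update values, name of the source that won)."""
    # 1. Multinode override: triggered by the RAY_NODE_COUNT variable.
    if "RAY_NODE_COUNT" in env:
        return dict(MULTINODE_OVERRIDE), "multinode"
    # 2. User-defined deploymentStrategy in the component spec.
    if user_strategy is not None:
        return dict(user_strategy), "user"
    # 3. ConfigMap defaults, only when defaultDeploymentMode is "Standard".
    if configmap_rollout is not None and default_deployment_mode == "Standard":
        return dict(configmap_rollout), "configmap"
    # 4. Built-in defaults when nothing else is configured.
    return dict(KSERVE_DEFAULT), "default"

user = {"maxSurge": "1", "maxUnavailable": "0"}
print(resolve_strategy({}, user_strategy=user))
print(resolve_strategy({"RAY_NODE_COUNT": "2"}, user_strategy=user))
```

The second call shows the multinode override winning even though a user-defined strategy is present.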
Default Values Summary
| Configuration | maxUnavailable | maxSurge | Notes |
|---|---|---|---|
| No rollout strategy specified | 25% | 25% | KServe defaults |
| Multinode deployment | 0% | 100% | Overrides ALL other configurations |
| Availability mode | 0 | <ratio> | From rollout spec |
| ResourceAware mode | <ratio> | 0 | From rollout spec |
Examples
Example 1: Availability Mode - High Availability Deployment
Availability Mode: Launch new pods first, terminate old pods after (zero downtime)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: availability-mode-model
  annotations:
    serving.kserve.io/deploymentMode: "Standard"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "s3://my-bucket/model"
    # Availability mode: maxUnavailable = 0, maxSurge = desired value
    deploymentStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: "0" # No pods unavailable during rollout
        maxSurge: "100%" # Can double the pods during rollout
Behavior: New pods are created first, then old pods are terminated. Ensures zero downtime but uses more resources temporarily.
Example 2: ResourceAware Mode - Resource-Efficient Deployment
ResourceAware Mode: Terminate old pods first, launch new pods after (resource efficient)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: resource-aware-model
  annotations:
    serving.kserve.io/deploymentMode: "Standard"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "s3://my-bucket/model"
    # ResourceAware mode: maxSurge = 0, maxUnavailable = desired value
    deploymentStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: "0" # No extra pods during rollout
        maxUnavailable: "25%" # Up to 25% of pods can be unavailable
Behavior: Old pods are terminated first, then new pods are created. Minimizes resource usage but may cause temporary unavailability.
Example 3: Using ConfigMap Global Defaults
When no deploymentStrategy is specified, the InferenceService picks up default values from the KServe ConfigMap:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: configmap-defaults-model
  annotations:
    serving.kserve.io/deploymentMode: "Standard"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "s3://my-bucket/model"
    # No deploymentStrategy specified - will use ConfigMap defaults
    # when defaultDeploymentMode is "Standard"
Behavior: Uses the global deploymentRolloutStrategy configuration from the KServe ConfigMap, allowing administrators to set organization-wide rollout policies.
Example 4: No Rollout Strategy (Uses KServe Defaults)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: default-rollout-model
  annotations:
    serving.kserve.io/deploymentMode: "Standard"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "s3://my-bucket/model"
    # No rollout strategy specified - will use KServe defaults:
    # maxUnavailable: "25%", maxSurge: "25%"
Note: For multinode deployments, even if no rollout strategy is specified, KServe will automatically override with maxUnavailable: "0%" and maxSurge: "100%" to ensure high availability.
Validation
For ConfigMap Configuration:
- mode is one of: "Availability" or "ResourceAware"
- maxSurge is a valid number or percentage string (e.g., "1", "25%")
- maxUnavailable is a valid number or percentage string (e.g., "1", "25%")
For Direct DeploymentStrategy:
- type must be "RollingUpdate"
- rollingUpdate.maxSurge follows Kubernetes validation rules
- rollingUpdate.maxUnavailable follows Kubernetes validation rules
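The ConfigMap rules above can be checked before a change is applied. The following sketch is one possible pre-flight check, not KServe's own validation; validate_rollout is a hypothetical name. It also includes the Kubernetes rule that maxSurge and maxUnavailable cannot both resolve to zero, which this guide does not state explicitly.

```python
import re

VALID_MODES = {"Availability", "ResourceAware"}  # case-sensitive
VALUE_PATTERN = re.compile(r"\d+%?")  # "1" or "25%"

# Hypothetical pre-flight check; returns a list of error messages.
def validate_rollout(rollout: dict) -> list:
    errors = []
    if rollout.get("mode") not in VALID_MODES:
        errors.append("mode must be exactly Availability or ResourceAware")
    parsed = {}
    for key in ("maxSurge", "maxUnavailable"):
        value = rollout.get(key)
        if isinstance(value, str) and VALUE_PATTERN.fullmatch(value):
            parsed[key] = int(value.rstrip("%"))
        else:
            errors.append(f"{key} must be a number or percentage string")
    # Kubernetes rejects a rolling update where both values are zero.
    if len(parsed) == 2 and parsed["maxSurge"] == 0 and parsed["maxUnavailable"] == 0:
        errors.append("maxSurge and maxUnavailable cannot both be zero")
    return errors

print(validate_rollout({"mode": "Availability", "maxSurge": "1", "maxUnavailable": "25%"}))
print(validate_rollout({"mode": "availability", "maxSurge": "abc", "maxUnavailable": "1"}))
```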
Best Practices
- Production Environments: Use Availability mode with a ratio of 25% to 50% for most production workloads
- Resource-Constrained Clusters: Use ResourceAware mode to minimize resource usage during deployments
- Critical Services: Use Availability mode with 100% ratio for zero-downtime deployments
- Testing: Use ResourceAware mode with 50% ratio for development and testing environments
Migration from Previous Versions
If you were using the previous rollout strategy configuration, update your InferenceService specs to use the new approaches:
Option 1: Use Direct DeploymentStrategy (Recommended)
spec:
  predictor:
    deploymentStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: "0" # Availability mode
        maxSurge: "25%" # Use the ratio value
Option 2: Use ConfigMap Configuration
Configure defaults in the ConfigMap instead of per-InferenceService configuration:
# ConfigMap configuration
data:
  deploy: |-
    {
      "defaultDeploymentMode": "Standard",
      "deploymentRolloutStrategy": {
        "defaultRollout": {
          "mode": "Availability",
          "maxSurge": "25%",
          "maxUnavailable": "25%"
        }
      }
    }

# InferenceService (no deployment strategy needed)
spec:
  predictor:
    model:
      # ... model configuration
    # No rollout configuration - uses ConfigMap defaults
Troubleshooting
Common Issues
- ConfigMap not applied: The ConfigMap rollout strategy applies only when defaultDeploymentMode is "Standard"; check that setting first
- Invalid Mode: For ConfigMap configuration, ensure mode is exactly "Availability" or "ResourceAware" (case-sensitive)
- Invalid maxSurge/maxUnavailable: Ensure values are valid numbers or percentages (e.g., "1", "25%")
- Missing Annotation: Ensure serving.kserve.io/deploymentMode: "Standard" is set for raw deployments
- User strategy not taking precedence: A user-defined deploymentStrategy always takes precedence over ConfigMap settings; only the multinode override ranks above it
Verification
To verify your rollout strategy is working correctly:
# Check the deployment strategy
kubectl get deployment <deployment-name> -o jsonpath='{.spec.strategy}'
# Check the rollout status
kubectl rollout status deployment <deployment-name>
# Check the pods during rollout
kubectl get pods -l app=<deployment-name>
Checking Configuration
To see what rollout strategy configuration is being applied:
# Check ConfigMap rollout defaults
kubectl get configmap inferenceservice-config -n kserve -o jsonpath='{.data.deploy}' | jq '.deploymentRolloutStrategy'
# Check the actual deployment strategy being used
kubectl get deployment <deployment-name> -o jsonpath='{.spec.strategy.rollingUpdate}'
# Check if user-defined deploymentStrategy is specified
kubectl get isvc <isvc-name> -o jsonpath='{.spec.predictor.deploymentStrategy}'
Understanding Which Strategy Is Applied
- If you see custom maxSurge/maxUnavailable values: Check if they match your user-defined deploymentStrategy (highest priority)
- If ConfigMap mode is "Availability": Expect maxUnavailable: "0" and maxSurge: <configured value>
- If ConfigMap mode is "ResourceAware": Expect maxSurge: "0" and maxUnavailable: <configured value>
- If you see maxUnavailable: "25%" and maxSurge: "25%": These are KServe defaults (no strategy configured anywhere)
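The checks above can be turned into a rough heuristic for reading the values returned by kubectl. This is only a sketch: explain_strategy is a hypothetical helper, and the result is a hint rather than proof, since a user-defined strategy can coincide with any of these patterns.

```python
# Hypothetical heuristic for interpreting observed rolling-update values.
def _is_zero(value) -> bool:
    # Accepts 0, "0", or "0%".
    return str(value).rstrip("%") == "0"

def explain_strategy(max_surge, max_unavailable) -> str:
    if str(max_surge) == "25%" and str(max_unavailable) == "25%":
        return "kserve-defaults"
    if _is_zero(max_unavailable) and not _is_zero(max_surge):
        return "availability"
    if _is_zero(max_surge) and not _is_zero(max_unavailable):
        return "resource-aware"
    # Anything else: compare against your own deploymentStrategy.
    return "custom"

print(explain_strategy("25%", "25%"))
print(explain_strategy("100%", 0))
print(explain_strategy("0", "25%"))
```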