
ModelMesh Installation

ModelMesh installation provides high-scale, high-density model serving for scenarios with frequent model changes and large numbers of models, making it particularly well-suited for predictive inference workloads.

It uses a distributed architecture designed for:

  • High-scale model serving
  • Multi-model management
  • Intelligent model loading
  • Efficient resource utilization
  • Frequent model updates

When to use ModelMesh

Choose ModelMesh when you have many models that aren't all needed at the same time — it dynamically loads and evicts models to maximize resource utilization. If you have a small number of models that are always running, standard Kubernetes deployment is simpler. For LLMs and generative inference, use Standard Kubernetes Deployment instead.


Use Cases

ModelMesh is designed for predictive inference use cases where:

  • You have many models (hundreds to thousands)
  • Models are frequently updated or changed
  • Resource efficiency is critical
  • You need intelligent model placement and caching
  • Model inference times are relatively short
  • Models can share computational resources efficiently

Prerequisites

| Requirement | Details |
| ----------- | ------- |
| Kubernetes  | v1.32+ |
| kubectl     | Configured with cluster admin access |
| Permissions | Cluster admin |

Installation

Option 1: Quick Install

curl -s "https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/scripts/install.sh" | bash

Option 2: Manual Installation

Step 1 — Install etcd (model metadata storage):

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/dependencies/etcd.yaml

Step 2 — Install ModelMesh Serving:

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/default/modelmesh-serving.yaml

Step 3 — Install KServe Controller:

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.17.0/kserve.yaml
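Once the manifests are applied, you can sanity-check the installation. A minimal check, assuming the default `modelmesh-serving` and `kserve` namespaces created by the manifests above:

kubectl rollout status deployment/modelmesh-controller -n modelmesh-serving --timeout=180s

kubectl get servingruntimes -n modelmesh-serving

kubectl get pods -n kserve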

Configuration

Enable ModelMesh Mode

Set ModelMesh as the default deployment mode in KServe:

kubectl patch configmap inferenceservice-config -n kserve -p '{
  "data": {
    "deploy": "{\"defaultDeploymentMode\": \"ModelMesh\"}"
  }
}'
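With ModelMesh set as the default mode, new InferenceServices are deployed through ModelMesh automatically; the mode can also be selected per service with an annotation. A minimal sketch, assuming an sklearn model stored under the `localMinIO` storage key configured in the next section (the model name and path are illustrative):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  namespace: modelmesh-serving
  annotations:
    # Per-service override; redundant once ModelMesh is the default mode
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        key: localMinIO                 # storage key from the storage Secret
        path: sklearn/mnist-svm.joblib  # example path; adjust to your bucket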

Storage Configuration

Configure a model storage backend (example: MinIO/S3):

apiVersion: v1
kind: Secret
metadata:
  name: model-storage-config
  namespace: modelmesh-serving
stringData:
  localMinIO: |
    {
      "type": "s3",
      "access_key_id": "minioadmin",
      "secret_access_key": "minioadmin",
      "endpoint_url": "http://minio.minio.svc.cluster.local:9000",
      "default_bucket": "modelmesh-example-models",
      "region": "us-south"
    }

Features

Intelligent Model Management

  • Model Caching: Frequently accessed models stay in memory
  • LRU Eviction: Least recently used models are evicted when memory is full
  • Predictive Loading: Models can be pre-loaded based on usage patterns
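The caching and eviction behavior described above can be illustrated with a toy sketch (not ModelMesh's actual implementation): an LRU cache that keeps recently used models loaded and evicts the least recently used one when capacity is reached.

```python
from collections import OrderedDict


class ModelCache:
    """Toy LRU cache illustrating ModelMesh-style eviction (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._models = OrderedDict()  # name -> loaded model, oldest first

    def get(self, name):
        """Return a cached model and mark it most recently used, else None."""
        if name not in self._models:
            return None
        self._models.move_to_end(name)  # most recently used moves to the end
        return self._models[name]

    def load(self, name, model):
        """Insert a model, evicting the least recently used one when full."""
        self._models[name] = model
        self._models.move_to_end(name)
        if len(self._models) > self.capacity:
            evicted, _ = self._models.popitem(last=False)  # drop oldest entry
            print(f"evicted {evicted}")


cache = ModelCache(capacity=2)
cache.load("model-a", object())
cache.load("model-b", object())
cache.get("model-a")             # touch model-a, so model-b becomes the LRU
cache.load("model-c", object())  # prints "evicted model-b"
```

In ModelMesh the same idea operates across runtime pods rather than a single process, and eviction decisions also account for model size and request load.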

High Density Serving

  • Resource Sharing: Multiple models share the same runtime pods
  • Dynamic Loading: Models are loaded and unloaded as needed
  • Efficient Packing: Optimal placement of models across available resources

Performance Optimization

  • Fast Model Loading: Optimized model loading and caching
  • Connection Pooling: Efficient request routing to model instances
  • Minimal Overhead: Low latency model switching