
ModelMesh Installation

ModelMesh installation provides high-scale, high-density model serving for scenarios with frequent model changes and large numbers of models, making it particularly well-suited for predictive inference workloads.

It uses a distributed architecture designed for:

  • High-scale model serving
  • Multi-model management
  • Intelligent model loading
  • Efficient resource utilization
  • Frequent model updates

When to use ModelMesh

Choose ModelMesh when you have many models that aren't all needed at the same time — it dynamically loads and evicts models to maximize resource utilization. If you have a small number of models that are always running, standard Kubernetes deployment is simpler. For LLMs and generative inference, use Standard Kubernetes Deployment instead.


Use Cases

ModelMesh is designed for predictive inference use cases where:

  • You have many models (hundreds to thousands)
  • Models are frequently updated or changed
  • Resource efficiency is critical
  • You need intelligent model placement and caching
  • Model inference times are relatively short
  • Models can share computational resources efficiently

Prerequisites

| Requirement | Details |
| ----------- | ------- |
| Kubernetes  | v1.32+ |
| kubectl     | Configured with cluster admin access |
| Permissions | Cluster admin |

Installation

Option 1: Quick Install

curl -s "https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/scripts/install.sh" | bash

Option 2: Manual Installation

Step 1 — Install etcd (model metadata storage):

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/dependencies/etcd.yaml

Step 2 — Install ModelMesh Serving:

kubectl apply -f https://raw.githubusercontent.com/kserve/modelmesh-serving/release-0.12.0/config/default/modelmesh-serving.yaml

Step 3 — Install KServe Controller:

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.17.0/kserve.yaml
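Once the manifests are applied, you can sanity-check the installation. A minimal check, assuming the default `modelmesh-serving` and `kserve` namespaces created by the manifests above:

kubectl rollout status deployment/modelmesh-controller -n modelmesh-serving --timeout=180s

kubectl get servingruntimes -n modelmesh-serving

kubectl get pods -n kserve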

Configuration

Enable ModelMesh Mode

Set ModelMesh as the default deployment mode in KServe:

kubectl patch configmap inferenceservice-config -n kserve -p '{
  "data": {
    "deploy": "{\"defaultDeploymentMode\": \"ModelMesh\"}"
  }
}'
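With ModelMesh set as the default mode, new InferenceServices are deployed through ModelMesh automatically; the mode can also be selected per service with an annotation. A minimal sketch, assuming an sklearn model stored under the `localMinIO` storage key configured in the next section (the model name and path are illustrative):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn-isvc
  namespace: modelmesh-serving
  annotations:
    # Per-service override; redundant once ModelMesh is the default mode
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        key: localMinIO                 # storage key from the storage Secret
        path: sklearn/mnist-svm.joblib  # example path; adjust to your bucket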

Storage Configuration

Configure a model storage backend (example: MinIO/S3):

apiVersion: v1
kind: Secret
metadata:
  name: model-storage-config
  namespace: modelmesh-serving
stringData:
  localMinIO: |
    {
      "type": "s3",
      "access_key_id": "minioadmin",
      "secret_access_key": "minioadmin",
      "endpoint_url": "http://minio.minio.svc.cluster.local:9000",
      "default_bucket": "modelmesh-example-models",
      "region": "us-south"
    }

Features

Intelligent Model Management

  • Model Caching: Frequently accessed models stay in memory
  • LRU Eviction: Least recently used models are evicted when memory is full
  • Predictive Loading: Models can be pre-loaded based on usage patterns
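The caching and eviction behavior described above can be illustrated with a toy sketch (not ModelMesh's actual implementation): an LRU cache that keeps recently used models loaded and evicts the least recently used one when capacity is reached.

```python
from collections import OrderedDict


class ModelCache:
    """Toy LRU cache illustrating ModelMesh-style eviction (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._models = OrderedDict()  # name -> loaded model, oldest first

    def get(self, name):
        """Return a cached model and mark it most recently used, else None."""
        if name not in self._models:
            return None
        self._models.move_to_end(name)  # most recently used moves to the end
        return self._models[name]

    def load(self, name, model):
        """Insert a model, evicting the least recently used one when full."""
        self._models[name] = model
        self._models.move_to_end(name)
        if len(self._models) > self.capacity:
            evicted, _ = self._models.popitem(last=False)  # drop oldest entry
            print(f"evicted {evicted}")


cache = ModelCache(capacity=2)
cache.load("model-a", object())
cache.load("model-b", object())
cache.get("model-a")             # touch model-a, so model-b becomes the LRU
cache.load("model-c", object())  # prints "evicted model-b"
```

In ModelMesh the same idea operates across runtime pods rather than a single process, and eviction decisions also account for model size and request load.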

High Density Serving

  • Resource Sharing: Multiple models share the same runtime pods
  • Dynamic Loading: Models are loaded and unloaded as needed
  • Efficient Packing: Optimal placement of models across available resources

Performance Optimization

  • Fast Model Loading: Optimized model loading and caching
  • Connection Pooling: Efficient request routing to model instances
  • Minimal Overhead: Low latency model switching