
KServe

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Simple and Powerful API

KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning models. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features to your ML deployments.

Standard K8s API across ML frameworks
Pre/post processing and explainability
OpenAI specification support for LLMs
Canary rollouts and A/B testing
inferenceservice.yaml

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "llm-service"
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
      storageUri: "hf://meta-llama/Llama-3.1-8B-Instruct"

Quick Start

Get started with KServe in minutes. Follow these simple steps to deploy your first model.

1. Install KServe

Install KServe on your Kubernetes cluster (prerequisites such as cert-manager must be installed beforehand):

kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.11.0/kserve.yaml
2. Create an InferenceService

Deploy a pre-trained model with a simple YAML configuration:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "qwen-llm"
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://Qwen/Qwen2.5-0.5B-Instruct"
      resources:
        requests:
          cpu: "1"
          memory: 4Gi
          nvidia.com/gpu: "1"
3. Send Inference Requests

Make predictions using the deployed model:

curl -v -H "Host: qwen-llm.default.example.com" \
  -H "Content-Type: application/json" \
  http://localhost:8080/openai/v1/chat/completions \
  -d @./prompt.json
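The request body follows the OpenAI chat completions schema. A minimal prompt.json might look like the sketch below; the model field and the user message are illustrative (the model name here is assumed to match the InferenceService name), so adjust them to your deployment:

```json
{
  "model": "qwen-llm",
  "messages": [
    {"role": "user", "content": "What is KServe?"}
  ],
  "max_tokens": 128
}
```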

Trusted by Industry Leaders

KServe is used in production by organizations across various industries, providing reliable model inference at scale.

Bloomberg, IBM, Red Hat, NVIDIA, AMD, Kubeflow, Cloudera, Canonical, Cisco, Gojek, Inspur, Max Kelsen, Prosus, Wikimedia Foundation, Naver Corporation, Zillow, Striveworks, Cars24, Upstage, Intuit, Alauda

Ready to Transform Your ML Deployment?

Simplify your journey from model development to production with KServe's standardized inference platform for both predictive and generative AI models.