Skip to main content
Version: Next

Welcome to KServe

Deploy and scale AI models effortlessly โ€” from cutting-edge generative AI and large language models to traditional ML models โ€” with enterprise-grade reliability across any cloud or on-premises environment.

CNCF Incubating Project

KServe is a CNCF incubating project and part of the Kubeflow ecosystem.


Why KServe?โ€‹

KServe eliminates the complexity of productionizing AI models. Whether you're a data scientist, DevOps engineer, or platform architect, KServe provides a unified solution that works across clouds and scales with your needs.

๐Ÿš€ Minutes to Production

Deploy GenAI services and ML models with simple YAML โ€” no complex infrastructure setup required.

โ˜๏ธ Cloud-Agnostic

Run anywhere: AWS, Azure, GCP, on-premises, or hybrid environments with consistent behavior.

๐Ÿ“ˆ Enterprise-Scale Ready

Scale to zero when idle, handle traffic spikes automatically, and manage hundreds of models efficiently.


Key Benefitsโ€‹

FeatureDescription
LLM Multi-frameworkDeploy LLMs from Hugging Face, vLLM, and custom generative models
OpenAI-Compatible APIsChat completion, streaming, and embedding endpoints out of the box
LocalModelCacheReduce LLM startup time from 15โ€“20 minutes to ~1 minute
KV Cache OffloadingOptimized memory management for long conversations and large contexts
Multi-node InferenceDistributed LLM serving across multiple nodes
Envoy AI GatewayEnterprise-grade API management and routing for AI workloads
Metric-based AutoscalingScale on token throughput, queue depth, and GPU utilization
Canary DeploymentsA/B testing and canary rollouts for LLM experiments
โ†’ Full Generative AI docs

Architecture Overviewโ€‹

KServe consists of two main planes:

๐ŸŽ›๏ธ Control Plane

๐Ÿ“ก Data Plane

KServe extends Kubernetes with custom resources for AI/ML workloads โ€” handling load balancing, autoscaling, canary deployments, and monitoring automatically. Pluggable runtimes let you use the best engine per model type: vLLM for LLMs, TorchServe for PyTorch, or custom containers.


Supported Frameworksโ€‹


Get Startedโ€‹

Learning path: Tutorial โ†’ Core concepts โ†’ Production setup โ†’ API reference


Community & Supportโ€‹

ChannelLink
GitHubgithub.com/kserve/kserve โ€” issues, PRs, releases
SlackCNCF Slack #kserve โ€” questions and discussion
Community MeetingsMonthly calendar โ€” open to all
AdoptersSee who's using KServe