
# Deploy Your First

Choose your path based on what you want to serve:

| I want to... | Start here |
| --- | --- |
| Serve an LLM or generative AI model (standard deployment) | LLM with InferenceService (Standard) |
| Serve an LLM with advanced features (prefix-aware routing, disaggregated serving, fine-grained GPU scheduling) | LLM with LLMInferenceService (Advanced) |
| Serve a traditional ML model (scikit-learn, XGBoost, TensorFlow, etc.) | Predictive InferenceService |
**Not sure which LLM option to pick?**

Start with InferenceService (Standard) — it works for most LLM use cases and is simpler to set up. You can migrate to LLMInferenceService later if you need its advanced capabilities.
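Whichever path you choose, deployment starts from a single `InferenceService` manifest. As a concrete reference point, here is a minimal sketch for the predictive path, using KServe's publicly documented scikit-learn iris example; the service name and `storageUri` are illustrative, and you would substitute your own model location:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      # Tells KServe which serving runtime to select (sklearn, xgboost, tensorflow, ...)
      modelFormat:
        name: sklearn
      # Location of the trained model artifact; replace with your own bucket or PVC
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

Apply it with `kubectl apply -f` to your cluster, then check `kubectl get inferenceservice sklearn-iris` until the service reports `READY: True`.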