Deploy Your First Model
Choose your path based on what you want to serve:
| I want to... | Start here |
|---|---|
| Serve an LLM or generative AI model (standard deployment) | LLM with InferenceService (Standard) |
| Serve an LLM with advanced features (prefix-aware routing, disaggregated serving, fine-grained GPU scheduling) | LLM with LLMInferenceService (Advanced) |
| Serve a traditional ML model (scikit-learn, XGBoost, TensorFlow, etc.) | Predictive InferenceService |
Not sure which LLM option to pick?
Start with InferenceService (Standard) — it works for most LLM use cases and is simpler to set up. You can migrate to LLMInferenceService later if you need its advanced capabilities.
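
To give a feel for the standard path, here is a rough sketch of a minimal `InferenceService` manifest for serving an LLM with KServe's Hugging Face runtime. The resource name, model ID, and GPU count are placeholders for illustration, not a tested configuration:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-llm            # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface # use KServe's Hugging Face serving runtime
      args:
        - --model_name=my-llm
        - --model_id=Qwen/Qwen2.5-0.5B-Instruct  # example model; swap in your own
      resources:
        limits:
          nvidia.com/gpu: "1"  # adjust to your hardware
```

Applying a manifest like this with `kubectl apply -f` asks KServe to provision the serving pod and expose an inference endpoint; consult the KServe documentation for the exact fields supported by your version.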