machine-learning, data-ai

model-serving

LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.

Maintainer: ancoleman
Updated 12/9/2025
Stars: 333
Forks: 51
Quick start

Installation and usage

Installation
$ install --global skills.sh
Usage

After installation, you can use this skill by running the following command in your terminal:

$ skills use model-serving