machine-learningdata-ai
model-serving
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
maintainer
ancoleman
更新於 12/9/2025
星標
333
分支
51
quick start
Installation and usage
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
安裝
$ install --globalskills.sh
使用
安裝後,您可以通過在終端運行以下命令來使用此技能:
skills use model-serving