machine-learning, data-ai
model-serving
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
Maintainer: ancoleman
Updated 12/9/2025
Stars: 333
Forks: 51
Quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installing, you can use this skill by running the following command in your terminal:
$ skills use model-serving