model-serving

LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.

View Source machine-learning

maintainer

ancoleman

Updated 12/9/2025

Stars

333

Forks

quick start

Installation and usage

Installation

$ install --globalskills.sh

Usage

Once installed, you can use this skill by running the following command in your terminal:

skills use model-serving