home/categories/framework-internals/tdimino-claude-code-minoan-skills-integration-automation-llama-cpp-skill-md
framework-internalsdevelopment

llama-cpp

Secondary local LLM inference engine via llama.cpp. This skill should be used when running GGUF models directly, loading LoRA adapters for Kothar, benchmarking inference speed, or serving models via llama-server. Includes dedicated Qwen 3.5 serve scripts (9B dense with F16 option, 35B MoE) with asymmetric KV cache and thinking mode. Complements Ollama (which remains primary for RLAMA and general use).

tdimino
maintainer
tdimino
更新于 3/2/2026
星标
18
分支
2
quick start

Installation and usage

Secondary local LLM inference engine via llama.cpp. This skill should be used when running GGUF models directly, loading LoRA adapters for Kothar, benchmarking inference speed, or serving models via llama-server. Includes dedicated Qwen 3.5 serve scripts (9B dense with F16 option, 35B MoE) with asymmetric KV cache and thinking mode. Complements Ollama (which remains primary for RLAMA and general use).

安装
$ install --globalskills.sh
使用

安装后,您可以通过在终端运行以下命令来使用此技能:

skills use llama-cpp