home/categories/academic/davila7-claude-code-templates-cli-tool-components-skills-ai-research-evaluation-lm-evaluation-harness-skill-md
academicresearch
evaluating-llms-harness
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
maintainer
davila7
Atualizado 1/20/2026
Estrelas
17577
Forks
1576
quick start
Installation and usage
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
Instalação
$ install --globalskills.sh
Uso
Depois de instalar, você pode usar esta skill executando o seguinte comando no terminal:
skills use evaluating-llms-harness