home/categories/machine-learning/danielmiessler-personal-ai-infrastructure-releases-v4-0-0-claude-skills-utilities-evals-skill-md
machine-learningdata-ai

evals

Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.

danielmiessler
maintainer
danielmiessler
Mis à jour 2/28/2026
Étoiles
11259
Forks
1568
quick start

Installation and usage

Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.

Installation
$ install --globalskills.sh
Utilisation

Après l'installation, vous pouvez utiliser ce skill en exécutant la commande suivante dans votre terminal :

skills use evals