evals
Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.
maintainer
danielmiessler
Updated 2/28/2026
Stars
11259
Forks
1568
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installation, run the following command in your terminal to use this skill:
skills use evals
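The description mentions pass@k/pass^k scoring without defining it. As a rough illustration, the sketch below uses the commonly cited combinatorial estimators: pass@k as the chance that at least one of k sampled trials passes, and pass^k as the chance that all k sampled trials pass, given c passing trials out of n. The function names `pass_at_k` and `pass_pow_k` are hypothetical, and this listing does not show how the skill itself computes these scores, so treat the formulas as an assumption rather than the skill's implementation.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = total trials run, c = trials that passed, k = sample size being scored.
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k trials passes), given c passes in n trials."""
    _check(n, c, k)
    # comb(n - c, k) counts the k-subsets made up only of failing trials;
    # it is 0 when fewer than k trials failed, which yields 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k trials pass), given c passes in n trials."""
    _check(n, c, k)
    # comb(c, k) counts the k-subsets made up only of passing trials.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    n, c = 10, 5  # hypothetical run: 5 of 10 trials passed
    for k in (1, 2, 5):
        print(f"k={k}: pass@k={pass_at_k(n, c, k):.3f}  pass^k={pass_pow_k(n, c, k):.3f}")
```

At k = 1 the two scores coincide with the plain pass rate c/n. As k grows, pass@k rises toward 1 (capability: can the agent ever succeed?) while pass^k falls toward 0 (reliability: does it succeed every time?), which is why the two are usually reported together.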