
evals

Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.
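As a rough illustration of the two scores named above, the sketch below uses the commonly cited combinatorial estimators: pass@k as the chance that at least one of k trials succeeds, and pass^k as the chance that all k trials succeed, both estimated from c passes observed in n trials. The function names `pass_at_k` and `pass_hat_k` are hypothetical and the skill's own scoring code may differ.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Guard against inputs for which the estimators are undefined.
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k trials passes) from c passes in n trials."""
    _check(n, c, k)
    # 1 minus the probability that a random k-subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k trials pass), i.e. pass^k, from c passes in n trials."""
    _check(n, c, k)
    # Probability that a random k-subset contains only passes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical run: 10 trials of one task, 5 of which passed.
    print(pass_at_k(10, 5, 2))
    print(pass_hat_k(10, 5, 2))
```

Read together, pass@k rewards a task that can be solved at least occasionally (capability), while pass^k rewards one that is solved every time (reliability), which is why the two values diverge as k grows.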

Maintainer: danielmiessler
Updated: 2/28/2026
Stars: 11259
Forks: 1568
Quick start

Installation and usage

Install
$ install --globalskills.sh
Usage

After installing, run the following command in your terminal to use this skill:

skills use evals