home/categories/machine-learning/danielmiessler-personal-ai-infrastructure-releases-v4-0-0-claude-skills-utilities-evals-skill-md
machine-learningdata-ai
evals
Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.
maintainer
danielmiessler
آخر تحديث 2/28/2026
النجوم
11259
التفرعات
1568
quick start
Installation and usage
Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.
التثبيت
$ install --globalskills.sh
الاستخدام
بعد التثبيت، يمكنك استخدام هذه المهارة بتشغيل الأمر التالي في الطرفية:
skills use evals