machine-learning, data-ai
evals
Objective eval metrics via code/model/human graders with pass@k/pass^k scoring. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test, run eval, compare models, compare prompts, create judge, create use case, view results, failure to task, suite manager, transcript capture, trial runner.
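As a rough illustration of the pass@k and pass^k scores mentioned above, here is a minimal sketch using the commonly cited combinatorial estimators. It assumes each task is run for n independent trials, of which c pass. The function names `pass_at_k` and `pass_hat_k` are hypothetical, and the listing does not say how the skill itself computes these scores.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Shared argument validation for both estimators.
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled trials passes.

    n: total trials run, c: trials that passed, k: sample size.
    Assumed estimator: 1 - C(n - c, k) / C(n, k).
    """
    _check(n, c, k)
    # comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that all k sampled trials pass (pass^k).

    Assumed estimator: C(c, k) / C(n, k).
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 trials, 3 passes.
    print(pass_at_k(10, 3, 5))
    print(pass_hat_k(10, 3, 2))
```

pass@k rewards an agent that can succeed at least once (capability), while pass^k rewards one that succeeds every time (reliability), which is why the two are usually reported side by side.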
maintainer
danielmiessler
Updated 2/28/2026
Stars
11259
Forks
1568
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installing, you can use this skill by running the following command in the terminal:
skills use evals