home/categories/lab-tools/miosa-osa-canopy-library-skills-ai-patterns-validate-evaluator-skill-md
lab-toolsresearch

validate-evaluator

Calibrate LLM-as-Judge evaluators against human labels. Computes TPR, TNR, precision, recall, F1, and Cohen's kappa. Detects systematic biases and recommends prompt corrections. Produces a calibration report with confidence intervals. Triggers on: "validate evaluator", "calibrate judge", "judge accuracy", "evaluator validation", "judge metrics"

Miosa-osa
maintainer
Miosa-osa
আপডেট হয়েছে 3/19/2026
স্টার
172
ফর্ক
41
quick start

Installation and usage

Calibrate LLM-as-Judge evaluators against human labels. Computes TPR, TNR, precision, recall, F1, and Cohen's kappa. Detects systematic biases and recommends prompt corrections. Produces a calibration report with confidence intervals. Triggers on: "validate evaluator", "calibrate judge", "judge accuracy", "evaluator validation", "judge metrics"

ইনস্টলেশন
$ install --globalskills.sh
ব্যবহার

ইনস্টল করার পর, টার্মিনালে নিচের কমান্ড চালিয়ে আপনি এই স্কিল ব্যবহার করতে পারবেন:

skills use validate-evaluator