lab-toolsresearch
validate-evaluator
Calibrate LLM-as-Judge evaluators against human labels. Computes TPR, TNR, precision, recall, F1, and Cohen's kappa. Detects systematic biases and recommends prompt corrections. Produces a calibration report with confidence intervals. Triggers on: "validate evaluator", "calibrate judge", "judge accuracy", "evaluator validation", "judge metrics"
maintainer
Miosa-osa
更新於 3/19/2026
星標
172
分支
41
quick start
Installation and usage
Calibrate LLM-as-Judge evaluators against human labels. Computes TPR, TNR, precision, recall, F1, and Cohen's kappa. Detects systematic biases and recommends prompt corrections. Produces a calibration report with confidence intervals. Triggers on: "validate evaluator", "calibrate judge", "judge accuracy", "evaluator validation", "judge metrics"
安裝
$ install --globalskills.sh
使用
安裝後,您可以透過在終端機執行以下指令來使用此技能:
skills use validate-evaluator