
eval-validity-review

Review a single evaluation's validity: whether its claims hold up, whether its name is accurate, whether it is possible both to succeed and to fail at each sample, and whether scoring measures ground truth. Use when the user asks to check the validity of an eval, or as part of the Master Checklist workflow. Do NOT use for code quality or test coverage (use eval-quality-workflow or ensure-test-coverage instead).

Maintainer: UKGovernmentBEIS
Updated: 3/27/2026
Stars: 432
Forks: 286
quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installation, run the following command in your terminal to use this skill:

skills use eval-validity-review