home/categories/machine-learning/agentscope-ai-openjudge-skills-openjudge-skill-md
machine-learningdata-ai

openjudge

Build custom LLM evaluation pipelines using the OpenJudge framework. Covers selecting and configuring graders (LLM-based, function-based, agentic), running batch evaluations with GradingRunner, combining scores with aggregators, applying evaluation strategies (voting, average), auto-generating graders from data, and analyzing results (pairwise win rates, statistics, validation metrics). Use when the user wants to evaluate LLM outputs, compare multiple models, design scoring criteria, or build an automated evaluation system.

agentscope-ai
maintainer
agentscope-ai
更新于 3/10/2026
星标
537
分支
45
quick start

Installation and usage

Build custom LLM evaluation pipelines using the OpenJudge framework. Covers selecting and configuring graders (LLM-based, function-based, agentic), running batch evaluations with GradingRunner, combining scores with aggregators, applying evaluation strategies (voting, average), auto-generating graders from data, and analyzing results (pairwise win rates, statistics, validation metrics). Use when the user wants to evaluate LLM outputs, compare multiple models, design scoring criteria, or build an automated evaluation system.

安装
$ install --globalskills.sh
使用

安装后,您可以通过在终端运行以下命令来使用此技能:

skills use openjudge