openjudge

Build custom LLM evaluation pipelines using the OpenJudge framework. Covers selecting and configuring graders (LLM-based, function-based, agentic), running batch evaluations with GradingRunner, combining scores with aggregators, applying evaluation strategies (voting, average), auto-generating graders from data, and analyzing results (pairwise win rates, statistics, validation metrics). Use when the user wants to evaluate LLM outputs, compare multiple models, design scoring criteria, or build an automated evaluation system.
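To make the terms in the description concrete, here is a minimal plain-Python sketch of the voting strategy, the average strategy, and a pairwise win rate. It does not use the OpenJudge API. The function names (`majority_vote`, `average_score`, `pairwise_win_rates`), the model names, and the sample scores are all hypothetical and chosen only for illustration. How OpenJudge itself implements these is not stated here.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def majority_vote(scores):
    """Voting strategy (illustrative): the most frequent score across repeated grader runs.

    Ties are broken by whichever score appeared first in the input.
    """
    return Counter(scores).most_common(1)[0][0]


def average_score(scores):
    """Average strategy (illustrative): the arithmetic mean of repeated grader runs."""
    return mean(scores)


def pairwise_win_rates(scores_by_model):
    """Fraction of head-to-head comparisons each model wins; a tie counts as half a win.

    scores_by_model maps a model name to its per-sample scores, with samples
    aligned by index across models.
    """
    wins = {name: 0.0 for name in scores_by_model}
    comparisons = {name: 0 for name in scores_by_model}
    for a, b in combinations(scores_by_model, 2):
        for score_a, score_b in zip(scores_by_model[a], scores_by_model[b]):
            comparisons[a] += 1
            comparisons[b] += 1
            if score_a > score_b:
                wins[a] += 1.0
            elif score_b > score_a:
                wins[b] += 1.0
            else:
                wins[a] += 0.5
                wins[b] += 0.5
    return {
        name: (wins[name] / comparisons[name]) if comparisons[name] else 0.0
        for name in wins
    }


# Hypothetical data: three grader runs on one response, then two models on three samples.
repeated_runs = [4, 5, 4]
voted = majority_vote(repeated_runs)
averaged = average_score(repeated_runs)
rates = pairwise_win_rates({"model_a": [4, 5, 3], "model_b": [3, 5, 2]})
```

Voting suits discrete labels or integer rubrics where a single outlier run should be ignored, while averaging keeps more information when scores are continuous.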

machine-learning

Maintainer

agentscope-ai

Updated 3/10/2026

Stars

537

Forks

Quick start

Installation and usage

Installation

$ install --globalskills.sh

Usage

Once installed, run the following command in your terminal to use this skill:

skills use openjudge