home/categories/machine-learning/allenai-vla-evaluation-harness-claude-skills-run-evaluation-skill-md
machine-learningdata-ai

run-evaluation

Run a VLA model evaluation against a simulation benchmark. Use this skill whenever the user wants to evaluate, benchmark, test, or run a model on a sim environment — even if they say it casually like 'try OpenVLA on LIBERO' or 'get me CALVIN scores'. Covers the full workflow: serving the model, launching the benchmark, sharding for speed, merging results, and interpreting output.

allenai
maintainer
allenai
Actualizado 4/5/2026
Estrellas
208
Forks
15
quick start

Installation and usage

Run a VLA model evaluation against a simulation benchmark. Use this skill whenever the user wants to evaluate, benchmark, test, or run a model on a sim environment — even if they say it casually like 'try OpenVLA on LIBERO' or 'get me CALVIN scores'. Covers the full workflow: serving the model, launching the benchmark, sharding for speed, merging results, and interpreting output.

Instalación
$ install --globalskills.sh
Uso

Después de instalarlo, puedes usar este skill ejecutando el siguiente comando en tu terminal:

skills use run-evaluation