run-evaluation

Run a VLA model evaluation against a simulation benchmark. Use this skill whenever the user wants to evaluate, benchmark, test, or run a model on a sim environment — even if they say it casually like 'try OpenVLA on LIBERO' or 'get me CALVIN scores'. Covers the full workflow: serving the model, launching the benchmark, sharding for speed, merging results, and interpreting output.
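The sharding and merging steps mentioned above can be sketched in a few lines. This is an illustration only, not the skill's actual implementation: `ShardResult`, `shard_indices`, and `merge_results` are hypothetical names, and the sketch assumes each shard reports only an episode count and a success count.

```python
from dataclasses import dataclass


@dataclass
class ShardResult:
    """Hypothetical per-shard summary: episodes attempted and episodes that succeeded."""
    episodes: int
    successes: int


def shard_indices(num_episodes: int, num_shards: int) -> list[list[int]]:
    """Assign episode indices round-robin so shard sizes differ by at most one."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [list(range(s, num_episodes, num_shards)) for s in range(num_shards)]


def merge_results(results: list[ShardResult]) -> float:
    """Pool raw counts across shards and return the overall success rate."""
    total = sum(r.episodes for r in results)
    if total == 0:
        raise ValueError("no episodes to merge")
    return sum(r.successes for r in results) / total


# Ten episodes split across three workers.
shards = shard_indices(num_episodes=10, num_shards=3)

# Made-up counts, one entry per shard, purely for illustration.
overall = merge_results([ShardResult(4, 4), ShardResult(3, 0), ShardResult(3, 3)])
```

Pooling the raw counts, rather than averaging per-shard success rates, keeps the merged score correct when shards have unequal sizes: in the made-up example the pooled rate is 7/10, whereas averaging the three shard rates would give 2/3.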

machine-learning

maintainer

allenai

Updated 4/5/2026

Stars

208

Forks

quick start

Installation and usage

Installation

$ install --globalskills.sh

Usage

After installation, you can use this skill by running the following command in your terminal:

skills use run-evaluation