machine-learning, data-ai

verl-rl-training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Maintainer: Orchestra-Research
Updated: 1/29/2026
Stars: 6563
Forks: 515
Quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installing, you can use this skill by running the following command in a terminal:

skills use verl-rl-training