grpo
Group Relative Policy Optimization for reinforcement learning from human feedback. Covers GRPOTrainer, reward function design, policy optimization, and KL divergence constraints for stable RLHF training. Includes thinking-aware reward patterns.
maintainer
atrawog
Updated 1/12/2026
Stars
0
Forks
0
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installation, you can use this skill by running the following command in your terminal:
skills use grpo
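The listing mentions reward function design and thinking-aware reward patterns without showing either, so here is a minimal sketch of how a reward could feed the group-relative step that gives GRPO its name. It is an illustration under assumptions, not the skill's own code: the `<think>...</think>` tag convention, the function names `thinking_format_reward` and `group_relative_advantages`, and the `eps` value are all hypothetical. The reward function takes `completions` plus `**kwargs`, a shape that should be usable as a custom reward callable with TRL's GRPOTrainer when completions are plain strings, though that should be checked against the installed TRL version.

```python
import re
from statistics import mean, pstdev

# Hypothetical tag convention: the listing does not state which reasoning
# format the skill expects, so <think>...</think> is an assumption.
THINK_PATTERN = re.compile(r"^<think>.+?</think>\s*\S.*$", re.DOTALL)


def thinking_format_reward(completions, **kwargs):
    """Score 1.0 when a completion has non-empty reasoning inside
    <think> tags followed by a non-empty answer, otherwise 0.0."""
    return [1.0 if THINK_PATTERN.match(c.strip()) else 0.0 for c in completions]


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions sampled for the
    same prompt: subtract the group mean, divide by the group std.
    Uses the population std; eps (an assumed value) avoids division by zero."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, three sampled completions forming a single group.
group = [
    "<think>2 + 2 is 4</think> 4",  # reasoning and answer present
    "4",                            # no reasoning block
    "<think></think> 4",            # empty reasoning block
]
rewards = thinking_format_reward(group)
advantages = group_relative_advantages(rewards)
```

Only the first completion earns a reward, so it receives a positive advantage and the other two receive negative ones. When every completion in a group scores the same, all advantages are zero and that group contributes no learning signal, which is one reason to combine a format reward like this with a task-specific correctness reward.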