llm-aidata-ai
grpo
Group Relative Policy Optimization for reinforcement learning from human feedback. Covers GRPOTrainer, reward function design, policy optimization, and KL divergence constraints for stable RLHF training. Includes thinking-aware reward patterns.
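As a rough illustration of the "group relative" idea named above, the sketch below scores several completions sampled for the same prompt and normalizes each reward against its group's mean and standard deviation. It is a minimal sketch, not the skill's code or GRPOTrainer's implementation: the function name `group_relative_advantages`, the `eps` parameter, and the choice of population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards of completions sampled for the same prompt.

    Hypothetical helper for illustration only: each advantage is the
    reward minus the group mean, divided by the group standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by some reward function
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.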
maintainer
atrawog
Updated 1/12/2026
Stars
0
Forks
0
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installing it, you can use this skill by running the following command in your terminal:
skills use grpo