llm-ai, data-ai
grpo
Group Relative Policy Optimization for reinforcement learning from human feedback. Covers GRPOTrainer, reward function design, policy optimization, and KL divergence constraints for stable RLHF training. Includes thinking-aware reward patterns.
maintainer
atrawog
Updated 1/12/2026
Stars
0
Forks
0
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installing, you can use this skill by running the following command in the terminal:
skills use grpo
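
To make the skill description more concrete, here is a minimal sketch of two ideas it names: a thinking-aware reward function and the group-relative advantage that gives GRPO its name. It is an illustration under stated assumptions, not the skill's own code. The `<think>...</think>` tag convention, the function names, and the `eps` value are hypothetical. The reward function assumes the calling convention commonly used with GRPOTrainer-style trainers (a list of plain-string completions in, a list of floats out), and the advantage uses the population standard deviation, which may differ from what a given trainer computes internally.

```python
import re
import statistics

# Assumed (hypothetical) convention: reasoning goes inside <think>...</think>,
# followed by a non-empty final answer.
THINK_PATTERN = re.compile(r"^\s*<think>(.+?)</think>\s*(\S.*)$", re.DOTALL)


def thinking_format_reward(completions, **kwargs):
    """Score 1.0 for a non-empty think block followed by an answer, else 0.0.

    Assumes each completion is a plain string; extra keyword arguments
    (prompts, dataset columns, ...) are accepted and ignored.
    """
    rewards = []
    for text in completions:
        match = THINK_PATTERN.match(text)
        has_reasoning = bool(match and match.group(1).strip())
        rewards.append(1.0 if has_reasoning else 0.0)
    return rewards


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is judged relative to its siblings, not on an absolute scale.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<think>2 + 2 is 4</think> 4",   # reasoning and answer
        "4",                             # answer only
        "<think>   </think> 4",          # empty reasoning
        "<think>carry the one</think> 12",
    ]
    rewards = thinking_format_reward(group)
    print(rewards)
    print(group_relative_advantages(rewards))
```

When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal, which is one reason reward functions with some graded variation tend to be easier to train against than purely binary ones.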