llm-aidata-ai
grpo
Group Relative Policy Optimization for reinforcement learning from human feedback. Covers GRPOTrainer, reward function design, policy optimization, and KL divergence constraints for stable RLHF training. Includes thinking-aware reward patterns.
maintainer
atrawog
اپ ڈیٹ ہوا 1/12/2026
اسٹارز
0
فورکس
0
quick start
Installation and usage
Group Relative Policy Optimization for reinforcement learning from human feedback. Covers GRPOTrainer, reward function design, policy optimization, and KL divergence constraints for stable RLHF training. Includes thinking-aware reward patterns.
انسٹالیشن
$ install --globalskills.sh
استعمال
انسٹال کرنے کے بعد، آپ یہ اسکل ٹرمینل میں درج ذیل کمانڈ چلا کر استعمال کر سکتے ہیں:
skills use grpo