machine-learning, data-ai
reward
Reward model training for RLHF pipelines. Covers RewardTrainer, preference dataset preparation, sequence classification heads, and reward scaling for stable reinforcement learning. Includes thinking quality scoring patterns.
maintainer
atrawog
Updated 1/12/2026
Stars
0
Forks
0
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installation, run the following command in your terminal to use this skill:
skills use reward
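The description above mentions preference data and reward scaling without showing what they involve. The following is a minimal sketch of both ideas in plain Python, not the skill's own code: it assumes a reward model has already produced scalar scores for a chosen and a rejected response, and it applies the pairwise Bradley-Terry loss commonly used for reward model training, followed by zero-mean, unit-variance scaling. The function names `pairwise_loss` and `scale_rewards` and all scores are hypothetical, chosen for illustration only.

```python
import math
from statistics import mean, pstdev


def pairwise_loss(chosen_score: float, rejected_score: float) -> float:
    """Bradley-Terry preference loss: -log(sigmoid(chosen - rejected)).

    The loss is small when the chosen response scores well above the
    rejected one, and large when the ordering is reversed.
    """
    margin = chosen_score - rejected_score
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def scale_rewards(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize a batch of rewards to zero mean and unit variance.

    This is one common scaling choice (an assumption here); it keeps the
    reward magnitude in a predictable range for the downstream RL step.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scores from a sequence classification head (one logit each).
    print(pairwise_loss(chosen_score=2.0, rejected_score=0.0))  # correct ordering
    print(pairwise_loss(chosen_score=0.0, rejected_score=2.0))  # reversed ordering
    print(scale_rewards([1.0, 2.0, 3.0]))
```

In a real pipeline the scores would come from a model with a single-output classification head, and the loss would be averaged over a batch of preference pairs and backpropagated; the sketch only shows the arithmetic.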