home/categories/machine-learning/atrawog-bazzite-ai-plugins-bazzite-ai-jupyter-skills-reward-skill-md
machine-learningdata-ai
reward
Reward model training for RLHF pipelines. Covers RewardTrainer, preference dataset preparation, sequence classification heads, and reward scaling for stable reinforcement learning. Includes thinking quality scoring patterns.
maintainer
atrawog
Atualizado 1/12/2026
Estrelas
0
Forks
0
quick start
Installation and usage
Reward model training for RLHF pipelines. Covers RewardTrainer, preference dataset preparation, sequence classification heads, and reward scaling for stable reinforcement learning. Includes thinking quality scoring patterns.
Instalação
$ install --globalskills.sh
Uso
Depois de instalar, você pode usar esta skill executando o seguinte comando no terminal:
skills use reward