reward

Reward model training for RLHF pipelines. Covers RewardTrainer, preference dataset preparation, sequence classification heads, and reward scaling for stable reinforcement learning. Includes thinking quality scoring patterns.
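The description above names preference data and reward scaling without showing either. As a rough illustration only, the sketch below uses plain Python to compute the pairwise (Bradley-Terry style) loss that reward models are commonly trained with on chosen/rejected pairs, followed by a simple normalize-and-clip step for reward scaling. The function names `pairwise_loss` and `scale_rewards` and the `clip` default are hypothetical and do not come from this skill or from any library API.

```python
import math
from statistics import mean, pstdev


def pairwise_loss(chosen_reward, rejected_reward):
    """Pairwise preference loss: -log(sigmoid(chosen - rejected)).

    Assumption: this is the common Bradley-Terry formulation; the skill
    itself may configure its trainer differently.
    """
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def scale_rewards(rewards, clip=5.0):
    """Z-score normalize a batch of rewards, then clip to [-clip, clip].

    Hypothetical helper: one simple way to keep reward magnitudes bounded
    before they are passed to a reinforcement learning step.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a constant batch
    return [max(-clip, min(clip, (r - mu) / sigma)) for r in rewards]


# The loss is small when the chosen response scores higher than the rejected one.
good = pairwise_loss(2.0, 0.0)
bad = pairwise_loss(0.0, 2.0)
scaled = scale_rewards([1.0, 2.0, 3.0])
```

The loss shrinks as the margin between chosen and rejected scores grows, which is what pushes the model to rank preferred responses higher; clipping after normalization limits the influence of a single outlier score.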

Maintainer: atrawog
Updated 1/12/2026
Stars: 0
Forks: 0
quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installing, you can use this skill by running the following command in your terminal:

skills use reward