machine-learningdata-ai
dpo
Direct Preference Optimization for learning from preference pairs. Covers DPOTrainer, preference dataset preparation, implicit reward modeling, and beta tuning for stable preference learning without explicit reward models. Includes thinking quality patterns.
maintainer
atrawog
Mis à jour 1/12/2026
Étoiles
0
Forks
0
quick start
Installation and usage
Direct Preference Optimization for learning from preference pairs. Covers DPOTrainer, preference dataset preparation, implicit reward modeling, and beta tuning for stable preference learning without explicit reward models. Includes thinking quality patterns.
Installation
$ install --globalskills.sh
Utilisation
Après l'installation, vous pouvez utiliser ce skill en exécutant la commande suivante dans votre terminal :
skills use dpo