machine-learningdata-ai
dpo
Direct Preference Optimization for learning from preference pairs. Covers DPOTrainer, preference dataset preparation, implicit reward modeling, and beta tuning for stable preference learning without explicit reward models. Includes thinking quality patterns.
maintainer
atrawog
Updated 1/12/2026
Stars
0
Forks
0
quick start
Installation and usage
Direct Preference Optimization for learning from preference pairs. Covers DPOTrainer, preference dataset preparation, implicit reward modeling, and beta tuning for stable preference learning without explicit reward models. Includes thinking quality patterns.
Installation
$ install --globalskills.sh
Usage
Once installed, you can use this skill by running the following command in your terminal:
skills use dpo