machine-learning
data-ai
dpo
Direct Preference Optimization for learning from preference pairs. Covers DPOTrainer, preference dataset preparation, implicit reward modeling, and beta tuning for stable preference learning without explicit reward models. Includes thinking quality patterns.
maintainer
atrawog
Updated 1/12/2026
Stars
0
Forks
0
quick start
Installation and usage
Installation
$ install --globalskills.sh
Usage
After installation, you can use this skill by running the following command in your terminal:
skills use dpo
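
The description above mentions implicit reward modeling and beta tuning. As a rough illustration of what those terms refer to, here is a minimal sketch of the per-pair DPO loss in plain Python. It is not the skill's own code and not the DPOTrainer API; the function names (`softplus`, `dpo_loss`), the sample record, and the log-probability values are hypothetical and chosen only for illustration. The `prompt` / `chosen` / `rejected` layout is the usual shape of a preference pair.

```python
import math

# Hypothetical preference record: one prompt with a preferred and a dispreferred answer.
example_pair = {
    "prompt": "Explain overfitting in one sentence.",
    "chosen": "Overfitting is when a model memorizes training data and generalizes poorly.",
    "rejected": "Overfitting is when the model is too small.",
}

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1):
    """Return (loss, chosen_reward, rejected_reward) for a single preference pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    loss = softplus(-margin)
    return loss, chosen_reward, rejected_reward

# Made-up log-probabilities: the policy already favors the chosen answer slightly.
loss, r_chosen, r_rejected = dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
print(f"loss={loss:.4f} chosen_reward={r_chosen:.3f} rejected_reward={r_rejected:.3f}")
```

In this sketch a larger beta scales the implicit rewards up, so the same log-probability gap produces a larger margin and the policy is penalized more strongly for drifting from the reference; smaller values are gentler. No reward model is trained separately, since the reward is read off the policy and reference log-probabilities.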