machine-learningdata-ai
simpo-training
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
maintainer
math-inc
更新於 3/19/2026
星標
1165
分支
96
quick start
Installation and usage
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
安裝
$ install --globalskills.sh
使用
安裝後,您可以透過在終端機執行以下指令來使用此技能:
skills use simpo-training