llm-aidata-ai
rlhf
Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.
maintainer
itsmostafa
Atualizado 1/5/2026
Estrelas
10
Forks
0
quick start
Installation and usage
Understanding Reinforcement Learning from Human Feedback (RLHF) for aligning language models. Use when learning about preference data, reward modeling, policy optimization, or direct alignment algorithms like DPO.
Instalação
$ install --globalskills.sh
Uso
Depois de instalar, você pode usar esta skill executando o seguinte comando no terminal:
skills use rlhf