torch-pipeline-parallelism

This skill provides guidance for implementing PyTorch pipeline parallelism for distributed training of large language models. It should be used when implementing pipeline parallel training loops, partitioning transformer models across GPUs, or working with AFAB (All-Forward-All-Backward) scheduling patterns. The skill covers model partitioning, inter-rank communication, gradient flow management, and common pitfalls in distributed training implementations.
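
As a rough illustration of two ideas named above, model partitioning and AFAB scheduling, here is a minimal pure-Python sketch. It is not taken from the skill itself. The helper names `partition_layers` and `afab_schedule` are hypothetical. It assumes layers are split into contiguous, near-equal chunks and that backward passes run in the same micro-batch order as the forward passes. A real implementation would wrap each chunk in a PyTorch module and exchange activations and gradients between ranks.

```python
def partition_layers(num_layers, num_stages):
    """Split layer indices into contiguous, near-equal chunks, one per stage.

    Assumption: earlier stages absorb the remainder when the split is uneven.
    """
    base, extra = divmod(num_layers, num_stages)
    chunks, start = [], 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def afab_schedule(num_microbatches):
    """Per-stage AFAB order: all forward passes first, then all backward passes.

    Assumption: backward passes follow the same micro-batch order as forwards.
    """
    forwards = [("forward", mb) for mb in range(num_microbatches)]
    backwards = [("backward", mb) for mb in range(num_microbatches)]
    return forwards + backwards


if __name__ == "__main__":
    # 10 transformer layers over 4 pipeline stages, 3 micro-batches per step.
    for rank, layers in enumerate(partition_layers(10, 4)):
        print(f"stage {rank}: layers {layers}")
    for step in afab_schedule(3):
        print(step)
```

Because every forward pass finishes before any backward pass begins, each stage has to keep the activations of all micro-batches alive until the backward phase, which is the main memory cost of this schedule.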

machine-learning

maintainer

letta-ai

Updated 1/19/2026

quick start

Installation and usage

Installation

$ install --globalskills.sh

Usage

After installation, run the following command in your terminal to use this skill:

skills use torch-pipeline-parallelism