machine-learning, data-ai

torch-pipeline-parallelism

This skill provides guidance for implementing PyTorch pipeline parallelism for distributed training of large language models. It should be used when implementing pipeline parallel training loops, partitioning transformer models across GPUs, or working with AFAB (All-Forward-All-Backward) scheduling patterns. The skill covers model partitioning, inter-rank communication, gradient flow management, and common pitfalls in distributed training implementations.
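As a rough illustration of two ideas named above, the sketch below splits a stack of transformer layers into contiguous stages and lists the order of operations a single rank would follow under AFAB. It is a minimal sketch in plain Python, not the skill's own implementation: `partition_layers` and `afab_schedule` are hypothetical helper names, the even contiguous split is an assumed partitioning policy, and running the backward passes in the same micro-batch order as the forward passes is an assumption as well. Inter-rank communication and the actual PyTorch calls are left out.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices into contiguous, near-equal chunks, one per stage.

    Hypothetical helper: assumes an even split, with any remainder
    given to the earliest stages.
    """
    base, extra = divmod(num_layers, num_stages)
    chunks = []
    start = 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def afab_schedule(num_microbatches: int) -> list[tuple[str, int]]:
    """Return one rank's op order under AFAB: all forwards, then all backwards.

    Hypothetical helper: assumes backward passes run in the same
    micro-batch order as the forward passes.
    """
    forwards = [("forward", mb) for mb in range(num_microbatches)]
    backwards = [("backward", mb) for mb in range(num_microbatches)]
    return forwards + backwards


if __name__ == "__main__":
    # 10 layers over 4 pipeline stages, 2 micro-batches per step.
    for stage, layers in enumerate(partition_layers(10, 4)):
        print(f"stage {stage}: layers {list(layers)}")
    print(afab_schedule(2))
```

Because every forward pass completes before any backward pass starts, each stage has to keep the activations of all micro-batches alive at once, which is the main memory cost to weigh when choosing the micro-batch count.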

Maintainer: letta-ai
Updated: 1/19/2026
Stars: 31
Forks: 5
Quick start

Installation and usage


Install
$ install --globalskills.sh
Usage

After installation, you can use this skill by running the following command in your terminal:

skills use torch-pipeline-parallelism