home/categories/framework-internals/nvidia-tensorrt-llm-claude-skills-perf-torch-cuda-graphs-skill-md
framework-internalsdevelopment

perf-torch-cuda-graphs

Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.

NVIDIA
maintainer
NVIDIA
更新於 4/8/2026
星標
13335
分支
2271
quick start

Installation and usage

Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.

安裝
$ install --globalskills.sh
使用

安裝後,您可以透過在終端機執行以下指令來使用此技能:

skills use perf-torch-cuda-graphs