perf-torch-sync-free

Identify and eliminate host-device synchronizations in PyTorch code. Detects sync points (.item(), .cpu(), boolean indexing, torch.tensor on CUDA), classifies them as false or true dependencies, and provides sync-free alternatives. Triggers: sync-free, synchronization, .item(), .cpu(), host-device sync, eliminate syncs, CPU stall, non_blocking, set_sync_debug_mode, cudaStreamSynchronize, cudaEventSynchronize, remove syncs, async GPU.
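
As an illustration of the kind of rewrite described above, the following is a minimal sketch, not taken from the skill itself. It assumes that a "false dependency" means the host reads a device value it does not actually need, so the decision can stay on the GPU. The function names are hypothetical; torch.where, torch.isfinite, and torch.cuda.set_sync_debug_mode are existing PyTorch calls.

```python
import torch


def zero_nonfinite_sync(loss: torch.Tensor) -> torch.Tensor:
    # .item() copies a scalar to the host, so on a CUDA tensor the CPU
    # must wait for all queued GPU work to finish before branching.
    if not torch.isfinite(loss).item():
        loss = torch.zeros_like(loss)
    return loss


def zero_nonfinite_sync_free(loss: torch.Tensor) -> torch.Tensor:
    # Same result, but the selection is done on the device by torch.where,
    # so the host never reads the value.
    return torch.where(torch.isfinite(loss), loss, torch.zeros_like(loss))


def masked_sum_sync(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Boolean indexing produces a tensor whose shape depends on the data,
    # so the host has to learn the element count from the device.
    return x[mask].sum()


def masked_sum_sync_free(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Fixed-shape alternative: masked-out elements contribute zero.
    return torch.where(mask, x, torch.zeros_like(x)).sum()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device == "cuda":
        # Ask PyTorch to warn whenever an operation forces a synchronization.
        torch.cuda.set_sync_debug_mode("warn")

    x = torch.tensor([1.0, -2.0, 3.0, 4.0], device=device)
    mask = x > 0
    print(masked_sum_sync_free(x, mask))
    print(zero_nonfinite_sync_free(torch.tensor(float("nan"), device=device)))
```

On a CUDA device with the debug mode set to "warn", the two _sync variants would be expected to trigger warnings while the _sync_free variants would not. A true dependency, such as printing a metric to the console, still needs the host copy and cannot be rewritten this way.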

Maintainer: NVIDIA
Updated: 4/8/2026
Stars: 13335
Forks: 2271
quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installation, you can use this skill by running the following command in your terminal:

skills use perf-torch-sync-free