perf-torch-sync-free

Identify and eliminate host-device synchronizations in PyTorch code. Detects sync points (.item(), .cpu(), boolean indexing, torch.tensor on CUDA), classifies false vs true dependencies, provides sync-free alternatives. Triggers: sync-free, synchronization, .item(), .cpu(), host-device sync, eliminate syncs, CPU stall, non_blocking, set_sync_debug_mode, cudaStreamSynchronize, cudaEventSynchronize, remove syncs, async GPU.
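
To make the idea of a false dependency concrete, here is a minimal sketch of two of the sync points named above (`.item()` and boolean indexing) next to rewrites that keep the decision on the device. The function names and toy tensors are hypothetical and chosen only for illustration; the listing does not show what the skill itself recommends. The code assumes a reasonably recent PyTorch and falls back to CPU when CUDA is unavailable, where it still runs but no synchronization is involved.

```python
import torch

# Assumption: use the GPU when present; on CPU the code runs but nothing syncs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def masked_sum_sync(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Boolean indexing yields a data-dependent shape, so on CUDA the host
    # must wait for the GPU to learn how many elements were selected.
    return x[mask].sum()


def masked_sum_sync_free(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Same result with a fixed output shape: unselected elements become zero.
    return torch.where(mask, x, torch.zeros_like(x)).sum()


def zero_if_nan_sync(loss: torch.Tensor) -> torch.Tensor:
    # .item() copies the flag to the host and blocks until it is computed.
    if torch.isnan(loss).item():
        return torch.zeros_like(loss)
    return loss


def zero_if_nan_sync_free(loss: torch.Tensor) -> torch.Tensor:
    # The branch is evaluated on the device; the host never reads the flag.
    return torch.where(torch.isnan(loss), torch.zeros_like(loss), loss)


if torch.cuda.is_available():
    # Ask PyTorch to warn whenever an operation forces a synchronization.
    torch.cuda.set_sync_debug_mode("warn")

x = torch.arange(6, dtype=torch.float32, device=device)
mask = x > 2
total = masked_sum_sync_free(x, mask)

nan_loss = torch.full((), float("nan"), device=device)
safe_loss = zero_if_nan_sync_free(nan_loss)
```

In both pairs the host never needed the value it was waiting for, which is what makes the dependency false. A true dependency, such as printing a metric or choosing a Python-side code path that cannot be expressed on the device, still requires a transfer; the usual approach there is to do it as rarely as possible.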

Maintainer: NVIDIA
Updated: 4/8/2026
Stars: 13335
Forks: 2271
Quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installing, you can use this skill by running the following command in a terminal:

skills use perf-torch-sync-free