
perf-torch-sync-free

Identify and eliminate host-device synchronizations in PyTorch code. The skill detects sync points (.item(), .cpu(), boolean indexing, torch.tensor on CUDA), classifies each one as a false or true dependency, and provides sync-free alternatives. Triggers: sync-free, synchronization, .item(), .cpu(), host-device sync, eliminate syncs, CPU stall, non_blocking, set_sync_debug_mode, cudaStreamSynchronize, cudaEventSynchronize, remove syncs, async GPU.
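
As a rough illustration of the kinds of sync points listed above, the sketch below pairs two syncing patterns with sync-free rewrites. It is not taken from the skill itself: the function names and the `limit` parameter are hypothetical, and the rewrites shown (fixed-shape masking with `torch.where`, keeping the branch condition on the device) are just one common way to remove such syncs. The code falls back to CPU when no GPU is present, so it runs anywhere, although the synchronizations only occur on CUDA.

```python
import torch

# Assumption: use the GPU when available, otherwise run the same code on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# On CUDA, ask PyTorch to warn whenever an operation forces a synchronization.
if device.type == "cuda":
    torch.cuda.set_sync_debug_mode("warn")


def negative_sum_syncing(x):
    # Boolean indexing yields a data-dependent shape, so on CUDA the host
    # must wait for the GPU to learn how many elements were selected.
    return x[x < 0].sum()


def negative_sum_sync_free(x):
    # Same result with a fixed output shape: unselected entries become zero.
    return torch.where(x < 0, x, torch.zeros_like(x)).sum()


def rescale_if_large_syncing(x, limit=1.0):
    # .item() copies a scalar to the host, stalling the CPU until the GPU
    # has finished computing it.
    if x.abs().max().item() > limit:
        x = x / x.abs().max()
    return x


def rescale_if_large_sync_free(x, limit=1.0):
    # The decision stays on the device: divide by the peak when it exceeds
    # the limit, otherwise divide by one.
    peak = x.abs().max()
    scale = torch.where(peak > limit, peak, torch.ones_like(peak))
    return x / scale
```

In the second pair the host never needs the value of the maximum, only the GPU does, which is arguably what the description means by a false dependency. A true dependency, such as printing a loss value, cannot be rewritten this way and can only be deferred or batched.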

Maintainer: NVIDIA
Last updated: 4/8/2026
Stars: 13335
Forks: 2271
Quick start

Installation and usage

Installation
$ install --globalskills.sh
Usage

After installation, you can use this skill by running the following command in your terminal:

skills use perf-torch-sync-free