home/categories/framework-internals/nvidia-tensorrt-llm-claude-skills-kernel-tileir-optimization-skill-md
framework-internalsdevelopment

kernel-tileir-optimization

Optimize existing Triton kernels for NVIDIA TileIR backend on Blackwell GPUs (sm_100+). Adds TileIR-specific autotune configs: occupancy, num_ctas, TMA descriptors. Covers kernel classification (dot-related, norm-like, elementwise, reduction), type-specific transformations, and PTX-vs-TileIR benchmarking. Triggered by: "optimize for TileIR", "add TileIR configs", "Blackwell optimization", "TMA descriptors", "2CTA mode", "occupancy tuning". Kernels use standard `import triton`; TileIR activates via ENABLE_TILE=1 when nvtriton is installed.

NVIDIA
maintainer
NVIDIA
Mis à jour 4/8/2026
Étoiles
13335
Forks
2271
quick start

Installation and usage

Optimize existing Triton kernels for NVIDIA TileIR backend on Blackwell GPUs (sm_100+). Adds TileIR-specific autotune configs: occupancy, num_ctas, TMA descriptors. Covers kernel classification (dot-related, norm-like, elementwise, reduction), type-specific transformations, and PTX-vs-TileIR benchmarking. Triggered by: "optimize for TileIR", "add TileIR configs", "Blackwell optimization", "TMA descriptors", "2CTA mode", "occupancy tuning". Kernels use standard `import triton`; TileIR activates via ENABLE_TILE=1 when nvtriton is installed.

Installation
$ install --globalskills.sh
Utilisation

Après l'installation, vous pouvez utiliser ce skill en exécutant la commande suivante dans votre terminal :

skills use kernel-tileir-optimization