home/categories/framework-internals/huggingface-kernels-skills-cuda-kernels-skill-md
framework-internalsdevelopment

cuda-kernels

Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting HuggingFace diffusers and transformers libraries. Supports models like LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Includes integration with HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels. Includes benchmarking scripts to compare kernel performance against baseline implementations.

huggingface
maintainer
huggingface
Atualizado 2/17/2026
Estrelas
564
Forks
63
quick start

Installation and usage

Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting HuggingFace diffusers and transformers libraries. Supports models like LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Includes integration with HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels. Includes benchmarking scripts to compare kernel performance against baseline implementations.

Instalação
$ install --globalskills.sh
Uso

Depois de instalar, você pode usar esta skill executando o seguinte comando no terminal:

skills use cuda-kernels