home/categories/framework-internals/rocm-flydsl-claude-skills-prefetch-data-load-skill-md
framework-internalsdevelopment

prefetch-data-load

Apply prefetch optimization to FlyDSL kernel loops: pre-load the first iteration's data before the loop, issue async loads for the next iteration inside the loop body, and swap buffers at the loop tail via scf.for loop-carried values. This overlaps data load latency with compute instructions. Use when a kernel has a loop where buffer_load feeds into MFMA/compute and load latency is exposed. Usage: /prefetch-data-load

ROCm
maintainer
ROCm
更新日 4/9/2026
スター
148
フォーク
36
quick start

Installation and usage

Apply prefetch optimization to FlyDSL kernel loops: pre-load the first iteration's data before the loop, issue async loads for the next iteration inside the loop body, and swap buffers at the loop tail via scf.for loop-carried values. This overlaps data load latency with compute instructions. Use when a kernel has a loop where buffer_load feeds into MFMA/compute and load latency is exposed. Usage: /prefetch-data-load

インストール
$ install --globalskills.sh
使い方

インストール後、ターミナルで以下のコマンドを実行してこのスキルを使用できます:

skills use prefetch-data-load