prefetch-data-load
Apply prefetch optimization to FlyDSL kernel loops: pre-load the first iteration's data before the loop, issue async loads for the next iteration inside the loop body, and swap buffers at the loop tail via scf.for loop-carried values. This overlaps data load latency with compute instructions. Use when a kernel has a loop where buffer_load feeds into MFMA/compute and load latency is exposed. Usage: /prefetch-data-load
Installation and usage
Apply prefetch optimization to FlyDSL kernel loops: pre-load the first iteration's data before the loop, issue async loads for the next iteration inside the loop body, and swap buffers at the loop tail via scf.for loop-carried values. This overlaps data load latency with compute instructions. Use when a kernel has a loop where buffer_load feeds into MFMA/compute and load latency is exposed. Usage: /prefetch-data-load
安裝後,您可以透過在終端機執行以下指令來使用此技能:
skills use prefetch-data-load