gemm-optimization

Name: gemm-optimization
Author: ROCm

Comprehensive guide to optimizing GEMM (General Matrix Multiply) kernels in FlyDSL on AMD CDNA GPUs. Covers tiling strategy, LDS ping-pong double-buffer, XOR bank-conflict swizzle, A/B data prefetch pipeline, 2-stage software pipelining, MFMA instruction scheduling (hot_loop_scheduler), epilogue strategies (direct store vs CShuffle), TFLOPS/bandwidth calculation, main-loop instruction count analysis, and bottleneck identification from ATT traces. Based on the production preshuffle_gemm kernel. Usage: /gemm-optimization
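As a rough illustration of the TFLOPS/bandwidth calculation mentioned above, the sketch below uses the standard GEMM accounting: 2·M·N·K floating-point operations, and a lower bound on memory traffic that reads A and B once and writes C once. The function name `gemm_metrics`, its parameters, and the 2-byte default element sizes are assumptions made for this example; they are not taken from FlyDSL or the preshuffle_gemm kernel.

```python
def gemm_metrics(m, n, k, elapsed_s, bytes_a=2, bytes_b=2, bytes_c=2):
    """Return (TFLOPS, GB/s, FLOPs per byte) for one C[m,n] = A[m,k] @ B[k,n].

    Hypothetical helper: element sizes default to 2 bytes (e.g. fp16/bf16),
    and traffic is the idealized minimum (each matrix touched exactly once).
    """
    # One multiply and one add per (m, n, k) triple.
    flops = 2.0 * m * n * k
    tflops = flops / elapsed_s / 1e12

    # Idealized traffic: read A and B once, write C once.
    bytes_moved = m * k * bytes_a + k * n * bytes_b + m * n * bytes_c
    bandwidth_gbs = bytes_moved / elapsed_s / 1e9

    # Arithmetic intensity: high values suggest a compute-bound kernel,
    # low values suggest a memory-bound one.
    intensity = flops / bytes_moved
    return tflops, bandwidth_gbs, intensity


if __name__ == "__main__":
    # Example: a 4096^3 GEMM measured at 1 ms (made-up timing).
    t, bw, ai = gemm_metrics(4096, 4096, 4096, elapsed_s=1e-3)
    print(f"{t:.2f} TFLOPS, {bw:.2f} GB/s, {ai:.1f} FLOPs/byte")
```

Comparing the achieved TFLOPS against the device peak, and the arithmetic intensity against the ratio of peak compute to peak bandwidth, gives a first indication of whether a kernel is compute-bound or memory-bound before turning to trace-level analysis.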

View source: framework-internals

Maintainer: ROCm

Updated: 4/9/2026

Stars: 148

Forks

quick start

Installation and usage

Installation

$ install --globalskills.sh

Usage

After installation, run the following command in your terminal to use this skill:

skills use gemm-optimization