Frameworks

Deep dive into framework internals.

1580 skills

Sorted by stars.

framework-internals

14.6K

tooling

Implementation details for the EF Core dotnet-ef CLI and tooling. Use when changing dotnet-ef commands, the ef wrapper, EFCore.Tools (PMC), or EFCore.Tasks MSBuild integration.

dotnet

development

framework-internals

14.4K

marko-best-practices

Apply Marko syntax and best practices when editing `.marko` files and building Marko components.

marko-js

development

framework-internals

14K

Build type-safe LLM applications with DSPy.rb — Ruby's programmatic prompt framework with signatures, modules, agents, and optimization. Use when implementing predictable AI features, creating LLM signatures and modules, configuring language model providers, building agent systems with tools, optimizing prompts, or testing LLM-powered functionality in Ruby applications.

EveryInc

development

framework-internals

13.3K

kernel-cute-writing

Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API) — NOT for Triton, CUDA C++, or conceptual explanations. Trigger only when the user wants to write or implement a kernel, not when asking questions about CuTe DSL concepts or layouts. CuTe DSL uses cute.jit/cute.kernel decorators and cutlass.cute imports. Covers element-wise kernels, GEMM patterns, reductions, memory hierarchy (global/shared/register/TMA), MMA tensor core operations, software pipelining, and framework integration.

NVIDIA

development

framework-internals

13.3K

kernel-tileir-optimization

Optimize existing Triton kernels for the NVIDIA TileIR backend on Blackwell GPUs (sm_100+). Adds TileIR-specific autotune configs: occupancy, num_ctas, TMA descriptors. Covers kernel classification (dot-related, norm-like, elementwise, reduction), type-specific transformations, and PTX-vs-TileIR benchmarking. Triggered by: "optimize for TileIR", "add TileIR configs", "Blackwell optimization", "TMA descriptors", "2CTA mode", "occupancy tuning". Kernels use standard `import triton`; TileIR activates via ENABLE_TILE=1 when nvtriton is installed.

NVIDIA

development

framework-internals

13.3K

kernel-triton-writing

ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).

NVIDIA

development

framework-internals

13.3K

perf-optimization

Performance optimization coordination playbook. Contains specialist routing table, TileIR two-step pipeline, kernel generation specialist selection, prioritization criteria, and safe modification workflow. Use when the user asks to apply optimizations, write kernels, or improve performance. Covers both user-specified optimization and autopilot-driven iterative optimization.

NVIDIA

development

framework-internals

13.3K

perf-torch-cuda-graphs

Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.

NVIDIA

development

framework-internals

13.3K

perf-workload-profiling

Code instrumentation for timing workloads. Two scenarios: (1) Training loop — inject manual timing to report per-iteration latency, throughput (samples/sec), and data load time. (2) Standalone kernel/op — write CUDA event timing code with warmup, per-iteration statistics, and anti-pattern avoidance. Also covers NVTX annotation for labeling profiler timelines. NOT for: running or analyzing profiler tools (nsys, ncu, Nsight Systems, Nsight Compute), writing kernels (Triton, CuTe, CUDA), applying optimizations (CUDA Graphs, gradient checkpointing, fusion), or interpreting roofline/SOL% metrics. Triggers: "measure throughput", "benchmark this function", "time my training loop", "samples per second", "NVTX annotate", "instrument my dataloader", "data load time", "kernel timing", "how do I time".

NVIDIA

development

framework-internals

12.1K

optimize

Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.

Z3Prover

development

framework-internals

11.2K

model-redux-statebuild-slices-and-selectors

Use this when authoring or refactoring slices with createSlice, selectors, create.asyncThunk, entity adapters, or lazy reducer injection. Covers Immer-backed mutation syntax, slice selectors, getSelectors, injectInto, withLazyLoadedSlices, and current RTK 2 slice patterns.

reduxjs

development

framework-internals

10.9K

distributed-training

Multi-GPU and distributed training patterns with PyTorch DDP. Use when scaling training across GPUs.

aiming-lab

development

framework-internals

10.9K

pytorch-training

Best practices for building robust PyTorch training loops. Use when generating or reviewing ML training code.

aiming-lab

development

framework-internals

8.1K

add-compat-flag

Step-by-step guide for adding a new compatibility flag to workerd, including capnp schema, C++ usage, testing, and documentation requirements.

cloudflare

development

framework-internals

8.1K

rust-engineer

Writes, reviews, and debugs idiomatic Rust code with memory safety and zero-cost abstractions. Implements ownership patterns, manages lifetimes, designs trait hierarchies, builds async applications with tokio, and structures error handling with Result/Option. Use when building Rust applications, solving ownership or borrowing issues, designing trait-based APIs, implementing async/await concurrency, creating FFI bindings, or optimizing for performance and memory safety. Invoke for Rust, Cargo, ownership, borrowing, lifetimes, async Rust, tokio, zero-cost abstractions, memory safety, systems programming.

Jeffallan

development

framework-internals

8.1K

embedded-systems

Use when developing firmware for microcontrollers, implementing RTOS applications, or optimizing power consumption. Invoke for STM32, ESP32, FreeRTOS, bare-metal, power optimization, real-time systems, configure peripherals, write interrupt handlers, implement DMA transfers, debug timing issues.

Jeffallan

development

framework-internals

8.1K

cpp-pro

Writes, optimizes, and debugs C++ applications using modern C++20/23 features, template metaprogramming, and high-performance systems techniques. Use when building or refactoring C++ code requiring concepts, ranges, coroutines, SIMD optimization, or careful memory management — or when addressing performance bottlenecks, concurrency issues, and build system configuration with CMake.

Jeffallan

development

framework-internals

7.8K

support-new-model

Add a new LLM or VLM to LMDeploy's PyTorch backend.

InternLM

development

framework-internals

7.5K

post-processor-integration

post processor integration

kreuzberg-dev

development

framework-internals

7.5K

registry-implementation

registry implementation

kreuzberg-dev

development

framework-internals

7.3K

idapython

IDA Pro Python scripting for reverse engineering. Use when writing IDAPython scripts, analyzing binaries, working with IDA's API for disassembly, decompilation (Hex-Rays), type systems, cross-references, functions, segments, or any IDA database manipulation. Covers ida_* modules (50+), idautils iterators, and common patterns.

mrexodia

development

framework-internals

7.3K

fix-int

Fix integer type definitions for a specified platform. Researches correct primitive type mappings and applies fixes to platform-specific int headers.

FastLED

development

framework-internals

7.3K

platform-port

Guide porting FastLED to new MCU platforms, including int.h types, clockless drivers, SPI implementations, and platform detection. Use when adding support for a new microcontroller family or board.

FastLED

development

framework-internals

6.6K

pennylane

Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.

K-Dense-AI

development