
Frameworks

Deep dive into framework internals.

1580 skills, sorted by stars
framework-internals
14.6K

tooling

Implementation details for the EF Core dotnet-ef CLI and tooling. Use when changing dotnet-ef commands, the ef wrapper, EFCore.Tools (PMC), or EFCore.Tasks MSBuild integration.

dotnet
development
framework-internals
14.4K

marko-best-practices

Apply Marko syntax and best practices when editing `.marko` files and building Marko components.

marko-js
development
framework-internals
14K

dspy-ruby

Build type-safe LLM applications with DSPy.rb — Ruby's programmatic prompt framework with signatures, modules, agents, and optimization. Use when implementing predictable AI features, creating LLM signatures and modules, configuring language model providers, building agent systems with tools, optimizing prompts, or testing LLM-powered functionality in Ruby applications.

EveryInc
development
framework-internals
13.3K

kernel-cute-writing

Write and implement GPU kernels using NVIDIA CuTe DSL (CUTLASS 4.x Python API) — NOT for Triton, CUDA C++, or conceptual explanations. Trigger only when the user wants to write or implement a kernel, not when asking questions about CuTe DSL concepts or layouts. CuTe DSL uses cute.jit/cute.kernel decorators and cutlass.cute imports. Covers element-wise kernels, GEMM patterns, reductions, memory hierarchy (global/shared/register/TMA), MMA tensor core operations, software pipelining, and framework integration.

NVIDIA
development
framework-internals
13.3K

kernel-tileir-optimization

Optimize existing Triton kernels for NVIDIA TileIR backend on Blackwell GPUs (sm_100+). Adds TileIR-specific autotune configs: occupancy, num_ctas, TMA descriptors. Covers kernel classification (dot-related, norm-like, elementwise, reduction), type-specific transformations, and PTX-vs-TileIR benchmarking. Triggered by: "optimize for TileIR", "add TileIR configs", "Blackwell optimization", "TMA descriptors", "2CTA mode", "occupancy tuning". Kernels use standard `import triton`; TileIR activates via ENABLE_TILE=1 when nvtriton is installed.

NVIDIA
development
framework-internals
13.3K

kernel-triton-writing

ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).

NVIDIA
development
framework-internals
13.3K

perf-optimization

Performance optimization coordination playbook. Contains specialist routing table, TileIR two-step pipeline, kernel generation specialist selection, prioritization criteria, and safe modification workflow. Use when the user asks to apply optimizations, write kernels, or improve performance. Covers both user-specified optimization and autopilot-driven iterative optimization.

NVIDIA
development
framework-internals
13.3K

perf-torch-cuda-graphs

Apply CUDA Graphs to PyTorch workloads — API selection (torch.compile, PyTorch make_graphed_callables, TE make_graphed_callables, MCore CudaGraphManager, FullCudaGraphWrapper, manual torch.cuda.graph), code compatibility, capture workflows, dynamic pattern handling, and troubleshooting. Triggers: CUDA graph, torch.cuda.graph, make_graphed_callables, reduce-overhead, graph capture, graph replay, kernel launch overhead, CudaGraphManager, FullCudaGraphWrapper, full-iteration graph, stream capture.

NVIDIA
development
framework-internals
13.3K

perf-workload-profiling

Code instrumentation for timing workloads. Two scenarios: (1) Training loop — inject manual timing to report per-iteration latency, throughput (samples/sec), and data load time. (2) Standalone kernel/op — write CUDA event timing code with warmup, per-iteration statistics, and anti-pattern avoidance. Also covers NVTX annotation for labeling profiler timelines. NOT for: running or analyzing profiler tools (nsys, ncu, Nsight Systems, Nsight Compute), writing kernels (Triton, CuTe, CUDA), applying optimizations (CUDA Graphs, gradient checkpointing, fusion), or interpreting roofline/SOL% metrics. Triggers: "measure throughput", "benchmark this function", "time my training loop", "samples per second", "NVTX annotate", "instrument my dataloader", "data load time", "kernel timing", "how do I time".

NVIDIA
development
framework-internals
12.1K

optimize

Solve constrained optimization problems using Z3. Supports minimization and maximization of objective functions over integer, real, and bitvector domains.

Z3Prover
development
framework-internals
11.2K

model-redux-statebuild-slices-and-selectors

Use this when authoring or refactoring slices with createSlice, selectors, create.asyncThunk, entity adapters, or lazy reducer injection. Covers Immer-backed mutation syntax, slice selectors, getSelectors, injectInto, withLazyLoadedSlices, and current RTK 2 slice patterns.

reduxjs
development
framework-internals
10.9K

distributed-training

Multi-GPU and distributed training patterns with PyTorch DDP. Use when scaling training across GPUs.

aiming-lab
development
framework-internals
10.9K

pytorch-training

Best practices for building robust PyTorch training loops. Use when generating or reviewing ML training code.

aiming-lab
development
framework-internals
8.1K

add-compat-flag

Step-by-step guide for adding a new compatibility flag to workerd, including capnp schema, C++ usage, testing, and documentation requirements.

cloudflare
development
framework-internals
8.1K

rust-engineer

Writes, reviews, and debugs idiomatic Rust code with memory safety and zero-cost abstractions. Implements ownership patterns, manages lifetimes, designs trait hierarchies, builds async applications with tokio, and structures error handling with Result/Option. Use when building Rust applications, solving ownership or borrowing issues, designing trait-based APIs, implementing async/await concurrency, creating FFI bindings, or optimizing for performance and memory safety. Invoke for Rust, Cargo, ownership, borrowing, lifetimes, async Rust, tokio, zero-cost abstractions, memory safety, systems programming.

Jeffallan
development
framework-internals
8.1K

embedded-systems

Use when developing firmware for microcontrollers, implementing RTOS applications, or optimizing power consumption. Invoke for STM32, ESP32, FreeRTOS, bare-metal, power optimization, real-time systems, configure peripherals, write interrupt handlers, implement DMA transfers, debug timing issues.

Jeffallan
development
framework-internals
8.1K

cpp-pro

Writes, optimizes, and debugs C++ applications using modern C++20/23 features, template metaprogramming, and high-performance systems techniques. Use when building or refactoring C++ code requiring concepts, ranges, coroutines, SIMD optimization, or careful memory management — or when addressing performance bottlenecks, concurrency issues, and build system configuration with CMake.

Jeffallan
development
framework-internals
7.8K

support-new-model

Add a new LLM or VLM to LMDeploy's PyTorch backend.

InternLM
development
framework-internals
7.3K

idapython

IDA Pro Python scripting for reverse engineering. Use when writing IDAPython scripts, analyzing binaries, working with IDA's API for disassembly, decompilation (Hex-Rays), type systems, cross-references, functions, segments, or any IDA database manipulation. Covers ida_* modules (50+), idautils iterators, and common patterns.

mrexodia
development
framework-internals
7.3K

fix-int

Fix integer type definitions for specified platform. Researches correct primitive type mappings and applies fixes to platform-specific int headers.

FastLED
development
framework-internals
7.3K

platform-port

Guide porting FastLED to new MCU platforms, including int.h types, clockless drivers, SPI implementations, and platform detection. Use when adding support for a new microcontroller family or board.

FastLED
development
framework-internals
6.6K

pennylane

Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.

K-Dense-AI
development