category focus

Frameworks

Deep dive into framework internals.

1580 skills

all categories

framework-internals

29.3K

multi-stage-dockerfile

Create optimized multi-stage Dockerfiles for any language or framework

github

development

open

framework-internals

29.3K

refactor-method-complexity-reduce

Refactor the given method `${input:methodName}` to reduce its cognitive complexity to `${input:complexityThreshold}` or below by extracting helper methods.

github

development

open

framework-internals

29.3K

semantic-kernel

Create, update, refactor, explain, or review Semantic Kernel solutions using shared guidance plus language-specific references for .NET and Python.

github

development

open

framework-internals

25.6K

add-sgl-kernel

Step-by-step tutorial for adding a heavyweight AOT CUDA/C++ kernel to sgl-kernel (including tests & benchmarks)

sgl-project

development

open

framework-internals

25K

torque

Expert guidance for navigating, implementing, and verifying V8 Torque (.tq) builtins and object layouts.

development

open

framework-internals

24.8K

claude-md

Create or update CLAUDE.md files following best practices for optimal AI agent onboarding

luongnv89

development

open

framework-internals

23.8K

Use when working with Paddle 3.0 compiler full pipeline: SOT (Symbolic Opcode Translator) for bytecode-level dy2st graph capture, PIR (Paddle IR) for SSA-based intermediate representation, CINN for fused CUDA kernel generation, operator decomposition (Prim), or the end-to-end flow from Python eager code to optimized GPU execution.

PaddlePaddle

development

open

framework-internals

23.8K

paddle-phi-kernel

Use when working with Paddle's PHI kernel system: registering new kernels, debugging kernel selection/dispatch, understanding code auto-generation from YAML, or implementing operator decomposition via the combination mechanism.

PaddlePaddle

development

open

framework-internals

23.8K

paddle-op-dev

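PaddlePaddle C++ operator development guide. Provides guidance on the complete operator development workflow, from YAML configuration, InferMeta functions, Kernel implementation, Python API wrapping, and unit tests to build verification. Use this skill when: (1) adding a new C++ operator to the Paddle framework (2) modifying or debugging an existing Paddle operator (3) writing an operator's YAML configuration, InferMeta, Kernel, Python API, or unit tests (4) understanding the Paddle operator development architecture and workflow (5) building Paddle and verifying operator correctness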

PaddlePaddle

development

open

framework-internals

22.2K

token-efficiency

Activate ultra-compressed output mode for maximum token efficiency. Use when context is running low, user requests brevity, or dealing with large-scale operations.

SuperClaude-Org

development

open

framework-internals

18.8K

python-sdk

Python SDK patterns for Opik. Use when working in sdks/python, on SDK APIs, integrations, or message processing.

comet-ml

development

open

framework-internals

18.8K

typescript-sdk

TypeScript SDK patterns for Opik. Use when working in sdks/typescript.

comet-ml

development

open

framework-internals

18.1K

pennylane

Hardware-agnostic quantum ML framework with automatic differentiation. Use when training quantum circuits via gradients, building hybrid quantum-classical models, or needing device portability across IBM/Google/Rigetti/IonQ. Best for variational algorithms (VQE, QAOA), quantum neural networks, and integration with PyTorch/JAX/TensorFlow. For hardware-specific optimizations use qiskit (IBM) or cirq (Google); for open quantum systems use qutip.

K-Dense-AI

development

open

framework-internals

18.1K

pymoo

Multi-objective optimization framework. NSGA-II, NSGA-III, MOEA/D, Pareto fronts, constraint handling, benchmarks (ZDT, DTLZ), for engineering design and optimization problems.

K-Dense-AI

development

open

framework-internals

18.1K

pytorch-lightning

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.

K-Dense-AI

development

open

framework-internals

17.8K

add-new-jit-ee-api

Add a new API to the JIT-VM (aka JIT-EE) interface in the codebase.

dotnet

development

open

framework-internals

17.8K

extensions-review

Guidance for writing and modifying Microsoft.Extensions.* and System.IO.Compression code in dotnet/runtime. Covers DI lifetime management, configuration binding, options validation, logging provider patterns, caching semantics, compression format compliance, and host lifecycle. For full code review, delegates to the @extensions-reviewer agent. Trigger words: Microsoft.Extensions, IServiceCollection, IConfiguration, ILogger, IHost, IMemoryCache, IOptions, ZipArchive, HttpClientFactory, IFileProvider, IChangeToken.

dotnet

development

open

framework-internals

17.6K

tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

davila7

development

open

framework-internals

17.6K

pytorch-fsdp

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2

davila7

development

open

framework-internals

17.6K

pennylane

Cross-platform Python library for quantum computing, quantum machine learning, and quantum chemistry. Enables building and training quantum circuits with automatic differentiation, seamless integration with PyTorch/JAX/TensorFlow, and device-independent execution across simulators and quantum hardware (IBM, Amazon Braket, Google, Rigetti, IonQ, etc.). Use when working with quantum circuits, variational quantum algorithms (VQE, QAOA), quantum neural networks, hybrid quantum-classical models, molecular simulations, quantum chemistry calculations, or any quantum computing tasks requiring gradient-based optimization, hardware-agnostic programming, or quantum machine learning workflows.

davila7

development

open

framework-internals

17.6K

gguf-quantization

GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.

davila7

development

open

framework-internals

17.6K

optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or needing faster inference. Supports PyTorch native SDPA, flash-attn library, H100 FP8, and sliding window attention.

davila7

development

open

framework-internals

16.5K