skills.homes capability registry

category focus

Frameworks

Deep dive into framework internals.

1580 skills

Sorted by stars.

framework-internals

6.6K

distributed-llm-pretraining-torchtitan

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Orchestra-Research

development

framework-internals

6.6K

nnsight-remote-interpretability

Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

Orchestra-Research

development

framework-internals

6.6K

pyvene-interventions

Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.

Orchestra-Research

development

framework-internals

6.6K

torchforge-rl-training

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

Orchestra-Research

development

framework-internals

6.6K

huggingface-accelerate

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

Orchestra-Research

development

framework-internals

6.6K

pytorch-fsdp2

Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.

Orchestra-Research

development

framework-internals

6.6K

pytorch-lightning

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

Orchestra-Research

development

framework-internals

6.6K

optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or needing faster inference. Supports PyTorch native SDPA, flash-attn library, H100 FP8, and sliding window attention.

Orchestra-Research

development

framework-internals

6.6K

ml-training-recipes

Battle-tested PyTorch training recipes for all domains — LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.

Orchestra-Research

development

framework-internals

6.6K

llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

Orchestra-Research

development

framework-internals

6.6K

tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Orchestra-Research

development

framework-internals

6.6K

outlines

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines, dottxt.ai's structured generation library.

Orchestra-Research

development

framework-internals

6.6K

fine-tuning-serving-openpi

Fine-tune and serve Physical Intelligence OpenPI models (pi0, pi0-fast, pi0.5) using JAX or PyTorch backends for robot policy inference across ALOHA, DROID, and LIBERO environments. Use when adapting pi0 models to custom datasets, converting JAX checkpoints to PyTorch, running policy inference servers, or debugging norm stats and GPU memory issues.

Orchestra-Research

development

framework-internals

6.5K

adapter-ops

Extend LLM and embedding adapters in unstract/sdk1. Use when adding new adapters (LLM or embedding), removing adapters, adding/removing models to existing adapters, or editing adapter configurations. Supports OpenAI-compatible providers, cloud providers (AWS Bedrock, VertexAI, Azure), and self-hosted models (Ollama).

Zipstack

development

framework-internals

6.3K

liger-kernel-dev

Develops production-ready Triton kernels for Liger Kernel. Creates new kernels from PyTorch operations (local files, URLs, code snippets, or natural language) with ops, module wrappers, functional APIs, unit tests, benchmarks, and plots. Also modifies existing Liger kernels. Use when adding a new Triton kernel, converting a PyTorch operation to Triton, or updating an existing Liger kernel.

linkedin

development

framework-internals

6.3K

liger-kernel-perf

Optimizes the performance of existing Liger Kernel Triton kernels. Profiles kernels, diagnoses bottlenecks (memory-bound vs compute-bound), generates multiple optimization variants with benchmarking, and applies the best variant while maintaining correctness. Supports GPU architecture-specific optimization (Ampere, Hopper, Blackwell). Use when a user asks to optimize, speed up, tune, profile, or reduce memory of an existing Liger kernel.

linkedin

development

framework-internals

6.3K

mz-adapter-guide

Correctness invariants and architectural guidance for the adapter layer, coordinator, pgwire, peek paths, and timestamp oracle. Trigger when the user works on or asks questions about these subsystems — including "how does the coordinator work", "what are read holds", "explain the peek path", "how does timestamp selection work", "why does this query block". Also trigger when editing files in src/adapter/, src/pgwire/, or src/timestamp-oracle/.

MaterializeInc

development

framework-internals

6.1K

port-c-module

Guide for porting a C module to Rust. Use this when starting to port a C module to Rust.

RediSearch

development

framework-internals

5.4K

daft-udf-tuning

Optimize Daft UDF performance. Invoke when user needs GPU inference, encounters slow UDFs, or asks about async/batch processing.

Eventual-Inc

development

framework-internals

5.4K

add-cuda-kernel

Step-by-step tutorial for adding new CUDA kernels to FlashInfer.

flashinfer-ai

development

framework-internals

5.3K

add-api

Add new C# APIs to SkiaSharp by wrapping Skia C++ functionality. Structured 6-phase workflow: C++ analysis → C API creation → submodule commits → binding generation → C# wrapper → testing. Triggers: an issue classified as "New API" (after fetching and classification); a direct request such as "add DrawFoo method", "expose SkSurface::draw", or "wrap sk_foo_bar"; or the keywords "add API", "expose function", "wrap method", "create binding for".

mono

development

framework-internals

5.2K

vyper-compiler

Vyper smart contract compiler internals. Use when working on the Vyper compiler codebase — compilation pipeline, Venom IR, semantic analysis, code generation, testing, or contributing. Triggers on vyper compiler development, Venom passes, AST/semantics changes, codegen work, or test writing.

vyperlang

development

framework-internals

5K

add-dataset

Guide for adding a new dataset loader to AReaL. Use when user wants to add a new dataset.

inclusionAI

development

framework-internals

5K

add-reward

Guide for adding a new reward function to AReaL. Use when user wants to create a reward function.

inclusionAI

development
