
Frameworks

Deep dive into framework internals.

1580 skills (all categories)
framework-internals
6.6K

distributed-llm-pretraining-torchtitan

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Orchestra-Research
development
open
framework-internals
6.6K

nnsight-remote-interpretability

Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

Orchestra-Research
development
open
framework-internals
6.6K

pyvene-interventions

Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.

Orchestra-Research
development
open
framework-internals
6.6K

torchforge-rl-training

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

Orchestra-Research
development
open
framework-internals
6.6K

huggingface-accelerate

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

Orchestra-Research
development
open
framework-internals
6.6K

pytorch-fsdp2

Adds PyTorch FSDP2 (fully_shard) to training scripts with correct init, sharding, mixed precision/offload config, and distributed checkpointing. Use when models exceed single-GPU memory or when you need DTensor-based sharding with DeviceMesh.

Orchestra-Research
development
open
framework-internals
6.6K

pytorch-lightning

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

Orchestra-Research
development
open
framework-internals
6.6K

optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), encountering GPU memory issues with attention, or needing faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

Orchestra-Research
development
open
framework-internals
6.6K

ml-training-recipes

Battle-tested PyTorch training recipes for all domains — LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.

Orchestra-Research
development
open
framework-internals
6.6K

llama-cpp

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

Orchestra-Research
development
open
framework-internals
6.6K

tensorrt-llm

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Orchestra-Research
development
open
framework-internals
6.6K

outlines

Guarantees valid JSON/XML/code structure during generation, uses Pydantic models for type-safe outputs, supports local models (Transformers, vLLM), and maximizes inference speed with Outlines, dottxt.ai's structured generation library.

Orchestra-Research
development
open
framework-internals
6.6K

fine-tuning-serving-openpi

Fine-tune and serve Physical Intelligence OpenPI models (pi0, pi0-fast, pi0.5) using JAX or PyTorch backends for robot policy inference across ALOHA, DROID, and LIBERO environments. Use when adapting pi0 models to custom datasets, converting JAX checkpoints to PyTorch, running policy inference servers, or debugging norm stats and GPU memory issues.

Orchestra-Research
development
open
framework-internals
6.5K

adapter-ops

Extend LLM and embedding adapters in unstract/sdk1. Use when adding new adapters (LLM or embedding), removing adapters, adding/removing models to existing adapters, or editing adapter configurations. Supports OpenAI-compatible providers, cloud providers (AWS Bedrock, VertexAI, Azure), and self-hosted models (Ollama).

Zipstack
development
open
framework-internals
6.3K

liger-kernel-dev

Develops production-ready Triton kernels for Liger Kernel. Creates new kernels from PyTorch operations (local files, URLs, code snippets, or natural language) with ops, module wrappers, functional APIs, unit tests, benchmarks, and plots. Also modifies existing Liger kernels. Use when adding a new Triton kernel, converting a PyTorch operation to Triton, or updating an existing Liger kernel.

linkedin
development
open
framework-internals
6.3K

liger-kernel-perf

Optimizes the performance of existing Liger Kernel Triton kernels. Profiles kernels, diagnoses bottlenecks (memory-bound vs compute-bound), generates multiple optimization variants with benchmarking, and applies the best variant while maintaining correctness. Supports GPU architecture-specific optimization (Ampere, Hopper, Blackwell). Use when a user asks to optimize, speed up, tune, profile, or reduce memory of an existing Liger kernel.

linkedin
development
open
framework-internals
6.3K

mz-adapter-guide

Correctness invariants and architectural guidance for the adapter layer, coordinator, pgwire, peek paths, and timestamp oracle. Trigger when the user works on or asks questions about these subsystems — including "how does the coordinator work", "what are read holds", "explain the peek path", "how does timestamp selection work", "why does this query block". Also trigger when editing files in src/adapter/, src/pgwire/, or src/timestamp-oracle/.

MaterializeInc
development
open
framework-internals
6.1K

port-c-module

Guide for porting a C module to Rust. Use this when starting to port a C module to Rust.

RediSearch
development
open
framework-internals
5.4K

daft-udf-tuning

Optimize Daft UDF performance. Invoke when user needs GPU inference, encounters slow UDFs, or asks about async/batch processing.

Eventual-Inc
development
open
framework-internals
5.4K

add-cuda-kernel

Step-by-step tutorial for adding new CUDA kernels to FlashInfer.

flashinfer-ai
development
open
framework-internals
5.3K

add-api

Add new C# APIs to SkiaSharp by wrapping Skia C++ functionality. Structured 6-phase workflow: C++ analysis → C API creation → submodule commits → binding generation → C# wrapper → testing. Triggers: an issue classified as "New API" (after fetching and classification); a direct request such as "add DrawFoo method", "expose SkSurface::draw", or "wrap sk_foo_bar"; or keywords such as "add API", "expose function", "wrap method", or "create binding for".

mono
development
open
framework-internals
5.2K

vyper-compiler

Vyper smart contract compiler internals. Use when working on the Vyper compiler codebase — compilation pipeline, Venom IR, semantic analysis, code generation, testing, or contributing. Triggers on vyper compiler development, Venom passes, AST/semantics changes, codegen work, or test writing.

vyperlang
development
open
framework-internals
5K

add-dataset

Guide for adding a new dataset loader to AReaL. Use when user wants to add a new dataset.

inclusionAI
development
open
framework-internals
5K

add-reward

Guide for adding a new reward function to AReaL. Use when user wants to create a reward function.

inclusionAI
development
open