pytorch-lightning
Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, and implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.
syz-extract-constants
Defining and extracting kernel constants for syzkaller syzlang descriptions
casadi-ipopt-nlp
Nonlinear optimization with CasADi and IPOPT solver. Use when building and solving NLP problems: defining symbolic variables, adding nonlinear constraints, setting solver options, handling multiple initializations, and extracting solutions. Covers power systems optimization patterns including per-unit scaling and complex number formulations.
python-parallelization
Transform sequential Python code into parallel/concurrent implementations. Use when asked to parallelize Python code, improve code performance through concurrency, convert loops to parallel execution, or identify parallelization opportunities. Handles CPU-bound (multiprocessing), I/O-bound (asyncio, threading), and data-parallel (vectorization) scenarios.
kineto-release
Update the third_party/kineto submodule in PyTorch to the latest commit from this kineto repo and commit the change. Use when updating the kineto submodule hash for a release.
sanity-live-cache-components
Migrate next-sanity apps to cacheComponents: strict mode, three-layer component pattern, explicit perspective/stega/includeDrafts, prop-drilling conventions
moai-framework-electron
Electron 33+ desktop app development specialist covering Main/Renderer process architecture, IPC communication, auto-update, and packaging with Electron Forge. Use when building cross-platform desktop applications.
defragmenting-memory
Defragments and cleans up agent memory blocks. Use when memory becomes messy, redundant, or poorly organized. Backs up memory, uses a subagent to clean it up, then restores the cleaned version.
f8-tools-runtime-foundation-workflow
Use when working with Runtime Foundation utility helpers — Algorithm, Assert, Program, Time, Unity utility classes in F8Framework.
f8-tools-coroutine-mainthread-workflow
Use when working with Coroutine/MainThread tools — coroutine utilities and main thread dispatch in F8Framework.
f8-features-fsm-workflow
Use when implementing or troubleshooting FSM feature workflows — finite state machines, state transitions, blackboard data, and FSM groups in F8Framework.
f8-features-storage-workflow
Use when implementing, documenting, or troubleshooting Storage workflows in F8Framework, including local persistence, user-scoped keys, AES encryption, Gzip compression, generic collections, and common Unity value types.
f8-features-input-workflow
Use when implementing or troubleshooting Input feature workflows — multi-platform input, virtual buttons, device switching, and key/axis listening in F8Framework.
add-pallas-kernel
Add or update a TPU kernel using jax.experimental.pallas. Use when asked to implement, modify, benchmark, or autotune a Pallas TPU/GPU kernel.
runtime-skills
Universal Runtime best practices for PyTorch inference, Transformers models, and FastAPI serving. Covers device management, model loading, memory optimization, and performance tuning.
llvm-optimization
Expertise in LLVM optimization passes, performance tuning, and code transformation techniques. Use this skill when implementing custom optimizations, analyzing pass behavior, improving generated code quality, or understanding LLVM's optimization pipeline.
mlir-development
Expertise in MLIR (Multi-Level Intermediate Representation) and CIR (Clang IR) development for domain-specific compilation and high-level optimizations. Use this skill when building ML compilers, domain-specific languages, or working with multi-level compilation pipelines.
llvm-security
Expertise in LLVM security features including sanitizers, hardening techniques, exploit mitigations, and secure compilation. Use this skill when implementing security-focused compiler features, analyzing vulnerabilities, or hardening applications.
llvm-obfuscation
Expertise in LLVM-based code obfuscation techniques including OLLVM, control flow flattening, string encryption, virtualization, and anti-analysis methods. Use this skill when working on code protection, anti-reverse engineering, or implementing custom obfuscation passes.
compiler-development
Expertise in compiler development using LLVM infrastructure including frontend design, IR generation, optimization passes, and code generation. Use this skill when building custom programming languages, implementing DSL compilers, or working on compiler internals.
cuopt-lp-milp-api-python
Solve Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) problems with the cuOpt Python API. Use when the user asks about optimization with linear constraints, integer variables, scheduling, resource allocation, facility location, or production planning.
cuopt-developer
Contribute to NVIDIA cuOpt codebase including C++/CUDA, Python, server, docs, and CI. Use when the user wants to modify solver internals, add features, submit PRs, or understand the codebase architecture.
axiom-swift-performance
Use when optimizing Swift code performance, reducing memory usage, improving runtime efficiency, or dealing with copy-on-write (COW), ARC overhead, generics specialization, or collection optimization.