python-pro
Expert Python developer specializing in modern Python 3.11+ development with deep expertise in type safety, async programming, data science, and web frameworks. Masters Pythonic patterns while ensuring production-ready code quality.
Add new operator definitions to PyPTO across all layers (C++, Python IR, Python DSL, tests, codegen, docs). Covers tile ops, tensor ops, tensor-to-tile conversion, and codegen registration. Use when the user asks to add a new op, define a new operator, implement a new tile/tensor operation, or extend the operator system.
Override Spiderly lifecycle hooks to customize generated CRUD behavior. Use when overriding lifecycle hooks, customizing generated CRUD logic, adding business logic to save/delete/get operations, handling MARS exceptions or transaction issues, or throwing business/hacker exceptions.
Configure vLLM-Omni for different hardware backends including NVIDIA CUDA, AMD ROCm, Huawei NPU, and Intel XPU. Use when selecting a hardware backend, troubleshooting GPU issues, configuring device placement, or optimizing for specific accelerators.
Find the single highest-leverage refactor in a repo by reducing system complexity, improving information hiding, deepening shallow modules, shrinking interface burden, and eliminating unnecessary special cases. Use when the user asks what to refactor, wants the best refactor, highest-leverage cleanup, architectural simplification, boundary extraction, duplication removal, complexity reduction, testability improvement, or an ExecPlan for the best refactor. Accept optional user guidance about scope, constraints, refactor style, or risk tolerance.
Patterns for handling commands, validating input, and filtering messages in XMTP agents. Use when implementing slash commands, validators, or message filters. Triggers on command handling, input validation, or type guards.
Expand Rust macros using cargo-expand to inspect generated code. Use this when asked to debug or verify macro invocations.
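Migrate GPU/CUDA Triton operators to Triton-Ascend, or rewrite Python/PyTorch operators as Triton-Ascend implementations that run on Ascend NPU, and, when a clear optimization opportunity is found, directly output the optimized code, a minimal verification script, and troubleshooting notes. Use this skill first whenever the user mentions Ascend, NPU, triton-ascend, Triton operator migration, PyTorch operator rewriting, coreDim, UB overflow, 1D grid, physical core binding, block_ptr, stride, memory access alignment, mask performance, dtype degradation, or operator optimization, or directly asks "how do I use this skill", "how do I run it from the command line", or "how do I run the migration/verification in a container", even if the user does not explicitly say "write a skill" or "do a migration".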
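Test the performance of Ascend NPU communication operators through the PyTorch torch.distributed interface. Supports arbitrary tensor shapes and dtypes and launches with torchrun, so communication operator tests and performance analysis stay close to real training scenarios. Use for testing collective communication operators (AllReduce, AllGather, ReduceScatter, etc.) with specific tensor shapes via torch.distributed on Ascend NPU.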
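Migrate simple Vector-type Triton operators from GPU to Ascend NPU. Use when the user needs to migrate Triton code to NPU or mentions GPU-to-NPU migration, Triton migration, or Ascend adaptation. Note: operators with compilation problems cannot be migrated automatically.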
Installs op-plugin (torch_npu operator plugin) environment and guides custom NPU operator integration with PyTorch via three patterns (A: no workspace, B: workspace+tiling, C: OpCommand reuse). Covers kernel implementation, host registration, build, and test. Use when working with op-plugin, operator integration, torch_npu custom ops, Ascend C, NPU operators, cpp_extension, xpu_kernel, or running custom operators on NPU.
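ProteinBERT deployment and migration skill for Ascend NPU. Use when converting the TensorFlow or Keras version of ProteinBERT to an implementation based on PyTorch and torch_npu. Covers weight conversion, embedding extraction, fine-tuning, attention visualization, and GPU-versus-NPU accuracy validation.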
Reverse engineering with Ghidra via MCP. Use when analyzing binaries, decompiling code, managing functions/symbols/data types, or performing any Ghidra-related reverse engineering task.
Improve developer experience for multi-component solutions: onboarding, F5 contract, cross-platform tasks, local inner loop, and reproducible setup. Use when the repo is hard to run, debug, test, or onboard into.
Use when developing firmware for microcontrollers, implementing RTOS applications, or optimizing power consumption. Invoke for STM32, ESP32, FreeRTOS, bare-metal, power optimization, real-time systems.
N-version programming for critical implementations - generates N independent solutions and selects the best through comparison
DEPRECATED — Use dev-orchestrator skill instead. This skill redirects to dev-orchestrator for backward compatibility.
Terminal emulation, text rendering optimization, and SwiftTerm integration for modern Swift applications
Focused Effect v4 error handling. Use for replacing catchAll patterns, designing typed errors, and Option/Match boundary handling.
Creates hybrid Native AOT + CoreCLR .NET 10 tool packages using ToolPackageRuntimeIdentifiers. Use for building high-performance CLI tools with Native AOT on supported platforms and CoreCLR fallback for universal compatibility.