grpo-rl-training
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.
Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Transformers.
Review a DataFusion Comet pull request for Spark compatibility and implementation correctness. Provides guidance to a reviewer rather than posting comments directly.
Generate formatted data reports from SQL query results
Guided workflow for SQL data analysis using db_tools
Coordinate raw analysis and publish a normalized research entry into chan.dev's `src/content/research/`. Use when research should become a durable chan.dev report.
分布式多智能体缺陷检测总控技能。基于输入随机化、角色化并行评审、语义桶化、加权共识与裁决复核输出高信噪比代码评审报告。用于大规模 PR、复杂逻辑变更、安全敏感改动或单智能体评审召回率不足的场景。
bug-hunter 阶段 2 技能。负责将随机化后的 diff 按 persona 矩阵分发给 8 个子智能体并行评审,并收集统一 JSON 结果。
bug-hunter 阶段 3 技能。负责对多智能体原始发现做语义去重、桶化聚类与冲突识别,形成可投票的缺陷候选池。
bug-hunter 阶段 4 技能。负责对缺陷桶执行加权共识投票,筛选过阈值问题,并输出裁决级结构化评审报告。
Generate release notes for the new NNCF release.
Minimalist UX/Interaction Audit Expert that deconstructs complex interactions through cognitive load and operational efficiency lenses. Use this skill when you need to perform a UX walkthrough audit on a Figma prototype or web interface, evaluating usability based on principles like fewer clicks, less UI elements, no hidden logic, and self-explanatory design.
Selects the appropriate quasi-experimental method (DiD, ITS, SC) based on data structure and research questions. Use when the user is unsure which method to apply.
Fits causal models, estimates impacts, and plots results using CausalPy. Use when performing analysis with DiD, ITS, SC, or RD.
Loads internal CausalPy example datasets. Use when the user needs example data or asks about available demos.
Use when integrating a new PTQ workflow into cache-dit; designing quantize/load API shape, backend-specific config validation, save/load manifests, benchmark and regression tests, or reviewing a PTQ integration plan. Uses the SVDQ PTQ integration only as a style and coverage reference. Do not copy the SVDQ implementation mechanically.
This skill should be used when the agent needs to give a spoken voice update to the user, or when reminded by a Stop hook to provide audio feedback. Use this skill to speak a short summary of what was accomplished.
Extract full context of the last task from the most recent parent session shown in the session lineage. Strategically uses sub-agents to avoid bloating your own context.
Use when starting any conversation - establishes mandatory workflows for finding and using skills, including using Skill tool before announcing usage, alignment before implementation, and creating TodoWrite todos for checklists