experiment-analysis
Analyze GRPO training runs for learning dynamics and pipeline performance. Use when diagnosing training issues, reviewing Elo progression, checking throughput, or updating experiment results.
Performs rigorous time series cross-validation using expanding and sliding windows. Use when you need to evaluate the performance of time series models on unseen data. Trigger with "cross validate time series", "evaluate forecasting model", "time series backtesting".
This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
Defensive techniques using adversarial examples to improve model robustness and security
Feature engineering techniques including feature extraction, transformation, selection, and feature store management for ML systems.
Use when managing Shannon CLI performance and costs: check cache statistics, clear stale entries, set budgets, and understand automatic model selection and cost optimization.
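Plan mode for improving the AEGIS model's ARC-Challenge evaluation and sanity-checking GSM8K. Analyzes timeout and extraction-failure rates, and runs robust answer extraction, data contamination checks, and multi-seed evaluation.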
This skill should be used when optimizing strategy parameters through grid search, random search, or Bayesian optimization. It provides systematic approaches to find optimal parameter combinations while avoiding overfitting through cross-validation and walk-forward methods.
Machine Learning Systems - comprehensive knowledge for building production ML systems from data engineering through deployment and operations. Based on Harvard ML Systems course and Designing ML Systems by Chip Huyen.
Production-grade ML model monitoring, drift detection, and observability
Guide for adversarial machine learning: adversarial examples, data poisoning, model backdoors, and evasion attacks.
Python machine learning with scikit-learn, PyTorch, and TensorFlow
YAML for configuration-driven engineering workflows, model setup, and analysis parameters
ML deployment paradigms including batch vs real-time inference, online vs offline serving, edge deployment, and serverless ML.
Discover patterns in unlabeled data using clustering, dimensionality reduction, and anomaly detection
Advanced techniques for optimizing LLM fine-tuning. Covers learning rates, LoRA configuration, batch sizes, gradient strategies, hyperparameter tuning, and monitoring. Use when fine-tuning models for best performance.
Master machine learning foundations - algorithms, preprocessing, feature engineering, and evaluation
Data engineering, machine learning, AI, and MLOps. From data pipelines to production ML systems and LLM applications.
Research pipeline for topology-aware GNN representation learning on power grids using the PowerGraph benchmark. Use when (1) building physics-guided GNNs for power flow (PF), optimal power flow (OPF), or cascading failure prediction, (2) implementing self-supervised pretraining for power systems, (3) evaluating cascade explanation fidelity against ground-truth masks, or (4) conducting reproducible ML-for-power-systems research. Triggers include "PowerGraph", "power flow GNN", "OPF surrogate", "cascade prediction", "physics-guided GNN", "grid analytics ML", "power system representation learning".