design-experiment
Plan LLM fine-tuning and evaluation experiments. Use when the user wants to design a new experiment, plan training runs, or create an experiment_summary.yaml file.
Optimizes black-box functions (e.g., experimental yield) using Gaussian Processes, ideal for self-driving labs.
Set up complete experimental infrastructure for all runs in a designed experiment. Orchestrates parallel generation of fine-tuning configs (via scaffold-torchtune) and evaluation configs (via scaffold-inspect). Use after design-experiment to prepare configs before running experiments.
Execute the complete experimental workflow - model optimization followed by evaluation - for all runs in a scaffolded experiment. Use after scaffold-experiment to submit jobs to SLURM.
Routes to training optimization skills based on observed symptoms and training problems.
LLM specialist router to prompt engineering, fine-tuning, RAG, evaluation, and safety skills.
Complete fal.ai model selection system. PROACTIVELY activate for: (1) Choosing image generation models (FLUX, SDXL), (2) Choosing video models (Kling, Sora, LTX), (3) Choosing audio models (Whisper, ElevenLabs), (4) Model quality vs speed comparison, (5) Cost optimization by model tier, (6) 3D generation models, (7) Model-specific parameters, (8) Development vs production model selection. Provides: Model comparison tables, decision trees, pricing tiers, performance benchmarks. Ensures optimal model selection for quality, speed, and cost.
Route AI/ML tasks to the correct Yzmir pack: frameworks, training, RL, LLMs, architectures, or production.
Use when building networks that grow, prune, or adapt topology during training. Routes to continual learning, gradient isolation, modular composition, and lifecycle orchestration skills.
Expert guidance for regression analysis, statistical modeling, and outlier detection in Python using statsmodels, scikit-learn, scipy, and PyOD - includes model diagnostics, assumption checking, robust methods, and comprehensive outlier detection strategies
Expert guidance for mathematical optimization in Python - systematic problem classification, library selection (scipy, pyomo, cvxpy, GEKKO), solver configuration, and implementation patterns for LP, QP, NLP, MIP, convex, and global optimization problems
Multi-signal framework detection with confidence scoring for 6 major frameworks
Architecture selection router for CNNs, Transformers, RNNs, GANs, and GNNs, based on data modality and constraints.
Adaptive epoch selection for Walk-Forward Optimization using efficient frontier analysis. Per-fold epoch sweeps with WFE-based selection and carry-forward priors. TRIGGERS - epoch selection, WFO epoch, walk-forward epoch, training epochs WFO, efficient frontier epochs, overfitting epochs, epoch sweep, BiLSTM epochs, WFE optimization, adaptive hyperparameter, Pardo WFE, epoch carry-forward.
Expert guidance for multiobjective optimization in Python - Pareto optimality, evolutionary algorithms (NSGA-II, NSGA-III, MOEA/D), scalarization methods, Pareto front analysis, and implementation with pymoo, platypus, and DEAP
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
Reduces LLM costs and improves response times through caching, model selection, batching, and prompt optimization. Provides cost breakdowns, latency hotspots, and configuration recommendations. Use for "cost reduction", "performance optimization", "latency improvement", or "efficiency".
Add and manage evaluation results in Hugging Face model repositories using the new .eval_results/ format. Supports extracting scores from model cards, importing from Artificial Analysis API, and batch processing trending models.
Expert guidance for Stan probabilistic programming language development, including modern syntax, cmdstanr/cmdstanpy integration, and testing patterns
Bayesian statistical modeling with PyMC v5+. Use when building probabilistic models, specifying priors, running MCMC inference, diagnosing convergence, or comparing models. Covers PyMC, ArviZ, pymc-bart, pymc-extras, nutpie, and JAX/NumPyro backends. Triggers on tasks involving: Bayesian inference, posterior sampling, hierarchical/multilevel models, GLMs, time series, Gaussian processes, BART, mixture models, prior/posterior predictive checks, MCMC diagnostics, LOO-CV, WAIC, model comparison, or causal inference with do/observe.
This skill should be used for time series machine learning tasks including classification, regression, clustering, forecasting, anomaly detection, segmentation, and similarity search. Use when working with temporal data, sequential patterns, or time-indexed observations requiring specialized algorithms beyond standard ML approaches. Particularly suited for univariate and multivariate time series analysis with scikit-learn compatible APIs.