eval-recipes-runner
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.
Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.
Ralph Wiggum persistence loop with intelligent multi-model routing (Gemini, Codex, Claude, Council)
Expert ML engineer specializing in machine learning model lifecycle, production deployment, and ML system optimization. Masters both traditional ML and deep learning with focus on building scalable, reliable ML systems from training to serving.
Model-Development standards for model development in Ml Ai environments.
Expert AI engineer specializing in AI system design, model implementation, and production deployment. Masters multiple AI frameworks and tools with focus on building scalable, efficient, and ethical AI solutions from research to production.
Expert LLM architect specializing in large language model architecture, deployment, and optimization. Masters LLM system design, fine-tuning strategies, and production serving with focus on building scalable, efficient, and safe LLM applications.
Expert knowledge in AI/ML development, model deployment, and MLOps practices
Expert prompt engineer specializing in designing, optimizing, and managing prompts for large language models. Masters prompt architecture, evaluation frameworks, and production prompt systems with focus on reliability, efficiency, and measurable outcomes.
Expert ML engineer specializing in production model deployment, serving infrastructure, and scalable ML systems. Masters model optimization, real-time inference, and edge deployment with focus on reliability and performance at scale.
Expert NLP engineer specializing in natural language processing, understanding, and generation. Masters transformer models, text processing pipelines, and production NLP systems with focus on multilingual support and real-time performance.
Model-Deployment standards for model deployment in Ml Ai environments.
生成適用於金融機器學習的高質量特徵。包含分數階差分 (Fractional Differentiation) 與無前視偏差的滾動標準化。
Index directory for automatically learned skills from execution feedback
Expert MLOps engineer specializing in ML infrastructure, platform engineering, and operational excellence for machine learning systems. Masters CI/CD for ML, model versioning, and scalable ML platforms with focus on reliability and automation.
ML inference latency optimization, model compression, distillation, caching strategies, and edge deployment patterns. Use when optimizing inference performance, reducing model size, or deploying ML at the edge.
LLM inference infrastructure, serving frameworks (vLLM, TGI, TensorRT-LLM), quantization techniques, batching strategies, and streaming response patterns. Use when designing LLM serving infrastructure, optimizing inference latency, or scaling LLM deployments.
Use when you have validated symmetry groups and need to design neural network architecture that respects those symmetries. Invoke when user mentions equivariant layers, G-CNN, e3nn, steerable networks, building symmetry into model, or needs architecture recommendations for specific symmetry groups. Provides architecture patterns and implementation guidance.
Choose appropriate model for custom agent tasks. Use when selecting between Haiku, Sonnet, and Opus for agents, optimizing cost vs quality tradeoffs, or matching model capability to task complexity.
C4 model architecture visualization and documentation
Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
Use when you need to empirically test whether hypothesized symmetries actually hold in your data or model. Invoke when user mentions testing invariance, validating equivariance, checking if symmetry assumptions are correct, debugging symmetry-related model failures, or needs data-driven validation before committing to equivariant architecture. Provides test protocols and metrics.