llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
Expert Python developer specializing in modern Python 3.11+ with deep expertise in type safety, async programming, testing, and production-grade code. Invoke for Pythonic patterns, type hints, pytest, async/await, dataclasses.
Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.
Newest DSPy optimizer using LLM reflection on execution trajectories for agentic systems
Implements real-time streaming UI patterns for ChatKit applications. This skill should be used when adding response lifecycle management, progress indicators, client effects, and thread state synchronization. Covers onResponseStart/End, onEffect, ProgressUpdateEvent, and thread lifecycle events.
Understand the components, mechanics, and constraints of context in agent systems. Use when designing agent architectures, debugging context-related failures, or optimizing context usage.
Creates and refines Claude agent skills following best practices. Use when creating new skills, improving existing ones, or learning about skill structure and conventions.
Build AI chat interfaces with custom backends, authentication, and context injection. Use when integrating chat UI with AI agents, adding auth to chat, injecting user/page context, or implementing httpOnly cookie proxies. Covers ChatKitServer, useChatKit, and MCP auth patterns. NOT when building simple chatbots without persistence or custom agent integration.
Design multi-agent architectures for complex tasks. Use when single-agent context limits are exceeded, when tasks decompose naturally into subtasks, or when specializing agents improves quality.
Apply optimization techniques to extend effective context capacity. Use when context limits constrain agent performance, when optimizing for cost or latency, or when implementing long-running agent systems.
This skill should be used when writing or improving system prompts for AI agents, providing expert guidance based on Anthropic's context engineering principles.
Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
State-of-the-art Bayesian optimization for DSPy programs with joint instruction and demo tuning
Fine-tune LLM weights using DSPy's BootstrapFinetune optimizer
Build a custom marketing strategy through an interactive interview. Develops business positioning, audience targeting, channel selection, and customer journey—then exports as a reusable JSON profile.
Forecast categories, weighted pipeline calculations, deal scoring models, and forecast accuracy metrics.
Automate comprehensive market research using web data, competitive analysis, and structured synthesis. Use when researching markets, industries, competitors, or target audiences.
Measurement framework for Answer Engine Optimization (AEO). Provides AI visibility metrics, share of voice tracking, citation monitoring, and referral demand measurement. Use when discussing AEO/GEO metrics or AI visibility performance.
Create powerful pitch closings with market insights, proprietary advantages, founder mission, and subtle FOMO triggers in under 45 seconds.
Generate and evaluate marketing slogans for any product or service. Creates options across multiple angles, scores against criteria, and recommends the best fit.
Codify your brand's writing style into a reusable voice guide. Analyzes existing content to extract patterns, then generates a comprehensive style document for consistent messaging across all channels.