Testing & Security
QA, penetration testing, and code quality.
python-review
Python code review guidelines for the Cog SDK
python-development
Coding standards, conventions, and patterns for developing Python code in the Agent Framework repository. Use this when writing or modifying Python source files in the python/ directory.
verify-dotnet-samples
How to build, run, and verify the .NET sample projects in the Agent Framework repository. Use this when a user wants to verify that the samples still function as expected.
python-testing
Guidelines for writing and running tests in the Agent Framework Python codebase. Use this when creating, modifying, or running tests.
phoenix-playwright-tests
Write Playwright E2E tests for the Phoenix AI observability platform. Use when creating, updating, or debugging Playwright tests, or when the user asks about testing UI features, writing E2E tests, or automating browser interactions for Phoenix.
phoenix-client-development
Guide for the phoenix-client TypeScript package: experiment lifecycle, tracer provider management, and test conventions.
grouped-tools-test
Test skill for groupedTools. When executing this skill, use the record_result tool to record the result value.
sample-skill
Sample skill fixture for classpath registry enhancement tests.
risingwave-rust-analyzer
Use rust-analyzer CLI and editor/LSP settings to inspect, diagnose, and refactor RisingWave Rust code. Use when working in the RisingWave workspace and you need fast semantic analysis, unresolved-reference checks, macro-aware navigation, structured search/replace, or guidance on choosing the correct crate root and feature flags before heavier cargo or risedev commands.
code-review
Review changed code against project standards. Checks for missing tests, dead code, type safety, lint issues, and coding conventions. Run after completing any implementation work.
create-new-gosec-rule
Propose and implement a new generic gosec rule from a Go security issue description.
testing-android-code
This skill should be used when writing or reviewing tests for Android code in Bitwarden. Triggered by "BaseViewModelTest", "BitwardenComposeTest", "BaseServiceTest", "stateEventFlow", "bufferedMutableSharedFlow", "FakeDispatcherManager", "expectNoEvents", "assertCoroutineThrows", "createMockCipher", "createMockSend", "asSuccess", "Why is my Bitwarden test failing?", or testing questions about ViewModels, repositories, Compose screens, or data sources in Bitwarden.
perform-android-preflight-checklist
Quality gate checklist to run before committing or creating a PR. Use when finishing implementation, checking work quality, or preparing to commit. Triggered by "self review", "check my work", "ready to commit", "done implementing", "review checklist", "quality check".
build-test-verify
Build, test, lint, and deploy commands for the Bitwarden Android project. Use when running tests, building APKs/AABs, running lint/detekt, deploying, using fastlane, or discovering codebase structure. Triggered by "run tests", "build", "gradle", "lint", "detekt", "deploy", "fastlane", "assemble", "verify", "coverage".
harness-eval
This skill should be used when the user asks to "test the harness", "run integration tests", "validate features with real API", "test with real model calls", "run agent loop tests", "verify end-to-end", or needs to verify OpenHarness features on a real codebase with actual LLM calls.
upgrade-dep
Upgrade a dependency in the Sentry JavaScript SDK. Use when upgrading packages, bumping versions, or fixing security vulnerabilities via dependency updates.
skill-scanner
Scan agent skills for security issues. Use when asked to "scan a skill", "audit a skill", "review skill security", "check skill for injection", "validate SKILL.md", or assess whether an agent skill is safe to install. Checks for prompt injection, malicious scripts, excessive permissions, secret exposure, and supply chain risks.
bump-size-limit
Bump size limits in .size-limit.js when the size-limit CI check is failing. Use when the user mentions size limit failures, bundle size checks failing, CI size check errors, or needs to update size-limit thresholds. Also use when the user says "bumpSizeLimit", "fix size limit", "size check failing", or "update bundle size limits".
validate-prompts
Validate extracted Claude Code prompt data by reading files and checking rules directly, with no external scripts or API calls needed. Checks JSON structure (30+ rules), generated markdown files, README consistency, and semantic variable name correctness. Use whenever asked to validate prompt JSON files, check generated output, run pre-release checks, debug validation errors, or analyze variable naming. Trigger phrases: "validate", "check prompts", "run validation", "verify prompts", "structural checks", "semantic check", "release prep". Also use when investigating a specific validation rule (A1–A21, A23, B1–B6, C1–C7) or when encountering errors in prompt data.
verify-changelog
Verify changelog entries against actual prompt diffs by reading both JSON files and evaluating accuracy directly. Compares two prompt JSON versions (old → new), identifies added/removed/changed prompts, and checks that a human-written changelog accurately describes the changes. Use whenever writing, reviewing, or verifying a changelog entry for a new Claude Code version, when comparing prompt versions, when preparing a release, or when asked to "verify changelog", "check changelog", "changelog accuracy", or "diff vs changelog". Also use when asked whether a changelog is correct, complete, or well-worded, or when asked to help write a changelog for a version.
code-review
Performs an architectural and quality code review on a specified file or set of files. Checks for coding standard compliance, architectural pattern adherence, SOLID principles, testability, and performance concerns.