dyad-deflake-e2e-recent-commits
Automatically gather flaky E2E tests from recent CI runs on the main branch and from recent PRs by wwwillchen/keppo-bot/dyad-assistant, then deflake them.
Identify and fix flaky E2E tests by running them repeatedly and investigating failures.
Rebase E2E test snapshots based on failed tests from the PR comments.
Guide for writing UI tests using the IDE Starter and UI Driver frameworks. Use when creating or modifying UI tests, or when the user asks to implement a test case from TestOps.
Guidelines for writing tests in the IntelliJ codebase. Use when creating new test classes or test methods.
Creates or updates promptfoo evaluation suites (promptfooconfig.yaml, prompts, tests, assertions, providers). Use when adding eval coverage, debugging regressions, or scaffolding a new eval matrix.
E2E validation workflow for frontend changes in playground packages using Playwright MCP
This skill is for writing unit tests for EGG applications. It covers HTTP endpoint tests, Service/DI object tests, mock data, and BackgroundTask and EventBus tests, using APIs such as @eggjs/mock, app.httpRequest(), app.getEggObject(), and mm().
Playwright E2E test generation workflow for Opik. Use when generating, fixing, or planning automated tests in tests_end_to_end/.
Improve test coverage in the OpenAI Agents Python repository: run `make coverage`, inspect coverage artifacts, identify low-coverage files, propose high-impact tests, and confirm with the user before writing tests.
Apply property-based testing to find security vulnerabilities by generating adversarial inputs automatically. Use when writing property tests for security invariants, fuzz-testing parsers or validators, testing auth boundaries with generated inputs, or verifying cryptographic properties.
Application security testing toolkit from the Trail of Bits Testing Handbook. Helps the agent set up fuzzing campaigns, write fuzz harnesses, run coverage-guided fuzzers (libFuzzer, AFL++, cargo-fuzz, Atheris, Ruzzy), and triage crashes. Covers memory-safety sanitizers (AddressSanitizer, UBSan, MSan), static analysis with Semgrep and CodeQL, cryptographic validation using Wycheproof test vectors, and constant-time verification. Use when testing C, C++, Rust, Python, or Ruby code for vulnerabilities, improving code coverage, building seed corpora, creating fuzzing dictionaries, overcoming fuzzing obstacles, or integrating security checks into CI/CD with OSS-Fuzz.
Guides agents in using Wycheproof test vectors to validate cryptographic implementations against known attacks, edge cases, and vulnerability patterns. Covers integrating test vectors for AES-GCM, ECDSA, ECDH, EdDSA, RSA, and ChaCha20-Poly1305 into testing workflows. Helps when writing crypto tests, checking for signature malleability, invalid curve attacks, padding oracle issues, DER encoding bugs, or setting up CI for cryptographic libraries. Applies to verifying encryption, decryption, signing, and key exchange correctness using structured JSON test vector suites.
Write Storybook stories and visual regression tests for the Kilo VS Code extension webview UI
Extract a standalone JIT regression test case from a given GitHub issue and save it under the JitBlue folder. USE FOR: creating JIT regression tests, extracting repro code from dotnet/runtime issues, "write a test for this JIT bug", "create a regression test for issue #NNNNN", converting issue repro to xunit test. DO NOT USE FOR: non-JIT tests (use standard test patterns), debugging JIT issues without a known repro, performance benchmarks (use performance-benchmark skill).
Audit build and CI configuration for correctness risks
Discover test coverage gaps that could hide correctness defects
Audit exception safety and failure atomicity across all throw sites
Audit one cache subsystem for concurrency correctness defects