k8s-troubleshoot
Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.
Respond to Kubernetes incidents with runbooks and diagnostics. Use for outages, pod failures, node issues, network problems, and emergency response.
kubectl operations for applying, patching, deleting, and executing commands on Kubernetes resources. Use when modifying resources, running commands in pods, or managing resource lifecycle.
Cilium and Hubble network observability for Kubernetes. Use when managing network policies, observing traffic flows, or troubleshooting connectivity with eBPF-based networking.
Orchestrate a configurable, multi-member CLI planning council (Codex, Claude Code, Gemini, OpenCode, or custom) to produce independent implementation plans, anonymize and randomize them, then judge and merge into one final plan. Use when you need a robust, bias-resistant planning workflow, structured JSON outputs, retries, and failure handling across multiple CLI agents.
Use this skill for GitHub issue, PR, and release operations for the Weekly blog via the gh CLI.
Search and install skills from skills.sh and GitHub repos. Use when users ask to find skills, install skills, download skills, add skills from GitHub, search for skills, browse skills, get a skill, or want new capabilities. Trigger phrases include "install skill", "find skill", "search skills", "add skill", "download skill", "get skill from github", "skills.sh", "browse skills", "what skills are available", "I need a skill for".
Start, monitor, and babysit Zephyr pipeline jobs on Iris. Use when launching a Zephyr job, watching it run, or restarting it after failures.
Use the legacy `scripts/ray/dev_tpu.py` workflow to allocate a temporary Ray-backed TPU VM for fast debugging, testing, and benchmark iteration. Use only when you specifically need the Ray-backed dev TPU path.
Scheduled scrub workflow for docs-code parity in the Marin repository.
File a GitHub issue from the current conversation. Use when bugs, regressions, or improvements are identified during a session and need to be captured as a tracked issue.
Scheduled scrub workflow for ongoing self-improvement in the Marin repository.
Fix a bug using test-driven development. Analyze the problem, write a failing test first, fix the code, validate, then commit. Use for in-branch bug fixes where the issue is already understood. For bugs originating from a GitHub issue, prefer fix-issue instead.
Specialized skill for developing and refactoring the DST (Don't Starve Together) Admin Go project. Use this skill whenever the user mentions DST, game server management, API refactoring, adding handlers/services, creating CRUD endpoints, or working within the dst-admin-go codebase. This skill ensures all code follows the project's architectural patterns including dependency injection, layered architecture (Handler → Service → Model), and specific naming conventions. Make sure to use this skill when the user asks to refactor existing APIs, add new features, fix bugs, or modify any part of the dst-admin-go project structure.
Use when a quest is ready for a concrete implementation pass or a main experiment run tied to a selected idea and an accepted baseline.
Refine draft skills into strict, reusable SKILL.md documents with stable triggers, preconditions, and failure branches.