home/categories/monitoring/nahisaho-musubi-claude-skills-site-reliability-engineer-skill-md
monitoringdevops

site-reliability-engineer

Production monitoring, observability, SLO/SLI management, and incident response. Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response, SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics, traces, on-call, production monitoring, health checks, uptime, availability, dashboards, post-mortem, incident management, runbook. Completes SDD Stage 8 (Monitoring) with comprehensive production observability: - SLI/SLO definitions and tracking - Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.) - Alert rules and notification channels - Incident response runbooks - Observability dashboards (logs, metrics, traces) - Post-mortem templates and analysis - Health check endpoints - Error budget tracking Use when: user needs production monitoring, observability platform, alerting, SLOs, incident response, or post-deployment health tracking.

nahisaho
maintainer
nahisaho
更新于 12/24/2025
星标
27
分支
3
quick start

Installation and usage

Production monitoring, observability, SLO/SLI management, and incident response. Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response, SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics, traces, on-call, production monitoring, health checks, uptime, availability, dashboards, post-mortem, incident management, runbook. Completes SDD Stage 8 (Monitoring) with comprehensive production observability: - SLI/SLO definitions and tracking - Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.) - Alert rules and notification channels - Incident response runbooks - Observability dashboards (logs, metrics, traces) - Post-mortem templates and analysis - Health check endpoints - Error budget tracking Use when: user needs production monitoring, observability platform, alerting, SLOs, incident response, or post-deployment health tracking.

安装
$ install --globalskills.sh
使用

安装后,您可以通过在终端运行以下命令来使用此技能:

skills use site-reliability-engineer