monitoring, devops

site-reliability-engineer

Production monitoring, observability, SLO/SLI management, and incident response.

Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response, SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics, traces, on-call, production monitoring, health checks, uptime, availability, dashboards, post-mortem, incident management, runbook.

Completes SDD Stage 8 (Monitoring) with comprehensive production observability:
- SLI/SLO definitions and tracking
- Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.)
- Alert rules and notification channels
- Incident response runbooks
- Observability dashboards (logs, metrics, traces)
- Post-mortem templates and analysis
- Health check endpoints
- Error budget tracking

Use when: user needs production monitoring, observability platform, alerting, SLOs, incident response, or post-deployment health tracking.
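To illustrate the SLI/SLO and error-budget tracking listed above, here is a minimal sketch that assumes a request-based availability SLI measured over a fixed window (for example, 30 days). The names `SloWindow`, `availability_sli`, and `error_budget_remaining` are hypothetical and chosen for illustration. They are not part of the skill's own output or of any monitoring tool's API.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts observed over one SLO window (hypothetical structure)."""
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 for a 99.9% availability SLO


def availability_sli(w: SloWindow) -> float:
    """Request-based availability SLI: fraction of requests that succeeded."""
    if w.total_requests == 0:
        return 1.0  # no traffic in the window: treat as fully available
    return (w.total_requests - w.failed_requests) / w.total_requests


def error_budget_remaining(w: SloWindow) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - w.slo_target) * w.total_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget at all.
        return 1.0 if w.failed_requests == 0 else float("-inf")
    return 1.0 - w.failed_requests / allowed_failures


if __name__ == "__main__":
    window = SloWindow(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    print(f"SLI: {availability_sli(window):.5f}")                # SLI: 0.99975
    print(f"Budget left: {error_budget_remaining(window):.0%}")  # Budget left: 75%
```

With a 99.9% target over one million requests, the window tolerates about 1,000 failures. Spending 250 of them leaves roughly 75% of the budget, and alert rules could fire on how quickly that fraction drops.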

Maintainer: nahisaho
Updated 12/24/2025
Stars: 27
Forks: 3
Quick start

Installation and usage


Installation
$ install --globalskills.sh
Usage

After installation, you can use this skill by running the following command in a terminal:

skills use site-reliability-engineer