Automated Runbooks
Create and execute automated remediation workflows for Kubernetes issues.
What are Runbooks?
Runbooks are automated workflows that execute predefined actions to remediate issues in your Kubernetes clusters. They can be triggered manually, automatically when certain issues are detected, or on a schedule.
Trigger Types
How runbooks can be triggered
Manual
Execute on demand from the dashboard or API. Ideal for one-off operations or testing.
Automatic
Triggered when specific issue types are detected. Enables self-healing infrastructure.
Scheduled
Run on a cron schedule. Perfect for maintenance tasks and periodic cleanups.
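To make the three trigger types concrete, the following Python sketch models a runbook definition and the lookup an automatic trigger might perform when an issue is detected. It is not the product's implementation: the `Runbook` class, its field names, the `runbooks_for_issue` function, and issue type names such as `CrashLoopBackOff` are hypothetical. Scheduled runbooks only store a cron expression here; evaluating the schedule is left to a scheduler that the sketch does not include.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three trigger types described above.
TRIGGER_TYPES = {"manual", "automatic", "scheduled"}


@dataclass
class Runbook:
    name: str
    trigger: str                                        # "manual", "automatic", or "scheduled"
    issue_types: set[str] = field(default_factory=set)  # issue types that start an automatic runbook
    schedule: Optional[str] = None                      # cron expression, e.g. "0 3 * * *"
    enabled: bool = True

    def __post_init__(self) -> None:
        if self.trigger not in TRIGGER_TYPES:
            raise ValueError(f"unknown trigger type: {self.trigger!r}")
        if self.trigger == "scheduled" and not self.schedule:
            raise ValueError("a scheduled runbook needs a cron expression")


def runbooks_for_issue(runbooks: list[Runbook], issue_type: str) -> list[Runbook]:
    """Return the enabled, automatically triggered runbooks registered for this issue type."""
    return [
        rb
        for rb in runbooks
        if rb.enabled and rb.trigger == "automatic" and issue_type in rb.issue_types
    ]
```

Manual and scheduled runbooks never match in this lookup, since they start only on demand or on their schedule.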
Available Actions
Runbooks support these action types:
restart_pod
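Delete a pod so that its controller (for example, the ReplicaSet behind a Deployment, or a StatefulSet) recreates it. A pod without a controller is not recreated.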
{
"type": "restart_pod",
"target": "{{issue.resourceName}}",
"namespace": "{{issue.namespace}}"
}
scale_deployment
Scale a deployment up or down
{
"type": "scale_deployment",
"deployment": "api-server",
"namespace": "default",
"replicas": 3
}
patch_resource
Apply a patch, given as a partial JSON object, to any Kubernetes resource
{
"type": "patch_resource",
"kind": "Deployment",
"name": "api-server",
"namespace": "default",
"patch": {"spec": {"template": {"spec": {"containers": [{"name": "app", "resources": {"limits": {"memory": "512Mi"}}}]}}}}
}
run_command
Execute a command inside a container
{
"type": "run_command",
"pod": "cache-redis-0",
"namespace": "redis",
"container": "redis",
"command": ["redis-cli", "FLUSHDB"]
}
notify
Send notifications via Slack, email, or webhook
{
"type": "notify",
"channel": "slack",
"message": "Runbook executed: {{runbook.name}} for issue {{issue.type}}"
}
Creating a Runbook
Step-by-step guide to creating your first runbook
- Navigate to Runbooks
Go to Runbooks in the dashboard and click "New Runbook"
- Configure Basic Settings
Set a name and description, and select which issue types should trigger this runbook
- Add Actions
Add one or more actions to execute. Actions run sequentially.
- Set Conditions (Optional)
Add conditions to control when the runbook executes based on issue severity, cluster, namespace, etc.
- Test with Dry Run
Use the "Dry Run" feature to preview what actions would be taken without executing them
- Enable the Runbook
Switch the runbook's toggle to enabled when it is ready for use
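The steps above imply an execution order: check the conditions, fill in template placeholders such as `{{issue.resourceName}}`, then run the actions one after another, or only list them during a dry run. The Python sketch below models that flow under several assumptions the documentation does not state: conditions are simple equality checks against issue fields, each action type is dispatched to a caller-supplied handler function, a dry run reports each action as `planned`, and execution stops at the first failed action. The `execute`, `render`, and `conditions_met` names and the handler mapping are hypothetical.

```python
import re
from typing import Any, Callable

# Matches placeholders such as {{issue.resourceName}} or {{ runbook.name }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(value: Any, context: dict) -> Any:
    """Recursively fill {{a.b}} placeholders in strings, dicts, and lists from a nested context."""
    if isinstance(value, str):
        def lookup(match: re.Match) -> str:
            obj = context
            for key in match.group(1).split("."):
                obj = obj[key]  # raises KeyError for an unknown placeholder
            return str(obj)
        return PLACEHOLDER.sub(lookup, value)
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, context) for v in value]
    return value  # numbers, booleans, None


def conditions_met(conditions: dict, issue: dict) -> bool:
    """Hypothetical condition format: every key must equal the issue's value."""
    return all(issue.get(key) == expected for key, expected in conditions.items())


def execute(
    runbook: dict,
    issue: dict,
    handlers: dict[str, Callable[[dict], str]],
    dry_run: bool = False,
) -> list[dict]:
    """Run a runbook's actions in order and return one result per action attempted."""
    if not conditions_met(runbook.get("conditions", {}), issue):
        return []  # conditions not satisfied: nothing runs
    context = {"issue": issue, "runbook": {"name": runbook["name"]}}
    results = []
    for action in runbook["actions"]:
        rendered = render(action, context)
        if dry_run:
            results.append({"action": rendered, "status": "planned"})
            continue
        try:
            output = handlers[rendered["type"]](rendered)
            results.append({"action": rendered, "status": "completed", "output": output})
        except Exception as exc:
            results.append({"action": rendered, "status": "failed", "error": str(exc)})
            break  # assumption: later actions are skipped after a failure
    return results
```

Passing handlers in as plain functions keeps the sketch independent of any Kubernetes client; in practice each handler would call the cluster API or a notification service.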
GitOps Workflow
Create pull requests instead of direct changes
For clusters with GitOps enabled, runbooks can create pull requests instead of applying changes directly. This enables review workflows and maintains your Git repository as the source of truth.
How it works
Best Practices
Start with Manual Triggers
Test runbooks manually before enabling automatic triggers to ensure they behave as expected.
Use Conditions
Add conditions to keep runbooks from executing in inappropriate contexts (e.g., allow a runbook to run in staging but not in production).
Include Notifications
Always include a notify action so your team knows when automated remediation occurs.
Keep Actions Idempotent
Design actions so that running them more than once leaves the cluster in the same state as running them once.
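As an illustration of the difference, the sketch below compares two ways of scaling a deployment, modeled as a plain dictionary rather than a real Kubernetes object. It assumes, as the `"replicas": 3` field in the `scale_deployment` example suggests, that the replica count is an absolute target; the function names are hypothetical.

```python
def scale_to(deployment: dict, replicas: int) -> dict:
    """Idempotent: sets an absolute replica count, so repeated runs give the same result."""
    return {**deployment, "replicas": replicas}


def scale_up_by_one(deployment: dict) -> dict:
    """Not idempotent: every run adds another replica."""
    return {**deployment, "replicas": deployment["replicas"] + 1}


deployment = {"name": "api-server", "namespace": "default", "replicas": 2}

once = scale_to(deployment, 3)
twice = scale_to(once, 3)
print(once["replicas"], twice["replicas"])  # 3 3

once = scale_up_by_one(deployment)
twice = scale_up_by_one(once)
print(once["replicas"], twice["replicas"])  # 3 4
```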
Execution History
All runbook executions are logged with:
- Trigger source (manual, automatic, scheduled)
- Who triggered it (user email or "system")
- Associated issue (if applicable)
- Status (pending, running, completed, failed)
- Output and error messages from each action
- Duration and timestamps
View execution history from the runbook detail page or the Settings > Audit section.
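The fields listed above map naturally onto a record type. The Python sketch below is one hypothetical way to model an execution entry; the class and field names, the `start` and `finish` helpers, and the rule that any action error marks the whole execution as failed are assumptions, not the product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

TRIGGER_SOURCES = ("manual", "automatic", "scheduled")
STATUSES = ("pending", "running", "completed", "failed")


@dataclass
class ActionResult:
    action_type: str            # e.g. "restart_pod" or "notify"
    output: str = ""
    error: Optional[str] = None


@dataclass
class Execution:
    runbook: str
    trigger_source: str               # one of TRIGGER_SOURCES
    triggered_by: str                 # user email, or "system"
    issue_id: Optional[str] = None    # associated issue, if any
    status: str = "pending"
    actions: list[ActionResult] = field(default_factory=list)
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        if self.trigger_source not in TRIGGER_SOURCES:
            raise ValueError(f"unknown trigger source: {self.trigger_source!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def start(self, now: datetime) -> None:
        self.status = "running"
        self.started_at = now

    def finish(self, now: datetime) -> None:
        # Assumption: a single failed action marks the whole execution as failed.
        self.finished_at = now
        self.status = "failed" if any(a.error for a in self.actions) else "completed"

    @property
    def duration_seconds(self) -> Optional[float]:
        """Elapsed time between start and finish, or None if either is missing."""
        if self.started_at is None or self.finished_at is None:
            return None
        return (self.finished_at - self.started_at).total_seconds()
```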