Runbooks

Automated Runbooks

Create and execute automated remediation workflows for Kubernetes issues.

What are Runbooks?

Runbooks are automated workflows that execute predefined actions to remediate issues in your Kubernetes clusters. They can be triggered manually, automatically when certain issues are detected, or on a schedule.

Trigger Types

How runbooks can be triggered

Manual

Execute on-demand from the dashboard or API. Ideal for one-off operations or testing.

Automatic

Triggered when specific issue types are detected. Enables self-healing infrastructure.

Scheduled

Run on a cron schedule. Perfect for maintenance tasks and periodic cleanups.
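
As a reading aid for scheduled triggers, the sketch below labels the fields of a standard five-field cron expression. It assumes the scheduler accepts the common five-field form; the helper name `describe_cron` and the sample expression are illustrative and are not part of the product.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict[str, str]:
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# In standard cron, "30 2 * * 0" means 02:30 every Sunday.
weekly_cleanup = describe_cron("30 2 * * 0")
```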

Available Actions

Runbooks support these action types:

restart_pod

Delete a pod so that its controller (for example, a Deployment or StatefulSet) recreates it. A pod that has no controller is not recreated.

{
  "type": "restart_pod",
  "target": "{{issue.resourceName}}",
  "namespace": "{{issue.namespace}}"
}
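
The `{{issue.resourceName}}` and `{{issue.namespace}}` placeholders are filled in from the issue that triggered the runbook. How the product resolves them internally is not documented here; the following is a minimal sketch of one way such substitution could work, using a hypothetical `render` helper and a made-up issue.

```python
import re

# Matches {{ dotted.path }} placeholders such as {{issue.namespace}}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(context: dict, dotted_path: str):
    """Follow a dotted path like 'issue.namespace' through nested dicts."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value


def render(value, context: dict):
    """Recursively substitute placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), value)
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    if isinstance(value, list):
        return [render(v, context) for v in value]
    return value


action = {
    "type": "restart_pod",
    "target": "{{issue.resourceName}}",
    "namespace": "{{issue.namespace}}",
}
# Made-up issue data, for illustration only.
issue = {"resourceName": "api-server-7d9f8c-xk2lp", "namespace": "default"}
rendered = render(action, {"issue": issue})
```

In this sketch an unknown placeholder raises `KeyError` rather than being left in place, so a typo in a template surfaces before any action runs.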

scale_deployment

Scale a deployment up or down.

{
  "type": "scale_deployment",
  "deployment": "api-server",
  "namespace": "default",
  "replicas": 3
}

patch_resource

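Apply a patch, written as JSON, to any Kubernetes resource. The example below is a partial object (a merge-style patch), not an RFC 6902 JSON Patch operation list.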

{
  "type": "patch_resource",
  "kind": "Deployment",
  "name": "api-server",
  "namespace": "default",
  "patch": {"spec": {"template": {"spec": {"containers": [{"name": "app", "resources": {"limits": {"memory": "512Mi"}}}]}}}}
}
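
The `patch` value in this example is a partial object rather than a list of operations. Which patch semantics the runbook engine applies is not stated here. The sketch below shows what the same patch would do under plain JSON Merge Patch semantics (RFC 7386), where a list in the patch replaces the existing list wholesale. Under a Kubernetes strategic merge patch, the `containers` list would instead be merged by `name`. The `deployment` object and its image are made up for illustration.

```python
def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON Merge Patch and return the patched value."""
    if not isinstance(patch, dict):
        # Scalars and lists replace the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the key
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Made-up, heavily trimmed Deployment for illustration.
deployment = {
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]}
        },
    }
}
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "app", "resources": {"limits": {"memory": "512Mi"}}}
                ]
            }
        }
    }
}
patched = json_merge_patch(deployment, patch)
containers = patched["spec"]["template"]["spec"]["containers"]
```

Under these semantics `replicas` survives, but the container entry loses its `image` field because the whole list was replaced. It is worth confirming with a dry run which behavior applies before patching list fields.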

run_command

Execute a command inside a container. The command is given as an argument list rather than a single shell string.

{
  "type": "run_command",
  "pod": "cache-redis-0",
  "namespace": "redis",
  "container": "redis",
  "command": ["redis-cli", "FLUSHDB"]
}
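
For orientation, the action above corresponds roughly to a `kubectl exec` invocation. The sketch below builds that argument list from the action fields; it only illustrates the mapping and makes no claim about how the product runs the command. The helper name `kubectl_exec_argv` is hypothetical.

```python
def kubectl_exec_argv(action: dict) -> list[str]:
    """Build the kubectl argv that mirrors a run_command action."""
    return [
        "kubectl", "exec", action["pod"],
        "-n", action["namespace"],
        "-c", action["container"],
        "--",  # everything after this is passed to the container as-is
        *action["command"],
    ]


action = {
    "type": "run_command",
    "pod": "cache-redis-0",
    "namespace": "redis",
    "container": "redis",
    "command": ["redis-cli", "FLUSHDB"],
}
argv = kubectl_exec_argv(action)
```

Because the command stays a list, each element reaches the container as one argument and no shell quoting or expansion is involved.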

notify

Send notifications via Slack, email, or webhook.

{
  "type": "notify",
  "channel": "slack",
  "message": "Runbook executed: {{runbook.name}} for issue {{issue.type}}"
}

Creating a Runbook

Step-by-step guide to creating your first runbook

  1. Navigate to Runbooks

    Go to Runbooks in the dashboard and click "New Runbook".

  2. Configure Basic Settings

    Set a name and description, and select which issue types should trigger this runbook.

  3. Add Actions

    Add one or more actions to execute. Actions run sequentially.

  4. Set Conditions (Optional)

    Add conditions to control when the runbook executes based on issue severity, cluster, namespace, etc.

  5. Test with Dry Run

    Use the "Dry Run" feature to preview which actions would be taken without executing them.

  6. Enable the Runbook

    Toggle the runbook to enabled when it is ready for use.
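
Steps 3 and 5 above can be pictured with a small sketch: actions run one after another, and a dry run reports what would happen without calling anything. This is a simplified model, not the product's executor. The `run_actions` function is hypothetical, and stopping at the first failed action is an assumption that the documentation does not confirm.

```python
from typing import Callable


def run_actions(
    actions: list[dict],
    execute: Callable[[dict], None],
    dry_run: bool = False,
) -> list[str]:
    """Run actions in order and return a human-readable log."""
    log = []
    for index, action in enumerate(actions, start=1):
        if dry_run:
            # Preview only: never call execute().
            log.append(f"{index}. would run {action['type']}")
            continue
        try:
            execute(action)
        except Exception as exc:
            log.append(f"{index}. {action['type']} failed: {exc}")
            break  # assumption: later actions are skipped after a failure
        log.append(f"{index}. {action['type']} completed")
    return log


actions = [
    {"type": "restart_pod", "target": "api-server-0", "namespace": "default"},
    {"type": "notify", "channel": "slack", "message": "pod restarted"},
]
executed: list[dict] = []
preview = run_actions(actions, executed.append, dry_run=True)
result = run_actions(actions, executed.append)
```

After the dry run `executed` is still empty; only the second call appends both actions, in order.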

GitOps Workflow

Create pull requests instead of direct changes

For clusters with GitOps enabled, runbooks can create pull requests instead of applying changes directly. This enables review workflows and maintains your Git repository as the source of truth.

Best Practices

Start with Manual Triggers

Test runbooks manually before enabling automatic triggers to ensure they behave as expected.

Use Conditions

Add conditions to prevent runbooks from executing in inappropriate contexts (e.g., only in staging, not production).
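
A condition check of this kind can be sketched as a simple allow-list match. The condition format below (field name mapped to a list of allowed values) and the `matches` helper are hypothetical and chosen for illustration; the product's actual condition syntax may differ.

```python
def matches(conditions: dict[str, list[str]], issue: dict) -> bool:
    """Return True only if every conditioned field has an allowed value."""
    return all(
        issue.get(field) in allowed for field, allowed in conditions.items()
    )


# Hypothetical conditions: staging cluster only, high or critical severity.
staging_only = {"cluster": ["staging"], "severity": ["high", "critical"]}

staging_issue = {"cluster": "staging", "severity": "high", "namespace": "default"}
production_issue = {"cluster": "production", "severity": "critical"}

runs_in_staging = matches(staging_only, staging_issue)
runs_in_production = matches(staging_only, production_issue)
```

In this sketch a field that is missing from the issue never matches, so an incomplete issue fails closed rather than triggering the runbook.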

Include Notifications

Always include a notify action so your team knows when automated remediation occurs.

Keep Actions Idempotent

Design actions that can safely run multiple times without causing issues.
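
To make the difference concrete, the sketch below contrasts an absolute action ("set replicas to 3") with a relative one ("add one replica") on a toy state dictionary. Both functions are illustrative and do not correspond to product APIs.

```python
def scale_to(state: dict, replicas: int) -> dict:
    """Idempotent: the result depends only on the requested value."""
    return {**state, "replicas": replicas}


def scale_by(state: dict, delta: int) -> dict:
    """Not idempotent: every run changes the result again."""
    return {**state, "replicas": state["replicas"] + delta}


start = {"replicas": 2}

# Running the absolute action twice leaves the same state as running it once.
once = scale_to(start, 3)
twice = scale_to(once, 3)

# Running the relative action twice overshoots.
bumped_once = scale_by(start, 1)
bumped_twice = scale_by(bumped_once, 1)
```

If an automatic trigger fires twice for the same issue, the absolute form still ends at 3 replicas, while the relative form ends at 4.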

Execution History

All runbook executions are logged with:

  • Trigger source (manual, automatic, scheduled)
  • Who triggered it (user email or "system")
  • Associated issue (if applicable)
  • Status (pending, running, completed, failed)
  • Output and error messages from each action
  • Duration and timestamps

View execution history from the runbook detail page or the Settings > Audit section.
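
The logged fields listed above could be modeled along the following lines. This is a sketch for reasoning about the data, not the product's schema: the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ActionResult:
    action_type: str
    output: str = ""
    error: Optional[str] = None


@dataclass
class ExecutionRecord:
    trigger: str                       # "manual", "automatic", or "scheduled"
    triggered_by: str                  # user email or "system"
    status: str                        # "pending", "running", "completed", or "failed"
    started_at: datetime
    finished_at: Optional[datetime] = None
    issue_id: Optional[str] = None     # set only when an issue is associated
    action_results: list[ActionResult] = field(default_factory=list)

    @property
    def duration(self) -> Optional[timedelta]:
        """Elapsed time, or None while the execution is still in progress."""
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at


started = datetime(2024, 1, 1, 2, 30, 0, tzinfo=timezone.utc)
record = ExecutionRecord(
    trigger="automatic",
    triggered_by="system",
    status="completed",
    started_at=started,
    finished_at=started + timedelta(seconds=12),
    issue_id="issue-123",  # made-up identifier
    action_results=[ActionResult("restart_pod", output="pod deleted")],
)
```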