How to Configure Alerts
Define policies that open incidents and send notifications when servers or monitors enter a failure state.
Overview
An Alert Policy is a rule that tells Pinguzo: "when this metric crosses this threshold for this long, open an incident and notify these contacts." You can create policies that apply to all of your servers or monitors, or target a specific resource.
Edge servers evaluate alert policies continuously. When a policy's condition has held for its configured duration, the edge server opens an incident record and dispatches notifications to the configured contacts.
Opening Alert Policies
Navigate to Alerts in the left sidebar, then click the Alert Policies tab. A list of all your policies is shown with their current enabled/disabled state.
Creating an Alert Policy
Click "Add Alert Policy"
The button is in the top-right of the Alert Policies tab. A form opens.
Name the policy
Enter a descriptive name so you can identify it at a glance (e.g., "High CPU on production servers" or "Payment API down").
Choose the metric
Select what to watch. The list changes based on whether you choose a server metric or a monitor metric (see Metric Reference below).
Set the condition and threshold
Choose a comparison operator and enter a numeric threshold value. The no_data and monitor_status metrics do not need an operator or a threshold.
Set timing options
Configure Alert after (how long the condition must persist before triggering) and Repeat alert after (cooldown between repeated notifications).
Choose scope
Apply the policy to all servers/monitors or to a specific one using the dropdown.
Select contacts
Choose one or more contacts to notify. You must have at least one contact configured — see Configure Contacts.
Save
Click Save Policy. The policy is active immediately.
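The form fields above can be pictured as one record. The sketch below is only an illustration of that shape: the `AlertPolicy` class, its field names, and the `validate` helper are hypothetical and are not part of any Pinguzo API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Metrics that, per this guide, need no operator or threshold.
THRESHOLD_FREE_METRICS = {"no_data", "monitor_status"}


@dataclass
class AlertPolicy:
    """Hypothetical in-memory mirror of the Add Alert Policy form."""
    name: str
    metric: str
    contacts: List[str]
    operator: Optional[str] = None    # gt, gte, lt, lte, eq
    threshold: Optional[float] = None
    alert_after_min: int = 0          # "Alert after"
    repeat_after_min: int = 0         # "Repeat alert after"
    target: Optional[str] = None      # None means all servers/monitors

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the form is complete."""
        problems = []
        if not self.name.strip():
            problems.append("name is required")
        if not self.contacts:
            problems.append("at least one contact is required")
        if self.metric not in THRESHOLD_FREE_METRICS:
            if self.operator is None or self.threshold is None:
                problems.append("operator and threshold are required")
        return problems


policy = AlertPolicy(
    name="High CPU on production servers",
    metric="cpu_percent",
    operator="gte",
    threshold=80.0,
    alert_after_min=10,
    repeat_after_min=60,
    contacts=["ops@example.com"],
)
print(policy.validate())  # []
```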
Metric Reference
Server Metrics
| Metric Key | Description | Unit | Typical Threshold |
|---|---|---|---|
| cpu_percent | Overall CPU usage | % | 80% |
| cpu_steal | CPU steal time (VMs only) | % | 10% |
| memory_percent | RAM usage | % | 90% |
| disk_percent | Highest disk partition usage | % | 90% |
| load_per_core | 5-minute load average ÷ CPU cores | ratio | 2.0 |
| uptime_seconds | System uptime (used to detect reboots) | seconds | < 300 (treat as rebooted) |
| no_data | No data received from agent (no threshold needed) | n/a | n/a |
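As a worked example of the load_per_core ratio, the sketch below divides a 5-minute load average by the core count. The function name and the sample numbers are illustrative assumptions; the agent's actual calculation is not shown in this guide.

```python
def load_per_core(load_avg_5min: float, cpu_cores: int) -> float:
    """Normalize the 5-minute load average by the number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_avg_5min / cpu_cores


# A load of 6.0 on a 4-core server is below the typical 2.0 threshold.
print(load_per_core(6.0, 4))   # 1.5
# A load of 10.0 on the same server is above it.
print(load_per_core(10.0, 4))  # 2.5
```

Normalizing by core count lets one threshold work across servers of different sizes.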
Monitor Metrics
| Metric Key | Description |
|---|---|
| monitor_status | Monitor check failed (no threshold needed; triggers on any failure) |
Condition Operators
| Operator | Symbol | Meaning |
|---|---|---|
| gt | > | Metric is greater than the threshold |
| gte | ≥ | Metric is greater than or equal to the threshold |
| lt | < | Metric is less than the threshold |
| lte | ≤ | Metric is less than or equal to the threshold |
| eq | = | Metric equals the threshold exactly |
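The operator keys in the table map directly onto ordinary comparisons. A minimal sketch, assuming a hypothetical `condition_met` helper (not a Pinguzo function), could look like this:

```python
import operator

# Operator keys from the table, mapped to Python's standard comparisons.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def condition_met(op_key: str, value: float, threshold: float) -> bool:
    """Return True when `value <op> threshold` holds."""
    return OPERATORS[op_key](value, threshold)


print(condition_met("gte", 80.0, 80.0))  # True
print(condition_met("gt", 80.0, 80.0))   # False
print(condition_met("lt", 120, 300))     # True
```

Because eq requires an exact match, it tends to suit integer-valued metrics better than fractional percentages.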
Timing Options
Alert after (duration)
The number of minutes the condition must hold continuously before Pinguzo opens an incident. This prevents false alarms from brief spikes.
- 0 minutes: Alert fires immediately on first threshold breach (good for monitors)
- 5–15 minutes: Typical for CPU/memory alerts to avoid alerting on momentary spikes
- Maximum: 1,440 minutes (24 hours)
Repeat alert after (cooldown)
How many minutes to wait before sending another notification for the same ongoing incident. Prevents notification fatigue for long outages.
- 0 minutes: Never repeat (only notify once at incident start)
- 60 minutes: Send a reminder every hour while the incident is open (recommended)
- Maximum: 1,440 minutes (24 hours)
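The two timing fields share the same 0 to 1,440 minute range, and a cooldown of 0 means "never repeat". The sketch below expresses those rules with hypothetical helper names; it is an illustration of the documented behavior, not Pinguzo's implementation.

```python
MAX_MINUTES = 1440  # 24 hours, the documented maximum for both fields


def validate_minutes(value: int) -> int:
    """Accept only values in the documented 0..1440 range."""
    if not 0 <= value <= MAX_MINUTES:
        raise ValueError(f"must be between 0 and {MAX_MINUTES} minutes")
    return value


def should_repeat(minutes_since_last_notification: float, repeat_after_min: int) -> bool:
    """Decide whether a reminder is due for an incident that is still open."""
    if repeat_after_min == 0:
        return False  # 0 means notify once, at incident start only
    return minutes_since_last_notification >= repeat_after_min


print(should_repeat(45, 60))   # False
print(should_repeat(60, 60))   # True
print(should_repeat(600, 0))   # False
```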
Scope: Targeting Specific Resources
The Apply to field controls which resources the policy covers:
| Option | Behavior |
|---|---|
| All servers | The policy applies to every server in your account. New servers added in the future are also covered automatically. |
| Specific server | The policy only applies to the selected server. Use this for server-specific thresholds (e.g., the database server needs a stricter disk alert). |
| All monitors | Applies to every monitor. New monitors are automatically covered. |
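One way to picture why "All" policies cover future resources automatically is to treat the scope as a rule checked at evaluation time rather than a stored list. The sketch below assumes that model; the function, its parameters, and the `srv-...` and `mon-...` identifiers are hypothetical.

```python
from typing import Optional


def policy_applies(policy_kind: str, policy_target: Optional[str],
                   resource_kind: str, resource_id: str) -> bool:
    """Return True if a policy covers the given resource.

    policy_kind / resource_kind: "server" or "monitor".
    policy_target: None for "All servers" / "All monitors", else a resource id.
    """
    if policy_kind != resource_kind:
        return False
    return policy_target is None or policy_target == resource_id


# An "All servers" policy covers a server added after the policy was created.
print(policy_applies("server", None, "server", "srv-new"))      # True
# A "Specific server" policy covers only its selected server.
print(policy_applies("server", "srv-db", "server", "srv-web"))  # False
# Server policies never apply to monitors.
print(policy_applies("server", None, "monitor", "mon-1"))       # False
```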
Quick-Start Templates
When you click Add Alert Policy, Pinguzo offers quick-start templates to get common policies set up in one click:
High CPU Usage
Triggers when CPU ≥ 80% for 10 minutes on any server.
High Memory Usage
Triggers when memory ≥ 90% for 10 minutes on any server.
High Disk Usage
Triggers when disk ≥ 90% for 15 minutes on any server.
High Load Average
Triggers when load-per-core ≥ 2.0 for 10 minutes on any server.
Server Rebooted
Triggers when uptime drops below 5 minutes (300 seconds) — detects unexpected reboots.
No Data Received
Triggers when no agent data is received for 15 minutes — detects agent crashes or network issues.
Monitor Down
Triggers immediately when any monitor check fails.
High CPU Steal
Triggers when CPU steal ≥ 10% for 5 minutes — signals hypervisor oversubscription.
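For side-by-side comparison, the templates above can be written as plain data. The tuple layout is an assumption made for this sketch; `None` marks a value the metric does not use or that this guide does not state (for example, the Alert after value of Server Rebooted).

```python
# (template name, metric key, operator, threshold, alert-after minutes)
TEMPLATES = [
    ("High CPU Usage", "cpu_percent", "gte", 80, 10),
    ("High Memory Usage", "memory_percent", "gte", 90, 10),
    ("High Disk Usage", "disk_percent", "gte", 90, 15),
    ("High Load Average", "load_per_core", "gte", 2.0, 10),
    ("Server Rebooted", "uptime_seconds", "lt", 300, None),
    ("No Data Received", "no_data", None, None, 15),
    ("Monitor Down", "monitor_status", None, None, 0),
    ("High CPU Steal", "cpu_steal", "gte", 10, 5),
]

# List the templates that fire on the first failing data point.
immediate = [name for name, _, _, _, after in TEMPLATES if after == 0]
print(immediate)  # ['Monitor Down']
```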
Managing Alert Policies
Enable / Disable
Use the toggle switch on each policy row to temporarily disable a policy without deleting it. Disabled policies do not trigger incidents or send notifications. This is useful during planned maintenance windows.
Edit
Click the Edit (pencil) icon to update any field. Changes take effect immediately.
Delete
Click the Delete (trash) icon and confirm. Deleting a policy does not close any currently open incidents associated with it — those incidents will resolve naturally when the condition clears.
How Alert Triggering Works
- The edge server evaluates every alert policy against each incoming metric data point
- If the condition is met, the Alert after timer starts
- If the metric returns to normal before the Alert after timer expires, the timer resets and no incident is opened
- If the condition stays met until the timer expires, an incident is opened
- Notifications are dispatched to all selected contacts
- The Repeat alert after cooldown starts, and no further notifications are sent until it expires
- When the condition clears, the incident is resolved and recovery notifications are sent
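The triggering sequence above can be sketched as a small state machine that runs once per data point. This is a simplified model under stated assumptions: time is measured in whole minutes, and `PolicyState`, `evaluate`, and the event names are hypothetical, not the edge server's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PolicyState:
    breach_started: Optional[float] = None  # minute the condition first held
    incident_open: bool = False
    last_notified: Optional[float] = None   # minute of the latest notification


def evaluate(state: PolicyState, now_min: float, condition_met: bool,
             alert_after_min: int, repeat_after_min: int) -> List[str]:
    """Process one data point and return the events it produces."""
    events: List[str] = []
    if condition_met:
        if state.breach_started is None:
            state.breach_started = now_min  # start the Alert after timer
        if not state.incident_open:
            if now_min - state.breach_started >= alert_after_min:
                state.incident_open = True
                state.last_notified = now_min  # start the cooldown
                events += ["incident_opened", "notify"]
        elif repeat_after_min > 0 and now_min - state.last_notified >= repeat_after_min:
            state.last_notified = now_min
            events.append("notify")
    else:
        if state.incident_open:
            events += ["incident_resolved", "notify_recovery"]
        # Condition cleared: reset the timer and any open incident.
        state.breach_started = None
        state.incident_open = False
        state.last_notified = None
    return events


# One CPU sample per minute; policy: cpu_percent >= 80 for 3 minutes, no repeats.
samples = [85, 70, 85, 90, 95, 92, 60]
state = PolicyState()
timeline = [evaluate(state, minute, cpu >= 80, 3, 0)
            for minute, cpu in enumerate(samples)]
for minute, events in enumerate(timeline):
    print(minute, events)
```

In this run the spike at minute 0 clears before the timer expires, so nothing fires. The breach that begins at minute 2 has lasted 3 minutes by minute 5, which opens the incident, and the drop at minute 6 resolves it.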
Next Steps
- Configure Contacts — add email, Slack, Discord, Telegram, or webhook channels
- View Incidents — see all incidents opened by your alert policies