How to Configure Alerts

Define policies that open incidents and send notifications when servers or monitors enter a failure state.

Overview

An Alert Policy is a rule that tells Pinguzo: "when this metric crosses this threshold for this long, open an incident and notify these contacts." You can create policies that apply to all of your servers or monitors, or target a specific resource.

The edge servers evaluate alert policies continuously. When a condition is met, the edge server opens an incident record and dispatches notifications to the configured contacts.

Opening Alert Policies

Navigate to Alerts in the left sidebar, then click the Alert Policies tab. A list of all your policies is shown with their current enabled/disabled state.

Creating an Alert Policy

1

Click "Add Alert Policy"

The button is in the top-right of the Alert Policies tab. A form opens.

2

Name the policy

Enter a descriptive name so you can identify it at a glance (e.g., "High CPU on production servers" or "Payment API down").

3

Choose the metric

Select what to watch. The list changes based on whether you choose a server metric or a monitor metric (see Metric Reference below).

4

Set the condition and threshold

Choose a comparison operator and enter a numeric threshold value. A threshold is not required for the no_data and monitor_status metrics.

5

Set timing options

Configure Alert after (how long the condition must persist before triggering) and Repeat alert after (cooldown between repeated notifications).

6

Choose scope

Apply the policy to all servers/monitors or to a specific one using the dropdown.

7

Select contacts

Choose one or more contacts to notify. You must have at least one contact configured — see Configure Contacts.

8

Save

Click Save Policy. The policy is active immediately.
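The form fields in the steps above can be modeled as a small data structure, which makes the required combinations easier to see. This is a hypothetical sketch, not Pinguzo's API: the AlertPolicy class, its field names, the scope strings, and the validate rules are all assumptions drawn from this page (no threshold for no_data and monitor_status, and a policy without contacts notifies no one).

```python
from dataclasses import dataclass, field
from typing import Optional

# Metric and operator keys from the reference tables on this page.
SERVER_METRICS = {"cpu_percent", "cpu_steal", "memory_percent", "disk_percent",
                  "load_per_core", "uptime_seconds", "no_data"}
MONITOR_METRICS = {"monitor_status"}
NO_THRESHOLD_METRICS = {"no_data", "monitor_status"}
OPERATORS = {"gt", "gte", "lt", "lte", "eq"}


@dataclass
class AlertPolicy:
    # Hypothetical field names mirroring the form; defaults are placeholders.
    name: str
    metric: str
    operator: Optional[str] = None
    threshold: Optional[float] = None
    alert_after_min: int = 0         # "Alert after"
    repeat_after_min: int = 0        # "Repeat alert after"
    scope: str = "all_servers"       # hypothetical encoding of "Apply to"
    contacts: list[str] = field(default_factory=list)
    enabled: bool = True


def validate(policy: AlertPolicy) -> list[str]:
    """Return a list of problems; an empty list means the policy looks complete."""
    problems = []
    if not policy.name.strip():
        problems.append("name is required")
    if policy.metric not in SERVER_METRICS | MONITOR_METRICS:
        problems.append(f"unknown metric {policy.metric!r}")
    elif policy.metric not in NO_THRESHOLD_METRICS:
        # Threshold-based metrics need both an operator and a value.
        if policy.operator not in OPERATORS:
            problems.append("operator is required")
        if policy.threshold is None:
            problems.append("threshold is required")
    if not policy.contacts:
        problems.append("no contacts selected: incidents will open but nobody is notified")
    return problems


# Illustrative values only.
cpu = AlertPolicy(name="High CPU on production servers", metric="cpu_percent",
                  operator="gte", threshold=80, alert_after_min=10,
                  contacts=["ops-email"])
print(validate(cpu))   # []

down = AlertPolicy(name="Payment API down", metric="monitor_status",
                   scope="all_monitors")
print(validate(down))  # ['no contacts selected: incidents will open but nobody is notified']
```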

Metric Reference

Server Metrics

Metric key | Description | Unit | Typical threshold
cpu_percent | Overall CPU usage | % | 80%
cpu_steal | CPU steal time (VMs only) | % | 10%
memory_percent | RAM usage | % | 90%
disk_percent | Highest disk partition usage | % | 90%
load_per_core | 5-minute load average ÷ CPU cores | ratio | 2.0
uptime_seconds | System uptime (used to detect reboots) | seconds | < 300 (consider rebooted)
no_data | No data received from the agent (no threshold needed) | n/a | n/a
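To sanity-check a threshold before saving it, you can compute the same ratios on a host yourself. This minimal sketch assumes load_per_core is the 5-minute load average divided by the logical core count the OS reports, and that a reboot is flagged when uptime is below 300 seconds; how the Pinguzo agent actually collects these values may differ, and the helper names are hypothetical.

```python
import os

REBOOT_UPTIME_SECONDS = 300  # "< 300 (consider rebooted)" from the table


def load_per_core(load_avg_5min: float, cpu_cores: int) -> float:
    """5-minute load average divided by the number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_avg_5min / cpu_cores


def looks_rebooted(uptime_seconds: float) -> bool:
    """True if the host came up less than 5 minutes ago."""
    return uptime_seconds < REBOOT_UPTIME_SECONDS


def local_load_per_core() -> float:
    """Compute the ratio for the machine running this script (Unix only)."""
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages.
    _, load5, _ = os.getloadavg()
    # os.cpu_count() reports logical CPUs and may return None.
    return load_per_core(load5, os.cpu_count() or 1)


print(load_per_core(8.0, 4))  # 2.0
print(looks_rebooted(120))    # True
```

For example, a 5-minute load of 8.0 on a 4-core server gives a ratio of 2.0, which meets the typical 2.0 threshold when the operator is gte.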

Monitor Metrics

Metric key | Description
monitor_status | Monitor check failed (no threshold needed; triggers on any failure)

Condition Operators

Operator | Symbol | Meaning
gt | > | Metric is greater than the threshold
gte | ≥ | Metric is greater than or equal to the threshold
lt | < | Metric is less than the threshold
lte | ≤ | Metric is less than or equal to the threshold
eq | = | Metric equals the threshold exactly
Use >= for percentage thresholds
For CPU, memory, and disk alerts, use ≥ (gte) rather than > (gt). This ensures the alert fires when the metric hits the threshold value exactly, not only when it exceeds it.
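The difference between gt and gte only matters when a reading lands exactly on the threshold. The sketch below maps the operator keys from the table to Python's standard operator module; the mapping and the condition_met helper are illustrative, not Pinguzo code.

```python
import operator

# Operator keys from the Condition Operators table mapped to Python comparisons.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def condition_met(value: float, op_key: str, threshold: float) -> bool:
    """Apply the policy's comparison to one metric reading."""
    return OPERATORS[op_key](value, threshold)


# A memory reading of exactly 90% against a 90% threshold:
print(condition_met(90.0, "gt", 90.0))   # False: gt only fires above 90
print(condition_met(90.0, "gte", 90.0))  # True: gte also fires at exactly 90
```

In this sketch eq is an exact floating-point comparison, so a reading of 80.01 does not match a threshold of 80.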

Timing Options

Alert after (duration)

The number of minutes the condition must hold continuously before Pinguzo opens an incident. This prevents false alarms from brief spikes.

Repeat alert after (cooldown)

How many minutes to wait before sending another notification for the same ongoing incident. Prevents notification fatigue for long outages.
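To see how the two timers interact, the sketch below lists the minutes at which notifications would go out during one continuous breach. It assumes the incident opens once the condition has held for Alert after minutes, that a repeat goes out every Repeat alert after minutes while the incident stays open, and that a cooldown of 0 means no repeats; recovery notifications are left out. The function is hypothetical.

```python
def notification_times(breach_minutes: int, alert_after: int,
                       repeat_after: int) -> list[int]:
    """Minutes (from the start of the breach) at which notifications go out.

    breach_minutes: how long the condition stays met before it clears.
    """
    if breach_minutes < alert_after:
        return []  # condition cleared before the duration elapsed: no incident
    times = [alert_after]  # incident opens, first notification
    if repeat_after > 0:   # assumption: 0 disables repeats
        t = alert_after + repeat_after
        while t < breach_minutes:  # repeats only while the incident is open
            times.append(t)
            t += repeat_after
    return times


print(notification_times(95, alert_after=10, repeat_after=30))  # [10, 40, 70]
print(notification_times(4, alert_after=10, repeat_after=30))   # []
```

A 95-minute CPU breach with a 10-minute duration and a 30-minute cooldown produces three notifications, while a 4-minute spike produces none.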

Scope: Targeting Specific Resources

The Apply to field controls which resources the policy covers:

Option | Behavior
All servers | The policy applies to every server in your account. New servers added in the future are also covered automatically.
Specific server | The policy only applies to the selected server. Use this for server-specific thresholds (e.g., the database server needs a stricter disk alert).
All monitors | The policy applies to every monitor. New monitors are covered automatically.
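One way to get the "covered automatically" behavior is to match a policy's scope against each resource at evaluation time, rather than copying the policy onto a fixed list of resources. In this sketch the Resource type and the scope strings all_servers, all_monitors, and server:<id> are hypothetical encodings, not Pinguzo's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str         # "server" or "monitor"
    resource_id: str


def policy_applies(scope: str, resource: Resource) -> bool:
    """Decide at evaluation time whether a policy's scope covers a resource.

    Hypothetical scope encoding: "all_servers", "all_monitors", or "server:<id>".
    """
    if scope == "all_servers":
        return resource.kind == "server"
    if scope == "all_monitors":
        return resource.kind == "monitor"
    kind, _, target_id = scope.partition(":")
    return resource.kind == kind and resource.resource_id == target_id


db = Resource("server", "db-1")
new_web = Resource("server", "web-7")  # added after the policy was created
print(policy_applies("all_servers", new_web))  # True
print(policy_applies("server:db-1", db))       # True
print(policy_applies("server:db-1", new_web))  # False
```

Because matching happens on every evaluation, nothing needs updating when a server is added; a specific-server policy simply never matches other hosts.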

Quick-Start Templates

When you click Add Alert Policy, Pinguzo offers quick-start templates to get common policies set up in one click:

High CPU Usage

Triggers when CPU ≥ 80% for 10 minutes on any server.

High Memory Usage

Triggers when memory ≥ 90% for 10 minutes on any server.

High Disk Usage

Triggers when disk ≥ 90% for 15 minutes on any server.

High Load Average

Triggers when load-per-core ≥ 2.0 for 10 minutes on any server.

Server Rebooted

Triggers when uptime drops below 5 minutes (300 seconds) — detects unexpected reboots.

No Data Received

Triggers when no agent data is received for 15 minutes — detects agent crashes or network issues.

Monitor Down

Triggers immediately when any monitor check fails.

High CPU Steal

Triggers when CPU steal ≥ 10% for 5 minutes — signals hypervisor oversubscription.
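Written out as data, the templates above become ordinary policy definitions. This sketch restates each template's metric, operator, threshold, and duration from the descriptions; treating "immediately" as an Alert after of 0 minutes is an assumption, and the Server Rebooted duration is left as None because its description does not give one. The Template type and describe helper are hypothetical.

```python
from typing import NamedTuple, Optional


class Template(NamedTuple):
    name: str
    metric: str
    operator: Optional[str]         # None when the metric needs no threshold
    threshold: Optional[float]
    alert_after_min: Optional[int]  # None where the description gives no duration


TEMPLATES = [
    Template("High CPU Usage", "cpu_percent", "gte", 80, 10),
    Template("High Memory Usage", "memory_percent", "gte", 90, 10),
    Template("High Disk Usage", "disk_percent", "gte", 90, 15),
    Template("High Load Average", "load_per_core", "gte", 2.0, 10),
    Template("Server Rebooted", "uptime_seconds", "lt", 300, None),
    Template("No Data Received", "no_data", None, None, 15),
    Template("Monitor Down", "monitor_status", None, None, 0),  # "immediately"
    Template("High CPU Steal", "cpu_steal", "gte", 10, 5),
]

SYMBOLS = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<=", "eq": "="}


def describe(t: Template) -> str:
    """Render a template as a one-line condition summary."""
    if t.operator is None:
        condition = t.metric
    else:
        condition = f"{t.metric} {SYMBOLS[t.operator]} {t.threshold:g}"
    if t.alert_after_min is None:
        return condition
    if t.alert_after_min == 0:
        return f"{condition} (immediately)"
    return f"{condition} for {t.alert_after_min} min"


for t in TEMPLATES:
    print(f"{t.name}: {describe(t)}")
```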

Managing Alert Policies

Enable / Disable

Use the toggle switch on each policy row to temporarily disable a policy without deleting it. Disabled policies do not trigger incidents or send notifications. This is useful during planned maintenance windows.

Edit

Click the Edit (pencil) icon to update any field. Changes take effect immediately.

Delete

Click the Delete (trash) icon and confirm. Deleting a policy does not close any currently open incidents associated with it — those incidents will resolve naturally when the condition clears.

How Alert Triggering Works

  1. The edge server evaluates alert policies on each metric data point
  2. If the condition is met, a timer starts (Alert after countdown)
  3. If the condition remains met until the timer expires, an incident is opened
  4. Notifications are dispatched to all selected contacts
  5. The cooldown timer (Repeat alert after) starts; if the incident is still open when it expires, another notification is sent
  6. If the metric returns to normal before the Alert after timer expires, that timer resets and no incident is opened
  7. When the incident condition clears, the incident is resolved and recovery notifications are sent
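The steps above can be sketched as a small state machine that runs once per data point. This is a simplified model under stated assumptions, not Pinguzo's edge implementation: times are in minutes, a cooldown of 0 disables repeats, and the PolicyState, evaluate, and notify names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PolicyState:
    """Per-policy, per-resource state (hypothetical data model)."""
    breach_started: Optional[float] = None  # when the condition first became true
    incident_open: bool = False
    last_notified: Optional[float] = None


def evaluate(state: PolicyState, now: float, condition_met: bool,
             alert_after: float, repeat_after: float,
             notify: Callable[[str], None]) -> None:
    """Process one data point at time `now`, following steps 1 to 7."""
    if not condition_met:
        state.breach_started = None             # step 6: Alert after timer resets
        if state.incident_open:                 # step 7: resolve and send recovery
            state.incident_open = False
            state.last_notified = None
            notify("recovered")
        return
    if state.breach_started is None:
        state.breach_started = now              # step 2: start Alert after timer
    if not state.incident_open:
        if now - state.breach_started >= alert_after:  # step 3: open incident
            state.incident_open = True
            state.last_notified = now           # step 5: cooldown starts here
            notify("incident opened")           # step 4: notify contacts
    elif repeat_after > 0 and now - state.last_notified >= repeat_after:
        state.last_notified = now               # step 5: cooldown expired, repeat
        notify("still failing")


# Readings every 5 minutes; the condition holds until minute 45.
events = []
state = PolicyState()
for t in range(0, 50, 5):
    evaluate(state, t, t < 45, alert_after=10, repeat_after=30,
             notify=lambda msg, t=t: events.append((t, msg)))
print(events)  # [(10, 'incident opened'), (40, 'still failing'), (45, 'recovered')]
```

With Alert after set to 0, the first failing data point opens the incident immediately, which matches the behavior described for the Monitor Down template.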
At least one contact is required
A policy with no contacts selected will still open incidents, but no one will be notified. Always assign at least one contact. See Configure Contacts.

Next Steps