How to Configure Alerts

Define policies that open incidents and send notifications when servers or monitors enter a failure state.

Overview

An Alert Policy is a rule that tells Pinguzo: "when this metric crosses this threshold for this long, open an incident and notify these contacts." You can create policies that apply to all of your servers or monitors, or target a specific resource.

Edge servers evaluate alert policies continuously, each time a new metric data point arrives. When a policy's condition has held for the configured duration, the edge opens an incident record and dispatches notifications to the selected contacts.

Opening Alert Policies

Navigate to Alerts in the left sidebar, then click the Alert Policies tab. A list of all your policies appears, showing whether each one is enabled or disabled.

Creating an Alert Policy

Click "Add Alert Policy"

The button is in the top-right of the Alert Policies tab. A form opens.

Name the policy

Enter a descriptive name so you can identify it at a glance (e.g., "High CPU on production servers" or "Payment API down").

Choose the metric

Select what to watch. The list changes based on whether you choose a server metric or a monitor metric (see Metric Reference below).

Set the condition and threshold

Choose a comparison operator and enter a numeric threshold value. The no_data and monitor_status metrics need neither.

Set timing options

Configure Alert after (how long the condition must persist before triggering) and Repeat alert after (cooldown between repeated notifications).

Choose scope

Apply the policy to all servers/monitors or to a specific one using the dropdown.

Select contacts

Choose one or more contacts to notify. You must have at least one contact configured — see Configure Contacts.

Save

Click Save Policy. The policy is active immediately.
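To make the form fields concrete, here is a minimal sketch of a policy as a data structure with a completeness check. The class name, field names, and scope encoding are illustrative assumptions, not Pinguzo's actual schema or API. Only the metric keys, operator keys, and the 1,440-minute limit come from this guide.

```python
from dataclasses import dataclass, field
from typing import Optional

# Metrics that, per this guide, need no operator or threshold.
THRESHOLDLESS_METRICS = {"no_data", "monitor_status"}
VALID_OPERATORS = {"gt", "gte", "lt", "lte", "eq"}
MAX_MINUTES = 1440  # 24 hours, the documented maximum for both timers


@dataclass
class AlertPolicy:
    """Hypothetical in-memory mirror of the Add Alert Policy form."""
    name: str
    metric: str
    operator: Optional[str] = None    # one of VALID_OPERATORS
    threshold: Optional[float] = None
    alert_after_min: int = 0          # "Alert after"
    repeat_after_min: int = 0         # "Repeat alert after"
    scope: str = "all_servers"        # assumed encoding of the Apply to field
    contacts: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable problems; an empty list means the form looks complete."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")
        if self.metric not in THRESHOLDLESS_METRICS:
            if self.operator not in VALID_OPERATORS:
                issues.append("operator is missing or unknown")
            if self.threshold is None:
                issues.append("threshold is missing")
        for label, value in (("alert_after_min", self.alert_after_min),
                             ("repeat_after_min", self.repeat_after_min)):
            if not 0 <= value <= MAX_MINUTES:
                issues.append(f"{label} must be between 0 and {MAX_MINUTES}")
        if not self.contacts:
            issues.append("no contacts selected")
        return issues


policy = AlertPolicy(
    name="High CPU on production servers",
    metric="cpu_percent",
    operator="gte",
    threshold=80,
    alert_after_min=10,
    repeat_after_min=60,
    contacts=["ops-email"],  # hypothetical contact name
)
print(policy.problems())
```

Checking a policy this way before saving it catches the common mistakes: a missing threshold, an out-of-range timer, or an empty contact list.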

Metric Reference

Server Metrics

Metric Key	Description	Unit	Typical Threshold
`cpu_percent`	Overall CPU usage	%	80%
`cpu_steal`	CPU steal time (VMs only)	%	10%
`memory_percent`	RAM usage	%	90%
`disk_percent`	Highest disk partition usage	%	90%
`load_per_core`	5-minute load average ÷ CPU cores	ratio	2.0
`uptime_seconds`	System uptime (used to detect reboots)	seconds	< 300 (consider rebooted)
`no_data`	No data received from agent (no threshold needed)	—	—
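Three of the server metrics are derived values rather than raw readings. The sketch below shows the arithmetic described in the table. The function names are illustrative, and the agent's real implementation may differ. On a Unix host, the inputs could come from `os.getloadavg()[1]` (the 5-minute load average) and `os.cpu_count()`.

```python
def load_per_core(load_avg_5min: float, cpu_cores: int) -> float:
    """5-minute load average divided by the number of CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return load_avg_5min / cpu_cores


def disk_percent(partition_usage: dict) -> float:
    """Highest usage across partitions; maps mount point -> percent used."""
    return max(partition_usage.values())


def looks_rebooted(uptime_seconds: float, threshold: float = 300) -> bool:
    """True when uptime is below the threshold (uptime_seconds with the lt operator)."""
    return uptime_seconds < threshold


print(load_per_core(8.0, 4))                     # a load of 8.0 on 4 cores
print(disk_percent({"/": 41.0, "/var": 93.5}))   # the fullest partition wins
print(looks_rebooted(120))                       # up for only 2 minutes
```

Because `load_per_core` is normalized by core count, the same threshold (for example 2.0) is meaningful on both a 2-core and a 64-core server.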

Monitor Metrics

Metric Key	Description
`monitor_status`	Monitor check failed (no threshold needed — triggers on any failure)

Condition Operators

Operator	Symbol	Meaning
`gt`	>	Metric is greater than the threshold
`gte`	≥	Metric is greater than or equal to the threshold
`lt`	<	Metric is less than the threshold
`lte`	≤	Metric is less than or equal to the threshold
`eq`	=	Metric equals the threshold exactly

Use ≥ for percentage thresholds

For CPU, memory, and disk alerts, use ≥ (gte) rather than > (gt). This way the alert fires when the metric reaches exactly the threshold value, not only when it exceeds it.
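The operator keys map directly onto ordinary comparisons. The following sketch uses Python's standard `operator` module to show the difference between gte and gt at the boundary. The `condition_met` helper is a hypothetical name used for illustration only.

```python
import operator

# Operator keys from the table above, mapped to standard comparison functions.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def condition_met(op_key: str, value: float, threshold: float) -> bool:
    """Return True when `value <op> threshold` holds."""
    return OPERATORS[op_key](value, threshold)


# A disk sitting at exactly 90% with a threshold of 90:
print(condition_met("gte", 90.0, 90.0))  # True: the alert condition is met
print(condition_met("gt", 90.0, 90.0))   # False: the metric must exceed 90
```

Note that eq compares for exact equality, so it is best suited to metrics that take discrete values rather than fractional percentages.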

Timing Options

Alert after (duration)

The number of minutes the condition must hold continuously before Pinguzo opens an incident. This prevents false alarms from brief spikes.

0 minutes: Alert fires immediately on first threshold breach (good for monitors)
5–15 minutes: Typical for CPU/memory alerts to avoid alerting on momentary spikes
Maximum: 1,440 minutes (24 hours)

Repeat alert after (cooldown)

The number of minutes to wait before sending another notification for the same ongoing incident. This prevents notification fatigue during long outages.

0 minutes: Never repeat (only notify once at incident start)
60 minutes: Send a reminder every hour while the incident is open (recommended)
Maximum: 1,440 minutes (24 hours)
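The cooldown rule can be sketched as a small function that lists when notifications would go out during one incident. This assumes reminders are spaced from the incident's first notification and stop once the incident ends. The function name and that timing model are assumptions for illustration, not a description of Pinguzo's internals.

```python
def notification_offsets(incident_duration_min: int, repeat_after_min: int) -> list:
    """Minutes after incident start at which a notification is sent.

    The first notification is always at offset 0. A repeat value of 0
    means "never repeat", so only that first notification is sent.
    """
    offsets = [0]
    if repeat_after_min > 0:
        offsets += list(range(repeat_after_min, incident_duration_min, repeat_after_min))
    return offsets


# A 150-minute outage with the recommended 60-minute cooldown:
print(notification_offsets(150, 60))
# The same outage with repeats turned off:
print(notification_offsets(150, 0))
```

With a 60-minute cooldown the contacts hear about the outage at the start and roughly every hour after that. With 0 they are notified once.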

Scope: Targeting Specific Resources

The Apply to field controls which resources the policy covers:

Option	Behavior
All servers	The policy applies to every server in your account. New servers added in the future are also covered automatically.
Specific server	The policy only applies to the selected server. Use this for server-specific thresholds (e.g., the database server needs a stricter disk alert).
All monitors	Applies to every monitor. New monitors are automatically covered.
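One way to picture scope matching is a check that runs whenever a data point arrives for a resource. In this sketch an "all" scope is modeled as a missing resource ID, which is why newly added servers or monitors are covered automatically. The `Scope` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    kind: str                          # "server" or "monitor"
    resource_id: Optional[str] = None  # None means "all servers" / "all monitors"


def applies_to(scope: Scope, resource_kind: str, resource_id: str) -> bool:
    """True when a policy with this scope covers the given resource."""
    if scope.kind != resource_kind:
        return False
    return scope.resource_id is None or scope.resource_id == resource_id


all_servers = Scope(kind="server")
db_only = Scope(kind="server", resource_id="db-01")  # hypothetical server ID

print(applies_to(all_servers, "server", "web-07"))  # covered, even if web-07 is new
print(applies_to(db_only, "server", "web-07"))      # not covered
print(applies_to(db_only, "server", "db-01"))       # covered
```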

Quick-Start Templates

When you click Add Alert Policy, Pinguzo offers quick-start templates to get common policies set up in one click:

🔥 High CPU Usage: Triggers when CPU ≥ 80% for 10 minutes on any server.
💾 High Memory Usage: Triggers when memory ≥ 90% for 10 minutes on any server.
💿 High Disk Usage: Triggers when disk ≥ 90% for 15 minutes on any server.
⚖️ High Load Average: Triggers when load-per-core ≥ 2.0 for 10 minutes on any server.
🔄 Server Rebooted: Triggers when uptime drops below 5 minutes (300 seconds), which detects unexpected reboots.
📡 No Data Received: Triggers when no agent data is received for 15 minutes, which detects agent crashes or network issues.
🔴 Monitor Down: Triggers immediately when any monitor check fails.
☁️ High CPU Steal: Triggers when CPU steal ≥ 10% for 5 minutes, which signals hypervisor oversubscription.
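For reference, the templates can be written out as data using the metric and operator keys from this guide. The dictionary layout and key names such as `alert_after_min` are assumptions for illustration. Where the guide does not state a value, the sketch uses `None` rather than guessing.

```python
# Quick-start templates expressed as plain data (hypothetical layout).
QUICK_START_TEMPLATES = [
    {"name": "High CPU Usage", "metric": "cpu_percent",
     "operator": "gte", "threshold": 80, "alert_after_min": 10},
    {"name": "High Memory Usage", "metric": "memory_percent",
     "operator": "gte", "threshold": 90, "alert_after_min": 10},
    {"name": "High Disk Usage", "metric": "disk_percent",
     "operator": "gte", "threshold": 90, "alert_after_min": 15},
    {"name": "High Load Average", "metric": "load_per_core",
     "operator": "gte", "threshold": 2.0, "alert_after_min": 10},
    # The guide gives no Alert after value for this template.
    {"name": "Server Rebooted", "metric": "uptime_seconds",
     "operator": "lt", "threshold": 300, "alert_after_min": None},
    # no_data and monitor_status take no operator or threshold.
    {"name": "No Data Received", "metric": "no_data",
     "operator": None, "threshold": None, "alert_after_min": 15},
    {"name": "Monitor Down", "metric": "monitor_status",
     "operator": None, "threshold": None, "alert_after_min": 0},
    {"name": "High CPU Steal", "metric": "cpu_steal",
     "operator": "gte", "threshold": 10, "alert_after_min": 5},
]


def template_for(metric: str) -> dict:
    """Look up the template that watches the given metric key."""
    return next(t for t in QUICK_START_TEMPLATES if t["metric"] == metric)


print(template_for("disk_percent")["alert_after_min"])
```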

Managing Alert Policies

Enable / Disable

Use the toggle switch on each policy row to temporarily disable a policy without deleting it. Disabled policies do not trigger incidents or send notifications. This is useful during planned maintenance windows.

Edit

Click the Edit (pencil) icon to update any field. Changes take effect immediately.

Delete

Click the Delete (trash) icon and confirm. Deleting a policy does not close any currently open incidents associated with it — those incidents will resolve naturally when the condition clears.

How Alert Triggering Works

The edge server evaluates alert policies on each metric data point.
If the condition is met, the Alert after timer starts.
If the metric returns to normal before that timer expires, the timer resets and no incident is opened.
If the condition remains met until the timer expires, an incident is opened.
Notifications are dispatched to all selected contacts.
The Repeat alert after cooldown starts; no further notifications are sent until it expires.
When the condition clears, the incident is resolved and recovery notifications are sent.
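The triggering sequence above can be sketched as a small state machine that processes one data point at a time. This is a simplified model under stated assumptions: time is a minute counter, the cooldown is measured from the last notification, and the names `PolicyState` and `evaluate` are hypothetical. It is not Pinguzo's actual edge implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyState:
    breach_started: Optional[float] = None  # minute the condition first became true
    incident_open: bool = False
    last_notified: Optional[float] = None   # minute of the most recent notification


def evaluate(state: PolicyState, now_min: float, condition_met: bool,
             alert_after_min: int, repeat_after_min: int) -> list:
    """Process one data point and return the events it causes."""
    events = []
    if condition_met:
        if state.breach_started is None:
            state.breach_started = now_min            # Alert after timer starts
        if not state.incident_open:
            if now_min - state.breach_started >= alert_after_min:
                state.incident_open = True            # timer expired: open incident
                state.last_notified = now_min         # cooldown starts here
                events += ["incident_opened", "notified"]
        elif repeat_after_min > 0 and now_min - state.last_notified >= repeat_after_min:
            state.last_notified = now_min             # cooldown expired: remind
            events.append("notified")
    else:
        if state.incident_open:
            events += ["incident_resolved", "recovery_notified"]
        # Condition cleared: reset the timer and any open incident.
        state.breach_started = None
        state.incident_open = False
        state.last_notified = None
    return events


state = PolicyState()
for minute, breached in [(0, True), (5, True), (10, True), (70, True), (75, False)]:
    print(minute, evaluate(state, minute, breached, alert_after_min=10, repeat_after_min=60))
```

In this run nothing happens at minutes 0 and 5, the incident opens at minute 10, a reminder goes out at minute 70, and the incident resolves at minute 75.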

At least one contact is required

A policy with no contacts selected will still open incidents, but no one will be notified. Always assign at least one contact. See Configure Contacts.

Next Steps

Configure Contacts — add email, Slack, Discord, Telegram, or webhook channels
View Incidents — see all incidents opened by your alert policies