How to View Incidents

Review all downtime events, alert triggers, and outage durations across your servers and monitors.

What is an Incident?

An incident is a recorded period during which a server or monitor was in a failure state. Pinguzo opens an incident when a monitor check fails, a server's agent stops reporting, or a server metric exceeds its configured threshold (see Trigger Types below).

An incident is automatically resolved when the condition clears: the check passes again, the server comes back online, or the metric drops below the threshold.

Cross-edge verification: Before opening a monitor incident, the detecting edge server sends a spot-check request to one or more peer edge servers, which independently run the same check. An incident is opened only if the peer(s) also confirm the failure. This prevents false positives from edge-specific network issues.
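The verification rule can be sketched in a few lines of Python. This is an illustration only, not Pinguzo's implementation: `should_open_incident` and the check callables are hypothetical names, and the sketch assumes that every consulted peer must confirm the failure and that no incident is opened when no peer is available.

```python
from typing import Callable, Iterable

# A check is any callable that returns True when the target looks healthy.
Check = Callable[[], bool]


def should_open_incident(local_check: Check, peer_checks: Iterable[Check]) -> bool:
    """Return True only if the detecting edge and all consulted peers see a failure."""
    if local_check():
        return False  # the local check passed, so there is nothing to verify
    peers = list(peer_checks)
    if not peers:
        return False  # assumption: without peer confirmation, no incident is opened
    # Every peer runs the same check independently; all of them must fail.
    return all(not peer() for peer in peers)
```

For example, `should_open_incident(lambda: False, [lambda: True])` returns `False`: the peer reached the target, so the failure is treated as specific to the detecting edge.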

Opening the Incidents Page

Click Incidents in the left sidebar. The page aggregates incidents from all of your edge servers and displays them in a unified timeline.

Filtering Incidents

Use the filter controls at the top of the page to narrow down the list:

Status Filter

| Option | Shows |
|---|---|
| All | Every incident, open and resolved |
| Open | Incidents that are still ongoing (the condition has not cleared) |
| Resolved | Incidents that have ended |

Type Filter

| Option | Shows |
|---|---|
| All Types | Server incidents and monitor incidents |
| Servers | Only incidents related to server metrics (offline, high CPU, etc.) |
| Monitors | Only incidents related to uptime checks (HTTP errors, ping failures, etc.) |
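The two filters combine: an incident is listed only if it passes both. The sketch below models that logic on an in-memory list. The `Incident` class and its field names are hypothetical and are not taken from any Pinguzo API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    resource: str
    resource_type: str           # "server" or "monitor" (hypothetical field)
    trigger: str
    recovered_at: Optional[str]  # None while the incident is still open


def filter_incidents(
    incidents: List[Incident], status: str = "all", resource_type: str = "all"
) -> List[Incident]:
    """Apply the Status filter and the Type filter together."""
    result = []
    for inc in incidents:
        is_open = inc.recovered_at is None
        if status == "open" and not is_open:
            continue
        if status == "resolved" and is_open:
            continue
        if resource_type != "all" and inc.resource_type != resource_type:
            continue
        result.append(inc)
    return result
```

Calling `filter_incidents(items, status="open", resource_type="monitor")` corresponds to selecting Open and Monitors in the page controls.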

Incident Table Columns

| Column | Description |
|---|---|
| Resource | The server or monitor that triggered the incident. Click the name to open the metrics detail page for that resource. |
| Trigger | The specific type of failure (see Trigger Types below). |
| Message | A human-readable description of what went wrong (e.g., "HTTP 503 Service Unavailable", "CPU usage 94.2% for 12 minutes"). |
| Went Down | Timestamp when the incident started (shown in your local timezone). |
| Recovered | Timestamp when the incident resolved. Shows "—" for open incidents. |
| Duration | Total downtime duration. For open incidents, this updates live while you view the page. |
| Status | Open or Resolved. |
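The Duration value follows from the two timestamps: a resolved incident spans Went Down to Recovered, and an open one spans Went Down to the current time, which is why it keeps growing. A minimal sketch, with hypothetical function names and an output format chosen for illustration rather than copied from the page:

```python
from datetime import datetime, timezone
from typing import Optional


def incident_duration_seconds(
    went_down: datetime, recovered: Optional[datetime], now: Optional[datetime] = None
) -> float:
    """Elapsed seconds; open incidents (recovered is None) are measured up to `now`."""
    if recovered is not None:
        end = recovered
    elif now is not None:
        end = now
    else:
        end = datetime.now(timezone.utc)
    return (end - went_down).total_seconds()


def format_duration(seconds: float) -> str:
    """Render a duration compactly, e.g. 750 seconds becomes '12m 30s'."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"
```

Passing timezone-aware datetimes throughout avoids errors from mixing naive and aware values when the timestamps are later shown in local time.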

Trigger Types

Monitor Triggers

| Trigger | Meaning |
|---|---|
| http_error | The HTTP request returned a non-2xx status code (e.g., 404, 500, 503) |
| https_error | The HTTPS request failed or returned a non-2xx status code |
| ssl_error | SSL certificate is invalid, expired, or the handshake failed |
| ping_failed | ICMP echo request received no reply; host is unreachable |
| port_unreachable | TCP connection to the target port was refused or timed out |
| keyword_mismatch | The page loaded (HTTP 200) but the expected keyword was not found in the response body |
| dns_failed | DNS resolution of the hostname failed |
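To show how the HTTP-related triggers relate to one another, here is a sketch that maps a completed response to a trigger name. `classify_http_result` is a hypothetical helper, not part of Pinguzo. It covers only responses that returned a status code, and it assumes the keyword check runs for any 2xx response, whereas the table mentions HTTP 200 specifically.

```python
from typing import Optional


def classify_http_result(
    scheme: str, status: int, body: str = "", keyword: Optional[str] = None
) -> Optional[str]:
    """Return a trigger name for a completed request, or None if the check passed."""
    if not 200 <= status < 300:
        # Non-2xx responses map to the trigger that matches the URL scheme.
        return "https_error" if scheme == "https" else "http_error"
    if keyword is not None and keyword not in body:
        # The page loaded, but the expected text is missing from the body.
        return "keyword_mismatch"
    return None
```

A 200 response whose body lacks the expected keyword still yields an incident, which is the case a status-code check alone would miss.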

Server Triggers

| Trigger | Meaning |
|---|---|
| offline | The Pinguzo Agent stopped reporting; the server may be down or unreachable |
| high_cpu | CPU usage exceeded the configured threshold for the specified duration |
| high_memory | Memory usage exceeded the configured threshold for the specified duration |
| high_disk | Disk usage exceeded the configured threshold for the specified duration |
| high_load | Load-per-core exceeded the configured threshold for the specified duration |
| no_data | No data received from the agent for longer than the configured no-data timeout |
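The metric triggers fire on a sustained breach ("exceeded the configured threshold for the specified duration"), not on a single spike. One way to express that condition is sketched below. The sample format and the function name are assumptions made for illustration; how Pinguzo actually evaluates the window is not documented here.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, metric value)


def breached_for_duration(
    samples: Sequence[Sample], threshold: float, duration_s: float
) -> bool:
    """True if the metric has stayed above `threshold` for at least `duration_s`,
    measured back from the newest sample. Samples must be in time order."""
    if not samples:
        return False
    newest_ts = samples[-1][0]
    breach_start: Optional[float] = None
    # Walk backwards from the newest sample to find where the current breach began.
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        breach_start = ts
    if breach_start is None:
        return False  # the newest sample is at or below the threshold
    return newest_ts - breach_start >= duration_s
```

A single reading at or below the threshold resets the window, so short spikes separated by normal readings do not accumulate toward the duration.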

Pagination

The incidents table shows 20 incidents per page. Use the Previous / Next buttons at the bottom to navigate. The total number of incidents matching your current filters is displayed above the table.
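With a fixed page size of 20, the number of pages and the contents of each page follow directly from the filtered total. A short sketch of that arithmetic, with hypothetical helper names:

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")
PAGE_SIZE = 20  # page size stated in the text above


def total_pages(total_items: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed; an empty result still renders one (empty) page."""
    return max(1, math.ceil(total_items / page_size))


def get_page(items: Sequence[T], page: int, page_size: int = PAGE_SIZE) -> List[T]:
    """Return the 1-indexed page of items; pages past the end yield an empty list."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return list(items[start:start + page_size])
```

With 45 matching incidents, `total_pages(45)` is 3 and the last page holds the remaining 5 rows.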

How Incidents Relate to Alerts

An incident is a recorded event. An alert is a policy that decides when to create an incident and who to notify. You can have incidents without alert policies (they still appear in the incidents list), but you will only receive notifications if you have a matching alert policy with contacts configured.
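The notification side of this distinction can be sketched as a lookup: the incident is listed either way, and contacts are collected only from policies that match it. The `AlertPolicy` shape below is hypothetical, and matching policies by trigger type is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlertPolicy:
    triggers: List[str]                                # trigger types this policy matches
    contacts: List[str] = field(default_factory=list)  # who to notify


def contacts_to_notify(trigger: str, policies: List[AlertPolicy]) -> List[str]:
    """Collect unique contacts from every policy matching the incident's trigger."""
    recipients: List[str] = []
    for policy in policies:
        if trigger not in policy.triggers:
            continue
        for contact in policy.contacts:
            if contact not in recipients:
                recipients.append(contact)
    return recipients
```

An empty result means the incident appears in the list but nobody is notified, either because no policy matches or because the matching policy has no contacts.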

See Configure Alerts to set up notification policies and Configure Contacts to add notification channels.

Incident Lifecycle

  1. Detection: An edge server detects a failure during a check cycle
  2. Verification: Peer edge(s) independently confirm the failure (monitor incidents only)
  3. Incident created: A new incident record is opened in the database
  4. Notifications sent: Matching alert policies trigger contact notifications (email, Slack, Discord, Telegram, webhook)
  5. Condition monitored: Subsequent checks continue; the incident remains open while the failure persists
  6. Recovery detected: The check passes (or metric drops below threshold, or agent reports again)
  7. Incident resolved: The incident is closed with a recovered_at timestamp
  8. Recovery notification sent: Contacts configured with recovery notifications are alerted
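The lifecycle above behaves like a small state machine driven by successive check results. The sketch below models it in memory. `IncidentRecord` and `apply_check` are hypothetical names, peer verification (step 2) is assumed to have happened before a failure is reported, and notifications are recorded as event strings, not actually sent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IncidentRecord:
    resource: str
    trigger: str
    went_down: datetime
    recovered_at: Optional[datetime] = None
    events: List[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        return "Open" if self.recovered_at is None else "Resolved"


def apply_check(
    open_incident: Optional[IncidentRecord],
    resource: str,
    failure: Optional[str],  # trigger name if the check failed, else None
    now: datetime,
) -> Optional[IncidentRecord]:
    """Advance the lifecycle by one check result and return the affected incident."""
    if failure is not None:
        if open_incident is None:
            # Steps 3-4: create the record and note that notifications go out.
            incident = IncidentRecord(resource, failure, went_down=now)
            incident.events += ["created", "notifications_sent"]
            return incident
        return open_incident  # step 5: failure persists, incident stays open
    if open_incident is not None and open_incident.recovered_at is None:
        # Steps 6-8: recovery detected, close the record, note the recovery notice.
        open_incident.recovered_at = now
        open_incident.events += ["resolved", "recovery_notification_sent"]
    return open_incident
```

Repeated failures do not create duplicate records: while an incident is open, further failing checks return the same object unchanged.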

Next Steps