Monitors and Alerts

OaC automatically generates six defaults, four monitors and two SLOs (Service Level Objectives), in each of the three environments (PROD, TEST, DEV). You can also create custom monitors via OaC and test monitor alerts using the Datadog UI.

Default monitors and SLOs

The following table describes the default monitors and SLOs. All alerts are configurable in Datadog, and their names are prefixed with [OAC].

| Name | Type | Trigger | Description |
| P90 Latency | Default Monitor | >10% of requests slower than observability.p90_latency_seconds | Latency is the time it takes to service a request. The default latency monitor is named with the template [OAC] Monitor_<Environment>_<Workload>_<ApplicationName>_Latencyp90_Above_<Target>. It is based on the request/event count metric. |
| Error Rate | Default Monitor | >1% of requests return errors | The default error rate monitor is named with the template Monitor_<Environment>_<Workload>_<ApplicationName>_errorPercentage_Above_1. It is based on the error traces detected by Datadog. |
| CPU Usage | Default Monitor | >80% of CPU in use | The default CPU usage monitor is named with the template Monitor_<Environment>_<Workload>_<ApplicationName>_cpuPercentage_Above_80. |
| Memory Usage | Default Monitor | >80% of memory in use | The default memory usage monitor is named with the template Monitor_<Environment>_<Workload>_<ApplicationName>_memoryPercentage_Above_80. |
| Latency SLO | Default SLO | Latency below observability.p90_latency_seconds over 90% of the time, measured over the last 7 days | The SLO is named with the template SLO_<Environment>_<Workload>_<ApplicationName>_Latency_Below_<Target>s_90%_7D. It is based on the duration trace. |
| Availability SLO | Default SLO | Successful requests/events above the availability class target over the last 7 days | The SLO is named with the template SLO_<Environment>_<Workload>_<ApplicationName>_Availability_Above_<Target>%_7D. |
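
The naming templates in the table can be filled in mechanically, which may help when searching for a specific monitor in Datadog. The following is a minimal sketch, not part of OaC: the function name default_names and the sample values ("PROD", "web", "checkout") are hypothetical, and the exact way OaC renders <Target> (for example 1 versus 1.0) is an assumption, so targets are passed in as already formatted strings.

```python
def default_names(environment, workload, application,
                  latency_target, availability_target):
    """Fill the six naming templates from the table for one application.

    latency_target and availability_target are strings, inserted as-is,
    because the number formatting OaC uses for <Target> is not documented here.
    """
    scope = f"{environment}_{workload}_{application}"
    return {
        "p90_latency": f"[OAC] Monitor_{scope}_Latencyp90_Above_{latency_target}",
        "error_rate": f"Monitor_{scope}_errorPercentage_Above_1",
        "cpu_usage": f"Monitor_{scope}_cpuPercentage_Above_80",
        "memory_usage": f"Monitor_{scope}_memoryPercentage_Above_80",
        "latency_slo": f"SLO_{scope}_Latency_Below_{latency_target}s_90%_7D",
        "availability_slo": f"SLO_{scope}_Availability_Above_{availability_target}%_7D",
    }


# Hypothetical application: environment, workload, and name are placeholders.
names = default_names("PROD", "web", "checkout", "1", "99.0")
for key, name in names.items():
    print(f"{key}: {name}")
```

The templates are copied verbatim from the table, including the [OAC] prefix that only the latency template shows explicitly.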

Update default trigger values

To change the values for any of the default SLOs and Monitors, update the availability and p90_latency_seconds values in the observability section of the manifest file. For example:

observability:
  title: Reliability Bob
  team: Reliability
  availability: important  # critical: 99.9%, essential: 99.5%, important: 99.0%
  type: netty
  p90_latency_seconds: 1
  operation: trace.netty.request
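
To illustrate how the two manifest values translate into targets, the sketch below maps the availability class to the percentage given in the manifest comment and checks a sample request count against it. It is an illustration only: the function names resolve_targets and availability_met are hypothetical, the manifest is represented as a plain dictionary rather than parsed from YAML, and the assumption that availability is evaluated as successful requests divided by total requests is inferred from the table, not confirmed by OaC or Datadog.

```python
from decimal import Decimal

# Class-to-target mapping taken from the comment in the manifest example.
AVAILABILITY_TARGETS = {
    "critical": Decimal("99.9"),
    "essential": Decimal("99.5"),
    "important": Decimal("99.0"),
}


def resolve_targets(observability):
    """Return the targets implied by the observability section (hypothetical helper)."""
    availability_class = observability["availability"]
    if availability_class not in AVAILABILITY_TARGETS:
        raise ValueError(f"unknown availability class: {availability_class!r}")
    latency = Decimal(str(observability["p90_latency_seconds"]))
    if latency <= 0:
        raise ValueError("p90_latency_seconds must be positive")
    return {
        "availability_percent": AVAILABILITY_TARGETS[availability_class],
        "p90_latency_seconds": latency,
    }


def availability_met(successful, total, target_percent):
    """Check successful/total against the target, assuming a request-based ratio."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Compare successful/total >= target/100 without dividing.
    return Decimal(successful) * 100 >= target_percent * total


# Values from the manifest example above.
targets = resolve_targets({"availability": "important", "p90_latency_seconds": 1})
print(targets["availability_percent"])  # 99.0
print(availability_met(9900, 10000, targets["availability_percent"]))  # True
print(availability_met(9899, 10000, targets["availability_percent"]))  # False
```

Decimal is used so that a ratio sitting exactly on the target, such as 9,900 successes out of 10,000 against 99.0%, compares as met rather than failing on floating-point rounding.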