PCF Application-Based Alerts

Feature Summary and Revision History

Summary Data

Table 1. Summary Data

Applicable Products or Functional Area: PCF

Applicable Platform(s): SMI

Feature Default Setting: Disabled – Configuration required to enable

Related Documentation: Not Applicable

Revision History

Table 2. Revision History
Revision Details     Release
First introduced.    2020.01.0

Feature Description

When the system detects an anomaly, it generates an alert notification. Alerts are driven by system statistics: you define an expression over those statistics, and the system triggers an alert when the expression evaluates to true.

How it Works

This section describes how this feature works.

The Common Execution Environment (CEE) uses the Prometheus Alert Manager for alerting operations. The CEE YANG model, accessible through CLI or API, allows you to view the active alerts, silenced alerts, and alert history. During application installation or upgrade, the system adds a set of preset alerting rules. Applications can also call the alert API directly to add or clear alerts. The CEE uses the standard Prometheus Alert Manager API (v2).

The Prometheus Alert Manager includes the following options:

  • Defining Alert Rules: This option defines the types of alerts that the Alert Manager should trigger. Use the Prometheus Query Language (PromQL) to define the alerts.

  • Defining Alert Routing: This option defines the action the Alert Manager should take after receiving the alerts. At present, outbound alerting is supported through the SNMP Trapper. The CEE also provides an Alert Logger for storing the generated alerts.
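
As an illustration of the direct API path mentioned above, the following sketch builds a request body in the shape that the Prometheus Alert Manager v2 API accepts on POST /api/v2/alerts. It is a minimal example under stated assumptions: the label names (alertname, severity), the summary text, and the helper name build_alert are illustrative, and the labels that the CEE actually expects from applications are not specified in this document. No network call is made.

```python
import json
from datetime import datetime, timedelta, timezone


def build_alert(name, severity, summary, starts_at, ends_at=None):
    """Build one alert object for an Alert Manager v2 request body (illustrative)."""
    alert = {
        # Label names here are assumptions, not a documented CEE contract.
        "labels": {"alertname": name, "severity": severity},
        "annotations": {"summary": summary},
        "startsAt": starts_at.isoformat(),
    }
    if ends_at is not None:
        # An endsAt that is not in the future marks the alert as resolved.
        alert["endsAt"] = ends_at.isoformat()
    return alert


start = datetime(2019, 11, 15, 8, 26, 48, tzinfo=timezone.utc)
firing = build_alert("PCFN7chk_incr", "major",
                     "N7 success ratio below threshold", start)
cleared = build_alert("PCFN7chk_incr", "major",
                      "N7 success ratio below threshold",
                      start, ends_at=start + timedelta(minutes=5))

# The API expects a JSON array of alert objects as the request body.
payload = json.dumps([firing, cleared], indent=2)
print(payload)
```

Adding an alert and clearing it use the same object shape. The only difference is the endsAt timestamp.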

Configuring Alert Rules

This section describes how to configure the alert rules.

To configure the alert rules, use the following configuration:

config 
   alerts rules group alert_group_name 
   rule rule_name 
      expression promql_expression 
      duration duration 
      severity severity_level 
      type alert_type 
      annotation annotation_name 
      value annotation_value 
      end 

NOTES:

  • alerts rules —Specify the Prometheus alerting rules.

  • group alert_group_name —Specify the Prometheus alerting rule group. One alert group can contain multiple rules. alert_group_name is the name of the alert group and must be a string in the range of 0–64 characters.

  • rule rule_name —Specify the alerting rule definition. rule_name is the name of the rule.

  • expression promql_expression —Specify the PromQL alerting rule expression. promql_expression is the alert rule query expressed in PromQL syntax. The promql_expression must be a string in the range of 0–64 characters.

  • duration duration —Specify how long the expression must remain true before the alert is triggered. duration is that time interval, for example 1m or 5m.

  • severity severity_level —Specify the severity of the alert. severity_level is the severity level of the alert. The severity levels are critical, major, minor, and warning.

  • type alert_type —Specify the type of the alert. alert_type is the user-defined alert type. For example, Communications Alarm, Environmental Alarm, Equipment Alarm, Indeterminate Integrity Violation Alarm, Operational Violation Alarm, Physical Violation Alarm, Processing Error Alarm, Quality of Service Alarm, Security Service Alarm, Mechanism Violation Alarm, or Time Domain Violation Alarm.

  • annotation annotation_name —Specify the annotation to attach to the alerts. annotation_name is the name of the annotation.

  • value annotation_value —Specify the annotation value. annotation_value is the value of the annotation.
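
The effect of the duration parameter can be pictured with a small simulation. This sketch assumes that duration behaves like the "for" clause of a Prometheus alerting rule, which this document does not state explicitly: the alert stays pending while the expression is true and fires only once it has been true continuously for the whole duration. The function name and the sample data are hypothetical.

```python
def first_firing_index(samples, duration_steps):
    """Return the index of the evaluation at which the alert fires, or None.

    samples: list of booleans, one per evaluation, True when the expression holds.
    duration_steps: duration divided by the evaluation interval.
    """
    pending_since = None
    for i, condition in enumerate(samples):
        if not condition:
            pending_since = None  # condition cleared, so the pending timer resets
            continue
        if pending_since is None:
            pending_since = i  # expression just became true: alert is pending
        if i - pending_since >= duration_steps:
            return i  # true for the full duration: alert fires
    return None


# One evaluation per minute with "duration 2m": the brief recovery at index 2
# resets the timer, so the alert fires only at index 5.
print(first_firing_index([True, True, False, True, True, True], 2))
```

A short-lived spike that clears before the duration elapses therefore never produces an alert.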

The following example configures an alert, which is triggered when the percentage of N7 responses is less than the specified threshold limit.

Example:

configure terminal
  alerts rules group PCFN7chk_incr
  interval-seconds 300
  rule PCFN7chk_incr
    expression "sum(increase(inbound_request_total{interface_name=\"N7\", result_code=~\"2..\"}[3m])) / sum(increase(inbound_request_total{interface_name=\"N7\"}[3m])) < 0.95"
    severity major
    type "N7 Communications Alarm"
    annotation summary
    value "This alert is fired when the percentage of N7 responses is less than threshold"
    exit
  exit
exit
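
Because the PromQL expression contains double quotes, it has to be escaped when it is entered as a quoted CLI string, as in the example above. The following sketch shows one way to generate such a block from plain values. The helper names render_rule and quote are hypothetical, and the sketch assumes that escaping embedded double quotes with a backslash is the only quoting the CLI requires.

```python
def quote(text):
    """Wrap text in double quotes, escaping embedded double quotes with a backslash."""
    return '"' + text.replace('"', '\\"') + '"'


def render_rule(group, rule, expression, severity, alert_type,
                annotation, value, interval_seconds=None):
    """Render one alert rule as CLI configuration lines (hypothetical helper)."""
    lines = ["alerts rules group " + group]
    if interval_seconds is not None:
        lines.append("   interval-seconds %d" % interval_seconds)
    lines += [
        "   rule " + rule,
        "   expression " + quote(expression),
        "   severity " + severity,
        "   type " + quote(alert_type),
        "   annotation " + annotation,
        "   value " + quote(value),
        "   exit",
        "exit",
    ]
    return "\n".join(lines)


expr = ('sum(increase(inbound_request_total{interface_name="N7", result_code=~"2.."}[3m])) / '
        'sum(increase(inbound_request_total{interface_name="N7"}[3m])) < 0.95')
print(render_rule("PCFN7chk_incr", "PCFN7chk_incr", expr, "major",
                  "N7 Communications Alarm", "summary",
                  "This alert is fired when the percentage of N7 responses is less than threshold",
                  interval_seconds=300))
```

Writing the expression unescaped and quoting it in one place avoids the mismatched or typographic quotation marks that are easy to introduce by hand.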

Viewing Alert Logger

The Alert Logger stores the alerts that PCF generates by default. You can view these alerts using the following command:

show alerts history [ filtering ]

You can narrow down the result using the following filtering options:

  • annotations—Specify the annotations of the alert.

  • endsAt—Specify the end time of the alert.

  • labels—Specify the additional labels of the alert.

  • severity—Specify the severity of the alert.

  • source—Specify the source of the alert.

  • startsAt—Specify the start time of the alert.

  • type—Specify the type of the alert.

You can view the active and silenced alerts with the show alerts active and show alerts silenced commands.

Example:

show running-config alerts
  interval-seconds 300
  rule PCFN7chk_incr
    expression "sum(increase(inbound_request_total{interface_name=\"N7\", result_code=~\"2..\"}[3m])) / sum(increase(inbound_request_total{interface_name=\"N7\"}[3m])) < 0.95"
    severity major
    type "N7 Communications Alarm"
    annotation summary
    value "This alert is fired when the percentage of N7 responses is less than threshold"
    exit
  exit
exit

The following example displays the history of the alerts configured in the system:

Example:
show alerts history
alerts active PCFN7chk_incr ac2a970ab621
state active
severity major
type "N7 Communications Alarm"
startsAt 2019-11-15T08:26:48.283Z
source System
annotations [ "summary:This alert is fired when the percentage of N7 responses is less than threshold." ]

The following example displays the active alerts. The alerts remain active as long as the evaluated expression is true.

Example:

show alerts active
alerts active PCFN7chk_incr ac2a970ab621
state active
severity major
type "N7 Communications Alarm"
startsAt 2019-11-15T08:26:48.283Z
source System
annotations [ "summary:This alert is fired when the percentage of N7 responses is less than threshold." ]
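
When the alert output has to be processed by a script, the line-oriented format shown above is straightforward to parse. The following sketch is an illustrative parser, not part of the product: it assumes the output has exactly the shape shown in the examples (a header line that starts with "alerts", followed by one "key value" pair per line), and the function name parse_alerts is hypothetical.

```python
def parse_alerts(text):
    """Parse 'show alerts' style output into a list of dicts (illustrative parser)."""
    alerts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, rest = line.partition(" ")
        if key == "alerts":
            # Header line: alerts <state> <rule name> <alert id>
            _state, name, alert_id = rest.split()
            alerts.append({"name": name, "id": alert_id})
        elif alerts:
            # Field line: store the value, dropping surrounding double quotes.
            alerts[-1][key] = rest.strip().strip('"')
    return alerts


sample = """\
show alerts active
alerts active PCFN7chk_incr ac2a970ab621
state active
severity major
type "N7 Communications Alarm"
startsAt 2019-11-15T08:26:48.283Z
source System
"""

parsed = parse_alerts(sample)
major = [a["name"] for a in parsed if a["severity"] == "major"]
print(parsed)
print(major)
```

The list comprehension at the end mirrors the severity filtering option, applied on the client side.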

Sample Alerts Configuration

This section provides sample configurations that are defined in PCF.

Interface-Specific Alerts

N7 Interface Inbound

Use the following commands to configure alerts related to an inbound N7 interface.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule PCFN7Inbound 
   expression "sum(increase(inbound_request_total{interface_name=\"N7\", result_code=~\"2..\"}[5m])) / sum(increase(inbound_request_total{interface_name=\"N7\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N7 responses sent is less than the threshold."
   exit 
exit 
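
The interface and message-level rules in this section all follow the same pattern: the increase in successful responses over the window is divided by the increase in all responses, and the alert condition holds when that ratio falls below the threshold. The arithmetic can be sketched as follows. The function names and the counter values are hypothetical and only illustrate the comparison.

```python
def success_ratio(success_increase, total_increase):
    """Ratio of successful responses to all responses over the window."""
    if total_increase == 0:
        # With no traffic there is no meaningful ratio. In PromQL, 0/0 is NaN and
        # NaN < threshold is false, so the rule does not fire in this case either.
        return None
    return success_increase / total_increase


def should_alert(success_increase, total_increase, threshold=0.90):
    """Return True when the success ratio is strictly below the threshold."""
    ratio = success_ratio(success_increase, total_increase)
    return ratio is not None and ratio < threshold


print(should_alert(850, 1000))   # 850 of 1000 succeeded: ratio 0.85
print(should_alert(950, 1000))   # 950 of 1000 succeeded: ratio 0.95
print(should_alert(0, 0))        # no traffic in the window
```

Because the comparison is strict, a ratio exactly equal to the threshold does not satisfy the alert condition.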

N7 Interface Outbound

Use the following commands to configure alerts related to an outbound N7 interface.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule PCFN7outbound 
   expression "sum(increase(outgoing_request_total{interface_name=\"N7\", response_status=~\"2..\"}[5m])) / sum(increase(outgoing_request_total{interface_name=\"N7\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N7 responses received is less than the threshold."
   exit 
exit 

N28 Interface Inbound

Use the following commands to configure alerts related to an inbound N28 interface.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule PCFN28Inbound 
   expression "sum(increase(inbound_request_total{interface_name=\"N28\", response_status=~\"2..\"}[5m])) / sum(increase(inbound_request_total{interface_name=\"N28\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N28 responses sent is less than the threshold."
   exit 
exit 

N28 Interface Outbound

Use the following commands to configure alerts related to an outbound N28 interface.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule PCFN28outbound 
   expression "sum(increase(outgoing_request_total{interface_name=\"N28\", response_status=~\"2..\"}[5m])) / sum(increase(outgoing_request_total{interface_name=\"N28\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N28 responses received is less than the threshold."
   exit 
exit 

Diameter Rx Interface Inbound

Use the following commands to configure alerts related to an inbound Diameter Rx interface.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule PCFNRxInbound 
   expression "sum(increase(diameter_responses_total{command_code=~\"AAA|STA\", response_status=~\"2001\"}[5m])) / sum(increase(diameter_responses_total{command_code=~\"AAA|STA\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful Rx responses sent is less than the threshold."
   exit 
exit 

Diameter Rx Interface Outbound

Use the following commands to configure alerts related to an outbound Diameter Rx interface.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule PCFNRxOutbound 
   expression "sum(increase(diameter_responses_total{command_code=~\"RAA|ASA\", response_status=~\"2001\"}[5m])) / sum(increase(diameter_responses_total{command_code=~\"RAA|ASA\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful Rx responses received is less than the threshold."
   exit 
exit 

Message-Level Alerts

N7 Create Request

Use the following commands to configure alerts related to N7 Create Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule N7CreateRequest 
   expression "sum(increase(inbound_request_total{interface_name=\"N7\", command=\"Create\", result_code=~\"2..\"}[5m])) / sum(increase(inbound_request_total{interface_name=\"N7\", command=\"Create\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N7 Create responses sent is less than the threshold."
   exit 
exit 

N7 Update Request

Use the following commands to configure alerts related to N7 Update Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule N7UpdateRequest 
   expression "sum(increase(inbound_request_total{interface_name=\"N7\", command=\"Update\", result_code=~\"2..\"}[5m])) / sum(increase(inbound_request_total{interface_name=\"N7\", command=\"Update\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N7 Update responses sent is less than the threshold."
   exit 
exit 

N7 Delete Request

Use the following commands to configure alerts related to N7 Delete Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule N7DeleteRequest 
   expression "sum(increase(inbound_request_total{interface_name=\"N7\", command=\"Delete\", result_code=~\"2..\"}[5m])) / sum(increase(inbound_request_total{interface_name=\"N7\", command=\"Delete\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N7 Delete responses sent is less than the threshold."
   exit 
exit 

N7 Notify Request

Use the following commands to configure alerts related to N7 Notify Request.

alerts rules group PCFSvcStatus 
   interval-seconds 60 
   rule N7NotifyUpdate 
   expression "sum(increase(outgoing_request_total{interface_name=\"N7\", command=\"Notify\", response_status=~\"2..\"}[5m])) / sum(increase(outgoing_request_total{interface_name=\"N7\", command=\"Notify\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N7 Notify responses received is less than the threshold."
   exit 
exit 

N28 Subscribe (Initial) Request

Use the following commands to configure alerts related to N28 Subscribe Request.

alerts rules group PCFSvcStatus 
   interval-seconds 60 
   rule N28Subscribe 
   expression "sum(increase(outgoing_request_total{interface_name=\"N28\", command=\"Subscribe\", response_status=~\"2..\"}[5m])) / sum(increase(outgoing_request_total{interface_name=\"N28\", command=\"Subscribe\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N28 Subscribe (Initial) responses received is less than the threshold."
   exit 
exit 

N28 Subscribe (Update) Request

Use the following commands to configure alerts related to N28 Subscribe Update Request.

alerts rules group PCFSvcStatus 
   interval-seconds 60 
   rule N28SubscribeUpdate 
   expression "sum(increase(outgoing_request_total{interface_name=\"N28\", command=\"Subscribe_Update\", response_status=~\"2..\"}[5m])) / sum(increase(outgoing_request_total{interface_name=\"N28\", command=\"Subscribe_Update\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N28 Subscribe (Update) responses received is less than the threshold."
   exit 
exit 

N28 Notify Request

Use the following commands to configure alerts related to N28 Notify Request.

alerts rules group PCFSvcStatus 
   interval-seconds 60 
   rule N28Notify 
   expression "sum(increase(outgoing_request_total{interface_name=\"N28\", command=\"Notify\", response_status=~\"2..\"}[5m])) / sum(increase(outgoing_request_total{interface_name=\"N28\", command=\"Notify\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful N28 Notify responses sent is less than the threshold."
   exit 
exit 

Rx AAR Request

Use the following commands to configure alerts related to Rx AAR Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule RxAAR 
   expression "sum(increase(diameter_responses_total{command_code=\"AAA\", response_status=~\"2001\"}[5m])) / sum(increase(diameter_responses_total{command_code=\"AAA\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful Rx AAR responses sent is less than the threshold."
   exit 
exit 

Rx STR Request

Use the following commands to configure alerts related to Rx STR Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule RxSTR 
   expression "sum(increase(diameter_responses_total{command_code=\"STA\", response_status=~\"2001\"}[5m])) / sum(increase(diameter_responses_total{command_code=\"STA\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful Rx STA responses sent is less than the threshold."
   exit 
exit 

Rx RAR Request

Use the following commands to configure alerts related to Rx RAR Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule RxRAR 
   expression "sum(increase(diameter_responses_total{command_code=\"RAA\", response_status=~\"2001\"}[5m])) / sum(increase(diameter_responses_total{command_code=\"RAA\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful Rx RAR responses received is less than the threshold."
   exit 
exit 

Rx ASR Request

Use the following commands to configure alerts related to Rx ASR Request.

alerts rules group PCFSvcStatus 
   interval-seconds 300 
   rule RxASR 
   expression "sum(increase(diameter_responses_total{command_code=\"ASA\", response_status=~\"2001\"}[5m])) / sum(increase(diameter_responses_total{command_code=\"ASA\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the percentage of successful Rx ASR responses received is less than the threshold."
   exit 
exit 

Process-Level Alerts

CDL Endpoint Down

Use the following commands to configure alerts related to CDL endpoint down.

alerts rules group cdl-ep-change 
   rule pod-down 
   expression "up{pod=~'cdl-ep.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "CDL EP Pod Down"
   exit 
exit 

CDL Slot State Change

Use the following commands to configure alerts related to CDL slot state change.

alerts rules group cdl-slot-change 
   rule pod-down 
   expression "up{pod=\"cdl-slot-session-c1-m1-0\"} == 0"
   severity major
   type "Equipment Alarm"
   annotation description
   value "CDL Pod Slot Change"
   exit 
exit 

Diameter Endpoint State Change

Use the following commands to configure alerts related to Diameter endpoint state change.

alerts rules group diameter-ep-change 
   rule pod-down 
   expression "up{pod=~'diameter-ep.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "Diameter EP Change"
   exit 
exit 

ETCD State Change

Use the following commands to configure alerts related to etcd state change.

alerts rules group ep-mapping-change 
   rule pod-down 
   expression "up{pod=~'etcd-pcf.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "EP Mapping Change"
   exit 
exit 

Grafana Dashboard State Change

Use the following commands to configure alerts related to Grafana dashboard state change.

alerts rules group grafana-dashboard-change 
   rule pod-down 
   expression "up{pod=~'grafana-dashboard.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "Grafana Dashboard Change"
   exit 
exit 

Kafka State Change

Use the following commands to configure alerts related to Kafka state change.

alerts rules group kafka-change 
   rule pod-down 
   expression "up{pod=~'kafka.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "Kafka Changed"
   exit 
exit 

LDAP Endpoint State Change

Use the following commands to configure alerts related to LDAP endpoint state change.

alerts rules group ldap-change 
   rule pod-down 
   expression "up{pod=~'ldap-pcf.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "LDAP Pod Changed"
   exit 
exit 

PCF Engine State Change

Use the following commands to configure alerts related to PCF Engine state change.

alerts rules group pcf-engine-change 
   rule pod-down 
   expression "up{pod=~'pcf-engine-pcf.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "PCF Engine Changed"
   exit 
exit 

REST Endpoint State Change

Use the following commands to configure alerts related to REST endpoint state change.

alerts rules group pcf-rest-ep-change 
   rule pod-down 
   expression "up{pod=~'pcf-rest-ep.*'} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation description
   value "PCF Rest EP Change"
   exit 
exit 

Call Flow Procedure Alerts

LDAP Query Request

Use the following commands to configure alerts related to LDAP Query Request.

alerts rules group PCFProcStatus 
   interval-seconds 300 
   rule LDAPQuery 
   expression "sum(increase(message_total{type=~\".*_ldap_query\", status=\"success\"}[5m])) / sum(increase(message_total{type=~\".*_ldap_query\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the success percentage of LDAP query requests is less than the threshold."
   exit 
exit 

LDAP Modify Request

Use the following commands to configure alerts related to LDAP Modify Request.

alerts rules group PCFProcStatus 
   interval-seconds 300 
   rule LDAPModify 
   expression "sum(increase(message_total{component=\"ldap-ep\", type=~\".*_ldap_modify\", status=\"success\"}[5m])) / sum(increase(message_total{component=\"ldap-ep\", type=~\".*_ldap_modify\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the success percentage of LDAP modify requests is less than the threshold."
   exit 
exit 

PLF Query Request

Use the following commands to configure alerts related to PLF Query Request.

alerts rules group PCFProcStatus 
   interval-seconds 300 
   rule PLFRequest 
   expression "sum(increase(message_total{type=~\"ldap_search-res_success\", status=\"success\"}[5m])) / sum(increase(message_total{type=~\"ldap_search-res_.*\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the success percentage of PLF requests is less than the threshold."
   exit 
exit 

NAP Notification Request

Use the following commands to configure alerts related to NAP Notification Request.

alerts rules group PCFProcStatus 
   interval-seconds 300 
   rule NAPNotification 
   expression "sum(increase(message_total{type=~\"ldap_change-res_success\", status=\"success\"}[5m])) / sum(increase(message_total{type=~\"ldap_change-res_.*\"}[5m])) < 0.90"
   severity major
   type "Communications Alarm"
   annotation summary
   value "This alert is fired when the success percentage of NAP requests is less than the threshold."
   exit 
exit 

System Alerts

Disk Full Alert

Use the following commands to configure alerts related to disk full alert.

alerts rules group 
   rule node-disk-running-full 
   expression "node_filesystem_usage > 0.0001"
   duration 5m
   severity critical
   type "Processing Error Alarm"
   annotation disk_full 
   value test 
   exit 
exit 

VM Down Alert

Use the following commands to configure alerts related to virtual machine down alert.

alerts rules group vm-state-change 
   rule vm-down 
   expression "up{pod=~\"node-expo.*\"} == 0"
   duration 1m
   severity major
   type "Equipment Alarm"
   annotation summary
   value "VM Down"
   exit 
exit 

High Memory Usage

Use the following commands to configure alerts related to high memory usage.

alerts rules group memory-util-high 
   rule mem-util-high 
   expression "avg(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100) by (hostname) < 20"
   duration 1m
   severity critical
   type "Processing Error Alarm"
   annotation mem_util_high
   value "High Memory Usage"
   exit 
exit 

High Disk Usage

Use the following commands to configure alerts related to high disk usage alert.

alerts rules group disk-util-high 
   rule disk-util-high 
   expression "avg(node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"} * 100) by (hostname) < 20"
   duration 1m
   severity critical
   type "Processing Error Alarm"
   annotation description
   value "High Disk Usage"
   exit 
exit 

High CPU Usage

Use the following commands to configure alerts related to high CPU usage alert.

alerts rules group cpu-util-high 
   rule cpu-util-idle 
   duration 1m 
   expression "avg(rate(node_cpu_seconds_total{mode='idle'}[1m])) by (hostname) * 100 < 50"
   severity critical
   type "Processing Error Alarm"
   annotation description
   value "High CPU"
   exit 
exit