Health Monitoring in Cisco Prime Access Registrar

This appendix describes enhanced health monitoring in Cisco Prime Access Registrar (Prime Access Registrar) and lists the supported statistics.

Prime Access Registrar supports regular health monitoring for the RADIUS server. A new parameter, EnableHealthMonitoring, is introduced to support enhanced health monitoring for RADIUS and Diameter.

You can monitor the health of the Prime Access Registrar server using the following parameters:

  • CPU Utilization
  • Memory
  • Packet Buffer
  • Worker Threads count
  • Packet Rejects
  • Packet Drops
  • Packet Time Outs
  • Peer Connectivity

You can set threshold limits against which the individual health check parameters are monitored. The threshold limits are entered as percentages. You can also set the monitoring frequency.

Table F-1 lists and describes the configuration details of health monitoring counters.

 

Table F-1 Health Monitoring Counters

Fields
Description

EnableHealthMonitoring

Set to TRUE to enable health monitoring for RADIUS/Diameter in Prime Access Registrar.

CPUUtilizationWarningThreshold

Warning threshold for CPU utilization. If the CPU utilization hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

CPUUtilizationErrorThreshold

Error threshold for CPU utilization. If the CPU utilization drops below the error threshold value, an error trap is initiated.

MemoryWarningThreshold

Warning threshold for memory utilization. If the memory utilization hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

MemoryErrorThreshold

Error threshold for memory utilization. If the memory utilization drops below the error threshold value, an error trap is initiated.

PacketsInUseWarningThreshold

Warning threshold for packet buffer. If the packet buffer hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

PacketsInUseErrorThreshold

Error threshold for packet buffer. If the packet buffer drops below the error threshold value, an error trap is initiated.

WorkerThreadsWarningThreshold

Warning threshold for worker threads. If the worker thread count hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

WorkerThreadsErrorThreshold

Error threshold for worker threads. If the worker thread count drops below the error threshold value, an error trap is initiated.

PacketRejectsWarningThreshold

Warning threshold for packet rejects. If the packet reject count hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

PacketRejectsErrorThreshold

Error threshold for packet rejects. If the packet reject count drops below the error threshold value, an error trap is initiated.

PacketTimedOutsWarningThreshold

Warning threshold for packet timeouts. If the packet timeout count hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

PacketTimedOutsErrorThreshold

Error threshold for packet timeouts. If the packet timeout count drops below the error threshold value, an error trap is initiated.

PacketDropsWarningThreshold

Warning threshold for packet drops. If the packet drop count hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

PacketDropsErrorThreshold

Error threshold for packet drops. If the packet drop count drops below the error threshold value, an error trap is initiated.

PeerConnectivityWarningThreshold

Warning threshold for peer connectivity. If the peer connectivity count hits the warning threshold, the corresponding health is decremented and a warning trap is initiated.

PeerConnectivityErrorThreshold

Error threshold for peer connectivity. If the peer connectivity count drops below the error threshold value, an error trap is initiated.

HealthMonitorFreqInsecs

The frequency, in seconds, to monitor the health parameters.

Note: All the threshold parameters above are expressed as percentage values from 0 to 100. Set a value greater than zero only for the parameters that you want to monitor.


The following is a sample CLI that shows the health monitoring counters:

[ //localhost/Radius/Advanced/HealthMonitor ]
EnableHealthMonitoring = TRUE
CPUUtilizationWarningThreshold = 90
CPUUtilizationErrorThreshold = 0
MemoryWarningThreshold = 0
MemoryErrorThreshold = 0
PacketsInUseWarningThreshold = 0
PacketsInUseErrorThreshold = 0
WorkerThreadsWarningThreshold = 0
WorkerThreadsErrorThreshold = 0
PacketRejectsWarningThreshold = 0
PacketRejectsErrorThreshold = 0
PacketTimedOutsWarningThreshold = 0
PacketTimedOutsErrorThreshold = 0
PacketDropsWarningThreshold = 0
PacketDropsErrorThreshold = 0
HealthMonitorLogFreqInsecs = 0
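
To check which parameters such a listing actually enables, the "Name = Value" lines can be read into a dictionary and filtered for thresholds greater than zero. The following Python sketch is illustrative only: parse_health_config and monitored_thresholds are hypothetical helper names, not part of Prime Access Registrar, and the sample text is a shortened copy of the listing above.

```python
# Illustrative sketch only. parse_health_config and monitored_thresholds are
# hypothetical helpers, not Prime Access Registrar commands or APIs.
SAMPLE = """\
[ //localhost/Radius/Advanced/HealthMonitor ]
EnableHealthMonitoring = TRUE
CPUUtilizationWarningThreshold = 90
CPUUtilizationErrorThreshold = 0
MemoryWarningThreshold = 0
MemoryErrorThreshold = 0
"""


def parse_health_config(text):
    """Return {name: value} from 'Name = Value' lines, skipping the path header."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def monitored_thresholds(settings):
    """Return the thresholds set above zero, checking the 0 to 100 range."""
    active = {}
    for name, value in settings.items():
        if not name.endswith("Threshold"):
            continue
        percent = int(value)
        if not 0 <= percent <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {percent}")
        if percent > 0:
            active[name] = percent
    return active


settings = parse_health_config(SAMPLE)
print(monitored_thresholds(settings))
# prints {'CPUUtilizationWarningThreshold': 90}
```

With the sample values, only the CPU utilization warning threshold is active. Every other threshold is zero and is therefore not monitored.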

 

The status of each health monitoring parameter is displayed as one of the following in the statistics:

  • GOOD—If the parameter is within the limits.
  • REDUCING—If the parameter is hitting the warning threshold value.
  • CRITICAL—If the parameter is dropping below the error threshold value.
  • UNMONITORED—If the parameter is unmonitored (no threshold values are set for the parameter).

You can use the health command in the CLI to display the health statistics of all the parameters. You can use the status command to display the overall health status of Prime Access Registrar.

The following traps are triggered for each of the health monitoring parameters in Prime Access Registrar:

  • HealthMonitoringWarningTrap—Triggered when the parameter health hits the warning threshold limit.
  • HealthMonitoringErrorTrap—Triggered when the parameter health hits the error threshold limit.
  • HealthMonitoringResetTrap—Triggered to indicate that the parameter health, after reaching the configured error or warning threshold percentage limit, has fallen back below that limit. After this notification is sent, it is not sent again until the parameter health on the server again increases above the configured error or warning threshold percentage limit.
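
One way to picture the trap behavior is as edge-triggered notifications: a trap is raised when a parameter's status changes, not on every monitoring interval. The Python sketch below models that reading. It is an assumption, not the server's documented logic. In particular, it assumes that the reset trap is sent when a parameter returns to GOOD, and that a change from CRITICAL to REDUCING sends no trap. The function names are hypothetical.

```python
# Sketch under stated assumptions: traps are modeled as edge-triggered on
# status changes. This is an interpretation, not the server's actual logic.
def trap_for_transition(previous, current):
    """Return the trap name for a status change, or None if no trap applies."""
    if previous == current or "UNMONITORED" in (previous, current):
        return None
    if previous == "GOOD" and current == "REDUCING":
        return "HealthMonitoringWarningTrap"
    if current == "CRITICAL":
        return "HealthMonitoringErrorTrap"
    if current == "GOOD":
        # Assumption: recovery to GOOD is what raises the reset trap.
        return "HealthMonitoringResetTrap"
    return None  # Assumption: CRITICAL -> REDUCING raises no trap.


def traps_for_sequence(statuses, initial="GOOD"):
    """Return the traps raised while walking through successive statuses."""
    traps = []
    previous = initial
    for current in statuses:
        trap = trap_for_transition(previous, current)
        if trap is not None:
            traps.append(trap)
        previous = current
    return traps


observed = ["GOOD", "REDUCING", "REDUCING", "CRITICAL", "GOOD"]
print(traps_for_sequence(observed))
# prints ['HealthMonitoringWarningTrap', 'HealthMonitoringErrorTrap',
#         'HealthMonitoringResetTrap'] on a single line
```

The repeated REDUCING status produces no second warning trap, which matches the idea that a notification is not repeated until the condition changes.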

The following is an example of the health monitoring statistics:

--> health
 
Diameter Health Detailed Report:
 
CPU Utilization Health = GOOD
Memory Health = GOOD
Packet Buffer Health = GOOD
Worker Threads Health = GOOD
Packet Rejects = GOOD
Packet Drops = GOOD
Packets TimedOuts = GOOD
 
Radius Health Detailed Report:
 
CPU Utilization Health = GOOD
Memory Health = GOOD
Packet Buffer Health = GOOD
Worker Threads Health = GOOD
Packet Rejects = GOOD
Packet Drops = GOOD
Packets Timedouts = GOOD
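
For automated checks, a report in this format can be split into one dictionary per protocol. The following Python sketch assumes the output has already been captured as text. The helper names are hypothetical, and the REDUCING value in the sample is invented for illustration; it does not appear in the report above.

```python
# Illustrative sketch only. The helpers are hypothetical, and the REDUCING
# status in this sample text is invented to make the example non-trivial.
REPORT = """\
Diameter Health Detailed Report:

CPU Utilization Health = GOOD
Memory Health = GOOD
Packet Drops = GOOD

Radius Health Detailed Report:

CPU Utilization Health = GOOD
Memory Health = REDUCING
Packet Drops = GOOD
"""


def parse_health_report(text):
    """Return {protocol: {parameter: status}} from health command output."""
    report = {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("Health Detailed Report:"):
            section = line.split()[0]  # "Diameter" or "Radius"
            report[section] = {}
        elif "=" in line and section is not None:
            name, _, status = line.partition("=")
            report[section][name.strip()] = status.strip()
    return report


def needs_attention(report):
    """List (protocol, parameter, status) entries that are REDUCING or CRITICAL."""
    return [
        (protocol, name, status)
        for protocol, parameters in report.items()
        for name, status in parameters.items()
        if status in ("REDUCING", "CRITICAL")
    ]


parsed = parse_health_report(REPORT)
print(needs_attention(parsed))
# prints [('Radius', 'Memory Health', 'REDUCING')]
```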