Platform Automated Monitoring
Feature Name | Release Information | Feature Description |
---|---|---|
Support For Platform Automated Monitoring | Cisco IOS XE Dublin 17.12.1y | With this release, cBR-8 supports Platform Automated Monitoring (PAM), a system monitoring tool that is integrated with the Cisco IOS XE Software image to monitor system crashes. PAM is an IOSd process that runs on the Supervisor Card (SUP) and periodically checks the system for crashes. When an RP, FP, or CC crashinfo or corefile is detected, a syslog message is displayed on the active SUP’s IOSd console. With PAM, you can use a script (for example, EEM) to monitor PAM messages and, when a crash event is detected, automatically submit a TAC case and share the core or crashinfo file with TAC. |
PAM Process
PAM is an IOSd process that runs on the Supervisor Card (SUP) and periodically checks the system for crashes. Use the show process | in PAM command to check whether the PAM process is running:
router#show process | in PAM
314 Mwe 633F12E936BA 142300 563398 252 15808/24000 0 CBR PAM Process
The preceding sample output shows the PAM process running on a cBR-8.
A hidden *.pam file is created in the /harddisk/core/ path. This is an empty file that records the last monitored timestamp of the PAM process. PAM processes only the corefile or crashinfo files whose timestamp is newer than the *.pam file timestamp.
Use the following command to view the *.pam file:
router#dir harddisk:core
Directory of harddisk:/core/
4751365 -rw- 1 Feb 20 2024 13:56:27 +08:00 .pam
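The timestamp comparison described above can be illustrated off-box with a short script. This is a minimal sketch, not how PAM itself is implemented: it assumes the `dir` output has been captured as text, that each entry follows the nine-field layout seen in the samples in this section, and the helper names `parse_dir_line` and `newer_than_marker` are hypothetical.

```python
from datetime import datetime


def parse_dir_line(line):
    """Parse one file entry of `dir` output into (name, size, mtime), or None."""
    parts = line.split()
    # Assumed layout: index, permissions, size, month, day, year, time, UTC offset, name
    if len(parts) != 9:
        return None
    try:
        size = int(parts[2])
        mtime = datetime.strptime(" ".join(parts[3:8]), "%b %d %Y %H:%M:%S %z")
    except ValueError:
        return None  # header lines, prompts, or syslog text
    return parts[8], size, mtime


def newer_than_marker(dir_output, marker=".pam"):
    """Return names of entries whose timestamp is newer than the marker file's."""
    entries = [e for e in map(parse_dir_line, dir_output.splitlines()) if e]
    marker_times = [mtime for name, _, mtime in entries if name == marker]
    if not marker_times:
        return []  # no marker in the listing, so nothing to compare against
    return [name for name, _, mtime in entries
            if name != marker and mtime > marker_times[0]]
```

Passing the captured text of `dir harddisk:core` to `newer_than_marker` returns the entries that are newer than the `.pam` file, which are the candidates PAM would treat as new.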
The PAM process handles two timers:
- 5-Minute Periodic Timer: PAM starts a 5-minute periodic timer to check for new crashinfo files and corefiles on both the active and standby SUP. The following messages can be displayed on the SUP’s IOSd console:
  - Initial message:
    %PAM-4-TEMP_CORE: PAM detects a new core file %s start to dump at %-27s. Need to wait for several minutes to get the full core file.
  - Message for a core file that is dumped successfully:
    %PAM-3-CRASH: PAM detects crash <crashinfo or corefile path>
  - Message for a core file that is not dumped completely:
    %PAM-3-CORE_UNCOMPLETE: PAM detects core file <uncomplete core file path> doesn't generate successfully.
  Here is a sample output that is displayed on the console:
  router#dir harddisk:core
  Directory of harddisk:/core/
  2981892 -rw- 11010048 Feb 20 2024 23:23:22 +08:00 router_SIP_1_vidman_7014_1704986383.core.gz.TEMP_IN_PROGRESS
  3080199 -rw- 1 Feb 20 2024 23:19:51 +08:00 .pam
  2981891 -rw- 7884800 Feb 20 2024 23:19:46 +08:00 router_SIP_1_vidman_7014_1704986383.core
  2981889 -rw- 0 Feb 20 2024 23:19:46 +08:00 router_SIP_1_vidman%cc_1_0%0.TEMP_IN_PROGRESS
  Feb 20 23:19:51.179 CST: %PAM-4-TEMP_CORE: PAM detects a new core file harddisk:core/router_SIP_1_vidman_7014_1704986383.core start to dump at Feb 20 2024 23:19:47 +08:00. Need to wait for several minutes to get the full core file.
  Feb 20 23:29:51.397 CST: %PAM-3-CRASH: PAM detects crash for process vidman on fru CC slot 1, path: harddisk:core/router_SIP_1_vidman_7014_1704986383.core.gz
  router#
- 30-Minute One-Time Timer: This timer starts when the standby SUP initializes with a bootup image. If the boot fails and the timer expires, the following error message about the standby SUP bootup failure is displayed:
  %PAM-3-FAILURE: StandbySUP stucks at booting state for 30 minutes.
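A script that reacts to these messages first has to recognize them in the syslog stream. The following Python sketch shows one way to do that. It is illustrative only: the function name `parse_pam_syslog` is hypothetical, and the pattern for the process, FRU, slot, and path fields is an assumption based on the single %PAM-3-CRASH sample line in this section, so other crash types may be worded differently.

```python
import re

# Generic Cisco syslog shape: %FACILITY-SEVERITY-MNEMONIC: text
PAM_MSG = re.compile(r"%PAM-(?P<severity>\d)-(?P<mnemonic>[A-Z_]+): (?P<text>.*)")

# Assumed detail layout, taken from the sample %PAM-3-CRASH line only
CRASH_DETAIL = re.compile(
    r"crash for process (?P<process>\S+) on fru (?P<fru>\S+) "
    r"slot (?P<slot>\d+), path: (?P<path>\S+)"
)


def parse_pam_syslog(line):
    """Return a dict describing a PAM syslog line, or None if it is not one."""
    match = PAM_MSG.search(line)
    if match is None:
        return None
    event = {
        "severity": int(match["severity"]),
        "mnemonic": match["mnemonic"],
        "text": match["text"],
    }
    detail = CRASH_DETAIL.search(match["text"])
    if detail:
        # Adds process, fru, slot, and path as strings when the wording matches
        event.update(detail.groupdict())
    return event
```

A caller could, for example, act only on events whose `mnemonic` is `CRASH` and use the extracted `path` to collect the file for TAC.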
Location and Format of the Crashinfo Or Corefile
The following tables show the location and format of crashinfo files and corefiles, with examples:
Type | Location and Format With Example |
---|---|
cdman crashinfo | harddisk:<hostname>_SIP_<slot>_cdman_crashinfo_xxx.log Example: |
iosd-clc crashinfo | harddisk:Slot-<slot>-0_crashinfo_SIP_<slot>_xxx.log Example: |
sup-iosd crashinfo | bootflash:<hostname>_crashinfo_RP_<slot>_xxx Example: |
Type | Location and Format With Example |
---|---|
Linecard process core | harddisk:core/<hostname>_SIP_<slot>_<process_name>_<pid>_xxx.core.gz Examples: |
RP process core | harddisk:core/<hostname>_<process_name>_<pid>_xxx.core.gz Examples: Sample Console Messages: |
Incomplete core file | harddisk:core/<hostname>_<process_name>_<pid>_xxx.core Examples: Sample Console Messages: |
Type | Location and Format With Example |
---|---|
Kernel Core | harddisk:core/kernel.CC_CYLONS_<slot>_<timestamp>.core.flat.gz or harddisk:core/kernel.RP_CBR_<slot>_<timestamp>.core.flat.gz Examples: Sample Console Messages: |
StandbySUP Crash or Core | Location and Format With Example |
---|---|
Kernel Core | stby-harddisk:core/<hostname>_<process_name>_<pid>_xxx.core.gz Examples: Sample Console Messages: |
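The naming formats in the preceding tables can be turned into patterns for sorting collected files by type. The sketch below is an illustration under stated assumptions: the regular expressions are derived only from the templates above, the `xxx` and `<timestamp>` portions are treated as arbitrary text, slots and PIDs are assumed to be numeric, and the names `PATTERNS` and `classify_crash_file` are hypothetical.

```python
import re

# Order matters: more specific formats are checked before more general ones.
PATTERNS = [
    ("kernel core",
     r"^harddisk:core/kernel\.(?:CC_CYLONS|RP_CBR)_\d+_.+\.core\.flat\.gz$"),
    ("standby SUP core", r"^stby-harddisk:core/.+\.core\.gz$"),
    ("linecard process core", r"^harddisk:core/.+_SIP_\d+_.+_\d+_.+\.core\.gz$"),
    ("RP process core", r"^harddisk:core/.+_.+_\d+_.+\.core\.gz$"),
    ("incomplete core file", r"^harddisk:core/.+\.core$"),
    ("iosd-clc crashinfo", r"^harddisk:Slot-\d+-0_crashinfo_SIP_\d+_.+\.log$"),
    ("cdman crashinfo", r"^harddisk:.+_SIP_\d+_cdman_crashinfo_.+\.log$"),
    ("sup-iosd crashinfo", r"^bootflash:.+_crashinfo_RP_\d+_.+$"),
]
COMPILED = [(label, re.compile(pattern)) for label, pattern in PATTERNS]


def classify_crash_file(path):
    """Return the table type that the path matches, or None if none matches."""
    for label, regex in COMPILED:
        if regex.match(path):
            return label
    return None
```

For the path reported in the sample %PAM-3-CRASH message, `harddisk:core/router_SIP_1_vidman_7014_1704986383.core.gz`, the function returns `linecard process core`. Because an RP process core and a linecard process core share the same suffix, the linecard pattern is listed first so that the `_SIP_<slot>_` marker decides between them.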
Limitations of PAM
- If you configure the exception crashinfo file command, this feature does not work.
  The exception crashinfo file command lets you define a custom prefix for the crashinfo file. PAM cannot detect such crashinfo files because it cannot determine which process, FRU, or slot crashed.
- If the standby SUP cannot boot up, PAM cannot cover the following cases:
  - The standby SUP is removed intentionally.
  - The standby SUP is inserted and is in ROMMON state without a bootup image. This can occur when the config-register is set to 0x0.
  - The standby SUP is inserted but stops responding and does not have a bootup image. This can occur due to a hardware issue.
- In releases before Cisco IOS XE Dublin 17.12.1y, there is no unified syslog that covers crashes of all modules and processes. You must manually filter several syslogs to find the relevant log information and manually submit the log files to TAC.