THIS FIELD NOTICE IS PROVIDED ON AN "AS IS" BASIS AND DOES NOT IMPLY ANY KIND OF GUARANTEE OR WARRANTY, INCLUDING THE WARRANTY OF MERCHANTABILITY. YOUR USE OF THE INFORMATION ON THE FIELD NOTICE OR MATERIALS LINKED FROM THE FIELD NOTICE IS AT YOUR OWN RISK. CISCO RESERVES THE RIGHT TO CHANGE OR UPDATE THIS FIELD NOTICE AT ANY TIME.
Affected Product Name | Description | Comments |
---|---|---|
N9K-SUP-A+ | Supervisor for Nexus 9500 | |
N9K-SUP-A+= | Supervisor for Nexus 9500 | |
N9K-SUP-B+ | Supervisor B+ for Nexus 9500 | |
N9K-SUP-B+= | Supervisor B+ for Nexus 9500 | |
Defect ID | Headline |
---|---|
CSCwd65255 | Certain SUP-A+/SUP-B+ require a 2nd manual power cycle/physical reseat to boot after EPLD upgrade |
CSCwf44222 | Bundle new FPGA version to ACI switch code |
CSCwb86706 | ACI policy-based upgrade always upgrades spine SUP FPGA even the SUP has same or higher FPGA version |
After a policy-based upgrade of Cisco Application Centric Infrastructure (ACI) on certain Cisco Nexus 9500 Series Switches, the supervisor of the modular spine does not boot up.
In certain cases, the supervisor can be recovered and will boot up normally after power-cycling the spine chassis or conducting an online insertion and removal (OIR) of the affected supervisor. In other scenarios, power-cycling or conducting an OIR will not recover the supervisor.
Background
Cisco Nexus 9500 Series Switches with supervisors N9K-SUP-A+ (part number 73-18562-02) or N9K-SUP-B+ (part number 73-18570-02) may fail to boot due to one of two issues.
VRM Issue
A small batch of N9K-SUP-A+ and N9K-SUP-B+ supervisors have a voltage regulator module (VRM) that has a firmware defect. The VRM is responsible for powering the supervisor. Due to this firmware defect, the supervisor may get stuck after a Cisco ACI upgrade, which requires a power cycle of the supervisor. This issue will not occur with a software reload. The issue can be resolved by either power-cycling the switch chassis or conducting an OIR of the supervisor.
FPGA Downgrade Issue
CSCwb86706 and CSCwf44222 affect supervisors with part numbers 73-18562-02 and 73-18570-02. The supervisors are installed with field-programmable gate array (FPGA) version 0x18. However, releases of Cisco ACI that do not include the fix for CSCwf44222 are bundled with FPGA version 0x15.
Due to CSCwb86706, the supervisor FPGA is downgraded to version 0x15 when a policy-based upgrade or downgrade from a Cisco ACI 5.2 release is implemented. As a result, the supervisor will fail to boot and will not be recoverable. This issue only happens with a policy-based upgrade. The issue does not happen when the supervisor is manually upgraded or reloaded.
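The version relationship behind the FPGA downgrade can be illustrated with a minimal Python sketch. It only compares the two version strings quoted above (installed 0x18, bundled 0x15). The function names are hypothetical and this is not Cisco's upgrade logic; it merely shows the comparison that a version-aware upgrade would be expected to make.

```python
def would_downgrade(installed, bundled):
    """True if programming the bundled FPGA would lower the installed version.

    Both arguments are hex strings such as "0x18" (format taken from the notice).
    """
    return int(bundled, 16) < int(installed, 16)


def should_program_fpga(installed, bundled):
    """Hypothetical rule: only program the FPGA when the bundled version is newer."""
    return int(bundled, 16) > int(installed, 16)


if __name__ == "__main__":
    # Versions quoted in the notice: supervisor ships with 0x18,
    # releases without the CSCwf44222 fix bundle 0x15.
    print(would_downgrade("0x18", "0x15"))
    print(should_program_fpga("0x18", "0x15"))
```

With the versions from this notice, the first check reports a downgrade and the second rule would skip FPGA programming.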
After an upgrade, a Cisco Nexus 9500 supervisor N9K-SUP-A+ or N9K-SUP-B+ running Cisco ACI may not boot up. The spine becomes unreachable and its console becomes unresponsive. From the ACI fabric, the spine becomes inactive.
If the supervisor has the VRM issue, as described in the Background section, the supervisor will boot up and operate normally after a power cycle of the chassis or an OIR of the supervisor.
If the supervisor has the FPGA downgrade issue, as described in the Background section, the supervisor will remain down after a power cycle or an OIR of the supervisor.
If the supervisor fails to boot during a Cisco ACI upgrade, power-cycle the chassis or conduct an OIR of the affected supervisor. The supervisor may then boot up normally.
If the supervisor still does not boot up after a power cycle or OIR, contact Cisco TAC to replace the affected supervisor.
Workaround
Before upgrading, use the ACI pre-upgrade validation script to proactively detect the risks of the upgrade. The script will check the part number and identify an affected supervisor. The script can be downloaded from GitHub at https://github.com/datacenter/ACI-Pre-Upgrade-Validation-Script/tree/master.
If the script detects an affected supervisor, mitigate the risk by using the instructions in the following solutions.
Solution for the VRM Issue
Cisco has released software that includes the fix for the VRM issue (CSCwd65255). If the spine is running on a Cisco ACI release with the fix, such as Release 15.2.8f, no action is required.
However, this issue will occur on a device that is upgraded from a vulnerable release to a fixed release. The supervisor will not boot up after the upgrade. In this case, a script is available to fix the issue. The script and its user manual can be downloaded from GitHub at https://github.com/cisco-aci/vrm_update. Cisco recommends running this script before upgrading the image.
Solution for the FPGA Downgrade Issue
Cisco recommends upgrading to Cisco ACI Release 5.2.8f or later or Release 6.0.3d or later. These releases include the fix for CSCwf44222 and the latest FPGA version 0x18. Upgrading to one of these releases will mitigate the potential risk of CSCwb86706.
Alternatively, avoid the FPGA downgrade by manually upgrading the spine supervisor. This approach is useful when a customer cannot upgrade to a fixed release.
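As a rough planning aid, the release guidance above can be sketched in Python. The part numbers and the 5.2.8f and 6.0.3d thresholds come from this notice; everything else is an assumption. In particular, the sketch assumes APIC-style release strings written as in this notice (for example 5.2.8f), treats any train other than 6.0 at or above 5.2.8f as fixed, and rejects switch-style numbers such as 15.2.8f. The function names are hypothetical, and the sketch does not cover the VRM issue.

```python
import re

# Part numbers this notice lists as affected.
AFFECTED_PART_NUMBERS = {"73-18562-02", "73-18570-02"}

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z])$")


def parse_release(release):
    """Split an APIC-style release such as '5.2.8f' into (5, 2, 8, 'f')."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError("unrecognized release format: %r" % release)
    major, minor, maint, patch = match.groups()
    if int(major) >= 10:
        # Assumption: two-digit majors are switch-style numbers (15.2.8f),
        # which this sketch does not try to map.
        raise ValueError("switch-style release numbers are not handled")
    return (int(major), int(minor), int(maint), patch)


def has_fpga_fix(release):
    """True for 5.2.8f or later, except 6.0 releases older than 6.0.3d."""
    parsed = parse_release(release)
    if parsed[:2] == (6, 0):
        return parsed >= (6, 0, 3, "d")
    # Assumption: any other train at or above 5.2.8f carries the fix.
    return parsed >= (5, 2, 8, "f")


def upgrade_advice(part_number, target_release):
    """Suggest an upgrade method for one spine supervisor."""
    if part_number.strip() not in AFFECTED_PART_NUMBERS:
        return "not affected: policy-based upgrade"
    if has_fpga_fix(target_release):
        return "affected: policy-based upgrade to fixed release"
    return "affected: manual supervisor upgrade"
```

For example, `upgrade_advice("73-18562-02", "5.2.7g")` suggests a manual supervisor upgrade, while the same part number with target `6.0.3d` allows a policy-based upgrade.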
As stated in the preceding section, before the upgrade, Cisco recommends downloading the VRM script to check for and fix the VRM issue. The script and its user manual can be downloaded from GitHub at https://github.com/cisco-aci/vrm_update.
Once the VRM issue has been fixed, proceed with policy-based upgrade to Cisco ACI Release 5.2.8f or later or Release 6.0.3d or later, or use the following procedures to manually upgrade the supervisor.
To manually upgrade a single supervisor spine, do the following:
# touch /tmp/install_in_progress
# scp user@<tftp-user-ip>:/tftpboot/<image-name> /bootflash
# cd /bootflash
# md5sum <image-name>
# setup-bootvars.sh <image-name>
# setup-clean-config.sh
# sync
# reload
This command will reload the chassis, Proceed (y/n)? [n]: y
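The `md5sum` step in the procedure above is only useful if its output is compared against a known-good checksum. A minimal Python sketch of that comparison follows, intended for a workstation that holds the image before it is copied to the switch. It assumes a trusted MD5 value for the image is available from wherever the image was obtained; the function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """Compare the file's digest with an expected value, ignoring case and spaces."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `checksum_matches` returns False, the image should be copied again before `setup-bootvars.sh` is run.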
To manually upgrade a dual supervisor spine, do the following:
# show module | grep SUP
27 0 Supervisor Module N9K-SUP-A+ standby
28 0 Supervisor Module N9K-SUP-A+ active
# touch /tmp/install_in_progress
# scp user@<tftp-user-ip>:/tftpboot/<image-name> /bootflash
# cd /bootflash
# md5sum <image-name>
# setup-bootvars.sh <image-name>
# setup-clean-config.sh
# sync
User Access Verification
(none) login: admin
********************************************************************************
Fabric discovery in progress, show commands are not fully functional
Logout and Login after discovery to continue to use show commands.
Run show discoveryissues for more details.
********************************************************************************
# touch /tmp/install_in_progress
# scp admin@127.1.1.28:/bootflash/<image-name> /bootflash/
# setup-bootvars.sh <image-name>
# setup-clean-config.sh
# sync
# reload
This command will reload the chassis, Proceed (y/n)? [n]: y
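In the dual supervisor procedure above, the `show module | grep SUP` output identifies which slot is active and which is standby. The Python sketch below parses output of that shape and builds the `scp` source used for the second copy. The `127.1.1.<slot>` address pattern is inferred only from the example above (active slot 28, source `127.1.1.28`) and is an assumption, as are the function names.

```python
def find_supervisors(show_module_output):
    """Map 'active' and 'standby' to slot numbers from 'show module | grep SUP' lines."""
    roles = {}
    for line in show_module_output.splitlines():
        # Drop a trailing '*' marker defensively, in case the status column has one.
        fields = line.replace("*", " ").split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        role = fields[-1].lower()
        if role in ("active", "standby"):
            roles[role] = int(fields[0])
    return roles


def scp_source_from_active(roles, image_name, user="admin"):
    """Build the scp source path on the active supervisor.

    Assumption: the active supervisor answers on 127.1.1.<slot>,
    as in the example in this notice.
    """
    return "%s@127.1.1.%d:/bootflash/%s" % (user, roles["active"], image_name)


SAMPLE = """\
27 0 Supervisor Module N9K-SUP-A+ standby
28 0 Supervisor Module N9K-SUP-A+ active
"""

if __name__ == "__main__":
    roles = find_supervisors(SAMPLE)
    print(roles)
    print(scp_source_from_active(roles, "<image-name>"))
```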
Use one of the following methods to identify the affected supervisor:
APIC# bash
admin@APIC: ~> icurl -k -g 'https://127.0.0.1/api/node/class/eqptSpCmnBlk.json?query-target-filter=or(eq(eqptSpCmnBlk.prtNum,"73-18562-02"),eq(eqptSpCmnBlk.prtNum,"73-18570-02"))'
spine # show sprom sup | grep "Part Number"
The affected part numbers are 73-18562-02 (N9K-SUP-A+) and 73-18570-02 (N9K-SUP-B+).
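When the APIC query is run across a large fabric, the JSON reply can be filtered with a short script. The Python sketch below assumes the usual APIC REST response layout, in which each entry of `imdata` wraps the class name and an `attributes` object. The `dn` value in the sample is made up for illustration, and the function name is hypothetical.

```python
import json

# Part number to model mapping, taken from this notice.
AFFECTED = {"73-18562-02": "N9K-SUP-A+", "73-18570-02": "N9K-SUP-B+"}


def affected_supervisors(response_text):
    """Return (dn, part number, model) for each affected supervisor in the reply."""
    data = json.loads(response_text)
    hits = []
    for item in data.get("imdata", []):
        attrs = item.get("eqptSpCmnBlk", {}).get("attributes", {})
        part = attrs.get("prtNum", "").strip()
        if part in AFFECTED:
            hits.append((attrs.get("dn", ""), part, AFFECTED[part]))
    return hits


# Illustrative reply only; the dn below is a made-up placeholder.
SAMPLE_REPLY = json.dumps({
    "totalCount": "1",
    "imdata": [
        {"eqptSpCmnBlk": {"attributes": {
            "dn": "example/dn/of/supervisor",
            "prtNum": "73-18562-02",
        }}}
    ],
})

if __name__ == "__main__":
    for dn, part, model in affected_supervisors(SAMPLE_REPLY):
        print(dn, part, model)
```

An empty result means that no supervisor in the reply carries one of the two affected part numbers.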
Cisco provides the Serial Number Validation Tool to verify whether a device is impacted by this issue. To check the device, enter the serial number in the Serial Number Validation Tool.
Important: For security reasons, you must click the Serial Number Validation Tool link that is provided in this section. Do not copy and paste the link into a browser. Use of the Serial Number Validation Tool URL external to this field notice will fail.
Version | Description | Section | Date |
---|---|---|---|
1.1 | Updated to hardware vs software FN, no changes to any content. | — | 2023-NOV-09 |
1.0 | Initial Release | — | 2023-NOV-02 |
For further assistance or for more information about this field notice, contact the Cisco Technical Assistance Center (TAC).
To receive email updates about Field Notices (reliability and safety issues), Security Advisories (network security issues), and end-of-life announcements for specific Cisco products, set up a profile in My Notifications.