Troubleshooting Cisco APIC-EM Single and Multi-Host
Use the following procedures to troubleshoot Cisco APIC-EM single-host and multi-host deployments:
- Recovery Procedures for Cisco APIC-EM Node Failures
- Removing a Single Host from a Multi-Host Cluster
- Removing a Faulted Host from a Multi-Host Cluster
- Resetting the Cisco APIC-EM
- Adding a New Host to a Multi-Host Cluster
- Shutting Down and Starting Up a Host in a Multi-Host Cluster
- Confirming the Multi-Host Cluster Configuration Values
- Changing the Settings in a Multi-Host Cluster
Recovery Procedures for Cisco APIC-EM Node Failures
The following table describes recommended procedures to take to resolve a Cisco APIC-EM single node failure scenario.
The following table describes recommended procedures to take to resolve a Cisco APIC-EM multi-host (node) failure scenario.
| Node Failure Scenario | Symptoms and Recovery Procedures |
|---|---|
| Power outage causing one or more of the cluster nodes to go down. | In most cases, the hosts should rejoin the Cisco APIC-EM cluster on their own when power is restored. In rare situations, some of the Cisco APIC-EM services may not form the cluster with the existing Cisco APIC-EM hosts. In such cases, you need to execute the following steps to ensure that the failed host joins the cluster: |
| Bad or faulty hardware on one of the cluster nodes. | In this case, you need to first remove the faulty host from the cluster and then add the new host to the cluster. Perform the following steps: |
| Network connectivity issues between the cluster nodes. | In most cases, the nodes should rejoin the Cisco APIC-EM cluster on their own when network connectivity is restored. In rare situations, some of the Cisco APIC-EM services may not form the cluster with the existing Cisco APIC-EM nodes. In such cases, you need to execute the following steps to ensure that the failed node joins the cluster: |
| Controller software upgrade failure on one of the cluster hosts. | In this case, to recover from the upgrade failure and return to the current Cisco APIC-EM version, perform the following steps: |
| Hardware upgrade on one of the cluster nodes. | Gracefully shut down the host, upgrade the hardware (RAM, CPU, and so on), and restart the host. See Shutting Down and Starting Up a Host in a Multi-Host Cluster. |
Removing a Single Host from a Multi-Host Cluster
To troubleshoot an issue with a multi-host cluster, you may need to remove a single host from a multi-host cluster. This procedure describes how to remove one of the hosts running Cisco APIC-EM from a multi-host cluster. You use the Cisco APIC-EM configuration wizard to perform this procedure.
Note | The configuration wizard option to remove a host appears only if the host on which you are running the configuration wizard is part of a multi-host cluster. If the host is not part of a multi-host cluster, the option is not displayed. Controller downtime occurs during this procedure, so we recommend that you perform it during a maintenance window. |
You should have installed the Cisco APIC-EM on a multi-host cluster as described in the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide.
You must perform this procedure on the single host that is to be removed from the multi-host cluster.
The multi-host cluster should still be operational.
Removing a Faulted Host from a Multi-Host Cluster
Perform the steps in the following procedure to remove a faulted or inoperative host (running Cisco APIC-EM) from a multi-host cluster. You use the Cisco APIC-EM configuration wizard to perform this procedure. A host becomes faulted when it can no longer participate in the cluster due to hardware or software issues.
After you follow this procedure on a three-host cluster (moving from three hosts to two hosts), you lose high-availability protection against the loss of a host. After you follow this procedure on a two-host cluster, the cluster becomes inoperable until the second host is brought back up and added to the cluster.
Note | When a host becomes faulted, replacement instances of its services are grown on the remaining hosts in the cluster. While these replacement instances are being grown, and depending on the types of services involved, certain Cisco APIC-EM functionality may not be available. |
You have installed the Cisco APIC-EM on a multi-host cluster following the procedure described in the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide.
You must perform this procedure on an active host in the multi-host cluster. You cannot perform this procedure on the faulted host that is to be removed from the multi-host cluster. A faulted host is displayed as red in the System Health tab view in the Home page of the controller's GUI.
Note | Always first attempt to bring the faulted host back online. If you determine that the faulted host can no longer participate in the cluster, try to remove it using the Remove this host from its APIC-EM cluster configuration wizard option (as described in the previous procedure). Follow this procedure, which uses the Remove a faulted host from this APIC-EM cluster configuration wizard option, only if that other option has been tried and failed to remove the host. |
Step 1 | Using a Secure Shell (SSH) client, log into the host (appliance, server, or virtual machine) with the IP address that you specified using the configuration wizard.
| ||
Step 2 | When prompted, enter your Linux username ('grapevine') and password for SSH access. | ||
Step 3 | Enter the following command to access the configuration wizard.
$ config_wizard
| ||
Step 4 | Review the Welcome to the APIC-EM Configuration Wizard! screen and choose the option to forcibly remove the faulted host from the cluster: | ||
Step 5 | A message appears with the following options:
| ||
Step 6 | To use the removed host again, either as a standalone controller or as a member of a cluster, you must run the configuration wizard again and re-install the Cisco APIC-EM. Do not attempt to reuse the host in either role without re-installing the Cisco APIC-EM. |
Resetting the Cisco APIC-EM
You can troubleshoot a Cisco APIC-EM deployment by resetting the controller to the configuration values that were set when you first ran the configuration wizard. A reset is helpful when the controller is in an unstable state and other troubleshooting activities have not resolved the situation.
Note | In a multi-host environment, you need to perform this procedure on only a single host. After you perform this procedure on a single host, the other two hosts are reset automatically. |
You have installed the Cisco APIC-EM following the procedure described in the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide.
Step 1 | Using a Secure Shell (SSH) client, log into the host (physical or virtual) with the IP address that you specified using the configuration wizard.
| ||||||
Step 2 | When prompted, enter your Linux username ('grapevine') and password for SSH access. | ||||||
Step 3 | Navigate to the bin directory on the Grapevine root. The bin directory contains the grapevine scripts. | ||||||
Step 4 | Enter the reset_grapevine command at the prompt to run the reset grapevine script.
$ reset_grapevine
The reset_grapevine command returns the configuration settings to the values that you configured when you first ran the configuration wizard. The configuration settings are saved to a JSON file located at /etc/grapevine/controller-config.json. The reset_grapevine command uses the data in the controller-config.json file to restore the earlier configuration settings, so do not delete this file. If you delete this file, you must run the configuration wizard again and reenter your configuration data.
The reset_grapevine command terminates if the SSH connection is disconnected for any reason. To avoid this, we recommend that you run the reset_grapevine command in a tmux (terminal multiplexer) session; tmux is already installed on the controller. You can use the following commands for tmux:
After entering the reset_grapevine command, you are prompted to reenter your Grapevine password. | ||||||
Step 5 | Enter your Grapevine password a second time.
[sudo] password for grapevine:********
You are then prompted to delete all virtual disks. The virtual disks are where the Cisco APIC-EM database resides. For example, data about devices that the controller discovered is saved on these virtual disks. If you enter yes (y), all of this data is deleted. If you enter no (n), the new cluster comes up populated with your existing data once the reset procedure completes. | ||||||
Step 6 | Enter n to prevent the deletion of all the virtual disks.
THIS IS A DESTRUCTIVE OPERATION
Do you want to delete all VIRTUAL DISKS in your APIC-EM cluster? (y/n):n
You are then prompted to delete all Cisco APIC-EM authentication timeout policies, user password policies, and user accounts other than the primary administrator account. | ||||||
Step 7 | Enter n to prevent the deletion of all authentication timeout policies, user password policies, and user accounts other than the primary administrator account.
THIS IS A DESTRUCTIVE OPERATION
Do you want to delete authentication timeout policies, user password policies, and Cisco APIC-EM user accounts other than the primary administrator account? (y/n): n
You are then prompted to delete any imported certificates. | ||||||
Step 8 | Enter n to prevent the deletion of any imported certificates.
THIS IS A DESTRUCTIVE OPERATION
Do you want to delete the imported certificates? (y/n): n
You are then prompted to delete any backups. | ||||||
Step 9 | Enter n to prevent the deletion of any backups.
THIS IS A DESTRUCTIVE OPERATION
Do you want to delete the backups? (y/n): n
The controller then resets itself with the configuration values that were set when you first ran the configuration wizard. When the controller finishes resetting, a command prompt from the controller appears. | ||||||
Step 10 | Using the Secure Shell (SSH) client, log out of the host. |
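The guidance in Step 4 (keep controller-config.json in place, and run the reset inside tmux so a dropped SSH connection does not terminate it) can be sketched as a small pre-flight helper. This is an illustrative sketch, not part of the Cisco APIC-EM tooling: the function names and the tmux session name apic-reset are hypothetical, and the helper only prints the commands rather than running them. It assumes the standard tmux subcommands new-session and attach-session, and the file path stated in Step 4.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not shipped with Cisco APIC-EM.
# Path taken from Step 4 of the reset procedure.
CONFIG_FILE="${CONFIG_FILE:-/etc/grapevine/controller-config.json}"
# Hypothetical tmux session name, chosen for this example only.
SESSION="${SESSION:-apic-reset}"

# Succeeds only if the saved wizard configuration exists and is non-empty.
config_present() {
  [ -s "$CONFIG_FILE" ]
}

# Print the commands to type, in order. Nothing is executed here.
print_reset_plan() {
  if ! config_present; then
    echo "missing $CONFIG_FILE: run the configuration wizard instead" >&2
    return 1
  fi
  # 1. Start a named tmux session so the reset survives an SSH disconnect.
  echo "tmux new-session -s $SESSION"
  # 2. Inside that session, run the reset script from the procedure.
  echo "reset_grapevine"
  # 3. After reconnecting over SSH, reattach to the same session.
  echo "tmux attach-session -t $SESSION"
}
```

Calling print_reset_plan on the controller prints the three commands if the configuration file is present, and returns a non-zero status with a message on standard error if it is not.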
Adding a New Host to a Multi-Host Cluster
Perform the steps in this procedure to configure Cisco APIC-EM on your host and to join it to another, pre-existing host to create a cluster. Configuring the Cisco APIC-EM on multiple hosts to create a cluster is a best practice for both high availability and scale.
![]() Caution |
|
You must have performed the following prerequisites:
- You must have either received a Cisco APIC-EM Controller Appliance with the Cisco APIC-EM pre-installed or you must have downloaded, verified, and installed the Cisco ISO image onto a second server or virtual machine.
- You must have already configured Cisco APIC-EM on the first host (server or virtual machine) in your planned multi-host cluster following the steps in the previous procedure.
- Additionally, you must have checked the controller's health on the first host using the SYSTEM HEALTH tab in the GUI. The SYSTEM HEALTH tab is directly accessible from the HOME page. For information about this procedure, see the Cisco Application Policy Infrastructure Controller Enterprise Module Administrator Guide.
This procedure must be run on the second host that you are joining to the cluster. When joining the new host to the cluster, you must specify an existing host in the cluster to connect to.
![]() Note | The Cisco APIC-EM multi-host configuration supports the following two workflows:
|
Step 1 | Boot up the host. | ||||||||||
Step 2 | Review the APIC-EM License Agreement screen that appears and choose either <view license agreement> to review the license agreement or accept>> to accept the license agreement and proceed with the deployment.
After accepting the license agreement, you are then prompted to select a configuration option. | ||||||||||
Step 3 | Review the Welcome to the APIC-EM Configuration Wizard! screen and choose one of the two displayed options to begin.
For the multi-host deployment, click the Add this host to an existing APIC-EM cluster option. | ||||||||||
Step 4 | Enter configuration values for the NETWORK ADAPTER #1 on the host.
The configuration wizard discovers and prompts you to confirm values for the network adapter or adapters on your host. For example, if your host has two network adapters you are prompted to confirm configuration values for network adapter #1 (eth0) and network adapter #2 (eth1).
On Cisco UCS servers, the NIC labeled with number 1 would be the physical NIC. The NIC labeled with the number 2 would be eth1.
Later in this procedure, the following information will be discovered and copied from the cluster to the configuration file of this host:
Once satisfied with the controller network adapter settings, enter next>> to proceed. After entering next>>, the configuration wizard proceeds to validate the values you entered. After validation, you are then prompted to enter values for the APIC-EM CLUSTER SETTINGS. | ||||||||||
Step 5 | Enter configuration values for the APIC-EM CLUSTER SETTINGS.
After configuring the administrator cluster settings, enter next>> to proceed. After entering next>>, the configuration wizard prepares the host to join the cluster. A message asks you to wait while the remote cluster is queried and data is retrieved. | ||||||||||
Step 6 | Enter configuration values for the Virtual IP.
Once satisfied with the virtual IP address settings, enter next>> to proceed. After entering next>>, the configuration wizard proceeds to validate the values you entered. | ||||||||||
Step 7 | (Optional) Enter additional configuration values for the Virtual IP.
The configuration wizard continues its discovery of any pre-existing configuration values on the hosts in the cluster. Depending upon what the configuration wizard discovers, you may be prompted to enter additional configuration values. For example:
Once satisfied with the virtual IP address settings, enter next>> to proceed. After entering next>>, the configuration wizard proceeds to validate the values you entered. | ||||||||||
Step 8 | A final message appears stating that the wizard is now ready to proceed to join the host to the cluster.
The following options are available:
Enter proceed>> to proceed. After entering proceed>>, the configuration wizard applies the configuration values that you entered above.
At the end of the configuration process, a successful configuration message appears. | ||||||||||
Step 9 | Open your browser and enter an IP address to access the Cisco APIC-EM GUI.
You can use the first displayed IP address of the Cisco APIC-EM GUI at the end of the configuration process.
| ||||||||||
Step 10 | After you enter the IP address in the browser, a message stating that "Your connection is not private" appears.
Ignore the message and click the Advanced link. | ||||||||||
Step 11 | After you click the Advanced link, a message stating that the site’s security certificate is not trusted appears.
Ignore the message and click the link.
| ||||||||||
Step 12 | In the Login window, enter the administrator username and password that you configured above and click the Log In button. |
What to Do Next
Follow the same procedure described here to join the third and final host to the multi-host cluster.
After configuring each host, be sure to check the controller's health on the host using the SYSTEM HEALTH tab in the GUI. The SYSTEM HEALTH tab is directly accessible from the HOME page. For information about this procedure, see the Cisco Application Policy Infrastructure Controller Enterprise Module Administrator Guide.
![]() Note | You can send feedback about the Cisco APIC-EM by clicking the Feedback icon ("I wish this page would....") at the lower right of each window in the GUI. Clicking on this icon opens an email. Use this email to send a comment on the current window or to send a request to the Cisco APIC-EM development team. |
Shutting Down and Starting Up a Host in a Multi-Host Cluster
Perform the steps in this procedure to gracefully shut down and restart a host in a multi-host cluster.
Note | It is a best practice to gracefully shut down a host before removing it from the multi-host cluster. |
You should have installed the Cisco APIC-EM on a multi-host cluster as described in the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide.
You must perform this procedure on the single host that is to be removed from the multi-host cluster.
The multi-host cluster should still be operational.
Step 1 | Using a Secure Shell (SSH) client, log into the host (appliance, server, or virtual machine) with the IP address that you specified using the configuration wizard.
| ||
Step 2 | When prompted, enter your Linux username ('grapevine') and password for SSH access. | ||
Step 3 | Enter the following command to redeploy services off of this host and onto the other hosts in the multi-host cluster.
$ grape host evacuate | ||
Step 4 | Power off the host. | ||
Step 5 | Proceed to perform any troubleshooting or maintenance operations on the host that you powered off. | ||
Step 6 | Power the host back on. | ||
Step 7 | If the host comes up and no error message appears, enter the following command on the host to enable services on it.
$ grape host enable
| ||
Step 8 | If the host comes up and no error message appears, enter the following additional command on the host to rebalance services between it and the other hosts in your multi-host cluster.
$ grape instance rebalance
| ||
Step 9 | Log into one of the other operational hosts (working hosts) in the multi-host cluster. | ||
Step 10 | Enter the following command on the selected operational host.
$ remove faulted node
This command removes the stale entries of the host that was shut down. | ||
Step 11 | Run the configuration wizard on the selected operational node to trigger 'remove fault-node node'.
$ config_wizard | ||
Step 12 | The operational host then displays another selection, 'Revert to single-node'. | ||
Step 13 | Select the 'Revert to single-node' option and wait until the operation completes. | ||
Step 14 | Proceed to join the host back to the existing two host cluster using the configuration wizard and as described in the procedure to add a host to a multi-host cluster. For information, see Adding a New Host to a Multi-Host Cluster. |
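The ordering of the grape commands in Steps 3, 7, and 8 can be sketched as a small helper that groups them by phase. This is a hedged illustration only: the function names, the phase labels before-poweroff and after-poweron, and the DRY_RUN variable are hypothetical and are not part of the Cisco APIC-EM CLI. The three grape commands themselves are the ones given in the procedure above.

```shell
#!/bin/sh
# Hypothetical helper; phase names and DRY_RUN are assumptions for this sketch.

# Print the documented grape commands for one phase of a graceful power cycle.
maintenance_commands() {
  case "$1" in
    before-poweroff)
      echo "grape host evacuate"        # Step 3: move services off this host
      ;;
    after-poweron)
      echo "grape host enable"          # Step 7: re-enable services on the host
      echo "grape instance rebalance"   # Step 8: rebalance across the cluster
      ;;
    *)
      echo "usage: maintenance_commands before-poweroff|after-poweron" >&2
      return 2
      ;;
  esac
}

# Echo each command for a phase and, unless DRY_RUN is set, run it.
# Stops at the first command that fails.
run_phase() {
  cmds=$(maintenance_commands "$1") || return 2
  printf '%s\n' "$cmds" | while IFS= read -r cmd; do
    echo "+ $cmd"
    if [ -z "${DRY_RUN:-}" ]; then
      # stdin is redirected so the command cannot consume the command list;
      # exit leaves the pipeline subshell, so run_phase returns non-zero.
      $cmd </dev/null || exit 1
    fi
  done
}
```

With DRY_RUN=1 set, run_phase before-poweroff only prints the command it would run, which makes it possible to review the sequence before the host is powered off.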
Confirming the Multi-Host Cluster Configuration Values
If you are experiencing issues with your multi-host cluster, then you can use the Cisco APIC-EM CLI to check the configuration values.
You should have installed the Cisco APIC-EM following the procedure described in the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide.
Step 1 | Using a Secure Shell (SSH) client, log into the host (physical or virtual) with the IP address that you specified using the configuration wizard.
| ||
Step 2 | When prompted, enter your Linux username ('grapevine') and password for SSH access. | ||
Step 3 | Enter the following command to display the multi-host configuration.
$ grape root display
Command output similar to the following should appear.
ROOT PROPERTY VALUE
----------------------------------------------------------------------
4cbe3972-9872-4771-800d-08c89463f1eb hostname root-1
4cbe3972-9872-4771-800d-08c89463f1eb interfaces [{'interface': 'eth0', 'ip': '209.165.200.10', 'mac': '00:50:56:100:d2:14', 'netmask': '255.255.255.0'}, {'interface': 'eth1', 'ip': '209.165.200.10', 'mac': '00:50:56:95:5c:18', 'netmask': '255.255.255.0'}, {'interface': 'grape-br0', 'ip': '209.165.200.11', 'mac': 'ba:ed:c4:19:0d:77', 'netmask': '255.255.255.0'}]
4cbe3972-9872-4771-800d-08c89463f1eb is_alive True
4cbe3972-9872-4771-800d-08c89463f1eb last_heartbeat Wed Sep 09, 2015 11:02:52 PM (just now)
4cbe3972-9872-4771-800d-08c89463f1eb public_key ssh-rsa c2EAAAADAQABAAABAQDYlyCfidke3MTjGkzsTAu73MtG+lynFFvxWZ4xVIkDkhGC7KCs6XMhORMaABb6 bU4EX/6osa4qyta4NYaijxjL6GL6kPkSBZiEKcUekHCmk1+H+Ypp5tc0wyvSpe5HtbLvPicLrXHHI/TS ... V44t+VvtFaLurG9+FW/ngZwGrR/grapevine@grapevine-root
4cbe3972-9872-4771-800d-08c89463f1eb root_id 4cbe3972-9872-4771-800d-08c89463f1eb
4cbe3972-9872-4771-800d-08c89463f1eb root_index 0
4cbe3972-9872-4771-800d-08c89463f1eb root_version 0.3.0.958.dev140-gda6a16
4cbe3972-9872-4771-800d-08c89463f1eb vm_password ******
(grapevine)
#
ROOT PROPERTY VALUE
----------------------------------------------------------------------
4cbe3972-9872-4771-800d-08c89463f1eb hostname root-2
4cbe3972-9872-4771-800d-08c89463f1eb interfaces [{'interface': 'eth0', 'ip': '209.165.200.101', 'mac': '00:50:56:100:d2:14', 'netmask': '255.255.255.0'}, {'interface': 'eth1', 'ip': '209.165.200.11', 'mac': '00:50:56:95:5c:18', 'netmask': '255.255.255.0'}, {'interface': 'grape-br0', 'ip': '209.165.200.11', 'mac': 'ba:ed:c4:19:0d:77', 'netmask': '255.255.255.0'}]
4cbe3972-9872-4771-800d-08c89463f1eb is_alive True
4cbe3972-9872-4771-800d-08c89463f1eb last_heartbeat Wed Sep 09, 2015 11:02:52 PM (just now)
4cbe3972-9872-4771-800d-08c89463f1eb public_key ssh-rsa c2EAAAADAQABAAABAQDYlyCfidke3MTjGkzsTAu73MtG+lynFFvxWZ4xVIkDkhGC7KCs6XMhORMaABb6 bU4EX/6osa4qyta4NYaijxjL6GL6kPkSBZiEKcUekHCmk1+H+Ypp5tc0wyvSpe5HtbLvPicLrXHHI/TS ... V44t+VvtFaLurG9+FW/ngZwGrR/grapevine@grapevine-root
4cbe3972-9872-4771-800d-08c89463f1eb root_id 4cbe3972-9873-4771-800d-08c89463f1eb
4cbe3972-9872-4771-800d-08c89463f1eb root_index 0
4cbe3972-9872-4771-800d-08c89463f1eb root_version 0.3.0.958.dev140-gda6a16
4cbe3972-9872-4771-800d-08c89463f1eb vm_password ******
(grapevine)
The following data is displayed by this command:
| ||
Step 4 | If any of the fields in the command output appear incorrect, enter the root cause analysis (rca) command.
$ rca
The rca command runs a root cause analysis script that creates a tar file that contains the following data:
| ||
Step 5 | Send the tar file created by the rca command to Cisco support for assistance in resolving your issue.
For information about contacting Cisco support, see Contacting the Cisco Technical Assistance Center. |
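When the output of grape root display is long, the hostname and is_alive rows from Step 3 can be pulled out with a short filter. The following is a minimal sketch, assuming the output is whitespace-separated in ROOT, PROPERTY, VALUE order and that each root's hostname row precedes its is_alive row, as in the sample output above. The function names summarize_roots and all_roots_alive are hypothetical and are not part of the Cisco APIC-EM CLI.

```shell
#!/bin/sh
# Hypothetical filters for saved or piped `grape root display` output.

# Print one "hostname is_alive" pair per root, reading the output on stdin.
summarize_roots() {
  awk '
    $2 == "hostname" { host = $3 }          # remember the current root name
    $2 == "is_alive" { print host, $3 }     # emit it with its liveness value
  '
}

# Succeed only if at least one root is listed and every root reports True.
all_roots_alive() {
  summarize_roots | awk '
    { n++ }
    $2 != "True" { bad = 1 }
    END { exit (bad || n == 0) }
  '
}
```

On a controller, the filter could be used as grape root display | summarize_roots. A non-zero status from all_roots_alive would be one signal that the rca step is worth running.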
Changing the Settings in a Multi-Host Cluster
To troubleshoot an issue with a multi-host cluster, you may need to change its configuration settings. This procedure describes how to change the Cisco APIC-EM external network settings, NTP server address, and/or password for the Linux grapevine user in a multi-host cluster. The external network settings that can be changed include:
Note | To change the external network settings, NTP server address, or the Linux grapevine user password in a multi-host deployment, you must first break up the multi-host cluster. As a result, controller downtime occurs, so we recommend that you perform this procedure during a maintenance window. For information about changing settings for a single-host configuration, see Updating the Configuration Using the Wizard. |
You must have successfully configured the Cisco APIC-EM as a multi-host cluster using the configuration wizard, as described in the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide.
Step 1 | Using a Secure Shell (SSH) client, log into one of the hosts in your cluster.
Log in using the IP address that you specified using the configuration wizard.
| ||
Step 2 | When prompted, enter your Linux username ('grapevine') and password for SSH access. | ||
Step 3 | Enter the following command to access the configuration wizard.
$ config_wizard
| ||
Step 4 | Review the Welcome to the APIC-EM Configuration Wizard! screen and choose the option to remove the host from the cluster: | ||
Step 5 | A message appears with the following options:
At the end of this process, this host is removed from the cluster. | ||
Step 6 | Repeat the above steps (steps 1-5) on a second host in the cluster.
| ||
Step 7 | Using a Secure Shell (SSH) client, log into the final host in your cluster and run the configuration wizard.
$ config_wizard
After logging into the host, begin the configuration process. | ||
Step 8 | Using the wizard, make any necessary changes to the configuration values for the external network settings, NTP server address, and/or password for the Linux grapevine user.
After making your configuration changes, continue through the configuration process to the final message. | ||
Step 9 | At the end of the configuration process, a final message appears stating that the wizard is now ready to proceed with applying the configuration.
The following options are available:
Enter proceed>> to complete the installation. After entering proceed>>, the configuration wizard applies the configuration values that you entered above.
At the end of the configuration process, a CONFIGURATION SUCCEEDED! message appears. | ||
Step 10 | Log into the other hosts in your multi-host cluster and use the configuration wizard to recreate the cluster.
Refer to the Cisco Application Policy Infrastructure Controller Enterprise Module Installation Guide for information about this specific procedure. |