Troubleshooting Cisco APIC-EM Multi-Host

Use the following information to troubleshoot a Cisco APIC-EM multi-host configuration.

The following table describes recommended actions to take to resolve Cisco APIC-EM multi-host issues. Each entry lists a symptom, its possible cause, and the recommended action.

Symptom: The controller in a multi-host configuration appears to be in an unstable state. For example, applications are not running, are inaccessible, or do not appear in the GUI.
Possible cause: The controller is unstable, possibly because of errors made when entering configuration values in the Cisco APIC-EM configuration wizard.
Recommended action: Log into the host, check the configuration values, and reenter any values that are incorrect.
References: Confirming the Multi-Host Cluster Configuration Values

Symptom: The controller was working correctly in a multi-host configuration, but after a period of time one of the hosts becomes erratic and unstable.
Possible cause: One or more services in the multi-host cluster may have failed.
Recommended action: Remove the unstable host from the multi-host cluster, and then reattach it.
References: Removing a Single Host from a Multi-Host Cluster

Symptom: The controller was working correctly in a multi-host configuration, but after a period of time one of the hosts fails.
Possible cause: One or more services in the multi-host cluster may have failed.
Recommended action: Remove the failed, inoperable host from the multi-host cluster, and then reattach it.
References: Removing a Faulted Host from a Multi-Host Cluster

Symptom: A host fails because of a power outage.
Possible cause: Power to the server or appliance was inadvertently shut off. When power was restored, the controller failed to restart properly.
Recommended action: Reset the controller on the host that experienced the power outage to its previous configuration.

Confirming the Multi-Host Cluster Configuration Values

If you are experiencing issues with your multi-host cluster, you can use the Cisco APIC-EM CLI to check its configuration values.

Before You Begin

You should have attempted to deploy the Cisco APIC-EM following the procedure described in the Cisco APIC-EM deployment guide.


    Step 1   Using a Secure Shell (SSH) client, log into the host (physical or virtual) with the IP address that you specified using the configuration wizard.
    Note   

    The IP address to enter for the SSH client is the IP address that you configured for the network adapter. This IP address connects the host to the external network.

    Step 2   When prompted, enter your Linux username ('grapevine') and password for SSH access.
    Step 3   Enter the following command to display the multi-host configuration.
    $ grape root display
    
    

    Command output similar to the following should appear.

    
    ROOT                                   PROPERTY             VALUE
    ----------------------------------------------------------------------
    
    4cbe3972-9872-4771-800d-08c89463f1eb   hostname             root-1
    4cbe3972-9872-4771-800d-08c89463f1eb   interfaces           [{'interface': 'eth0', 'ip': '209.165.200.10', 'mac': '00:50:56:100:d2:14', 'netmask': '255.255.255.0'}, {'interface': 'eth1', 'ip': '209.165.200.10', 'mac': '00:50:56:95:5c:18', 'netmask': '255.255.255.0'}, {'interface': 'grape-br0', 'ip': '209.165.200.11', 'mac': 'ba:ed:c4:19:0d:77', 'netmask': '255.255.255.0'}]
    4cbe3972-9872-4771-800d-08c89463f1eb   is_alive             True
    4cbe3972-9872-4771-800d-08c89463f1eb   last_heartbeat       Wed Sep 09, 2015 11:02:52 PM (just now)
    
    4cbe3972-9872-4771-800d-08c89463f1eb   public_key           ssh-rsa                                                                                                             
    c2EAAAADAQABAAABAQDYlyCfidke3MTjGkzsTAu73MtG+lynFFvxWZ4xVIkDkhGC7KCs6XMhORMaABb6
    bU4EX/6osa4qyta4NYaijxjL6GL6kPkSBZiEKcUekHCmk1+H+Ypp5tc0wyvSpe5HtbLvPicLrXHHI/TS
    ...
    V44t+VvtFaLurG9+FW/ngZwGrR/grapevine@grapevine-root
    
    4cbe3972-9872-4771-800d-08c89463f1eb   root_id              4cbe3972-9872-4771-800d-08c89463f1eb
    4cbe3972-9872-4771-800d-08c89463f1eb   root_index           0
    4cbe3972-9872-4771-800d-08c89463f1eb   root_version         0.3.0.958.dev140-gda6a16
    4cbe3972-9872-4771-800d-08c89463f1eb   vm_password          ******
    (grapevine)
    
    #
    
    ROOT                                   PROPERTY             VALUE
    ----------------------------------------------------------------------
    
    4cbe3972-9872-4771-800d-08c89463f1eb   hostname             root-2
    4cbe3972-9872-4771-800d-08c89463f1eb   interfaces           [{'interface': 'eth0', 'ip': '209.165.200.101', 'mac': '00:50:56:100:d2:14', 'netmask': '255.255.255.0'}, {'interface': 'eth1', 'ip': '209.165.200.11', 'mac': '00:50:56:95:5c:18', 'netmask': '255.255.255.0'}, {'interface': 'grape-br0', 'ip': '209.165.200.11', 'mac': 'ba:ed:c4:19:0d:77', 'netmask': '255.255.255.0'}]
    4cbe3972-9872-4771-800d-08c89463f1eb   is_alive             True
    4cbe3972-9872-4771-800d-08c89463f1eb   last_heartbeat       Wed Sep 09, 2015 11:02:52 PM (just now)
    
    4cbe3972-9872-4771-800d-08c89463f1eb   public_key           ssh-rsa                                                                                                             
    c2EAAAADAQABAAABAQDYlyCfidke3MTjGkzsTAu73MtG+lynFFvxWZ4xVIkDkhGC7KCs6XMhORMaABb6
    bU4EX/6osa4qyta4NYaijxjL6GL6kPkSBZiEKcUekHCmk1+H+Ypp5tc0wyvSpe5HtbLvPicLrXHHI/TS
    ...
    V44t+VvtFaLurG9+FW/ngZwGrR/grapevine@grapevine-root
    
    4cbe3972-9872-4771-800d-08c89463f1eb   root_id              4cbe3972-9873-4771-800d-08c89463f1eb
    4cbe3972-9872-4771-800d-08c89463f1eb   root_index           0
    4cbe3972-9872-4771-800d-08c89463f1eb   root_version         0.3.0.958.dev140-gda6a16
    4cbe3972-9872-4771-800d-08c89463f1eb   vm_password          ******
    (grapevine)
    

    The following data is displayed by this command:

    • hostname—The configured hostname.

    • interfaces—The configured interface values, including Ethernet port, IP address, and netmask.

    • is_alive—Status of the host. True indicates a running host; False indicates a host that has shut down.

    • last_heartbeat—Date and time of the last heartbeat message sent from the host.

    • public_key—Public key used by the host.

    • root_id—Identifier of the individual root.

    • root_index—Index number of the individual root.

    • root_version—Software version of the root.

    • vm_password—VMware vSphere password that is masked.
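The check described in Steps 3 and 4 can be partly scripted. The following is a minimal Bash sketch, not a Cisco-provided tool: check_roots is a hypothetical helper, and it assumes that grape root display writes the ROOT, PROPERTY, and VALUE columns shown in the sample output above to standard output, and that each root's hostname row precedes its is_alive row. It prints each root's hostname with its is_alive value and returns a nonzero status if any root does not report True.

```shell
#!/usr/bin/env bash
# check_roots (hypothetical helper): summarize `grape root display` output.
# Reads the command output on stdin and prints "<hostname> <is_alive>" per root.
# Return status: 0 if every root reports True, 1 if any root reports another
# value, 2 if no is_alive rows were found (for example, an unexpected format).
# Assumption: the column layout matches the sample output in this guide.
check_roots() {
  awk '
    BEGIN { bad = 0; n = 0 }
    # Data rows start with a UUID-style ROOT value (hex groups joined by "-").
    # Header, separator, prompt, and wrapped public_key lines do not match.
    $1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ {
      if ($2 == "hostname") {
        host[$1] = $3          # remember the hostname for this ROOT value
      } else if ($2 == "is_alive") {
        n++
        print host[$1], $3
        if ($3 != "True") bad = 1
      }
    }
    END {
      if (n == 0) exit 2
      exit bad
    }
  '
}
```

For example, on a cluster host you might run: grape root display | check_roots. A nonzero status is a hint to review the full output and, if values look wrong, continue with Step 4.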

    Step 4   If any of the fields in the command output appear incorrect, enter the root cause analysis (rca) command.
    $ rca
    
    
    The rca command runs a root cause analysis script that creates a tar file that contains the following data:
    • Log files

    • Configuration files

    • Command output

    Step 5   Send the tar file created by the rca command to Cisco support for assistance in resolving your issue.

    For information about contacting Cisco support, see Contacting the Cisco Technical Assistance Center.


    Changing the Settings in a Multi-Host Cluster

    To troubleshoot an issue with a multi-host cluster, you may need to change its configuration settings. This procedure describes how to change the Cisco APIC-EM external network settings, NTP server address, and/or password for the Linux grapevine user in a multi-host cluster. The external network settings that can be changed include:

    • Host IP address

    • Virtual IP address

    • DNS server

    • Default gateway

    • Static routes


    Note


    To change the external network settings, NTP server address, or Linux grapevine user password in a multi-host deployment, you must first break up the multi-host cluster, which causes controller downtime. For this reason, we recommend that you perform this procedure during a maintenance window. For information about changing settings in a single-host configuration, see Updating the Configuration Using the Wizard.

    Before You Begin

    You must have successfully configured the Cisco APIC-EM as a multi-host cluster using the configuration wizard, as described in the Cisco APIC-EM deployment guide.


      Step 1   Using a Secure Shell (SSH) client, log into one of the hosts in your cluster.

      Log in using the IP address that you specified using the configuration wizard.

      Note   

      The IP address to enter for the SSH client is the IP address that you configured for the network adapter. This IP address connects the appliance to the external network.

      Step 2   When prompted, enter your Linux username ('grapevine') and password for SSH access.
      Step 3   Enter the following command to access the configuration wizard.
      $ config_wizard
      
      
      Note   

      The config_wizard command is in the PATH of the 'grapevine' user, not the 'root' user. Either run the command as the 'grapevine' user or, as the 'root' user, enter its fully qualified path. For example: /home/grapevine/bin/config_wizard
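The PATH difference described in this note can be handled with a small wrapper. This is a minimal Bash sketch under stated assumptions: resolve_config_wizard and GRAPEVINE_CONFIG_WIZARD are hypothetical names, and the fallback is the fully qualified path given in the note. The function prints the path to run, whether you are logged in as 'grapevine' or 'root'.

```shell
#!/usr/bin/env bash
# resolve_config_wizard (hypothetical helper): print the config_wizard path to run.
# The 'grapevine' user has config_wizard on its PATH; 'root' does not, so fall
# back to the fully qualified path documented in the note above.
GRAPEVINE_CONFIG_WIZARD=/home/grapevine/bin/config_wizard

resolve_config_wizard() {
  # Declared separately so the assignment below keeps command -v's exit status.
  local found
  # command -v prints the resolved path only when the command is on PATH.
  if found="$(command -v config_wizard 2>/dev/null)"; then
    printf '%s\n' "$found"
  else
    printf '%s\n' "$GRAPEVINE_CONFIG_WIZARD"
  fi
}
```

To start the wizard under either account, run "$(resolve_config_wizard)".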

      Step 4   Review the Welcome to the APIC-EM Configuration Wizard! screen and choose the option to remove the host from the cluster:
      • Remove this host from its APIC-EM cluster

      Step 5   A message appears with the following options:
      • [cancel]—Exit the configuration wizard.

      • [proceed]—Begin the process to remove this host from its cluster.

      Choose proceed>> to begin. The configuration wizard then begins removing this host from its cluster.

      At the end of this process, this host is removed from the cluster.

      Step 6   Repeat steps 1 through 5 on a second host in the cluster.
      Note

      Repeat these steps on each host in your cluster until only a single host remains. You must make your configuration changes on this final remaining host.

      Step 7   Using a Secure Shell (SSH) client, log into that final host in your cluster and run the configuration wizard.
      $ config_wizard
      
      

      After logging into the host, begin the configuration process.

      Step 8   Make any necessary changes to the configuration values for the external network settings, NTP server address, and/or password for the Linux grapevine user using the wizard.

      After making your configuration change(s), continue through the configuration process to the final message.

      Step 9   At the end of the configuration process, a final message appears stating that the wizard is now ready to proceed with applying the configuration.

      The following options are available:

      • [back]—Review and verify your configuration settings.

      • [cancel]—Discard your configuration settings and exit the configuration wizard.

      • [save & exit]—Save your configuration settings and exit the configuration wizard.

      • [proceed]—Save your configuration settings and begin applying them.

      Enter proceed>> to complete the configuration. The configuration wizard then applies the configuration values that you entered.

      Note   

      At the end of the configuration process, a CONFIGURATION SUCCEEDED! message appears.

      Step 10   Log into the other hosts in your multi-host cluster and use the configuration wizard to recreate the cluster.

      Refer to the Cisco Application Policy Infrastructure Controller Enterprise Module Deployment Guide for information about this procedure.
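The ordering in Steps 1 through 10 can be summarized as a dry-run plan. The following Bash sketch is illustrative only: plan_settings_change is a hypothetical helper that contacts no hosts and does not run config_wizard. It assumes you pass the cluster's host IP addresses and that the last address listed is the host you keep and reconfigure.

```shell
#!/usr/bin/env bash
# plan_settings_change (hypothetical helper): print the order of operations for
# changing settings in a multi-host cluster. Dry run only; nothing is executed.
# Usage: plan_settings_change HOST1 HOST2 [HOST3...]
# The last host listed is the one that remains and receives the new settings.
plan_settings_change() {
  if [ "$#" -lt 2 ]; then
    echo "a multi-host cluster needs at least two hosts" >&2
    return 1
  fi
  local last="${!#}"                # last argument: the host that stays
  local others=("${@:1:$#-1}")      # every other host, in the order given
  local h
  # Steps 1-6: remove each other host from the cluster.
  for h in "${others[@]}"; do
    echo "remove $h from the cluster"
  done
  # Steps 7-9: change the settings on the remaining host.
  echo "reconfigure $last"
  # Step 10: use the wizard on the other hosts to recreate the cluster.
  for h in "${others[@]}"; do
    echo "rejoin $h to the cluster"
  done
}
```

For example, plan_settings_change 209.165.200.10 209.165.200.11 209.165.200.12 lists the removal of the first two hosts, the reconfiguration of 209.165.200.12, and then the rejoining of the first two hosts.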


      Removing a Single Host from a Multi-Host Cluster

      To troubleshoot an issue with a multi-host cluster, you may need to remove a single host from the cluster. This procedure describes how to use the Cisco APIC-EM configuration wizard to remove one of the hosts running Cisco APIC-EM from a multi-host cluster.


      Note


      The configuration wizard option to remove a host appears only if the host on which you run the configuration wizard is part of a multi-host cluster. This procedure causes controller downtime. For this reason, we recommend that you perform it during a maintenance window.


      Before You Begin

      You should have deployed Cisco APIC-EM on a multi-host cluster as described in the Cisco APIC-EM deployment guide.

      You must perform this procedure on the single host that is to be removed from the multi-host cluster.


        Step 1   Using a Secure Shell (SSH) client, log into the host (appliance, server, or virtual machine) with the IP address that you specified using the configuration wizard.
        Note   

        The IP address to enter for the SSH client is the IP address that you configured for the network adapter. This IP address connects the appliance to the external network.

        Step 2   When prompted, enter your Linux username ('grapevine') and password for SSH access.
        Step 3   Enter the following command to access the configuration wizard.
        $ config_wizard
        
        
        Note   

        The config_wizard command is in the PATH of the 'grapevine' user, not the 'root' user. Either run the command as the 'grapevine' user or, as the 'root' user, enter its fully qualified path. For example: /home/grapevine/bin/config_wizard

        Step 4   Review the Welcome to the APIC-EM Configuration Wizard! screen and choose the option to remove the host from the cluster:
        • Remove this host from its APIC-EM cluster

        Step 5   A message appears with the following options:
        • [cancel]—Exit the configuration wizard.

        • [proceed]—Begin the process to remove this host from its cluster.

        Choose proceed>> to begin. The configuration wizard then begins removing this host from its cluster.
        Step 6   When this process completes, you must run the configuration wizard again, either to configure the host as a new Cisco APIC-EM or to join it to a cluster.
        Important:

        If you want to use this host again, either as a standalone controller or within a cluster, you must run the configuration wizard again and re-install Cisco APIC-EM. Do not reuse this host as a standalone host or within a cluster without first re-installing Cisco APIC-EM.


        Removing a Faulted Host from a Multi-Host Cluster

        Perform the steps in the following procedure to remove a faulted or inoperative host (running Cisco APIC-EM) from a multi-host cluster. You use the Cisco APIC-EM configuration wizard to perform this procedure. A host becomes faulted when it can no longer participate in the cluster due to hardware or software issues.

        If you follow this procedure on a three-host cluster (reducing it to two hosts), the cluster loses high-availability protection against the loss of a host. If you follow it on a two-host cluster, the cluster becomes inoperable until the second host is brought back up and added to the cluster.


        Note


        When a host becomes faulted, replacement instances of the services that ran on it are grown on the remaining hosts in the cluster. While the replacement instances are being grown, and depending on the types of services involved, some Cisco APIC-EM functionality may be unavailable.


        Before You Begin

        You have deployed Cisco APIC-EM on a multi-host cluster following the procedure described in the Cisco APIC-EM deployment guide.

        You must perform this procedure on an active host in the multi-host cluster, not on the faulted host that is to be removed. A faulted host is displayed in red in the System Health tab of the Home page in the controller GUI.


        Note


        Always attempt to bring the faulted host back online first. If you determine that the faulted host can no longer participate in the cluster, try to remove it with the Remove this host from its APIC-EM cluster configuration wizard option (as described in the previous procedure). Follow this procedure, which uses the Remove a faulted host from this APIC-EM cluster option, only if that option fails to remove the host.
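The decision order in this note can be expressed as a short function. This Bash sketch is an illustration, not part of Cisco APIC-EM: choose_removal_option is hypothetical, and you supply the answers yourself (whether the host can be brought back online, whether the Remove this host from its APIC-EM cluster option removed it, and the current number of hosts). It also prints the cluster-size consequences described at the start of this procedure.

```shell
#!/usr/bin/env bash
# choose_removal_option (hypothetical helper): apply the decision order from the note.
# Arguments, all supplied by the operator:
#   $1  yes|no  can the faulted host be brought back online?
#   $2  yes|no  did "Remove this host from its APIC-EM cluster" remove it?
#   $3  number of hosts currently in the cluster
choose_removal_option() {
  local online="$1" normal_removal_ok="$2" hosts="$3"
  # 1. Prefer bringing the host back online.
  if [ "$online" = "yes" ]; then
    echo "bring the host back online"
    return 0
  fi
  # 2. Next, the regular removal option (run on the host being removed).
  if [ "$normal_removal_ok" = "yes" ]; then
    echo "done: removed with Remove this host from its APIC-EM cluster"
    return 0
  fi
  # 3. Last resort: forcibly remove it from an active host.
  echo "run Remove a faulted host from this APIC-EM cluster on an active host"
  case "$hosts" in
    3) echo "note: two hosts remain; high-availability protection is lost" ;;
    2) echo "note: cluster is inoperable until the second host is brought back up and added" ;;
  esac
}
```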



          Step 1   Using a Secure Shell (SSH) client, log into the host (appliance, server, or virtual machine) with the IP address that you specified using the configuration wizard.
          Note   

          The IP address to enter for the SSH client is the IP address that you configured for the network adapter. This IP address connects the appliance to the external network.

          Step 2   When prompted, enter your Linux username ('grapevine') and password for SSH access.
          Step 3   Enter the following command to access the configuration wizard.
          $ config_wizard
          
          
          Note   

          The config_wizard command is in the PATH of the 'grapevine' user, not the 'root' user. Either run the command as the 'grapevine' user or, as the 'root' user, enter its fully qualified path. For example: /home/grapevine/bin/config_wizard

          Step 4   Review the Welcome to the APIC-EM Configuration Wizard! screen and choose the option to forcibly remove the faulted host from the cluster:
          • Remove a faulted host from this APIC-EM cluster

          Step 5   A message appears with the following options:
          • <Remove IP Address from cluster>—Forcibly removes the faulted host (identified by its IP address) from the multi-host cluster.

          • <exit>—Exit the configuration wizard without removing the faulted host.

          Choose <Remove IP Address from cluster> to begin. The configuration wizard then begins removing the faulted host from its cluster.
          Step 6   When this process completes, you must run the configuration wizard again, either to configure the host as a new controller or to join the controller to a cluster.
          Important:

          If you want to use this host again, either as a standalone controller or within a cluster, you must run the configuration wizard again and re-install Cisco APIC-EM. Do not reuse this host as a standalone host or within a cluster without first re-installing Cisco APIC-EM.