Troubleshooting Services Using System Health

The following sections describe how to troubleshoot services using the SYSTEM HEALTH tab in the Cisco APIC-EM GUI.

About Cisco APIC-EM Services

The Cisco APIC-EM creates a Platform as a Service (PaaS) environment for your network, using Grapevine as an Elastic Services platform to support the controller's infrastructure and services. A service in this PaaS environment is a horizontally scalable application that adds instances of itself when demand increases, and frees instances of itself when demand decreases.

The Cisco APIC-EM controls elasticity at the service level, rather than at the Grapevine client level.

Service Managers and Monitors

The Cisco APIC-EM services that run on the Grapevine Elastic Services Platform provide the controller with its functionality. The Grapevine Elastic Services Platform consists of the following components:

  • Grapevine root—Handles all policy management for service updates, as well as the service lifecycle for both itself and the Grapevine client.

  • Grapevine client—Location where the supported services run.

After installation, service functionality is enabled using the following managers and monitors:

  • Grapevine Root

    • Service manager—Starts, stops, and monitors service instances across the Grapevine clients.

    • Capacity manager—Provides on-demand capacity to run the services.

    • Load monitor—Monitors the load and health of services across the Grapevine clients.

    • Service catalog—Repository of service bundles that can be deployed on the Grapevine clients.

  • Grapevine Client

    • Service manager—Starts, stops, and monitors service instances on the Grapevine client.

    • Service instance manager—Deploys the service.

Service Features

The Cisco APIC-EM provides the following service features:

  • Adding capacity on an existing client—When a service load exceeds a specified threshold on a client, the controller can request another service instance to start on a second, preexisting client.
  • Adding capacity on a newly instantiated client—When a service load exceeds a specified threshold on a client, the controller can request a new client to be instantiated and then start another service instance on this client.
  • Automatic scaling of services—As the service load increases, the controller starts additional service instances in response. As the service load decreases, the controller tears down service instances in response.

  • Resiliency for services—When a service fails, the controller starts a replacement instance. The controller then ensures that the service’s minimum instance count requirements are maintained.
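The scaling and resiliency behavior described above can be pictured with a minimal Python sketch. This is not the controller's implementation: the `ScalingPolicy` class, its field names, and the threshold values are all hypothetical and chosen only to illustrate the idea of growing on high load, harvesting on low load, and restoring a minimum instance count after a failure.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical per-service policy; names and values are illustrative only."""
    min_instances: int = 1
    max_instances: int = 4
    scale_up_load: float = 0.80    # grow when average load exceeds this
    scale_down_load: float = 0.30  # harvest when average load falls below this


def desired_instance_count(current: int, avg_load: float, policy: ScalingPolicy) -> int:
    """Return how many instances the service should have after one evaluation."""
    if current < policy.min_instances:
        # Resiliency: replace failed instances to restore the minimum count.
        return policy.min_instances
    if avg_load > policy.scale_up_load and current < policy.max_instances:
        return current + 1
    if avg_load < policy.scale_down_load and current > policy.min_instances:
        return current - 1
    return current


policy = ScalingPolicy()
print(desired_instance_count(1, 0.95, policy))  # load above threshold: grow
print(desired_instance_count(3, 0.10, policy))  # load below threshold: harvest
print(desired_instance_count(0, 0.00, policy))  # instance failed: restore minimum
```

In this sketch the decision is made per service, which mirrors the statement that elasticity is controlled at the service level rather than at the Grapevine client level.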

Services

The following default services are provided in Cisco APIC-EM Release 1.5.x:

  • access-policy-programmer-service

  • apic-em-event-service

  • apic-em-inventory-manager-service

  • apic-em-jboss-ejbca

  • apic-em-network-programmer-service

  • apic-em-pki-broker-service

  • cas-service

  • cassandra

  • election-service

  • file-service

  • grapevine

  • grapevine-coordinator-service

  • grapevine-log-collector

  • grouping-service

  • identity-manager-pxgrid-service

  • nbar-policy-programmer-service

  • network-poller-service

  • node-ui

  • pnp-service

  • policy-analysis-service

  • policy-manager-service

  • postgres

  • qos-lan-policy-programmer-service

  • qos-monitoring-service

  • qos-policy-programmer-service

  • rabbitmq

  • rbac-service

  • reverse-proxy

  • router

  • scheduler-service

  • task-service

  • telemetry-service

  • topology-service


Note


The Cisco APIC-EM services running on your controller depend on the applications installed and enabled on the host.


Reviewing the Service Version and Status Using the SYSTEM HEALTH Tab

You can perform the following tasks using the SYSTEM HEALTH tab in the Home page of the Cisco APIC-EM GUI:

  • Review the status of each service

  • Review the number of instances of each service running

  • Review the version of each service

  • Review the IP address of the host where the service is running

Before You Begin

You must have successfully installed the Cisco APIC-EM and it must be operational.

You must have administrator (ROLE_ADMIN) permissions and either access to all resources (RBAC scope set to ALL) or an RBAC scope that contains all of the resources that you want to group. For example, to create a group containing a specific set of resources, you must have access to those resources (custom RBAC scope set to all of the resources that you want to group).

For information about the user permissions required to perform tasks using the Cisco APIC-EM, see the chapter, Managing Users and Roles in the Cisco Application Policy Infrastructure Controller Enterprise Module Administrator Guide.


    Step 1   Log in to the controller to view the controller's GUI.
    Step 2   Click the SYSTEM HEALTH tab in the Home page to view information about the controller's health.

    The following information is displayed in the SYSTEM HEALTH tab.

    System (Host) Health Data

    Data displayed include:

    • Host IP address

    • CPU—Host CPU usage is displayed in MHz. Both the currently used and the available host CPU are displayed.

    • Memory—Host memory usage is displayed in GB. Both the currently used and the available host memory are displayed.

    • Storage—Host storage usage is displayed in GB. Both the currently used and the available host storage are displayed.

    Color indicates status for the above host data:

    • Green—Indicates proper usage and support.

    • Yellow—Indicates usage is approaching improper levels and triggers this warning (color change).

    • Red—Indicates a failure based upon the usage exceeding the maximum supported value.
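As a rough illustration of the color logic above, the following Python sketch maps a usage reading to a status color. The function name, the `warn_ratio` parameter, and the 80 percent warning level are assumptions made for this example; the actual thresholds used by the controller are not stated here.

```python
def usage_color(used: float, max_supported: float, warn_ratio: float = 0.8) -> str:
    """Map a host usage reading (CPU, memory, or storage) to a status color.

    warn_ratio is a hypothetical warning level, not a documented value.
    """
    if max_supported <= 0:
        raise ValueError("max_supported must be positive")
    ratio = used / max_supported
    if ratio > 1.0:
        return "red"      # usage exceeds the maximum supported value
    if ratio >= warn_ratio:
        return "yellow"   # usage is approaching improper levels
    return "green"        # proper usage


print(usage_color(10, 64))  # low memory usage in GB
print(usage_color(56, 64))  # approaching the maximum
print(usage_color(70, 64))  # above the maximum
```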

    Additionally, a graphical representation of the above data over the last 24 hours is displayed in this tab. Placing your cursor over the graph displays a data summary for a specific date and time.

    Note   

    Placing your cursor over a color warning in the window displays further information about the warning or failure message.

    Application Health Data

    Displays applications available from the Navigation pane, and the services that support each application. For example, the Topology application accessible in the GUI is supported by topology-service.

    Color bars indicate the status for the applications and the supporting service(s):

    • Green—Indicates that an application instance is starting. An application instance is the aggregation of the service instances. You can configure a minimum or maximum number of service instances, as well as grow and harvest these service instances (spin up or spin down the services).

    • Yellow—Indicates that the application instance and its supporting service instance(s) are experiencing issues, which triggers this warning (color change).

    • Red—Indicates a failure of the application instance and its supporting service instance(s). You can harvest a service instance and then regrow it using the GUI. If the service instance does not regrow using the GUI, then you can manually regrow it. When you harvest a service instance, the controller will determine which instance is regrown (load balancing among them).

    • Blue—Indicates an in-progress state for the application or service instance (growing or harvesting).
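Because an application instance is described as the aggregation of its service instances, one way to picture the application color bar is as the most severe state among the supporting services. The Python sketch below assumes exactly that; the severity ordering and the `application_color` function are hypothetical and are not taken from the controller.

```python
from typing import Mapping

# Assumed ordering from least to most severe; not a documented ranking.
SEVERITY = {"green": 0, "blue": 1, "yellow": 2, "red": 3}


def application_color(service_colors: Mapping[str, str]) -> str:
    """Return the most severe color among an application's supporting services."""
    if not service_colors:
        raise ValueError("an application needs at least one supporting service")
    return max(service_colors.values(), key=SEVERITY.__getitem__)


# The Topology application is supported by topology-service.
print(application_color({"topology-service": "green"}))
# A hypothetical application backed by two services, one of which has issues.
print(application_color({"policy-manager-service": "green",
                         "policy-analysis-service": "yellow"}))
```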

    Step 3   Review the status and version of each service and application listed in the SYSTEM HEALTH tab.

    What to Do Next

    If there are any problems with any of the services or applications, then review the following procedures to troubleshoot a service.

    Removing a Service Instance Using the SYSTEM HEALTH Tab

    You can manually remove (harvest) a service instance by using the SYSTEM HEALTH tab in the controller's GUI. You may wish to harvest a service instance and then regrow (recreate) it to correct a faulty or unstable service.


    Caution


    Only advanced users should perform the tasks described in this procedure or attempt to troubleshoot the services.


    Before You Begin

    You must have successfully installed the Cisco APIC-EM and it must be operational.

    You must have administrator (ROLE_ADMIN) permissions and either access to all resources (RBAC scope set to ALL) or an RBAC scope that contains all of the resources that you want to group. For example, to create a group containing a specific set of resources, you must have access to those resources (custom RBAC scope set to all of the resources that you want to group).

    For information about the user permissions required to perform tasks using the Cisco APIC-EM, see the chapter, Managing Users and Roles in the Cisco Application Policy Infrastructure Controller Enterprise Module Administrator Guide.


      Step 1   Log in to the controller to view the controller's GUI.
      Step 2   Click the SYSTEM HEALTH tab in the Home page to view information about the controller's health.

      For information about what is displayed in the SYSTEM HEALTH tab, see Reviewing the Service Version and Status Using the SYSTEM HEALTH Tab.

      Step 3   Review the list of operational services in the SYSTEM HEALTH tab.

      Each service is represented by a square. A green-tinged square represents an active instance of the service, and a red-tinged square represents a service with a faulty or failed instance. Squares without color represent inactive services (no instances initiated and running).

      Note   

      Placing your cursor over a square displays the version of the service, number of instances running, and host IP address where the service instance is running.

      Step 4   Locate the service from which you want to manually remove (harvest) an instance, and click the subtraction sign (-) at the upper right.

      You are then prompted to confirm your action to remove a service instance.

      Step 5   Choose Yes in the dialog box to confirm that you want to remove an instance of the service.

      The instance of the service is then spun down.

      When the process is finished, the square representing the service instance is removed.
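The harvest-then-regrow approach to correcting a faulty service can be modeled with a short Python sketch. Everything in it is hypothetical: the `Service` class, the `harvest` and `reconcile` methods, and the example addresses (taken from the 192.0.2.0/24 documentation range) only illustrate the idea that removing an instance can leave the service below its minimum instance count, after which a replacement instance is grown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    """Hypothetical model of a service and the hosts running its instances."""
    name: str
    min_instances: int
    instances: List[str] = field(default_factory=list)  # one host IP per instance

    def harvest(self, host_ip: str) -> None:
        """Remove (spin down) the instance running on host_ip."""
        self.instances.remove(host_ip)

    def reconcile(self, spare_host_ip: str) -> bool:
        """Grow one instance on spare_host_ip if the count is below the minimum."""
        if len(self.instances) < self.min_instances:
            self.instances.append(spare_host_ip)
            return True
        return False


svc = Service("topology-service", min_instances=1, instances=["192.0.2.10"])
svc.harvest("192.0.2.10")               # remove the faulty instance
regrown = svc.reconcile("192.0.2.11")   # restore the minimum instance count
print(svc.instances, regrown)
```

If the replacement instance does not appear on its own in the GUI, the manual procedure for creating a service instance achieves the same result.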


      What to Do Next

      Manage your services by either manually removing (harvesting) additional instances or growing (restoring) instances for the services.

      Creating a Service Instance Using the SYSTEM HEALTH Tab

      You can manually create or restore a service instance by using the SYSTEM HEALTH tab in the controller's GUI. You may wish to create or restore a service instance after previously harvesting (removing) it because of faulty or unstable behavior.


      Caution


      Only advanced users should perform the tasks described in this procedure or attempt to troubleshoot the services.


      Before You Begin

      You must have successfully installed the Cisco APIC-EM and it must be operational.

      You must have administrator (ROLE_ADMIN) permissions and either access to all resources (RBAC scope set to ALL) or an RBAC scope that contains all of the resources that you want to group. For example, to create a group containing a specific set of resources, you must have access to those resources (custom RBAC scope set to all of the resources that you want to group).

      For information about the user permissions required to perform tasks using the Cisco APIC-EM, see the chapter, Managing Users and Roles in the Cisco Application Policy Infrastructure Controller Enterprise Module Administrator Guide.


        Step 1   Log in to the controller to view the controller's GUI.
        Step 2   Click the SYSTEM HEALTH tab in the Home page to view information about the controller's health.

        For information about what is displayed in the SYSTEM HEALTH tab, see Reviewing the Service Version and Status Using the SYSTEM HEALTH Tab.

        Step 3   Review the list of operational services in the SYSTEM HEALTH tab.

        Each service is represented by a square. A green-tinged square represents an active instance of the service, and a red-tinged square represents a service with a faulty or failed instance. Squares without color represent inactive services (no instances initiated and running).

        Note   

        Placing your cursor over a square displays the version of the service, number of instances running, and host IP address where the service instance is running.

        Step 4   Locate the service for which you want to manually create or restore an instance, and click the addition sign (+) at the upper right.

        You are then prompted to confirm your action to create or restore an instance.

        Step 5   Choose Yes in the dialog box to confirm that you want to create or restore an instance of the service.

        The instance of the service is then spun up.

        When the process is finished, a square representing the new service instance appears.


        What to Do Next

        Manage your services by manually growing (restoring) additional instances or removing (harvesting) instances from the services.