Monitor Service Health


Start Service Health monitoring

Before you begin

The following procedure assumes that you have already provisioned L2VPN/L3VPN services. To create and provision services, refer to the Orchestrated Service Provisioning chapter in the Cisco Crosswork Network Controller Solution Workflow Guide.

To start health monitoring for a service:

Procedure


Step 1

From the main menu, choose Services & Traffic Engineering > VPN Services. The map opens on the left side of the page and the table opens on the right side.

Step 2

In the Actions column, click the options icon for the service you want to start monitoring.

Step 3

Click Start Monitoring.

Note

 
The Health column color coding indicates the health of the service:
  • Blue = Initiated

  • Green = Good

  • Orange = Degraded

  • Red = Down

  • Gray = Not Monitoring

  • Yellow = Paused

  • Red = Error

The Health column is gray for all services that are not currently being monitored.

Step 4

In the Monitor Service window that appears, do the following:

  1. Select the Monitoring Level as Basic Monitoring or Advanced Monitoring.

  2. From the list of displayed profiles, click the Configuration Profile that you want to apply to monitor the service.

Step 5

Click Start Monitoring.

Note

 

After you start monitoring the health of the service, click the options icon in the Actions column to view additional Service Health options: Stop Monitoring, Pause Monitoring, Edit Monitoring Settings, and Assurance Graph.


After you start monitoring the service, the Health column of the service is updated to reflect the health of the service.
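The Health column legend from Step 3 can be expressed as a small lookup table, for example when post-processing an exported list of services. This is a minimal sketch; the `HEALTH_COLORS` dictionary and the `health_color` helper are hypothetical and not part of any Crosswork API. Note that Down and Error share the color red, so the color alone cannot distinguish them.

```python
from typing import Optional

# Hypothetical lookup table mirroring the Health column legend.
HEALTH_COLORS = {
    "Initiated": "Blue",
    "Good": "Green",
    "Degraded": "Orange",
    "Down": "Red",
    "Not Monitoring": "Gray",
    "Paused": "Yellow",
    "Error": "Red",
}


def health_color(monitored: bool, health: Optional[str] = None) -> str:
    """Return the Health column color for a service (illustrative only)."""
    if not monitored:
        # Services that are not currently monitored are always shown as gray.
        return HEALTH_COLORS["Not Monitoring"]
    if health not in HEALTH_COLORS:
        raise ValueError(f"unknown health state: {health!r}")
    return HEALTH_COLORS[health]


print(health_color(True, "Degraded"))  # Orange
print(health_color(False))             # Gray
```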

What to do next

If Service Health reports the health as Degraded for the service, identify the root cause for service degradation and take measures to correct the issue. See Analyze Degraded Services for more information.

Adjust Monitoring Settings

The following topics explain the various monitoring settings you can use to adjust the service health monitoring.

Edit Existing Monitoring Settings

You can adjust the monitoring settings at any time after service health monitoring is enabled. You can change the Monitoring Level for the service from Basic Monitoring to Advanced Monitoring, or vice versa. You can also change the Configuration Profile, for example, from the Gold profile to the Silver profile, or vice versa.

To edit the existing monitoring settings:

Procedure


Step 1

From the main menu, choose Services & Traffic Engineering > VPN Services. The map opens on the left side of the page and the table opens on the right side.

Step 2

In the Actions column, click the options icon for the service whose monitoring settings you want to edit.

Step 3

Choose Edit Monitoring Settings from the menu.

The Edit Monitoring Settings dialog box appears.

Step 4

Choose the Monitoring Level or the Configuration Profile, as required.

Note

 

When switching between Advanced and Basic Monitoring, it may take more than 15 minutes for subservice health and active symptoms to be displayed.

Step 5

Click Save.

A confirmation dialog box appears.

Step 6

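Click Start monitoring-type Monitoring, where monitoring-type is the Monitoring Level you selected (for example, Start Advanced Monitoring).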

Crosswork Service Health starts monitoring the service health using the updated values.


What to do next

If Service Health reports the health as Degraded for the service, identify the root cause for service degradation and take measures to correct the issue. See Analyze Degraded Services for more information.

Pause and Resume Service Health Monitoring

Use this option to temporarily pause monitoring the health of services. This is useful when a service is down because of a reported outage or scheduled maintenance and you do not want to keep receiving notifications about the degradation. When you resume monitoring a paused service, Service Health uses the same Basic or Advanced monitoring rule and profile options that were in effect before the pause. Historical data and Events of Significance (EOS) are preserved in the history of the service. Because no data is collected while monitoring is paused, the historical data contains gaps for the paused period after monitoring resumes.
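The pause and resume behavior described above can be modeled as a sequence of data-collection windows: pausing closes the current window, resuming opens a new one with the same level and profile, and the space between windows is the expected gap in historical data. This is a hypothetical sketch for reasoning about the gaps; `ServiceMonitor` and its methods are illustrative names, not a Crosswork API, and times are plain minute counts.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ServiceMonitor:
    """Hypothetical model of one service's monitoring settings and data windows."""
    level: str    # "Basic" or "Advanced"
    profile: str  # Configuration Profile name, for example "Gold"
    # Collection windows as (start, end) in minutes; end=None means still collecting.
    windows: List[Tuple[int, Optional[int]]] = field(default_factory=list)

    @property
    def paused(self) -> bool:
        return bool(self.windows) and self.windows[-1][1] is not None

    def start(self, t: int) -> None:
        self.windows.append((t, None))

    def pause(self, t: int) -> None:
        # No data is collected while paused, so close the open window.
        start, _ = self.windows[-1]
        self.windows[-1] = (start, t)

    def resume(self, t: int) -> None:
        # level and profile are left unchanged, and earlier windows are kept.
        self.windows.append((t, None))

    def gaps(self) -> List[Tuple[int, int]]:
        """Periods with no data between consecutive collection windows."""
        return [(prev[1], nxt[0]) for prev, nxt in zip(self.windows, self.windows[1:])]


m = ServiceMonitor(level="Advanced", profile="Gold")
m.start(0)
m.pause(60)     # for example, a scheduled maintenance window
m.resume(180)
print(m.level, m.profile, m.gaps())  # Advanced Gold [(60, 180)]
```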

To pause and resume monitoring the health of the services, do the following:

Procedure


Step 1

From the main menu, choose Services & Traffic Engineering > VPN Services. The map opens on the left side of the page and the table opens on the right side.

Step 2

In the Actions column, click the options icon for the service for which you want to pause monitoring.

Step 3

Choose Pause Monitoring from the menu.

A confirmation dialog box appears. Click Pause Monitoring.

Note

 

When monitoring is paused, you can still view the Assurance Graph. It shows only the top-level service, marked with a paused badge, and no child subservices.

Step 4

In the Actions column, click the options icon for the paused service. The menu now shows the Resume Monitoring option. Click it to resume monitoring the service health.

A confirmation dialog box appears. Click Resume Monitoring.

Crosswork Service Health begins monitoring the health of the service using the same monitoring rule and profile options that were used before the Pause action.


Stop Service Health Monitoring

When you stop monitoring a service, Service Health prompts you to confirm whether to retain the historical monitoring data. If you retain the data, the historical data for the service is still available when you start monitoring the service again. If you do not retain the data, all monitoring settings are deleted and the historical service data expires or is purged, even if you start monitoring the service again later. In addition, the Assurance Graph for the stopped service is no longer available, and you must start monitoring the service again to begin collecting service data anew.

If you stop monitoring a service and do not select the Retain historical Monitoring service for the data check box, the monitoring settings are deleted.
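The effect of the retain-history check box can be summarized as a small function. This is a hypothetical sketch, not a Crosswork API: `StoppedService` and `stop_monitoring` are illustrative names. The text above does not say what happens to the monitoring settings when history is retained, so the sketch's choice to keep them is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoppedService:
    """Hypothetical summary of what is left after Stop Monitoring."""
    history: List[float] = field(default_factory=list)
    settings: Optional[Dict[str, str]] = None


def stop_monitoring(history: List[float], settings: Dict[str, str],
                    retain_history: bool) -> StoppedService:
    """Model the effect of the retain-history check box in the Stop Monitoring dialog."""
    if retain_history:
        # Check box selected: history stays available for a later restart.
        # Assumption: this sketch also keeps the settings; the text does not say.
        return StoppedService(history=list(history), settings=dict(settings))
    # Check box cleared: monitoring settings are deleted and history is purged.
    return StoppedService()


kept = stop_monitoring([1.2, 1.4], {"level": "Basic", "profile": "Gold"}, retain_history=True)
dropped = stop_monitoring([1.2, 1.4], {"level": "Basic", "profile": "Gold"}, retain_history=False)
print(kept.history, dropped.history, dropped.settings)  # [1.2, 1.4] [] None
```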

To stop monitoring the health of a service, do the following:

Procedure


Step 1

From the main menu, choose Services & Traffic Engineering > VPN Services. The map opens on the left side of the page and the table opens on the right side.

Step 2

In the Actions column, click the options icon for the service you want to stop monitoring.

Step 3

Choose Stop Monitoring from the menu.

Step 4

The Stop Monitoring dialog box appears. To retain the historical service data for that service, select the Retain historical Monitoring service for the data check box.

Step 5

Click Stop Monitoring.

Step 6

If you selected the Retain historical Monitoring service for the data check box when you stopped monitoring, you can later restart monitoring of the same service with its historical data still available. In the Actions column, click the options icon for the service and choose Start Monitoring.

Note

 
If External Storage is configured and you are monitoring a large number of services, you can ensure that the historical data of the stopped and restarted service is preserved for continued monitoring and inspection. For more information, see Configure Additional External Storage.

Enable SR-PM Monitoring for TE Policies

To measure the performance metrics of VPN services associated with SR-MPLS or RSVP-TE Traffic Engineering policies, Service Health leverages the Segment Routing Performance Measurement (SR-PM) feature. This feature enables delay measurement and liveness detection on the underlay SR-TE policy to enforce Service Level Agreements in VPN services.

The SR-PM metric data is used to compute the policy subservice health and to determine the SLA for the service. You can view the KPI metrics, as well as the operational and administrative status of the service, on the policy tab of the Service Details page. The combination of probe metrics and administrative and operational status helps determine the health of the subservice for the VPN instance. This data also provides historical data for the policy and indicates the end-to-end delay experienced by traffic sent over an SR-TE or RSVP-TE policy. This information is used to determine whether the delay violated the SLAs defined in the Heuristic package.
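To make the combination of status and delay concrete, the following hypothetical sketch derives a policy subservice health value from one SR-PM sample. The actual rules and thresholds live in the Heuristic package and are not documented here; `sla_max_delay_us`, the microsecond unit, and the Down/Degraded/Good mapping are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PolicySample:
    """One SR-PM sample for a TE policy (field names are illustrative)."""
    delay_us: float   # end-to-end delay, assumed to be in microseconds
    admin_up: bool
    oper_up: bool


def policy_health(sample: PolicySample, sla_max_delay_us: float) -> str:
    """Hypothetical rule combining status with a delay SLA threshold.

    sla_max_delay_us stands in for a threshold from the Heuristic package;
    the actual rules there may differ.
    """
    if not (sample.admin_up and sample.oper_up):
        return "Down"
    if sample.delay_us > sla_max_delay_us:
        return "Degraded"  # measured delay violates the SLA
    return "Good"


print(policy_health(PolicySample(800.0, True, True), sla_max_delay_us=1000.0))   # Good
print(policy_health(PolicySample(1500.0, True, True), sla_max_delay_us=1000.0))  # Degraded
```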

Enable SR-PM Metrics Collection

To view the KPI metrics for the TE policies, you must first enable SR-PM metrics collection on the device for the TE policy and in Crosswork.

Procedure


Step 1

Configure SR-PM metrics collection in Crosswork.

  1. From the main menu, choose Administration > Settings > Performance Monitoring and Analytics > Performance Metrics.

  2. In the Performance Metrics Settings pane on the right, use the toggle button to enable SR-PM collection for SR-Policies.

  3. Click Save.

Step 2

Configure SR-PM collection on the device.

  1. Navigate to the Traffic Engineering Topology map. From the main menu, choose Services & Traffic Engineering > Traffic Engineering.

  2. Click the SR-MPLS or RSVP-TE tab as required.

  3. Locate the policy in the policy table, click the options icon for that policy, and choose View Details.

    The policy details page opens.

  4. On this page, click the Actions button and choose Edit/Delete.

    This takes you to the Provision (NSO) page and opens the Edit Policy page.

  5. Scroll down to the performance measurement section.

  6. Toggle the Enable performance-measurement button.

  7. Under Profile-type, click delay or liveness and click the toggle button in the profile to enable collection.

    Note

     

    You can manually configure either Delay or Liveness on the device, but not both together. For details, see the documentation for your device platform, for example, the Segment Routing Configuration Guide for NCS 540 Series Routers.

    If you enable the Delay and Delay Variance metrics and disable the Liveness metric (or vice versa), the updated metric polling takes effect only from the next cadence. This is expected behavior.
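If you prepare performance-measurement settings in a script before entering them in the Edit Policy page or on the device, a pre-check can enforce the rule in the note above that Delay and Liveness are not configured together. This is a minimal sketch; representing the enabled profile types as a set of strings and the `check_pm_profile_types` name are assumptions, not a Crosswork or NSO interface.

```python
from typing import Set

PROFILE_TYPES = {"delay", "liveness"}


def check_pm_profile_types(enabled: Set[str]) -> None:
    """Raise ValueError if a policy's PM config enables both delay and liveness."""
    unknown = enabled - PROFILE_TYPES
    if unknown:
        raise ValueError(f"unknown profile type(s): {sorted(unknown)}")
    if len(enabled) > 1:
        raise ValueError("enable either delay or liveness, not both")


check_pm_profile_types({"delay"})  # passes silently
try:
    check_pm_profile_types({"delay", "liveness"})
except ValueError as err:
    print(err)  # enable either delay or liveness, not both
```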


View SR-MPLS Policy Performance Metrics

This procedure lists the steps for viewing KPI metrics for an SR-MPLS policy. KPI metrics include Delay and Delay Variance (Jitter), or Liveness (a Boolean value), along with traffic utilization.

Before you begin

Ensure that the devices and policies have been added and that device groups have been created.

Procedure


Step 1

Navigate to the Traffic Engineering topology map. From the main menu, choose Services & Traffic Engineering > Traffic Engineering.

Step 2

Click the policy tab that you are interested in.

For example, to view policy performance metrics for SR-MPLS policies, click the SR-MPLS tab. Hover your mouse over the graph icon to view the KPI metrics in a carousel view.

Figure 1. SR-MPLS Policy Performance Metrics in the Traffic Engineering Table

Step 3

Alternatively, in the Actions column, click the options icon for one of the SR-MPLS policies and choose View Details. The Service Details page opens.

The KPI metrics for the policy are available in the Performance Metrics section.


View RSVP-TE Policy Performance Metrics

This procedure lists the steps for viewing KPI metrics for an RSVP-TE policy. KPI metrics include Delay and Delay Variance (Jitter) along with utilization.


Note


The metric Liveness is not supported for RSVP-TE policies.
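The difference between the SR-MPLS and RSVP-TE KPI sets can be captured in a small record type that rejects Liveness for RSVP-TE policies. This is a hypothetical container for illustration; the field names are not taken from any Crosswork API, and the units of the metrics are not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TePolicyKpis:
    """Hypothetical container for the KPIs shown for one TE policy."""
    policy_type: str                        # "SR-MPLS" or "RSVP-TE"
    utilization: float                      # traffic utilization (units not specified here)
    delay: Optional[float] = None
    delay_variance: Optional[float] = None  # jitter
    liveness: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.policy_type not in ("SR-MPLS", "RSVP-TE"):
            raise ValueError(f"unknown policy type: {self.policy_type}")
        if self.policy_type == "RSVP-TE" and self.liveness is not None:
            raise ValueError("Liveness is not supported for RSVP-TE policies")


sr = TePolicyKpis("SR-MPLS", utilization=0.35, liveness=True)
rsvp = TePolicyKpis("RSVP-TE", utilization=0.60, delay=1200.0, delay_variance=40.0)
print(sr.liveness, rsvp.delay)  # True 1200.0
```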


Before you begin

Ensure that the devices and policies have been added and that device groups have been created.

Procedure


Step 1

From the main menu, choose Services & Traffic Engineering > Traffic Engineering.

Step 2

From the Traffic Engineering window, select the RSVP-TE tab.

Hover your mouse over the graph icon to view the KPI metrics in a carousel view.

Figure 2. RSVP TE Tunnel Performance Metrics in the Traffic Engineering Table

Step 3

Alternatively, in the Actions column, click the options icon for one of the RSVP-TE policies and choose View Details. The Service Details page opens.

The KPI metrics for the policy are available in the Performance Metrics section.


Monitor Service Health using Accedian Skylight

Crosswork Network Controller can leverage external probing, provided by Accedian Skylight, to measure metrics of the L3VPN services in the network. The metrics are compared with the contracted SLA (defined in the Heuristic package), and the results are made available on the UI for further analysis.

High-level Flow

  1. When you provision an L3VPN service with probe intent and service monitoring is enabled, Accedian Skylight learns the probe intent and probe topology from the provisioned service.

    The following probe intents are supported:

    • Agent configurations: ne-id, VLAN, IP, sub-interface.

    • Topology: point-to-point, hub-spoke, full-mesh.

  2. Probe sessions with Accedian Skylight are set up automatically to monitor the service by invoking the relevant RESTConf APIs. The RESTConf APIs invoked to provision probe sessions are endpoint, session, service, and session activation. The number of probe sessions per service is capped at 200 for all connection types.

  3. Service Health monitors a new subservice, subservice.probe.session.health, and collects the following metrics for the service in the probe session:

    • Forward and Reverse Delay.

    • Forward and Reverse Variance.

    • Forward and Reverse Packet Loss.

  4. The metrics collected during the probe sessions are analyzed and symptoms are raised accordingly, which are then displayed on the Crosswork Network Controller UI.
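To estimate whether a planned service stays within the 200-session cap, you can count the endpoint pairs implied by each supported topology. This sketch assumes one probe session per unordered endpoint pair (forward and reverse metrics are reported within a session); the real session layout used by Accedian Skylight may differ. The `probe_pairs` function is hypothetical, and how the product handles a service above the cap is not described here, so the sketch simply raises an error.

```python
from itertools import combinations
from typing import List, Tuple

MAX_SESSIONS_PER_SERVICE = 200  # documented cap, for all connection types


def probe_pairs(topology: str, endpoints: List[str], hub: str = "") -> List[Tuple[str, str]]:
    """Endpoint pairs to probe, assuming one session per unordered pair."""
    if topology == "point-to-point":
        if len(endpoints) != 2:
            raise ValueError("point-to-point needs exactly two endpoints")
        pairs = [(endpoints[0], endpoints[1])]
    elif topology == "hub-spoke":
        if hub not in endpoints:
            raise ValueError("hub-spoke needs a hub that is one of the endpoints")
        pairs = [(hub, spoke) for spoke in endpoints if spoke != hub]
    elif topology == "full-mesh":
        pairs = list(combinations(endpoints, 2))
    else:
        raise ValueError(f"unsupported topology: {topology}")
    if len(pairs) > MAX_SESSIONS_PER_SERVICE:
        raise ValueError(f"{len(pairs)} sessions exceed the cap of {MAX_SESSIONS_PER_SERVICE}")
    return pairs


pes = [f"pe{i}" for i in range(20)]
print(len(probe_pairs("full-mesh", pes)))  # 190, within the cap
```

Under this assumption, a full mesh grows as n(n-1)/2, so 20 endpoints fit within the cap while 21 endpoints (210 pairs) do not.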

Add Accedian Skylight as a Provider

Before you begin

Ensure that you have taken care of the following prerequisites before onboarding Accedian Skylight as a provider:

  1. Installed the Accedian Skylight software. Refer to the Accedian Skylight documentation for information on installing Accedian Skylight and deploying it with Crosswork Network Controller.


    Note


    You need an account with Accedian Skylight to access the documentation. Sign up and create an account with the self sign-up tool.


  2. Downloaded the following certificates from Accedian Skylight to your local system or to a folder that can be accessed by Crosswork Network Controller:

    • CA certificate

    • Client certificate

    • Client key
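As an optional local sanity check before you upload the certificates, you can confirm that the three files are present and that the CA certificate, client certificate, and client key load together, using Python's standard `ssl` module. This is a sketch, not a Crosswork tool; it assumes the files keep the names used later in this section (ca_cert.pem, client_cert.pem, and client_key.key) and that the optional passphrase protects the client key.

```python
import ssl
from pathlib import Path
from typing import List, Optional

# File names as used in the certificate profile step; adjust to match your download.
CERT_FILES = ("ca_cert.pem", "client_cert.pem", "client_key.key")


def missing_cert_files(folder: str) -> List[str]:
    """Return the expected certificate files that are not present in folder."""
    return [name for name in CERT_FILES if not (Path(folder) / name).is_file()]


def build_mutual_tls_context(folder: str, passphrase: Optional[str] = None) -> ssl.SSLContext:
    """Load the CA, client certificate, and key into a client-side TLS context.

    ssl raises ssl.SSLError if a file does not parse or the key does not
    match the client certificate.
    """
    missing = missing_cert_files(folder)
    if missing:
        raise FileNotFoundError(f"missing: {', '.join(missing)}")
    base = Path(folder)
    ctx = ssl.create_default_context(cafile=str(base / "ca_cert.pem"))
    ctx.load_cert_chain(certfile=str(base / "client_cert.pem"),
                        keyfile=str(base / "client_key.key"),
                        password=passphrase)
    return ctx
```

For example, calling `build_mutual_tls_context("/path/to/skylight-certs")` (a placeholder path) either returns a context or raises an error that points at the problem file.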

Procedure


Step 1

Create a credential profile.

  1. Navigate to Administration > Device Management > Credential Profiles and click + to create a new profile.

  2. Enter a name and add the HTTPS and gNMI credential protocols. Enter the username and password for both connections.

  3. Click Save.

Step 2

Create a certificate profile.

  1. Navigate to Administration > Certificate Management and click +.

  2. Enter a name and select Accedian Provider Mutual Auth as the Certificate Role.

  3. Upload the certificates (ca_cert.pem, client_cert.pem, and client_key.key).

  4. (Optional) Enter the passphrase for the certificate chain.

  5. Click Save.

Step 3

Add Accedian Skylight as a provider in Crosswork Network Controller.

  1. Navigate to Administration > Manage Provider Access.

  2. Click + and enter details in the fields as follows:

    • Provider Name: Enter a name.

    • Credential profile: Select the credential profile that you created for Accedian Skylight.

    • Family: Select ACCEDIAN_PROXY.

    • Certificate profile: Select the Accedian Skylight certificate profile that you created in Step 2.

      Note

       

      This field is displayed after you select the Family as ACCEDIAN_PROXY.

    • Connection types: Supported protocols are automatically updated from the Accedian credential profile.

    • IP addresses: Enter the IP address or the Fully Qualified Domain Name (FQDN).

    • Ports: Enter 443 for HTTPS and a port value for gNMI.

    • Encoding Type: Select PROTO.

      Note

       

      Only encoding of type PROTO is supported.

  3. Click Save.

Step 4

Confirm reachability.

  1. From the main menu, choose Administration > Manage Provider Access.

  2. Confirm that the Accedian Skylight provider shows a green Reachability status without any errors.
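If you keep the provider details in a script or inventory file before entering them in the provider form, a pre-check can catch the constraints listed in Step 3: Family ACCEDIAN_PROXY, a certificate profile, port 443 for HTTPS, a gNMI port, and PROTO encoding. This is a hypothetical sketch; `AccedianProviderSpec` does not correspond to a Crosswork API, and the host name and gNMI port in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AccedianProviderSpec:
    """Hypothetical mirror of the provider form fields for Accedian Skylight."""
    provider_name: str
    credential_profile: str
    certificate_profile: str
    host: str                                            # IP address or FQDN
    ports: Dict[str, int] = field(default_factory=dict)  # keyed by protocol
    family: str = "ACCEDIAN_PROXY"
    encoding_type: str = "PROTO"

    def problems(self) -> List[str]:
        """Return the form constraints that this spec violates."""
        issues = []
        if self.family != "ACCEDIAN_PROXY":
            issues.append("Family must be ACCEDIAN_PROXY")
        if not self.certificate_profile:
            issues.append("Certificate profile is required")
        if self.ports.get("HTTPS") != 443:
            issues.append("HTTPS port must be 443")
        if not 0 < self.ports.get("gNMI", 0) < 65536:
            issues.append("a valid gNMI port is required")
        if self.encoding_type != "PROTO":
            issues.append("only PROTO encoding is supported")
        return issues


spec = AccedianProviderSpec("skylight", "accedian-creds", "accedian-certs",
                            host="skylight.example.com", ports={"HTTPS": 443, "gNMI": 50051})
print(spec.problems())  # []
```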


View Accedian Skylight Probe Session Details

To view the metrics from an Accedian Skylight probe session for an L3VPN service in the Crosswork Network Controller UI:

Before you begin

Ensure that you have completed the following steps:

  1. Created and provisioned the required L3VPN services with the supported probe intent. For details, see the "Orchestrated Service Provisioning" chapter in the Cisco Crosswork Network Controller Solution Workflow Guide.

  2. Enabled service monitoring for the required services. See Start Service Health monitoring.

Procedure


Step 1

Go to Services & Traffic Engineering > VPN Services. The map opens on the left side of the screen and the table opens on the right side of the screen.

Step 2

Click the Probe Sessions tab on the Service Details page.

Step 3

Click the graph icon next to a probe session for a detailed view of the performance metrics.

If a metric has crossed the defined threshold, a red icon is displayed in the corresponding performance metrics dashlet.

Step 4

To view the Performance Metrics for a service in a carousel view, click the icon in the Actions column.

The Probe Session Details window opens displaying the metrics in a carousel view.

The Historical Data tab provides data for all probe metrics from the last 24 hours, or from the service creation time if that is more recent. See View Historical Data from Accedian Skylight Probe Sessions for more information.


What to do next

  • Analyze the health of a degraded service. See Analyze Degraded Services for more information.

  • If an L3VPN service has probe provisioning errors, use the Reactivate Probe button to start the probe session again for the service. If the probe session is reactivated successfully, the Probe Sessions page is updated automatically to reflect the updated metrics.

View Historical Data from Accedian Skylight Probe Sessions

The Historical Data tab provides data for all probe metrics from the last 24 hours, or from the time service monitoring was enabled if that is more recent.


Note


The database retains data for up to 24 hours; some data may be kept for additional time before it is purged.


In the current system behavior, when a monitoring session for a service is deleted and re-added, the Historical Data tab for that service still provides probe metrics data from the time monitoring was enabled for the service or from the last 24 hours, whichever is more recent. This is because probe metric data is stored per device, not per session or subservice.

Consider the scenario where monitoring for a service (consisting of up to 200 sessions) is enabled at 8:00 AM, and a monitoring session is removed at 9:00 AM, and the same session is re-added at 10:00 AM. At 9:00 AM the next day, you would expect to see historical data for this session from 10:00 AM the previous day. However, the actual behavior is that the historical data for the deleted and re-added session shows data from 9:00 AM the previous day.
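The start of the visible history window in this scenario can be computed directly: it is the later of "now minus 24 hours" and the time monitoring was enabled for the service, regardless of when an individual session was re-added. This is a minimal sketch with arbitrary example dates; it ignores the extra retention that may occur before data is purged, and `history_window_start` is an illustrative name.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)


def history_window_start(now: datetime, monitoring_enabled_at: datetime) -> datetime:
    """Earliest time shown on the Historical Data tab (sketch).

    Probe data is stored per device, so deleting and re-adding a session does not
    move this start time; only the service's enable time and retention matter.
    """
    return max(now - RETENTION, monitoring_enabled_at)


enabled = datetime(2024, 1, 1, 8, 0)  # monitoring enabled at 8:00 AM
now = datetime(2024, 1, 2, 9, 0)      # checked at 9:00 AM the next day
print(history_window_start(now, enabled))  # 2024-01-01 09:00:00
```

The result is 9:00 AM the previous day, not 10:00 AM when the session was re-added, which matches the behavior described above.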

Known Issues and Limitations with Accedian Skylight

The following is a list of known issues and limitations when Accedian Skylight is deployed for probing service health:

  1. When monitoring is enabled for a service with probe intent but the Accedian Skylight provider is not added in Crosswork Network Controller, an error about the provider not being available is displayed for each of the probe metrics associated with the subservice.

  2. You cannot delete the Accedian Skylight provider when a probe session is active.

  3. The Active Symptoms tab displays the observed value of the metric at the time the symptom occurred, while the Probe Sessions tab is constantly updated with the live values of the metrics. Therefore, check the Probe Sessions tab for the real-time values of the performance metrics.

  4. The Accedian Skylight provider is always shown as reachable on the Crosswork Network Controller Providers list page (Administration > Manage Provider Access), in spite of the following issues:

    • Connection issues between Accedian Skylight and Crosswork Network Controller.

    • Accedian Skylight provider credentials such as certificates, ports or IP addresses are wrong or invalid.

    In such cases, services related to the Accedian Skylight probes are shown in the Degraded health state with the symptom 'Accedian provider does not exist in DLM'. The symptoms are not cleared until you add the Accedian Skylight provider again and then pause and resume monitoring.
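Because of the fourth issue above, a green Reachability status alone does not confirm connectivity to Accedian Skylight. As a minimal sketch, you can check from a host with a network path similar to Crosswork Network Controller whether the Skylight HTTPS and gNMI ports accept TCP connections. This only shows that a port is open; it does not validate certificates, credentials, or IP address settings in the provider entry.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Only proves that the port accepts connections; certificates and credentials
    are not checked.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False
```

For example, `tcp_reachable("skylight.example.com", 443)` checks the HTTPS port; the host name is a placeholder, and you would repeat the call with the gNMI port configured for the provider.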