Site Isolation

Feature Summary and Revision History

Summary Data

Table 1. Summary Data

Applicable Product(s) or Functional Area: PCF
Applicable Platform(s): SMI
Feature Default Setting: Enabled – Configuration required to disable
Related Documentation: Not Applicable

Revision History

Table 2. Revision History
Revision Details | Release
Enhancement introduced. Introduced instructions to configure the remote system ID in the secondary site while the primary site is undergoing a site isolation procedure. | 2021.04.0
First introduced. | 2020.02.0

Feature Description

Site isolation segments your PCF environment into silos, each consisting of a cluster or a standalone CDL instance, in a Geographic Redundancy (GR) deployment. Each silo is self-sufficient, with access to dedicated resources and network utilities. With this approach, you can perform upgrades or resolve network issues on the affected site without impacting any other site.

The site isolation strategy protects against data loss by replicating changes between the primary site and the secondary site. The secondary site takes over the primary site's traffic workload whenever the primary site is unavailable. After the maintenance activity is completed, you can bring up the primary site and reinstate it to the previous state to process the requests.

How it Works

This section describes how this feature works.

A site can be unavailable when it is undergoing a maintenance-level upgrade or experiencing a network issue. During this period, the site cannot manage the traffic that clients direct towards it. In such situations, you can isolate the site so that the traffic workload is switched from the primary site to the secondary site.

Configuring the PCF site isolation feature is a simplified process that involves issuing the commands from the PCF Ops Center console of the primary and secondary sites. The primary-secondary-primary switch includes the following:

  1. In the PCF Ops Center of the primary site, set the PCF registration status to UNDISCOVERABLE. If the primary site is unavailable, the client automatically contacts the secondary site. Similarly, when the primary site comes online, the client attempts to connect to the primary site. No manual intervention is required to bring up the secondary site.

    The primary and secondary sites are always synchronized, so the data integrity is maintained.

    To determine whether all the traffic requests are switched successfully to the secondary site, review the traffic status on the Grafana dashboard. Also, verify that the primary site has not received any SBA inbound traffic.

  2. After the traffic is switched to a secondary site, you can bring down the primary site and take the required actions to upgrade or resolve the accessibility issues.


    Note


    If you intend to isolate the site without disrupting the GR replication system, do not shut down the primary site.


  3. In the primary site, ensure that only the Ops Center-specific pods are running in the PCF product namespace. The rest of the pods must be terminated.

  4. After the planned activities are completed on the primary site, and it is ready to be brought back to a consistent state, bring up the primary site.

  5. Ensure that the sessions on the primary site are synchronized with the recent updates on the secondary site. You can verify the CDL changes and compare the CDL local session count on both sites.

Prerequisites

This section describes the prerequisites that must be met to configure the site isolation feature.

Before bringing down a site, ensure that all the in-progress traffic requests are completed.

Configuring the Site Isolation Feature

You can configure the site isolation feature from the PCF Ops Center.

Configuring the site isolation feature involves the following steps:

  1. Configuring the PCF Registration Status

  2. Bringing Down the Primary Site

  3. Determining the Pod Status

  4. Bringing Up the Primary Site

  5. Verifying if the Sessions are Synchronized

  6. Verifying if the Primary Site is Up

Configuring the PCF Registration Status

This section describes how to configure PCF as undiscoverable.

To configure the PCF registration status to undiscoverable, use the following configuration from the PCF Ops Center of the primary site:

config 
  service-registration 
   profile 
    nf-status { REGISTERED | UNDISCOVERABLE } 
    commit 
    exit 

NOTES:

  • config —Enters the configuration mode.

  • service-registration —Enters the service registration configuration mode.

  • profile —Enters the profile configuration mode.

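  • nf-status { REGISTERED | UNDISCOVERABLE } —Specifies the PCF registration status. Set the status to UNDISCOVERABLE to isolate the site, or to REGISTERED to allow PCF to serve traffic.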
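As an illustration only, the following Python sketch builds the command sequence shown above for a given registration status and rejects any value other than the two that the syntax lists. The function name nf_status_commands is hypothetical and is not part of the PCF Ops Center; the sketch only assembles text and does not connect to any system.

```python
from typing import List

# The two values allowed by the nf-status syntax in this section.
VALID_NF_STATUS = ("REGISTERED", "UNDISCOVERABLE")


def nf_status_commands(status: str) -> List[str]:
    """Return the CLI lines that set the PCF registration status.

    Hypothetical helper: it mirrors the configuration block in this
    section line by line and validates the status value first.
    """
    status = status.upper()
    if status not in VALID_NF_STATUS:
        raise ValueError(
            "nf-status must be one of %s, got %r" % (VALID_NF_STATUS, status)
        )
    return [
        "config",
        "service-registration",
        "profile",
        "nf-status %s" % status,
        "commit",
        "exit",
    ]


# Print the sequence used to isolate the primary site.
print("\n".join(nf_status_commands("UNDISCOVERABLE")))
```

Validating the value before generating the commands catches misspellings of the status keyword before anything is pasted into the Ops Center.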

Bringing Down the Primary Site

This section describes how to bring down the primary site and how to configure the remote system ID on the secondary site so that notifications are generated while the primary site is isolated.


Note


If you want to isolate the site without disrupting the GR replication system, do not bring down the primary site.


  1. Bring down the primary site from the PCF Ops Center of the primary site:

    The secondary site takes over the primary site’s traffic when the primary site is down or in the UNDISCOVERABLE state.

    config 
      system mode shutdown 
      commit 
      end 

    NOTES:

    • config —Enter the configuration mode.

    • system mode shutdown —Shut down the site.

  2. Configure the remote system ID on the PCF Ops Center:

    After the primary site becomes unavailable, configure the remote-system-id on the secondary site using the siteID of the primary site.

    config 
      cdl 
        datastore session 
          slot notification remote-system-id [ siteID ] 
        exit
      exit

    NOTES:

    • config —Enter the configuration mode.

    • cdl —Enter the CDL configuration mode.

    • datastore session —Enter the datastore session configuration.

    • slot notification remote-system-id [ siteID ] —Specify the siteID for the primary site. The siteID is associated with the cdl remote-site system-id configuration in the YANG model.

    Sample Configuration

    The following is a sample configuration for specifying the siteID, where 1 is the siteID of site1.

    cdl datastore session
    slot notification remote-system-id [ 1 ]
    exit

For more information on CDL components, see Cisco Common Data Layer documentation.
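The remote system ID configuration can be sketched as a small Python helper that fills in the siteID and returns the command lines in the order shown in step 2. The function name remote_system_id_commands is hypothetical, and treating the siteID as an integer is an assumption based on the sample value 1; the helper only produces text for review.

```python
from typing import List


def remote_system_id_commands(site_id: int) -> List[str]:
    """Return the CLI lines that set remote-system-id on the secondary site.

    Hypothetical helper. Assumes the siteID is an integer, as in the
    sample configuration; the lines mirror the block in step 2.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(site_id, bool) or not isinstance(site_id, int):
        raise TypeError("site_id must be an integer")
    return [
        "config",
        "cdl",
        "datastore session",
        "slot notification remote-system-id [ %d ]" % site_id,
        "exit",
        "exit",
    ]


# Example: site1 (siteID 1) is the isolated primary site.
print("\n".join(remote_system_id_commands(1)))
```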

Determining the Pod Status

This section describes how to verify that only the PCF Ops Center-specific pods are running on the primary site.

To verify that only the Ops Center-specific pods are running in the PCF product namespace, use the following command in the CEE Ops Center of the primary site:

show cluster pods | tab | nomore | include ops-center 

Alternatively, on the master node, use the following command to display the pod status associated with a specific namespace.

kubectl get pods -n pcf_namespace 
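The check can also be scripted. The following Python sketch, offered as one possible approach, scans the default table printed by kubectl get pods and lists every pod whose name does not contain ops-center, which is the same filter string used in the CEE Ops Center command. The function name and the pod names in the sample are made up for illustration.

```python
from typing import List


def non_ops_center_pods(kubectl_output: str) -> List[str]:
    """Return the names of pods that are not Ops Center pods.

    Expects the default `kubectl get pods` table: a header row followed
    by one row per pod, with the pod name in the first column.
    """
    leftovers = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        if "ops-center" not in name:
            leftovers.append(name)
    return leftovers


# Hypothetical output; the pod names are invented for this example.
sample = """\
NAME                                READY   STATUS        RESTARTS   AGE
ops-center-pcf-ops-center-abc123    5/5     Running       0          3d
pcf-rest-ep-0                       1/1     Terminating   0          3d
"""
print(non_ops_center_pods(sample))  # ['pcf-rest-ep-0']
```

An empty list indicates that only Ops Center pods remain in the namespace.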

Bringing Up the Primary Site

This section describes how to bring up the primary site.

  1. Remove the siteID of the primary site from the secondary site on the PCF Ops Center:

    Before bringing up the primary site, remove the primary site's siteID from the secondary site's “remote-system-id” list.

    no cdl datastore session slot notification remote-system-id

    Sample Configuration

    no cdl datastore session slot notification remote-system-id

  2. Bring up the primary site from the PCF Ops Center of the primary site:

    config 
      system mode running 
      commit 
      exit 

    NOTES:

    • config —Enters the configuration mode.

    • system mode running—Configures the system mode as “running”.

Verifying if the Sessions are Synchronized

This section describes how to verify if the sessions are synchronized between the sites.

The site isolation implementation requires that sessions remain synchronized throughout the primary-secondary-primary switch. After the sites are switched, you can validate the synchronization by reviewing the state of the slots and indexes on both sites. If the state of every slot and index is ONLINE, the synchronization is successful. Another approach is to ensure that the local session counts on both sites match. The local session counts converge between the primary and secondary sites as the sessions are replicated.

To display the CDL status in the secondary site, use the following commands on the PCF Ops Center:

  • To display the state of slots and indexes, run the following:

    cdl show status

  • To display the local session count details, run the following:

    cdl show sessions count summary 

Note


  • Ensure that the count mismatch between the sites is minimal. An exact match is not expected because the session count changes with the live traffic.

  • Ensure that each slot and index instance has non-zero records and that its status shows "ONLINE" in the "cdl show status" output.

  • Alternatively, use the Grafana CDL dashboard to view the total session count and the per-slot and per-index records in the respective panels.
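The two checks in this section can be expressed as a short Python sketch. It assumes that the session counts and the per-instance state and record values have already been read from the command output or from Grafana; parsing is left out because the output layout is not shown here. The function names and the 1 percent default tolerance are illustrative assumptions, not values defined by PCF.

```python
from typing import List, Tuple


def sessions_in_sync(primary_count: int, secondary_count: int,
                     tolerance: float = 0.01) -> bool:
    """Return True when the two local session counts are close enough.

    `tolerance` is the allowed difference as a fraction of the larger
    count. The 1% default is an arbitrary illustrative value.
    """
    largest = max(primary_count, secondary_count)
    if largest == 0:
        return True  # both counts are zero, so there is no mismatch
    return abs(primary_count - secondary_count) / largest <= tolerance


def instances_healthy(instances: List[Tuple[str, str, int]]) -> bool:
    """Return True when every slot or index instance is ONLINE with records.

    Each entry is (instance name, state, record count).
    """
    return bool(instances) and all(
        state == "ONLINE" and records > 0 for _, state, records in instances
    )


# Hypothetical values for illustration.
print(sessions_in_sync(1000, 995))
print(instances_healthy([("slot-1", "ONLINE", 500), ("index-1", "ONLINE", 500)]))
```

Comparing counts with a tolerance rather than for equality reflects the note above that live traffic keeps changing the session count.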


Verifying if the Primary Site is Up

This section describes how to confirm if the primary site is brought up successfully.

To verify whether the primary site is up, review the deployment status and the readiness percentage using the following commands on the PCF Ops Center:

show system status deployed 
show system status percent-ready 

Example:

The following example displays the output of the show system status deployed and show system status percent-ready commands:

system status deployed true
system status percent-ready 100.0

NOTES:

  • The deployment status of the system must be true.

  • The percent-ready value of the system must be 100.

  • When the primary site is available, change the nf-status from UNDISCOVERABLE to REGISTERED to enable PCF to serve the SBI traffic. For information on how to change the nf-status, see Configuring the PCF Registration Status.
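The first two conditions can be checked programmatically. The following Python sketch parses output in the two-line form shown in the example above and reports whether the system is deployed and fully ready. The function name primary_site_ready is hypothetical, and the sketch assumes the output lines look exactly like the example.

```python
def primary_site_ready(show_output: str) -> bool:
    """Return True when the output reports deployed true and 100 percent ready.

    Expects lines in the form shown in the example:
        system status deployed true
        system status percent-ready 100.0
    """
    deployed = None
    percent = None
    for line in show_output.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[:2] != ["system", "status"]:
            continue  # ignore anything that is not a status line
        if parts[2] == "deployed":
            deployed = parts[3] == "true"
        elif parts[2] == "percent-ready":
            percent = float(parts[3])
    return deployed is True and percent is not None and percent >= 100.0


example = "system status deployed true\nsystem status percent-ready 100.0\n"
print(primary_site_ready(example))  # True
```

If the function returns True, the remaining step is to set the nf-status back to REGISTERED as described in the note above.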