These topics describe how to use the high availability scripts to perform a switchover between two active sites, perform a failover when a primary site fails, restore the configuration of a failed site, and other high availability operations.
Use the primeha -switch command to perform a scheduled move from the primary site to a remote site when both sites are available. This planned move, initiated by an administrator, is called a switchover.
The primeha -switch command uses the inputs you provided when you installed the gateway server high availability solution, but it also gives you an opportunity to modify those settings before performing the switchover.
You can also use the switchover command to fall back to the primary site when a failed server is brought back online; the switchover again reverses the replication direction. After performing a manual switchover, move any AVMs from unreachable units at the primary site to reachable units at the remote site.
Note The primeha -switch command must be run from the server that contains the primary active database.
Step 1 Log into the server that contains the primary active database. (You can validate this by running primeha -status.)
Step 2 Move to the proper directory and start the script. The script will use the inputs you provided when you installed the gateway server high availability solution but will also give you an opportunity to modify those settings before performing the switchover.
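Steps 1 and 2 can be sketched as a short shell session. The scripts directory shown is an assumption based on typical Prime Network installations; substitute the HA utilities directory of your deployment:

```shell
# Step 1: confirm this server holds the primary active database
perl primeha -status

# Step 2: move to the HA scripts directory (path is an assumption;
# use the directory where the HA utilities live on your gateway)
cd /var/adm/cisco/prime-network/scripts/ha/util

# Start the switchover; the script replays the inputs given at install
# time and lets you edit them before proceeding
perl primeha -switch
```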
Step 3 Approve or edit the default choices that appear based on the inputs you provided when you installed the gateway server high availability:
– From Prime Network 5.2 onwards, when you perform a switchover from the active server to the standby server (or a failover from the server), switch_over.pl and the other failover scripts prompt for the "AD Search Scope" option. Both the LDAP settings and the AD Search Scope are stored in the registry file.
Parameters for the switchover process:
Otherwise, proceed to Step 4.
| Prompt | Description |
|---|---|
| IP address of the remote gateway server on which Prime Network will run | IP address of the standby gateway at the remote site. If the remote site is a member of a dual-node cluster, use the floating IP address. |
| Root password for the remote node on which Prime Network will run | For the remote site gateway server, the root password for the operating system (required for SSH). |
| IP address of the remote server on which the Oracle database will run | IP address of the standby database at the remote site. If the remote site is a member of a dual-node cluster, use the floating IP address. |
| Root password of the remote server on which the Oracle database will run | For the remote database, the root password for the operating system (required for SSH). |
| LDAP parameters | If the system uses LDAP (external authentication) for user authentication, provide the LDAP parameters (see the Cisco Prime Network 5.2 Administrator Guide). |
Step 4 Confirm that you want to continue with the switchover. Prime Network proceeds and displays text similar to the following.
Step 5 If required, manually move the AVMs from the unreachable units at the primary site to the reachable units at the remote site. For information on moving and deleting AVMs, see the Cisco Prime Network 5.2 Administrator Guide. (This is not required if the local units were not affected by a failure; the script will reconfigure the units to use the relevant gateway and database.)
Step 6 Verify that the new gateway IP address and database IP addresses are correct. If needed, switch the IP address manually using one of the following procedures:
Step 7 To verify the setup, perform all of the tests (not including the step for creating database links) that are described in Verifying the Geographical Redundancy Setup.
Note To make cross-launch work on an upgraded setup, deregister and re-register Prime Network after the Prime Central and Prime Network switchover.
Use the following procedure to perform a switchover from the local node (also the local cluster node) to the remote disaster recovery (DR) server (S2) on systems that have the Prime Network Integration Layer (PN-IL) installed on top of Prime Network.
Step 1 As the root user, log in to the Prime Network primary server and perform a switchover to the DR node, using the procedure in Performing a Scheduled Site Move.
Step 2 After the switchover, log in to the DR node as the Prime Network user.
Step 3 Enable the PN-IL health monitor. (The health monitor brings up the PN-IL service if it is down.)
Step 4 After the primary server is up, log in to the Prime Network DR node as the root user and perform the switchover back to the primary node, using the procedure in Performing a Scheduled Site Move.
Step 5 After the switchover, log in to the DR node as the Prime Network user and disable the PN-IL health monitor.
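Steps 3 and 5 toggle the PN-IL health monitor from the Prime Network user's shell. The sketch below is an assumption: $PRIMEHOME and the il-watch-dog.sh path are placeholders for illustration, so check your PN-IL installation for the actual script location:

```shell
# Step 3 (after switchover to the DR node): enable the health monitor,
# which brings up the PN-IL service if it is down.
# NOTE: $PRIMEHOME and the script path are assumptions for illustration.
$PRIMEHOME/local/scripts/il-watch-dog.sh enable

# Step 5 (after switching back to the primary node): disable it again
$PRIMEHOME/local/scripts/il-watch-dog.sh disable
```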
Note A manual failover should only be performed when the primary site has failed.
Use the primeha -fail command to perform a site failover for disaster recovery. A site failover is a manual move from the failed primary site to the standby site at the remote location. The script uses the inputs you provided when you installed the gateway server high availability solution, but it also gives you an opportunity to modify those settings before performing the failover. When you invoke primeha -fail, the command does the following:
After performing a manual failover, move any AVMs from unreachable units at the primary site to reachable units at the remote site.
If you are using Operations Reports, the data from the past 1 hour and 20 minutes will be lost.
Note The failover must be run from the node that contains the standby database. If the system is using external authentication (LDAP), you will have to provide the LDAP URL, distinguished name prefix and suffix, and the protocol (see Configuring Prime Network to Communicate with the External LDAP Server in the Cisco Prime Network 5.2 Administrator Guide).
Step 1 As a root user, log into the active node that contains the standby database. (You can validate this by running primeha -status.)
Move to the proper directory and start the script. The script will use the inputs you provided when you installed the gateway server high availability solution but will also give you an opportunity to modify those settings before performing the failover.
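The two actions above (move to the proper directory, start the script) can be sketched as follows; the directory shown is an assumption, so use your installation's HA utilities path:

```shell
# Confirm this node holds the standby database before failing over
perl primeha -status

cd /var/adm/cisco/prime-network/scripts/ha/util   # assumed path
perl primeha -fail
```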
Step 2 Approve or edit the default choices that appear based on the inputs you provided when you installed the gateway server high availability:
If you enter yes and the system is not using external authentication, proceed to Step 3.
| Prompt | Description |
|---|---|
| IP address of the server to which the gateway should fail over | IP address of the standby gateway. If the remote site is a member of a dual-node cluster, use the floating IP address (of the management port of the cluster). |
| IP address of the server to which the database should fail over | IP address of the standby database. If the remote site is a member of a dual-node cluster, use the floating IP address. |
| Root password for the remote gateway server | For the remote gateway server, the root password for the operating system (required for SSH). |
| Root password for the remote database server | For the remote database, the root password for the operating system (required for SSH). |
| LDAP parameters | If the system uses LDAP (external authentication) for user authentication, provide the LDAP URL, distinguished name prefix and suffix, and protocol (for details, see Configuring Prime Network to Communicate with the External LDAP Server in the Cisco Prime Network 5.2 Administrator Guide). |
Step 3 Confirm that you want to continue with the failover. Prime Network proceeds and displays text similar to the following.
Step 4 Move any AVMs from unreachable units at the primary site to reachable units at the remote site. For information on moving and deleting AVMs, see the Cisco Prime Network 5.2 Administrator Guide.
Step 5 Verify that the new gateway IP address and database IP addresses are correct. If needed, switch the IP address manually using one of the following procedures:
Note To restore the configuration after a disaster, see Restoring the Failed Site (Hot Backup).
Step 6 To verify the setup, perform all of the tests (not including the step for creating database links) that are described in Verifying the Geographical Redundancy Setup.
Step 7 Log into Prime Network using the new IP address.
Note In this section, failed site refers to the non-active site. This could be:
After the servers are up and running on the failed site, use the procedures in this section to restore the redundancy configuration on the failed site. For information about how to failover to a standby site after a disaster, see Failing Over to the Standby Site for Disaster Recovery.
While restoring the redundancy configuration on the failed site, you do not have to take down the active site. Both sites are up while the failed site resumes, and you can switch back to the active site without any downtime. How you restore the redundancy configuration on the failed site depends on whether the servers on the failed site were down due to a catastrophic or non-catastrophic failure.
The resumeFromFailOver.pl script is used for restoring the redundancy configuration and has the following format:
perl resumeFromFailOver.pl -setup_replication [-daemonize] | -reconfigure_setup | -reinstall_setup [-autoconf dir]
Table 5-1 describes the arguments and options and indicates the node from which each command should be executed.
| Argument | Description | Run from |
|---|---|---|
| -reinstall_setup [-autoconf dir] | Reinstalls the gateway on the failed site, which can be either a dual-node cluster or a single-node server. The -autoconf dir option runs the operation in non-interactive mode, using input from the rf_auto_install_RH.ini file located in dir (dir must be a full pathname). | Failed site |
| -setup_replication [-daemonize] | Restores the replication between the failed site and the remote site. The -daemonize option runs the replication process in the background, without any user interaction. | Active site |
| -reconfigure_setup | Reconfigures the failed site after a failover. Use this flag only when the setup still exists on the failed site. If -setup_replication fails, use this flag on the failed site first, and then run -setup_replication again (from the active site). | Failed site |
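Putting the arguments together, a minimal sketch of the restore flow follows (flags as documented in Table 5-1; /full/path is a placeholder for your own directory):

```shell
# On the failed site, from /tmp/RH_ha: reinstall the gateway
perl resumeFromFailOver.pl -reinstall_setup

# Non-interactive variant, reading rf_auto_install_RH.ini from a
# directory given as a full pathname (placeholder shown):
perl resumeFromFailOver.pl -reinstall_setup -autoconf /full/path

# On the active site, once the failed site is reinstalled:
perl resumeFromFailOver.pl -setup_replication

# If -setup_replication fails, run this on the failed site first,
# then rerun -setup_replication from the active site:
perl resumeFromFailOver.pl -reconfigure_setup
```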
Depending on the type of failure on the failed site, do one of the following to restore the redundancy configuration on the failed site:
Make sure all the installation and high availability requirements are met. See Installation Requirements for Geographical Redundancy.
Note Ensure that you have already performed the failover procedure before proceeding to restore the redundancy configuration on the failed site. For information about how to fail over to a standby site after a disaster, see Failing Over to the Standby Site for Disaster Recovery.
To restore the redundancy configuration on the failed site after a catastrophic failure, do the following:
Step 1 As a root user, log into the failed site and unzip the RH_ha.zip located on the “Disk 1 New Install” DVD. Unzipping RH_ha.zip creates the /tmp/RH_ha directory. Also, unzip the RH_ha.zip in the primary location.
Step 2 From the /tmp/RH_ha directory, run perl resumeFromFailOver.pl -reinstall_setup to reinstall the failed site. For information about other options or arguments used with the resumeFromFailOver.pl script, see Table 5-1.
Step 3 Enter y at the prompt to continue with the Prime Network installation.
Step 4 Enter the server details as shown in Table 5-2 or Table 5-3, depending on whether the setup is a local and geographical redundancy configuration or a geographical redundancy only configuration.
Table 5-2 Prompts for a local and geographical redundancy configuration

| Prompt | Description |
|---|---|
| Hostname of the active site | Hostname of the site that is currently running both the cluster services (ana, oracle_db). |
| Password of the active site | Password for the site that is currently running both the cluster services (ana, oracle_db). |
| Whether NTP is configured on the 2 gateways (local and remote) | NTP should be configured on the two gateways. If it is not configured, first configure NTP and then continue with the installation. For details, see configuring NTP in the Cisco Prime Network 5.2 Installation Guide. |
| Checking whether to run automated backup for embedded database backup | Indicates whether to run the embedded database automated backups, yes or no. |
| Full path to the installation image version that was first installed | For example, if you installed 3.8 first, then upgraded to 3.10, then to 4.1, and then to 4.2, 4.2.3, or 4.3, provide the path of the 3.8 image. |
| Root password | The root user password for the node running the installation. For local redundancy dual-node clusters, this node must be one of the cluster nodes. |
| Email address for error notifications | The email address to which error messages will be sent from the embedded database if problems occur. |
| Multicast address | An available multicast address accessible to and configured for both cluster nodes. |
| Cluster name | User-defined cluster name of no more than 15 non-NUL (ASCII 0) characters. For local redundancy, the cluster name must be unique within the LAN. |
| Fencing device hostname (installation node) | The hostname of the fencing device configured for the node running the installation. For some fencing devices, this can be an IP address. |
| Fencing device hostname (second node) | The hostname of the fencing device configured for the second cluster node. For some fencing devices, this can be an IP address. |
| Password for the Prime Network cluster web interface admin user | Indicates the port and the password for the cluster web interface. The LUCI_PORT must be available and must not be in the Prime Network debug range (60000 <= X < 61000) or in the Prime Network AVM port ranges (2000 <= X < 3000 or 8000 <= X < 9000). The password must contain at least 6 characters. |
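The LUCI_PORT constraints can be encoded in a small helper. The function below is purely illustrative (not a product script); it simply rejects ports in the debug and AVM ranges described above:

```shell
# Illustrative helper, not a product script: returns 0 (valid) unless the
# port falls in the Prime Network debug range (60000 <= X < 61000) or the
# AVM port ranges (2000 <= X < 3000, 8000 <= X < 9000).
is_valid_luci_port() {
  p="$1"
  if [ "$p" -ge 60000 ] && [ "$p" -lt 61000 ]; then return 1; fi
  if [ "$p" -ge 2000 ] && [ "$p" -lt 3000 ]; then return 1; fi
  if [ "$p" -ge 8000 ] && [ "$p" -lt 9000 ]; then return 1; fi
  return 0
}

is_valid_luci_port 8443 || echo "8443 falls in an AVM port range; pick another port"
```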
Table 5-3 Prompts for a geographical redundancy only configuration

| Prompt | Description |
|---|---|
| Whether NTP is configured on the 2 gateways (local and remote) | NTP should be configured on the two gateways. If it is not configured, first configure NTP and then continue with the installation. For details, see configuring NTP in the Cisco Prime Network 5.2 Installation Guide. |
| Full path to the installation image version that was first installed | For example, if you installed 3.8 first, then upgraded to 3.10, then to 4.1, and then to 4.2, 4.2.2, 4.2.3, or 4.3, provide the path of the 3.8 image. |
| Email address for error notifications | The email address to which error messages will be sent from the embedded database if problems occur. |
Step 5 Log into the active site and run perl resumeFromFailOver.pl -setup_replication from the directory where RH_ha.zip was extracted. This script sets up replication between the failed site and the active site.
For information on other options or arguments used with the resumeFromFailOver.pl script, see Table 5-1.
Step 6 Confirm that you want to continue with the replication setup process.
Step 7 To verify the setup, perform all of the tests (not including the step for creating database links) that are described in Verifying the Geographical Redundancy Setup.
Make sure all the installation and high availability requirements are met. See Installation Requirements for Geographical Redundancy.
Note Ensure that you have already performed the failover procedure before proceeding to restore the redundancy configuration on the failed site. For information about how to failover to a standby site after a disaster, see Failing Over to the Standby Site for Disaster Recovery.
To restore the redundancy configuration on the failed site after a non-catastrophic failure, do the following:
Step 1 As a root user, log into the failed site and unzip the RH_ha.zip located on the “Disk 1 New Install” DVD. Unzipping RH_ha.zip creates the /tmp/RH_ha directory. Also, unzip the RH_ha.zip in the primary location.
Step 2 From the /tmp/RH_ha directory, run perl resumeFromFailOver.pl -reconfigure_setup to restore the replication between the failed site and the active site.
For information on options or arguments used with the resumeFromFailOver.pl script, see Table 5-1.
Step 3 Log into the active site and run perl resumeFromFailOver.pl -setup_replication from the /tmp/RH_ha directory to set up replication between the failed site and the active site.
For information on other options or arguments used with the resumeFromFailOver.pl script, see Table 5-1.
Step 4 Confirm that you want to continue with the replication setup process.
Step 5 To verify the setup, perform all of the tests (not including the step for creating database links) that are described in Verifying the Geographical Redundancy Setup.
These topics explain how to stop and restart the data replication process:
Use the stop replication command primeha -stop when you need to perform scheduled work on a server at the remote site. It stops the replication process to the remote site and shuts down the standby database. Resume replication when maintenance is complete, as described in Resuming Data Replication.
The following is an example of a stop replication session. In this example, data replication from the local active gateway (P1) to the remote standby gateway (S1) is stopped.
Note This command must be run from the server that contains the standby database (S1 in this example). You can validate which server is the standby by running primeha -status.
Step 1 Move to the correct directory.
Step 2 Verify the status of the active and backup servers using primeha -status:
Step 3 Log into the server with the remote database (S1) and enter the following command. This will stop replicating data and will shut down the remote site database.
Step 4 Enter the server details as shown in the following table.
Step 5 Verify the status of the active and backup servers using primeha -status:
Step 6 Verify that all applications on the standby server (S1) are stopped by running fuser -c /export/home. If any processes are still running (such as the Apache webserver), boot the standby server in single-user mode.
Step 7 Perform any necessary maintenance.
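The stop-replication steps above can be sketched as one session on S1; the scripts directory shown is an assumption, so use your installation's HA utilities path:

```shell
cd /var/adm/cisco/prime-network/scripts/ha/util   # assumed path

perl primeha -status   # confirm S1 holds the standby database
perl primeha -stop     # stop replication and shut down the standby database

# Confirm no applications are still using the standby server's filesystem
fuser -c /export/home
```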
Note This command can only be used if (1) the remote database was stopped using primeha -stop, and (2) the remote database has not been down for more than seven days. If the remote database has been down for more than seven days, you must recreate it by using the setup_Prime_DR.pl script. See Installing the Prime Network Gateway Geographical Redundancy Software for information on using the setup_Prime_DR.pl script.
Use the resume replication utility primeha -start to start the database at the remote site (in open, read-only mode) and restart the replication process. Run this command after all work is completed on the remote site.
The following is an example of a start replication session. In this example, data replication is started from the local active gateway (P1) to the remote standby gateway (S1).
Note This command must be run from the server that contains the remote database.
Step 1 Verify the following on the server with the remote database (S1):
Step 2 Log into the server with the remote database (S1) and move to the correct directory.
Step 3 Verify the status of the active and backup servers using primeha -status. If any services are running, stop them using primeha -stop.
Step 4 Enter the following command. This will start the remote site database (in open, read-only mode) and resume data replication.
Step 5 Enter the server details (see Table 5-4).
Note If the process aborts, run primeha -stop again. (The script most likely aborted because a process is not shut down.) Then verify that no services are running with primeha -status.
Step 6 When the process completes, verify the status of the active and backup servers using primeha -status:
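A sketch of the resume steps on S1 follows; as before, the scripts directory is an assumption:

```shell
cd /var/adm/cisco/prime-network/scripts/ha/util   # assumed path

perl primeha -status   # verify no services are running; stop any with primeha -stop
perl primeha -start    # open the standby database read-only and resume replication
perl primeha -status   # confirm replication has resumed
```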
If any IP addresses are not automatically changed after a failover or switchover, use the following procedures as appropriate.
If the gateway IP address is not updated on any of the units (or on the gateway) during a site-to-site failover or switchover, use the changeSite.pl utility to do so manually. This procedure will change the address on the gateway and all reachable units.
Note If a dual-node cluster is part of a local redundancy setup, use the logical IP addresses.
The following table describes the options and arguments to the changeSite.pl utility. If you are using an external LDAP server for user authentication, you must also set the necessary LDAP parameters, as described below. For more details on these parameters, see Configuring Prime Network to Communicate with the External LDAP Server in the Cisco Prime Network 5.2 Administrator Guide.
Step 1 If you need to reset LDAP information, reconfigure it first from the Prime Network Administration GUI client. For details on LDAP, see Configuring Prime Network to Communicate with the External LDAP Server in the Cisco Prime Network 5.2 Administrator Guide.
Step 2 Log into the primary gateway server as pnuser.
Step 3 Change to the correct directory:
Step 4 Run the following command:
The following is an example of a changeSite.pl session. In this example the following is being changed:
The IP addresses were not correctly changed to reflect the new addresses, so the utility forces them to 1.1.1.2 for the gateway and 1.1.1.3 for the database. In this example the system is not using LDAP, so those parameters are not included.
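A hedged sketch of such a session follows. The flag names here are hypothetical placeholders (the utility's actual parameter names may differ; check its usage/help output), and the IP addresses are the example values above:

```shell
# Run as pnuser from the correct directory (see Steps 2-3).
# -newGwIp / -newDbIp are HYPOTHETICAL flag names for illustration only.
perl changeSite.pl -newGwIp 1.1.1.2 -newDbIp 1.1.1.3
```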
If any of the units do not reflect the updated gateway and database IP address after a site-to-site failover or switchover, use the switchUnit.pl utility to do so manually. This procedure will change the address only on the unit from which it is run.
Note If a dual-node cluster is part of a local redundancy setup, use the logical IP addresses.
For any unit that does not reflect the updated gateway and database IP addresses:
Step 1 Log into the unit as pnuser.
Step 2 Change to the correct directory:
Step 3 Run the following command:
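A hedged sketch of Steps 1-3, with hypothetical flag names (verify against the utility's actual usage text); run this on each unit that still shows stale addresses:

```shell
# Run as pnuser on the affected unit (see Steps 1-2).
# -gwIp / -dbIp are HYPOTHETICAL flag names for illustration only;
# the IP addresses reuse the example values from the changeSite.pl session.
perl switchUnit.pl -gwIp 1.1.1.2 -dbIp 1.1.1.3
```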