How High Availability Works
The Cisco EPN Manager high availability (HA) framework ensures continued system operation in case of failure. HA uses a pair of linked, synchronized Cisco EPN Manager servers to minimize or eliminate the impact of application or hardware failures that may take place on either server. Servers can fail due to issues in one or more of the following areas:
- Application processes: Server, TFTP, FTP, and other process failures. You can view the status of these processes using the CLI ncs status command.
- Database server: Database-related process failures (the database server runs as a service on Cisco EPN Manager).
- Network: Problems with network access or reachability.
- System: Problems with the server's physical hardware or operating system.
- Virtual machine (if HA is running in a VM environment): Problems with the VM environment on which the primary and secondary servers are installed.
The following figure shows the main components and process flows for an HA setup.
An HA deployment consists of a primary and a secondary server, with a Health Monitor (HM) instance running as an application process on each server. When the primary server goes down (either because it fails or because it is manually stopped), the secondary server takes over and manages the network while you restore access to the primary server. If the deployment is configured for automatic failover, the secondary server takes over the active role within two to three minutes of the failover being triggered. This HA framework is based on the active/passive, or cold standby, model of operation. Because it is not a clustered system, sessions are not preserved on the secondary server when the primary server fails.
When the issues on the primary server are resolved and the server is running again, it remains in standby mode, during which it begins syncing its data with the active secondary server. Once the primary server is available again, you can initiate a failback operation. When a failback is triggered, the primary server takes over the active role again. This role switch between the primary and secondary servers completes within two to three minutes.
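The role changes described above can be sketched as a small state model. This is an illustrative toy, not Cisco's implementation: the `HAPair` class, the `Role` values, and the method names are all hypothetical, and the two-to-three-minute switch time is not modeled.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    DOWN = "down"


class HAPair:
    """Toy model of an active/passive server pair (hypothetical names)."""

    def __init__(self, automatic_failover=True):
        self.automatic_failover = automatic_failover
        self.primary = Role.ACTIVE
        self.secondary = Role.STANDBY

    def primary_fails(self):
        # The primary goes down; with automatic failover the secondary takes over.
        self.primary = Role.DOWN
        if self.automatic_failover:
            self.failover()

    def failover(self):
        # Automatic or manually triggered: the secondary assumes the active role.
        if self.primary is Role.ACTIVE:
            raise RuntimeError("primary is still active; nothing to fail over from")
        self.secondary = Role.ACTIVE

    def primary_restored(self):
        # A restored primary waits in standby while the secondary is active.
        # Assumption: if no failover happened, the primary simply resumes.
        if self.primary is Role.DOWN:
            if self.secondary is Role.ACTIVE:
                self.primary = Role.STANDBY
            else:
                self.primary = Role.ACTIVE

    def failback(self):
        # Failback is user-initiated and returns the active role to the primary.
        if self.primary is not Role.STANDBY or self.secondary is not Role.ACTIVE:
            raise RuntimeError("failback needs a standby primary and an active secondary")
        self.primary = Role.ACTIVE
        self.secondary = Role.STANDBY
```

For example, calling `primary_fails()`, then `primary_restored()`, then `failback()` on a fresh `HAPair()` walks the pair through failover and back to its original roles.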
Whenever the HA configuration detects that data on the primary server has changed, it synchronizes the change to the secondary server. These changes are of two types:
- File changes, which are synchronized using the HTTPS protocol. These include items such as report configurations, configuration templates, the TFTP-root directory, administration settings, licensing files, and the key store. File synchronization is done:
  - In batches, for files that are not updated frequently (such as license files). These files are synchronized once every 500 seconds.
  - In near real time, for files that are updated frequently. These files are synchronized once every 11 seconds.
- Database changes, such as updates related to configuration, performance, and monitoring data. Oracle Recovery Manager (RMAN) creates the initial standby database, and Oracle Active Data Guard synchronizes the databases whenever a change occurs.
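To make the two file synchronization intervals concrete, the following sketch computes how long a file change might wait before the next synchronization pass. The 11-second and 500-second values come from the text; everything else is an assumption for illustration, in particular that passes run on a fixed schedule starting at time zero and that the function names are hypothetical.

```python
NEAR_REAL_TIME_INTERVAL_S = 11  # frequently updated files (from the text)
BATCH_INTERVAL_S = 500          # infrequently updated files (from the text)


def next_sync_after(change_time_s, interval_s):
    """Time of the first sync pass strictly after a change.

    Assumes passes run at interval_s, 2 * interval_s, and so on, and that a
    change made exactly at a pass time is picked up by the following pass.
    """
    return (change_time_s // interval_s + 1) * interval_s


def sync_delay(change_time_s, interval_s):
    """Seconds a change waits before the next sync pass picks it up."""
    return next_sync_after(change_time_s, interval_s) - change_time_s


def passes_in(horizon_s, interval_s):
    """Number of sync passes that run within the first horizon_s seconds."""
    return horizon_s // interval_s
```

Under these assumptions, a change made 12 seconds in waits 10 seconds if the file is in the near-real-time group (`sync_delay(12, NEAR_REAL_TIME_INTERVAL_S)`) and 488 seconds if it is in the batch group (`sync_delay(12, BATCH_INTERVAL_S)`). The delay never exceeds one full interval.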
The primary and secondary HA servers exchange the following messages to maintain synchronization between the two servers:
- Database Sync: Includes all the information necessary to ensure that the databases on the primary and secondary servers are running and synchronized.
- File Sync: Includes frequently updated configuration files. These are synchronized every 11 seconds, while other infrequently updated configuration files are synchronized every 500 seconds.
  Note: Configuration files that are updated manually on the primary are not synced to the secondary. When you update a configuration file manually on the primary, you must update the file on the secondary as well.
- Process Sync: Ensures that application- and database-related processes are running. These messages fall under the Heartbeat category.
- Health Monitor Sync: These messages check for the network, system, and health monitor failure conditions.
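One generic way to reason about these message categories is to track when each was last received and flag any that have gone quiet. The sketch below shows that idea only; it is not how Health Monitor is implemented. The `HeartbeatTracker` class, the category keys, and the timeout value are hypothetical, and the text does not state any timeout threshold.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Tracks the last time each message category was seen (hypothetical)."""

    timeout_s: float  # assumed threshold; not specified in the text
    last_seen: dict = field(default_factory=dict)

    def record(self, category, now_s):
        # Remember the most recent arrival time for this category.
        self.last_seen[category] = now_s

    def stale(self, now_s):
        # Categories whose last message is older than the timeout, sorted by name.
        return sorted(
            category
            for category, seen_s in self.last_seen.items()
            if now_s - seen_s > self.timeout_s
        )
```

For instance, with `timeout_s=30`, if all four categories are recorded at time 0 and only `"process_sync"` is recorded again at time 40, then `stale(50)` returns the other three categories.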