Control-M/EM and Control-M/Server High Availability with Oracle/MSSQL/External PostgreSQL

After you have installed a secondary Control-M (see High Availability Installation), the Control-M/EM or Control-M/Server Configuration Agent on the secondary host monitors the primary Control-M/EM or Control-M/Server at defined intervals. If there is no response from the primary, you can fail over to the secondary in one of the following modes:

  • Automatic Failover: The secondary Configuration Agent automatically takes control and resumes production when it detects that the primary Control-M/EM or Control-M/Server and its primary Configuration Agent have stopped unexpectedly.

  • Manual Failover: You can perform a manual failover at any time from the CCM if the manual failover option is enabled. After the failover is complete, the production runs on the secondary.

The procedures in this section describe how to manually fail over to a secondary host, pause Control-M/Server, fall back to a primary host, and set a secondary as the primary.

If you attempt to manually start up components on the secondary when the primary is active, the components shut down automatically. This prevents both the primary and secondary from running components simultaneously.

For a description of configurable Control-M/EM High Availability system parameters, see Maintenance Parameters. For a description of configurable Control-M/Server High Availability system parameters, see High Availability Parameters. To receive notifications about Control-M/Server High Availability events, see Control-M/Server General Parameters. To receive notifications about Control-M/EM High Availability events, see Control-M/EM General Parameters.

Control-M/EM High Availability Architecture

The following diagram shows Control-M/EM in a High Availability environment using an Oracle, MSSQL, or external PostgreSQL database.

The following diagram shows a Control-M/EM automatic failover when the primary components are no longer available.

Control-M/Server High Availability Architecture (Oracle/MSSQL/External PostgreSQL)

The following diagram shows Control-M/Server in a High Availability environment using an Oracle, MSSQL, or external PostgreSQL database.

The Configuration Agents on the primary and secondary hosts communicate using port 2368. To change this setting, see Communication Parameters.
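When you troubleshoot communication between the two hosts, it can help to confirm that the Configuration Agent port is reachable from the other host. The following is a minimal sketch, not a BMC utility. It assumes that the port accepts plain TCP connections, and the function name can_reach and the host name used in the usage note are hypothetical.

```python
import socket

# Default Configuration Agent port, taken from the text above.
DEFAULT_HA_PORT = 2368


def can_reach(host: str, port: int = DEFAULT_HA_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, running can_reach("primary-host.example.com") on the secondary host indicates only whether the port accepts connections. It does not confirm that the Configuration Agent itself is healthy.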

The following diagram shows a Control-M/Server automatic failover when the primary components are no longer available.

Automatic Failover

An automatic failover occurs when the secondary Configuration Agent detects that the primary Control-M/EM or Control-M/Server and its Configuration Agent are not alive and the production on the primary has stopped unexpectedly. This can occur due to a hardware malfunction, a machine crash, a network card that stops responding, or all components being down.

Control-M/EM: To ensure that the primary Control-M/EM is not functioning, the following conditions must be met before an automatic failover occurs (default: 60 seconds):

  • There are no life check responses from all Control-M/EM components and the primary Configuration Agent (see Maintenance Parameters).

    If HA_LIFECHECK_TRIES is set to 3, and the Check Interval for each Control-M/EM component is set to 20 seconds, an automatic failover starts after 60 seconds (3 * 20). The production on the secondary is ready only after all the components are up. How long this takes depends on the operating system, the number of Control-M/Servers, and the number of jobs.

  • There are no transactions recorded in the database from all Control-M/EM components and its primary Configuration Agent.

    If all components are down and the Configuration Agent is up, an automatic failover does not occur.

  • The Oracle, MSSQL, or external PostgreSQL database is up.
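The Control-M/EM conditions above can be sketched as a small calculation and a yes/no check. This is an illustration of the documented rules only, not the Configuration Agent's actual implementation. All function and argument names are hypothetical.

```python
from typing import Iterable


def em_failover_delay(lifecheck_tries: int, check_interval_s: int) -> int:
    """Seconds of missed life checks before an automatic failover starts.

    Mirrors the example above: tries * interval (3 * 20 = 60).
    """
    return lifecheck_tries * check_interval_s


def em_auto_failover_allowed(
    component_responses: Iterable[bool],
    agent_responds: bool,
    db_transactions_seen: bool,
    database_up: bool,
) -> bool:
    """Apply the three documented Control-M/EM conditions.

    component_responses: one flag per Control-M/EM component, True if it
    answered a life check. agent_responds: True if the primary
    Configuration Agent answered.
    """
    # Condition 1: no life check response from any component or the agent.
    no_life_checks = not any(component_responses) and not agent_responds
    # Condition 2: nothing is writing to the database.
    # Condition 3: the database itself is up.
    return no_life_checks and not db_transactions_seen and database_up
```

With the values from the example, em_failover_delay(3, 20) gives 60. If the components are down but the primary Configuration Agent still responds, em_auto_failover_allowed returns False, which matches the note above.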

Control-M/Server: To ensure that the primary Control-M/Server is not functioning, the following conditions must be met before an automatic failover occurs (default: 60 seconds):

  • There is no life check response from the primary Configuration Agent (see High Availability Parameters).

    If HA_TIME_BETWEEN_LIFECHECKS is set to 15 (default) and HA_LIFE_CHECK_TIMEOUT is set to 5 (default), the primary Configuration Agent is considered not functioning after 20 seconds.

  • There are no transactions recorded in the database from all running Control-M/Server processes and its primary Configuration Agent.

    If HA_LIFE_CHECK_TRIES is set to 3 (default), HA_TIME_BETWEEN_LIFECHECKS is set to 15 (default), and HA_LIFE_CHECK_TIMEOUT is set to 5 (default), processes are considered not writing to the database after 40 seconds: (3 - 1) * (15 + 5).

    If all Control-M/Server processes are down but the Configuration Agent is up, an automatic failover does not occur.

  • The Oracle, MSSQL, or external PostgreSQL database is up.
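The two Control-M/Server timing examples above follow directly from the three parameters. The sketch below reproduces that arithmetic so that you can try other values. The function names are hypothetical, and the formulas are taken only from the examples in this section.

```python
def agent_unresponsive_after(time_between_lifechecks: int = 15,
                             life_check_timeout: int = 5) -> int:
    """Seconds until the primary Configuration Agent is considered down.

    HA_TIME_BETWEEN_LIFECHECKS + HA_LIFE_CHECK_TIMEOUT.
    """
    return time_between_lifechecks + life_check_timeout


def db_silent_after(life_check_tries: int = 3,
                    time_between_lifechecks: int = 15,
                    life_check_timeout: int = 5) -> int:
    """Seconds until processes are considered not writing to the database.

    (HA_LIFE_CHECK_TRIES - 1) * (HA_TIME_BETWEEN_LIFECHECKS + HA_LIFE_CHECK_TIMEOUT).
    """
    return (life_check_tries - 1) * (time_between_lifechecks + life_check_timeout)


if __name__ == "__main__":
    print(agent_unresponsive_after())  # 20 with the defaults
    print(db_silent_after())           # 40 with the defaults
```

For example, raising HA_LIFE_CHECK_TRIES to 5 while keeping the other defaults gives (5 - 1) * (15 + 5) = 80 seconds before the processes are considered not writing to the database.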

Manual Failover

You can perform a manual failover at any time from the CCM if the manual failover option is enabled.

The following scenarios describe the required conditions for a manual failover to occur.

Oracle/MSSQL/External PostgreSQL: A manual failover can occur in one of the following scenarios:

  • If the primary Configuration Agent is running:

    • The secondary Configuration Agent responds to life check requests from the primary Configuration Agent.

    • The database server is available to the primary Configuration Agent.

  • If the primary Configuration Agent is not running:

    • The primary Control-M/Server is not running.

    • The database server is available to the secondary Configuration Agent.

Dedicated BMC PostgreSQL: A manual failover can occur in one of the following scenarios:

  • If the primary Configuration Agent is running:

    • The secondary Configuration Agent responds to life check requests from the primary Configuration Agent.

    • The primary and secondary Configuration Agents have access to the shared directory.

  • If the primary Configuration Agent is not running:

    • The primary database server is not running.

    • The secondary Configuration Agent has access to the shared directory.

    • The secondary database server is available to the secondary Configuration Agent.
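The manual failover scenarios above reduce to two checks, one for each database type. The following sketch encodes them as plain Boolean logic so that the conditions are easier to compare. It is an illustration only. The HaState class, its field names, and both functions are hypothetical and are not part of Control-M.

```python
from dataclasses import dataclass


@dataclass
class HaState:
    """Observed state of the HA pair (all field names are illustrative)."""
    primary_agent_running: bool
    secondary_answers_life_checks: bool = False
    primary_server_running: bool = True      # conservative default
    primary_db_server_running: bool = True   # conservative default
    db_available_to_primary_agent: bool = False
    db_available_to_secondary_agent: bool = False
    primary_has_shared_dir: bool = False
    secondary_has_shared_dir: bool = False


def manual_failover_possible_external_db(s: HaState) -> bool:
    """Oracle, MSSQL, or external PostgreSQL scenarios."""
    if s.primary_agent_running:
        return s.secondary_answers_life_checks and s.db_available_to_primary_agent
    return (not s.primary_server_running) and s.db_available_to_secondary_agent


def manual_failover_possible_dedicated_pg(s: HaState) -> bool:
    """Dedicated BMC PostgreSQL scenarios."""
    if s.primary_agent_running:
        return (s.secondary_answers_life_checks
                and s.primary_has_shared_dir
                and s.secondary_has_shared_dir)
    return ((not s.primary_db_server_running)
            and s.secondary_has_shared_dir
            and s.db_available_to_secondary_agent)
```

In both cases the first branch covers a primary Configuration Agent that is running, and the second branch covers one that is not. The dedicated BMC PostgreSQL case differs mainly in its shared directory requirements.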

Changing the Control-M/EM Failover Mode

This procedure describes how to change the failover mode from Automatic to Manual on Control-M/EM in the CCM. This enables you to determine when to shut down Control-M/EM and perform a failover.

Begin

  1. Log in to the CCM.

  2. Select the primary Control-M/EM component.

  3. Right-click and select Properties.

    The Control-M/EM Properties window appears.

  4. From the Failover Mode drop-down list, select Manual.

  5. Click Save.

The failover mode is now set to Manual, and a failover does not occur until you perform it manually, as described in Failing Over a Control-M/EM to Secondary.

To allow you to continue the failover, the secondary Configuration Agent starts up the CMS on the secondary host, which enables the Failover option for Control-M/EM in the CCM. If the primary starts up again before you perform a failover, the secondary Configuration Agent stops the CMS on the secondary host.

Changing the Control-M/Server Failover Mode

This procedure describes how to change the failover mode from Automatic to Manual on Control-M/Server. This enables you to determine when to shut down Control-M/Server and perform a failover.

Begin

  1. From the icon, select Configuration.

  2. From the drop-down list, select Control-M/Servers.

    The Control-M/Servers list opens.

  3. Select the Control-M/Server component.

  4. From the High Availability drop-down list, select Failover Mode.

    The Failover Mode dialog box appears.

  5. Select Manual Only.

  6. Click Save.

The failover mode is now set to Manual, and a failover does not occur until you perform it manually, as described in Failing Over a Control-M/Server to Secondary.

Failing Over a Control-M/EM to Secondary

This procedure describes how to manually fail over a Control-M/EM to a secondary host in the CCM.

In manual mode, the secondary CA starts up the CMS.

Control-M/EM must be using an MSSQL, Oracle, or external PostgreSQL database.

Begin

  1. Log in to the CCM.

  2. From the High Availability tab, select the primary Control-M/EM component and click Failover to Secondary.

    A progress window appears listing each step in the failover process.

  3. After the failover is complete, click Close.

    Control-M/EM is now running on the secondary host.

  4. If you want to revert to your original configuration, fix the problem on the primary and then fall back to primary, as described in Falling Back to the Control-M/EM Primary.

Control-M/EM is now running on the primary host.

Failing Over a Control-M/Server to Secondary

This procedure describes how to manually fail over a Control-M/Server to a secondary host.

In manual mode, the secondary CA starts up the CMS.

Control-M/Server must be using a MSSQL or Oracle database.

Begin

  1. From the icon, select Configuration.

  2. From the drop-down list, select Control-M/Servers.

    The Control-M/Servers list opens.

  3. Select the Control-M/Server component.

  4. From the High Availability drop-down list, select Failover to Secondary.

  5. In the dialog box that appears, select Failover.

    A progress window appears listing each step in the failover process. You can click Close to close the progress window at any moment.

    Control-M/Server is now running on the secondary host.

  6. If you want to revert to your original configuration, fix the problem on the primary and then fall back to primary, as described in Falling Back to the Control-M/Server Primary.

    Control-M/Server is now running on the primary host.

Falling Back to the Control-M/EM Primary

This procedure describes how to manually fall back a Control-M/EM to the primary host in the CCM.

Control-M/EM must be using a MSSQL, Oracle, or external PostgreSQL database.

Begin

  1. Log in to the CCM.

  2. On the primary host, start up the Configuration Agent.

  3. From the High Availability tab, click Fallback to Primary.

    A progress window appears listing each step in the fallback process.

  4. After the fallback is complete, click Close.

    Control-M/EM is now running on the primary host.

Falling Back to the Control-M/Server Primary

This procedure describes how to manually fall back a Control-M/Server to the primary host.

Control-M/Server must be using a MSSQL, Oracle, or external PostgreSQL database.

Begin

  1. From the icon, select Configuration.

  2. From the drop-down list, select Control-M/Servers.

    The Control-M/Servers list opens.

  3. Select the Control-M/Server component.

  4. From the High Availability drop-down list, select Fallback to Primary.

    A progress window appears listing each step in the fallback process. You can click Close to close the progress window at any moment.

    Control-M/Server is now running on the primary host.

Setting a Secondary Control-M/EM to Primary

This procedure describes how to set a secondary Control-M/EM host to act as the primary host in the CCM when the primary installation is corrupted.

Begin

  1. Log in to the CCM.

  2. After a successful failover has occurred, from the High Availability tab, select the secondary and click Set as Primary.

    The secondary is now the new primary host.

  3. Install a secondary on the original primary computer or on another computer, as described in High Availability Installation.

    The primary detects the new secondary, and you now have a new high availability configuration. You can work with this configuration, but if you want to revert to your original configuration (the secondary is installed on the original primary computer), continue to the next step.

  4. Perform a failover from the new primary to the new secondary, as described in Failing Over a Control-M/EM to Secondary.

    The secondary is now the active host.

  5. From the High Availability tab, select the secondary and click Set as Primary.

You have now reverted to your original high availability configuration.

Setting a Secondary Control-M/Server to Primary

This procedure describes how to set a secondary Control-M/Server host to act as the primary host when the primary installation is corrupted.

Begin

  1. After a successful failover has occurred, from the High Availability tab, select the secondary and click Set as Primary.

    The secondary is now the new primary host.

  2. Install a secondary on the original primary computer or on another computer, as described in High Availability Installation.

    The primary detects the new secondary, and you now have a new high availability configuration. You can work with this configuration, but if you want to revert to your original configuration (the secondary is installed on the original primary computer), continue to the next step.

  3. Perform a failover from the new primary to the new secondary, as described in Failing Over a Control-M/Server to Secondary.

    The secondary is now the active host.

  4. From the High Availability drop-down list, select the secondary and click Set as Primary.

  5. In the dialog box that appears, click Confirm.

You have now reverted to your original high availability configuration.