Control-M/Agent Cluster Configuration

The following procedures describe how to configure clusters on Control-M/Agent:

Control-M with Active/Active (Load Balancing) Clusters

Control-M does not support the use of network load balancers or broadcast IP addressing to define an active/active cluster. Control-M/Server must be able to connect to a definitive address on the Control-M/Agent computer that runs the job. For this reason, the following configuration is recommended for an active/active cluster:

  • Each host in the cluster must have an Agent installed that listens on an address that is neither load balanced nor a broadcast IP. The Server-to-Agent port must be reachable without going through any network load balancer or port address translation.

  • Discover each Agent through Control-M/Server, and verify that each Agent is reachable on its direct address (see the sketch after this list).

  • Create a host group for the application. This is the name that must be used when scheduling jobs for this application. We recommend using the virtual name or the application name, because these names are familiar to schedulers.

  • Update or create your job definitions to refer to the host group that was created in the previous step.
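For example, the following sketch uses the ctmping utility from the Control-M/Server account to confirm that each Agent answers on its direct address. The node names are placeholders, and the utility options may vary by Control-M/Server version:

    # Run as the Control-M/Server user on the Control-M/Server host.
    # app-node1 and app-node2 are example cluster node names.
    for node in app-node1 app-node2; do
        ctmping -NODEID "$node"
    done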

Control-M with Active/Passive (High Availability) Clusters

When you implement Control-M/Agent on a UNIX cluster, a dedicated Agent is installed within each resource group to which Control-M must submit jobs. When a single application runs on the cluster, a single Agent is installed. When multiple applications run on the cluster, Control-M submits jobs to each application through a separate Agent.

The file system on which Control-M/Agent is installed must be located on the shared disk. This file system must always be mounted on the same host as the application to which Control-M submits jobs (a mount check sketch appears at the end of this section). The file system can be either of the following:

  • The same file system as the application file system.

  • A different file system, as long as both file systems are always active on the same host (if they are not members of the same application resource group).

Each Agent must be configured to use the application virtual host name for communication with Control-M/Server. When submitting jobs to this Agent, the NODEID parameter value for the jobs must be the virtual host name.

Before implementing Control-M/Agent on a UNIX cluster, identify the file system and the resource group where the Agent must be installed.
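Because the Agent file system must always follow the application, it is worth verifying on the active node that the file system is mounted before the Agent is started. The following is a minimal sketch; /export2 is the example shared file system used later in this section:

    # Minimal sketch: run on the node that currently owns the application
    # resource group. df -P gives POSIX output; adjust for your platform.
    mount_point=$(df -P /export2 | awk 'NR==2 {print $6}')
    if [ "$mount_point" = "/export2" ]; then
        echo "Agent file system is mounted on $(hostname)"
    else
        echo "Agent file system is not mounted on this host" >&2
        exit 1
    fi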

Creating Control-M/Agent UNIX Accounts

This procedure describes how the Agent is installed into the same file system as Control-M/Server (/export2 in the example), using the same virtual network name as Control-M/Server (vhctmxxx in the example). The same procedure can be used if the Agent is installed for any other external application.

Begin

  1. Create two user accounts as shown in the following example, one on each host.

    useradd -g controlm -s /bin/tcsh -m -d /export2/agxxxctm agxxxctm

    This command must be invoked by a user with administrative permissions.

  2. Both users must have identical names (agxxxctm in the example) and identical user IDs (UID).

  3. Both user home directories must point to the same location on a shared disk (/export2/agxxxctm in the example).
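You can verify both requirements with the following sketch. hostA and hostB are example node names, and the /etc/passwd lookup assumes local accounts (adjust for NIS or LDAP):

    # Minimal sketch: compare the UID and home directory of agxxxctm on
    # both cluster nodes. hostA and hostB are example host names.
    for h in hostA hostB; do
        ssh "$h" 'echo "$(hostname): uid=$(id -u agxxxctm) home=$(grep ^agxxxctm: /etc/passwd | cut -d: -f6)"'
    done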

Installing Control-M/Agent

Begin

  1. Install Control-M/Agent on the relevant file system on the shared disk according to the instructions provided in Agent Installation.

  2. Install the latest Fix Pack to apply the most recent software updates.

  3. Run the Agent configuration utility (either ctmag or ctmagcfg) to configure the logical Agent name. In the configuration utility, select Logical Agent Name from the Advanced menu. The logical Agent name must contain the virtual network name.

  4. In the Agent configuration menu, define the Control-M/Server host name as authorized to submit jobs to this Agent. If Control-M/Server is installed on a cluster, specify only the virtual network name of Control-M/Server (vhctmxxx in the example).
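To confirm the result, you can inspect the Agent configuration file. This is a sketch only: the CONFIG.dat location and the LOGICAL_AGENT_NAME and CTMPERMHOSTS parameter names are assumptions that should be verified against your Agent version:

    # Minimal sketch: run as the Agent user. The file location and
    # parameter names may differ between Agent versions.
    grep -E 'LOGICAL_AGENT_NAME|CTMPERMHOSTS' ~/ctm/data/CONFIG.dat
    # Expected output (example values):
    #   LOGICAL_AGENT_NAME  vhctmxxx
    #   CTMPERMHOSTS        vhctmxxx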

Missing Jobs

Each time a job is submitted, a process called the Agent Monitor (AM) is created to monitor the job and report its completion. When the AM starts, it creates two files for the job: a status file and a procid file.

In a normal scenario, the AM detects the job completion, updates the procid file, and notifies the Agent Tracker (AT). The AT then sends the update to Control-M/Server.

In a failover scenario, the Agent process is stopped and the Agent file system is unmounted from the first host while the job is still executing. The job can keep running, but the procid file is not updated when the job completes, because the Agent file system is now mounted on the backup host. Therefore, when the Agent starts on the backup host and the next AT track time arrives, the AT finds the original procid file but not the actual process, and the job is marked as disappeared.

As an optional workaround, you can define an ON statement for the jobs that run on the clustered Agent (Statement=*, Code=JLOST) that executes a DO RERUN action. In this case, the jobs are automatically rerun on the backup server when Control-M/Server determines that they have disappeared.

You must enter a value greater than 0 in the MAX RERUN parameter for the job to be resubmitted.
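A job definition that uses this workaround might look like the following excerpt. The field labels are indicative only; exact names depend on the client and version you use:

    Job definition excerpt (example values):

      NODEID:      <host group or logical Agent name>
      MAX RERUN:   1
      ON:          Statement=*  Code=JLOST
        DO:        RERUN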

Monitoring Control-M/Agent Processes

When monitoring Control-M/Agent processes on a cluster, use the following process names for cluster monitoring definitions:

Control-M/Agent Component                    Process Name
-------------------------------------------  --------------
Control-M/Agent Listener                     p_ctmag
Control-M/Agent Tracker                      p_ctmat
Control-M/Agent Router                       p_ctmar
Control-M/Agent Tracker-Worker               p_ctmatw
Control-M/Agent Remote Utilities Listener    p_ctmru
Control-M/Agent SSH connection pool          sshcourier.jar
Control-M/Agent Recovery (Windows only)      p_ctmam

The Control-M/Agent Router (p_ctmar) is only active when working in persistent connection mode. When working in transient connection mode, only the Control-M/Agent Listener (p_ctmag) and Tracker (p_ctmat) are active.

On UNIX, you might see more than one p_ctmag (one for each job).
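The following sketch shows one way to build a cluster monitoring probe from these process names, assuming a persistent-connection Agent that runs under the example agxxxctm account. pgrep options vary by platform; substitute a ps pipeline if necessary:

    # Minimal sketch of a cluster monitor probe. In transient connection
    # mode, reduce the list to p_ctmag and p_ctmat. -x matches the exact
    # process name so p_ctmat does not also match p_ctmatw.
    for proc in p_ctmag p_ctmat p_ctmar; do
        if ! pgrep -x -u agxxxctm "$proc" > /dev/null; then
            echo "$proc is not running" >&2
            exit 1
        fi
    done
    echo "All monitored Agent processes are running"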

Control-M/Agent Cluster Environment on Windows

Note the following:

  • Install Control-M/Agent, as described in Installing Control-M/Agent on Windows.

    After the installation completes, the Agent and File Watcher cluster resources are installed and online.

  • Multiple Agents can be installed on the same virtual server group or in separate virtual server groups.

  • Agents that share the same IP and Network name resources must be associated with separate Control-M/Servers.

  • Disk, IP, and Network Name resources must be online in the virtual server group where the Agent is installed.

  • Automatic installation and automatic upgrade of the Agent are not supported for Microsoft Windows cluster environments.