Control-M for GCP Dataproc

Google Cloud Platform (GCP) Dataproc enables you to perform cloud-based big data processing and machine learning.

Control-M for GCP Dataproc enables you to do the following:

  • Execute single or Workflow Template GCP Dataproc jobs.

  • Manage GCP Dataproc credentials in a secure connection profile.

  • Connect to any GCP Dataproc endpoint.

  • Introduce all Control-M capabilities to Control-M for GCP Dataproc, including advanced scheduling criteria, complex dependencies, Resource Pools, Lock Resources, and variables.

  • Integrate GCP Dataproc jobs with other Control-M jobs into a single scheduling environment.

  • Monitor the status, results, and output of GCP Dataproc jobs.

  • Attach an SLA job to the GCP Dataproc jobs.

  • Run up to 50 GCP Dataproc jobs simultaneously per Agent.

Setting up Control-M for GCP Dataproc

This procedure describes how to deploy the GCP Dataproc plug-in, create a connection profile, and define a GCP Dataproc job in Control-M SaaS and Automation API.

Before You Begin

  • Verify that Automation API is installed, as described in Setting Up the API.

  • Verify that Agent version 9.0.21.080 or later is installed.
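
The Agent version prerequisite can be checked with a small helper. This is a minimal sketch, assuming you already have the Agent version as a dotted string. The version_ge function and the sample value 9.0.21.100 are hypothetical and are not part of Control-M.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is the same as or later than $2.
# Compares the four dot-separated fields numerically, so 080 is read as 80.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" |
    sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example: replace the first argument with the version your Agent reports.
if version_ge "9.0.21.100" "9.0.21.080"; then
  echo "Agent version is supported"
else
  echo "Upgrade the Agent before installing the plug-in"
fi
```

The comparison sorts the two versions numerically field by field and checks that the required version is the lower of the two, which avoids the pitfalls of a plain string comparison.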

  1. On the Agent host, run one of the following commands to set the Java environment variable:

    • Linux:

      • Bourne shell/bash: export BMC_INST_JAVA_HOME=<java_11_directory>

      • csh/tcsh: setenv BMC_INST_JAVA_HOME <java_11_directory>

    • Windows: set BMC_INST_JAVA_HOME="<java_11_directory>"

  2. Run one of the following API commands:

    • To install, type one of the following provision image commands:

      • Linux: ctm provision image GDP_plugin.Linux

      • Windows: ctm provision image GDP_plugin.Windows

    • To upgrade, type the following command:

      ctm provision agent::update

  3. Create a GCP Dataproc connection profile in Control-M SaaS or Automation API.

  4. Define a GCP Dataproc job in Control-M SaaS or Automation API.

To remove this plug-in from an Agent, see Removing a Plug-in. The plug-in ID is GDP042022.
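
Steps 1 and 2 of the procedure can be combined into a short script. The following is a sketch only, assuming a Bourne-compatible shell on the Agent host. The function names gdp_image_for and provision_gdp_plugin and the RUN variable are hypothetical; the image names and the ctm provision image command are the ones listed in step 2. By default the script only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Map a platform name to the provision image named in step 2.
gdp_image_for() {
  case "$1" in
    Linux)   echo "GDP_plugin.Linux" ;;
    Windows) echo "GDP_plugin.Windows" ;;
    *)       echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print the provision command, or run it when RUN=1 is set (hypothetical switch).
provision_gdp_plugin() {
  image=$(gdp_image_for "$1") || return 1
  if [ "${RUN:-0}" = "1" ]; then
    ctm provision image "$image"
  else
    echo "ctm provision image $image"
  fi
}

# Example (Linux, Bourne shell/bash):
#   export BMC_INST_JAVA_HOME=<java_11_directory>
#   RUN=1; provision_gdp_plugin Linux
```

Keeping the dry-run behavior as the default is a deliberate choice here: the command is only sent to the Agent once RUN is explicitly set to 1.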

Change Log

The following table provides details about changes that were introduced in new versions of this plug-in:

Plug-in Version | Details
1.0.03 | Added the ability to terminate the interactive session resource.
1.0.02 | Set the Batch ID and Requested ID parameters to resolve on rerun.
1.0.01 | Added the ability to trigger batch jobs using Dataproc Serverless for Spark Batch Jobs. Added the new Batches option to the Dataproc task type parameter. Added the following new parameters: Batch ID, Requested ID.
1.0.00 | Initial version.