Control-M for Databricks

Databricks is a cloud-based data analytics platform that enables you to process and analyze large data workloads.

Control-M for Databricks enables you to do the following:

  • Execute Databricks jobs.

  • Manage Databricks credentials in a secure connection profile.

  • Connect to any Databricks endpoint.

  • Introduce all Control-M capabilities to Control-M for Databricks, including advanced scheduling criteria, complex dependencies, resource pools, lock resources, and variables.

  • Integrate Databricks jobs with other Control-M jobs into a single scheduling environment.

  • Monitor the status, results, and output of Databricks jobs.

  • Attach an SLA job to the Databricks jobs.

  • Run 100 Databricks jobs simultaneously per Agent.

Setting up Control-M for Databricks

This procedure describes how to deploy the Databricks plug-in, create a connection profile, and define a Databricks job in Control-M SaaS and Automation API.

Before You Begin

  • Verify that you have Java installed, as described in Control-M External Java Installation.

  • Verify that Automation API is installed, as described in Setting Up the API.

  • Verify that Agent version 9.0.22.000 or higher is installed.

  • Verify that Application Integrator patch 9.0.22.001 or higher is installed.

Begin

  1. Do one of the following:

    • Install: Run one of the following provision image commands:

      • Linux: ctm provision image DBX_plugin.Linux

      • Windows: ctm provision image DBX_plugin.Windows

    • Upgrade: Run the following command:

      ctm provision agent::update

  2. Create a Databricks connection profile in Control-M SaaS or Automation API, as shown in the Automation API example below:
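
     In Automation API, a connection profile is defined as a JSON object whose Type identifies the plug-in. The following is a minimal sketch, assuming the plug-in's type name is ConnectionProfile:Databricks; the profile name and description are placeholders, and the plug-in-specific attributes (such as the workspace URL, credentials, and the HTTP Codes, Rerun Interval, and Attempt Reruns settings) are omitted because their exact attribute names should be taken from the plug-in's connection profile reference:

      {
        "DBX_CONNECTION": {
          "Type": "ConnectionProfile:Databricks",
          "Description": "Connection to the Databricks workspace"
        }
      }

     After you save the definition to a file, you can deploy it with the ctm deploy <file>.json command.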

  3. Define a Databricks job in Control-M SaaS or Automation API, as shown in the Automation API example below:
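
     A Databricks job is defined in the same JSON format, inside a folder, and references the connection profile by name. The sketch below assumes the job's type name is Job:Databricks; the folder name, job name, and Host value are placeholders, and the plug-in-specific attributes (for example, the Failure Tolerance parameter listed in the change log) are omitted because their exact names should be taken from the plug-in's job definition reference:

      {
        "DatabricksFolder": {
          "Type": "Folder",
          "DatabricksJob": {
            "Type": "Job:Databricks",
            "ConnectionProfile": "DBX_CONNECTION",
            "Host": "<agent-host>"
          }
        }
      }

     You can validate the definition with the ctm build <file>.json command and run it with the ctm run <file>.json command.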

To remove this plug-in from an Agent, see Removing a Plug-in. The plug-in ID is DBX032022.

Change Log

The following table provides details about changes that were introduced in new versions of this plug-in:

Plug-in Version   Details

1.0.07            • Displayed task-level output based on user preference or upon Databricks job failure
                  • Added the HTTP Codes, Rerun Interval, and Attempt Reruns parameters to the connection profile, which rerun an execution step based on the returned HTTP code

1.0.06            Added the Failure Tolerance job parameter

1.0.05            Made semantic changes

1.0.04            Removed the Job Name attribute

1.0.03            Added a new job icon

1.0.02            Added an idempotency enhancement

1.0.01            Added a multiple-task enhancement

1.0.00            Initial version