Control-M for Azure Databricks

Azure Databricks is a cloud-based data analytics platform that enables you to process large data workloads.

Control-M for Azure Databricks enables you to do the following:

  • Execute Azure Databricks jobs.

  • Manage Azure Databricks credentials in a secure connection profile.

  • Connect to any Azure Databricks endpoint.

  • Apply all Control-M capabilities to Azure Databricks jobs, including advanced scheduling criteria, complex dependencies, resource pools, lock resources, and variables.

  • Integrate Azure Databricks jobs with other Control-M jobs into a single scheduling environment.

  • Monitor the status, results, and output of Azure Databricks jobs.

  • Attach an SLA job to the Azure Databricks jobs.

  • Run up to 100 Azure Databricks jobs simultaneously per Agent.

Setting up Control-M for Azure Databricks

This procedure describes how to deploy the Azure Databricks plug-in, create a connection profile, and define an Azure Databricks job in Control-M SaaS and Automation API.

Before You Begin

  • Verify that Java is installed, as described in Control-M External Java Installation.

  • Verify that Automation API is installed, as described in Setting Up the API.

  • Verify that Agent version 9.0.21.080 or higher is installed.
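The Agent version prerequisite above can be checked with a small helper. The following is a minimal sketch, not an official BMC tool: it assumes that you already know the installed Agent version string (this page does not say how to obtain it) and that each dot-separated field compares as a decimal number. The function names version_ge and check_agent_version are hypothetical.

```shell
#!/bin/sh
# Minimum Agent version, taken from the prerequisite list above.
MIN_AGENT_VERSION="9.0.21.080"

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# ASSUMPTION: fields are compared as decimal numbers, left to right,
# and missing trailing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# check_agent_version VERSION: report whether VERSION meets the minimum.
# VERSION is supplied by the caller; this sketch does not query the Agent.
check_agent_version() {
    if version_ge "$1" "$MIN_AGENT_VERSION"; then
        echo "OK: Agent $1 meets the minimum $MIN_AGENT_VERSION"
    else
        echo "TOO OLD: Agent $1 is below the minimum $MIN_AGENT_VERSION"
        return 1
    fi
}
```

For example, after sourcing the functions, check_agent_version 9.0.21.100 reports that the prerequisite is met, while check_agent_version 9.0.20.200 returns a non-zero status.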

Begin

  1. Do one of the following:

    • Install: Run one of the following provision image commands:

      • Linux: ctm provision image ZDX_plugin.Linux

      • Windows: ctm provision image ZDX_plugin.Windows

    • Upgrade: Run the following command:

      ctm provision agent::update

  2. Create an Azure Databricks connection profile in Control-M SaaS or Automation API.

  3. Define an Azure Databricks job in Control-M SaaS or Automation API.

To remove this plug-in from an Agent, see Removing a Plug-in. The plug-in ID is ZDX112021.
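The install and upgrade commands from step 1 can be selected with a short script. This is a sketch under stated assumptions rather than a supported installer: the function names plugin_image and deploy_command are hypothetical, the Windows patterns assume a POSIX-like shell on Windows (such as Git Bash or Cygwin), and the script only prints the ctm command from this page so that you can review it before you run it.

```shell
#!/bin/sh
# plugin_image OS: print the provision image name for an OS name
# as reported by `uname -s`. Image names are the ones listed in step 1.
plugin_image() {
    case "$1" in
        Linux) echo "ZDX_plugin.Linux" ;;
        # ASSUMPTION: these patterns identify a POSIX-like shell on Windows.
        CYGWIN*|MINGW*|MSYS*|Windows_NT) echo "ZDX_plugin.Windows" ;;
        *) echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# deploy_command MODE [OS]: print (do not run) the ctm command for step 1.
# MODE is "install" or "upgrade"; OS is required only for "install".
deploy_command() {
    case "$1" in
        install)
            img=$(plugin_image "$2") || return 1
            echo "ctm provision image $img"
            ;;
        upgrade)
            echo "ctm provision agent::update"
            ;;
        *)
            echo "usage: deploy_command install|upgrade [os]" >&2
            return 1
            ;;
    esac
}
```

For example, deploy_command install "$(uname -s)" prints the install command for the current host, and deploy_command upgrade prints the upgrade command. Printing instead of executing keeps the sketch safe to try on a host where the ctm CLI is not yet configured.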

Change Log

The following table provides details about changes that were introduced in new versions of this plug-in:

Plug-in Version    Details

1.0.10             Added support for additional Databricks result states.
1.0.09             Added repair-on-rerun functionality, which repairs a job run when you rerun one or more tasks as part of the original job run.
1.0.08             • Added a User-Agent header to requests that call Databricks REST APIs
                   • Upgraded from Databricks API v2.1 to Databricks API v2.2
1.0.07             • Displayed task-level output according to user preference or Databricks job failure
                   • Added HTTP Codes, Rerun Interval, and Attempt Reruns parameters to the connection profile to rerun an execution step with an HTTP code
1.0.06             Added Managed Identity authentication
1.0.05             Added Failure Tolerance job parameter
1.0.04             Removed the Job Name attribute
1.0.03             Added new job icon
1.0.02             Added idempotency enhancements
1.0.01             Added multiple task enhancements
1.0.00             Initial version