Control-M for Hadoop

Control-M for Hadoop is a plug-in that enables you to do the following:

  • Connect to the Hadoop framework, which enables the distributed processing of large data sets across clusters of commodity servers.

  • Connect to your Hadoop cluster from a single computer with secure login, which eliminates the need to provide authentication.

  • Integrate Hadoop jobs with other Control-M jobs into a single scheduling environment.

  • Attach an SLA job to your Hadoop jobs.

  • Use all Control-M capabilities with Control-M for Hadoop, including advanced scheduling criteria, complex dependencies, Resource Pools, Lock Resources, and variables.

The Control-M for Hadoop plug-in is available only on Linux systems.

Compatibility

The following table lists the Control-M for Hadoop plug-in prerequisites, each with its minimum required version.

  Prerequisite             Version
  Helix Control-M/Agent    9.0.20.080 or higher
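The "9.0.20.080 or higher" requirement can be checked programmatically before installing the plug-in. The following is a minimal sketch, assuming that Agent versions are purely numeric, dot-separated strings; the function names are hypothetical, and how you obtain the installed Agent version is outside the scope of this example.

```python
# Minimum Helix Control-M/Agent version taken from the compatibility table.
MIN_AGENT_VERSION = "9.0.20.080"


def parse_version(text):
    """Split a dotted version string into a tuple of integers.

    Assumption: every component is numeric (for example "9.0.20.080").
    """
    return tuple(int(part) for part in text.strip().split("."))


def agent_meets_minimum(agent_version, minimum=MIN_AGENT_VERSION):
    """Return True if agent_version is equal to or higher than minimum."""
    return parse_version(agent_version) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("9.0.20.000", "9.0.20.080", "9.0.21.000"):
        print(candidate, agent_meets_minimum(candidate))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "9.0.9" as higher than "9.0.20".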

Setting Up Control-M for Hadoop

This procedure describes how to install the Hadoop plug-in, create a connection profile, and define a Hadoop job in Control-M Web and in Automation API.

Begin

  1. Create a temporary directory to save the downloaded files.

  2. Download the Control-M for Hadoop plug-in.

  3. Install Control-M for Hadoop, as described in Installing a Plug-in.

  4. Create a Hadoop connection profile, as follows:

  5. Configure Control-M for Hadoop to work in a secure Hadoop environment (Kerberos), as described in Control-M for Hadoop Kerberos Configuration.

  6. Configure Control-M for Hadoop to fetch Oozie workflows and push the actions of each workflow as submitted jobs using the Oozie Extractor, as described in Configuring the Oozie Extractor.

  7. Define a Hadoop job, as follows:
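For the Automation API route, the connection profile (step 4) and the job definition (step 7) are expressed as JSON. The following is a minimal sketch that builds both documents in Python. It assumes that the field names shown (`Type`, `Centralized`, `Host`, `ConnectionProfile`, `ProgramJar`, `MainClass`, `Arguments`) and the type strings `ConnectionProfile:Hadoop` and `Job:Hadoop:MapReduce` match your Automation API version, so verify them against the Automation API code reference. The helper functions, host name, profile name, folder name, and file paths are hypothetical placeholders.

```python
import json


def build_connection_profile(name):
    """Return a minimal centralized Hadoop connection profile definition.

    Assumption: a centralized profile needs only Type and Centralized;
    service-specific settings (for example Oozie or Hive) are omitted.
    """
    return {name: {"Type": "ConnectionProfile:Hadoop", "Centralized": True}}


def build_mapreduce_folder(folder, job, host, profile, jar, main_class, args):
    """Return a folder definition that contains one Hadoop MapReduce job."""
    return {
        folder: {
            "Type": "Folder",
            job: {
                "Type": "Job:Hadoop:MapReduce",
                "Host": host,                   # Agent host where the plug-in is installed
                "ConnectionProfile": profile,   # name of the profile created in step 4
                "ProgramJar": jar,
                "MainClass": main_class,
                "Arguments": list(args),
            },
        }
    }


if __name__ == "__main__":
    # All values below are hypothetical examples.
    profile_doc = build_connection_profile("HADOOP_DEV_CLUSTER")
    folder_doc = build_mapreduce_folder(
        folder="HadoopDemoFolder",
        job="WordCount",
        host="edgenode",
        profile="HADOOP_DEV_CLUSTER",
        jar="/home/user1/hadoop-jobs/hadoop-mapreduce-examples.jar",
        main_class="wordcount",
        args=["/input/words.txt", "/output/wordcount"],
    )
    print(json.dumps(profile_doc, indent=2))
    print(json.dumps(folder_doc, indent=2))
```

You can save the printed JSON to files and submit them with the Automation API deployment tooling that you already use. Keeping the profile name in a single variable helps ensure that the job references the same profile that you deployed.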