Control-M for Hadoop
Control-M for Hadoop is a plug-in that enables you to do the following:
- Connect to the Hadoop framework, which enables the distributed processing of large data sets across clusters of commodity servers.
- Connect to your Hadoop cluster from a single computer with secure login, which eliminates the need to provide authentication.
- Integrate Hadoop jobs with other Control-M jobs into a single scheduling environment.
- Attach an SLA job to your Hadoop jobs.
- Introduce all Control-M capabilities to Control-M for Hadoop, including advanced scheduling criteria, complex dependencies, Resource Pools, Lock Resources, and variables.
The Control-M for Hadoop plug-in is available only on Linux systems.
Compatibility
The following table lists the Control-M for Hadoop plug-in prerequisites, each with its minimum required version.
| Prerequisites | Version |
|---|---|
| Helix Control-M/Agent | 9.0.20.080 or higher |
Setting Up Control-M for Hadoop
This procedure describes how to install the Hadoop plug-in, create a connection profile, and define a Hadoop job in Control-M Web and in Automation API.
Begin
1. Create a temporary directory to save the downloaded files.
2. Download the Control-M for Hadoop plug-in.
3. Install Control-M for Hadoop, as described in Installing a Plug-in.
4. Create a Hadoop connection profile, as follows:
   - Control-M Web: Create a Centralized Connection Profile with Hadoop Connection Profile Parameters.
   - Automation API: ConnectionProfile:Hadoop (a sample definition appears after this procedure).
5. Configure Control-M for Hadoop to work in a secure Hadoop environment (Kerberos), as described in Control-M for Hadoop Kerberos Configuration.
6. Configure Control-M for Hadoop to fetch Oozie workflows and push the actions of each workflow as submitted jobs using the Oozie Extractor, as described in Configuring the Oozie Extractor.
7. Define a Hadoop job, as follows:
   - Control-M Web: Create a job and then define its Hadoop-specific settings with the Hadoop Job attributes.
   - Automation API: Job:Hadoop (a sample definition appears after this procedure).
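For reference, the following is a minimal sketch of an Automation API connection profile definition. The profile name (HADOOP_DEV) and the Agent host name are placeholders, and the exact attributes required for your environment (for example, for centralized profiles or Kerberos-secured clusters) are described under ConnectionProfile:Hadoop.

```json
{
  "HADOOP_DEV": {
    "Type": "ConnectionProfile:Hadoop",
    "TargetAgent": "edgenode.example.com"
  }
}
```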
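Similarly, the following sketch shows one possible Automation API job definition, assuming a Spark Python job type. The folder name, host, connection profile name, and script path are placeholders; the full set of supported Hadoop job types and their attributes is described under Job:Hadoop.

```json
{
  "HadoopDemoFolder": {
    "Type": "Folder",
    "ProcessData": {
      "Type": "Job:Hadoop:Spark:Python",
      "Host": "edgenode.example.com",
      "ConnectionProfile": "HADOOP_DEV",
      "SparkScript": "/home/user/process_data.py"
    }
  }
}
```

Definitions such as these can typically be validated, deployed, and run with the Automation API CLI, for example with ctm build, ctm deploy, and ctm run.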