Running Applications and Programs in Your Environment

Agents are required to run jobs, so you will need an agent on your application host. The Provision Service enables you to install and set up a Control-M/Agent.

Select the relevant application:

Before You Begin

Ensure that you meet the following prerequisites:

  • You have successfully completed API setup, as described in Setting Up the API.

  • You have Git installed. If not, obtain it from the Git Downloads page.

  • You have a local copy of the tutorial samples, cloned from GitHub using the git clone command:

    git clone https://github.com/controlm/automation-api-quickstart.git

Running a Script and Command Job Flow

This example walks you through running a script and a command in sequence. You need a 64-bit Windows or Linux machine that has access to the scripts and programs that you want to run.

Step 1: Find the Image to Provision

The provision images command lists the images available to install.

> ctm provision images Linux
 
[
   "AWS_plugin.Linux",
   "Agent_Amazon.Linux",
   "Agent_CentOs.Linux",
   "Agent_Oracle.Linux",
   "Agent_RedHat.Linux",
   "Agent_Suse.Linux",
   "Agent_Ubuntu.Linux",
   "Application_Integrator_plugin.Linux",
   "Azure_plugin.Linux",
   "Databases_plugin.Linux",
   "Hadoop_plugin.Linux",
   "Informatica_plugin.Linux",
   "MFT_plugin.Linux",
   "SAP_plugin.Linux"
]

OR

> ctm provision images Windows
 
[
   "AWS_plugin.Windows",
   "Agent_Windows.Windows",
   "Application_Integrator_plugin.Windows",
   "Azure_plugin.Windows",
   "Databases_plugin.Windows",
   "Informatica_plugin.Windows",
   "MFT_plugin.Windows",
   "SAP_plugin.Windows"
]

As shown in the response, there are several available images:

  • Agent_<Distro>.Linux or Agent_Windows.Windows provides the ability to run scripts and commands.

  • Plugins enable you to run jobs of specific types:

    • AWS

    • Azure

    • Databases

    • Hadoop (only on Linux agent)

    • Informatica

    • MFT

    • SAP

    • Application Integrator

In this example, you will provision an agent according to the jobs that you would like to run.

Step 2: Provision the Agent Image

On a Windows system, run the following command as Administrator:

ctm provision saas::install Agent_Windows.Windows <agentTag>

OR

On Linux, run the following command:

ctm provision saas::install Agent_Amazon.Linux <agentTag>

The agent tag that you specify must have a matching agent token. For information about generating a token, see Generating an Agent Token.

After provisioning the Agent successfully, you now have a running instance of your Control-M/Agent on your host.

Step 3: Access the Tutorial Samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/helix-control-m/101-running-script-command-job-flow

Step 4: Verify the Code for Control-M

Let's take the AutomationAPISampleFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPISampleFlow.json
 
[
   {
      "deploymentFile": "AutomationAPISampleFlow.json",
      "successfulFoldersCount": 0,
      "successfulSmartFoldersCount": 1,
      "successfulSubFoldersCount": 0,
      "successfulJobsCount": 2,
      "successfulConnectionProfilesCount": 0,
      "isDeployDescriptorValid": false
   }
]

If the code is not valid, an error is returned.
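The build response can also be checked programmatically before running the flow. The following sketch is plain Python (not part of the ctm CLI) and parses the sample response shown above:

```python
import json

# Sample response from "ctm build" (values copied from the tutorial output above).
build_response = json.loads("""
[
   {
      "deploymentFile": "AutomationAPISampleFlow.json",
      "successfulFoldersCount": 0,
      "successfulSmartFoldersCount": 1,
      "successfulSubFoldersCount": 0,
      "successfulJobsCount": 2,
      "successfulConnectionProfilesCount": 0,
      "isDeployDescriptorValid": false
   }
]
""")

def build_succeeded(response, expected_jobs):
    """Return True if every deployment file validated the expected number of jobs."""
    return all(item["successfulJobsCount"] == expected_jobs for item in response)

print(build_succeeded(build_response, 2))  # True for the sample above
```

A check like this is handy in CI pipelines, where a mismatch between the expected and validated job count should fail the build early.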

Step 5: Run the Source Code

Use the run command to run the jobs on the Control-M environment. The returned runId is used to check the job status. The following shows the command and a typical successful response.

> ctm run AutomationAPISampleFlow.json
 
{
  "runId": "7cba67de-9e0d-409d-8d93-1b8229432eee",
  "statusURI": "https://controlmEndPointHost/automation-api/run/status/7cba67de-9e0d-409d-8d93-1b82294e"
}

This code ran successfully and returned the runId of "7cba67de-9e0d-409d-8d93-1b8229432eee".

Step 6: Check Job Status Using the runId

The following command shows how to check job status using the runId. Note that when there is more than one job in the flow, the status of each job is checked and returned.

> ctm run status "7cba67de-9e0d-409d-8d93-1b8229432eee"
 
{
  "statuses": [
    {
      "jobId": "IN01:00007",
      "folderId": "IN01:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPISampleFlow",
      "type": "Folder",
      "status": "Executing",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "Apr 26, 2020 10:43:47 AM",
      "endTime": "",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "Folder has no output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00007/log"
    },
    {
      "jobId": "IN01:00008",
      "folderId": "IN01:00007",
      "numberOfRuns": 0,
      "name": "CommandJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Command",
      "status": "Wait Host",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "",
      "endTime": "",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "Job did not run, it has no output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00008/log"
    },
    {
      "jobId": "IN01:00009",
      "folderId": "IN01:00007",
      "numberOfRuns": 0,
      "name": "ScriptJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Job",
      "status": "Wait Condition",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "",
      "endTime": "",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "Job did not run, it has no output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00009/log"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3
}
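A status response like the one above can be summarized programmatically, for example when polling until all jobs complete. Here is a minimal Python sketch over a trimmed copy of the sample response (field names taken from the output above):

```python
from collections import Counter

# Trimmed "ctm run status" response, keeping only the fields used below.
status_response = {
    "statuses": [
        {"name": "AutomationAPISampleFlow", "type": "Folder", "status": "Executing"},
        {"name": "CommandJob", "type": "Command", "status": "Wait Host"},
        {"name": "ScriptJob", "type": "Job", "status": "Wait Condition"},
    ],
    "total": 3,
}

def summarize(response):
    """Count how many items in the flow are in each status."""
    return Counter(item["status"] for item in response["statuses"])

print(summarize(status_response))
```

A polling loop could call `ctm run status` repeatedly and stop once no status other than "Ended OK" (or a failure status) remains in the summary.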

Step 7: Examine the Source Code

Let's look at the source code in the AutomationAPISampleFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
   "Defaults" : {
      "Application" : "SampleApp",
      "SubApplication" : "SampleSubApp",
      "RunAs" : "USERNAME",
      "Host" : "HOST",
      "Job": {
         "When" : {
            "Months": ["JAN", "OCT", "DEC"],
            "MonthDays":["22","1","11"],
            "WeekDays":["MON","TUE", "WED", "THU", "FRI"],
            "FromTime":"0300",
            "ToTime":"2100"
         },
         "ActionIfFailure" : {
            "Type": "If",
            "CompletionStatus": "NOTOK",
            "mailToTeam": {
               "Type": "Mail",
               "Message": "%%JOBNAME failed",
               "To": "team@mycomp.com"
            }
         }
      }
   },
   "AutomationAPISampleFlow": {
      "Type": "Folder",
      "Comment" : "Code reviewed by John",
      "CommandJob": {
         "Type": "Job:Command",
         "Command": "COMMAND"
      },
      "ScriptJob": {
         "Type": "Job:Script",
         "FilePath":"SCRIPT_PATH",
         "FileName":"SCRIPT_NAME"
      },
      "Flow": {
         "Type": "Flow",
         "Sequence": ["CommandJob", "ScriptJob"]
      }
   }
}

The first object is called "Defaults". It allows you to define a parameter once and apply it to all objects. For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria. The "ActionIfFailure" object determines what action is taken if a job ends unsuccessfully.

This example contains two jobs: CommandJob and ScriptJob. Both jobs are contained within a folder named AutomationAPISampleFlow. The Flow object defines the sequence of job execution.
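Because the Flow's Sequence refers to jobs by name, a mismatched name is a common editing mistake. A small sanity check can be sketched in Python (the helper below is hypothetical, not part of the Automation API):

```python
def undefined_flow_jobs(folder):
    """Return job names referenced in Flow sequences but not defined in the folder."""
    jobs = {name for name, value in folder.items()
            if isinstance(value, dict) and value.get("Type", "").startswith("Job")}
    missing = []
    for value in folder.values():
        if isinstance(value, dict) and value.get("Type") == "Flow":
            missing += [j for j in value.get("Sequence", []) if j not in jobs]
    return missing

# The folder structure from the sample above:
folder = {
    "Type": "Folder",
    "CommandJob": {"Type": "Job:Command", "Command": "COMMAND"},
    "ScriptJob": {"Type": "Job:Script", "FilePath": "SCRIPT_PATH", "FileName": "SCRIPT_NAME"},
    "Flow": {"Type": "Flow", "Sequence": ["CommandJob", "ScriptJob"]},
}
print(undefined_flow_jobs(folder))  # [] -- all referenced jobs are defined
```

Note that `ctm build` performs the authoritative validation; this sketch only catches the simple renaming case locally.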

Step 8: Modify the Code to Run in Your Environment

In the code above, the following parameters need to be set to run the jobs in your environment:

"RunAs" : "USERNAME"
"Host" : "HOST"

"Command": "COMMAND"
"FilePath":"SCRIPT_PATH"
"FileName":"SCRIPT_NAME"

  • RunAs: Identifies the operating system user that will execute the jobs.

  • Host: Defines the machine where you provisioned the Control-M/Agent.

  • Command: Defines the command to run according to your operating system.

  • FilePath and FileName: Define the location and name of the file that contains the script to run.

In JSON, the backslash character must be doubled (\\) when used in a Windows file path.
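Python's json module applies this escaping automatically, which is a convenient way to check what the doubled form should look like (the path below is a made-up example):

```python
import json

# A Windows script path, written as a normal Python string:
file_path = "C:\\Users\\ctmuser\\scripts"

# json.dumps doubles each backslash, which is the form required inside the JSON file:
encoded = json.dumps({"FilePath": file_path})
print(encoded)  # {"FilePath": "C:\\Users\\ctmuser\\scripts"}

# Decoding restores the single-backslash path:
assert json.loads(encoded)["FilePath"] == file_path
```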

Step 9: Rerun the Code Sample

Now that we've modified the source code in the AutomationAPISampleFlow.json file, let's rerun the sample:

> ctm run AutomationAPISampleFlow.json
 
{
   "runId": "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494",
   "statusURI": "https://controlmEndPointHost/automation-api/run/status/ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"
}

Each time you run the code, a new runId is generated. Let's take the new runId and check the job statuses again:

> ctm run status "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"
 
{
  "statuses": [
    {
      "jobId": "IN01:0000p",
      "folderId": "IN01:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPISampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 3, 2020 4:57:25 PM",
      "endTime": "May 3, 2020 4:57:28 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "Folder has no output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000p/log"
    },
    {
      "jobId": "IN01:0000q",
      "folderId": "IN01:0000p",
      "numberOfRuns": 1,
      "name": "CommandJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Command",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 3, 2020 4:57:26 PM",
      "endTime": "May 3, 2020 4:57:26 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000q/output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000q/log"
    },
    {
      "jobId": "IN01:0000r",
      "folderId": "IN01:0000p",
      "numberOfRuns": 1,
      "name": "ScriptJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 3, 2020 4:57:27 PM",
      "endTime": "May 3, 2020 4:57:27 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000r/output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000r/log"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3
}

You can now see that both jobs Ended OK.

Let's view the output of CommandJob. Use the jobId to get this information.

> ctm run job:output::get "IN01:0000q"

Verify that the output contains your script or command details.

Where to Go from Here

To learn more about what you can do with the Control-M Automation API, read through Code Reference and Services.

Running a File Transfer and Database Queries Job Flow

This example walks you through running file transfer and database query jobs in sequence. To complete this tutorial, you need a PostgreSQL database (other databases can also be used) and an SFTP server. The agent must be installed on a machine that has a network connection to both of these servers.

Step 1: Find the Image to Provision

The provision images command lists the images available to install.

> ctm provision images Linux
 
[
   "AWS_plugin.Linux",
   "Agent_Amazon.Linux",
   "Agent_CentOs.Linux",
   "Agent_Oracle.Linux",
   "Agent_RedHat.Linux",
   "Agent_Suse.Linux",
   "Agent_Ubuntu.Linux",
   "Application_Integrator_plugin.Linux",
   "Azure_plugin.Linux",
   "Databases_plugin.Linux",
   "Hadoop_plugin.Linux",
   "Informatica_plugin.Linux",
   "MFT_plugin.Linux",
   "SAP_plugin.Linux"
]

OR

> ctm provision images Windows
 
[
   "AWS_plugin.Windows",
   "Agent_Windows.Windows",
   "Application_Integrator_plugin.Windows",
   "Azure_plugin.Windows",
   "Databases_plugin.Windows",
   "Informatica_plugin.Windows",
   "MFT_plugin.Windows",
   "SAP_plugin.Windows"
]

In this example, you will provision Databases_plugin.Windows and MFT_plugin.Windows or Databases_plugin.Linux and MFT_plugin.Linux according to the machine that you use to run jobs.

Step 2: Provision the Agent Image and Plug-ins

On a Windows system, run the following commands as Administrator:

ctm provision saas::install Agent_Windows.Windows <agentTag>
ctm provision image Databases_plugin.Windows
ctm provision image MFT_plugin.Windows

OR

On Linux, run the following commands:

ctm provision saas::install Agent_Amazon.Linux <agentTag>
ctm provision image Databases_plugin.Linux
ctm provision image MFT_plugin.Linux

The agent tag that you specify must have a matching agent token. For information about generating a token, see Generating an Agent Token.

After provisioning the agent successfully, you now have a running instance of your agent on your host with the plugins installed.

Step 3: Access the Tutorial Samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow

Step 4: Verify the Code for Control-M

Let's take the AutomationAPIFileTransferDatabaseSampleFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPIFileTransferDatabaseSampleFlow.json
 
[
   {
      "deploymentFile": "AutomationAPIFileTransferDatabaseSampleFlow.json",
      "successfulFoldersCount": 0,
      "successfulSmartFoldersCount": 1,
      "successfulSubFoldersCount": 0,
      "successfulJobsCount": 2,
      "successfulConnectionProfilesCount": 3,
      "successfulDriversCount": 0,
      "isDeployDescriptorValid": false
   }
]

If the code is not valid, an error is returned.

Step 5: Examine the Source Code

Let's look at the source code in the AutomationAPIFileTransferDatabaseSampleFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application" : "SampleApp",
        "SubApplication" : "SampleSubApp",
        "Host" : "HOST",
        "Centralized" : true,
                                 
        "Variables": [
           {"DestDataFile": "DESTINATION_FILE"},
           {"SrcDataFile":  "SOURCE_FILE"}
        ],
                                 
        "When" : {
            "FromTime":"0300",
            "ToTime":"2100"
        }
    },
    "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "SFTP_SERVER",
        "Port": "22",
        "User" : "SFTP_USER",
        "Password" : "SFTP_PASSWORD"
    },
    "LOCAL-CP" : {
        "Type" : "ConnectionProfile:FileTransfer:Local",
        "User" : "USER",
        "Password" : "PASSWORD"
    },
    "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "DATABASE_SERVER",
        "Port":"5432",
        "User": "DATABASE_USER",
        "Password": "DATABASE_PASSWORD",
        "DatabaseName": "postgres"
    },
    "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "GetData": {
            "Type" : "Job:FileTransfer",
            "ConnectionProfileSrc" : "SFTP-CP",
            "ConnectionProfileDest" : "LOCAL-CP",
                                 
            "FileTransfers" :
            [
                {
                    "Src" : "%%SrcDataFile",
                    "Dest": "%%DestDataFile",
                    "TransferOption": "SrcToDest",
                    "TransferType": "Binary",
                    "PreCommandDest": {
                        "action": "rm",
                        "arg1": "%%DestDataFile"
                    },
                    "PostCommandDest": {
                        "action": "chmod",
                        "arg1": "700",
                        "arg2": "%%DestDataFile"
                    }
                }
            ]
        },
        "UpdateRecords": {
            "Type": "Job:Database:SQLScript",
            "SQLScript": "/home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql",
            "ConnectionProfile": "DB-CP"
        },
        "Flow": {
            "Type": "Flow",
            "Sequence": ["GetData", "UpdateRecords"]
        }
    }
}

The first object is called "Defaults". It allows you to define a parameter once and apply it to all objects. For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria. The Defaults object also includes Variables that are referenced several times in the jobs.

The sample contains two jobs: GetData and UpdateRecords. GetData transfers files from the SFTP server to the host machine, and UpdateRecords performs a SQL query on the database. Both jobs are contained within a folder named AutomationAPIFileTransferDatabaseSampleFlow. The Flow object defines the sequence of job execution.

The sample also includes the following three Connection Profiles:

  • SFTP-CP defines access and security credentials for the SFTP server.

  • DB-CP defines access and security credentials for the database.

  • LOCAL-CP defines access and security credentials for files that are transferred to the local machine.

Step 6: Modify the Code to Run in Your Environment

In the code sample, perform the following modifications:

  • Replace the value of "Host" with the host name of the machine where you provisioned the Control-M/Agent.

    "Host" : "HOST"

  • Replace the value of "SrcDataFile" with the file that is transferred from the SFTP server, and the value of "DestDataFile" with the path of the transferred file on the host machine.

    {"DestDataFile": "DESTINATION_FILE"},
    {"SrcDataFile":  "SOURCE_FILE"}

  • Modify the path to the samples directory for the jobs to run successfully in your environment. Replace the path /home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow with the location of the samples that you installed on your machine.

    "SQLScript": "/home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql"

  • Replace the following parameters with the credentials used to log in to the SFTP server.

    "HostName": "SFTP_SERVER",
    "User" : "SFTP_USER",
    "Password" : "SFTP_PASSWORD"

  • Replace the following parameters with the credentials used to access the database server.

    "Host": "DATABASE_SERVER",
    "Port":"5432",
    "User": "DATABASE_USER",
    "Password": "DATABASE_PASSWORD",

  • Replace the following parameters with the credentials used to read and write files on the host machine.

    "LOCAL-CP" : {
       "Type" : "ConnectionProfile:FileTransfer:Local",
       "User" : "USER",
       "Password" : ""
    }
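After making these edits, it is easy to miss a placeholder. The following Python sketch flags any placeholders that are still present in the edited file; the placeholder list is an assumption based on the snippets shown above:

```python
# UPPERCASE placeholder values used in the sample file (assumed list, taken
# from the snippets above; extend it if your copy of the sample differs).
PLACEHOLDERS = ["HOST", "SFTP_SERVER", "SFTP_USER", "SFTP_PASSWORD",
                "DATABASE_SERVER", "DATABASE_USER", "DATABASE_PASSWORD",
                "DESTINATION_FILE", "SOURCE_FILE"]

def remaining_placeholders(json_text):
    """Return the placeholders that have not been replaced yet."""
    return [p for p in PLACEHOLDERS if f'"{p}"' in json_text]

# Example: "Host" was not edited, but "User" was:
sample = '{"Host": "HOST", "User": "sftpuser"}'
print(remaining_placeholders(sample))  # ['HOST']
```

Running such a check against the full AutomationAPIFileTransferDatabaseSampleFlow.json before `ctm run` avoids jobs that fail with unresolved host names or credentials.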

Step 7: Run the Code Sample

Now that we've modified the source code in the previous step, let's run the sample:

> ctm run AutomationAPIFileTransferDatabaseSampleFlow.json
 
{
  "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
  "statusURI": "https://controlmEndPointHost/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e"
}

Each time you run the code, a new runId is generated. Let's take the runId and check the job statuses:

> ctm run status "ce62ace0-4a6e-4b17-afdd-35335cbf179e"
 
{
  "statuses": [
    {
      "jobId": "IN01:000c1",
      "folderId": "IN01:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 23, 2020 4:25:10 PM",
      "endTime": "May 23, 2020 4:25:26 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "Folder has no output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c1/log"
    },
    {
      "jobId": "IN01:000c2",
      "folderId": "IN01:000c1",
      "numberOfRuns": 1,
      "name": "GetData",
      "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 23, 2020 4:25:10 PM",
      "endTime": "May 23, 2020 4:25:17 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c2/output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c2/log"
    },
    {
      "jobId": "IN01:000c3",
      "folderId": "IN01:000c1",
      "numberOfRuns": 1,
      "name": "UpdateRecords",
      "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 23, 2020 4:25:18 PM",
      "endTime": "May 23, 2020 4:25:25 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c3/output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c3/log"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3
}

You can now see that both jobs Ended OK.

Let's view the output of GetData. Use the jobId to get this information.

> ctm run job:output::get "IN01:000c2"
 
+ Job started at '0523 16:25:15:884' orderno - '000c2' runno - '00001' Number of transfers - 1
+ Host1 XXXXX' username XXXX - Host2 'controlmEndPointHost' username XXXX
Local host is XXX
Connection to SFTP server on host XXX was established
Connection to Local server on host controlmEndPointHost was established
+********** Starting transfer #1 out of 1**********
* Executing pre-commands on host controlmEndPointHost
rm c:\temp\XXXX
File 'c:\temp\XXX removed successfully
Transfer type: BINARY
Open data connection to retrieve file /home/user/XXX
Open data connection to store file c:\temp\XXX
Transfer #1 transferring
Src file: '/ home/user/XXX ' on host 'XXXX'
Dst file: 'c:\temp\XXX on host 'controlmEndPointHost'
Transferred:          628       Elapsed:    0 sec       Percent: 100    Status: In Progress
File transfer status: Ended OK
Destination file size vs. source file size validation passed
* Executing post-commands on host controlmEndPointHost
chmod 700 c:\temp\XXX
Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0523 16:25:16:837'
Elapsed time [0 sec]
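If you automate this verification, the transfer log can be parsed for the byte count and final status. A sketch assuming the line formats shown in the sample output above:

```python
import re

# A fragment of a transfer log in the format shown above.
log = """Transferred:          628       Elapsed:    0 sec       Percent: 100    Status: In Progress
File transfer status: Ended OK
Transfer #1 completed successfully"""

def transfer_summary(text):
    """Extract the transferred byte count and the final transfer status."""
    bytes_match = re.search(r"Transferred:\s+(\d+)", text)
    status_match = re.search(r"File transfer status: (.+)", text)
    return {
        "bytes": int(bytes_match.group(1)) if bytes_match else None,
        "status": status_match.group(1).strip() if status_match else None,
    }

print(transfer_summary(log))  # {'bytes': 628, 'status': 'Ended OK'}
```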

Let's view the output of UpdateRecords. Use the jobId to get this information.

> ctm run job:output::get "IN01:000c3"
 
Environment information:
+--------------------+--------------------------------------------------+
|Account Name        |DB-CP                                             |
+--------------------+--------------------------------------------------+
|Database Vendor     |PostgreSQL                                        |
+--------------------+--------------------------------------------------+
|Database Version    |9.2.8                                             |
+--------------------+--------------------------------------------------+
 
Request statement:
------------------
select 'Parameter';
 
Job statistics:
+-------------------------+-------------------------+
|Start Time               |20200523163619           |
+-------------------------+-------------------------+
|End Time                 |20200523163619           |
+-------------------------+-------------------------+
|Elapsed Time             |13                       |
+-------------------------+-------------------------+
|Number Of Affected Rows  |1                        |
+-------------------------+-------------------------+
Exit Code    = 0
Exit Message = Normal completion

Where to Go from Here

To learn more about what you can do with the Control-M Automation API, read through Code Reference and Services.

Running a Hadoop Spark Job Flow

This example walks you through writing Hadoop and Spark jobs that run in sequence. To complete this tutorial, you need a Hadoop edge node where the Hadoop client software is installed.

Let's verify that Hadoop and HDFS are operational using the following commands:

> hadoop version
 
Hadoop 2.6.0-cdh5.4.2
Subversion http://github.com/cloudera/hadoop -r 15b703c8725733b7b2813d2325659eb7d57e7a3f
Compiled by jenkins on 2015-05-20T00:03Z
Compiled with protoc 2.5.0
From source with checksum de74f1adb3744f8ee85d9a5b98f90d
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.4.2.jar
 
> hadoop fs -ls /
 
Found 5 items
drwxr-xr-x   - hbase supergroup          0 2015-12-13 02:32 /hbase
drwxr-xr-x   - solr  solr                0 2015-06-09 03:38 /solr
drwxrwxrwx   - hdfs  supergroup          0 2016-03-20 07:11 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2016-03-29 06:51 /user
drwxr-xr-x   - hdfs  supergroup          0 2015-06-09 03:36 /var

Step 1: Find the Image to Provision

The provision images command lists the images available to install.

> ctm provision images Linux
 
[
   "AWS_plugin.Linux",
   "Agent_Amazon.Linux",
   "Agent_CentOs.Linux",
   "Agent_Oracle.Linux",
   "Agent_RedHat.Linux",
   "Agent_Suse.Linux",
   "Agent_Ubuntu.Linux",
   "Application_Integrator_plugin.Linux",
   "Azure_plugin.Linux",
   "Databases_plugin.Linux",
   "Hadoop_plugin.Linux",
   "Informatica_plugin.Linux",
   "MFT_plugin.Linux",
   "SAP_plugin.Linux"
]

In this example, you will provision the Hadoop_plugin.Linux image.

Step 2: Provision an Agent with the Hadoop Plug-in

Run the following command on a Linux system:

ctm provision saas::install Agent_Amazon.Linux <agentTag>
ctm provision image Hadoop_plugin.Linux

The agent tag that you specify must have a matching agent token. For information about generating a token, see Generating an Agent Token.

After provisioning the Agent with the Hadoop plugin successfully, you now have a running instance of Control-M/Agent on your Hadoop edge node.

Now let's access the tutorial sample code.

Step 3: Access the Tutorial Samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow

Step 4: Verify the Code for Control-M

Let's take the AutomationAPISampleHadoopFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPISampleHadoopFlow.json
 
[
   {
      "deploymentFile": "AutomationAPISampleHadoopFlow.json",
      "successfulFoldersCount": 0,
      "successfulSmartFoldersCount": 1,
      "successfulSubFoldersCount": 0,
      "successfulJobsCount": 2,
      "successfulConnectionProfilesCount": 0,
      "isDeployDescriptorValid": false
   }
]

If the code is not valid, an error is returned.

Step 5: Examine the Source Code

Let's look at the source code in the AutomationAPISampleHadoopFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host" : "HOST",
        "When" : {
            "FromTime":"0300",
            "ToTime":"2100"
        },
        "Job:Hadoop" : {
            "ConnectionProfile": "SAMPLE_CONNECTION_PROFILE"
        }
    },
    "SAMPLE_CONNECTION_PROFILE" :
    {
        "Type" : "ConnectionProfile:Hadoop",
        "Centralized" : true
    },
    "AutomationAPIHadoopSampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "ProcessData": {
            "Type": "Job:Hadoop:Spark:Python",
            "SparkScript": "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
             
            "Arguments": [
                "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
                "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir"
            ],
            "PreCommands" : {
                "Commands" : [
                    { "rm":"-R -f file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir" }
                ]                  
            }
        },
        "CopyOutputData" :
        {
            "Type" : "Job:Hadoop:HDFSCommands",
            "Commands" : [
                {"rm"    : "-R -f samplesOut" },
                {"mkdir" : "samplesOut" },
                {"cp"   : "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut" }
            ]
        },
        "DataProcessingFlow": {
            "Type": "Flow",
            "Sequence": ["ProcessData","CopyOutputData"]
        }
    }
}

This example contains two jobs: a Spark job named ProcessData and an HDFS Commands job named CopyOutputData. Both jobs are contained within a folder named AutomationAPIHadoopSampleFlow. The Flow object defines the sequence of job execution.

Note that in the Spark job we use the "PreCommands" object to clean up output from any previous Spark job runs.

The "SAMPLE_CONNECTION_PROFILE" object is used to define the connection parameters to the Hadoop cluster. Note that for Sqoop and Hive, it is used to set data sources and credentials.

Here is the code of processData.py:

from __future__ import print_function
 
import sys
from pyspark import SparkContext
 
inputFile  = sys.argv[1]
outputDir = sys.argv[2]
 
sc = SparkContext(appName="processDataSample")
text_file = sc.textFile(inputFile)
counts = text_file.flatMap(lambda line: line.split(" ")) \
      .map(lambda word: (word, 1)) \
      .reduceByKey(lambda a, b: a + b)
 
counts.saveAsTextFile(outputDir)
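The Spark transformation above is a classic word count. A plain-Python equivalent (useful for predicting the expected output on a small input file, without a cluster) might look like this:

```python
from collections import Counter

def word_count(lines):
    """Count word occurrences across lines, mirroring the flatMap/map/reduceByKey chain."""
    counts = Counter()
    for line in lines:
        counts.update(line.split(" "))  # flatMap + map(word -> (word, 1))
    return dict(counts)                 # reduceByKey(a + b)

print(word_count(["hello world", "hello spark"]))
# {'hello': 2, 'world': 1, 'spark': 1}
```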

Step 6: Modify the Code to Run in Your Environment

You need to modify the path to the samples directory for the jobs to run successfully in your environment. Replace the URI file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/ with the location of the samples that you installed on your machine.

"SparkScript": "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
"Arguments": [
    "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
    "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir"
],
{ "rm":"-R -f file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir" }
{"cp" : "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut" }

For example: file:///home/user1/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/

In addition, replace the value of "Host" with the host name of the machine where you provisioned the Control-M/Agent.

Copy
"Host" : "HOST"
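These substitutions can also be scripted. Below is a minimal sketch that rewrites the raw JSON text; the SAMPLES_URI and AGENT_HOST values are placeholders you must replace with your own, and the file-rewrite step is left commented out:

```python
# Replace the placeholder samples URI and agent host in the job definitions.
# SAMPLES_URI and AGENT_HOST are assumptions -- substitute your own values.
SAMPLES_URI = "file:///home/user1/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/"
AGENT_HOST = "myagenthost"
PLACEHOLDER_URI = "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/"

def localize(text, samples_uri=SAMPLES_URI, host=AGENT_HOST):
    """Swap the placeholder samples URI and agent host in the raw JSON text."""
    return text.replace(PLACEHOLDER_URI, samples_uri).replace('"HOST"', '"%s"' % host)

# Usage (uncomment to rewrite the file in place):
# with open("AutomationAPISampleHadoopFlow.json") as f:
#     text = f.read()
# with open("AutomationAPISampleHadoopFlow.json", "w") as f:
#     f.write(localize(text))
```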

Step 7: Run the Sample

Now that we've modified the job definitions in the AutomationAPISampleHadoopFlow.json file, let's run the sample:

Copy
> ctm run AutomationAPISampleHadoopFlow.json
 
{
   "runId": "6aef1ce1-3c57-4866-bf45-3a6afc33e27c",
   "statusURI": "https://controlmEndPointHost/automation-api/run/status/6aef1ce1-3c57-4866-bf45-3a6afc33e27c"
}

Each time the code runs, a new runId is generated. Let's take the runId and check the job statuses:

Copy
> ctm run status "6aef1ce1-3c57-4866-bf45-3a6afc33e27c"
 
{
  "statuses": [
    {
      "jobId": "IN01:000ca",
      "folderId": "IN01:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPIHadoopSampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 24, 2020 1:03:18 PM",
      "endTime": "May 24, 2020 1:03:45 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "Folder has no output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000ca/log"
    },
    {
      "jobId": "IN01:000cb",
      "folderId": "IN01:000ca",
      "numberOfRuns": 1,
      "name": "ProcessData",
      "folder": "AutomationAPIHadoopSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 24, 2020 1:03:18 PM",
      "endTime": "May 24, 2020 1:03:32 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cb/output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cb/log"
    },
    {
      "jobId": "IN01:000cc",
      "folderId": "IN01:000ca",
      "numberOfRuns": 1,
      "name": "CopyOutputData",
      "folder": "AutomationAPIHadoopSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "held": "false",
      "deleted": "false",
      "cyclic": "false",
      "startTime": "May 24, 2020 1:03:33 PM",
      "endTime": "May 24, 2020 1:03:44 PM",
      "estimatedStartTime": [],
      "estimatedEndTime": [],
      "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cc/output",
      "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cc/log"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3
}

You can see that both jobs, as well as the parent folder, have the status "Ended OK".
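Rather than checking manually, you can poll the status until the run completes. Here is a minimal sketch; the terminal-status set and the `fetch_status` callable are assumptions, and in practice `fetch_status` would parse the JSON body returned by `ctm run status <runId>` or the statusURI shown above:

```python
import time

# Statuses we treat as terminal; adjust to your environment's status values.
TERMINAL = {"Ended OK", "Ended Not OK"}

def wait_for_completion(fetch_status, interval=5, max_polls=60):
    """Poll fetch_status() until every status in the run is terminal.

    fetch_status is any callable returning the parsed status document,
    e.g. the JSON body of `ctm run status <runId>`.
    """
    for _ in range(max_polls):
        doc = fetch_status()
        statuses = [s["status"] for s in doc["statuses"]]
        if statuses and all(s in TERMINAL for s in statuses):
            return statuses
        time.sleep(interval)
    raise TimeoutError("run did not finish within the polling window")
```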

Let's view the output of CopyOutputData. Use the jobId to get this information.

Copy
> ctm run job:output::get IN01:000cc
 
Environment information:
+--------------------+--------------------------------------------------+
|Account Name        |SAMPLE_CONNECTION_PROFILE                         |
+--------------------+--------------------------------------------------+
 
Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -rm -R -f samplesOut
 
HDFS command output:
-------------------
Deleted samplesOut
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------
 
Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -mkdir samplesOut
 
HDFS command output:
-------------------
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------
 
Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -cp file:///home/cloudera/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut
 
HDFS command output:
-------------------
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------
 
Application reports:
--------------------
-> no hadoop application reports were created for the job execution.
 
Job statistics:
--------------
+-------------------------+-------------------------+
|Start Time               |20200524030335           |
+-------------------------+-------------------------+
|End Time                 |20200524030346           |
+-------------------------+-------------------------+
|Elapsed Time             |1065                     |
+-------------------------+-------------------------+
Exit Message = Normal completion
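The Start Time and End Time values in the statistics table are timestamps in `yyyymmddhhmmss` form. A small sketch for computing the wall-clock duration from them (interpreting Elapsed Time's own units is left aside, as they are not documented here):

```python
from datetime import datetime

def run_duration(start, end):
    """Return the duration in seconds between two yyyymmddhhmmss stamps."""
    fmt = "%Y%m%d%H%M%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

print(run_duration("20200524030335", "20200524030346"))  # 11.0
```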

Where to Go from Here

To learn more about what you can do with the Control-M Automation API, read through Code Reference and Services.