Running Applications and Programs in Your Environment

The following tutorials demonstrate how to provision images and plug-ins and then run several types of job flows, according to your environment type:

Running a Script and Command Job Flow

This tutorial guides you through running a script and command in sequence.

Before You Begin

Ensure that you meet the following prerequisites:

  • You have successfully completed API setup, as described in Setting Up the API.

  • You have Git installed. If not, obtain it from the Git Downloads page.

  • You have a local copy of the tutorial samples from GitHub, obtained with the git clone command:

    git clone https://github.com/controlm/automation-api-quickstart.git

  • You have a 64-bit Windows or Linux machine with access to the scripts and programs that you want to run.

Begin

  1. Provision the required image by doing the following:

    1. Obtain the list of available images using the provision images command, as in the following examples:

      • Linux:

        > ctm provision images Linux

        [
        "AWS_plugin.Linux",
        "Agent_Amazon.Linux",
        "Agent_CentOs.Linux",
        "Agent_Oracle.Linux",
        "Agent_RedHat.Linux",
        "Agent_Suse.Linux",
        "Agent_Ubuntu.Linux",
        "Application_Integrator_plugin.Linux",
        "Azure_plugin.Linux",
        "Databases_plugin.Linux",
        "Hadoop_plugin.Linux",
        "Informatica_plugin.Linux",
        "MFT_plugin.Linux",
        "SAP_plugin.Linux"
        ]
      • Windows:

        > ctm provision images Windows

        [
        "AWS_plugin.Windows",
        "Agent_Windows.Windows",
        "Application_Integrator_plugin.Windows",
        "Azure_plugin.Windows",
        "Databases_plugin.Windows",
        "Informatica_plugin.Windows",
        "MFT_plugin.Windows",
        "SAP_plugin.Windows"
        ]
    2. Provision an Agent image by running one of the following commands as an administrator:

      • Linux:

        ctm provision saas::install Agent_Amazon.Linux <agentTag>

      • Windows:

        ctm provision saas::install Agent_Windows.Windows <agentTag>

      The Agent tag that you specify must have a matching Agent token. For information about generating a token, see Generating an Agent Token.

    The Agent is provisioned and you now have a running instance of your Control-M/Agent on your host.
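
    For example, if you have already generated a token for a hypothetical Agent tag named tutorial_agents, the Linux installation command would look as follows (the tag name is only an illustration; use your own tag):

    ctm provision saas::install Agent_Amazon.Linux tutorial_agents

    The Windows command is identical except for the image name (Agent_Windows.Windows).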

  2. Access the tutorial sample with the following command:

    cd automation-api-quickstart/helix-control-m/101-running-script-command-job-flow

  3. Verify that the code within the AutomationAPISampleFlow.json file is valid by running the build command.

    The following example shows the build command and a typical successful response:

    > ctm build AutomationAPISampleFlow.json

    [
    {
    "deploymentFile": "AutomationAPISampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0,
    "isDeployDescriptorValid": false
    }
    ]
  4. Run the jobs on the Control-M environment using the run command.

    The returned runId is used to check the job status.

    The following example shows the run command and a typical successful response:

    > ctm run AutomationAPISampleFlow.json

    {
    "runId": "7cba67de-9e0d-409d-8d93-1b8229432eee",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/7cba67de-9e0d-409d-8d93-1b82294e"
    }

    In this example, the code ran successfully and returned a runId of "7cba67de-9e0d-409d-8d93-1b8229432eee".

  5. Check job status for the runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical response at this stage, with job status information for each of the jobs in the flow. Because the sample file still contains placeholder values (such as HOST and USERNAME), the jobs cannot run yet: the folder is Executing, while the jobs remain in Wait Host and Wait Condition statuses until you set real values in the steps that follow:

    > ctm run status "7cba67de-9e0d-409d-8d93-1b8229432eee"

    {
    "statuses": [
    {
    "jobId": "IN01:00007",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPISampleFlow",
    "type": "Folder",
    "status": "Executing",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "Apr 26, 2020 10:43:47 AM",
    "endTime": "",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00007/log"
    },
    {
    "jobId": "IN01:00008",
    "folderId": "IN01:00007",
    "numberOfRuns": 0,
    "name": "CommandJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Command",
    "status": "Wait Host",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "",
    "endTime": "",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Job did not run, it has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00008/log"
    },
    {
    "jobId": "IN01:00009",
    "folderId": "IN01:00007",
    "numberOfRuns": 0,
    "name": "ScriptJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Job",
    "status": "Wait Condition",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "",
    "endTime": "",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Job did not run, it has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00009/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  6. Examine the contents of the AutomationAPISampleFlow.json file to learn about the structure of the job flow, as shown below:

    {
      "Defaults": {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "RunAs": "USERNAME",
        "Host": "HOST",
        "Job": {
          "When": {
            "Months": ["JAN", "OCT", "DEC"],
            "MonthDays": ["22", "1", "11"],
            "WeekDays": ["MON", "TUE", "WED", "THU", "FRI"],
            "FromTime": "0300",
            "ToTime": "2100"
          },
          "ActionIfFailure": {
            "Type": "If",
            "CompletionStatus": "NOTOK",
            "mailToTeam": {
              "Type": "Mail",
              "Message": "%%JOBNAME failed",
              "To": "team@mycomp.com"
            }
          }
        }
      },
      "AutomationAPISampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "CommandJob": {
          "Type": "Job:Command",
          "Command": "COMMAND"
        },
        "ScriptJob": {
          "Type": "Job:Script",
          "FilePath": "SCRIPT_PATH",
          "FileName": "SCRIPT_NAME"
        },
        "Flow": {
          "Type": "Flow",
          "Sequence": ["CommandJob", "ScriptJob"]
        }
      }
    }

    This source code contains the following main objects:

    • The "Defaults" object at the top of this example allows you to define parameters once for all objects.

      For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria.

    • The "ActionIfFailure" object determines what action to take when a job ends unsuccessfully.

    • The folder in this example is named AutomationAPISampleFlow, and it contains two jobs, CommandJob and ScriptJob.

    • The Flow object defines the sequence of job execution.
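
    For reference, the Flow object's Sequence array lists job names in execution order, and each job runs after the one before it completes. A hypothetical flow that chains a third job (the CleanupJob name is illustrative only) would look like this:

    "Flow": {
      "Type": "Flow",
      "Sequence": ["CommandJob", "ScriptJob", "CleanupJob"]
    }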

  7. In the AutomationAPISampleFlow.json file, set values for the following parameters to match your Control-M environment (an example appears after this list):

    • "RunAs" : "USERNAME"

      Replace USERNAME with the name of an operating system user that will execute jobs on the Agent.

    • "Host" : "HOST"

      Replace HOST with the hostname where you provisioned the Control-M/Agent.

    • "Command": "COMMAND"

      Replace COMMAND with a command that is appropriate for your operating system.

    • "FilePath":"SCRIPT_PATH"

      Replace SCRIPT_PATH with the path to the script file to run.

    • "FileName":"SCRIPT_NAME"

      Replace SCRIPT_NAME with the name of the script file to run.

    In JSON, the backslash character must be doubled (\\) when used in a Windows file path.
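
    For example, on a Windows machine the two jobs might look as follows after editing (the command, path, and file name are hypothetical; substitute your own, and note the doubled backslashes in the Windows paths):

    "CommandJob": {
      "Type": "Job:Command",
      "Command": "dir C:\\Temp"
    },
    "ScriptJob": {
      "Type": "Job:Script",
      "FilePath": "C:\\Users\\ctmuser\\scripts",
      "FileName": "myscript.bat"
    }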

  8. After modifying the AutomationAPISampleFlow.json file, rerun the sample code using the run command.

    The following example shows the run command and a typical successful response after setting parameter values that match your Control-M environment:

    > ctm run AutomationAPISampleFlow.json

    {
    "runId": "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"
    }
  9. Check job status for the new runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical successful response. This time, both jobs have the Ended OK status.

    > ctm run status "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"

    {
    "statuses": [
    {
    "jobId": "IN01:0000p",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPISampleFlow",
    "type": "Folder",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 3, 2020 4:57:25 PM",
    "endTime": "May 3, 2020 4:57:28 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000p/log"
    },
    {
    "jobId": "IN01:0000q",
    "folderId": "IN01:0000p",
    "numberOfRuns": 1,
    "name": "CommandJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Command",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 3, 2020 4:57:26 PM",
    "endTime": "May 3, 2020 4:57:26 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000q/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000q/log"
    },
    {
    "jobId": "IN01:0000r",
    "folderId": "IN01:0000p",
    "numberOfRuns": 1,
    "name": "ScriptJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 3, 2020 4:57:27 PM",
    "endTime": "May 3, 2020 4:57:27 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000r/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000r/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  10. Retrieve the output of the CommandJob using the run job:output::get command with the jobId that you obtained in the previous step, and verify that the output contains your script or command details:

    ctm run job:output::get "IN01:0000q"
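
    When you script these steps end to end, you typically need to wait for the flow to finish before fetching output. The following minimal sketch (an illustration only; it assumes a Unix shell and that the jq utility is installed) polls the run status until the folder reaches a terminal status such as Ended OK:

    #!/bin/bash
    # Minimal polling sketch: replace RUN_ID with the value returned by your own "ctm run" command.
    RUN_ID="7cba67de-9e0d-409d-8d93-1b8229432eee"

    while true; do
      # The folder is the first entry in the "statuses" array of the run status output.
      STATUS=$(ctm run status "$RUN_ID" | jq -r '.statuses[0].status')
      echo "Folder status: $STATUS"
      case "$STATUS" in
        Executing|"Wait"*) sleep 10 ;;
        *) break ;;
      esac
    done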

Running a File Transfer and Database Queries Job Flow

This tutorial guides you through running file transfer and database query jobs in sequence.

Before You Begin

Ensure that you meet the following prerequisites:

  • You have successfully completed API setup, as described in Setting Up the API.

  • You have Git installed. If not, obtain it from the Git Downloads page.

  • You have a local copy of the tutorial samples from GitHub, obtained with the git clone command:

    git clone https://github.com/controlm/automation-api-quickstart.git

  • You have a PostgreSQL database (or another database) and an SFTP server, and the Agent host machine has network connectivity to both the database and the SFTP server. You can verify connectivity as shown in the example after this list.
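
To confirm connectivity from the Agent host before you run the tutorial, you can use standard clients, as in the following hypothetical example (it assumes the sftp and psql command-line clients are installed; replace the host names and users with your own):

    sftp SFTP_USER@SFTP_SERVER
    psql -h DATABASE_SERVER -p 5432 -U DATABASE_USER -d postgres -c "select 1;"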

Begin

  1. Provision the required image by doing the following:

    1. Obtain the list of available images using the provision images command, as in the following examples:

      • Linux:

        > ctm provision images Linux

        [
        "AWS_plugin.Linux",
        "Agent_Amazon.Linux",
        "Agent_CentOs.Linux",
        "Agent_Oracle.Linux",
        "Agent_RedHat.Linux",
        "Agent_Suse.Linux",
        "Agent_Ubuntu.Linux",
        "Application_Integrator_plugin.Linux",
        "Azure_plugin.Linux",
        "Databases_plugin.Linux",
        "Hadoop_plugin.Linux",
        "Informatica_plugin.Linux",
        "MFT_plugin.Linux",
        "SAP_plugin.Linux"
        ]
      • Windows:

        > ctm provision images Windows

        [
        "AWS_plugin.Windows",
        "Agent_Windows.Windows",
        "Application_Integrator_plugin.Windows",
        "Azure_plugin.Windows",
        "Databases_plugin.Windows",
        "Informatica_plugin.Windows",
        "MFT_plugin.Windows",
        "SAP_plugin.Windows"
        ]
    2. Provision an Agent and plug-ins by running one of the following sets of commands as an administrator:

      • Linux:

        • ctm provision saas::install Agent_Amazon.Linux <agentTag>

        • ctm provision image Databases_plugin.Linux

        • ctm provision image MFT_plugin.Linux

      • Windows:

        • ctm provision saas::install Agent_Windows.Windows <agentTag>

        • ctm provision image Databases_plugin.Windows

        • ctm provision image MFT_plugin.Windows

      The Agent tag that you specify must have a matching Agent token. For information about generating a token, see Generating an Agent Token.

    The Agent is provisioned and you now have a running instance of your Control-M/Agent on your host with additional plug-ins.

  2. Access the tutorial sample with the following command:

    cd automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow

  3. Verify that the code within the AutomationAPIFileTransferDatabaseSampleFlow.json file is valid by running the build command.

    The following example shows the build command and a typical successful response:

    > ctm build AutomationAPIFileTransferDatabaseSampleFlow.json

    [
    {
    "deploymentFile": "AutomationAPIFileTransferDatabaseSampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 3,
    "successfulDriversCount": 0,
    "isDeployDescriptorValid": false
    }
    ]
  4. Examine the contents of the AutomationAPIFileTransferDatabaseSampleFlow.json file to learn about the structure of the job flow, as shown below:

    {
      "Defaults": {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host": "HOST",
        "Centralized": true,
        "Variables": [
          {"DestDataFile": "DESTINATION_FILE"},
          {"SrcDataFile": "SOURCE_FILE"}
        ],
        "When": {
          "FromTime": "0300",
          "ToTime": "2100"
        }
      },
      "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "SFTP_SERVER",
        "Port": "22",
        "User": "SFTP_USER",
        "Password": "SFTP_PASSWORD"
      },
      "LOCAL-CP": {
        "Type": "ConnectionProfile:FileTransfer:Local",
        "User": "USER",
        "Password": "PASSWORD"
      },
      "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "DATABASE_SERVER",
        "Port": "5432",
        "User": "DATABASE_USER",
        "Password": "DATABASE_PASSWORD",
        "DatabaseName": "postgres"
      },
      "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "GetData": {
          "Type": "Job:FileTransfer",
          "ConnectionProfileSrc": "SFTP-CP",
          "ConnectionProfileDest": "LOCAL-CP",
          "FileTransfers": [
            {
              "Src": "%%SrcDataFile",
              "Dest": "%%DestDataFile",
              "TransferOption": "SrcToDest",
              "TransferType": "Binary",
              "PreCommandDest": {
                "action": "rm",
                "arg1": "%%DestDataFile"
              },
              "PostCommandDest": {
                "action": "chmod",
                "arg1": "700",
                "arg2": "%%DestDataFile"
              }
            }
          ]
        },
        "UpdateRecords": {
          "Type": "Job:Database:SQLScript",
          "SQLScript": "/home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql",
          "ConnectionProfile": "DB-CP"
        },
        "Flow": {
          "Type": "Flow",
          "Sequence": ["GetData", "UpdateRecords"]
        }
      }
    }

    This source code contains the following main objects:

    • The "Defaults" object at the top of this example allows you to define parameters once for all objects, including the following:

      • Scheduling criteria for all jobs, defined by the When parameter.

      • Variables that are referenced several times in the jobs.

    • Three Connection Profiles:

      • SFTP-CP defines access and security credentials for the SFTP server.

      • DB-CP defines access and security credentials for the database.

      • LOCAL-CP defines access and security credentials for files that are transferred to the local machine.

    • The folder in this example is named AutomationAPIFileTransferDatabaseSampleFlow, and it contains two jobs:

      • GetData transfers files from the SFTP server to the host machine.

      • UpdateRecords performs a SQL query on the database.

    • The Flow object defines the sequence of job execution.
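
    For example, the SrcDataFile and DestDataFile variables defined under Defaults are referenced in the GetData job with a %% prefix, so you set the actual file paths once and every field that uses them picks up the change. The first fragment below is the definition under Defaults; the second shows the references inside the FileTransfers entry:

    "Variables": [
      {"DestDataFile": "DESTINATION_FILE"},
      {"SrcDataFile": "SOURCE_FILE"}
    ]

    "Src": "%%SrcDataFile",
    "Dest": "%%DestDataFile"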

  5. Set parameter values in the AutomationAPIFileTransferDatabaseSampleFlow.json file to match your Control-M environment, as follows:

    • Replace the value of "SrcDataFile" with the path of the file to transfer from the SFTP server, and the value of "DestDataFile" with the destination path for the transferred file on the host machine.

      {"DestDataFile": "DESTINATION_FILE"},
      {"SrcDataFile": "SOURCE_FILE"}
    • Modify the path to the samples directory so that the jobs can run successfully in your environment. Replace the path /home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow with the location of the samples that you installed on your machine.

      "SQLScript": "/home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql"

    • Replace the following parameters with the credentials used to log in to the SFTP server.

      "HostName": "SFTP_SERVER",
      "User" : "SFTP_USER",
      "Password" : "SFTP_PASSWORD"
    • Replace the following parameters with the credentials used to access the database server.

      "Host": "DATABASE_SERVER",
      "Port":"5432",
      "User": "DATABASE_USER",
      "Password": "DATABASE_PASSWORD",
    • Replace the following parameters with the credentials used to read and write files on the host machine.

      "LOCAL-CP" : {
      "Type" : "ConnectionProfile:FileTransfer:Local",
      "User" : "USER",
      "Password" : ""
      }
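
    Taken together, the edited SFTP and database connection profiles might look like the following sketch (sftp.example.com, db.example.com, and the user names are hypothetical placeholders; supply your own values and keep real passwords out of version control):

      "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "sftp.example.com",
        "Port": "22",
        "User": "sftpuser",
        "Password": "********"
      },
      "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "db.example.com",
        "Port": "5432",
        "User": "dbuser",
        "Password": "********",
        "DatabaseName": "postgres"
      }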
  6. After modifying the AutomationAPIFileTransferDatabaseSampleFlow.json file, run the sample code using the run command.

    The following example shows the run command and a typical successful response after setting parameter values that match your Control-M environment:

    > ctm run AutomationAPIFileTransferDatabaseSampleFlow.json

    {
    "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e"
    }
  7. Check job status for the new runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical successful response. Both jobs have the Ended OK status.

    > ctm run status "ce62ace0-4a6e-4b17-afdd-35335cbf179e"

    {
    "statuses": [
    {
    "jobId": "IN01:000c1",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPIFileTransferDatabaseSampleFlow",
    "type": "Folder",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 23, 2020 4:25:10 PM",
    "endTime": "May 23, 2020 4:25:26 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c1/log"
    },
    {
    "jobId": "IN01:000c2",
    "folderId": "IN01:000c1",
    "numberOfRuns": 1,
    "name": "GetData",
    "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 23, 2020 4:25:10 PM",
    "endTime": "May 23, 2020 4:25:17 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c2/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c2/log"
    },
    {
    "jobId": "IN01:000c3",
    "folderId": "IN01:000c1",
    "numberOfRuns": 1,
    "name": "UpdateRecords",
    "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 23, 2020 4:25:18 PM",
    "endTime": "May 23, 2020 4:25:25 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c3/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c3/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  8. Retrieve the output of GetData using the run job:output::get command with the jobId that you obtained in the previous step, as in the following example:

    > ctm run job:output::get "IN01:000c2"

    + Job started at '0523 16:25:15:884' orderno - '000c2' runno - '00001' Number of transfers - 1
    + Host1 XXXXX' username XXXX - Host2 'controlmEndPointHost' username XXXX
    Local host is XXX
    Connection to SFTP server on host XXX was established
    Connection to Local server on host controlmEndPointHost was established
    +********** Starting transfer #1 out of 1**********
    * Executing pre-commands on host controlmEndPointHost
    rm c:\temp\XXXX
    File 'c:\temp\XXX removed successfully
    Transfer type: BINARY
    Open data connection to retrieve file /home/user/XXX
    Open data connection to store file c:\temp\XXX
    Transfer #1 transferring
    Src file: '/ home/user/XXX ' on host 'XXXX'
    Dst file: 'c:\temp\XXX on host 'controlmEndPointHost'
    Transferred: 628 Elapsed: 0 sec Percent: 100 Status: In Progress
    File transfer status: Ended OK
    Destination file size vs. source file size validation passed
    * Executing post-commands on host controlmEndPointHost
    chmod 700 c:\temp\XXX
    Transfer #1 completed successfully
    Job executed successfully. exiting.
    Job ended at '0523 16:25:16:837'
    Elapsed time [0 sec]


  9. Retrieve the output of UpdateRecords using the run job:output::get command with the jobId that you obtained in a previous step, as in the following example:

    > ctm run job:output::get "IN01:000c3"

    Environment information:
    +--------------------+--------------------------------------------------+
    |Account Name |DB-CP |
    +--------------------+--------------------------------------------------+
    |Database Vendor |PostgreSQL |
    +--------------------+--------------------------------------------------+
    |Database Version |9.2.8 |
    +--------------------+--------------------------------------------------+

    Request statement:
    ------------------
    select 'Parameter';

    Job statistics:
    +-------------------------+-------------------------+
    |Start Time |20200523163619 |
    +-------------------------+-------------------------+
    |End Time |20200523163619 |
    +-------------------------+-------------------------+
    |Elapsed Time |13 |
    +-------------------------+-------------------------+
    |Number Of Affected Rows |1 |
    +-------------------------+-------------------------+
    Exit Code = 0
    Exit Message = Normal completion


Running a Hadoop Spark Job Flow

This tutorial guides you through writing Hadoop and Spark jobs that run in sequence.

Before You Begin

  • Ensure that you meet the following prerequisites:

    • You have successfully completed API setup, as described in Setting Up the API.

    • You have Git installed. If not, obtain it from the Git Downloads page.

    • You have a local copy of the tutorial samples from GitHub, obtained with the git clone command:

      git clone https://github.com/controlm/automation-api-quickstart.git

  • Ensure that you have a Hadoop edge node where the Hadoop client software is installed and that Hadoop and HDFS are operational. To verify, use hadoop commands, as in the following examples:

    > hadoop version

    Hadoop 2.6.0-cdh5.4.2
    Subversion http://github.com/cloudera/hadoop -r 15b703c8725733b7b2813d2325659eb7d57e7a3f
    Compiled by jenkins on 2015-05-20T00:03Z
    Compiled with protoc 2.5.0
    From source with checksum de74f1adb3744f8ee85d9a5b98f90d
    This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.4.2.jar

    > hadoop fs -ls /

    Found 5 items
    drwxr-xr-x - hbase supergroup 0 2015-12-13 02:32 /hbase
    drwxr-xr-x - solr solr 0 2015-06-09 03:38 /solr
    drwxrwxrwx - hdfs supergroup 0 2016-03-20 07:11 /tmp
    drwxr-xr-x - hdfs supergroup 0 2016-03-29 06:51 /user
    drwxr-xr-x - hdfs supergroup 0 2015-06-09 03:36 /var
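
    This tutorial also submits a Spark job, so you may want to confirm that the Spark client is available on the same edge node (this assumes spark-submit is on your PATH):

    > spark-submit --version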

Begin

  1. Provision the required image by doing the following:

    1. Obtain the list of available images using the provision images command, as in the following example:

      > ctm provision images Linux

      [
      "AWS_plugin.Linux",
      "Agent_Amazon.Linux",
      "Agent_CentOs.Linux",
      "Agent_Oracle.Linux",
      "Agent_RedHat.Linux",
      "Agent_Suse.Linux",
      "Agent_Ubuntu.Linux",
      "Application_Integrator_plugin.Linux",
      "Azure_plugin.Linux",
      "Databases_plugin.Linux",
      "Hadoop_plugin.Linux",
      "Informatica_plugin.Linux",
      "MFT_plugin.Linux",
      "SAP_plugin.Linux"
      ]
    2. Provision an Agent and Hadoop plug-in by running the following set of commands as an administrator:

      ctm provision saas::install Agent_Amazon.Linux <agentTag>

      ctm provision image Hadoop_plugin.Linux

      The Agent tag that you specify must have a matching Agent token. For information about generating a token, see Generating an Agent Token.

    The Agent and Hadoop plug-in are provisioned and you now have a running instance of your Control-M/Agent on your Hadoop edge node.

  2. Access the tutorial sample with the following command:

    cd automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow

  3. Verify that the code within the AutomationAPISampleHadoopFlow.json file is valid by running the build command.

    The following example shows the build command and a typical successful response:

    > ctm build AutomationAPISampleHadoopFlow.json

    [
    {
    "deploymentFile": "AutomationAPISampleHadoopFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0,
    "isDeployDescriptorValid": false
    }
    ]
  4. Examine the contents of the AutomationAPISampleHadoopFlow.json file to learn about the structure of the job flow, as shown below:

    {
      "Defaults": {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host": "HOST",
        "When": {
          "FromTime": "0300",
          "ToTime": "2100"
        },
        "Job:Hadoop": {
          "ConnectionProfile": "SAMPLE_CONNECTION_PROFILE"
        }
      },
      "SAMPLE_CONNECTION_PROFILE": {
        "Type": "ConnectionProfile:Hadoop",
        "Centralized": true
      },
      "AutomationAPIHadoopSampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "ProcessData": {
          "Type": "Job:Hadoop:Spark:Python",
          "SparkScript": "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
          "Arguments": [
            "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
            "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir"
          ],
          "PreCommands": {
            "Commands": [
              {"rm": "-R -f file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir"}
            ]
          }
        },
        "CopyOutputData": {
          "Type": "Job:Hadoop:HDFSCommands",
          "Commands": [
            {"rm": "-R -f samplesOut"},
            {"mkdir": "samplesOut"},
            {"cp": "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut"}
          ]
        },
        "DataProcessingFlow": {
          "Type": "Flow",
          "Sequence": ["ProcessData", "CopyOutputData"]
        }
      }
    }

    This source code contains the following main objects:

    • The "Defaults" object at the top of this example allows you to define parameters once for all objects.

      For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria.

    • The "SAMPLE_CONNECTION_PROFILE" object is used to define the connection parameters to the Hadoop cluster. Note that for Sqoop and Hive, it is used to set data sources and credentials.

    • The folder in this example is named AutomationAPIHadoopSampleFlow, and it contains two jobs:

      • A Spark job named ProcessData, which runs the following Spark Python program named processData.py:

        from __future__ import print_function

        import sys
        from pyspark import SparkContext

        inputFile = sys.argv[1]
        outputDir = sys.argv[2]

        sc = SparkContext(appName="processDataSampel")
        text_file = sc.textFile(inputFile)
        counts = text_file.flatMap(lambda line: line.split(" ")) \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda a, b: a + b)

        counts.saveAsTextFile(outputDir)

        The "PreCommands" object in the Spark job cleans up output from any previous Spark job runs.

      • An HDFS Commands job named CopyOutputData

    • The Flow object defines the sequence of job execution.
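
    If you want to test the Spark program outside of Control-M first, you can submit it directly on the edge node (a hypothetical invocation; it assumes spark-submit is available and that you replace USER with your own user name):

      spark-submit /home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py \
          file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py \
          file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir

    As in the ProcessData job definition, the program's first argument is the input file (here, the script itself) and the second is the output directory.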

  5. Set parameter values in the AutomationAPISampleHadoopFlow.json file to match your Control-M environment, as follows:

    • Modify the path to the samples directory so that the jobs can run successfully in your environment: replace the URI file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/ with the location of the samples that you installed on your machine (see the example after this list).

    • Replace the value of "Host" with the host name of the machine where you provisioned the Control-M/Agent.
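
    For example, if you installed the samples under /home/cloudera (the user shown in the sample output later in this tutorial), you could update all of the sample paths in one pass with GNU sed (the user name is only an illustration; back up the file first):

      sed -i 's|/home/USER/|/home/cloudera/|g' AutomationAPISampleHadoopFlow.json

    This updates only the file paths; you still need to set the "Host" value manually.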

  6. After modifying the AutomationAPISampleHadoopFlow.json file, run the sample code using the run command.

    The following example shows the run command and a typical successful response after setting parameter values that match your Control-M environment:

    > ctm run AutomationAPISampleHadoopFlow.json

    {
    "runId": "6aef1ce1-3c57-4866-bf45-3a6afc33e27c",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/6aef1ce1-3c57-4866-bf45-3a6afc33e27c"
    }
  7. Check job status for the new runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical successful response. Both jobs have the Ended OK status.

    > ctm run status "6aef1ce1-3c57-4866-bf45-3a6afc33e27c"

    {
    "statuses": [
    {
    "jobId": "IN01:000ca",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPIHadoopSampleFlow",
    "type": "Folder",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 24, 2020 1:03:18 PM",
    "endTime": "May 24, 2020 1:03:45 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000ca/log"
    },
    {
    "jobId": "IN01:000cb",
    "folderId": "IN01:000ca",
    "numberOfRuns": 1,
    "name": "ProcessData",
    "folder": "AutomationAPIHadoopSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 24, 2020 1:03:18 PM",
    "endTime": "May 24, 2020 1:03:32 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cb/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cb/log"
    },
    {
    "jobId": "IN01:000cc",
    "folderId": "IN01:000ca",
    "numberOfRuns": 1,
    "name": "CopyOutputData",
    "folder": "AutomationAPIHadoopSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 24, 2020 1:03:33 PM",
    "endTime": "May 24, 2020 1:03:44 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cc/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cc/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  8. Retrieve the output of CopyOutputData using the run job:output::get command with the jobId that you obtained in the previous step, as in the following example:

    > ctm run job:output::get IN01:000cc

    Environment information:
    +--------------------+--------------------------------------------------+
    |Account Name |SAMPLE_CONNECTION_PROFILE |
    +--------------------+--------------------------------------------------+

    Job is running as user: cloudera
    -----------------------
    Running the following HDFS command:
    -----------------------------------
    hadoop fs -rm -R -f samplesOut

    HDFS command output:
    -------------------
    Deleted samplesOut
    script return value 0
    -----------------------------------------------------------
    -----------------------------------------------------------

    Job is running as user: cloudera
    -----------------------
    Running the following HDFS command:
    -----------------------------------
    hadoop fs -mkdir samplesOut

    HDFS command output:
    -------------------
    script return value 0
    -----------------------------------------------------------
    -----------------------------------------------------------

    Job is running as user: cloudera
    -----------------------
    Running the following HDFS command:
    -----------------------------------
    hadoop fs -cp file:///home/cloudera/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut

    HDFS command output:
    -------------------
    script return value 0
    -----------------------------------------------------------
    -----------------------------------------------------------

    Application reports:
    --------------------
    -> no hadoop application reports were created for the job execution.

    Job statistics:
    --------------
    +-------------------------+-------------------------+
    |Start Time |20200524030335 |
    +-------------------------+-------------------------+
    |End Time |20200524030346 |
    +-------------------------+-------------------------+
    |Elapsed Time |1065 |
    +-------------------------+-------------------------+
    Exit Message = Normal completion