Running Applications and Programs in Your Environment

The following tutorials demonstrate how to provision images and plug-ins and then run several types of job flows, according to your environment type:

Running a Script and Command Job Flow

This tutorial guides you through running a script and command in sequence.

Before You Begin

Ensure that you meet the following prerequisites:

  • You have successfully completed API setup, as described in Setting Up the API.

  • You have Git installed. If not, obtain it from the Git Downloads page.

  • You have a local copy of the tutorial samples from GitHub, obtained with the git clone command:

    git clone https://github.com/controlm/automation-api-quickstart.git

  • You have a 64-bit Windows or Linux machine with access to the scripts and programs that you want to run.

Begin

  1. Provision the required image by doing the following:

    1. Obtain the list of available images using the provision images command, as in the following examples:

      • Linux:

        > ctm provision images Linux

        [
        "AWS_plugin.Linux",
        "Agent_Amazon.Linux",
        "Agent_CentOs.Linux",
        "Agent_Oracle.Linux",
        "Agent_RedHat.Linux",
        "Agent_Suse.Linux",
        "Agent_Ubuntu.Linux",
        "Application_Integrator_plugin.Linux",
        "Azure_plugin.Linux",
        "Databases_plugin.Linux",
        "Hadoop_plugin.Linux",
        "Informatica_plugin.Linux",
        "MFT_plugin.Linux",
        "SAP_plugin.Linux"
        ]
      • Windows:

        > ctm provision images Windows

        [
        "AWS_plugin.Windows",
        "Agent_Windows.Windows",
        "Application_Integrator_plugin.Windows",
        "Azure_plugin.Windows",
        "Databases_plugin.Windows",
        "Informatica_plugin.Windows",
        "MFT_plugin.Windows",
        "SAP_plugin.Windows"
        ]
    2. Provision an Agent image by running one of the following commands as an administrator:

      • Linux:

        ctm provision saas::install Agent_Amazon.Linux <agentTag>

      • Windows:

        ctm provision saas::install Agent_Windows.Windows <agentTag>

      The Agent tag that you specify must have a matching Agent token. For information about generating a token, see Generating an Agent Token.

    The Agent is provisioned and you now have a running instance of your Control-M/Agent on your host.
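
    For example, if you have already generated a token for a hypothetical Agent tag named tutorial_agents, the Linux installation command would look as follows (the tag name is only an illustration; use your own tag):

    ctm provision saas::install Agent_Amazon.Linux tutorial_agents

    The Windows command is identical except for the image name (Agent_Windows.Windows).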

  2. Access the tutorial sample with the following command:

    cd automation-api-quickstart/helix-control-m/101-running-script-command-job-flow

  3. Verify that the code within the AutomationAPISampleFlow.json file is valid by running the build command.

    The following example shows the build command and a typical successful response:

    > ctm build AutomationAPISampleFlow.json

    [
    {
    "deploymentFile": "AutomationAPISampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0,
    "isDeployDescriptorValid": false
    }
    ]
  4. Run the jobs on the Control-M environment using the run command.

    The returned runId is used to check the job status.

    The following example shows the run command and a typical successful response:

    > ctm run AutomationAPISampleFlow.json

    {
    "runId": "7cba67de-9e0d-409d-8d93-1b8229432eee",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/7cba67de-9e0d-409d-8d93-1b82294e"
    }

    In this example, the code ran successfully and returned a runId of "7cba67de-9e0d-409d-8d93-1b8229432eee".

  5. Check job status for the runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical response at this stage, with job status information for each of the jobs in the flow. Because the sample file still contains placeholder values (such as HOST and USERNAME), the jobs cannot run yet: the folder is Executing, while the jobs remain in Wait Host and Wait Condition statuses until you set real values in the steps that follow:

    > ctm run status "7cba67de-9e0d-409d-8d93-1b8229432eee"

    {
    "statuses": [
    {
    "jobId": "IN01:00007",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPISampleFlow",
    "type": "Folder",
    "status": "Executing",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "Apr 26, 2020 10:43:47 AM",
    "endTime": "",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00007/log"
    },
    {
    "jobId": "IN01:00008",
    "folderId": "IN01:00007",
    "numberOfRuns": 0,
    "name": "CommandJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Command",
    "status": "Wait Host",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "",
    "endTime": "",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Job did not run, it has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00008/log"
    },
    {
    "jobId": "IN01:00009",
    "folderId": "IN01:00007",
    "numberOfRuns": 0,
    "name": "ScriptJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Job",
    "status": "Wait Condition",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "",
    "endTime": "",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Job did not run, it has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:00009/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  6. Examine the contents of the AutomationAPISampleFlow.json file to learn about the structure of the job flow, as shown below:

    {
      "Defaults": {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "RunAs": "USERNAME",
        "Host": "HOST",
        "Job": {
          "When": {
            "Months": ["JAN", "OCT", "DEC"],
            "MonthDays": ["22", "1", "11"],
            "WeekDays": ["MON", "TUE", "WED", "THU", "FRI"],
            "FromTime": "0300",
            "ToTime": "2100"
          },
          "ActionIfFailure": {
            "Type": "If",
            "CompletionStatus": "NOTOK",
            "mailToTeam": {
              "Type": "Mail",
              "Message": "%%JOBNAME failed",
              "To": "team@mycomp.com"
            }
          }
        }
      },
      "AutomationAPISampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "CommandJob": {
          "Type": "Job:Command",
          "Command": "COMMAND"
        },
        "ScriptJob": {
          "Type": "Job:Script",
          "FilePath": "SCRIPT_PATH",
          "FileName": "SCRIPT_NAME"
        },
        "Flow": {
          "Type": "Flow",
          "Sequence": ["CommandJob", "ScriptJob"]
        }
      }
    }

    This source code contains the following main objects:

    • The "Defaults" object at the top of this example allows you to define parameters once for all objects.

      For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria.

    • The "ActionIfFailure" object determines what action to take when a job ends unsuccessfully.

    • The folder in this example is named AutomationAPISampleFlow, and it contains two jobs, CommandJob and ScriptJob.

    • The Flow object defines the sequence of job execution.
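
    For reference, the Flow object's Sequence array lists job names in execution order, and each job runs after the one before it completes. A hypothetical flow that chains a third job (the CleanupJob name is illustrative only) would look like this:

    "Flow": {
      "Type": "Flow",
      "Sequence": ["CommandJob", "ScriptJob", "CleanupJob"]
    }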

  7. In the AutomationAPISampleFlow.json file, set values for the following parameters to match your Control-M environment (an example appears after this list):

    • "RunAs" : "USERNAME"

      Replace USERNAME with the name of an operating system user that will execute jobs on the Agent.

    • "Host" : "HOST"

      Replace HOST with the hostname where you provisioned the Control-M/Agent.

    • "Command": "COMMAND"

      Replace COMMAND with a command that is appropriate for your operating system.

    • "FilePath":"SCRIPT_PATH"

      Replace SCRIPT_PATH with the path to the script file to run.

    • "FileName":"SCRIPT_NAME"

      Replace SCRIPT_NAME with the name of the script file to run.

    In JSON, the backslash character must be doubled (\\) when used in a Windows file path.
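
    For example, on a Windows machine the two jobs might look as follows after editing (the command, path, and file name are hypothetical; substitute your own, and note the doubled backslashes in the Windows paths):

    "CommandJob": {
      "Type": "Job:Command",
      "Command": "dir C:\\Temp"
    },
    "ScriptJob": {
      "Type": "Job:Script",
      "FilePath": "C:\\Users\\ctmuser\\scripts",
      "FileName": "myscript.bat"
    }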

  8. After modifying the AutomationAPISampleFlow.json file, rerun the sample code using the run command.

    The following example shows the run command and a typical successful response after setting parameter values that match your Control-M environment:

    > ctm run AutomationAPISampleFlow.json

    {
    "runId": "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"
    }
  9. Check job status for the new runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical successful response. This time, both jobs have the Ended OK status.

    > ctm run status "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"

    {
    "statuses": [
    {
    "jobId": "IN01:0000p",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPISampleFlow",
    "type": "Folder",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 3, 2020 4:57:25 PM",
    "endTime": "May 3, 2020 4:57:28 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000p/log"
    },
    {
    "jobId": "IN01:0000q",
    "folderId": "IN01:0000p",
    "numberOfRuns": 1,
    "name": "CommandJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Command",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 3, 2020 4:57:26 PM",
    "endTime": "May 3, 2020 4:57:26 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000q/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000q/log"
    },
    {
    "jobId": "IN01:0000r",
    "folderId": "IN01:0000p",
    "numberOfRuns": 1,
    "name": "ScriptJob",
    "folder": "AutomationAPISampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 3, 2020 4:57:27 PM",
    "endTime": "May 3, 2020 4:57:27 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000r/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:0000r/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  10. Retrieve the output of the CommandJob using the run job:output::get command with the jobId that you obtained in the previous step, and verify that the output contains your script or command details:

    ctm run job:output::get "IN01:0000q"
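
    When you script these steps end to end, you typically need to wait for the flow to finish before fetching output. The following minimal sketch (an illustration only; it assumes a Unix shell and that the jq utility is installed) polls the run status until the folder reaches a terminal status such as Ended OK:

    #!/bin/bash
    # Minimal polling sketch: replace RUN_ID with the value returned by your own "ctm run" command.
    RUN_ID="7cba67de-9e0d-409d-8d93-1b8229432eee"

    while true; do
      # The folder is the first entry in the "statuses" array of the run status output.
      STATUS=$(ctm run status "$RUN_ID" | jq -r '.statuses[0].status')
      echo "Folder status: $STATUS"
      case "$STATUS" in
        Executing|"Wait"*) sleep 10 ;;
        *) break ;;
      esac
    done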

Running a File Transfer and Database Queries Job Flow

This tutorial guides you through running file transfer and database query jobs in sequence.

Before You Begin

Ensure that you meet the following prerequisites:

  • You have successfully completed API setup, as described in Setting Up the API.

  • You have Git installed. If not, obtain it from the Git Downloads page.

  • You have a local copy of the tutorial samples from GitHub, obtained with the git clone command:

    git clone https://github.com/controlm/automation-api-quickstart.git

  • You have a PostgreSQL database (or another database) and an SFTP server, and the Agent host machine has network connectivity to both the database and the SFTP server. You can verify connectivity as shown in the example after this list.
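
To confirm connectivity from the Agent host before you run the tutorial, you can use standard clients, as in the following hypothetical example (it assumes the sftp and psql command-line clients are installed; replace the host names and users with your own):

    sftp SFTP_USER@SFTP_SERVER
    psql -h DATABASE_SERVER -p 5432 -U DATABASE_USER -d postgres -c "select 1;"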

Begin

  1. Provision the required image by doing the following:

    1. Obtain the list of available images using the provision images command, as in the following examples:

      • Linux:

        > ctm provision images Linux

        [
        "AWS_plugin.Linux",
        "Agent_Amazon.Linux",
        "Agent_CentOs.Linux",
        "Agent_Oracle.Linux",
        "Agent_RedHat.Linux",
        "Agent_Suse.Linux",
        "Agent_Ubuntu.Linux",
        "Application_Integrator_plugin.Linux",
        "Azure_plugin.Linux",
        "Databases_plugin.Linux",
        "Hadoop_plugin.Linux",
        "Informatica_plugin.Linux",
        "MFT_plugin.Linux",
        "SAP_plugin.Linux"
        ]
      • Windows:

        > ctm provision images Windows

        [
        "AWS_plugin.Windows",
        "Agent_Windows.Windows",
        "Application_Integrator_plugin.Windows",
        "Azure_plugin.Windows",
        "Databases_plugin.Windows",
        "Informatica_plugin.Windows",
        "MFT_plugin.Windows",
        "SAP_plugin.Windows"
        ]
    2. Provision an Agent and plug-ins by running one of the following sets of commands as an administrator:

      • Linux:

        • ctm provision saas::install Agent_Amazon.Linux <agentTag>

        • ctm provision image Databases_plugin.Linux

        • ctm provision image MFT_plugin.Linux

      • Windows:

        • ctm provision saas::install Agent_Windows.Windows <agentTag>

        • ctm provision image Databases_plugin.Windows

        • ctm provision image MFT_plugin.Windows

      The Agent tag that you specify must have a matching Agent token. For information about generating a token, see Generating an Agent Token.

    The Agent is provisioned and you now have a running instance of your Control-M/Agent on your host with additional plug-ins.

  2. Access the tutorial sample with the following command:

    cd automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow

  3. Verify that the code within the AutomationAPIFileTransferDatabaseSampleFlow.json file is valid by running the build command.

    The following example shows the build command and a typical successful response:

    > ctm build AutomationAPIFileTransferDatabaseSampleFlow.json

    [
    {
    "deploymentFile": "AutomationAPIFileTransferDatabaseSampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 3,
    "successfulDriversCount": 0,
    "isDeployDescriptorValid": false
    }
    ]
  4. Examine the contents of the AutomationAPIFileTransferDatabaseSampleFlow.json file to learn about the structure of the job flow, as shown below:

    {
      "Defaults": {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host": "HOST",
        "Centralized": true,
        "Variables": [
          {"DestDataFile": "DESTINATION_FILE"},
          {"SrcDataFile": "SOURCE_FILE"}
        ],
        "When": {
          "FromTime": "0300",
          "ToTime": "2100"
        }
      },
      "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "SFTP_SERVER",
        "Port": "22",
        "User": "SFTP_USER",
        "Password": "SFTP_PASSWORD"
      },
      "LOCAL-CP": {
        "Type": "ConnectionProfile:FileTransfer:Local",
        "User": "USER",
        "Password": "PASSWORD"
      },
      "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "DATABASE_SERVER",
        "Port": "5432",
        "User": "DATABASE_USER",
        "Password": "DATABASE_PASSWORD",
        "DatabaseName": "postgres"
      },
      "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "GetData": {
          "Type": "Job:FileTransfer",
          "ConnectionProfileSrc": "SFTP-CP",
          "ConnectionProfileDest": "LOCAL-CP",
          "FileTransfers": [
            {
              "Src": "%%SrcDataFile",
              "Dest": "%%DestDataFile",
              "TransferOption": "SrcToDest",
              "TransferType": "Binary",
              "PreCommandDest": {
                "action": "rm",
                "arg1": "%%DestDataFile"
              },
              "PostCommandDest": {
                "action": "chmod",
                "arg1": "700",
                "arg2": "%%DestDataFile"
              }
            }
          ]
        },
        "UpdateRecords": {
          "Type": "Job:Database:SQLScript",
          "SQLScript": "/home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql",
          "ConnectionProfile": "DB-CP"
        },
        "Flow": {
          "Type": "Flow",
          "Sequence": ["GetData", "UpdateRecords"]
        }
      }
    }

    This source code contains the following main objects:

    • The "Defaults" object at the top of this example allows you to define parameters once for all objects, including the following:

      • Scheduling criteria for all jobs, defined by the When parameter.

      • Variables that are referenced several times in the jobs.

    • Three Connection Profiles:

      • SFTP-CP defines access and security credentials for the SFTP server.

      • DB-CP defines access and security credentials for the database.

      • LOCAL-CP defines access and security credentials for files that are transferred to the local machine.

    • The folder in this example is named AutomationAPIFileTransferDatabaseSampleFlow, and it contains two jobs:

      • GetData transfers files from the SFTP server to the host machine.

      • UpdateRecords performs a SQL query on the database.

    • The Flow object defines the sequence of job execution.
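
    For example, the SrcDataFile and DestDataFile variables defined under Defaults are referenced in the GetData job with a %% prefix, so you set the actual file paths once and every field that uses them picks up the change. The first fragment below is the definition under Defaults; the second shows the references inside the FileTransfers entry:

    "Variables": [
      {"DestDataFile": "DESTINATION_FILE"},
      {"SrcDataFile": "SOURCE_FILE"}
    ]

    "Src": "%%SrcDataFile",
    "Dest": "%%DestDataFile"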

  5. Set parameter values in the AutomationAPIFileTransferDatabaseSampleFlow.json file to match your Control-M environment, as follows:

    • Replace the value of "SrcDataFile" with the path of the file to transfer from the SFTP server, and the value of "DestDataFile" with the destination path for the transferred file on the host machine.

      {"DestDataFile": "DESTINATION_FILE"},
      {"SrcDataFile": "SOURCE_FILE"}
    • Modify the path to the samples directory so that the jobs can run successfully in your environment. Replace the path /home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow with the location of the samples that you installed on your machine.

      "SQLScript": "/home/USER/automation-api-quickstart/helix-control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql"

    • Replace the following parameters with the credentials used to log in to the SFTP server.

      "HostName": "SFTP_SERVER",
      "User" : "SFTP_USER",
      "Password" : "SFTP_PASSWORD"
    • Replace the following parameters with the credentials used to access the database server.

      "Host": "DATABASE_SERVER",
      "Port":"5432",
      "User": "DATABASE_USER",
      "Password": "DATABASE_PASSWORD",
    • Replace the following parameters with the credentials used to read and write files on the host machine.

      "LOCAL-CP" : {
      "Type" : "ConnectionProfile:FileTransfer:Local",
      "User" : "USER",
      "Password" : ""
      }
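
    Taken together, the edited SFTP and database connection profiles might look like the following sketch (sftp.example.com, db.example.com, and the user names are hypothetical placeholders; supply your own values and keep real passwords out of version control):

      "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "sftp.example.com",
        "Port": "22",
        "User": "sftpuser",
        "Password": "********"
      },
      "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "db.example.com",
        "Port": "5432",
        "User": "dbuser",
        "Password": "********",
        "DatabaseName": "postgres"
      }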
  6. After modifying the AutomationAPIFileTransferDatabaseSampleFlow.json file, run the sample code using the run command.

    The following example shows the run command and a typical successful response after setting parameter values that match your Control-M environment:

    > ctm run AutomationAPIFileTransferDatabaseSampleFlow.json

    {
    "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e"
    }
  7. Check job status for the new runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical successful response. Both jobs have the Ended OK status.

    > ctm run status "ce62ace0-4a6e-4b17-afdd-35335cbf179e"

    {
    "statuses": [
    {
    "jobId": "IN01:000c1",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPIFileTransferDatabaseSampleFlow",
    "type": "Folder",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 23, 2020 4:25:10 PM",
    "endTime": "May 23, 2020 4:25:26 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c1/log"
    },
    {
    "jobId": "IN01:000c2",
    "folderId": "IN01:000c1",
    "numberOfRuns": 1,
    "name": "GetData",
    "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 23, 2020 4:25:10 PM",
    "endTime": "May 23, 2020 4:25:17 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c2/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c2/log"
    },
    {
    "jobId": "IN01:000c3",
    "folderId": "IN01:000c1",
    "numberOfRuns": 1,
    "name": "UpdateRecords",
    "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 23, 2020 4:25:18 PM",
    "endTime": "May 23, 2020 4:25:25 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c3/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000c3/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  8. Retrieve the output of GetData using the run job:output::get command with the jobId that you obtained in the previous step, as in the following example:

    > ctm run job:output::get "IN01:000c2"

    + Job started at '0523 16:25:15:884' orderno - '000c2' runno - '00001' Number of transfers - 1
    + Host1 XXXXX' username XXXX - Host2 'controlmEndPointHost' username XXXX
    Local host is XXX
    Connection to SFTP server on host XXX was established
    Connection to Local server on host controlmEndPointHost was established
    +********** Starting transfer #1 out of 1**********
    * Executing pre-commands on host controlmEndPointHost
    rm c:\temp\XXXX
    File 'c:\temp\XXX removed successfully
    Transfer type: BINARY
    Open data connection to retrieve file /home/user/XXX
    Open data connection to store file c:\temp\XXX
    Transfer #1 transferring
    Src file: '/ home/user/XXX ' on host 'XXXX'
    Dst file: 'c:\temp\XXX on host 'controlmEndPointHost'
    Transferred: 628 Elapsed: 0 sec Percent: 100 Status: In Progress
    File transfer status: Ended OK
    Destination file size vs. source file size validation passed
    * Executing post-commands on host controlmEndPointHost
    chmod 700 c:\temp\XXX
    Transfer #1 completed successfully
    Job executed successfully. exiting.
    Job ended at '0523 16:25:16:837'
    Elapsed time [0 sec]


  9. Retrieve the output of UpdateRecords using the run job:output::get command with the jobId that you obtained in a previous step, as in the following example:

    > ctm run job:output::get "IN01:000c3"

    Environment information:
    +--------------------+--------------------------------------------------+
    |Account Name |DB-CP |
    +--------------------+--------------------------------------------------+
    |Database Vendor |PostgreSQL |
    +--------------------+--------------------------------------------------+
    |Database Version |9.2.8 |
    +--------------------+--------------------------------------------------+

    Request statement:
    ------------------
    select 'Parameter';

    Job statistics:
    +-------------------------+-------------------------+
    |Start Time |20200523163619 |
    +-------------------------+-------------------------+
    |End Time |20200523163619 |
    +-------------------------+-------------------------+
    |Elapsed Time |13 |
    +-------------------------+-------------------------+
    |Number Of Affected Rows |1 |
    +-------------------------+-------------------------+
    Exit Code = 0
    Exit Message = Normal completion


Running a Hadoop Spark Job Flow

This tutorial guides you through writing Hadoop and Spark jobs that run in sequence.

Before You Begin

  • Ensure that you meet the following prerequisites:

    • You have successfully completed API setup, as described in Setting Up the API.

    • You have Git installed. If not, obtain it from the Git Downloads page.

    • You have a local copy of the tutorial samples from GitHub, obtained with the git clone command:

      git clone https://github.com/controlm/automation-api-quickstart.git

  • Ensure that you have a Hadoop edge node where the Hadoop client software is installed and that Hadoop and HDFS are operational. To verify, use hadoop commands, as in the following examples:

    > hadoop version

    Hadoop 2.6.0-cdh5.4.2
    Subversion http://github.com/cloudera/hadoop -r 15b703c8725733b7b2813d2325659eb7d57e7a3f
    Compiled by jenkins on 2015-05-20T00:03Z
    Compiled with protoc 2.5.0
    From source with checksum de74f1adb3744f8ee85d9a5b98f90d
    This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.4.2.jar

    > hadoop fs -ls /

    Found 5 items
    drwxr-xr-x - hbase supergroup 0 2015-12-13 02:32 /hbase
    drwxr-xr-x - solr solr 0 2015-06-09 03:38 /solr
    drwxrwxrwx - hdfs supergroup 0 2016-03-20 07:11 /tmp
    drwxr-xr-x - hdfs supergroup 0 2016-03-29 06:51 /user
    drwxr-xr-x - hdfs supergroup 0 2015-06-09 03:36 /var
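
    This tutorial also submits a Spark job, so you may want to confirm that the Spark client is available on the same edge node (this assumes spark-submit is on your PATH):

    > spark-submit --version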

Begin

  1. Provision the required image by doing the following:

    1. Obtain the list of available images using the provision images command, as in the following example:

      > ctm provision images Linux

      [
      "AWS_plugin.Linux",
      "Agent_Amazon.Linux",
      "Agent_CentOs.Linux",
      "Agent_Oracle.Linux",
      "Agent_RedHat.Linux",
      "Agent_Suse.Linux",
      "Agent_Ubuntu.Linux",
      "Application_Integrator_plugin.Linux",
      "Azure_plugin.Linux",
      "Databases_plugin.Linux",
      "Hadoop_plugin.Linux",
      "Informatica_plugin.Linux",
      "MFT_plugin.Linux",
      "SAP_plugin.Linux"
      ]
    2. Provision an Agent and Hadoop plug-in by running the following set of commands as an administrator:

      ctm provision saas::install Agent_Amazon.Linux <agentTag>

      ctm provision image Hadoop_plugin.Linux

      The Agent tag that you specify must have a matching Agent token. For information about generating a token, see Generating an Agent Token.

    The Agent and Hadoop plug-in are provisioned and you now have a running instance of your Control-M/Agent on your Hadoop edge node.

  2. Access the tutorial sample with the following command:

    cd automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow

  3. Verify that the code within the AutomationAPISampleHadoopFlow.json file is valid by running the build command.

    The following example shows the build command and a typical successful response:

    > ctm build AutomationAPISampleHadoopFlow.json

    [
    {
    "deploymentFile": "AutomationAPISampleHadoopFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0,
    "isDeployDescriptorValid": false
    }
    ]
  4. Examine the contents of the AutomationAPISampleHadoopFlow.json file to learn about the structure of the job flow, as shown below:

    {
      "Defaults": {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host": "HOST",
        "When": {
          "FromTime": "0300",
          "ToTime": "2100"
        },
        "Job:Hadoop": {
          "ConnectionProfile": "SAMPLE_CONNECTION_PROFILE"
        }
      },
      "SAMPLE_CONNECTION_PROFILE": {
        "Type": "ConnectionProfile:Hadoop",
        "Centralized": true
      },
      "AutomationAPIHadoopSampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "ProcessData": {
          "Type": "Job:Hadoop:Spark:Python",
          "SparkScript": "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
          "Arguments": [
            "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py",
            "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir"
          ],
          "PreCommands": {
            "Commands": [
              {"rm": "-R -f file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir"}
            ]
          }
        },
        "CopyOutputData": {
          "Type": "Job:Hadoop:HDFSCommands",
          "Commands": [
            {"rm": "-R -f samplesOut"},
            {"mkdir": "samplesOut"},
            {"cp": "file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut"}
          ]
        },
        "DataProcessingFlow": {
          "Type": "Flow",
          "Sequence": ["ProcessData", "CopyOutputData"]
        }
      }
    }

    This source code contains the following main objects:

    • The "Defaults" object at the top of this example allows you to define parameters once for all objects.

      For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria.

    • The "SAMPLE_CONNECTION_PROFILE" object is used to define the connection parameters to the Hadoop cluster. Note that for Sqoop and Hive, it is used to set data sources and credentials.

    • The folder in this example is named AutomationAPIHadoopSampleFlow, and it contains two jobs:

      • A Spark job named ProcessData, which runs the following Spark Python program named processData.py:

        from __future__ import print_function

        import sys
        from pyspark import SparkContext

        inputFile = sys.argv[1]
        outputDir = sys.argv[2]

        sc = SparkContext(appName="processDataSampel")
        text_file = sc.textFile(inputFile)
        counts = text_file.flatMap(lambda line: line.split(" ")) \
            .map(lambda word: (word, 1)) \
            .reduceByKey(lambda a, b: a + b)

        counts.saveAsTextFile(outputDir)

        The "PreCommands" object in the Spark job cleans up output from any previous Spark job runs.

      • An HDFS Commands job named CopyOutputData

    • The Flow object defines the sequence of job execution.
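
    If you want to test the Spark program outside of Control-M first, you can submit it directly on the edge node (a hypothetical invocation; it assumes spark-submit is available and that you replace USER with your own user name):

      spark-submit /home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py \
          file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processData.py \
          file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/processDataOutDir

    As in the ProcessData job definition, the program's first argument is the input file (here, the script itself) and the second is the output directory.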

  5. Set parameter values in the AutomationAPISampleHadoopFlow.json file to match your Control-M environment, as follows:

    • Modify the path to the samples directory so that the jobs can run successfully in your environment: replace the URI file:///home/USER/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/ with the location of the samples that you installed on your machine (see the example after this list).

    • Replace the value of "Host" with the host name of the machine where you provisioned the Control-M/Agent.
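
    For example, if you installed the samples under /home/cloudera (the user shown in the sample output later in this tutorial), you could update all of the sample paths in one pass with GNU sed (the user name is only an illustration; back up the file first):

      sed -i 's|/home/USER/|/home/cloudera/|g' AutomationAPISampleHadoopFlow.json

    This updates only the file paths; you still need to set the "Host" value manually.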

  6. After modifying the AutomationAPISampleHadoopFlow.json file, run the sample code using the run command.

    The following example shows the run command and a typical successful response after setting parameter values that match your Control-M environment:

    > ctm run AutomationAPISampleHadoopFlow.json

    {
    "runId": "6aef1ce1-3c57-4866-bf45-3a6afc33e27c",
    "statusURI": "https://controlmEndPointHost/automation-api/run/status/6aef1ce1-3c57-4866-bf45-3a6afc33e27c"
    }
  7. Check job status for the new runId that you obtained in the previous step using the run status command.

    The following example shows the run status command and a typical successful response. Both jobs have the Ended OK status.

    > ctm run status "6aef1ce1-3c57-4866-bf45-3a6afc33e27c"

    {
    "statuses": [
    {
    "jobId": "IN01:000ca",
    "folderId": "IN01:00000",
    "numberOfRuns": 1,
    "name": "AutomationAPIHadoopSampleFlow",
    "type": "Folder",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 24, 2020 1:03:18 PM",
    "endTime": "May 24, 2020 1:03:45 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "Folder has no output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000ca/log"
    },
    {
    "jobId": "IN01:000cb",
    "folderId": "IN01:000ca",
    "numberOfRuns": 1,
    "name": "ProcessData",
    "folder": "AutomationAPIHadoopSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 24, 2020 1:03:18 PM",
    "endTime": "May 24, 2020 1:03:32 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cb/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cb/log"
    },
    {
    "jobId": "IN01:000cc",
    "folderId": "IN01:000ca",
    "numberOfRuns": 1,
    "name": "CopyOutputData",
    "folder": "AutomationAPIHadoopSampleFlow",
    "type": "Job",
    "status": "Ended OK",
    "held": "false",
    "deleted": "false",
    "cyclic": "false",
    "startTime": "May 24, 2020 1:03:33 PM",
    "endTime": "May 24, 2020 1:03:44 PM",
    "estimatedStartTime": [],
    "estimatedEndTime": [],
    "outputURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cc/output",
    "logURI": "https://controlmEndPointHost/automation-api/run/job/IN01:000cc/log"
    }
    ],
    "startIndex": 0,
    "itemsPerPage": 25,
    "total": 3
    }
  8. Retrieve the output of CopyOutputData using the run job:output::get command with the jobId that you obtained in the previous step, as in the following example:

    > ctm run job:output::get IN01:000cc

    Environment information:
    +--------------------+--------------------------------------------------+
    |Account Name |SAMPLE_CONNECTION_PROFILE |
    +--------------------+--------------------------------------------------+

    Job is running as user: cloudera
    -----------------------
    Running the following HDFS command:
    -----------------------------------
    hadoop fs -rm -R -f samplesOut

    HDFS command output:
    -------------------
    Deleted samplesOut
    script return value 0
    -----------------------------------------------------------
    -----------------------------------------------------------

    Job is running as user: cloudera
    -----------------------
    Running the following HDFS command:
    -----------------------------------
    hadoop fs -mkdir samplesOut

    HDFS command output:
    -------------------
    script return value 0
    -----------------------------------------------------------
    -----------------------------------------------------------

    Job is running as user: cloudera
    -----------------------
    Running the following HDFS command:
    -----------------------------------
    hadoop fs -cp file:///home/cloudera/automation-api-quickstart/helix-control-m/101-running-hadoop-spark-job-flow/* samplesOut

    HDFS command output:
    -------------------
    script return value 0
    -----------------------------------------------------------
    -----------------------------------------------------------

    Application reports:
    --------------------
    -> no hadoop application reports were created for the job execution.

    Job statistics:
    --------------
    +-------------------------+-------------------------+
    |Start Time |20200524030335 |
    +-------------------------+-------------------------+
    |End Time |20200524030346 |
    +-------------------------+-------------------------+
    |Elapsed Time |1065 |
    +-------------------------+-------------------------+
    Exit Message = Normal completion