Data Integration Jobs

The following topics describe job attributes that work with data integration platforms and services:

Job:Airbyte
Job:AWS Glue
Job:AWS Glue DataBrew
Job:ADF (Azure Data Factory)
Job:Boomi
Job:GCP Data Fusion
Job:GCP Dataplex
Job:GCP Dataprep
Job:Informatica
Job:Informatica CS
Job:OCI Data Integration
Job:Talend Data Management
Job:Talend OAuth
Job:Trifacta

Job:Airbyte

Airbyte is an open-source extract, transform, and load (ETL) service that enables you to build data pipelines and load data to a data warehouse, data lake, database, or analytics tool of your choice.

To deploy and run an Airbyte job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the Airbyte plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

For more information, see Control-M for Airbyte.

The following example shows how to define an Airbyte job:

"Airbyte_Job_2": 
{ 
   "Type": "Job:Airbyte", 
   "ConnectionProfile": "ABY", 
   "Connection Id": "796055fa-aj5542", 
   "Job Type": "Sync", 
   "Show Results": "unchecked", 
   "Status Polling Frequency": "60",
   "Failure Tolerance": "2"
}

The following table describes the Airbyte job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:Airbyte name that connects Control-M to Airbyte.
Connection ID	Defines the Airbyte Connection ID, which identifies which data pipeline to execute. In Airbyte, a data pipeline copies data from a source to a destination.
Job Type	Determines one of the following Airbyte actions to perform: Sync: Reads from a source and writes to a destination, depending upon the predefined sync mode in the Airbyte platform. Reset: Deletes all the records from your destination and then performs a sync.
Show Results	Determines whether to append the REST API response to the output. Valid Values: checked, unchecked. Default: unchecked
Status Polling Frequency	(Optional) Determines the number of seconds to wait between checks of the job status. Default: 60
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 2

Job:AWS Glue

AWS Glue is a serverless data integration service that enables you to define data-driven workflows that automate the movement and transformation of data.

To deploy and run an AWS Glue job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the AWS Glue plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define an AWS Glue job:

"AwsGlueJob":
{
   "Type": "Job:AWS Glue",
   "ConnectionProfile": "GLUECONNECTION",
   "Glue Job Name": "AwsGlueJobName",
   "Glue Job Arguments": "checked",
   "Arguments": "{\"--myArg1\": \"myVal1\", \"--myArg2\": \"myVal2\"}",
   "Status Polling Frequency": "20",
   "Failure Tolerance": "2"
}

The following table describes the AWS Glue job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:AWS Glue name that connects Control-M to AWS Glue.
Glue Job Name	Defines the name of the AWS Glue job that you want to execute.
Glue Job Arguments	Determines whether to specify arguments to pass when you run the AWS Glue job, as defined in Arguments. Valid Values: checked, unchecked. Default: unchecked
Arguments	(Optional) Defines specific arguments to pass when you run the AWS Glue job, as shown in the following example: `{\"--myArg1\": \"myVal1\", \"--myArg2\": \"myVal2\"}` For more information about the available arguments, see Special Parameters Used by AWS Glue in the AWS documentation.
Status Polling Frequency	(Optional) Defines the number of seconds to wait before checking the status of the job. Default: 30
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 2
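
The Arguments value is a JSON document stored inside a JSON string, which is why its inner quotes are escaped in the example. The following Python sketch is one possible way to generate that escaping instead of typing it by hand. The helper name glue_job is hypothetical and is not part of Control-M. The defaults are taken from the table above.

```python
import json

def glue_job(connection_profile, glue_job_name, arguments=None,
             polling_seconds=30, failure_tolerance=2):
    """Return a Job:AWS Glue definition as a Python dict (hypothetical helper)."""
    job = {
        "Type": "Job:AWS Glue",
        "ConnectionProfile": connection_profile,
        "Glue Job Name": glue_job_name,
        "Glue Job Arguments": "checked" if arguments else "unchecked",
        "Status Polling Frequency": str(polling_seconds),
        "Failure Tolerance": str(failure_tolerance),
    }
    if arguments:
        # Serialize the arguments once here; serializing the whole job
        # later escapes the inner quotes automatically.
        job["Arguments"] = json.dumps(arguments)
    return job

definition = {
    "AwsGlueJob": glue_job(
        "GLUECONNECTION",
        "AwsGlueJobName",
        {"--myArg1": "myVal1", "--myArg2": "myVal2"},
    )
}
print(json.dumps(definition, indent=3))
```

The same pattern applies to any parameter in this topic whose value is JSON embedded in a string.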

Job:AWS Glue DataBrew

AWS Glue DataBrew is an extract, transform, and load (ETL) service that enables you to visualize your data and publish it to the Amazon S3 Data Lake.

To deploy and run an AWS Glue DataBrew job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the AWS Glue DataBrew plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define an AWS Glue DataBrew job:

"AWS Glue DataBrew_Job":
{
   "Type": "Job:AWS Glue DataBrew",
   "ConnectionProfile": "AWSDATABREW",
   "Job Name": "databrew-job",
   "Output Job Logs": "checked",
   "Status Polling Frequency": "10",
   "Failure Tolerance": "2"
}

The following table describes the AWS Glue DataBrew job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:AWS Glue DataBrew name that connects Control-M to AWS Glue DataBrew.
Job Name	Defines the AWS Glue DataBrew job name.
Output Job Logs	Determines whether the DataBrew job logs are included in the Control-M output. Valid Values: checked, unchecked. Default: unchecked
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the DataBrew job. Default: 10
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 2

Job:ADF (Azure Data Factory)

Azure Data Factory (ADF) is an extract, transform, and load (ETL) service that enables you to automate the movement and transformation of data.

To deploy and run an ADF job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the ADF plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define an ADF job:

"AzureDataFactoryJob":
{
   "Type": "Job:ADF",
   "ConnectionProfile": "DataFactoryConnection",
   "Resource Group Name": "AzureResourceGroupName",
   "Data Factory Name": "AzureDataFactoryName",
   "Pipeline Name": "AzureDataFactoryPipelineName",
   "Parameters": "{\"myVar\":\"value1\", \"myOtherVar\": \"value2\"}",
   "Status Polling Frequency": "20",
   "Failure Tolerance": "3"
}

The following table describes the ADF job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:ADF (Azure Data Factory) name that connects Control-M to Azure Data Factory.
Resource Group Name	Defines an Azure Resource Group that is associated with a specific data factory pipeline. A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group.
Data Factory Name	Defines an Azure Data Factory Resource to use to execute the pipeline.
Pipeline Name	Defines the data pipeline to run when the job is executed.
Parameters	Defines specific name-and-value-pair parameters to pass when the data pipeline runs, as shown in the following example: `{\"var1\":\"value1\", \"var2\":\"value2\"}`
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the Data Factory job. Default: 45
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 3

Job:Boomi

Boomi AtomSphere enables you to develop, test, and run applications in the cloud.

To deploy and run a Boomi job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the Boomi plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define a Boomi job:

"Boomi_Job_2":
{
   "Type": "Job:Boomi",
   "ConnectionProfile": "BOOMI_CCP",
   "Atom Name": "Atom1",
   "Process Name": "New Process",
   "Polling Intervals": "20",
   "Tolerance": "3"
}

The following table describes the Boomi job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:Boomi name that connects Control-M to Boomi.
Atom Name	Defines the name of a Boomi Atom associated with the Boomi process.
Process Name	Defines the name of a Boomi process associated with the Boomi Atom.
Polling Intervals	(Optional) Determines the number of seconds to wait before checking the status of the job. Default: 20
Tolerance	Determines the number of times to check the job status before ending Not OK. Boomi is limited to five API calls per second. Default: 3

Job:GCP Data Fusion

Google Cloud Platform (GCP) Data Fusion is an extract, transform, and load (ETL) service that enables you to load data from multiple sources, visualize it, and publish it to the cloud.

To deploy and run a GCP Data Fusion job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the GCP Data Fusion plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define a GCP Data Fusion job:

"GCP Data Fusion_Job":
{
   "Type": "Job:GCP Data Fusion",
   "ConnectionProfile": "GCPDF",
   "Region": "us-west1",
   "Project Name": "Project-Name",
   "Instance Name": "Instance-Name",
   "Namespace ID": "default",
   "Pipeline Name": "TestBatchPipeLine",
   "Runtime Parameters": "{\"Parameter1\":\"Value1\"}",
   "Get Logs": "checked",
   "Status Polling Frequency": "10",
   "Failure Tolerance": "3"
}

The following table describes the GCP Data Fusion job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:GCPDF (GCP Data Fusion) name that connects Control-M to GCP Data Fusion.
Region	Determines the region where the GCP Data Fusion job executes.
Project Name	Defines the name of the predefined Google Cloud project that holds your configured APIs, authentication information, billing details, and job resources.
Instance Name	Defines the name of the predefined virtual machine (instance) that executes your job.
Namespace ID	Defines the name of the namespace, which contains the job, job data, and metadata. Valid Characters: A–Z, a–z, 0–9, and _. Default: default
Pipeline Name	Defines the name of a predefined ETL or data integration pipeline in GCP Data Fusion.
Runtime Parameters	Defines the JSON-based body parameters that are passed to the function, as follows: `"argument" : {\"var1\":\"value1\",\"var2\":\"value2\"}`
Get Logs	Determines whether to append the GCP Data Fusion logs to the output. Valid Values: checked, unchecked. Default: unchecked
Status Polling Frequency	Determines the number of seconds to wait before checking the job status. Default: 10
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 3

Job:GCP Dataplex

GCP Dataplex is an extract, transform, and load (ETL) service that enables you to visualize and manage data in GCP BigQuery and the cloud.

To deploy and run a GCP Dataplex job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the GCP Dataplex plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following examples show how to define a GCP Dataplex job.

This JSON defines a Task action:

"GCP Dataplex_Task":
{
   "Type": "Job:GCP Dataplex",
   "ConnectionProfile": "DATAPLEX",
   "Project ID": "applied-lattice-123456",
   "Location": "europe-west2",
   "Action": "Custom Spark Task",
   "Lake Name": "Demo_Lake",
   "Task Name": "Demo_Task",
   "Status Polling Frequency": "10",
   "Failure Tolerance": "2"
}

This JSON defines a Scan action:

"GCP Dataplex_Scan": 
{
   "Type": "Job:GCP Dataplex",
   "ConnectionProfile": "DATAPLEX",
   "Project ID": "applied-lattice-123456",
   "Location": "europe-west2",
   "Action": "Data Profiling Scan",
   "Scan Name": "Demo",
   "Status Polling Frequency": "10",
   "Failure Tolerance": "2"
}

The following table describes the GCP Dataplex job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:GCP Dataplex name that connects Control-M to GCP Dataplex.
Project ID	Defines the ID of the predefined Google Cloud project that holds your configured APIs, authentication information, billing details, and job resources.
Location	Determines the region where the GCP Dataplex job executes, for example, us-central1.
Action	Determines one of the following GCP Dataplex actions to perform: Data Quality Task: Executes a predefined data quality task in GCP BigQuery or Google Cloud Storage locations and defines data controls in BigQuery environments. Custom Spark Task: Executes a predefined, scheduled Apache Spark task to analyze and process your data. Data Profiling Scan: Executes a predefined data scan to identify shared statistical characteristics between BigQuery tables. Data Quality Scan: Executes a predefined data quality scan that validates your data and logs alerts when the data fails validation.
Lake Name	(Data Quality Task and Custom Spark Task actions only) Defines the name of the Google Cloud Storage data lake where the job executes its task.
Task Name	(Data Quality Task and Custom Spark Task actions only) Defines the name of the predefined task that the job executes.
Scan Name	(Data Profiling Scan and Data Quality Scan actions only) Defines the name of the predefined scan that the job executes.
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the job. Default: 15
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 3
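
Because the name parameters depend on the Action value, a job definition can be checked before it is deployed. The following Python sketch shows one possible check. The function name check_dataplex_job is hypothetical, and treating the "actions only" parameters as required for those actions is an assumption that the table does not state outright.

```python
# Actions listed in the table above, grouped by the name parameters they use.
TASK_ACTIONS = {"Data Quality Task", "Custom Spark Task"}
SCAN_ACTIONS = {"Data Profiling Scan", "Data Quality Scan"}

def check_dataplex_job(job):
    """Return a list of problems found in a Job:GCP Dataplex dict."""
    action = job.get("Action")
    if action in TASK_ACTIONS:
        required, unused = ["Lake Name", "Task Name"], ["Scan Name"]
    elif action in SCAN_ACTIONS:
        required, unused = ["Scan Name"], ["Lake Name", "Task Name"]
    else:
        return [f"unknown Action: {action!r}"]
    problems = [f"missing {key!r} for {action}"
                for key in required if not job.get(key)]
    # Assumption: a parameter that belongs to the other action group is a mistake.
    problems += [f"{key!r} does not apply to {action}"
                 for key in unused if key in job]
    return problems

scan_job = {
    "Type": "Job:GCP Dataplex",
    "Action": "Data Profiling Scan",
    "Scan Name": "Demo",
}
print(check_dataplex_job(scan_job))
```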

Job:GCP Dataprep

GCP Dataprep enables you to visualize, format, and prepare your data for analysis.

To deploy and run a GCP Dataprep job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the GCP Dataprep plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define a GCP Dataprep job:

"GCP Dataprep_Job":
{
   "Type": "Job:GCP Dataprep",
   "ConnectionProfile": "GCP_DATAPREP",
   "Flow Name": "data_manipulation",
   "Parameters": "{\"schemaDriftOptions\":{\"schemaValidation\": \"true\",\"stopJobOnErrorsFound\": \"true\"}}",
   "Execute Job With Idempotency Token": "checked",
   "Idempotency Token": "Control-M-Token-%%ORDERID",
   "Status Polling Frequency": "10",
   "Failure Tolerance": "2"
}

The following table describes the GCP Dataprep job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:GCP Dataprep name that connects Control-M to GCP Dataprep.
Flow Name	Defines the name of the flow, which is the workspace where you format and prepare your data.
Parameters	Defines parameters that override the flow or its data sets when the job executes. For more information on parameter types, see the properties of runFlow service in the GCP Dataprep API documentation.
Execute Job With Idempotency Token	Determines whether to execute the job with an idempotency token. Valid Values: checked, unchecked. Default: unchecked
Idempotency Token	Defines a unique ID (idempotency token), which guarantees that the job executes only once. Default: Control-M-Idem-%%ORDERID
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the job. Default: 10
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 2

Job:Informatica

Informatica enables you to automate tasks or workflows based on the parameters that you define.

To deploy and run an Informatica job, ensure that you have done the following:

Installed the Application Pack, which includes the Control-M for Informatica plug-in.
Created a connection profile, as described in ConnectionProfile:Informatica.

The following example shows how to define an Informatica job:

"InformaticaApiJob":
{
   "Type": "Job:Informatica",
   "ConnectionProfile": "INFORMATICA_CONNECTION",
   "RepositoryFolder": "POC",
   "Workflow": "WF_Test",
   "InstanceName": "MyInstance",
   "OsProfile": "MyOSProfile",
   "WorkflowExecutionMode": "RunSingleTask",
   "RunSingleTask": "s_MapTest_Success",
   "WorkflowRestartMode": "ForceRestartFromSpecificTask",
   "RestartFromTask": "s_MapTest_Success",
   "WorkflowParametersFile": "/opt/wf1.prop"
}

The following table describes the Informatica job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:Informatica name that connects Control-M to Informatica.
RepositoryFolder	Defines the Repository folder that contains the workflow that you want to run.
Workflow	Defines the workflow that you want to run in Control-M for Informatica.
InstanceName	(Optional) Defines the specific instance of the workflow that you want to run.
OsProfile	(Optional) Defines the operating system profile in Informatica.
WorkflowExecutionMode	Defines the mode for executing the workflow, one of the following: RunWholeWorkflow: Runs the whole workflow. StartFromTask: Starts running the workflow from a specific task, as specified by the StartFromTask parameter. RunSingleTask: Runs a single task in the workflow, as specified by the RunSingleTask parameter.
StartFromTask	Defines the task from which to start running the workflow. This parameter is required only if you set WorkflowExecutionMode to StartFromTask.
RunSingleTask	Defines the workflow task that you want to run. This parameter is required only if you set WorkflowExecutionMode to RunSingleTask.
Depth	Determines the number of levels within the workflow task hierarchy for the selection of workflow tasks. Default: 10
EnableOutput	Determines whether to include the workflow events log in the job output. Valid Values: true, false. Default: true
EnableErrorDetails	Determines whether to include a detailed error log for a workflow that failed. Valid Values: true, false. Default: true
WorkflowRestartMode	Determines one of the following operations to execute when the workflow is in a suspended status: Recover: Recovers the suspended workflow. ForceRestart: Forces a restart of the suspended workflow. ForceRestartFromSpecificTask: Forces a restart of the suspended workflow from a specific task, as specified by the RestartFromTask parameter.
RestartFromTask	Defines the task from which to restart a suspended workflow. This parameter is required only if you set WorkflowRestartMode to ForceRestartFromSpecificTask.
WorkflowParametersFile	(Optional) Defines the path and name of the workflow parameters file. This parameter enables you to use the same workflow for different actions.
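
Three parameters in the table above are required only when a mode parameter has a particular value. The following Python sketch is one possible way to check for them before deployment. The function name missing_conditional_parameters is hypothetical, and the rules are copied from the table, not from any Control-M validation logic.

```python
# (mode parameter, mode value, parameter that the value makes required)
CONDITIONAL = [
    ("WorkflowExecutionMode", "StartFromTask", "StartFromTask"),
    ("WorkflowExecutionMode", "RunSingleTask", "RunSingleTask"),
    ("WorkflowRestartMode", "ForceRestartFromSpecificTask", "RestartFromTask"),
]

def missing_conditional_parameters(job):
    """Return the conditionally required parameters that the job omits."""
    return [
        needed
        for selector, value, needed in CONDITIONAL
        if job.get(selector) == value and not job.get(needed)
    ]

job = {
    "Type": "Job:Informatica",
    "WorkflowExecutionMode": "RunSingleTask",
    "RunSingleTask": "s_MapTest_Success",
    "WorkflowRestartMode": "ForceRestartFromSpecificTask",
}
print(missing_conditional_parameters(job))
```

In this usage example, the job sets WorkflowRestartMode to ForceRestartFromSpecificTask without a RestartFromTask value, so the check reports RestartFromTask.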

Job:Informatica CS

Informatica Cloud Services (CS) enable you to integrate and synchronize data, applications, and processes that are on-premises or in the cloud.

To deploy and run an Informatica CS job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the Informatica CS plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following examples show how to define an Informatica CS job.

This JSON defines a regular Informatica CS job:

"InformaticaCloudCSJob":
{
   "Type": "Job:Informatica CS",
   "ConnectionProfile": "INFORMATICA_CS_CONNECTION",
   "Task Type": "Synchronization task",
   "Use Federation ID": "checked",
   "Task Name": "",
   "Folder Path": "Default/default-MappingTask1",
   "Call Back URL": "",
   "Status Polling Frequency": "10"
}

This JSON defines an Informatica CS job for a taskflow:

"InformaticaCloudCSJob":
{
   "Type": "Job:Informatica CS",
   "ConnectionProfile": "INFORMATICA_CS_CONNECTION",
   "Task Type": "Taskflow",
   "TaskFlow URL": "https://xxx.dm-xx.informaticacloud.com/active-bpel/rt/xyz",
   "Input Fields": "input1=val1&input2=val2&input3=val3",
   "Call Back URL": "",
   "Rerun suspended Taskflow": "checked",
   "Rerun Run ID": "RUN-UCM-RUNID",
   "Status Polling Frequency": "10"
}

The following table describes the Informatica CS job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:Informatica CS name that connects Control-M to Informatica Cloud.
Task Type	Determines one of the following task types to run on Informatica Cloud: Mapping Task: A set of instructions that defines how data is transformed and moved from its source to its target system. Masking Task: A data security technique that enables you to protect sensitive data while allowing it to be used for non-production purposes. PowerCenter Task: A data integration tool that enables you to extract, transform, and load data from different sources into a unified target system. Replication Task: A data replication solution that enables you to replicate and synchronize data across different systems and databases in real time. Synchronization Task: A data integration solution that enables you to synchronize data between different systems and databases, ensuring that data is consistent and up-to-date across all systems. Linear Taskflow: A workflow automation feature that enables you to create and automate a sequence of tasks that are executed in a specific order, which helps streamline data integration and processing tasks. Taskflow: A workflow automation feature that enables you to create complex workflows that orchestrate and automate data integration and processing tasks across multiple systems and platforms.
Use Federation ID	Determines whether to identify the task using a Federated Task ID, which is a unique identifier that is used to track and manage tasks across distributed environments. This ID is generated by the Informatica domain and is important for monitoring and troubleshooting tasks. This parameter is not required when you run a taskflow. Valid Values: checked, unchecked. Default: unchecked
Task Name	Defines the name of the task that executes on Informatica Cloud. This parameter is not required when you run a taskflow or use a Federated Task ID.
Folder Path	Defines the folder path of the task that executes on Informatica Cloud. This parameter is required if you are using a Federated Task ID.
TaskFlow URL	Defines the service URL of the taskflow that executes on Informatica Cloud. You can find this URL by clicking in the top-right corner of the TaskFlow main page of Informatica Data Integrator and then clicking Properties Detail.
Input Fields	Defines input fields for a taskflow, expressed as input=value pairs separated by the & character.
Call Back URL	(Optional) Defines a publicly available URL where the job status is posted.
Rerun suspended Taskflow	Determines whether to rerun a suspended taskflow. Valid Values: checked, unchecked. Default: unchecked
Rerun Run ID	Defines the Run ID to rerun a suspended taskflow. The Run ID is unique to each job run and is available in the job output, next to the variable name RUN-UCM-RUNID.
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the Informatica Cloud Services job.
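
The Input Fields value is a flat string of input=value pairs joined by the & character. The following Python sketch builds that string from a dictionary. The function name input_fields is hypothetical. Because this topic does not describe how to escape the & and = characters inside a name or value, the sketch rejects them instead of guessing at an encoding.

```python
def input_fields(fields):
    """Join name/value pairs into the input=value&input=value form."""
    pairs = []
    for name, value in fields.items():
        text = f"{name}={value}"
        # Assumption: separators inside a name or value are not supported,
        # because no escaping rule is documented here.
        if "&" in text or text.count("=") != 1:
            raise ValueError(f"unsupported character in field {name!r}")
        pairs.append(text)
    return "&".join(pairs)

print(input_fields({"input1": "val1", "input2": "val2", "input3": "val3"}))
```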

Job:OCI Data Integration

OCI Data Integration is an Oracle Cloud Infrastructure (OCI) platform that enables data extraction, transformation, and loading (ETL) processes across various sources and targets within the Oracle Cloud.

To deploy and run an OCI Data Integration job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the OCI Data Integration plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define an OCI Data Integration job:

"OCI Data Integration_Job_2":
{
   "Type": "Job:OCI Data Integration",
   "ConnectionProfile": "ODI",
   "Actions": "Run Task",
   "Workspace OCID": "ocid1.disworkspace.oc1.phx.anyhqljr2ow63uhgkujyt76876984kt3ycpoltpakb57flphqowx3eeia",
   "Application Key": "0dab7145-1e2b-4d2b-844e-kjhiuliuyl236358be",
   "Task Key": "b5636bc5-d642-12ca0-83a0-9b20c17d35bda",
   "Task Run Name": "Task1",
   "Task Run Input Parameters":
   {
      "PARAMETER":
      {
         "simpleValue":"Hello"
      },
      "PARAMETER2":
      {
         "simpleValue":"Hello222"
      }
   },
   "Status Polling Frequency": "15",
   "Failure Tolerance": "2"
}

The following table describes the OCI Data Integration job parameters.

Attribute	Description
ConnectionProfile	Defines the ConnectionProfile:OCI Data Integration name that connects Control-M to OCI Data Integration.
Actions	Determines one of the following OCI Data Integration actions: Start Workspace: Activates the OCI Data Integration Workspace with the Workspace OCID. Stop Workspace: Deactivates the OCI Data Integration Workspace with the Workspace OCID. Run Task: Executes the Oracle dataflow task on the active OCI Data Integration Workspace and defines the task parameters.
Workspace OCID	Defines the ID of the OCI Data Integration workspace, which is a logical container for managing pipelines, data flows, and projects.
Application Key	Defines the Application Key, which identifies the project application where the task runs.
Task Key	Defines the Task Key that is used to run the job.
Task Run Name	Defines the unique name for the specific task run.
Task Run Input Parameters	(Optional) Defines the input parameters for the specific task run, in JSON format.
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the OCI Data Integration job. Default: 15
Failure Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 2

Job:Talend Data Management

Talend Data Management is an automation service that enables you to integrate applications, and extract, transform, load, and check the quality of large amounts of data.

To deploy and run a Talend Data Management job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the Talend Data Management plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following examples show how to define a Talend Data Management job.

This JSON defines a Talend job that executes a task (job) via the Task Name parameter:

"Talend Data Management":
{
   "Type": "Job: Talend Data Management",
   "ConnectionProfile": "TALENDDATAM",
   "Task/Plan Execution": "Execute Task",
   "Task Name": "GetWeather job",
   "Parameters": "{\"parameter_city\":\"London\",\"parameter_appid\":\"43be3fea88g092d9226eb7ca\"}",
   "Log Level": "Information",
   "Bring logs to output": "checked",
   "Task Polling Intervals" : "10"
}

This JSON defines a Talend job that executes a task (job) via the Task ID parameter:

"Talend Data Management":
{
   "Type": "Job: Talend Data Management",
   "ConnectionProfile": "TALENDDATAM",
   "Task/Plan Execution": "Execute Task",
   "Task ID": "12423rwrt2424sdgf32423",
   "Parameters": "{\"parameter_city\":\"London\",\"parameter_appid\":\"43be3fea88g092d9226eb7ca\"}",
   "Log Level": "Information",
   "Bring logs to output": "checked",
   "Task Polling Intervals" : "10"
}

This JSON defines a Talend job that executes a plan (workflow):

"Talend Data Management":
{
   "Type": "Job: Talend Data Management",
   "ConnectionProfile": "TALENDDATAM",
   "Task/Plan Execution": "Execute Plan",
   "Log Level": "Information",
   "Bring logs to output": "unchecked",
   "Plan Name": "Plan1",
   "Plan Body Parameters": "{\"executable\": \"b91cf8b2-5dd1-4b18-915b-4c447cee5267\",\"rerunOnlyFailedTasks\": true,\"stepId\": \"09043c9f-02d0-41f6-b3cb-0ea53ffde377\"}",
   "Append Failed Plan Logs to Output": "unchecked"
}

The following table describes the Talend Data Management job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:Talend Data Management name that connects Control-M to Talend Data Management.
Task/Plan Execution	Determines one of the following operations to perform: Execute Task: Executes a Talend job. Execute Plan: Executes a Talend workflow.
Task Name	(Execute Task) Defines the name of the Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console. If you define this attribute, you do not need to define the Task ID, since both attributes refer to the same task.
Task ID	(Execute Task) Defines the ID of the Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console. If you define this attribute, you do not need to define the Task Name, since both attributes refer to the same task.
Parameters	(Execute Task) Defines specific parameters, in JSON format, to pass when the Talend job executes. All parameter names must contain a parameter_ prefix, as appears in the following example: `{"parameter_param1":"value1", "parameter_param2":"value2"}` For no parameters, type {}.
Log Level	(Execute Task) Determines the level of detail that is recorded in the Talend task logs, as follows: Information: Records all task execution details in the logs. Warning: Only records warnings. Error: Only records errors. Off: Records no information in the logs.
Bring Logs to Output	(Execute Task) Determines whether to append Talend log messages to the job output. Valid Values: checked, unchecked. Default: unchecked
Task Polling Intervals	(Execute Task) Determines the number of seconds to wait before checking the status of the triggered task. Default: 10
Plan Name	(Execute Plan) Defines the name of the Talend plan that is executed, which is defined in the Tasks and Plans page in the Talend Management Console.
Plan Body Parameters	(Execute Plan) Defines the specific parameters, in JSON format, that are passed to Talend when the job executes, as shown in the following example: `{ "executable": "b91cf8b2-5dd1-4b18-915b-4c447cee5267", "rerunOnlyFailedTasks": true, "stepId": "09043c9f-02d0-41f6-b3cb-0ea53ffde377" }` where executable is the Plan ID. The Plan Name attribute is ignored when you define this attribute.
Append Failed Plan Logs to Output	(Execute Plan) Determines whether Talend logs are appended to the output when the plan fails to execute. Default: unchecked
Plan Polling Intervals	(Execute Plan) Determines the number of seconds to wait before checking the status of the triggered plan. Default: 10
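
The Parameters rule for Execute Task (every name carries a parameter_ prefix, and {} means no parameters) can be applied mechanically. The following Python sketch shows one possible way to do so. The function name talend_parameters is hypothetical and is not part of Control-M or Talend.

```python
import json

PREFIX = "parameter_"

def talend_parameters(values):
    """Return the Parameters string, adding the parameter_ prefix where missing."""
    prefixed = {
        (name if name.startswith(PREFIX) else PREFIX + name): value
        for name, value in values.items()
    }
    # Compact separators keep the string close to the examples in this topic.
    return json.dumps(prefixed, separators=(",", ":"))

job = {
    "Type": "Job: Talend Data Management",
    "ConnectionProfile": "TALENDDATAM",
    "Task/Plan Execution": "Execute Task",
    "Task Name": "GetWeather job",
    "Parameters": talend_parameters({"city": "London"}),
}
# Serializing the whole job escapes the quotes inside Parameters.
print(json.dumps({"Talend Data Management": job}, indent=3))
```

An empty dictionary produces {}, which matches the documented value for a job with no parameters.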

Job:Talend OAuth

Talend OAuth (Open Authorization) enables you to use OAuth authentication within the Talend suite of data integration and management tools. It allows third-party applications to access resources on behalf of a user without sharing sensitive credentials.

To deploy and run a Talend OAuth job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the Talend OAuth plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

For more information about this plug-in, see Control-M for Talend OAuth.

The following examples show how to define a Talend OAuth job.

This JSON defines a Talend OAuth job that executes a task (job) via the Task Name parameter:

"Talend OAuth":
{
   "Type": "Job:Talend OAuth",
   "ConnectionProfile": "TALEND",
   "Action": "Execute Task by Name",
   "Environment ID": "651c0adef9999442f89b3e682",
   "Task Name": "BMC_ParamTest",
   "Task Timeout": "5",
   "Parameters": "{\"custom_message\":\"Talend test\", \"number_of_message\": 4, \"sleep_time\": 900}",
   "Log Level": "Information",
   "Append Task Logs to Output": "checked",
   "Status Polling Frequency": "30",
   "Failure Tolerance": "1"
}

This JSON defines a Talend OAuth job that executes a task (job) via the Task ID parameter:

"Talend OAuth_Job_2": 
{
   "Type": "Job:Talend OAuth",
   "ConnectionProfile": "TALEND",
   "Action": "Execute Task by ID",
   "Task Executable": "6602a2d947ebbf4cb8c997ac",
   "Parameters": "{\"custom_message\":\"Talend test\", \"number_of_message\": 4, \"sleep_time\": 900}",
   "Log Level": "Information",
   "Append Task Logs to Output": "checked",
   "Task Timeout": "10",
   "Status Polling Frequency": "30",
   "Failure Tolerance": "1"
}

This JSON defines a Talend OAuth job that executes a plan (workflow):

"Talend OAuth_Job_3": 
{
   "Type": "Job:Talend OAuth",
   "ConnectionProfile": "TALEND",
   "Action": "Execute Plan",
   "Plan Executable": "d2c32f46-d07a-46f9-b68f-f4a95c593557",
   "Append Failed Plan Logs to Output": "checked",
   "Status Polling Frequency": "30",
   "Failure Tolerance": "1"
}

The following table describes Talend OAuth job attributes.

Attribute	Action	Description
ConnectionProfile	All Actions	Defines the ConnectionProfile:Talend OAuth name that connects Control-M to Talend OAuth.
Action	NA	Determines one of the following actions to perform: Execute Task by ID: Executes a Talend task (job) by Task Executable (ID). Execute Task by Name: Executes a Talend task (job) by Name. Execute Plan: Executes a Talend plan (workflow).
Environment ID	Execute Task by Name	Defines the Environment ID where the task is executed.
Task Name	Execute Task by Name	Defines the name of the predefined Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console.
Task Executable	Execute Task by ID	Defines the ID of the predefined Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console.
Parameters	Execute Task by ID Execute Task by Name	Defines specific parameters, in JSON format, to pass when the Talend job executes, for example: `{"custom_message":"Talend test", "number_of_message": 4, "sleep_time": 100}` For no parameters, type {}.
Log Level	Execute Task by ID Execute Task by Name	Determines the amount of log details that are recorded in the Talend task logs, as follows: Information: Records all task execution details in the logs. Warning: Records warnings. Error: Records errors. Off: Records no information in the logs.
Task Timeout	Execute Task by ID Execute Task by Name	(Optional) Determines the number of minutes to wait before the task is executed.
Append Task Logs to Output	Execute Task by ID Execute Task by Name	Determines whether to append the Talend logs to the job output when the task fails during execution. Valid Values: Checked Unchecked Default: Unchecked
Plan Executable	Execute Plan	Defines the Plan executable that you want to execute, as defined in the Tasks and Plans page in the Talend Management Console.
Append Failed Plan Logs to Output	Execute Plan	Determines whether to append the Talend plan logs to the job output when the plan fails during execution. Valid Values: Checked Unchecked Default: Unchecked
Status Polling Frequency	All Actions	Determines the number of seconds to wait before Control-M checks the status of the job or the job's output. Default: 10
Failure Tolerance	All Actions	Determines the number of times to check the job status before the job ends Not Ok. Default: 2
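
As a minimal sketch of how the attributes in the table combine, the following definition executes a task by ID with no parameters and writes out the documented default values for Status Polling Frequency and Failure Tolerance. The job name is hypothetical, the connection profile and Task Executable values are reused from the examples above, and stating the defaults explicitly is an illustrative choice rather than a requirement.

```json
"Talend OAuth_Job_NoParams":
{
   "Type": "Job:Talend OAuth",
   "ConnectionProfile": "TALEND",
   "Action": "Execute Task by ID",
   "Task Executable": "6602a2d947ebbf4cb8c997ac",
   "Parameters": "{}",
   "Log Level": "Error",
   "Append Task Logs to Output": "unchecked",
   "Status Polling Frequency": "10",
   "Failure Tolerance": "2"
}
```

Because Parameters is itself a JSON string, any double quotation marks inside a non-empty value must be escaped with a backslash for the job definition to remain valid JSON.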

Job:Trifacta

Trifacta is a data-wrangling platform that allows you to discover, organize, edit, and publish data in different formats and to multiple clouds, including AWS, Azure, Google, Snowflake, and Databricks.

To deploy and run a Trifacta job, ensure that you have done the following:

Configured the Control-M Application Integrator plug-in, as described in Application Integrator Configuration.
Installed the Trifacta plug-in with the provision image command (Control-M/EM 9.0.21 or higher) or the deploy jobtype command (Control-M/EM 9.0.20.200 or lower).

The following example shows how to define a Trifacta job:

"Trifacta_Job_2":
{
   "Type": "Job:Trifacta",
   "ConnectionProfile": "TRIFACTA",
   "Flow Name": "Flow",
   "Rerun with New Idempotency Token": "checked",
   "Idempotent Token": "Control-M-Idem_%%ORDERID",
   "Retrack Job Status": "checked",
   "Run ID": "Run_ID",
   "Status Polling Frequency": "15"
}

The following table describes the Trifacta job parameters.

Parameter	Description
ConnectionProfile	Defines the ConnectionProfile:Trifacta name that connects Control-M to Trifacta.
Flow Name	Determines which Trifacta flow the job runs.
Rerun with New Idempotency Token	Determines whether to allow rerun of the job in Trifacta with a new idempotency token (for example, when the job run times out). Valid Values: checked unchecked Default: unchecked
Idempotent Token	Defines a unique ID (idempotent token), which guarantees that the job executes only once. To allow rerun of the job with a new token, replace the default value with a unique ID that has not been used before. Use the RUN_ID, which can be retrieved from the job output. Default: Control-M-Idem_%%ORDERID
Retrack Job Status	Determines whether to track the job run status as it changes during execution, for example, from in-progress to failed or completed. Valid Values: checked unchecked Default: unchecked
Run ID	Defines the RUN_ID number for the job run to be tracked. The RUN_ID is unique to each job run and it can be found in the job output.
Status Polling Frequency	Determines the number of seconds to wait before checking the status of the Trifacta job. Default: 10
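
For comparison with the example above, the following sketch defines a first-time run that keeps every optional attribute at its documented default. The job name is hypothetical, and the definition assumes that Run ID can be omitted when Retrack Job Status is unchecked, which the table does not state explicitly.

```json
"Trifacta_Job_Defaults":
{
   "Type": "Job:Trifacta",
   "ConnectionProfile": "TRIFACTA",
   "Flow Name": "Flow",
   "Rerun with New Idempotency Token": "unchecked",
   "Idempotent Token": "Control-M-Idem_%%ORDERID",
   "Retrack Job Status": "unchecked",
   "Status Polling Frequency": "10"
}
```

With the default token, each order of the job resolves %%ORDERID to a different value, so the token is only repeated when the same order is rerun.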