Data Integration Jobs
The following topics describe job attributes that work with Data Integration platforms and services:
Alteryx Trifacta Job
Alteryx Trifacta is a data-wrangling platform that allows you to discover, organize, edit, and publish data in different formats and to multiple cloud platforms, including AWS, Azure, Google Cloud, Snowflake, and Databricks.
The following table describes the Trifacta job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to Alteryx Trifacta. Rules: |
| Flow Name | Determines which Trifacta flow the job executes. |
| Rerun with New Idempotency Token | Determines whether to allow re-execution of the job in Trifacta with a new idempotency token (for example, when the job execution times out). |
| Idempotent Token | Defines a unique ID (idempotency token) that guarantees the job executes only once. After a successful execution, this ID cannot be used again. To re-execute the job with a new token, replace the default value with the RUN_ID, which can be retrieved from the job output. Default: Control-M-Idem_%%ORDERID |
| Retrack Job Status | Determines whether to track the job execution status as the job progresses and its status changes (for example, from in progress to failed or completed). |
| Run ID | Defines the RUN_ID number of the job execution to track. The RUN_ID is unique to each job execution and can be found in the job output. |
| Status Polling Frequency | Determines the number of seconds to wait before Control-M checks the status of the Trifacta job. Default: 10 |
AWS Glue Job
AWS Glue enables you to define data-driven workflows that automate the movement and transformation of data.
The following table describes AWS Glue job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to AWS Glue. Rules: |
| Glue Job Name | Defines the name of the AWS Glue job that you want to execute. A job name is automatically saved when you create an AWS Glue pipeline. |
| Glue Job Arguments | Determines whether to add arguments to the AWS Glue job. |
| Arguments | Defines the AWS Glue job execution-time parameters, in JSON format. |
| Status Polling Frequency | (Optional) Determines the number of seconds to wait before checking the status of the job. Default: 15 |
| Failure Tolerance | Determines the number of times to check the job status before the job ends Not OK. Default: 2 |
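The Arguments attribute expects a flat JSON object. A minimal sketch of what such a value might look like, assuming the common AWS Glue convention of prefixing runtime argument names with `--` (the argument names and S3 paths below are illustrative, not taken from a real environment):

```python
import json

# Hypothetical value for the Arguments attribute of an AWS Glue job.
# Glue passes runtime arguments as a flat JSON object of name-value
# pairs; keys conventionally start with "--". All names and paths
# here are illustrative assumptions.
glue_arguments = {
    "--source_path": "s3://example-bucket/input/",
    "--target_path": "s3://example-bucket/output/",
}

print(json.dumps(glue_arguments, indent=2))
```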
AWS Glue DataBrew Job
AWS Glue DataBrew is a cloud-based extract, transform, load (ETL) service that you can use to visualize your data and publish it to the Amazon S3 Data Lake.
The following table describes AWS Glue DataBrew job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to AWS Glue DataBrew. Rules: |
| Job Name | Defines the AWS Glue DataBrew job name. |
| Output Job Logs | Determines whether the DataBrew job logs are included in the Control-M output. |
| Status Polling Frequency | Determines the number of seconds to wait before checking the status of the DataBrew job. Default: 10 |
| Failure Tolerance | Determines the number of times to check the job status before the job ends Not OK. Default: 2 |
Azure Data Factory Job
Azure Data Factory is a cloud-based extract, transform, load (ETL) and data integration service that allows you to create data-driven workflows to automate the movement and transformation of data.
The following table describes the Azure Data Factory job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to Azure Data Factory. Rules: |
| Resource Group Name | Determines the Azure Resource Group that is associated with a specific data factory. A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. |
| Data Factory Name | Determines the name of the Azure Data Factory that contains the pipeline you want to execute. |
| Pipeline Name | Determines which data pipeline executes when you execute the Control-M job. |
| Parameters | Defines specific parameters, in JSON format, that are passed when the data pipeline executes. |
| Status Polling Frequency | Determines the number of seconds to wait before checking the status of the Data Factory job. Set to 120 seconds or longer for jobs that execute for more than an hour. Default: 45 |
| Failure Tolerance | Determines the number of times to check the job status before the job ends Not OK. Default: 3 |
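The Parameters attribute for a Data Factory pipeline run is a flat JSON object of name-value pairs. A minimal sketch, assuming illustrative parameter names (they are not taken from a real pipeline):

```python
import json

# Hypothetical value for the Parameters attribute of an Azure Data
# Factory job. Pipeline parameters are passed as a flat JSON object;
# the parameter names and values below are illustrative assumptions.
pipeline_parameters = {
    "sourceFolder": "raw/2024-01-01",
    "batchSize": "500",
}

print(json.dumps(pipeline_parameters))
```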
Boomi AtomSphere Job
Boomi AtomSphere enables you to develop, test, and run applications in the cloud.
The following table describes the Boomi job attributes.
| Parameter | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to Boomi AtomSphere. |
| Atom Name | Defines the name of a Boomi Atom associated with the Boomi process. |
| Process Name | Defines the name of a Boomi process associated with the Boomi Atom. |
| Status Polling Frequency | Determines the number of seconds to wait before checking the status of the job. Default: 20 seconds |
| Tolerance | Determines the number of times to check the job status before the job ends Not OK. If the API call that checks the execution status fails due to the Boomi limitation of a maximum of 5 calls per second, the check is retried up to the number of times set in the Tolerance field. Default: 3 times |
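The Tolerance behavior described above amounts to a bounded retry loop around the status check. A minimal sketch of that logic (the function and status names are illustrative, not the actual Control-M implementation):

```python
import time

# Sketch of the polling behavior described above: check job status
# every `interval` seconds, and tolerate up to `tolerance` consecutive
# failed status calls (e.g. due to Boomi's 5-calls-per-second limit)
# before ending Not OK. All names here are illustrative assumptions.
def poll_status(get_status, tolerance=3, interval=20):
    failures = 0
    while True:
        try:
            status = get_status()
        except RuntimeError:
            failures += 1
            if failures >= tolerance:
                return "NOT_OK"
        else:
            failures = 0
            if status in ("COMPLETE", "ERROR"):
                return status
        time.sleep(interval)
```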
GCP Dataprep Job
GCP Dataprep enables you to visualize, format, and prepare your data for analysis.
The following table describes the GCP Dataprep job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to GCP Dataprep. Rules: |
| Flow Name | Defines the name of the flow, which is the workspace where you format and prepare your data. |
| Parameters | Defines parameters that override the flow or its datasets when the job executes. For more information about parameter types, see the properties of the runFlow service in the GCP Dataprep API documentation. |
| Execute Job with Idempotency Token | Determines whether to execute the job with an idempotency token. |
| Idempotency Token | Defines a unique ID (idempotency token) that guarantees the job executes only once. Default: Control-M-Idem-%%ORDERID |
| Status Polling Frequency | Determines the number of seconds to wait before checking the status of the job. Default: 10 |
| Failure Tolerance | Determines the number of times to check the job status before the job ends Not OK. Default: 2 |
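As a rough sketch only, a Dataprep parameter-override payload might be shaped like the following; the exact schema, including the key names used here, is defined by the runFlow service in the GCP Dataprep API documentation and should be checked there:

```python
import json

# Sketch of a Dataprep parameter-override payload. The structure and
# all key names ("runParameters", "overrides", "data") are assumptions
# for illustration; consult the runFlow API for the actual schema.
overrides = {
    "runParameters": {
        "overrides": {
            "data": [
                {"key": "region", "value": "us-east"},
            ]
        }
    }
}

print(json.dumps(overrides))
```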
Informatica Job
Informatica enables you to automate tasks or workflows based on the parameters you define.
The following table describes the Informatica job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to Informatica. Rules: Variable Name: %%INF-ACCOUNT |
| Repository Folder | Defines the repository folder that contains the workflow that you want to execute. Variable Name: %%INF-REP_FOLDER |
| Workflow | Defines the workflow that you want to execute. Variable Name: %%INF-WORKFLOW |
| Instance Name | Defines the specific instance of the workflow that you want to execute. Variable Name: %%INF-INSTANCE_NAME |
| OS Profile | Enables you to specify an OS profile when executing or re-executing an Informatica job. |
| Run Options | Defines the workflow task hierarchy. |
| Depth | Determines the number of levels within the workflow task hierarchy that are used to select workflow tasks. Default: 10 Variable Name: %%INF-DEPTH |
| Run | Determines whether to execute the whole workflow, start from a specific task, or execute a single task, as follows: |
| Parameters | Defines an array of parameters that is passed to the workflow. Each parameter consists of the following: |
| Include Workflow Events Log in Job Output | Determines whether to include the workflow events log in the job output. |
| Include Detailed Error Log for Failed Sessions | Determines whether to include a detailed error log for a workflow that failed. |
| Get Session Statistics and Log | Determines whether to retrieve session statistics and log messages. |
| Action on Rerun | Determines which operation is executed when the workflow is suspended, as follows: Variable Name: %%INF-RESTART_FROM_TASK |
| Workflow Parameters File | Defines the path and name of the workflow parameters file. Variable Name: %%INF-WORKFLOW_PARAMETERS_FILE |
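As a sketch only, a parameters array for a workflow might look like the following; the actual fields of each entry are defined by the product, and the "name", "value", and "scope" keys used here are assumptions for illustration:

```python
import json

# Sketch of a parameters array passed to an Informatica workflow.
# The field names ("name", "value", "scope") and the sample values
# are illustrative assumptions, not the documented schema.
parameters = [
    {"name": "$$LoadDate", "value": "2024-01-01", "scope": "wf_daily_load"},
]

print(json.dumps(parameters))
```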
Informatica CS Job
Informatica Cloud Services (CS) enable you to integrate and synchronize data, applications, and processes that reside on-premises or in the cloud.
The following table describes the Informatica Cloud Services job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to Informatica Cloud Services. Rules: Variable Name: %%INF-ACCOUNT |
| Task Type | Determines one of the following task types to execute on Informatica Cloud: |
| Use Federation ID | Determines whether to identify the task by its Federated Task ID, a unique identifier that is used to track and manage tasks across distributed environments in a federated setup. This ID is generated by the Informatica domain and is useful for monitoring and troubleshooting tasks. This attribute is not required when you execute a taskflow. |
| Task Name | Defines the name of the task that executes on Informatica Cloud. This attribute is not required when you execute a taskflow or use a Federated Task ID. |
| Folder Path | Defines the folder path of the task that executes on Informatica Cloud. This attribute is required if you are using a Federated Task ID. |
| TaskFlow URL | Defines the service URL of the taskflow that executes on Informatica Cloud. You can find this URL by clicking |
| Rerun Suspended Taskflow | Determines whether to re-execute a suspended taskflow. |
| Input Fields | Defines input fields for a taskflow, in the following format: input1=value1&input2=value2&input3=value3 |
| Call Back URL | (Optional) Defines a publicly available URL where the job status is posted. |
| Rerun Run ID | Defines the Run ID used to re-execute a suspended taskflow. The Run ID is unique to each job execution and is available in the job output, next to the variable name RUN-UCM-RUNID. |
| Status Polling Frequency | Determines the number of seconds to wait before checking the status of the Informatica Cloud Services job. |
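The Input Fields value uses a query-string-like format. A minimal sketch of building such a string programmatically with Python's standard library (the field names are the placeholder names from the format above, not real taskflow inputs):

```python
from urllib.parse import urlencode

# Build an Input Fields string in the format shown in the table:
# input1=value1&input2=value2&input3=value3. The field names and
# values are the placeholders from the documentation, not real inputs.
fields = {"input1": "value1", "input2": "value2", "input3": "value3"}
input_fields = urlencode(fields)

print(input_fields)  # input1=value1&input2=value2&input3=value3
```

Note that `urlencode` also percent-encodes characters that are unsafe in a query string, which a hand-concatenated string would not.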
Talend Data Management Job
The Talend Data Management job enables you to integrate your data management and integration tasks and plans from Talend with your existing Control-M workflows.
The following table describes Talend Data Management job attributes.
| Attribute | Description |
| --- | --- |
| Connection Profile | Determines the authorization credentials that are used to connect Control-M to Talend Data Management. Rules: |
| Task/Plan Execution | Determines one of the following options for execution in Talend: |
| Task Name / Plan Name | Defines the name of the Talend task or plan to execute, as defined in the Tasks and Plans page of the Talend Management Console. |
| Parameters | (For a task) Defines specific parameters, in JSON format, to pass when the Talend job executes. All parameter names must contain the parameter_ prefix. For no parameters, type {}. |
| Log Level | (For a task) Determines one of the following levels of detail for the log messages of the triggered task in the Talend Management Console: |
| Bring logs to output | (For a task) Determines whether to show Talend log messages in the job output. Default: unchecked |
| Task Polling Intervals / Plan Polling Intervals | Determines the number of seconds to wait before checking the status of the triggered task or plan. Default: 10 seconds |
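The Parameters attribute for a Talend task requires every parameter name to carry the `parameter_` prefix. A minimal sketch of a valid value (the parameter names and values are illustrative assumptions):

```python
import json

# Hypothetical value for the Parameters attribute of a Talend task.
# Per the table above, every name must carry the "parameter_" prefix;
# the names and values below are illustrative assumptions.
talend_parameters = {
    "parameter_inputFile": "/data/in.csv",
    "parameter_maxRows": "1000",
}

print(json.dumps(talend_parameters))
```

With no parameters, the value would simply be `{}`.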