Data Integration Jobs

The following topics describe job attributes that work with Data Integration platforms and services:

Alteryx Trifacta Job

Alteryx Trifacta is a data-wrangling platform that allows you to discover, organize, edit, and publish data in different formats and to multiple clouds, including AWS, Azure, Google, Snowflake, and Databricks.

The following table describes the Trifacta job type attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Alteryx Trifacta.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Flow Name

Determines which Trifacta flow the job runs.

Rerun with New Idempotency Token

Determines whether to allow rerun of the job in Trifacta with a new idempotency token (for example, when the job run times out).

Idempotent Token

Defines the unique ID (idempotency token) that guarantees the job run is executed only once. After successful execution, this ID cannot be used again.

To allow rerun of the job with a new token, replace the default value with a unique ID that has not been used before. Use the RUN_ID, which can be retrieved from the job output.

Default: Control-M-Idem_%%ORDERID. With the default value, the job run cannot be executed again.

Retrack Job Status

Determines whether to track job run status as the job run progresses and the status changes (for example, from in-progress to failed or to completed).

Run ID

Defines the RUN_ID number for the job run to be tracked.

The RUN_ID is unique to each job run and can be found in the job output.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the Trifacta job.

Set to 120 seconds or longer for jobs that run for more than an hour.

Default: 10
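
For reference, a job that uses these attributes can also be defined in JSON through the Control-M Automation API. The following is a minimal sketch, not a definitive definition: the Job:Trifacta type string, the connection profile name, and the flow name are assumptions, so verify them against the Automation API code reference for your plug-in version.

{
  "TrifactaJob": {
    "Type": "Job:Trifacta",
    "ConnectionProfile": "TRIFACTA_CP",
    "Flow Name": "MyFlow",
    "Rerun with New Idempotency Token": "unchecked",
    "Idempotent Token": "Control-M-Idem_%%ORDERID",
    "Retrack Job Status": "unchecked",
    "Status Polling Frequency": "10"
  }
}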

AWS Glue Job

AWS Glue is a serverless data integration service that enables you to discover, prepare, and combine data for analytics and application development.

The following table describes the AWS Glue job type attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to AWS Glue.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Glue Job Name

Defines the name of the AWS Glue job that you want to run. After you create an AWS Glue pipeline, it is saved under a job name, which you can then run.

Glue Job Arguments

Determines whether to add arguments to the AWS Glue job.

Arguments

Defines the AWS Glue job runtime parameters, in JSON format.

{"--myArg1": "myVal1", "--myArg2": "myVal2"}

Status Polling Frequency

(Optional) Determines the number of seconds to wait before checking the status of the AWS Glue job.

Set to 120 seconds or longer for jobs that run for more than an hour.

Default: 15

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 2
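
As with the other job types in this topic, these attributes map to a JSON job definition in the Control-M Automation API. A hedged sketch, in which the Job:AWS Glue type string, the sample names, and the encoding of Arguments as an escaped JSON string are all assumptions:

{
  "AwsGlueJob": {
    "Type": "Job:AWS Glue",
    "ConnectionProfile": "GLUE_CP",
    "Glue Job Name": "my-glue-job",
    "Glue Job Arguments": "checked",
    "Arguments": "{\"--myArg1\": \"myVal1\", \"--myArg2\": \"myVal2\"}",
    "Status Polling Frequency": "15",
    "Failure Tolerance": "2"
  }
}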

AWS Glue DataBrew Job

AWS Glue DataBrew is a cloud-based ETL service that you can use to visualize your data and publish it to the Amazon S3 Data Lake.

The following table describes AWS Glue DataBrew job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to AWS Glue DataBrew.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Job Name

Defines the AWS Glue DataBrew job name.

Output Job Logs

Determines whether the DataBrew job logs are included in the Control-M output.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the DataBrew job.

Set to 120 seconds or longer for jobs that run for more than an hour.

Default: 10

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 2
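
A corresponding Automation API sketch for a DataBrew job, under the same caveat that the Job:AWS Glue DataBrew type string and the placeholder names are assumptions to verify for your plug-in version:

{
  "DataBrewJob": {
    "Type": "Job:AWS Glue DataBrew",
    "ConnectionProfile": "DATABREW_CP",
    "Job Name": "my-databrew-job",
    "Output Job Logs": "checked",
    "Status Polling Frequency": "10",
    "Failure Tolerance": "2"
  }
}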

Azure Data Factory Job

Azure Data Factory is a cloud-based ETL and data integration service that allows you to create data-driven workflows to automate the movement and transformation of data.

The following table describes the Azure Data Factory job type attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Azure Data Factory.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Variable Name: %%AZURE-ACCOUNT

Resource Group Name

Determines the Azure Resource Group that is associated with a specific data factory. A resource group is a container that holds related resources for an Azure solution.

The resource group can include all the resources for the solution, or only those resources that you want to manage as a group.

Data Factory Name

Determines the name of the Azure Data Factory that contains the pipeline you want to run.

Pipeline Name

Determines which data pipeline runs when the Control-M job is executed.

Parameters

Defines specific parameters, in JSON format, that are passed when the data pipeline runs.

{"var1":"value1", "var2":"value2"}

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the Data Factory job.

Set to 120 seconds or longer for jobs that run for more than an hour.

Default: 45

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 3
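
A hedged Automation API sketch that ties these attributes together; the Job:Azure Data Factory type string and all sample names are placeholders or assumptions:

{
  "AzureDataFactoryJob": {
    "Type": "Job:Azure Data Factory",
    "ConnectionProfile": "ADF_CP",
    "Resource Group Name": "MyResourceGroup",
    "Data Factory Name": "MyDataFactory",
    "Pipeline Name": "MyPipeline",
    "Parameters": "{\"var1\": \"value1\", \"var2\": \"value2\"}",
    "Status Polling Frequency": "45",
    "Failure Tolerance": "3"
  }
}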

Boomi AtomSphere Job

The Boomi AtomSphere job enables you to integrate Boomi processes with your existing Control-M workflows.

The following table describes the Boomi AtomSphere job type attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Boomi AtomSphere.

Atom Name

Defines the name of a Boomi Atom associated with the Boomi process.

Process Name

Defines the name of a Boomi process associated with the Boomi Atom.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the Boomi job.

Set to 120 seconds or longer for jobs that run for more than an hour.

Default: 20 seconds

Tolerance

Determines the number of times to check the job status before ending Not OK. If the API call that checks the run status fails due to the Boomi limit of a maximum of 5 calls per second, Control-M retries the call according to the number in the Tolerance field.

Default: 3 times
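
A minimal Automation API sketch for a Boomi job, assuming a Job:Boomi type string and placeholder Atom and process names:

{
  "BoomiJob": {
    "Type": "Job:Boomi",
    "ConnectionProfile": "BOOMI_CP",
    "Atom Name": "MyAtom",
    "Process Name": "MyProcess",
    "Status Polling Frequency": "20",
    "Tolerance": "3"
  }
}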

Informatica Job

The Informatica job enables you to automate an Informatica workflow or tasks within the workflow, and define the parameters to pass to the workflow.

The following table describes the Informatica job type attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Informatica.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Variable Name: %%INF-ACCOUNT

Repository Folder

Defines the repository folder that contains the workflow that you want to run.

Variable Name: %%INF-REP_FOLDER

Workflow

Defines the workflow that you want to run.

Variable Name: %%INF-WORKFLOW

Instance Name

Defines the specific instance of the workflow that you want to run.

Variable Name: %%INF-INSTANCE_NAME

OS Profile

Enables you to specify an OS profile when running or rerunning an Informatica job.

Run Options

Defines the workflow task hierarchy.

Depth

Determines the number of levels within the workflow task hierarchy that are used to select workflow tasks.

Default: 10

Variable Name: %%INF-DEPTH

Run

Determines whether to run the whole workflow, start from a specific task, or run a single task as follows:

  • Run the Whole Workflow: Runs the whole workflow.

  • Start from Task: Starts the workflow from the task that you specify.

Variable Name: %%INF-START_FROM_TASK

  • Run Single Task: Runs the task that you specify.

    Variable Name: %%INF-RUN_SINGLE_TASK

Parameters

Defines an array of parameters that is passed to the workflow, as shown in the sketch after this list.

Each parameter consists of the following:

  • Scope: Defines the scope of the parameter.

  • Name: Defines the parameter name.

  • Value: Defines the parameter value.
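
A sketch of such an array in JSON; the lowercase key names and the $$-prefixed parameter names are illustrative assumptions, not a documented schema:

[
  {"scope": "", "name": "$$Param1", "value": "value1"},
  {"scope": "s_MySession", "name": "$$Param2", "value": "value2"}
]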

Include Workflow Events Log in Job Output

Determines whether to include the workflow event log in the job output.

Include Detailed Error Log for Failed Sessions

Determines whether to include a detailed error log for a workflow that failed.

Get Session Statistics and Log

Determines whether to retrieve session statistics and log messages.

Action on Rerun

Determines which operation is executed when the workflow is suspended, as follows:

  • Recover: Restarts a suspended workflow from the point of failure.

  • Force Restart: Restarts a suspended workflow from the beginning.

  • Force Restart from a Specific Task: Restarts the suspended workflow from the task that you define.

Variable Name: %%INF-RESTART_FROM_TASK

Workflow Parameters File

Defines the path and name of the workflow parameters file.

Variable Name: %%INF-WORKFLOW_PARAMETERS_FILE
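
For reference, the Control-M Automation API also supports an Informatica job type whose JSON keys correspond to the attributes above without spaces. A hedged sketch with placeholder values; verify the exact type string and key names against the Automation API code reference for your version:

{
  "InformaticaJob": {
    "Type": "Job:Informatica",
    "ConnectionProfile": "INFORMATICA_CP",
    "RepositoryFolder": "MyFolder",
    "Workflow": "wf_example",
    "InstanceName": "MyInstance",
    "WorkflowExecutionMode": "RunSingleTask",
    "RunSingleTaskName": "s_example_task",
    "WorkflowParametersFile": "/opt/wf1.prop"
  }
}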

Informatica CS Job

Informatica Cloud Services (CS) jobs enable you to automate your Informatica workflows for multi-cloud and on-premises data integration.

The following table describes the Informatica Cloud Services job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Informatica Cloud Services.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Variable Name: %%INF-ACCOUNT

Task Type

Determines one of the following task types to run on Informatica Cloud:

  • Mapping Task: A set of instructions that defines how data is transformed and moved from its source to its target system.

  • Masking Task: A data security technique that enables you to protect sensitive data while allowing it to be used for non-production purposes.

  • PowerCenter Task: A data integration tool that enables you to extract, transform, and load data from different sources into a unified target system.

  • Replication Task: A data replication solution that enables you to replicate and synchronize data across different systems and databases in real time.

  • Synchronization Task: A data integration solution that enables you to synchronize data between different systems and databases, ensuring that data is consistent and up-to-date across all systems.

  • Linear Taskflow: A workflow automation feature that enables you to create and automate a sequence of tasks that are executed in a specific order, which helps streamline data integration and processing tasks.

  • Taskflow: A workflow automation feature that enables you to create complex workflows that orchestrate and automate data integration and processing tasks across multiple systems and platforms.

Use Federation ID

Determines whether to identify the task by a Federated Task ID, a unique identifier that is used to track and manage tasks across distributed environments. This ID is generated by the Informatica domain and is important for monitoring and troubleshooting tasks.

This attribute is not required when you run a taskflow.

Task Name

Defines the name of the task that executes on Informatica Cloud.

This attribute is not required when you run a taskflow or use a Federated Task ID.

Folder Path

Defines the folder path of the task that executes on Informatica Cloud.

This attribute is required if you are using a Federated Task ID.

TaskFlow URL

Defines the service URL of the taskflow that executes on Informatica Cloud.

You can find this URL by clicking the menu in the top right corner of the TaskFlow main page in Informatica Data Integrator and selecting Properties Detail....

Rerun Suspended Taskflow

Determines whether to rerun a suspended taskflow.

Input Fields

Defines input fields for a taskflow, in the following format:

input1=value1&input2=value2&input3=value3

Call Back URL

(Optional) Defines a publicly available URL where the job status is posted.

Rerun Run ID

Defines the Run ID to rerun a suspended taskflow.

The Run ID is unique to each job run and is available in the job output, next to the variable name RUN-UCM-RUNID.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the Informatica Cloud Services job.

Set to 120 seconds or longer for jobs that run for more than an hour.
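
A hedged Automation API sketch for a task (not a taskflow); the Job:Informatica CS type string and the sample values are assumptions:

{
  "InformaticaCSJob": {
    "Type": "Job:Informatica CS",
    "ConnectionProfile": "INFORMATICA_CS_CP",
    "Task Type": "Synchronization Task",
    "Use Federation ID": "unchecked",
    "Task Name": "MySyncTask",
    "Status Polling Frequency": "10"
  }
}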

Talend Data Management Job

The Talend Data Management job enables you to integrate Talend data management and data integration tasks or plans with your existing Control-M workflows.

The following table describes Talend Data Management job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Talend Data Management.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Task/Plan Execution

Determines one of the following options for execution in Talend:

  • Execute Task

  • Execute Plan

Task Name /
Plan Name

Defines the name of the Talend task or plan to execute, as defined in the Tasks and Plans page in the Talend Management Console.

Parameters

(For a task) Defines specific parameters, in JSON format, to pass when the Talend job runs.

All parameter names must begin with the parameter_ prefix.

{"parameter_param1":"value1", "parameter_param2":"value2"}

For no parameters, type {}.

Log Level

(For a task) Determines one of the following levels of detail in log messages for the triggered task in the Talend Management Console:

  • Information: All logs available.

  • Warning: Only warning logs.

  • Error: Only error logs.

  • Off: No logs.

Bring logs to output

(For a task) Determines whether to show Talend log messages in the job output.

Default: unchecked

Task Polling Intervals /
Plan Polling Intervals

Determines the number of seconds to wait before checking the status of the triggered task or plan.

Set to 120 seconds or longer for jobs that run for more than an hour.

Default: 10 seconds
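
A hedged Automation API sketch for a Talend task execution; the Job:Talend Data Management type string and the sample task name are assumptions:

{
  "TalendJob": {
    "Type": "Job:Talend Data Management",
    "ConnectionProfile": "TALEND_CP",
    "Task/Plan Execution": "Execute Task",
    "Task Name": "MyTask",
    "Parameters": "{\"parameter_param1\": \"value1\"}",
    "Log Level": "Information",
    "Bring logs to output": "checked",
    "Task Polling Intervals": "10"
  }
}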

IBM InfoSphere DataStage Job

The following table describes the IBM InfoSphere DataStage job type attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to IBM InfoSphere DataStage.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Variable Name: %%DataStage-ACCOUNT

For more information about creating a local connection profile for this job, see Creating a connection profile.

Project

Defines the Control-M for IBM InfoSphere DataStage project name.

Variable Name: %%DataStage-PROJECT

DataStage Job

Defines the Control-M for IBM InfoSphere DataStage job name.

Variable Name: %%DataStage-JOB_NAME

Job Invocation ID

Defines the Control-M for IBM InfoSphere DataStage job invocation ID.

Variable Name: %%DataStage-JOB_INVOCATION_ID

Parameters Type

Determines where Control-M retrieves the parameters from, as follows:

  • None: Does not retrieve parameters.

  • Server: Retrieves the parameters from the DataStage Job, and displays them in the Parameters list.

  • File: Retrieves the parameters from the file that you define in Parameters File.

  • Server and File: Retrieves the parameters from the file that you define in Parameters File and from the DataStage Job, and displays those from the job in the Parameters list.

Parameters

Displays the parameters and their values from the DataStage Job when Server or Server and File is selected from Parameters Type.

Parameters File

Defines the IBM InfoSphere DataStage parameter file.

Variable Name: %%DataStage-PARAMS_FILE

More Options

Opens more options.

Limits

Defines limits on the job.

Stop stages after <value> Rows

Defines the maximum number of rows that the job can process. Control-M stops the stages after this maximum is reached.

Variable Name: %%DataStage-MAX_ROWS

Abort job after <value> Warnings

Defines the maximum number of warnings that the job can generate. Control-M aborts the job after this maximum is reached.

Variable Name: %%DataStage-MAX_WARNINGS

Job Output

Determines which types of log entries are written to the job output, as follows:

  • Info: Writes Info log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_INFO

  • Warnings: Writes Warning log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_WARNING

  • Fatal: Writes Fatal log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_FATAL

  • Reject: Writes Reject log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_REJECT

  • Started: Writes Started log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_STARTED

  • Reset: Writes Reset log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_RESET

  • Batch: Writes Batch log entries to Control-M for IBM InfoSphere DataStage job Sysout.

    Variable Name: %%DataStage-OUTPUT_BATCH

Run Options

Defines run options for the job.

Run in restart mode

Runs the Control-M for IBM InfoSphere DataStage job in restart mode.

Variable Name: %%DataStage-RESTART_SEQUENCE

Reset job before run

Resets the Control-M for IBM InfoSphere DataStage job before the job runs.

Variable Name: %%DataStage-RESET_JOB
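
Because the DataStage attributes above are backed by %%DataStage-* variables, a job definition can also supply them as Control-M variables. A hedged Automation API sketch; the Job:DataStage type string is an assumption that depends on the deployed plug-in, and the variable values are placeholders:

{
  "DataStageJob": {
    "Type": "Job:DataStage",
    "ConnectionProfile": "DATASTAGE_CP",
    "Variables": [
      {"DataStage-PROJECT": "MyProject"},
      {"DataStage-JOB_NAME": "MyDataStageJob"},
      {"DataStage-MAX_WARNINGS": "50"}
    ]
  }
}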