Data Integration Jobs
The following topics describe job attributes that work with data integration platforms and services:
Airbyte Job
Airbyte is an open-source extract, transform, and load (ETL) service that enables you to build data pipelines and load data to a data warehouse, data lake, database, or analytics tool of your choice.
To create an Airbyte job, see Creating a Job. For more information about this plug-in, see
The following table describes the Airbyte job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Airbyte, as described in Airbyte Connection Profile Parameters. Rules:
|
Connection ID |
Defines the Airbyte Connection ID, which identifies which data pipeline to execute. In Airbyte, a data pipeline copies data from a source to a destination. |
Job Type |
Determines one of the following Airbyte actions to perform:
|
Show Results |
Determines whether to append the REST API response to the output. |
Status Polling Frequency |
(Optional) Determines the number of seconds to wait between job status checks. Default: 60 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 2 |
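The interaction between Status Polling Frequency and Failure Tolerance can be pictured as a polling loop. The following Python sketch is an illustration only, not the plug-in's implementation. It assumes that Failure Tolerance counts consecutive status checks that could not be completed, and `check_status` is a hypothetical callable that stands in for the status request.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check_status: Callable[[], Optional[str]],
    polling_frequency: float = 60,
    failure_tolerance: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it finishes or too many status checks fail in a row.

    check_status is hypothetical: it returns "running", "succeeded", or
    "failed", or None when the status request itself could not be completed.
    """
    failed_checks = 0
    while True:
        status = check_status()
        if status is None:
            # Assumption: the tolerance counts consecutive failed checks.
            failed_checks += 1
            if failed_checks >= failure_tolerance:
                return "Not OK"
        else:
            failed_checks = 0
            if status == "succeeded":
                return "OK"
            if status == "failed":
                return "Not OK"
        # Wait for the polling interval before the next status check.
        sleep(polling_frequency)
```

Passing a no-op `sleep` function makes the loop easy to exercise with scripted responses instead of a live service.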
Alteryx Trifacta Job
Alteryx Trifacta is a data-wrangling platform that allows you to discover, organize, edit, and publish data in different formats and to multiple clouds, including AWS, Azure, Google, Snowflake, and Databricks.
To create an Alteryx Trifacta job, see Creating a Job. For more information about this plug-in, see Control-M for Alteryx Trifacta.
The following table describes the Alteryx Trifacta job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Alteryx Trifacta, as described in Alteryx Trifacta Connection Profile Parameters. Rules:
|
Flow Name |
Determines which Alteryx Trifacta flow the job executes. |
Rerun with New Idempotency Token |
Determines whether to allow re-execution of the job in Alteryx Trifacta with a new idempotency token (for example, when the job execution times out). |
Idempotent Token |
Defines a unique ID (idempotency token), which guarantees that the job executes only once. After successful execution, this ID cannot be used again. To re-execute the job with a new token, replace the default value with the RUN_ID, which can be retrieved from the job output. Default: Control-M-Idem_%%ORDERID |
Retrack Job Status |
Determines whether to track job execution status as the job execution progresses and the status changes (for example, from in-progress to failed or to completed). |
Run ID |
Defines the RUN_ID number for the job execution to be tracked. The RUN_ID is unique to each job execution and can be found in the job output. |
Status Polling Frequency |
Determines the number of seconds to wait before Control-M checks the status of the Alteryx Trifacta job. Default: 10 |
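The idempotency token behavior described above can be sketched in Python. This is a toy model under stated assumptions, not the Alteryx Trifacta API: `OnceOnlyRunner` is a hypothetical stand-in for a service that rejects a token it has already accepted, and the order ID values are invented for illustration.

```python
def expand_token(template: str, order_id: str) -> str:
    """Substitute the %%ORDERID variable in a token template."""
    return template.replace("%%ORDERID", order_id)


class OnceOnlyRunner:
    """Toy stand-in for a service that honors idempotency tokens."""

    def __init__(self):
        self._used_tokens = set()

    def run(self, token: str) -> str:
        # A token that was already accepted cannot trigger a second run.
        if token in self._used_tokens:
            return "rejected: token already used"
        self._used_tokens.add(token)
        return "executed"


runner = OnceOnlyRunner()
token = expand_token("Control-M-Idem_%%ORDERID", "0a1b2")  # hypothetical order ID
print(runner.run(token))  # first use of the token
print(runner.run(token))  # same token again
```

Because the default token embeds the order ID, each new order of the job yields a different token, while a rerun of the same order reuses the old one unless you replace it.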
AWS Glue Job
AWS Glue is a serverless data integration service that enables you to define data-driven workflows that automate the movement and transformation of data.
To create an AWS Glue job, see Creating a Job. For more information about this plug-in, see
The following table describes AWS Glue job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to AWS Glue, as described in AWS Glue Connection Profile Parameters. Rules:
|
Glue Job Name |
Defines the AWS Glue job name that you want to execute. A job name is automatically saved when you create an AWS Glue pipeline. |
Glue Job Arguments |
Determines whether to add arguments to the AWS Glue job. |
Arguments |
Defines the AWS Glue job execution-time parameters, in JSON format, as shown in the following example:
|
Status Polling Frequency |
(Optional) Determines the number of seconds to wait between job status checks. Default: 15 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 2 |
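A value for the Arguments attribute can be generated rather than typed by hand. The following Python sketch builds a JSON object from a dictionary. AWS Glue job parameter names are conventionally written with a leading `--`; whether the plug-in requires that prefix is an assumption here, and the argument names and values are hypothetical.

```python
import json


def build_glue_arguments(args: dict) -> str:
    """Serialize job arguments to a JSON object string.

    Adds a leading "--" to any parameter name that does not already have it.
    """
    normalized = {
        (name if name.startswith("--") else f"--{name}"): str(value)
        for name, value in args.items()
    }
    return json.dumps(normalized)


# Hypothetical argument names and values, for illustration only.
arguments = build_glue_arguments(
    {"source_path": "s3://example-bucket/in", "--target_table": "sales"}
)
print(arguments)
```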
AWS Glue DataBrew Job
AWS Glue DataBrew is an extract, transform, and load (ETL) service that enables you to visualize your data and publish it to the Amazon S3 Data Lake.
To create an AWS Glue DataBrew job, see Creating a Job. For more information about this plug-in, see
The following table describes AWS Glue DataBrew job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to AWS Glue DataBrew, as described in AWS Glue DataBrew Connection Profile Parameters. Rules:
|
Job Name |
Defines the AWS Glue DataBrew job name. |
Output Job Logs |
Determines whether the DataBrew job logs are included in the Control-M output. |
Status Polling Frequency |
Determines the number of seconds to wait before checking the status of the DataBrew job. Default: 10 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 2 |
Azure Data Factory Job
Azure Data Factory is an extract, transform, and load (ETL) service that enables you to automate the movement and transformation of data.
To create an Azure Data Factory job, see Creating a Job. For more information about this plug-in, see
The following table describes the Azure Data Factory job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Azure Data Factory, as described in Azure Data Factory Connection Profile Parameters. Rules:
|
Resource Group Name |
Determines the Azure Resource Group that is associated with a specific data factory. A resource group is a container that holds related resources for an Azure solution. The resource group can include all the resources for the solution, or only those resources that you want to manage as a group. |
Data Factory Name |
Determines the name of the Azure Data Factory that contains the pipeline you want to execute. |
Pipeline Name |
Determines which data pipeline executes when you execute the Control-M job. |
Parameters |
Defines specific parameters, in JSON format, that are passed when the data pipeline executes, in the following format:
|
Status Polling Frequency |
Determines the number of seconds to wait before checking the status of the Data Factory job. Set to 120 seconds or longer for jobs that execute for more than an hour. Default: 45 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 3 |
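The recommendation to raise Status Polling Frequency for long-running pipelines follows from simple arithmetic. The following Python sketch gives a rough estimate of how many status checks are made during one run. It ignores request latency and is not a description of how the plug-in schedules its checks.

```python
import math


def estimated_status_checks(runtime_seconds: int, polling_frequency: int) -> int:
    """Approximate number of status checks made while a job runs."""
    return math.ceil(runtime_seconds / polling_frequency)


one_hour = 60 * 60
print(estimated_status_checks(one_hour, 45))   # 80 checks at the default
print(estimated_status_checks(one_hour, 120))  # 30 checks at 120 seconds
```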
Boomi AtomSphere Job
Boomi AtomSphere enables you to develop, test, and run applications in the cloud.
To create a Boomi job, see Creating a Job. For more information about this plug-in, see
The following table describes the Boomi job attributes.
Parameter |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Boomi AtomSphere, as described in Boomi AtomSphere Connection Profile Parameters. |
Atom Name |
Defines the name of a Boomi Atom associated with the Boomi process. |
Process Name |
Defines the name of a Boomi process associated with the Boomi Atom. |
Status Polling Frequency |
Determines the number of seconds to wait between job status checks. Default: 20 seconds |
Tolerance |
Determines the number of times to check the job status before ending Not OK. If the API call that checks the execution status fails because of the Boomi limit of 5 calls per second, the call is retried according to the number in the Tolerance field. Default: 3 times |
GCP Data Fusion Job
Google Cloud Platform (GCP) Data Fusion is an extract, transform, and load (ETL) service that enables you to load data from multiple sources, visualize it, and publish it to the cloud.
To create a GCP Data Fusion job, see Creating a Job. For more information about this plug-in, see
The following table describes the GCP Data Fusion job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to GCP Data Fusion, as described in GCP Data Fusion Connection Profile Parameters. Rules:
|
Region |
Determines the region where the GCP Data Fusion job executes. Example: us-east1 |
Project Name |
Defines the name of the predefined Google Cloud project that holds your configured APIs, authentication information, billing details, and job resources. |
Instance Name |
Defines the name of the predefined virtual machine (instance) that executes your job. |
Namespace ID |
Defines the name of the namespace, which contains the job, job data, and metadata. Valid Characters: A–Z, a–z, 0–9, and _. Default: default |
Pipeline Name |
Defines the name of a predefined ETL service or data integration pipeline in GCP Data Fusion. |
Runtime Parameters |
Defines the JSON-based body parameters that are passed to the function, in the following format:
|
Get Logs |
Determines whether to append the GCP Data Fusion logs to the output. |
Status Polling Frequency |
Determines the number of seconds to wait before checking the job status. Default: 10 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 3 |
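The character rule for Namespace ID can be checked before a job is defined. The following Python sketch encodes the valid characters listed above (A–Z, a–z, 0–9, and _) as a regular expression. The function name is hypothetical, and treating an empty value as invalid is an assumption.

```python
import re

# Valid characters per the Namespace ID attribute: A-Z, a-z, 0-9, and _.
NAMESPACE_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_namespace_id(namespace_id: str) -> bool:
    """Return True if every character is allowed and the value is not empty."""
    return NAMESPACE_PATTERN.fullmatch(namespace_id) is not None


print(is_valid_namespace_id("default"))       # True
print(is_valid_namespace_id("my-namespace"))  # False, "-" is not allowed
```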
GCP Dataplex Job
GCP Dataplex is an extract, transform, and load (ETL) service that enables you to visualize and manage data in GCP BigQuery and the cloud.
To create a GCP Dataplex job, see Creating a Job. For more information about this plug-in, see Control-M for GCP Dataplex.
The following table describes the GCP Dataplex job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to GCP Dataplex, as described in GCP Dataplex Connection Profile Parameters. Rules:
|
Project ID |
Defines the ID of the predefined Google Cloud project that holds your configured APIs, authentication information, billing details, and job resources. |
Location |
Determines the region where the GCP Dataplex job executes. Example: us-central1 |
Action |
Determines one of the following GCP Dataplex actions to perform:
|
Lake Name |
(Data Quality Task and Custom Spark Task actions only) Defines the name of the Google Cloud Storage data lake where the job executes its task. |
Task Name |
(Data Quality Task and Custom Spark Task actions only) Defines the name of the predefined task that the job executes. |
Scan Name |
(Data Profiling Scan and Data Quality actions only) Defines the name of the predefined scan that the job executes. |
Status Polling Frequency |
Determines the number of seconds to wait before checking the status of the job. Default: 15 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 3 |
GCP Dataprep Job
GCP Dataprep enables you to visualize, format, and prepare your data for analysis.
To create a GCP Dataprep job, see Creating a Job. For more information about this plug-in, see
The following table describes the GCP Dataprep job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to GCP Dataprep, as described in GCP Dataprep Connection Profile Parameters. Rules:
|
Flow Name |
Defines the name of the flow, which is the workspace where you format and prepare your data. |
Parameters |
Defines parameters that override the flow or its datasets when the job executes, as shown in the following example:
For more information on parameter types, see the properties of runFlow service in the GCP Dataprep API documentation. |
Execute Job with Idempotency Token |
Determines whether to execute the job with an idempotency token. |
Idempotency Token |
Defines a unique ID (idempotency token), which guarantees that the job executes only once. Default: Control-M-Idem-%%ORDERID |
Status Polling Frequency |
Determines the number of seconds to wait before checking the status of the job. Default: 10 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 2 |
IBM InfoSphere DataStage Job
To create an IBM InfoSphere DataStage job, see Creating a Job. For more information about this plug-in, see Control-M for IBM InfoSphere DataStage.
The following table describes the IBM InfoSphere DataStage job type attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to IBM InfoSphere DataStage, as described in IBM InfoSphere DataStage Connection Profile Parameters. Rules:
Variable Name: %%DataStage-ACCOUNT For more information about creating a local connection profile for this job, see Creating a connection profile. |
Project |
Defines the Control-M for IBM InfoSphere DataStage project name. Variable Name: %%DataStage-PROJECT |
DataStage Job |
Defines the Control-M for IBM InfoSphere DataStage job name. Variable Name: %%DataStage-JOB_NAME |
Job Invocation ID |
Defines the Control-M for IBM InfoSphere DataStage job invocation ID. Variable Name: %%DataStage-JOB_INVOCATION_ID |
Parameters Type |
Determines from where Control-M retrieves the parameters, with options as follows:
|
Parameters |
Displays the parameters and their values from the DataStage Job when Server or Server and File is selected from Parameters Type. |
Parameters File |
Defines the IBM InfoSphere DataStage parameter file. Variable Name: %%DataStage-PARAMS_FILE |
More Options |
Opens more options. |
Limits |
Defines limits on the job. |
Stop Stages after <value> Rows |
Defines the maximum number of rows that the job can contain. Control-M stops the stages after the maximum is reached. Variable Name: %%DataStage-MAX_ROWS |
Abort job after <value> Warnings |
Defines the maximum number of warnings about the job. Control-M aborts the job after the maximum is reached. Variable Name: %%DataStage-MAX_WARNINGS |
Job Output |
Determines the type of information that goes in the output as follows:
|
Run Options |
Defines execution options for the job. |
Run in Restart Mode |
Executes the Control-M for IBM InfoSphere DataStage job in restart mode. Variable Name: %%DataStage-RESTART_SEQUENCE |
Reset Job before Run |
Resets the Control-M for IBM InfoSphere DataStage job before the job executes. Variable Name: %%DataStage-RESET_JOB |
Informatica Job
Informatica enables you to automate tasks or workflows based on the parameters that you define.
To create an Informatica job, see Creating a Job. For more information about this plug-in, see Control-M for Informatica.
The following table describes the Informatica job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Informatica, as described in Informatica Connection Profile Parameters. Rules:
Variable Name: %%INF-ACCOUNT |
Repository Folder |
Defines the repository folder that contains the workflow that you want to execute. Variable Name: %%INF-REP_FOLDER |
Workflow |
Defines the workflow that you want to execute. Variable Name: %%INF-WORKFLOW |
Instance Name |
Defines the specific instance of the workflow that you want to execute. Variable Name: %%INF-INSTANCE_NAME |
OS Profile |
Enables you to specify an OS profile when executing or re-executing an Informatica job. |
Run Options |
Defines the workflow task hierarchy. |
Depth |
Determines the number of levels within the workflow task hierarchy that are used to select workflow tasks. Default: 10 Variable Name: %%INF-DEPTH |
Run |
Determines whether to execute the whole workflow, start from a specific task, or execute a single task as follows:
|
Parameters |
Determines an array of parameters that is passed to the workflow. Each parameter consists of the following:
|
Include Workflow Events Log in Job Output |
Determines whether to include the workflow event log in the job output. |
Include Detailed Error Log for Failed Sessions |
Determines whether to include a detailed error log for a workflow that failed. |
Get Session Statistics and Log |
Determines whether to retrieve session statistics and log messages. |
Action on Rerun |
Determines which operation is executed when the workflow is suspended, as follows:
Variable Name: %%INF-RESTART_FROM_TASK |
Workflow Parameters File |
Defines the path and name of the workflow parameters file. Variable Name: %%INF-WORKFLOW_PARAMETERS_FILE |
Informatica CS Job
Informatica Cloud Services (CS) enable you to integrate and synchronize data, applications, and processes that are on-premises or in the cloud.
To create an Informatica CS job, see Creating a Job. For more information about this plug-in, see
The following table describes the Informatica Cloud Services job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Informatica Cloud Services, as described in Informatica CS Connection Profile Parameters. Rules:
Variable Name: %%INF-ACCOUNT |
Task Type |
Determines one of the following task types to execute on Informatica Cloud:
|
Use Federation ID |
Determines whether to identify the task using a Federated Task ID, which is a unique identifier that is used to track and manage tasks across a federated environment. This ID is generated by the Informatica domain and is important for monitoring and troubleshooting tasks. This attribute is not required when you execute a taskflow. |
Task Name |
Defines the name of the task that executes on Informatica Cloud. This attribute is not required when you execute a taskflow or use a Federated Task ID. |
Folder Path |
Defines the folder path of the task that executes on Informatica Cloud. This attribute is required if you are using a Federated Task ID. |
TaskFlow URL |
Defines the service URL of the taskflow that executes on Informatica Cloud. You can find this URL by clicking in the top-right corner of the TaskFlow main page of Informatica Data Integrator and then clicking Properties Detail.... |
Rerun Suspended Taskflow |
Determines whether to re-execute a suspended taskflow. |
Input Fields |
Defines input fields for a taskflow, in the following format: input1=value1&input2=value2&input3=value3 |
Call Back URL |
(Optional) Defines a publicly available URL where the job status is posted. |
Rerun Run ID |
Defines the Run ID to re-execute a suspended taskflow. The Run ID is unique to each job execution and is available in the job output, next to the variable name RUN-UCM-RUNID. |
Status Polling Frequency |
Determines the number of seconds to wait before checking the status of the Informatica Cloud Services job. |
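The Input Fields format (`input1=value1&input2=value2&input3=value3`) matches what `urllib.parse.urlencode` in the Python standard library produces for simple values. The following sketch builds the string from a dictionary. Note that `urlencode` also percent-encodes special characters; whether the plug-in expects encoded values is an assumption that you should verify.

```python
from urllib.parse import urlencode

# Field names and values taken from the format shown for Input Fields.
fields = {"input1": "value1", "input2": "value2", "input3": "value3"}

input_fields = urlencode(fields)
print(input_fields)  # input1=value1&input2=value2&input3=value3
```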
OCI Data Integration Job
OCI Data Integration is an Oracle Cloud Infrastructure (OCI) platform that enables data extraction, transformation, and loading (ETL) processes across various sources and targets within the Oracle Cloud.
To create an OCI Data Integration job, see Creating a Job. For more information about this plug-in, see Control-M for OCI Data Integration.
The following table describes OCI Data Integration job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to OCI Data Integration Services, as described in OCI Data Integration Connection Profile Parameters. Rules:
Variable Name: %%INF-ACCOUNT |
Actions |
Determines one of the following OCI Data Integration actions:
|
Workspace OCID |
Determines the ID of the OCI Data Integration workspace, which is a logical container for managing pipelines, data flows, and projects. |
Application Key |
Defines the Application Key, which identifies the project application where the task runs. |
Task Key |
Determines the Task Key that is used to run the job. |
Task Run Name |
Defines the unique name for the specific task run. |
Task Run Input Parameters |
(Optional) Defines the input parameters for the specific task run, in JSON format. |
Status Polling Frequency |
Determines the number of seconds to wait before checking the status of the OCI Data Integration job. Default: 15 |
Failure Tolerance |
Determines the number of times to check the job status before ending Not OK. Default: 2 |
Talend Data Management Job
Talend Data Management is an automation service that enables you to integrate applications, and extract, transform, load, and check the quality of large amounts of data.
To create a Talend Data Management job, see Creating a Job. For more information about this plug-in, see
The following table describes Talend Data Management job attributes.
Attribute |
Description |
---|---|
Connection Profile |
Determines the authorization credentials that are used to connect Control-M to Talend Data Management, as described in Talend Data Management Connection Profile Parameters. Rules:
|
Task/Plan Execution |
Determines one of the following operations to perform:
|
Task Name |
(Execute Task) Defines the name of the Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console. If you define this attribute, you do not need to define the Task ID, since both attributes refer to the same task. |
Task ID |
(Execute Task) Defines the ID of the Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console. If you define this attribute, you do not need to define the Task Name, since both attributes refer to the same task. |
Parameters |
(Execute Task) Defines specific parameters, in JSON format, to pass when the Talend job executes. All parameter names must contain the parameter_ prefix, as shown in the following example:
For no parameters, type {}. |
Log Level |
(Execute Task) Determines the amount of log detail that is recorded in the Talend task logs, as follows:
|
Bring Logs to Output |
(Execute Task) Determines whether to append Talend log messages to the job output. Default: Unchecked |
Task Polling Intervals |
(Execute Task) Determines the number of seconds to wait before checking the status of the triggered task. Default: 10 |
Plan Name |
(Execute Plan) Defines the name of the Talend plan that is executed, which is defined in the Tasks and Plans page in the Talend Management Console. |
Plan Body Parameters |
(Execute Plan) Defines the specific parameters, in JSON format, that are passed to Talend when the job executes, as shown in the following example:
where executable is the Plan ID. The Plan Name attribute is ignored when you define this attribute. |
Append Failed Plan Logs to Output |
(Execute Plan) Determines whether Talend logs are appended to the output when the plan fails to execute. Default: Unchecked |
Plan Polling Intervals |
(Execute Plan) Determines the number of seconds to wait before checking the status of the triggered plan. Default: 10 |
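The `parameter_` prefix rule for the Execute Task Parameters attribute can be applied programmatically. The following Python sketch adds the prefix where it is missing and serializes the result as JSON. The parameter names and values are hypothetical, and the helper is not part of Control-M or Talend.

```python
import json


def build_talend_parameters(values: dict) -> str:
    """Return a JSON object string in which every name has the parameter_ prefix."""
    prefixed = {
        (name if name.startswith("parameter_") else f"parameter_{name}"): value
        for name, value in values.items()
    }
    return json.dumps(prefixed)


# Hypothetical parameter names and values, for illustration only.
print(build_talend_parameters({"region": "emea", "parameter_year": "2024"}))
print(build_talend_parameters({}))  # {} when there are no parameters
```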
Talend OAuth Job
Talend OAuth (Open Authorization) enables you to use OAuth authentication within the Talend suite of data integration and management tools. It allows third-party applications to access resources on behalf of a user without sharing sensitive credentials.
To create a Talend OAuth job, see Creating a Job. For more information about this plug-in, see Control-M for Talend OAuth.
The following table describes Talend OAuth job attributes.
Attribute |
Action |
Description |
---|---|---|
Connection Profile |
All actions |
Determines the authorization credentials that are used to connect Control-M to Talend OAuth, as described in Talend OAuth Connection Profile Parameters. Rules:
|
Action |
NA |
Determines one of the following actions to perform:
|
Environment ID |
Execute Task by Name |
Determines the Environment ID where the task is executed. |
Task Name |
Execute Task by Name |
Determines the name of the predefined Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console. |
Task Executable |
Execute Task by ID |
Determines the ID of the predefined Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console. |
Parameters |
|
Defines specific parameters, in JSON format, to pass when the Talend job executes.
For no parameters, type {}. |
Log Level |
|
Determines the amount of log details that are recorded in the Talend task logs, as follows:
|
Task Timeout |
|
(Optional) Determines the number of minutes to wait before the task is executed. |
Append Task Logs to Output |
|
Determines whether to append the Talend task logs to the job output when the plan fails during the execution. |
Plan Executable |
Execute Plan |
Defines the Plan executable that you want to execute, as defined in the Tasks and Plans page in the Talend Management Console. |
Append Failed Plan Logs to Output |
Execute Plan |
Determines whether to append the failed Talend Plan logs to the job output when the plan fails during execution. |
Rerun Only Failed Tasks |
Execute Plan |
(Optional) Determines whether to rerun only the failed tasks. |
Execution Plan ID |
Execute Plan |
(Optional) Defines the ID for the re-run plan. |
Step ID |
Execute Plan |
(Optional) Defines the step that the task re-runs. |
Status Polling Frequency |
All Actions |
Determines the number of seconds to wait before Control-M checks the status of the job or the job's output. Default: 10 |
Failure Tolerance |
All Actions |
Determines the number of times to check the job status before the job ends Not OK. Default: 2 |