Data Integration Jobs

The following topics describe job attributes that work with data integration platforms and services:

Alteryx Trifacta Job

Alteryx Trifacta is a data-wrangling platform that enables you to discover, organize, edit, and publish data in different formats to multiple cloud platforms and services, including AWS, Azure, Google Cloud, Snowflake, and Databricks.

To create an Alteryx Trifacta job, see Creating a Job. For more information about this plug-in, see Control-M for Trifacta.

The following table describes the Trifacta job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Alteryx Trifacta, as described in Alteryx Trifacta Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Flow Name

Determines which Trifacta flow the job executes.

Rerun with New Idempotency Token

Determines whether to allow re-execution of the job in Trifacta with a new idempotency token (for example, when the job execution times out).

Idempotent Token

Defines a unique ID (idempotency token), which guarantees that the job executes only once. After successful execution, this ID cannot be used again.

To re-execute the job with a new token, replace the default value with the RUN_ID, which can be retrieved from the job output.

Default: Control-M-Idem_%%ORDERID
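For example, if the job output shows a RUN_ID of 1234 (a hypothetical value), replace Control-M-Idem_%%ORDERID with 1234 before re-executing the job.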

Retrack Job Status

Determines whether to track the job execution status as the execution progresses and the status changes (for example, from in progress to failed or completed).

Run ID

Defines the RUN_ID number for the job execution to be tracked.

The RUN_ID is unique to each job execution and can be found in the job output.

Status Polling Frequency

Determines the number of seconds to wait before Control-M checks the status of the Trifacta job.

Default: 10

AWS Glue Job

AWS Glue is a serverless data integration service that enables you to define data-driven workflows that automate the movement and transformation of data.

To create an AWS Glue job, see Creating a Job. For more information about this plug-in, see Control-M for AWS Glue.

The following table describes AWS Glue job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to AWS Glue, as described in AWS Glue Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Glue Job Name

Defines the AWS Glue job name that you want to execute.

A job name is automatically saved when you create an AWS Glue pipeline.

Glue Job Arguments

Determines whether to add arguments to the AWS Glue job.

Arguments

Defines the AWS Glue job execution-time parameters, in JSON format, as shown in the following example:

{"--myArg1": "myVal1", "--myArg2": "myVal2"}

Status Polling Frequency

(Optional) Determines the number of seconds to wait between checks of the job status.

Default: 15

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 2
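As a rough illustration of how these attributes fit together, the following is a minimal sketch of an AWS Glue job definition in Control-M Automation API JSON. The job name (AwsGlueSample), connection profile (GLUE_CONNECT), and Glue job name (my-glue-job) are hypothetical, and the exact type string and attribute keys can vary by plug-in version:

{
  "AwsGlueSample": {
    "Type": "Job:AWS Glue",
    "ConnectionProfile": "GLUE_CONNECT",
    "Glue Job Name": "my-glue-job",
    "Glue Job Arguments": "checked",
    "Arguments": "{\"--myArg1\": \"myVal1\"}",
    "Status Polling Frequency": "15",
    "Failure Tolerance": "2"
  }
}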

AWS Glue DataBrew Job

AWS Glue DataBrew is an extract, transform, and load (ETL) service that enables you to visualize your data and publish it to the Amazon S3 Data Lake.

To create an AWS Glue DataBrew job, see Creating a Job. For more information about this plug-in, see Control-M for AWS Glue DataBrew.

The following table describes AWS Glue DataBrew job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to AWS Glue DataBrew, as described in AWS Glue DataBrew Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Job Name

Defines the AWS Glue DataBrew job name.

Output Job Logs

Determines whether the DataBrew job logs are included in the Control-M output.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the DataBrew job.

Default: 10

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 2

Azure Data Factory Job

Azure Data Factory is an extract, transform, and load (ETL) service that enables you to automate the movement and transformation of data.

To create an Azure Data Factory job, see Creating a Job. For more information about this plug-in, see Control-M for Azure Data Factory.

The following table describes the Azure Data Factory job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Azure Data Factory, as described in Azure Data Factory Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

  • Variable Name: %%AZURE-ACCOUNT

Resource Group Name

Determines the Azure Resource Group that is associated with a specific data factory. A resource group is a container that holds related resources for an Azure solution.

The resource group can include all the resources for the solution, or only those resources that you want to manage as a group.

Data Factory Name

Determines the name of the Azure Data Factory that contains the pipeline you want to execute.

Pipeline Name

Determines which data pipeline executes when you execute the Control-M job.

Parameters

Defines specific parameters, in JSON format, that are passed when the data pipeline executes, as shown in the following example:

 {"var1":"value1", "var2":"value2"}

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the Data Factory job.

Set to 120 seconds or longer for jobs that execute for more than an hour.

Default: 45

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 3

Boomi AtomSphere Job

Boomi AtomSphere enables you to develop, test, and run applications in the cloud.

To create a Boomi job, see Creating a Job. For more information about this plug-in, see Control-M for Boomi.

The following table describes the Boomi job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Boomi AtomSphere, as described in Boomi AtomSphere Connection Profile Parameters.

Atom Name

Defines the name of a Boomi Atom associated with the Boomi process.

Process Name

Defines the name of a Boomi process associated with the Boomi Atom.

Status Polling Frequency

Determines the number of seconds to wait between checks of the job status.

Default: 20 seconds

Tolerance

Determines the number of times to check the job status before ending Not OK. If the API call that checks the execution status fails due to the Boomi limit of a maximum of five calls per second, Control-M retries the call the number of times set in this field.

Default: 3 times

GCP Data Fusion Job

Google Cloud Platform (GCP) Data Fusion is an extract, transform, and load (ETL) service that enables you to load data from multiple sources, visualize it, and publish it to the cloud.

To create a GCP Data Fusion job, see Creating a Job. For more information about this plug-in, see Control-M for GCP Data Fusion.

The following table describes the GCP Data Fusion job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to GCP Data Fusion, as described in GCP Data Fusion Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Region

Determines the region where the GCP Data Fusion job executes.

For example: us-east1

Project Name

Defines the name of the predefined Google Cloud project that holds your configured APIs, authentication information, billing details, and job resources.

Instance Name

Defines the name of the predefined virtual machine (instance) that executes your job.

Namespace ID

Defines the name of the namespace, which contains the job, job data, and metadata.

Valid Characters: A–Z, a–z, 0–9, and _.

Default: default

Pipeline Name

Defines the name of a predefined ETL service or data integration pipeline in GCP Data Fusion.

Runtime Parameters

Defines the JSON-based body parameters that are passed to the pipeline, in the following format:

"argument" : {\"var1\":\"value1\",\"var2\":\"value2\"}

Get Logs

Determines whether to append the GCP Data Fusion logs to the job output.

Status Polling Frequency

Determines the number of seconds to wait before checking the job status.

Default: 10

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 3

GCP Dataplex Job

GCP Dataplex is an extract, transform, and load (ETL) service that enables you to visualize and manage data in GCP BigQuery and the cloud.

To create a GCP Dataplex job, see Creating a Job. For more information about this plug-in, see Control-M for GCP Dataplex.

The following table describes the GCP Dataplex job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to GCP Dataplex, as described in GCP Dataplex Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Project ID

Defines the ID of the predefined Google Cloud project that holds your configured APIs, authentication information, billing details, and job resources.

Location

Determines the region where the GCP Dataplex job executes.

For example: us-central1

Action

Determines one of the following GCP Dataplex actions to perform:

  • Data Quality Task: Executes a predefined data quality task in GCP BigQuery or Google Cloud Storage locations and defines data controls in BigQuery environments.

  • Custom Spark Task: Executes a predefined, scheduled Apache Spark task to analyze and process your data.

  • Data Profiling Scan: Executes a predefined data scan to identify shared statistical characteristics between BigQuery tables.

  • Data Quality Scan: Executes a predefined data quality scan that validates your data and logs alerts when the data fails validation.

Lake Name

(Data Quality Task and Custom Spark Task actions only) Defines the name of the Google Cloud Storage data lake where the job executes its task.

Task Name

(Data Quality Task and Custom Spark Task actions only) Defines the name of the predefined task that the job executes.

Scan Name

(Data Profiling Scan and Data Quality Scan actions only) Defines the name of the predefined scan that the job executes.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the job.

Default: 15

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 3

GCP Dataprep Job

GCP Dataprep enables you to visualize, format, and prepare your data for analysis.

To create a GCP Dataprep job, see Creating a Job. For more information about this plug-in, see Control-M for GCP Dataprep.

The following table describes the GCP Dataprep job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to GCP Dataprep, as described in GCP Dataprep Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Flow Name

Defines the name of the flow, which is the workspace where you format and prepare your data.

Parameters

Defines parameters that override the flow or its datasets when the job executes, as shown in the following example:

{
   "schemaDriftOptions": {
      "schemaValidation": "true",
      "stopJobOnErrorsFound": "true" 
   }
}

For more information about parameter types, see the properties of the runFlow service in the GCP Dataprep API documentation.

Execute Job with Idempotency Token

Determines whether to execute the job with an idempotency token.

Idempotency Token

Defines a unique ID (idempotency token), which guarantees that the job executes only once.

Default: Control-M-Idem-%%ORDERID

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the job.

Default: 10

Failure Tolerance

Determines the number of times to check the job status before ending Not OK.

Default: 2

Informatica Job

Informatica enables you to automate tasks or workflows based on the parameters that you define.

To create an Informatica job, see Creating a Job. For more information about this plug-in, see Control-M for Informatica.

The following table describes the Informatica job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Informatica, as described in Informatica Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Variable Name: %%INF-ACCOUNT

Repository Folder

Defines the repository folder that contains the workflow that you want to execute.

Variable Name: %%INF-REP_FOLDER

Workflow

Defines the workflow that you want to execute.

Variable Name: %%INF-WORKFLOW

Instance Name

Defines the specific instance of the workflow that you want to execute.

Variable Name: %%INF-INSTANCE_NAME

OS Profile

Enables you to specify an OS profile when executing or re-executing an Informatica job.

Run Options

Defines the workflow task hierarchy.

Depth

Determines the number of levels within the workflow task hierarchy that are used to select workflow tasks.

Default: 10

Variable Name: %%INF-DEPTH

Run

Determines whether to execute the whole workflow, start from a specific task, or execute a single task as follows:

  • Run the Whole Workflow: Executes the whole workflow.

  • Start from Task: Starts the workflow from the task that you specify.

    Variable Name: %%INF-START_FROM_TASK

  • Run Single Task: Executes the task that you specify.

    Variable Name: %%INF-RUN_SINGLE_TASK

Parameters

Defines an array of parameters that is passed to the workflow, as shown in the sketch after this list.

Each parameter comprises the following:

  • Scope: Defines the scope of the parameter in an array definition.

  • Name: Defines the parameter name in an array definition.

  • Value: Defines the parameter value in an array definition.
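The following is a minimal, hypothetical JSON sketch of such an array; the scope, name, and value are placeholders, and the exact field names depend on how you define the job:

[
  {"Scope": "Workflow", "Name": "$$param1", "Value": "value1"}
]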

Include Workflow Events Log in Job Output

Determines whether to include the workflow event log in the job output.

Include Detailed Error Log for Failed Sessions

Determines whether to include a detailed error log for a workflow that failed.

Get Session Statistics and Log

Determines whether to retrieve session statistics and log messages.

Action on Rerun

Determines which operation is executed when the workflow is suspended, as follows:

  • Recover: Restarts a suspended workflow from the point of failure.

  • Force Restart: Restarts a suspended workflow from the beginning.

  • Force Restart from a Specific Task: Restarts the suspended workflow from the task that you define.

Variable Name: %%INF-RESTART_FROM_TASK

Workflow Parameters File

Defines the path and name of the workflow parameters file.

Variable Name: %%INF-WORKFLOW_PARAMETERS_FILE

Informatica CS Job

Informatica Cloud Services (CS) enable you to integrate and synchronize data, applications, and processes that are on-premises or in the cloud.

To create an Informatica CS job, see Creating a Job. For more information about this plug-in, see Control-M for Informatica CS.

The following table describes the Informatica Cloud Services job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Informatica Cloud Services, as described in Informatica CS Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Variable Name: %%INF-ACCOUNT

Task Type

Determines one of the following task types to execute on Informatica Cloud:

  • Mapping Task: A set of instructions that defines how data is transformed and moved from its source to its target system.

  • Masking Task: A data security technique that enables you to protect sensitive data while allowing it to be used for non-production purposes.

  • PowerCenter Task: A data integration tool that enables you to extract, transform, and load data from different sources into a unified target system.

  • Replication Task: A data replication solution that enables you to replicate and synchronize data across different systems and databases in real time.

  • Synchronization Task: A data integration solution that enables you to synchronize data between different systems and databases, ensuring that data is consistent and up-to-date across all systems.

  • Linear Taskflow: A workflow automation feature that enables you to create and automate a sequence of tasks that are executed in a specific order, which helps streamline data integration and processing tasks.

  • Taskflow: A workflow automation feature that enables you to create complex workflows that orchestrate and automate data integration and processing tasks across multiple systems and platforms.

Use Federation ID

Determines whether to identify the task using a Federated Task ID, a unique identifier that is used to track and manage tasks across distributed environments. This ID is generated by the Informatica domain and is important for monitoring and troubleshooting tasks.

This attribute is not required when you execute a taskflow.

Task Name

Defines the name of the task that executes on Informatica Cloud.

This attribute is not required when you execute a taskflow or use a Federated Task ID.

Folder Path

Defines the folder path of the task that executes on Informatica Cloud.

This attribute is required if you are using a Federated Task ID.

TaskFlow URL

Defines the service URL of the taskflow that executes on Informatica Cloud.

You can find this URL by clicking the menu in the top right corner of the TaskFlow main page of Informatica Data Integrator and then clicking Properties Detail.

Rerun Suspended Taskflow

Determines whether to re-execute a suspended taskflow.

Input Fields

Defines input fields for a taskflow, in the following format:

input1=value1&input2=value2&input3=value3

Call Back URL

(Optional) Defines a publicly available URL where the job status is posted.

Rerun Run ID

Defines the Run ID to re-execute a suspended taskflow.

The Run ID is unique to each job execution and is available in the job output, next to the variable name RUN-UCM-RUNID.

Status Polling Frequency

Determines the number of seconds to wait before checking the status of the Informatica Cloud Services job.

Talend Data Management Job

Talend Data Management is an automation service that enables you to integrate applications and to extract, transform, load, and check the quality of large amounts of data.

To create a Talend Data Management job, see Creating a Job. For more information about this plug-in, see Control-M for Talend Data Management.

The following table describes Talend Data Management job attributes.

Attribute

Description

Connection Profile

Determines the authorization credentials that are used to connect Control-M to Talend Data Management, as described in Talend Data Management Connection Profile Parameters.

Rules:

  • Characters: 1−30

  • Case Sensitive: Yes

  • Invalid Characters: Blank spaces.

Task/Plan Execution

Determines one of the following operations to perform:

  • Execute Task: Executes a Talend job.

  • Execute Plan: Executes a Talend workflow.

Task Name

(Execute Task) Defines the name of the Talend task that is executed, as defined in the Tasks and Plans page in the Talend Management Console.

Parameters

(Execute Task) Defines specific parameters, in JSON format, to pass when the Talend job executes.

All parameter names must contain the parameter_ prefix, as shown in the following example:

{"parameter_param1":"value1", "parameter_param2":"value2"}

For no parameters, type {}.
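The prefix identifies a value as a task parameter; for example, parameter_param1 is expected to map to a task parameter named param1 (a hypothetical name).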

Log Level

(Execute Task) Determines how much detail is recorded in the Talend task logs, as follows:

  • Information: Records all task execution details in the logs.

  • Warning: Only records warnings.

  • Error: Only records errors.

  • Off: Records no information in the logs.

Bring Logs to Output

(Execute Task) Determines whether to append Talend log messages to the job output.

Default: Unchecked

Task Polling Intervals

(Execute Task) Determines the number of seconds to wait before checking the status of the triggered task.

Default: 10

Plan Name

(Execute Plan) Defines the name of the Talend plan that is executed, which is defined in the Tasks and Plans page in the Talend Management Console.

Plan Polling Intervals

(Execute Plan) Determines the number of seconds to wait before checking the status of the triggered plan.

Default: 10