Data Processing and Analytics Jobs

The following topics describe job attributes that work with data processing platforms and services:

Amazon Athena Job
AWS Data Pipeline Job
Amazon DynamoDB Job
Amazon EMR Job
Amazon Redshift Job
Azure Databricks Job
Azure HDInsight Job
Azure Synapse Job
Databricks Job
dbt Job
GCP BigQuery Job
GCP Dataflow Job
GCP Dataproc Job
Hadoop Job
OCI Data Flow Job
Snowflake Job

Amazon Athena Job

Amazon Athena enables you to process, analyze, and store your data in the cloud.

To create an Amazon Athena job, see Creating a Job. For more information about this plug-in, see Control-M for Amazon Athena.

The following table describes the Amazon Athena job attributes.

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Amazon Athena, as described in Amazon Athena Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Athena Client Request Token	Defines a unique ID (idempotency token), which guarantees that the job executes only once. Default: aws-athena-client-request-token-%%ORDERID-%%TIME
DB Catalog Name	Defines the name of the group of databases (catalog) that the query references.
Database Name	Defines the name of the database that the query references.
Action	Determines which of the following queries executes: Query: Executes the query that you enter in the Query attribute. Run Prepared Query: Executes a predefined query that is stored in the Amazon Athena platform. Query and Create Table: Executes the query that you enter in the Query attribute and saves the results to a new table. Unload: Executes the query that you enter in the Query attribute and saves the results to a file in an Amazon S3 bucket.
Query	Defines the SQL-based query that executes.
Prepared Query Name	Defines the name of the predefined query that is stored in the Amazon Athena platform.
Table Name	Defines the name of the table that is created, which is populated by the results of a query in Amazon Athena.
Unload File Type	Determines which file format the query results are saved in, as follows: .json .csv .orc .parquet .avro .txt
Output Location	Defines the AWS S3 bucket path where the file is saved, in the following format: s3://<path> Amazon Athena automatically generates a filename that incorporates the Query Execution ID, which is a unique ID applied to each query that is executed.
Workgroup	Defines the workgroup for this job. Workgroups can consist of users, teams, applications, or workloads, and they can set limits on the data that each query or group processes.
Add Configurations	Determines whether to add additional job definitions. Valid Values: Yes No
S3 ACL Option	Defines the Amazon S3 canned access control list (ACL), which is a predefined set of grantees and permissions assigned to your stored query results. BUCKET_OWNER_FULL_CONTROL is the only canned ACL that is currently supported in Amazon Athena. This setting gives you and the bucket owner full control of the query results.
Encryption Options	Determines one of the following ways to encrypt the query results: SSE_S3: Encrypts the data in Amazon S3 with Server-Side Encryption (SSE) and Amazon S3-managed encryption keys. SSE_KMS: Encrypts the data in Amazon S3 with SSE and the AWS Key Management Service (KMS), which enables you to manage the encryption keys. CSE_KMS: Encrypts the data with Client-Side Encryption (CSE) before it is stored in Amazon S3, using keys that you manage in AWS KMS.
KMS Key	(SSE_KMS and CSE_KMS only) Defines the Amazon Resource Name (ARN) of the KMS key. An ARN is a standardized AWS resource address. Example: arn:aws:kms:us-west-2:123456789012:key/abcd1234-5678-9012-efgh-ijklmnopqrst
Bucket Owner	Defines the AWS account ID of the Amazon S3 bucket owner.
Show JSON Output	Determines whether to show the full JSON API response in the job output.
Status Polling Frequency	Determines the number of seconds to wait before checking the job status. Default: 10
Tolerance	Determines the number of times to check the job status before the job ends Not OK. Default: 2
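The way Status Polling Frequency and Tolerance interact can be pictured as a polling loop. The following is a minimal sketch and not the plug-in's implementation: the function name, the `check_status` callback, and the reading of Tolerance as the number of failed status checks allowed before the job ends Not OK are assumptions made for illustration. The state names are the query states that Amazon Athena reports.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}  # Athena query states


def wait_for_query(check_status, polling_frequency=10, tolerance=2, sleep=time.sleep):
    """Poll a query until it finishes; return "OK" or "Not OK".

    check_status is a hypothetical callback that returns the current state
    string, or raises ConnectionError when the status check itself fails.
    """
    failed_checks = 0
    while True:
        sleep(polling_frequency)  # wait between status checks
        try:
            state = check_status()
        except ConnectionError:
            failed_checks += 1
            # Assumption: give up once `tolerance` status checks have failed.
            if failed_checks >= tolerance:
                return "Not OK"
            continue
        if state in TERMINAL_STATES:
            return "OK" if state == "SUCCEEDED" else "Not OK"
```

Passing `sleep` as an argument keeps the loop testable without real waiting.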

AWS Data Pipeline Job

AWS Data Pipeline is a cloud-based extract, transform, load (ETL) service that enables you to automate the transfer, processing, and storage of your data.

To create an AWS Data Pipeline job, see Creating a Job. For more information about this plug-in, see Control-M for AWS Data Pipeline.

The following table describes the AWS Data Pipeline job attributes.

Attribute	Action	Description
Connection Profile		Determines the authorization credentials that are used to connect Control-M to AWS Data Pipeline, as described in AWS Data Pipeline Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Action		Determines one of the following AWS Data Pipeline actions: Trigger Pipeline: Runs an existing AWS Data Pipeline. Create Pipeline: Creates a new AWS Data Pipeline.
Pipeline Name	Create Pipeline	Defines the name of the new AWS Data Pipeline.
Pipeline Unique ID	Create Pipeline	Defines the unique ID (idempotency key) that guarantees the pipeline is created only once. After successful execution, this ID cannot be reused. Valid Characters: Any alphanumeric characters.
Parameters	Create Pipeline	Defines the parameter objects, in JSON format, which define the variables for your AWS Data Pipeline, as shown in the following example: "parameterObjects": [ { "attributes": [ { "key":"description", "stringValue":"S3outputfolder" } ], "id": "myS3OutputLoc" } ], "parameterValues": [ { "id":"myShellCmd", "stringValue":"grep -rc \"GET\" ${IN_DIR}/* > ${OUT_DIR}/out.txt" } ], "pipelineObjects": [ { "fields": [ { "key":"input", "refValue":"S3InputLocation" }, { "key":"stage", "stringValue":"true" } ], "id":"ShellCommandActivityObj", "name":"ShellCommandActivityObj" } ] For more information about the available parameter objects, see the descriptions of the PutPipelineDefinition and GetPipelineDefinition actions in the AWS Data Pipeline API Reference.
Trigger Created Pipeline	Create Pipeline	Determines whether to run (trigger) the newly created AWS Data Pipeline.
Pipeline ID	Trigger Pipeline	Determines which pipeline to run (trigger).
Status Polling Frequency	All Actions	Determines the number of seconds to wait before checking the job status. Default: 20
Failure Tolerance	All Actions	Determines the number of times to check the job status before the job ends Not OK. Default: 2
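Because the Parameters example is a JSON fragment without enclosing braces, a misplaced comma or quote is easy to miss. The following sketch shows one way to check such a fragment before you paste it into the job. The helper name `parse_parameters_fragment` is hypothetical, and the approach assumes that the attribute takes the fragment in the same brace-less form as the example in the table.

```python
import json


def parse_parameters_fragment(fragment):
    """Wrap the fragment in braces and parse it as a JSON object."""
    return json.loads("{" + fragment + "}")


# The example from the Parameters attribute, split across lines for readability.
fragment = '''
"parameterObjects": [
  {"attributes": [{"key": "description", "stringValue": "S3outputfolder"}],
   "id": "myS3OutputLoc"}
],
"parameterValues": [
  {"id": "myShellCmd",
   "stringValue": "grep -rc \\"GET\\" ${IN_DIR}/* > ${OUT_DIR}/out.txt"}
],
"pipelineObjects": [
  {"fields": [{"key": "input", "refValue": "S3InputLocation"},
              {"key": "stage", "stringValue": "true"}],
   "id": "ShellCommandActivityObj", "name": "ShellCommandActivityObj"}
]
'''

definition = parse_parameters_fragment(fragment)
print(sorted(definition))
```

A malformed fragment raises `json.JSONDecodeError`, which reports the position of the first syntax error.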

Amazon DynamoDB Job

Amazon DynamoDB is a NoSQL database service that enables you to create database tables, execute statements and transactions, and export and import data to and from the Amazon S3 storage service.

To create an Amazon DynamoDB job, see Creating a Job. For more information about this plug-in, see Control-M for Amazon DynamoDB.

The following table describes the Amazon DynamoDB job type attributes.

Attribute	Action	Description
Connection Profile		Determines the authorization credentials that are used to connect Control-M to Amazon DynamoDB, as described in Amazon DynamoDB Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces
Action		Determines one of the following Amazon DynamoDB actions to perform: Execute Statement Execute Transaction Export Job to S3 Bucket Import Job from S3 Bucket
Run Statement with Parameter	Execute Statement	Determines whether to execute the statement with your own JSON parameters.
Statement	Execute Statement	Defines one or more PartiQL statements that are supported by Amazon DynamoDB.
Statement Parameters	Execute Statement	Defines the job parameters, in JSON format, that enable you to control how the job runs, as shown in the following example: `[{ "N": "20" }, { "S":"BMC30" }]`
Transaction Statements	Execute Transaction	Defines one or more PartiQL transaction statements, in JSON format, as shown in the following example: `[{ "Parameters": [ { "N": "20" }, { "S": "Stas30" } ], "Statement": "SELECT * FROM IFteam WHERE Id=17" }]`
Idempotency Token	Export Job to S3 Bucket Import Job from S3 Bucket	Defines the unique ID (idempotency token) that guarantees the job runs only once. After it runs successfully, this ID cannot be reused.
Export Format	Export Job to S3 Bucket	Determines one of the following formats to export data: DYNAMODB JSON ION
Import Format	Import Job from S3 Bucket	Determines the source data file format, as follows: .csv .json .ion
S3 Bucket Name	Export Job to S3 Bucket Import Job from S3 Bucket	Defines the name of the Amazon S3 bucket that the table data is exported to or imported from.
S3 Path Prefix	Export Job to S3 Bucket Import Job from S3 Bucket	Defines the Amazon S3 bucket prefix to use as the filename and path of the table. Example: AWSDynamoDB/01654668915125-be3574ee/data/vejljoqgiqyexew2cxgetylg6u.json.gz
S3 Bucket Owner ID	Export Job to S3 Bucket Import Job from S3 Bucket	Defines the ID of the AWS account that owns the bucket.
Table ARN	Export Job to S3 Bucket Import Job from S3 Bucket	Defines the Amazon Resource Name (ARN) associated with the table to export.
Import Compression Type	Import Job from S3 Bucket	Determines one of the following compression types to compress the data from the imported table: GZIP ZSTD No Compression
Table Creation Parameters	Import Job from S3 Bucket	Defines the parameters, in JSON format, of the new table where the data is imported, as shown in the following example: `"AttributeDefinitions": [ { "AttributeName": "Id", "AttributeType": "N" } ], "KeySchema": [ { "AttributeName": "Id", "KeyType": "HASH" } ], "BillingMode": "PROVISIONED", "ProvisionedThroughput": { "ReadCapacityUnits": 1, "WriteCapacityUnits": 1 }`
Table Name	Import Job from S3 Bucket	Defines the name of the new table where the data is imported.
Status Polling Frequency	All Actions	Determines the number of seconds to wait before checking the job status. Default: 20
Failure Tolerance	Export Job to S3 Bucket Import Job from S3 Bucket	Determines the number of times to check the job status before the job ends Not OK. Default: 0
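Hand-written JSON for the Statement Parameters and Transaction Statements attributes is prone to punctuation errors. The following sketch generates the JSON instead of typing it. The helper names are hypothetical, the `Name` column in the sample statement is invented for illustration, and only string, number, and Boolean parameters are handled. The `N`, `S`, and `BOOL` type keys and the `?` placeholders follow the DynamoDB attribute value and PartiQL conventions.

```python
import json


def to_attribute_value(value):
    """Convert a Python value to a DynamoDB attribute value object."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported parameter type: {type(value).__name__}")


def transaction_statements(statements):
    """Serialize (statement, parameters) pairs as a JSON array."""
    return json.dumps([
        {"Statement": text, "Parameters": [to_attribute_value(p) for p in params]}
        for text, params in statements
    ])


# Each "?" in the statement is filled by the parameter in the same position.
print(transaction_statements([
    ("SELECT * FROM IFteam WHERE Id=? AND Name=?", [20, "Stas30"]),
]))
```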

Amazon EMR Job

Amazon EMR is a managed cluster platform that enables you to execute big data frameworks, such as Apache Hadoop and Apache Spark, to process and analyze vast amounts of data.

To create an Amazon EMR job, see Creating a Job. For more information about this plug-in, see Control-M for Amazon EMR.

The following table describes Amazon EMR job attributes.

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Amazon EMR, as described in Amazon EMR Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Cluster ID	Defines the ID of the Amazon EMR cluster to connect to the Notebook. In the EMR API, this field is called the Execution Engine ID.
Notebook ID	Determines which Notebook ID executes the script. In the EMR API, this field is called the Editor ID.
Relative Path	Defines the pathname of the script file in the Notebook.
Notebook Execution Name	Defines the job execution name.
Service Role	Defines the service role that connects to the Notebook.
Use Advanced JSON Format	Determines whether to provide Notebook execution information through JSON code. When enabled, the JSON Body attribute replaces the values of the following attributes: Cluster ID, Notebook ID, Relative Path, Notebook Execution Name, and Service Role.
JSON Body	Defines Notebook execution settings, in JSON format, as shown in the following example: `{ "EditorId": "e-DJJ0HFJKU71I9DWX8GJAOH734", "RelativePath": "ShowWaitingAndRunningClustersTest2.ipynb", "NotebookExecutionName": "Tests", "ExecutionEngine": { "Id": "j-AR2G6DPQSGUB" }, "ServiceRole": "EMR_Notebooks_DefaultRole" }` For a description of the syntax of this JSON, see the description of StartNotebookExecution in the Amazon EMR API Reference.
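The five individual attributes map one-to-one onto the fields of the JSON Body example. The following sketch makes that mapping explicit. The function name is hypothetical, and the field names are taken from the example in the table rather than from any additional knowledge of the plug-in.

```python
import json


def notebook_execution_body(cluster_id, notebook_id, relative_path,
                            execution_name, service_role):
    """Build the JSON Body from the five individual job attributes."""
    return json.dumps({
        "EditorId": notebook_id,                  # Notebook ID
        "RelativePath": relative_path,            # Relative Path
        "NotebookExecutionName": execution_name,  # Notebook Execution Name
        "ExecutionEngine": {"Id": cluster_id},    # Cluster ID
        "ServiceRole": service_role,              # Service Role
    }, indent=2)


print(notebook_execution_body(
    cluster_id="j-AR2G6DPQSGUB",
    notebook_id="e-DJJ0HFJKU71I9DWX8GJAOH734",
    relative_path="ShowWaitingAndRunningClustersTest2.ipynb",
    execution_name="Tests",
    service_role="EMR_Notebooks_DefaultRole",
))
```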

Amazon Redshift Job

Amazon Redshift is a cloud data warehouse service that handles large-scale data analytics.

To create an Amazon Redshift job, see Creating a Job. For more information about this plug-in, see Control-M for Amazon Redshift.

The following table describes the Amazon Redshift job type attributes.

Attribute	Action	Description
Connection Profile	All	Determines the authorization credentials that are used to connect Control-M to Amazon Redshift, as described in Amazon Redshift Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Actions	All	Determines one of the following Amazon Redshift actions to perform: Redshift SQL Statement: Enables you to run all Redshift commands. Unload Data Into S3: Enables you to move data from Redshift to an S3 bucket in CSV or JSON file format. Copy Data Into Redshift: Enables you to copy CSV files from an S3 bucket to a Redshift table. Run Procedure: Enables you to run a stored procedure.
Workgroup Name	All	Defines the workgroup for this job. Workgroups can consist of users, teams, applications, or workloads, and they can set limits for the data that each query or group processes.
Secret Manager ARN	All	Defines the Amazon Resource Name (ARN) associated with the AWS Secrets Manager, which securely stores and manages the database credentials.
Database	All	Defines the database in Amazon Redshift.
Load Redshift SQL Statement	Redshift SQL Statement Unload Data Into S3	Defines a query generated in the database as an .sql file.
Show Statement Results	Redshift SQL Statement	Determines whether to display the statement results.
S3 Bucket URI	Unload Data Into S3 Copy Data Into Redshift	Defines the full URI of the S3 bucket that contains the extracted query results.
File Format	Unload Data Into S3	Determines one of the following file formats of the file placed in the S3 bucket: csv json
Use IAM Role for S3 Access	Unload Data Into S3 Copy Data Into Redshift	Determines whether to use an IAM Role to access the S3 bucket.
IAM Role ARN	Unload Data Into S3 Copy Data Into Redshift	Defines the Amazon Resource Name (ARN) of the AWS IAM Role. An ARN is a standardized AWS resource address. The AWS IAM role must be granted read and write privileges to create or update any of the AWS resources that are in the stack. Example: arn:aws:iam::12345678910:role/AWS-QuickSetup-StackSet-Local-AdministrationRole
Table Name	Copy Data Into Redshift	Defines the name of the new table where the data is imported.
Procedure Name	Run Procedure	Defines the name of an existing procedure in Amazon Redshift.
Procedure Arguments	Run Procedure	Defines the arguments for the procedure that you run. If you do not add an argument, type ().
Status Polling Frequency	All	(Optional) Determines the number of seconds to wait before checking the job status. Default: 10
Failure Tolerance	All	Determines the number of times to check the job status before the job ends Not OK. Default: 2

Azure Databricks Job

Azure Databricks is a cloud-based data analytics platform that enables you to process and analyze large workloads of data.

To create an Azure Databricks job, see Creating a Job. For more information about this plug-in, see Control-M for Azure Databricks.

The following table describes the Azure Databricks job type attributes.

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Azure Databricks, as described in Azure Databricks Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces. Variable Name: %%AZURE-ACCOUNT
Databricks Job ID	Defines the job ID created in a Databricks workspace.
Parameters	Defines task parameters to override when the job runs, according to the Databricks convention. Your list of parameters must begin with the name of the parameter type, as shown in the following example: `"notebook_params": {"param1":"val1", "param2":"val2"}, "jar_params": ["param1", "param2"]` For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided in the Azure Databricks documentation. For no parameters, type the following code: `"params": {}`
Idempotency Token	(Optional) Defines a token to use to rerun job runs that timed out in Databricks. Values: Control-M-Idem_%%ORDERID: With this token, upon re-execution, Control-M invokes the monitoring of the existing job execution in Databricks. Any other value: Replaces the Control-M idempotency token. When you re-execute a job using a different token, Databricks creates a new job execution with a new unique run ID. Default: Control-M-Idem_%%ORDERID
Show Tasks Output	Determines whether to display the task-level output, according to user preferences. Default: Off
Status Polling Frequency	(Optional) Determines the number of seconds to wait before checking the job status. Default: 30
Failure Tolerance	Determines the number of times to check the job status before the job ends Not OK. Default: 1
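The Parameters value is a JSON fragment that starts with the parameter type name rather than with an opening brace. The following sketch produces such a fragment from ordinary Python data. The function name is hypothetical, and the sketch assumes that the fragment is the inside of a JSON object, so multiple parameter types are separated by commas.

```python
import json


def databricks_parameters(**param_types):
    """Render parameter types as a brace-less JSON fragment."""
    if not param_types:
        return '"params": {}'  # the form the table gives for no parameters
    # json.dumps produces '{...}'; drop the outer braces to get the fragment.
    return json.dumps(param_types)[1:-1]


print(databricks_parameters(
    notebook_params={"param1": "val1", "param2": "val2"},
    jar_params=["param1", "param2"],
))
print(databricks_parameters())
```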

Azure HDInsight Job

Azure HDInsight enables you to execute an Apache Spark batch job and perform big data analytics.

To create an Azure HDInsight job, see Creating a Job. For more information about this plug-in, see Control-M for Azure HDInsight.

The following table describes Azure HDInsight job parameters:

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Azure HDInsight, as described in Azure HDInsight Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Parameters	Determines which parameters are passed to the Apache Spark Application when the job runs, in JSON format (name:value pairs). This JSON must include the file and className elements.
Status Polling Interval	Determines the number of seconds to wait before checking the job status. Default: 10 seconds
Bring job logs to output	Determines whether logs from Apache Spark appear in the job output.

Azure Synapse Job

Azure Synapse Analytics enables you to perform data integration and big data analytics.

To create an Azure Synapse job, see Creating a Job. For more information about this plug-in, see Control-M for Azure Synapse.

The following table describes Azure Synapse job parameters:

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Azure Synapse, as described in Azure Synapse Connection Profile Parameters.
Pipeline Name	Defines the name of a pipeline that you defined in your Azure Synapse workspace.
Parameters	Defines pipeline parameters, in JSON format, to override when the job runs, as shown in the following example: `{"param1":"value1", "param2":"value2"}` For no parameters, type {}.
Status Polling Interval	(Optional) Determines the number of seconds to wait before checking the job status. Default: 20 seconds

Databricks Job

Databricks enables you to integrate jobs created in the Databricks environment with your existing Control-M workflows.

To create a Databricks job, see Creating a Job. For more information about this plug-in, see Control-M for Databricks.

The following table describes the Databricks job type attributes:

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Databricks, as described in Databricks Connection Profile Parameters.
Databricks Job ID	Defines the job ID created in a Databricks workspace.
Parameters	Defines task parameters, in JSON format, to override when the job executes, according to the Databricks convention. Your list of parameters must begin with the name of the parameter type, as shown in the following example: `"notebook_params": {"param1":"val1", "param2":"val2"}, "jar_params": ["param1", "param2"]` For more information about the parameter types, review the properties of RunParameters in the OpenAPI specification provided through the Azure Databricks documentation. For no parameters, type the following code: `"params": {}`
Idempotency Token	(Optional) Defines a token to use to rerun job runs that timed out in Databricks. Values: Control-M-Idem_%%ORDERID: With this token, upon re-execution, Control-M invokes the monitoring of the existing job execution in Databricks. Any other value: Replaces the Control-M idempotency token. When you re-execute a job using a different token, Databricks creates a new job execution with a new unique run ID. Default: Control-M-Idem_%%ORDERID
Show Tasks Output	Determines whether to display the task-level output, according to user preferences. Default: Off
Status Polling Frequency	(Optional) Determines the number of seconds to wait before checking the job status. Default: 30
Failure Tolerance	Determines the number of times to check the job status before the job ends Not OK. Default: 2

dbt Job

dbt (Data Build Tool) is a cloud-based computing platform that enables you to develop, test, schedule, document, and analyze data models.

To create a dbt job, see Creating a Job. For more information about this plug-in, see Control-M for dbt.

The following table describes the dbt job type attributes.

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to dbt, as described in dbt Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
DBT Job ID	Defines the ID of the preexisting job in the dbt platform that you want to run.
Run Comment	Defines a free-text description of the job.
Override Job Commands	Determines whether to override the predefined dbt job commands.
Define Commands	Defines the new dbt job commands. Examples: dbt test, dbt run
Status Polling Frequency	Determines the number of seconds to wait before checking the job status. Default: 10
Failure Tolerance	Determines the number of times to check the job status before the job ends Not OK. Default: 2

GCP BigQuery Job

Google Cloud Platform (GCP) BigQuery is a cloud-computing platform that enables you to process, analyze, and store your data.

To create a GCP BigQuery job, see Creating a Job. For more information about this plug-in, see Control-M for GCP BigQuery.

The following table describes the GCP BigQuery job type attributes.

Attribute	Action	Description
Connection Profile		Determines the authorization credentials that are used to connect Control-M to GCP BigQuery, as described in GCP BigQuery Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Project Name	All Actions	Defines the name of the predefined Google Cloud project with configured APIs, authentication information, billing details, and job resources.
Dataset Name	Query Extract Routine	Determines the database that the job uses.
Action		Determines one of the following GCP BigQuery actions to perform: Query: Runs one or more SQL statements that are supported by GCP BigQuery. Copy: Creates a copy of an existing table. Load: Loads source data into an existing table. Extract: Exports data from an existing table into Google Cloud Storage. Routine: Runs a stored procedure, table function, or previously defined function.
Run Select Query and Copy to Table	Query	(Optional) Determines whether to paste the results of a SELECT statement into a new table.
Query Type	Query	Determines the type of query. Valid Values: SQL Query Load Query From Text File Default: SQL Query
Load Query From File	Query	Determines the text file that contains the query you want to run. The Query Type value must be Load Query From Text File.
Table Name	Query Extract	Defines the new table name.
SQL Statement	Query	Defines one or more SQL statements supported by GCP BigQuery. Rule: It must be written in a single line, with character strings separated by one space only.
Query Parameters	Query	Defines the query parameters, in JSON format, which enable you to control the presentation of the data, as shown in the following example: `{ "name": "IFteam", "parameterType": {"type": "STRING"}, "parameterValue": {"value": "BMC"} }`
Copy Operation Type	Copy	Determines one of the following copy operations: Clone: Creates a copy of a base table that has write access. Snapshot: Creates a read-only copy of a base table. Copy: Creates a copy of a snapshot. Restore: Creates a writable table from a snapshot.
Source Table Properties	Copy	Defines the properties, in JSON format, of the table that is cloned, backed up, or copied. You can copy or back up one or more tables at a time, as shown in the following example: `[ { "datasetId": "Test1", "projectId": "SomeProj1", "tableId": "IFteam1" }, { "datasetId": "Test2", "projectId": "SomeProj2", "tableId": "IFteam2" } ]`
Destination Table Properties	Copy Load	Defines the properties of a new table, in JSON format, as shown in the following example: `{ "datasetId": "Test3", "projectId": "SomeProj3", "tableId": "IFteam3" }`
Destination/Source Bucket URIs	Load Extract	Defines the source or destination data URI for the table that you are loading or extracting. You can load or extract multiple tables. Rule: Separate multiple URIs with commas. Example: "gs://source1_site1/source1.json"
Show Load Options	Load	Determines whether to add more fields to a table that you are loading.
Load Options	Load	Defines additional fields, in JSON format, for the table that you are loading, as shown in the following example: `"schema": { "fields": [ { "name": "name1", "type": "STRING1" }, { "name": "name2", "type": "STRING2" }, { "name": "name3", "type": "STRING3" } ] }`
Extract As	Extract	Determines which file format the data is exported in, as follows: .csv .json
Routine	Routine	Defines a routine and the values that it must run, as shown in the following example: `CALL new_r('value1')`
Job Timeout	All Actions	Determines the maximum number of milliseconds to run the GCP BigQuery job. Default: 30,000 milliseconds (30 seconds)
Connection Timeout	All Actions	Determines the number of seconds to wait after Control-M initiates a connection request before a timeout occurs. Default: 10
Status Polling Frequency	All Actions	Determines the number of seconds to wait before checking the job status. Default: 5
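The SQL Statement rule (a single line, with character strings separated by one space only) is easy to satisfy mechanically when the query was written across several lines. The following sketch shows one way to do it. The function name and the sample query are hypothetical. Note the limitation: the sketch also collapses whitespace inside quoted string literals, so it suits only queries whose literals contain no significant runs of spaces or line breaks.

```python
def to_single_line(sql):
    """Collapse every run of whitespace, including newlines, to one space."""
    return " ".join(sql.split())


query = """
SELECT  name,
        team
FROM    Test1.IFteam1
WHERE   team = 'BMC'
"""

print(to_single_line(query))
# prints: SELECT name, team FROM Test1.IFteam1 WHERE team = 'BMC'
```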

GCP Dataflow Job

Google Cloud Platform (GCP) Dataflow enables you to perform cloud-based data processing for batch and real-time data streaming applications.

To create a GCP Dataflow job, see Creating a Job. For more information about this plug-in, see Control-M for GCP Dataflow.

The following table describes the GCP Dataflow job type attributes.

Attribute	Description
Connection profile	Determines the authorization credentials that are used to connect Control-M to GCP Dataflow, as described in GCP Dataflow Connection Profile Parameters.
Project ID	Defines the identifier of the GCP project where the job runs. A project is a set of configuration settings that define the resources the jobs utilize and how they interact with GCP.
Location	Defines the Google Compute Engine region where the job is created.
Template Type	Defines one of the following types of GCP Dataflow templates: Classic Template: Developers run the pipeline and create a template. The Apache Beam SDK stages files in Cloud Storage, creates a template file (similar to job request), and saves the template file in Cloud Storage. Flex Template: Developers package the pipeline into a Docker image and then use the Google Cloud CLI to build and save the Flex Template spec file in Cloud Storage.
Template Location (gs://)	Defines the URL on Google Cloud Storage for the file that contains the Template definition, in the following format: gs://bucketname/filename
Parameters (JSON Format)	Defines input parameters, in JSON format, to be passed on to job execution. You must include the jobName and parameters elements, as shown in the following example: `{ "jobName": "wordcount", "parameters": { "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt", "output": "gs://controlmbucket/counts" } }`
Verification Poll Interval (in seconds)	(Optional) Determines the number of seconds to wait before checking the job status. Default: 10
Log Level	Determines one of the following levels of details to retrieve from the GCP logs in the case of job failure: TRACE DEBUG INFO WARN ERROR
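Because the Parameters (JSON Format) attribute requires two specific elements, a quick check before saving the job can catch an incomplete value. The following is a minimal sketch with a hypothetical function name. It verifies only that the JSON parses and that the two elements named in the table are present, and it does not validate the template-specific parameters.

```python
import json

REQUIRED_ELEMENTS = ("jobName", "parameters")


def check_dataflow_parameters(text):
    """Parse the Parameters value and confirm the required elements exist."""
    body = json.loads(text)
    missing = [name for name in REQUIRED_ELEMENTS if name not in body]
    if missing:
        raise ValueError("missing required element(s): " + ", ".join(missing))
    return body


body = check_dataflow_parameters("""
{
  "jobName": "wordcount",
  "parameters": {
    "inputFile": "gs://dataflow-samples/shakespeare/kinglear.txt",
    "output": "gs://controlmbucket/counts"
  }
}
""")
print(body["jobName"])
```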

GCP Dataproc Job

Google Cloud Platform (GCP) Dataproc enables you to perform cloud-based big data processing and machine learning.

To create a GCP Dataproc job, see Creating a Job. For more information about this plug-in, see Control-M for GCP Dataproc.

The following table describes the GCP Dataproc job type attributes.

Attribute	Description
Connection profile	Determines the authorization credentials that are used to connect Control-M to GCP Dataproc, as described in GCP Dataproc Connection Profile Parameters.
Project ID	Defines the identifier of the GCP project where the job runs. A project is a set of configuration settings that define the resources the jobs utilize and how they interact with GCP.
Account Region	Defines the Google Compute Engine region where the job is created.
Dataproc task type	Defines one of the following Dataproc task types to execute: Workflow Template: Reusable workflow configuration that defines a graph of jobs with information on where to execute those jobs. Job: A single Dataproc job. Batches: Dataproc Serverless for Spark Batch. Interactive: An interactive job that deletes or terminates the interactive session resource.
Workflow Template	(For a Workflow Template task type) Defines the ID of a Workflow Template.
Batch ID	Defines the ID that will become the final component of the batch resource name. Valid Values: 4-63 characters, consisting of the letters a-z, the numbers 0-9, and hyphens (-). Example: batch-e7f10
Requested ID	(Optional) Defines the unique ID that is used to identify the request. If the service receives two CreateBatchRequests with the same Requested ID, the second request is ignored, and the operation that corresponds to the first batch that was created and stored in the backend is returned. Valid Values: 0-40 characters. The letters a-z, A-Z, numbers 0-9, _, and -.
Parameters (JSON Format)	(For a Job task type) Defines input parameters to be passed on to job execution, in JSON format. You retrieve this JSON content from the GCP Dataproc UI, using the EQUIVALENT REST option in job settings.
Verification Poll Interval (in seconds)	(Optional) Determines the number of seconds to wait before checking the job status. Default: 20
Tolerance	Determines the number of times to check the job status before ending Not OK. Default: 2
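The valid values for Batch ID and Requested ID can be expressed as regular expressions and checked before the job is submitted. The following sketch uses hypothetical function names and rests on one assumption: that hyphens are allowed in a Batch ID, as the example value batch-e7f10 suggests.

```python
import re

# Assumption: hyphens are permitted in a Batch ID (see the example batch-e7f10).
BATCH_ID = re.compile(r"[a-z0-9-]{4,63}")
# Requested ID: 0-40 characters from a-z, A-Z, 0-9, underscore, and hyphen.
REQUESTED_ID = re.compile(r"[A-Za-z0-9_-]{0,40}")


def is_valid_batch_id(value):
    """Return True when the whole value matches the Batch ID rule."""
    return BATCH_ID.fullmatch(value) is not None


def is_valid_requested_id(value):
    """Return True when the whole value matches the Requested ID rule."""
    return REQUESTED_ID.fullmatch(value) is not None


print(is_valid_batch_id("batch-e7f10"))    # prints: True
print(is_valid_batch_id("Batch_E7F10"))    # prints: False
print(is_valid_requested_id("req_01-A"))   # prints: True
```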

Hadoop Job

The Hadoop job connects to the Hadoop framework, which enables you to split up and process large data sets on clusters of commodity servers. You can expand your enterprise business workflows to include tasks that execute in your Big Data Hadoop cluster in Control-M with the different Hadoop-supported tools, including Pig, Hive, HDFS File Watcher, Map Reduce Jobs, and Sqoop.

To create a Hadoop job, see Creating a Job. For more information about this plug-in, see Control-M for Hadoop.

The following table describes the Hadoop job type attributes.

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to Hadoop, as described in Hadoop Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces. Variable Name: %%HDP-ACCOUNT
Execution Type	Determines the execution type for Hadoop job execution, as follows: DistCp Job Distributed Shell Job HDFS Commands Job HDFS File Watcher Job Impala Job Hive Job Java-Map-Reduce Job Oozie Job Pig Job Spark Job Sqoop Job Streaming Job Tajo Job Variable Name: %%HDP-EXEC_TYPE
Pre Commands	Defines the Pre commands performed before job execution (not for HDFS Commands jobs and Oozie Extractor jobs), and the argument for each command.
Fail the job if the command fails	Determines whether the entire job fails if any of the Pre commands fail (not for HDFS Commands jobs and Oozie Extractor jobs).
Post Commands	Defines the Post commands performed after job execution (not for HDFS Commands jobs and Oozie Extractor jobs), and the argument for each command.
Fail the job if the command fails	Determines whether the entire job fails if any of the Post commands fail (not for HDFS Commands jobs and Oozie Extractor jobs).

DistCp Job Attributes

The following table describes the DistCp job attributes.

Attribute	Description
Target Path	Defines the absolute destination path. Variable Name: %%HDP-DISTCP_TARGET_PATH
Source Path	Defines the source paths. Variable Name: %%HDP-DISTCP_SOURCE_PATH-Nxxx_ARG
Command Line Options	Defines the sets of attributes and values that are added to the command line. Variable Names: Name: %%HDP-DISTCP_OPTION-Nxxx-NAME Value: %%HDP-DISTCP_OPTION-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
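
As a rough illustration of how these attributes map onto a `hadoop distcp` invocation (options first, then one or more source paths, then the target path), consider the following sketch. The helper name `build_distcp_command` and the way Name/Value pairs are flattened are assumptions made for this example; the source does not describe how the plug-in assembles the command.

```python
from typing import Iterable, List, Tuple


def build_distcp_command(
    source_paths: Iterable[str],
    target_path: str,
    options: Iterable[Tuple[str, str]] = (),
) -> List[str]:
    """Return an argv list of the form: hadoop distcp [options] <src>... <target>."""
    argv = ["hadoop", "distcp"]
    for name, value in options:
        argv.append(name)
        if value:  # flags such as -update carry no value
            argv.append(value)
    argv.extend(source_paths)  # Source Path (one or more)
    argv.append(target_path)   # Target Path (absolute destination)
    return argv


command = build_distcp_command(
    ["hdfs://nn1/data/a", "hdfs://nn1/data/b"],
    "hdfs://nn2/backup",
    options=[("-update", ""), ("-m", "20")],
)
print(" ".join(command))
```

Building an argv list rather than a single string avoids quoting problems when a path contains spaces.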

Distributed Shell Job Attributes

The following table describes the Distributed Shell job attributes.

Attribute	Description
Shell Type	Determines what the Distributed Shell job executes, as follows: Command: Executes a shell command entry as defined by Command. Script File: Executes a script file as defined by Command, Script Full Path, and Shell Script Arguments. Variable Name: %%HDP-SHELL_TYPE
Command	Defines the shell command entry to execute for the job execution. Variable Name: %%HDP-SHELL_COMMAND
Script Full Path	Defines the pathname of the script file that is executed. The script file is located in the HDFS. Variable Name: %%HDP-SHELL_SCRIPT_FULL_PATH
Shell Script Arguments	Defines the shell script arguments. Variable Name: %%HDP-SHELL-Nxxx-ARG
More Options	Opens more attributes.
Files/Archives	Defines the pathname of the file or archive that is uploaded as a dependency to the HDFS working directory. Variable Names: Type: %%HDP-SHELL_FILE_DEP-Nxxx-TYPE Path: %%HDP-SHELL_FILE_DEP-Nxxx-PATH
Options	Defines the additional option (Name and Value) to set when executing the job. Variable Names: Name: %%HDP-SHELL_OPTION-Nxxx-NAME Value: %%HDP-SHELL_OPTION-Nxxx-VAL
Environment Variables	Defines the environment variables for the shell script/command. Variable Name: %%HDP-SHELL_ENV_VARIABLE-Nxxx-ARG
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.

HDFS Commands Job Attributes

The following table describes the HDFS Commands job attributes.

Attribute	Description
Command	Defines the command for the argument to be performed with job execution. Variable Name: %%HDP-HDFS_CMD_ACTION-Nxxx-CMD
Arguments	Defines the argument used by the command. Variable Name: %%HDP-HDFS_CMD_ACTION-Nxxx-ARG

HDFS File Watcher Job Attributes

The following table describes the HDFS File Watcher job attributes.

Attribute	Description
File name full path	Defines the pathname of the watched file. Variable Name: %%HDP-HDFS_FILE_PATH
Min detected size	Determines the minimum file size in bytes to meet the criteria and finish the job as OK. If the file arrives, but the size is not met, the job continues to watch the file. Variable Name: %%HDP-MIN_DETECTED_SIZE
Max time to wait	Determines the maximum number of minutes to wait for the file to meet the watching criteria. If criteria are not met (file did not arrive, or minimum size was not reached) the job fails after this maximum number of minutes. Variable Name: %%HDP-MAX_WAIT_TIME
File Name Variable	Defines the variable name that is used in succeeding jobs. Variable Name: %%HDP-FW_DETECTED_FILE_NAME_VAR
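
The watching criteria (Min detected size and Max time to wait) can be sketched as the loop below. This is an illustrative model only: the function `watch_file`, the `get_size` callback (which returns `None` while the file has not arrived), and the `poll_seconds` interval are hypothetical and are not part of the plug-in.

```python
import time
from typing import Callable, Optional


def watch_file(
    get_size: Callable[[], Optional[int]],
    min_size_bytes: int,
    max_wait_minutes: float,
    poll_seconds: float = 10,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True (job ends OK) once the file exists and is large enough.

    Return False (job fails) if the criteria are still unmet after
    max_wait_minutes.
    """
    deadline = clock() + max_wait_minutes * 60
    while True:
        size = get_size()  # None means the file has not arrived yet
        if size is not None and size >= min_size_bytes:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)  # keep watching: missing or still too small
```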

Impala Job Attributes

The following table describes the Impala job attributes.

Attribute	Description
Source	Determines the source type to execute the queries, as follows: Query File: Executes a query file as defined by Query File Full Path. Open Query: Executes an open query command as defined by Query. Variable Name: %%HDP-IMPALA_QUERY_SOURCE
Query File Full Path	Defines the location of the file used to execute the queries. Variable Name: %%HDP-IMPALA_QUERY_FILE_PATH
Query	Defines the query command used to execute the queries. Variable Name: %%HDP-IMPALA_OPEN_QUERY
Command Line Options	Defines the sets of attributes and values that are added to the command line. Variable Name: %%HDP-HDP-IMPALA_CMD_OPTION-Nxxx-ARG

Hive Job Attributes

The following table describes the Hive job attributes.

Attribute	Description
Full path to Hive script	Defines the pathname of the Hive script on the Hadoop host. Variable Name: %%HDP-HIVE_SCRIPT_NAME
Script Parameters	Defines the list of parameters for the script. Variable Names: Name: %%HDP-HIVE_SCRIPT_PARAM_Nxxx-NAME Value: %%HDP-HIVE_SCRIPT_PARAM-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.

Java-Map-Reduce Job Attributes

The following table describes the Java Map-Reduce job attributes.

Attribute	Description
Full path to Jar	Defines the pathname of the jar file that contains the Map Reduce Java program on the Hadoop host. Variable Name: %%HDP-JAVA_JAR_NAME
Main Class	Defines the class that is included in the jar containing a main function and the map reduce implementation. Variable Name: %%HDP-JAVA_MAIN_CLASS
Arguments	Defines the argument used by the command. Variable Name: %%HDP-JAVA_Nxxx_ARG
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.

Oozie Job Attributes

The following table describes the Oozie job attributes.

Attribute	Description
Job Properties File	Defines the job properties pathname. Variable Name: %%HDP-OOZIE_JOB_PROPERTIES_FILE
Job Properties (Add/Overwrite)	Defines the Oozie job properties. A set of properties is comprised of the following: Key: Defines a key name associated with each property. Variable Name: %%HDP-OOZIE_PROPERTY-Nxxx-KEY Value: Defines a value associated with each property. Variable Name: %%HDP-OOZIE_PROPERTY-Nxxx-VAL You can add new properties or override property values defined in the Job Properties File.
Rerun from point of failure	Determines whether to re-execute an Oozie job from the point of its failure.
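
The add/overwrite behavior of Job Properties can be pictured as a merge in which the Key/Value pairs defined in the job take precedence over the Job Properties File. The sketch below assumes a simple `key=value` properties file; `merge_job_properties` is a hypothetical helper, not a Control-M or Oozie API.

```python
from typing import Dict


def merge_job_properties(file_text: str, overrides: Dict[str, str]) -> Dict[str, str]:
    """Parse key=value lines, then apply job-level additions and overrides."""
    properties: Dict[str, str] = {}
    for line in file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    properties.update(overrides)  # job-level values win over file values
    return properties


file_text = """
# job.properties
nameNode=hdfs://nn1:8020
queueName=default
"""
merged = merge_job_properties(file_text, {"queueName": "etl", "retries": "3"})
print(merged)
```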

Pig Job Attributes

The following table describes the Pig job attributes.

Attribute	Description
Full Path to Pig Program	Defines the pathname of the Pig program on the Hadoop host. Variable Name: %%HDP-PIG_PROG_NAME
Pig Program Parameters	Defines the list of program parameters.
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
Properties	Defines a list of properties (Name and Value) to be executed with the job. These properties override the Hadoop defaults.
Archives	Defines the location of the Hadoop archives.
Files	Defines the location of the Hadoop files.

Spark Job Attributes

The following table describes the Spark job attributes.

Attribute	Description
Program Type	Determines the Spark program type, as follows: Python Script: As defined by Full Path to Script. Java / Scala Application: As defined by Application Jar File and Full Path to Script. Variable Name: %%HDP-SPARK_PROG_TYPE
Full Path to Script	Defines the pathname of the Python script that executes. Variable Name: %%HDP-SPARK_FULL_PATH_TO_PYTHON_SCRIPT
Application Jar File	Defines the path to the jar including your application and all the dependencies. Variable Name: %%HDP-SPARK_APP_JAR_FULL_PATH
Main Class to Run	Defines the main class of the application. Variable Name: %%HDP-SPARK_MAIN_CLASS_TO_RUN
Application Arguments	Defines the attribute arguments that are added at the end of the Spark command line either after the main class for Java / Scala Applications or after the script of the Python Script. Variable Name: %%HDP-SPARK_Nxxx_ARG
Command Line Options	Defines the sets of attributes and values that are added to the command line. Variable Names: Name: %%HDP-SPARK_OPTION-Nxxx-NAME Value: %%HDP-SPARK_OPTION-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
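
The ordering described for Application Arguments (after the main class and jar for Java / Scala applications, or after the script for Python) matches the general `spark-submit [options] <app jar | python file> [app arguments]` usage. The following sketch shows one way the attributes could be laid out; `build_spark_submit` and its parameter names are hypothetical, and the exact command that the plug-in generates is not documented here.

```python
from typing import Iterable, List, Optional, Tuple


def build_spark_submit(
    program_type: str,
    script_path: Optional[str] = None,
    app_jar: Optional[str] = None,
    main_class: Optional[str] = None,
    app_args: Iterable[str] = (),
    options: Iterable[Tuple[str, str]] = (),
) -> List[str]:
    """Return an argv list for spark-submit based on the program type."""
    argv = ["spark-submit"]
    for name, value in options:  # Command Line Options
        argv.append(name)
        if value:
            argv.append(value)
    if program_type == "python":
        if not script_path:
            raise ValueError("a Python program needs script_path")
        argv.append(script_path)
    elif program_type == "java_scala":
        if not (app_jar and main_class):
            raise ValueError("a Java / Scala program needs app_jar and main_class")
        argv += ["--class", main_class, app_jar]
    else:
        raise ValueError(f"unknown program type: {program_type}")
    argv.extend(app_args)  # Application Arguments always come last
    return argv


print(build_spark_submit("python", script_path="/jobs/etl.py", app_args=["2024-01-01"]))
```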

Sqoop Job Attributes

The following table describes the Sqoop job attributes.

Attribute	Description
Command Editor	Defines any valid Sqoop command necessary for job execution. Sqoop can be used for job execution only if it is defined in the Sqoop connection attributes. Variable Name: %%HDP-SQOOP_COMMAND
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
Properties	Defines a list of properties (Name and Value) to be executed with the job. These properties override the Hadoop defaults.
Archives	Defines the location of the Hadoop archives.
Files	Defines the location of the Hadoop files.

Streaming Job Attributes

The following table describes the Streaming job attributes.

Attribute	Description
Input Path	Defines the input file for the Mapper step. Variable Name: %%HDP-INPUT_PATH
Output Path	Defines the HDFS output path for the Reducer step. Variable Name: %%HDP-OUTPUT_PATH
Mapper Command	Defines the command that executes as a mapper. Variable Name: %%HDP-MAPPER_COMMAND
Reducer Command	Defines the command that executes as a reducer. Variable Name: %%HDP-REDUCER_COMMAND
Streaming Options	Defines the sets of attributes (Name and Value) that are added to the end of the Streaming command line. Variable Names: Name: %%HDP-STREAMING_PARAM-Nxxx-NAME Value: %%HDP-STREAMING_PARAM-Nxxx-VAL
Generic Options	Defines the sets of attributes (Name and Value) that are added to the Streaming command line. Variable Names: Name: %%HDP-GENERIC_PARAM-Nxxx-NAME Value: %%HDP-GENERIC_PARAM-Nxxx-VAL
Append Yarn aggregated logs to output	Determines whether to add Yarn aggregated logs to the job output.
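
In a Hadoop Streaming command, generic options must precede the streaming options, which is consistent with Streaming Options being added at the end of the command line. The sketch below lays the attributes out in that order. The helper `build_streaming_command` and the `streaming_jar` parameter are assumptions for illustration; the location of the streaming jar depends on your Hadoop installation.

```python
from typing import Iterable, List, Tuple

Option = Tuple[str, str]


def _flatten(options: Iterable[Option]) -> List[str]:
    """Turn (name, value) pairs into argv items, skipping empty values."""
    argv: List[str] = []
    for name, value in options:
        argv.append(name)
        if value:
            argv.append(value)
    return argv


def build_streaming_command(
    streaming_jar: str,
    input_path: str,
    output_path: str,
    mapper: str,
    reducer: str,
    generic_options: Iterable[Option] = (),
    streaming_options: Iterable[Option] = (),
) -> List[str]:
    """Return argv: hadoop jar <jar> [generic options] [streaming options]."""
    return (
        ["hadoop", "jar", streaming_jar]
        + _flatten(generic_options)      # Generic Options go first
        + ["-input", input_path, "-output", output_path,
           "-mapper", mapper, "-reducer", reducer]
        + _flatten(streaming_options)    # Streaming Options go at the end
    )


cmd = build_streaming_command(
    "/opt/hadoop/hadoop-streaming.jar", "/in", "/out", "/bin/cat", "/usr/bin/wc",
    generic_options=[("-D", "mapreduce.job.reduces=2")],
    streaming_options=[("-file", "mapper.py")],
)
print(" ".join(cmd))
```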

Tajo Job Attributes

The following table describes the Tajo job attributes.

Attribute	Description
Command Source	Determines the source of the Tajo command, as follows: Input File: Executes the Tajo command from an input file as defined by the Full File Path. Variable Name: %%HDP-TAJO_INPUT_FILE Open Query: Executes an open query as the Tajo command, as defined by Open Query. Variable Name: %%HDP-TAJO_OPEN_QUERY
Full File Path	Defines the pathname of the input file that executes the Tajo command.
Open Query	Defines the query. Variable Name: %%HDP-TAJO_OPEN_QUERY

OCI Data Flow Job

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets.

To create an OCI Data Flow job, see Creating a Job. For more information about this plug-in, see Control-M for OCI Data Flow.

The following table describes the OCI Data Flow job attributes.

Attribute	Description
Connection Profile	Determines the authorization credentials that are used to connect Control-M to OCI Data Flow, as described in OCI Data Flow Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Run Name	Defines the name of a new Run.
Compartment OCID	Defines the compartment Oracle Cloud Identifier (OCID), which is a unique identifier assigned to each compartment that is created within the Oracle Data Flow Infrastructure.
Application OCID	Defines the application Oracle Cloud Identifier (OCID), which is a unique identifier assigned to each application that is created within the Oracle Data Flow Infrastructure.
Additional Run Details	(Optional) Determines whether to add more parameters to the new job run. Valid Values: Checked Unchecked Default: Unchecked
Run Details Configuration	(Optional) Defines specific parameters, in JSON format, that are passed when you create a new Run. For more information about the run parameters, see CreateRunDetails Reference 20200129 in the Oracle Cloud Infrastructure Documentation. Example: `{ "displayName": "<run_name>", "applicationId": "<application_ocid>", "compartmentId": "<compartment_ocid>", "driverShape": "VM.Standard.E4.Flex", "executorShape": "VM.Standard.E4.Flex", "numExecutors": 1, "arguments": [], "parameters": [], "configuration": {} }`
Status Polling Frequency	Determines the number of seconds to wait before checking the job status. Default: 60
Failure tolerance	Determines the number of times to check the job status before the job ends Not OK. Default: 2
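
A Run Details Configuration value can be generated programmatically rather than edited by hand. The sketch below fills in the template fields shown in the table and serializes them to JSON. The function `build_run_details` is hypothetical, and the OCID strings in the usage example are placeholders, not real identifiers.

```python
import json
from typing import Iterable


def build_run_details(
    run_name: str,
    application_ocid: str,
    compartment_ocid: str,
    num_executors: int = 1,
    shape: str = "VM.Standard.E4.Flex",
    arguments: Iterable[str] = (),
) -> str:
    """Serialize run details using the field names from the template above."""
    if num_executors < 1:
        raise ValueError("num_executors must be at least 1")
    details = {
        "displayName": run_name,
        "applicationId": application_ocid,
        "compartmentId": compartment_ocid,
        "driverShape": shape,
        "executorShape": shape,
        "numExecutors": num_executors,
        "arguments": list(arguments),
        "parameters": [],
        "configuration": {},
    }
    return json.dumps(details, indent=2)


run_details = build_run_details(
    "nightly-run",
    "<application_ocid>",   # placeholder
    "<compartment_ocid>",   # placeholder
    num_executors=2,
)
print(run_details)
```

Serializing with `json.dumps` guarantees well-formed JSON, which is easier to get wrong when the value is typed directly into the attribute.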

Snowflake Job

Snowflake is a cloud-computing platform that enables you to process, analyze, and store your data.

To create a Snowflake job, see Creating a Job. For more information about this plug-in, see Control-M for Snowflake.

The following table describes the Snowflake job type attributes.

Attribute	Action	Description
Connection Profile		Determines one of the following types of authorization credentials, which are used to connect Control-M to Snowflake: Snowflake: Described in Snowflake Connection Profile Parameters. Snowflake IdP: Described in Snowflake IdP Connection Profile Parameters. Rules: Characters: 1−30 Case Sensitive: Yes Invalid Characters: Blank spaces.
Database		Determines the database that the job uses.
Schema		Determines the schema that the job uses. A schema is an organizational model that describes the database layout, its table and field definitions, and their relationships to each other.
Action		Determines one of the following Snowflake actions to perform: SQL Statement: Runs any number of Snowflake-supported SQL commands, such as queries, calling or creating procedures, database maintenance tasks, and creating and editing tables. Copy from Query: Copies a queried database and schema into an existing or new file in cloud storage. Copy from Table: Copies from an existing table. Create Table and Query: Creates a table, populated by a query, in the specified database and schema. Copy into Table: Copies data from a cloud storage location into an existing table in Snowflake. Start or Pause Snowpipe: Starts or pauses an existing Snowpipe. Stored Procedure: Calls an existing procedure and its arguments. Snowpipe Load Status: Monitors the status of a Snowpipe for a set period of time. Run SQL File: Uploads a file that contains Snowflake-supported SQL commands.
Snowflake SQL Statement	SQL Statement	Determines one or more Snowflake-supported SQL commands. Rule: Must be written in a single line, with strings separated by one space only.
Load SQL File	Run SQL File	Defines the pathname of the file that contains Snowflake-supported SQL commands.
Statement Timeout	All Actions	Determines the maximum number of seconds to run the job in Snowflake.
Show More Options	All Actions	Determines whether the following job-defining attributes are displayed: Parameters Role Bindings Warehouse
Parameters	All Actions	Defines Snowflake-provided parameters, in JSON format, that let you control how data is presented, as shown in the following example: `{ "param1":"value1", "param2":"value2" }`
Role	All Actions	Determines the Snowflake role used for this Snowflake job. A role is an entity that can be assigned privileges on secure objects. You can be assigned one or more roles from a limited selection.
Bindings	All Actions	Defines the values, in JSON format, to bind to the variables used in the Snowflake job. The following JSON script defines two binding variables: `"1": { "type": "FIXED", "value": "123" }, "2": { "type": "TEXT", "value": "String" }` For more information on bindings, see the Snowflake documentation.
Warehouse	All Actions	Determines the warehouse used in the Snowflake job. A warehouse is a cluster of virtual machines that processes a Snowflake job.
Show Output	All Actions	Determines whether to show a full JSON response in the log output.
Status Polling Frequency	All Actions	Determines the number of seconds to wait before checking the job status. Default: 20
Query to Location	Copy from Query	Defines the cloud storage location.
Query Input	Copy from Query	Defines the query used to copy the data.
Storage Integration	Copy from Query Copy from Table Copy into Table	Defines the storage integration object, which stores an Identity and Access Management (IAM) entity and an optional set of blocked cloud storage locations.
Overwrite	Copy from Query Copy from Table	Determines whether to overwrite an existing file in the cloud storage, as follows: Yes No
File Format	Copy from Query Copy from Table	Determines the file format that the data is saved in, as follows: .csv .json
Copy Destination	Copy from Table	Defines where the JSON or CSV file is saved. You can save to Amazon Web Services, Google Cloud Platform, or Microsoft Azure. Example: s3://<bucket name>/
From Table	Copy from Table	Defines the name of the copied table.
Create Table Name	Create Table and Query	Defines the name of the new or existing table where the data is queried.
Query	Create Table and Query	Defines the query used for the copied data.
Snowpipe Name	Start or Pause Snowpipe Snowpipe Load Status	Defines the name of the Snowpipe. A Snowpipe loads data from files when they are ready or staged.
Table Name	Copy into Table	Defines the name of the table that the data is copied into.
From Location	Copy into Table	Defines the cloud storage location from which the data is copied, in CSV or JSON format. Example: s3://location-path/FileName.csv
Start or Pause Snowpipe	Start or Pause Snowpipe	Determines whether to start or pause the Snowpipe, as follows: Start Snowpipe Pause Snowpipe
Stored Procedure Name	Stored Procedure	Defines the name of the stored procedure.
Procedure Argument	Stored Procedure	Defines the value of the argument in the stored procedure.
Table Name	Snowpipe Load Status	Defines the table that is monitored when loaded by the Snowpipe.
Stage Location	Snowpipe Load Status	Defines the cloud storage location. A stage is a pointer that indicates where data is stored or staged. Example: s3://CloudStorageLocation/
Days Back	Snowpipe Load Status	Determines the number of days to monitor the Snowpipe load status.
Status File Cloud Location Path	Snowpipe Load Status	Defines the cloud storage location where a CSV file log is created. The CSV file log details the load status for each Snowpipe.
Storage Integration	Snowpipe Load Status	Defines the Snowflake configuration for the cloud storage location, defined in the previous attribute, Status File Cloud Location Path. Example: S3_INT
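
Two of the attributes above have formatting rules that are easy to apply in code: the Snowflake SQL Statement must be a single line with one space between strings, and Bindings map 1-based positions to a type and a string value. The following sketch shows both. The helper names are hypothetical, only the FIXED and TEXT types from the Bindings example are handled, and whether the Bindings field expects the enclosing braces is not stated in the table, so treat the output shape as an assumption.

```python
import json
from typing import Iterable, Union


def to_single_line(sql: str) -> str:
    """Collapse all whitespace runs (including newlines) into single spaces.

    Caveat: this also collapses whitespace inside quoted SQL literals.
    """
    return " ".join(sql.split())


def build_bindings(values: Iterable[Union[int, str]]) -> str:
    """Map positional values to {"<n>": {"type": ..., "value": ...}} as JSON."""
    bindings = {}
    for position, value in enumerate(values, start=1):
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, int):
            type_name = "FIXED"
        elif isinstance(value, str):
            type_name = "TEXT"
        else:
            raise TypeError(f"unsupported binding type: {type(value).__name__}")
        bindings[str(position)] = {"type": type_name, "value": str(value)}
    return json.dumps(bindings)


statement = to_single_line("""
    SELECT *
      FROM orders
     WHERE id = ? AND status = ?
""")
print(statement)
print(build_bindings([123, "String"]))
```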