Data Processing and Analytics Connection Profiles

The following topics describe connection profiles for data processing and analytics platforms and services:

ConnectionProfile:AWS Athena
ConnectionProfile:AWS Data Pipeline
ConnectionProfile:AWS DynamoDB
ConnectionProfile:AWS EMR
ConnectionProfile:Azure Databricks
ConnectionProfile:Azure HDInsight
ConnectionProfile:Azure Synapse
ConnectionProfile:Databricks
ConnectionProfile:DBT
ConnectionProfile:GCP BigQuery
ConnectionProfile:GCP DataFlow
ConnectionProfile:GCP Dataproc
ConnectionProfile:Hadoop
ConnectionProfile:OCI Data Flow
ConnectionProfile:Snowflake
ConnectionProfile:Snowflake IdP

ConnectionProfile:AWS Athena

AWS Athena enables you to process, analyze, and store your data in the cloud.

The following examples show how to define an AWS Athena connection profile.

This JSON-based connection profile authenticates with an AWS access key and secret:

"AWS_ATHENA":
{
   "Type": "ConnectionProfile:AWS Athena",
   "AWS API Base URL": "https://athena.us-east-1.amazonaws.com",
   "AWS Region": "us-east-1",
   "Authentication": "SECRET",
   "AWS Access Key": "ABCDEF",
   "AWS Secret Key": "******",
   "Description": "",
   "Centralized": true
}

This JSON-based connection profile authenticates with an AWS IAM role from inside an EC2 instance:

"AWS_ATHENA": 
{
   "Type": "ConnectionProfile:AWS Athena",
   "AWS API Base URL": "https://athena.us-east-1.amazonaws.com",
   "AWS Region": "us-east-1",
   "Authentication": "NOSECRET",
   "IAM Role": "ATHENAIAMROLE",
   "Description": "",
   "Centralized": true
}

The following table describes the AWS Athena connection profile parameters.

Parameter	Description
AWS API Base URL	Determines the authentication endpoint for AWS Athena, as follows: https://athena.<AWSRegion>.amazonaws.com. For more information about regional endpoints available for the AWS Athena service, refer to the AWS documentation.
AWS Region	Determines the region where the AWS Athena jobs are located.
Authentication	Determines one of the following authentication methods: SECRET: Authenticates with an AWS access key and secret. NOSECRET: Authenticates with an AWS IAM role from within the AWS infrastructure.
AWS Access Key	Defines the AWS Athena account access key.
AWS Secret Key	Defines the AWS Athena account secret access key. You can use Secrets in Code to hide this value in the code.
IAM Role	Defines the Identity and Access Management (IAM) role for the AWS Athena connection.
Connection Timeout	Determines the number of seconds to wait after Control-M initiates a connection request to AWS Athena before a timeout occurs. Default: 20
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
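
As a rough illustration of how the parameters in the table fit together, the following Python sketch assembles an AWS Athena connection profile for either authentication method and derives the base URL from the region. The function name athena_profile and its arguments are hypothetical helpers, not part of Control-M; only the parameter names and sample values come from the examples above.

```python
import json


def athena_profile(region, authentication, access_key=None, secret_key=None, iam_role=None):
    """Build an AWS Athena connection profile body as a dict (illustrative helper)."""
    profile = {
        "Type": "ConnectionProfile:AWS Athena",
        # Regional endpoint pattern from the table: https://athena.<AWSRegion>.amazonaws.com
        "AWS API Base URL": f"https://athena.{region}.amazonaws.com",
        "AWS Region": region,
        "Authentication": authentication,
    }
    if authentication == "SECRET":
        profile["AWS Access Key"] = access_key
        profile["AWS Secret Key"] = secret_key
    elif authentication == "NOSECRET":
        profile["IAM Role"] = iam_role
    else:
        raise ValueError(f"unknown Authentication value: {authentication!r}")
    profile["Description"] = ""
    profile["Centralized"] = True  # the table states that this must be true
    return profile


iam_profile = athena_profile("us-east-1", "NOSECRET", iam_role="ATHENAIAMROLE")
print(json.dumps({"AWS_ATHENA": iam_profile}, indent=3))
```

Deriving the URL from the region keeps the two values from drifting apart when a profile is copied to another region.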

ConnectionProfile:AWS Data Pipeline

AWS Data Pipeline is a cloud-based extract, transform, load (ETL) service that enables you to automate the transfer, processing, and storage of your data.

The following examples show how to define an AWS Data Pipeline connection profile.

This JSON-based connection profile authenticates with an AWS access key and secret:

"AWSDATAPIPELINESECRET":
{
   "Type": "ConnectionProfile:AWS Data Pipeline",
   "Authentication": "SECRET",
   "AWS Access Key": "MYAWSACCESSKEY1234",
   "AWS Secret": "myAwsSecret12345",
   "AWS Region": "us-east-1",
   "Data Pipeline URL": "https://datapipeline.{{AWSRegion}}.amazonaws.com",
   "Connection Timeout": "30",
   "Description": "",
   "Centralized": true
}

This JSON-based connection profile authenticates with an AWS IAM role from inside an EC2 instance:

"AWSDATAPIPELINEIAM":
{
   "Type": "ConnectionProfile:AWS Data Pipeline",
   "Authentication": "NOSECRET",
   "IAM Role": "IAMROLENAME",
   "AWS Region": "us-east-1",
   "Data Pipeline URL": "https://datapipeline.{{AWSRegion}}.amazonaws.com",
   "Connection Timeout": "30",
   "Description": "",
   "Centralized": true
}

The following table describes the AWS Data Pipeline connection profile parameters.

Parameter	Description
Authentication	Determines one of the following authentication methods for the connection with AWS Data Pipeline: SECRET: Authenticates with an AWS access key and secret. NOSECRET: Authenticates with an AWS IAM role from inside an EC2 instance.
AWS Access Key	(SECRET authentication) Defines the AWS Data Pipeline account access key.
AWS Secret	(SECRET authentication) Defines the AWS Data Pipeline account secret access key. You can use Secrets in Code to hide this value in the code.
IAM Role	(NOSECRET authentication) Defines the Identity and Access Management (IAM) role for connection to AWS.
AWS Region	Determines the region where the AWS Data Pipeline jobs are located.
Data Pipeline URL	Defines the REST API URL for the AWS Data Pipeline regional endpoint, as follows: https://datapipeline.<AWS_Region>.amazonaws.com. For more information about regional endpoints available for the AWS Data Pipeline service, refer to the AWS documentation.
Connection Timeout	Determines the number of seconds to wait before a timeout occurs after Control-M initiates a request to AWS Data Pipeline. Default: 30
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
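
The table labels some parameters as belonging to one authentication method only. A minimal Python sketch of a pre-deployment check based on those labels follows. The helper missing_parameters is hypothetical, and treating the labeled parameters as mandatory for their method is an assumption that the table implies but does not state outright.

```python
# Parameters that the table marks as specific to one authentication method.
REQUIRED_BY_AUTH = {
    "SECRET": ["AWS Access Key", "AWS Secret"],
    "NOSECRET": ["IAM Role"],
}


def missing_parameters(profile):
    """Return the method-specific parameters that are absent or empty in a profile dict."""
    method = profile.get("Authentication")
    if method not in REQUIRED_BY_AUTH:
        raise ValueError(f"unknown Authentication value: {method!r}")
    return [name for name in REQUIRED_BY_AUTH[method] if not profile.get(name)]


iam_profile = {
    "Type": "ConnectionProfile:AWS Data Pipeline",
    "Authentication": "NOSECRET",
    "IAM Role": "IAMROLENAME",
    "AWS Region": "us-east-1",
    "Centralized": True,
}
secret_profile = {
    "Type": "ConnectionProfile:AWS Data Pipeline",
    "Authentication": "SECRET",
    "AWS Access Key": "MYAWSACCESSKEY1234",
    "AWS Region": "us-east-1",
    "Centralized": True,
}
print(missing_parameters(iam_profile))     # []
print(missing_parameters(secret_profile))  # ['AWS Secret']
```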

ConnectionProfile:AWS DynamoDB

AWS DynamoDB is a NoSQL database service that enables you to create database tables, execute statements and transactions, and export and import data to and from the Amazon S3 storage service.

For more information, see Control-M for AWS DynamoDB.

The following examples show how to define an AWS DynamoDB connection profile.

This JSON-based connection profile authenticates with an AWS access key and secret:

"AWS_DynamoDB": 
{
   "Type": "ConnectionProfile:AWS DynamoDB",
   "Authentication": "Secret",
   "AWS Secret": "*****",
   "AWS Region": "us-east-1",
   "AWS Access Key": "ZKIATY7B2LKB2JQ85I6D",
   "AWS Backup URL": "https://dynamodb.{{AWSRegion}}.amazonaws.com",
   "Description": "",
   "Connection Timeout": "100",
   "Centralized": true
}

This JSON-based connection profile authenticates with an AWS IAM role:

"AWS_ADY_IAM": 
{
   "Type": "ConnectionProfile:AWS DynamoDB",
   "Authentication": "NoSecret",
   "AWS Region": "us-east-1",
   "AWS Backup URL": "https://dynamodb.{{AWSRegion}}.amazonaws.com",
   "IAM Role": "arn:aws:iam::122343212345:role/Amazon12SSMRoleForInstancesQuickSetup",
   "Description": "",
   "Connection Timeout": "50",
   "Centralized": true
}

The following table describes the AWS DynamoDB connection profile parameters.

Parameter	Description
AWS DynamoDB Login URL	Determines the AWS DynamoDB authentication endpoint base URL, which includes the region that is defined for the AWS account. Example: https://dynamodb.us-east-1.amazonaws.com
AWS Region	Determines the region where the AWS DynamoDB jobs are located. Example: us-east-1
Authentication	Determines one of the following authentication methods: Secret: Authenticates with an AWS access key and secret. NoSecret: Authenticates with an AWS IAM role from inside an EC2 instance.
AWS Access Key	Defines the AWS DynamoDB account access key.
AWS Secret	Defines the AWS DynamoDB account secret access key. You can use Secrets in Code to hide this value in the code.
IAM Role	Defines the Identity and Access Management (IAM) role for the AWS DynamoDB connection.
Connection Timeout	Determines the number of seconds to wait after Control-M initiates a connection request to AWS DynamoDB before a timeout occurs. Default: 20
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
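
The DynamoDB examples write the endpoint with a {{AWSRegion}} placeholder, while the table shows the URL with the region filled in. The source does not describe how the placeholder is resolved. Assuming it stands for the value of the AWS Region parameter, the expansion can be sketched in Python as follows; the function expand_region is a hypothetical helper for illustration only.

```python
def expand_region(url_template, region):
    """Replace the {{AWSRegion}} placeholder with a concrete region name."""
    return url_template.replace("{{AWSRegion}}", region)


template = "https://dynamodb.{{AWSRegion}}.amazonaws.com"
endpoint = expand_region(template, "us-east-1")
print(endpoint)
```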

ConnectionProfile:AWS EMR

AWS EMR is a managed cluster platform that enables you to execute big data frameworks, such as Apache Hadoop and Apache Spark, to process and analyze vast amounts of data.

The following example shows how to define an AWS EMR connection profile:

"AWS_EMR":
{
   "Type": "ConnectionProfile:AWS EMR",
   "AWSRegion": "us-east-1",
   "EMRAccessKey": "ABCDEF",
   "EMRSecretKey": "****",
   "Description": "",
   "Centralized": true
}

The following table describes the AWS EMR connection profile parameters.

Parameter	Description
AWSRegion	Determines the AWS region.
EMRAccessKey	Defines the token for the connection to AWS.
EMRSecretKey	Defines an additional security token for AWS. You can use Secrets in Code to hide this value in the code.
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:Azure Databricks

Azure Databricks is a cloud-based data analytics platform that enables you to process and analyze large workloads of data.

The following example shows how to define an Azure Databricks connection profile:

"ADF_SERVPRINC":
{
   "Type": "ConnectionProfile:Azure Databricks",
   "Tenant ID": "tenantId",
   "Application ID": "4f477fa3-1a1g-4877-ca92-f39bb563f3b1",
   "Client Secret": "*****", 
   "Databricks url": "https://adb-1111211144444680.0.azuredatabricks.net",
   "Azure Login url": "https://login.microsoftonline.com",
   "Connection Timeout": "50",
   "Description": "",
   "Centralized": true
}

The following table describes the Azure Databricks connection profile parameters.

Parameter	Description
Tenant ID	Defines the Azure Tenant ID in Azure AD.
Application ID	Defines the application (service principal) ID of the registered application. The service principal must meet the following requirements: It must be an Azure Databricks workspace user and admin. In the Databricks Admin Console, it must appear under users and also under admins. It must be associated with a Contributor or Owner role.
Client Secret	Defines the client secret (password) associated with the Azure user and the application. You can use Secrets in Code to hide this value in the code.
Databricks url	Defines the URL of your Databricks workspace.
Azure Login url	Defines the Azure AD authentication endpoint base URL.
Connection Timeout	Defines the timeout value, in seconds, for the trigger call made by Control-M to Azure Databricks. Default: 50
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:Azure HDInsight

Azure HDInsight enables you to execute an Apache Spark batch job and perform big data analytics.

The following example shows how to define an Azure HDInsight connection profile:

"AZUREHDINSIGHT":
{
   "Type": "ConnectionProfile:Azure HDInsight",     
   "Cluster Name": "hdcluster", 
   "Cluster Username": "admin",  
   "Cluster Password": "*****", 
   "Description": "",
   "Centralized": true
}

The following table describes the Azure HDInsight connection profile parameters.

Parameter	Description
Cluster Name	Defines the name of the HDInsight cluster to connect to.
Cluster Username	Defines the name of the Administrator to use to connect to Azure HDInsight.
Cluster Password	Defines a password for the Administrator, as configured in Azure HDInsight. You can use Secrets in Code to hide this value in the code.
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:Azure Synapse

Azure Synapse enables you to perform data integration and big data analytics.

The following examples show how to define an Azure Synapse connection profile.

This JSON-based connection profile authenticates with an Azure service principal:

"AZURE_SYNAPSE_1":
{
   "Type": "ConnectionProfile:Azure Synapse",
   "Authentication Method": "PRINCIPAL",
   "Tenant ID": "tenantId",
   "Azure AD url": "https://login.microsoftonline.com",
   "Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
   "Synapse Resource": "https://dev.azuresynapse.net/",
   "App ID": "4f477fa3-1a1g-4877-ca92-f39bb563f3b1",
   "Client Secret": "*****",
   "Connection Timeout": "50",
   "Description": "",
   "Centralized": true
}

Managed identity authentication is based on an Azure token that is valid, by default, for 24 hours. Token lifetime can be extended by Azure.

This JSON-based connection profile authenticates with a managed identity:

"AZURE_SYNAPSE_2":
{
   "Type": "ConnectionProfile:Azure Synapse",
   "Authentication Method": "MANAGEDID",
   "Specify Managed Identity Client ID": "&client_id=",
   "Managed Identity Client ID": "72d448f0-ac32-45ea-9158-f8653e4ee16",
   "Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
   "Synapse Resource": "https://dev.azuresynapse.net/",
   "Connection Timeout": "50",
   "Description": "",
   "Centralized": true
}

The following table describes the Azure Synapse connection profile parameters.

Parameter	Description
Authentication Method	Defines one of the following types of authentication to use for the connection with Azure Synapse analytics: PRINCIPAL: Authenticates with a service principal. MANAGEDID: Authenticates with managed identity. To prepare for authentication with each of these methods: Grant your managed identity or service principal access to your Synapse workspace through the Synapse Studio (Manage > Access Control). Assign a contributor role to the Synapse workspace accessed by the managed identity or service principal.
Specify Managed Identity Client ID	(Managed identity) Determines whether the client ID for your managed identity is specified by the Managed Identity Client ID parameter. Include this parameter only if you are using the managed identity authentication method and you have multiple managed identities defined on your Azure virtual machine. Set its value to &client_id=.
Managed Identity Client ID	(Managed identity) Determines which client ID to use as the managed identity. This parameter requires a value only if you have multiple managed identities defined on your Azure virtual machine and you included the Specify Managed Identity Client ID parameter. If you have only one managed identity, it is detected automatically.
Tenant ID	(Service principal authentication) Defines the Azure Tenant ID in Azure AD.
Azure AD url	(Service principal authentication) Defines the Azure AD authentication endpoint base URL.
Synapse url	Defines the workspace development endpoint. Example: https://myworkspace.dev.azuresynapse.net
Synapse Resource	Defines the resource parameter that serves as the identifier for the Azure Synapse login via Azure AD, as follows: https://dev.azuresynapse.net/
App ID	Defines the application (service principal) ID of the registered application for the Azure Synapse service.
Client Secret	(Service principal authentication) Defines the client secret (password) associated with the Azure user and the application. You can use Secrets in Code to hide this value in the code.
Connection Timeout	Defines a timeout value, in seconds, for the trigger call made by Control-M to Azure Synapse Analytics. Default: 50
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
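
The two managed identity parameters described in the table are meant to be used together. The following Python sketch checks that pairing in a profile represented as a dict. The function check_managed_identity is a hypothetical helper, and reporting a client ID without the accompanying Specify Managed Identity Client ID parameter as a problem is an assumption drawn from the table's wording.

```python
def check_managed_identity(profile):
    """Return a list of problems found in a MANAGEDID Azure Synapse profile dict."""
    problems = []
    if profile.get("Authentication Method") != "MANAGEDID":
        return problems  # nothing to check for service principal profiles
    has_flag = "Specify Managed Identity Client ID" in profile
    has_id = bool(profile.get("Managed Identity Client ID"))
    if has_flag and profile["Specify Managed Identity Client ID"] != "&client_id=":
        problems.append("Specify Managed Identity Client ID should be '&client_id='")
    if has_flag and not has_id:
        problems.append("Managed Identity Client ID is missing")
    if has_id and not has_flag:
        problems.append("Specify Managed Identity Client ID is missing")
    return problems


# One managed identity on the virtual machine: neither parameter is needed.
single_identity = {"Authentication Method": "MANAGEDID"}
# Several managed identities: the flag is present but the client ID was forgotten.
multi_identity = {
    "Authentication Method": "MANAGEDID",
    "Specify Managed Identity Client ID": "&client_id=",
}
print(check_managed_identity(single_identity))  # []
print(check_managed_identity(multi_identity))   # ['Managed Identity Client ID is missing']
```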

ConnectionProfile:Databricks

Databricks enables the integration of jobs created in the Databricks environment with your existing Control-M workflows.

The following example shows how to define a Databricks connection profile:

"DATABRICKS":
{
   "Type": "ConnectionProfile:Databricks",
   "Databricks workspace url": "https://dbc-7b944b32-faf0.cloud.databricks.com",
   "Databricks personal access token": "*****", 
   "Connection Timeout": "50",
   "Description": "",
   "Centralized": true
}

The following table describes the Databricks connection profile parameters.

Parameter	Description
Databricks workspace url	Defines the URL of your Databricks workspace.
Databricks personal access token	Defines a Databricks token for authentication of connections to the Databricks workspace. You can use Secrets in Code to hide this value in the code.
Connection Timeout	Defines the timeout value, in seconds, for the REST calls made to Databricks. Default: 50
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:DBT

dbt (Data Build Tool) is a cloud-based computing platform that enables you to develop, test, schedule, document, and analyze data models.

The following example shows how to define a dbt connection profile:

"DBT_CP":
{
   "Type": "ConnectionProfile:DBT",
   "DBT URL": "https://cloud.getdbt.com", 
   "DBT Token": "*****",
   "Account ID": "123456",
   "Connection Timeout": "60",
   "Description": "",
   "Centralized": true
}

The following table describes the dbt connection profile parameters.

Parameter	Description
DBT URL	Defines the dbt authentication endpoint, as follows: https://cloud.getdbt.com
DBT Token	Defines the authentication code that is used to create a connection to the dbt platform. This code is located in the API Access section in the dbt cloud platform. You can use Secrets in Code to hide this value in the code.
Account ID	Defines the unique ID that is assigned to your dbt cloud account. This ID is located in the Account Info section in the dbt cloud platform.
Connection Timeout	Determines the number of seconds to wait after Control-M initiates a connection request to dbt before a timeout occurs. Default: 60
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:GCP BigQuery

GCP BigQuery is a Google Cloud Platform computing service that you can use for data storage, processing, and analysis.

The following example shows how to define a GCP BigQuery connection profile that authenticates with a service account:

"BIGQSA":
{
   "Type": "ConnectionProfile:GCP BigQuery",
   "Identity Type": "service_account",
   "Service Account Key": "*****",
   "BigQuery URL": "https://bigquery.googleapis.com",
   "Description": "",
   "Centralized": true
}

The following table describes the GCP BigQuery connection profile parameters.

Parameter	Description
Identity Type	Determines one of the following authentication types using GCP Access Control: service_account: Authenticates with an application ID (service account) and client secret. iam_user: Authenticates based on a detected IAM role, which removes the need to provide additional credentials.
BigQuery URL	Defines the Google Cloud Platform (GCP) authentication endpoint for BigQuery, as follows: https://bigquery.googleapis.com
Service Account Key	(Service account) Defines a service account that is associated with an RSA key pair. You can use Secrets in Code to hide this value in the code.
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:GCP DataFlow

Google Cloud Platform (GCP) Dataflow enables you to perform cloud-based data processing for batch and real-time, data-streaming applications.

The following example shows how to define a GCP Dataflow connection profile, based on a service account:

"GCPDATAFLOW":
{ 
   "Type": "ConnectionProfile:GCP DataFlow", 
   "Identity Type": "service_account", 
   "DataFlow URL": "https://dataflow.googleapis.com", 
   "Service Account Key": "{\"type\":\"service_account\",\"project_id\":\"sso-gcp-dba-ctm1-priv-cc30752\",\"private_key_id\":\"5197d05c5b8212bea944985cec74a34d6c1868aa\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"bmc-wla-svc-02@sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\",\"client_id\":\"116650586827623521335\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\",  \"auth_provider_x509_cert_url\":\"https://www.googleapis.com/oauth2/v1/certs\",\"client_x509_cert_url\":\"https://www.googleapis.com/robot/v1/metadata/x509/bmc-wla-svc-02%40sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\"}",
   "Description": "", 
   "Centralized": true 
}

The following table describes the GCP Dataflow connection profile parameters.

Parameter	Description
Identity Type	Determines one of the following authentication types using GCP Access Control: service_account: Authenticates with an application ID (service account) and client secret. os_user: Authenticates based on a detected IAM role, which removes the need to provide additional credentials.
DataFlow URL	Defines the Google Cloud Platform (GCP) authentication endpoint for Dataflow. https://dataflow.googleapis.com
Service Account Key	(Service account) Defines a JSON body that contains the required service account credentials to access GCP. You can use Secrets in Code to hide this value in the code.
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
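
In the example above, the Service Account Key value is a JSON body stored as a string, which is why its quotation marks appear as \" and its newlines as \\n. A minimal Python sketch of how such a value can be produced with the standard json module follows. The credentials, project name, and email address are placeholders invented for illustration.

```python
import json

# Placeholder credentials; a key file downloaded from GCP has more fields.
service_account = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
    "client_email": "my-sa@my-project.iam.gserviceaccount.com",
}

profile = {
    "GCPDATAFLOW": {
        "Type": "ConnectionProfile:GCP DataFlow",
        "Identity Type": "service_account",
        "DataFlow URL": "https://dataflow.googleapis.com",
        # The key is stored as a string, so the JSON body is serialized first.
        "Service Account Key": json.dumps(service_account, separators=(",", ":")),
        "Description": "",
        "Centralized": True,
    }
}

# Serializing the outer object escapes the inner quotes and backslashes.
text = json.dumps(profile, indent=3)
print(text)

# Round trip: the inner string parses back to the original credentials.
restored = json.loads(json.loads(text)["GCPDATAFLOW"]["Service Account Key"])
```

Serializing twice avoids escaping the key by hand, which is where stray or missing backslashes usually creep in.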

ConnectionProfile:GCP Dataproc

Google Cloud Platform (GCP) Dataproc enables you to perform cloud-based big data processing and machine learning.

The following example shows how to define a GCP Dataproc connection profile that authenticates with a service account:

"GCPDATAPROC":
{ 
   "Type": "ConnectionProfile:GCP Dataproc", 
   "Identity Type": "service_account", 
   "Dataproc URL": "https://dataproc.googleapis.com", 
   "Service Account Key": "{\"type\":\"service_account\",\"project_id\":\"sso-gcp-dba-ctm1-priv-cc30752\",\"private_key_id\":\"5197d05c5b8212bea944985cec74a34d6c1868aa\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"bmc-wla-svc-02@sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\",\"client_id\":\"116650586827623521335\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\",  \"auth_provider_x509_cert_url\":\"https://www.googleapis.com/oauth2/v1/certs\",\"client_x509_cert_url\":\"https://www.googleapis.com/robot/v1/metadata/x509/bmc-wla-svc-02%40sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\"}",
   "Connection timeout": "20",
   "Description": "", 
   "Centralized": true 
}

The following table describes the GCP Dataproc connection profile parameters.

Parameter	Description
Identity Type	Determines one of the following authentication types using GCP Access Control: service_account: Authenticates with an application ID (service account) and client secret. os_user: Authenticates based on a detected IAM role, which removes the need to provide additional credentials.
Dataproc URL	Defines the Google Cloud Platform (GCP) authentication endpoint for Dataproc. https://dataproc.googleapis.com
Service Account Key	(Service account) Defines a JSON body that contains the required service account credentials to access GCP. You can use Secrets in Code to hide this value in the code.
Connection timeout	Defines a timeout value, in seconds, for the trigger call to the Google Cloud Platform. Default: 20
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

ConnectionProfile:Hadoop

The Hadoop job connects to the Hadoop framework, which enables you to split up and process large data sets on clusters of commodity servers. You can expand your enterprise business workflows to include tasks that execute in your Big Data Hadoop cluster in Control-M with the different Hadoop-supported tools, including Pig, Hive, HDFS File Watcher, Map Reduce Jobs, and Sqoop.

The following examples show how to define a Hadoop connection profile for various types of Hadoop jobs.

Hadoop (All Types)

The following example defines the parameters required for all Hadoop types:

"HADOOP_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true
}

The following table describes the Hadoop connection profile parameters.

Parameter	Description
Centralized	Defines whether to store the connection profile in a centralized location in the Control-M database, so that it is available to all Agents. You must set this parameter to true.

The following example defines the optional parameters for defining the user running the Hadoop job types and choosing between a local or centralized connection profile:

"HADOOP_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "RunAs": "",
   "KeyTabPath": ""
}

The following table describes the Hadoop connection profile parameters shown in the example above.

Parameter	Description
RunAs	Defines the user of the account on which to run Hadoop jobs. Leave this field empty to run Hadoop jobs with the user account where the Agent was installed. The Agent must run as root if you define a specific RunAs user.

The following table describes the parameters that control security when Kerberos security is used.

Parameter	Description
RunAs	Defines the Principal name of the user.
KeyTabPath	Defines the Keytab file path for the target user.

Apache Spark

The Apache Spark connection profile enables you to define access to a Spark server.

The following example defines an Apache Spark connection profile:

"SPARK_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Spark":
   {
      "CustomPath": "/home"
   }
}

The CustomPath parameter is optional.

Apache Oozie

Apache Oozie enables you to access an Oozie server for a job that submits an Oozie workflow.

The following example defines an Apache Oozie connection profile:

"OOZIE_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Oozie":
   {
      "SslEnabled": false,
      "Host": "hdp-centos",
      "Port": "11000",
      "ExtractionRules": [
      {
         "RuleName": "rule_name1",
         "WorkFlowName": "work_flow_name1",
         "WorkFlowUserName": "work_flow_user_name1",
         "FolderName": "folder_name1",
         "JobName": "job_name1"
      },
      {
         "RuleName": "rule_name2",
         "WorkFlowName": "work_flow_name2",
         "WorkFlowUserName": "work_flow_user_name2",
         "FolderName": "folder_name2",
         "JobName": "job_name2"
      } ]
   }
}

The following table describes the Apache Oozie connection profile parameters.

Parameter	Description
Host	Defines the Oozie server host.
Port	Defines the Oozie server port. Default: 11000
SslEnabled	Determines whether SSL is enabled for the connection to the Oozie server. Valid Values: true | false. Default: false
ExtractionRules	(Optional) Defines the rules for filtering Oozie workflows. Each rule has the following definitions:
RuleName	Defines the name of the rule.
WorkFlowName	Defines the name of the Oozie workflow to get from the Oozie server.
WorkFlowUserName	Defines the name of the user that runs the workflows from the Oozie server.
FolderName	Defines the name of the folder that contains the Hadoop job of the Oozie Extractor, as defined in the Hadoop job template.
JobName	Defines the name of the Hadoop job of the Oozie Extractor, as defined in the Hadoop job template.
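
Each extraction rule carries the five definitions listed in the table. The following Python sketch parses a profile fragment and reports rules that lack any of them. Two points are assumptions: the fragment is wrapped in braces only so that a standard JSON parser accepts it, and treating all five definitions as mandatory is inferred from the table rather than stated by it. The helper incomplete_rules is hypothetical.

```python
import json

RULE_KEYS = {"RuleName", "WorkFlowName", "WorkFlowUserName", "FolderName", "JobName"}

# A profile fragment is a name/value pair, so it is wrapped in braces before parsing.
fragment = '''
"OOZIE_CONNECTION_PROFILE": {
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Oozie": {
      "SslEnabled": false,
      "Host": "hdp-centos",
      "Port": "11000",
      "ExtractionRules": [
         {"RuleName": "rule_name1", "WorkFlowName": "work_flow_name1",
          "WorkFlowUserName": "work_flow_user_name1", "FolderName": "folder_name1",
          "JobName": "job_name1"},
         {"RuleName": "rule_name2", "WorkFlowName": "work_flow_name2"}
      ]
   }
}
'''


def incomplete_rules(oozie):
    """Map each rule name to the sorted list of definitions it lacks."""
    result = {}
    for rule in oozie.get("ExtractionRules", []):  # ExtractionRules is optional
        missing = sorted(RULE_KEYS - set(rule))
        if missing:
            result[rule.get("RuleName", "<unnamed>")] = missing
    return result


profiles = json.loads("{" + fragment + "}")
report = incomplete_rules(profiles["OOZIE_CONNECTION_PROFILE"]["Oozie"])
print(report)
```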

Apache Sqoop

Apache Sqoop enables you to run a Sqoop job.

The following example shows a connection profile that defines a Sqoop data source and access credentials:

"SQOOP_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Sqoop":
   {
      "User": "username",
      "Password": "userpassword",
      "ConnectionString": "jdbc:mysql://mysql.server/database",
      "DriverClass": "com.mysql.jdbc.Driver"
   }
}

The following table describes the Sqoop connection profile parameters shown in the example above, as well as several additional optional parameters.

Parameter	Description
User	Defines a database user connected to the Sqoop server.
Password	Defines a password for the specified user. To update an existing connection profile and keep the current password, type five asterisks (*), as follows: *****
ConnectionString	JDBC-compliant database: Defines the connection string used to connect to the database.
DriverClass	JDBC-compliant database: Defines the driver class for the driver .jar file, which indicates the entry-point to the driver.
PasswordFile	(Optional) Defines the full path to a file located on the HDFS that contains the password to the database. To use a JCEKS file, include the .jceks file extension.
DatabaseVendor	(Optional) Defines the database vendor of an automatically supported database used with Sqoop, one of the following: MySQL Oracle (SID) Oracle (Service Name) PostgreSQL
DatabaseName	(Optional) Defines the name of an automatically supported database used with Sqoop.
DatabaseHost	(Optional) Defines the host server of an automatically supported database used with Sqoop.
DatabasePort	(Optional) Defines the port number for an automatically supported database used with Sqoop.

Apache Tajo

Tajo is an advanced data warehousing system on top of HDFS.

The following example shows a connection profile that defines access to a Tajo server:

"TAJO_CP":
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Tajo":
   {
      "BinaryPath": "$TAJO_HOME/bin/",
      "DatabaseName": "myTajoDB",
      "MasterServerName": "myTajoServer",
      "MasterServerPort": "26001"
   }
}

The following table describes the Tajo connection profile parameters.

Parameter	Description
BinaryPath	Defines the path to the bin directory where the tsql utility is located.
DatabaseName	Defines the name of the Tajo database.
MasterServerName	Defines the host name of the server where the Tajo master is running.
MasterServerPort	Defines the Tajo master port number.

Apache Hive

Apache Hive enables you to run a Hive BeeLine job.

The following examples show how to define a Hadoop Hive connection profile.

This JSON defines a connection profile with a Hive BeeLine endpoint and access credentials:

The parameters in the example translate to the following BeeLine command:

beeline -u jdbc:hive2://<Host>:<Port>/<DatabaseName>

"HIVE_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Hive":
   {
      "Host": "hive_host_name",
      "Port": "10000",
      "DatabaseName": "hive_database"
   }
}

This JSON defines a connection profile with optional parameters for a Hadoop Hive type connection profile:

The parameters in the example translate to the following BeeLine command:

beeline -u jdbc:hive2://<Host>:<Port>/<DatabaseName>;principal=<Principal> -n <User> -p <Password>

"HIVE_CONNECTION_PROFILE1":
{
   "Type": "ConnectionProfile:Hadoop",
   "Centralized": true,
   "Hive":
   {
      "Host": "hive_host_name",
      "Port": "10000",
      "DatabaseName": "hive_database",
      "User": "user_name",
      "Password": "user_password",
      "Principal": "Server_Principal_of_HiveServer2@Realm"
   }
}

To update an existing connection profile and keep the current password, type five asterisks (*), as follows:

*****
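
As a minimal sketch, the following Python snippet generates an updated Hive connection profile that keeps the stored password. It assumes that the five asterisks are supplied as the value of the Password parameter. The constant name KEEP_CURRENT_PASSWORD is hypothetical and exists only for readability.

```python
import json

# Five asterisks: assumed here to be the Password value that keeps the stored password.
KEEP_CURRENT_PASSWORD = "*****"

profile_update = {
    "HIVE_CONNECTION_PROFILE1": {
        "Type": "ConnectionProfile:Hadoop",
        "Centralized": True,
        "Hive": {
            "Host": "hive_host_name",
            "Port": "10000",
            "DatabaseName": "hive_database",
            "User": "user_name",
            "Password": KEEP_CURRENT_PASSWORD,
        },
    }
}

# Serialize the profile to JSON with the same 3-space indentation as the samples.
print(json.dumps(profile_update, indent=3))
```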

ConnectionProfile:OCI Data Flow

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets.

The following examples show how to define an OCI Data Flow connection profile.

This JSON-based connection profile authenticates with defined parameters:

"OCI_DATAFLOW": 
{
   "Type": "ConnectionProfile:OCI Data Flow",
   "OCI Data Flow URL": "https://dataflow.region.oci.oraclecloud.com",
   "OCI Region": "us-phoenix-1",
   "Authentication": "DefineParameters",
   "User OCID": "ocid1.user.oc1..aaaaaaaasxjplkxcnplaxxxxyutitixxxxxxxxxxxxxxxxxxxxxxxxxxxx",
   "Tenancy OCID": "ocid1.tenancy.oc1..aaaaaaaak4xxxhtutyuyxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",   
   "Fingerprint": "dc:50:2d:e1:ax:af:x7:x6:xe:x9:ax:cx:cb:x3:x3:6x",
   "Private Key": " ----BEGIN PRIVATE KEY---- XXXXXXXXXX ----END PRIVATE KEY---- ",
   "Connection Timeout": "30",
   "Description": "",
   "Centralized": true
}

This JSON-based connection profile authenticates with a configuration file:

"OCI_DATAFLOW_CF":
{
   "Type": "ConnectionProfile:OCI Data Flow",
   "OCI Data Flow URL": "https://dataflow.region.oci.oraclecloud.com",
   "Authentication": "ConfigurationFile",
   "Config File Path": "/home/dbauser/config.example",
   "Profile": "Default",
   "Connection Timeout": "30",
   "Description": "",
   "Centralized": true
}

The following table describes the OCI Data Flow connection profile parameters.

Parameter	Authentication Method	Description
OCI Data Flow URL	All methods	Defines the OCI Data Flow URL in the following format: https://dataflow.<region>.oci.oraclecloud.com/20200129
OCI Region	All methods	Determines the region where OCI Data Flow is located. Examples: ap-melbourne-1, eu-madrid-1
Authentication	NA	Determines one of the following authentication methods: DefineParameters: Uses the authentication parameters defined in the connection profile. ConfigurationFile: Uses a configuration file that contains authentication information and is stored on the Control-M/Agent. An example configuration file follows this table.
User OCID	Defined Parameters	Defines an individual user within the OCI environment.
Tenancy OCID	Defined Parameters	Defines the OCI Tenancy ID in OCI Data Flow, which is a globally unique identifier for this account within the OCI environment.
Fingerprint	Defined Parameters	Defines a fingerprint that uniquely identifies and verifies the integrity of the associated certificate or key.
Private Key	Defined Parameters	Defines the private key, a critical component of the set of API signing keys that are used for authentication and secure access to OCI resources. You can use Secrets in Code to hide this value in the code.
Config File Path	Configuration File	Defines the path to the configuration file that contains authentication information. This file is stored on the Control-M/Agent. Examples: UNIX: home/user1/config/pem.pem Windows: C:\Users\user1\config\pem.pem
Profile	Configuration File	Defines the name of a specific section in the configuration file, such as DEFAULT or PROFILE2 in the Configuration File code sample.
Connection Timeout	All methods	Determines the number of seconds to wait after Control-M initiates a connection request to OCI Data Flow before a timeout occurs. Default: 30
Centralized	All methods	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.

The following example of a configuration file defines the DEFAULT profile for Linux and the PROFILE2 profile for Windows:

[DEFAULT]
user=ocid1.user.oc1..aaaaaaaa4vcihdfhrdtyry457245636cqqcljd6yrcukszg7gzoymoyvkyupivpjfnq
tenancy=ocid1.tenancy.oc1..aaa456y4e3yrtyue9f8djfihhwp2cu4e6t2b7lttna7rcgnhrdi4qzika
fingerprint=9f:af:df:f5:5g:95:92:7c:34:ab:46:d3:b4:30:e6:9e
region=us-phoenix-1
key_file=/home/dbauser/key.pem

[PROFILE2]
user=ocid1.user.oc1..aaaaaaaa4v768679dfhrd8989JHGJG36cqqcljd6yrcukszg7gzoymoyvkyupivpjfnq
tenancy=ocid1.tenancy.oc1..aaa456y4e3yrtyue987erum,gfwp2cu4e6t2b7lttna7rcgnhrdi4qzika
fingerprint=9f:af:c0:f5:7b:95:92:7c:03:a5:46:g3:b4:38:e6:9e
region=us-phoenix-1
key_file=C:\\Users\\dbauser\\key.pem
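
Before you reference a configuration file in a connection profile, it can help to confirm that the section named in the Profile parameter exists in the file. The following Python sketch uses the standard configparser module to read a configuration file in the format shown in this section, and builds the OCI Data Flow URL from the region in the selected profile. The function names load_profile and data_flow_url and all sample values are hypothetical. The sketch does not reflect how Control-M itself reads the file.

```python
import configparser

# Hypothetical sample in the same layout as the configuration file in this section.
SAMPLE = """\
[DEFAULT]
user=ocid1.user.oc1..exampleuser1
tenancy=ocid1.tenancy.oc1..exampletenancy1
fingerprint=00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff
region=us-phoenix-1
key_file=/home/dbauser/key.pem

[PROFILE2]
user=ocid1.user.oc1..exampleuser2
tenancy=ocid1.tenancy.oc1..exampletenancy2
fingerprint=ff:ee:dd:cc:bb:aa:99:88:77:66:55:44:33:22:11:00
region=eu-madrid-1
key_file=/home/dbauser/key2.pem
"""


def load_profile(config_text: str, profile: str) -> dict:
    """Return the key/value pairs of one profile section, or raise KeyError."""
    parser = configparser.ConfigParser(
        default_section="__unused__",  # treat [DEFAULT] as an ordinary section
        delimiters=("=",),             # fingerprints contain ':' so split on '=' only
        interpolation=None,
    )
    parser.read_string(config_text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found; available: {parser.sections()}")
    return dict(parser[profile])


def data_flow_url(region: str) -> str:
    """Build the OCI Data Flow URL in the format given in the parameter table."""
    return f"https://dataflow.{region}.oci.oraclecloud.com/20200129"


settings = load_profile(SAMPLE, "PROFILE2")
print(data_flow_url(settings["region"]))
# prints: https://dataflow.eu-madrid-1.oci.oraclecloud.com/20200129
```

Section names are matched case-sensitively in this sketch, so DEFAULT and Default are treated as different profiles.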

ConnectionProfile:Snowflake

Snowflake is a cloud-computing platform that enables you to process, analyze, and store your data.

The following example shows how to define a Snowflake connection profile:

This connection profile uses token-based authentication. To authenticate with an Identity Provider (IdP), see ConnectionProfile:Snowflake IdP.

"SNOWFLAKE_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Snowflake",
   "Account Identifier": "{Account_ID}",
   "Region": "us-east-1",
   "Client ID": "DuHj****************",
   "Client Secret": "*****",
   "Refresh Token": "ver%******************",
   "Redirect URI": "https%****************",
   "Description": "",
   "Centralized": true
}

The following table describes the Snowflake connection profile parameters.

Parameter	Description
Account Identifier	Defines the Snowflake account identifier. To obtain this string, run the Describe Security Integration command in Snowflake and copy the initial string from one of the authorization properties. For example, if OAUTH_AUTHORIZATION_ENDPOINT has the value https://abc123.us-east-1.snowflakecomputing.com/oauth/authorize, the account identifier is abc123. For more information about obtaining values for the parameters required by the connection profile, see Setting Up a Snowflake API Connection.
Region	Determines the region where the Snowflake jobs are located. Example: us-east-1
Client ID	Defines the client ID assigned to the account in the Snowflake integration setup.
Client Secret	Defines the client secret assigned to the account in the Snowflake integration setup. You can use Secrets in Code to hide this value in the code.
Refresh Token	Defines the value for the refresh token. This string must be URL-encoded.
Redirect URI	Defines the redirect URI assigned to the account in the Snowflake integration setup. This string must be URL-encoded.
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
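
Two of the steps described in the table can be sketched with the Python standard library: extracting the account identifier from an authorization endpoint URL, and URL-encoding a value such as the redirect URI. The function names account_identifier and url_encode and the redirect URI https://example.com/callback are hypothetical. The sketch assumes that the account identifier is the first label of the endpoint host name, as in the abc123 example.

```python
from urllib.parse import quote, urlparse


def account_identifier(endpoint_url: str) -> str:
    """Return the first label of the endpoint host name, for example 'abc123'."""
    host = urlparse(endpoint_url).hostname or ""
    return host.split(".")[0]


def url_encode(value: str) -> str:
    """Percent-encode every reserved character, including ':' and '/'."""
    return quote(value, safe="")


endpoint = "https://abc123.us-east-1.snowflakecomputing.com/oauth/authorize"
print(account_identifier(endpoint))
# prints: abc123

print(url_encode("https://example.com/callback"))
# prints: https%3A%2F%2Fexample.com%2Fcallback
```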

ConnectionProfile:Snowflake IdP

The Snowflake IdP connection profile connects to Snowflake, a cloud-computing platform that enables you to process, analyze, and store your data, and authenticates through an Identity Provider (IdP).

The following example shows how to define a Snowflake connection profile with authentication based on an Identity Provider (IdP):

This connection profile authenticates with an Identity Provider (IdP). To use token-based authentication, see ConnectionProfile:Snowflake.

"SNOWFLAKE_IDP_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Snowflake IdP",
   "Account Identifier": "{Account_ID}",
   "Region": "us-east-1", 
   "Client ID": "DuHj****************", 
   "Client Secret": "*****",
   "IDP URL": "https://****************",
   "Scope": "session:role:<custom_role>", 
   "Description": "",
   "Centralized": true
}

The following table describes the Snowflake IdP connection profile parameters.

Parameter	Description
Account Identifier	Defines the Snowflake account identifier. To obtain this string, run the Describe Security Integration command in Snowflake and copy the initial string from one of the authorization properties. For example, if EXTERNAL_OAUTH_AUDIENCE_LIST has the value https://abc123.us-east-1.snowflakecomputing.com, the account identifier is abc123. For information about the values for the parameters required by the connection profile, see the IdP-specific External OAuth configuration instructions in the Snowflake documentation.
Region	Determines the region where the Snowflake jobs are located. Example: us-east-1
Client ID	Defines the client ID assigned to the account in the Snowflake integration setup.
Client Secret	Defines the client secret assigned to the account in the Snowflake integration setup. You can use Secrets in Code to hide this value in the code.
IDP URL	Defines the authentication endpoint for Snowflake IdP.
Scope	Defines the scope, which limits the operations you can perform and the roles you can use in the Snowflake IdP plug-in, in the following format: session:role:<custom_role> Example: session:role:sysadmin
Centralized	Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true.
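
The Scope format described in the table can be illustrated with a short Python sketch that builds the scope string from a role name. The function name scope_for_role is hypothetical. The role-name check is an assumption based on the rules for unquoted Snowflake identifiers and is not a Control-M requirement.

```python
import re

# Assumption: the role is an unquoted identifier (letters, digits, '_' and '$',
# starting with a letter or underscore).
_ROLE_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def scope_for_role(role: str) -> str:
    """Return a scope string in the session:role:<custom_role> format."""
    if not _ROLE_PATTERN.match(role):
        raise ValueError(f"unexpected role name: {role!r}")
    return f"session:role:{role}"


print(scope_for_role("sysadmin"))
# prints: session:role:sysadmin
```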