Data Processing and Analytics Connection Profiles
The following topics describe connection profiles for data processing and analytics platforms and services:
ConnectionProfile:AWS Athena
AWS Athena enables you to process, analyze, and store your data in the cloud.
The following examples show how to define an AWS Athena connection profile.
- This JSON-based connection profile authenticates with an AWS access key and secret:
"AWS_ATHENA":
{
"Type": "ConnectionProfile:AWS Athena",
"AWS API Base URL": "https://athena.us-east-1.amazonaws.com",
"AWS Region": "us-east-1",
"Authentication": "SECRET",
"AWS Access Key": "ABCDEF",
"AWS Secret Key": "******",
"Description": "",
"Centralized": true
}
- This JSON-based connection profile authenticates with an AWS IAM role from inside an EC2 instance:
"AWS_ATHENA":
{
"Type": "ConnectionProfile:AWS Athena",
"AWS API Base URL": "https://athena.us-east-1.amazonaws.com",
"AWS Region": "us-east-1",
"Authentication": "NOSECRET",
"IAM Role": "ATHENAIAMROLE",
"Description": "",
"Centralized": true
}
The following table describes the AWS Athena connection profile parameters.
Parameter | Description |
---|---|
AWS API Base URL | Determines the authentication endpoint for AWS Athena, in the format https://athena.<AWSRegion>.amazonaws.com (for more information about the regional endpoints available for the AWS Athena service, refer to the AWS documentation). |
AWS Region | Determines the region where the AWS Athena jobs are located. |
Authentication | Determines one of the following authentication methods: SECRET (AWS access key and secret key) or NOSECRET (IAM role), as shown in the examples above. |
AWS Access Key | Defines the AWS Athena account access key. |
AWS Secret Key | Defines the AWS Athena account secret access key. You can use Secrets in Code to hide this value in the code. |
IAM Role | Defines the Identity and Access Management (IAM) role for the AWS Athena connection. |
Connection Timeout | Determines the number of seconds to wait after Control-M initiates a connection request to AWS Athena before a timeout occurs. Default: 20 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
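As a rough illustration of how these parameters fit together, the following Python sketch derives the regional base URL from the region and assembles either authentication variant of the profile. The helper name `athena_profile` and its arguments are hypothetical; only the JSON keys, the SECRET and NOSECRET values, and the URL format are taken from the examples and table above.

```python
import json


def athena_profile(region, access_key=None, secret_key=None, iam_role=None):
    """Build an AWS Athena connection profile body (hypothetical helper)."""
    profile = {
        "Type": "ConnectionProfile:AWS Athena",
        # URL format from the parameter table: https://athena.<AWSRegion>.amazonaws.com
        "AWS API Base URL": f"https://athena.{region}.amazonaws.com",
        "AWS Region": region,
        "Description": "",
        "Centralized": True,
    }
    if iam_role is not None:
        # IAM role variant, as in the second example
        profile["Authentication"] = "NOSECRET"
        profile["IAM Role"] = iam_role
    else:
        # Access key and secret variant, as in the first example
        profile["Authentication"] = "SECRET"
        profile["AWS Access Key"] = access_key
        profile["AWS Secret Key"] = secret_key
    return profile


# Print a profile in the same shape as the IAM role example.
print(json.dumps({"AWS_ATHENA": athena_profile("us-east-1", iam_role="ATHENAIAMROLE")}, indent=2))
```

Deriving the URL from the region keeps the two values from drifting apart when a profile is copied to another region.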
ConnectionProfile:AWS Data Pipeline
AWS Data Pipeline is a cloud-based extract, transform, load (ETL) service that enables you to automate the transfer, processing, and storage of your data.
The following examples show how to define an AWS Data Pipeline connection profile.
- This JSON-based connection profile authenticates with an AWS access key and secret:
"AWSDATAPIPELINESECRET":
{
"Type": "ConnectionProfile:AWS Data Pipeline",
"Authentication": "SECRET",
"AWS Access Key": "MYAWSACCESSKEY1234",
"AWS Secret": "myAwsSecret12345",
"AWS Region": "us-east-1",
"Data Pipeline URL": "https://datapipeline.{{AWSRegion}}.amazonaws.com",
"Connection Timeout": "30",
"Description": "",
"Centralized": true
}
- This JSON-based connection profile authenticates with an AWS IAM role from inside an EC2 instance:
"AWSDATAPIPELINEIAM":
{
"Type": "ConnectionProfile:AWS Data Pipeline",
"Authentication": "NOSECRET",
"IAM Role": "IAMROLENAME",
"AWS Region": "us-east-1",
"Data Pipeline URL": "https://datapipeline.{{AWSRegion}}.amazonaws.com",
"Connection Timeout": "30",
"Description": "",
"Centralized": true
}
The following table describes the AWS Data Pipeline connection profile parameters.
Parameter | Description |
---|---|
Authentication | Determines one of the following authentication methods for the connection with AWS Data Pipeline: SECRET (AWS access key and secret) or NOSECRET (IAM role), as shown in the examples above. |
AWS Access Key | (SECRET authentication) Defines the AWS Data Pipeline account access key. |
AWS Secret | (SECRET authentication) Defines the AWS Data Pipeline account secret access key. You can use Secrets in Code to hide this value in the code. |
IAM Role | (NOSECRET authentication) Defines the Identity and Access Management (IAM) role for the connection to AWS. |
AWS Region | Determines the region where the AWS Data Pipeline jobs are located. |
AWS API Base URL | Defines the REST API URL for the AWS Data Pipeline regional endpoint, in the format https://datapipeline.<AWS_Region>.amazonaws.com (the examples above set this value with the Data Pipeline URL key). For more information about the regional endpoints available for the AWS Data Pipeline service, refer to the AWS documentation. |
Connection Timeout | Determines the number of seconds to wait before a timeout occurs after Control-M initiates a request to AWS Data Pipeline. Default: 30 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:AWS DynamoDB
AWS DynamoDB is a NoSQL database service that enables you to create database tables, execute statements and transactions, and export and import data to and from the Amazon S3 storage service.
The following examples show how to define an AWS DynamoDB connection profile.
- This JSON-based connection profile authenticates with an AWS access key and secret:
"AWS_DynamoDB":
{
"Type": "ConnectionProfile:AWS DynamoDB",
"Authentication": "Secret",
"AWS Secret": "*****",
"AWS Region": "us-east-1",
"AWS Access Key": "ZKIATY7B2LKB2JQ85I6D",
"AWS Backup URL": "https://dynamodb.{{AWSRegion}}.amazonaws.com",
"Description": "",
"Connection Timeout": "100",
"Centralized": true
}
- This JSON-based connection profile authenticates with an IAM role:
"AWS_ADY_IAM":
{
"Type": "ConnectionProfile:AWS DynamoDB",
"Authentication": "NoSecret",
"AWS Region": "us-east-1",
"AWS Backup URL": "https://dynamodb.{{AWSRegion}}.amazonaws.com",
"IAM Role": "arn:aws:iam::122343212345:role/Amazon12SSMRoleForInstancesQuickSetup",
"Description": "",
"Connection Timeout": "50",
"Centralized": true
}
The following table describes the AWS DynamoDB connection profile parameters.
Parameter | Description |
---|---|
AWS DynamoDB Login URL | Determines the AWS DynamoDB authentication endpoint base URL, which includes the region that is defined for the AWS account: https://dynamodb.<us-east-1>.amazonaws.com (the examples above set this value with the AWS Backup URL key). |
AWS Region | Determines the region where the AWS DynamoDB jobs are located. Example: us-east-1 |
Authentication | Determines one of the following authentication methods: Secret (AWS access key and secret) or NoSecret (IAM role), as shown in the examples above. |
AWS Access Key | Defines the AWS DynamoDB account access key. |
AWS Secret | Defines the AWS DynamoDB account secret access key. You can use Secrets in Code to hide this value in the code. |
IAM Role | Defines the Identity and Access Management (IAM) role for the AWS DynamoDB connection. |
Connection Timeout | Determines the number of seconds to wait after Control-M initiates a connection request to AWS DynamoDB before a timeout occurs. Default: 20 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:AWS EMR
AWS EMR is a managed cluster platform that enables you to execute big data frameworks, such as Apache Hadoop and Apache Spark, to process and analyze vast amounts of data.
The following example shows how to define an AWS EMR connection profile:
"AWS_EMR":
{
"Type": "ConnectionProfile:AWS EMR",
"AWSRegion": "us-east-1",
"EMRAccessKey": "ABCDEF",
"EMRSecretKey": "****",
"Description": "",
"Centralized": true
}
The following table describes the AWS EMR connection profile parameters.
Parameter | Description |
---|---|
AWSRegion | Determines the AWS region. |
EMRAccessKey | Defines the token for the connection to AWS. |
EMRSecretKey | Defines an additional security token for AWS. You can use Secrets in Code to hide this value in the code. |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:Azure Databricks
Azure Databricks is a cloud-based data analytics platform that enables you to process and analyze large workloads of data.
The following example shows how to define an Azure Databricks connection profile:
"ADF_SERVPRINC":
{
"Type": "ConnectionProfile:Azure Databricks",
"Tenant ID": "tenantId",
"Application ID": "4f477fa3-1a1g-4877-ca92-f39bb563f3b1",
"Client Secret": "*****",
"Databricks url": "https://adb-1111211144444680.0.azuredatabricks.net",
"Azure Login url": "https://login.microsoftonline.com",
"Connection Timeout": "50",
"Description": "",
"Centralized": true
}
The following table describes the Azure Databricks connection profile parameters.
Parameter | Description |
---|---|
Tenant ID | Defines the Azure Tenant ID in Azure AD. |
Application ID | Defines the application (service principal) ID of the registered application. The service principal must meet the following requirements: |
Client Secret | Defines the client secret (password) associated with the Azure user and the application. You can use Secrets in Code to hide this value in the code. |
Databricks url | Defines the URL of your Databricks workspace. |
Azure Login url | Defines the Azure AD authentication endpoint base URL. |
Connection Timeout | Defines the timeout value, in seconds, for the trigger call made by Control-M to Azure Databricks. Default: 50 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:Azure HDInsight
Azure HDInsight enables you to execute an Apache Spark batch job and perform big data analytics.
The following example shows how to define an Azure HDInsight connection profile:
"AZUREHDINSIGHT":
{
"Type": "ConnectionProfile:Azure HDInsight",
"Cluster Name": "hdcluster",
"Cluster Username": "admin",
"Cluster Password": "*****",
"Description": "",
"Centralized": true
}
The following table describes the Azure HDInsight connection profile parameters.
Parameter | Description |
---|---|
Cluster Name | Defines the name of the HDInsight cluster to connect to. |
Cluster Username | Defines the name of the Administrator to use to connect to Azure HDInsight. |
Cluster Password | Defines a password for the Administrator, as configured in Azure HDInsight. You can use Secrets in Code to hide this value in the code. |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:Azure Synapse
Azure Synapse enables you to perform data integration and big data analytics.
The following examples show how to define an Azure Synapse connection profile.
- This JSON-based connection profile authenticates with an Azure service principal:
"AZURE_SYNAPSE_1":
{
"Type": "ConnectionProfile:Azure Synapse",
"Authentication Method": "PRINCIPAL",
"Tenant ID": "tenantId",
"Azure AD url": "https://login.microsoftonline.com",
"Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
"Synapse Resource": "https://dev.azuresynapse.net/",
"App ID": "4f477fa3-1a1g-4877-ca92-f39bb563f3b1",
"Client Secret": "*****",
"Connection Timeout": "50",
"Description": "",
"Centralized": true
}
- This JSON-based connection profile authenticates with a managed identity:
Managed identity authentication is based on an Azure token that is valid, by default, for 24 hours. Token lifetime can be extended by Azure.
"AZURE_SYNAPSE_2":
{
"Type": "ConnectionProfile:Azure Synapse",
"Authentication Method": "MANAGEDID",
"Specify Managed Identity Client ID": "&client_id=",
"Managed Identity Client ID": "72d448f0-ac32-45ea-9158-f8653e4ee16",
"Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
"Synapse Resource": "https://dev.azuresynapse.net/",
"Connection Timeout": "50",
"Description": "",
"Centralized": true
}
The following table describes the Azure Synapse connection profile parameters.
Parameter | Description |
---|---|
Authentication Method | Defines one of the following types of authentication to use for the connection with Azure Synapse Analytics: PRINCIPAL (service principal) or MANAGEDID (managed identity), as shown in the examples above. To prepare for authentication with each of these methods: |
Specify Managed Identity Client ID | (Managed identity) Determines whether the client ID for your managed identity is specified by the Managed Identity Client ID parameter. Include this parameter only if you are using the managed identity authentication method and you have multiple managed identities defined on your Azure virtual machine. Set its value to &client_id=. |
Managed Identity Client ID | (Managed identity) Determines which client ID to use as the managed identity. This parameter requires a value only if you have multiple managed identities defined on your Azure virtual machine and you included the Specify Managed Identity Client ID parameter. If you have only one managed identity, it is detected automatically. |
Tenant ID | (Service principal authentication) Defines the Azure Tenant ID in Azure AD. |
Azure AD url | (Service principal authentication) Defines the Azure AD authentication endpoint base URL. |
Synapse url | Defines the workspace development endpoint. Example: https://myworkspace.dev.azuresynapse.net |
Synapse Resource | Defines the resource parameter that serves as the identifier for the Azure Synapse login via Azure AD, as follows: https://dev.azuresynapse.net/ |
App ID | Defines the application (service principal) ID of the registered application for the Azure Synapse service. |
Client Secret | (Service principal authentication) Defines the client secret (password) associated with the Azure user and the application. You can use Secrets in Code to hide this value in the code. |
Connection Timeout | Defines a timeout value, in seconds, for the trigger call made by Control-M to Azure Synapse Analytics. Default: 50 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
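Because the applicable parameters depend on the authentication method, a small pre-deployment check can catch incomplete profiles. The following Python sketch is one possible way to do this. The function name `missing_synapse_keys` and the required-key lists are assumptions inferred from the two examples above, not an official schema.

```python
# Keys that each authentication method appears to need, inferred from the
# two examples above (an assumption, not an official schema).
REQUIRED_KEYS = {
    "PRINCIPAL": ["Tenant ID", "Azure AD url", "App ID", "Client Secret"],
    "MANAGEDID": [],
}
COMMON_KEYS = ["Type", "Synapse url", "Synapse Resource"]


def missing_synapse_keys(profile):
    """Return the keys that appear to be missing from an Azure Synapse profile."""
    method = profile.get("Authentication Method")
    if method not in REQUIRED_KEYS:
        raise ValueError(f"unexpected Authentication Method: {method!r}")
    missing = [k for k in COMMON_KEYS + REQUIRED_KEYS[method] if k not in profile]
    # The client ID is expected only when the "Specify ..." parameter is included.
    if ("Specify Managed Identity Client ID" in profile
            and "Managed Identity Client ID" not in profile):
        missing.append("Managed Identity Client ID")
    return missing


profile = {
    "Type": "ConnectionProfile:Azure Synapse",
    "Authentication Method": "MANAGEDID",
    "Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
    "Synapse Resource": "https://dev.azuresynapse.net/",
}
print(missing_synapse_keys(profile))
```

A profile with a single managed identity passes with no managed identity parameters at all, which matches the note that a single identity is detected automatically.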
ConnectionProfile:Databricks
Databricks enables the integration of jobs created in the Databricks environment with your existing Control-M workflows.
The following example shows how to define a Databricks connection profile:
"DATABRICKS":
{
"Type": "ConnectionProfile:Databricks",
"Databricks workspace url": "https://dbc-7b944b32-faf0.cloud.databricks.com",
"Databricks personal access token": "*****",
"Connection Timeout": "50",
"Description": "",
"Centralized": true
}
The following table describes the Databricks connection profile parameters.
Parameter | Description |
---|---|
Databricks workspace url | Defines the URL of your Databricks workspace. |
Databricks personal access token | Defines a Databricks token for authentication of connections to the Databricks workspace. You can use Secrets in Code to hide this value in the code. |
Connection Timeout | Defines the timeout value, in seconds, for the REST calls made to Databricks. Default: 50 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:DBT
dbt (Data Build Tool) is a cloud-based computing platform that enables you to develop, test, schedule, document, and analyze data models.
The following example shows how to define a dbt connection profile:
"DBT_CP":
{
"Type": "ConnectionProfile:DBT",
"DBT URL": "https://cloud.getdbt.com",
"DBT Token": "*****",
"Account ID": "123456",
"Connection Timeout": "60",
"Description": "",
"Centralized": true
}
The following table describes the dbt connection profile parameters.
Parameter | Description |
---|---|
DBT URL | Defines the dbt authentication endpoint, as follows: https://cloud.getdbt.com |
DBT Token | Defines the authentication code that is used to create a connection to the dbt platform. This code is located in the API Access section in the dbt cloud platform. You can use Secrets in Code to hide this value in the code. |
Account ID | Defines the unique ID that is assigned to your dbt cloud account. This ID is located in the Account Info section in the dbt cloud platform. |
Connection Timeout | Determines the number of seconds to wait after Control-M initiates a connection request to dbt before a timeout occurs. Default: 60 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:GCP BigQuery
GCP BigQuery is a Google Cloud Platform computing service that you can use for data storage, processing, and analysis.
The following example shows how to define a GCP BigQuery connection profile that authenticates with a service account:
"BIGQSA":
{
"Type": "ConnectionProfile:GCP BigQuery",
"Identity Type": "service_account",
"Service Account Key": "*****",
"BigQuery URL": "https://bigquery.googleapis.com",
"Description": "",
"Centralized": true
}
The following table describes the GCP BigQuery connection profile parameters.
Parameter | Description |
---|---|
Identity Type | Determines one of the following authentication types using GCP Access Control: |
BigQuery URL | Defines the Google Cloud Platform (GCP) authentication endpoint for BigQuery, as follows: https://bigquery.googleapis.com |
Service Account Key | (Service account) Defines a service account that is associated with an RSA key pair. You can use Secrets in Code to hide this value in the code. |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:GCP DataFlow
Google Cloud Platform (GCP) Dataflow enables you to perform cloud-based data processing for batch and real-time, data-streaming applications.
The following example shows how to define a GCP Dataflow connection profile, based on a service account:
"GCPDATAFLOW":
{
"Type": "ConnectionProfile:GCP DataFlow",
"Identity Type": "service_account",
"DataFlow URL": "https://dataflow.googleapis.com",
"Service Account Key": "{\"type\":\"service_account\",\"project_id\":\"sso-gcp-dba-ctm1-priv-cc30752\",\"private_key_id\":\"5197d05c5b8212bea944985cec74a34d6c1868aa\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"bmc-wla-svc-02@sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\",\"client_id\":\"116650586827623521335\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\", \"auth_provider_x509_cert_url\":\"https://www.googleapis.com/oauth2/v1/certs\",\"client_x509_cert_url\":\"https://www.googleapis.com/robot/v1/metadata/x509/bmc-wla-svc-02%40sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\"}",
"Description": "",
"Centralized": true
}
The following table describes the GCP Dataflow connection profile parameters.
Parameter | Description |
---|---|
Identity Type | Determines one of the following authentication types using GCP Access Control: |
DataFlow URL | Defines the Google Cloud Platform (GCP) authentication endpoint for Dataflow, as follows: https://dataflow.googleapis.com |
Service Account Key | (Service account) Defines a JSON body that contains the required service account credentials to access GCP. You can use Secrets in Code to hide this value in the code. |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
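In the example above, the Service Account Key value is a JSON body stored as an escaped string, which is why it contains sequences such as \" and \\n. The following Python sketch shows one way to produce that escaping by serializing twice. The key fields are placeholders, and the variable names are hypothetical.

```python
import json

# Placeholder key fields; a key file downloaded from GCP has more fields of the same kind.
service_account_key = {
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
    "client_email": "svc@my-project.iam.gserviceaccount.com",
}

profile = {
    "GCPDATAFLOW": {
        "Type": "ConnectionProfile:GCP DataFlow",
        "Identity Type": "service_account",
        "DataFlow URL": "https://dataflow.googleapis.com",
        # First dump: the key becomes a compact JSON string value.
        "Service Account Key": json.dumps(service_account_key, separators=(",", ":")),
        "Description": "",
        "Centralized": True,
    }
}

# Second dump: serializing the profile escapes the inner quotes and
# backslashes, which yields the \" and \\n sequences seen in the example.
text = json.dumps(profile, indent=2)
print(text)
```

Serializing twice avoids hand-escaping the key, which is easy to get wrong for the newlines inside the private key. The same approach applies to any profile whose Service Account Key holds a JSON body.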
ConnectionProfile:GCP Dataproc
Google Cloud Platform (GCP) Dataproc enables you to perform cloud-based big data processing and machine learning.
The following example shows how to define a GCP Dataproc connection profile that authenticates with a service account:
"GCPDATAPROC":
{
"Type": "ConnectionProfile:GCP Dataproc",
"Identity Type": "service_account",
"Dataproc URL": "https://dataproc.googleapis.com",
"Service Account Key": "{\"type\":\"service_account\",\"project_id\":\"sso-gcp-dba-ctm1-priv-cc30752\",\"private_key_id\":\"5197d05c5b8212bea944985cec74a34d6c1868aa\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"bmc-wla-svc-02@sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\",\"client_id\":\"116650586827623521335\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\", \"auth_provider_x509_cert_url\":\"https://www.googleapis.com/oauth2/v1/certs\",\"client_x509_cert_url\":\"https://www.googleapis.com/robot/v1/metadata/x509/bmc-wla-svc-02%40sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\"}",
"Connection timeout": "20",
"Description": "",
"Centralized": true
}
The following table describes the GCP Dataproc connection profile parameters.
Parameter | Description |
---|---|
Identity Type | Determines one of the following authentication types using GCP Access Control: |
Dataproc URL | Defines the Google Cloud Platform (GCP) authentication endpoint for Dataproc, as follows: https://dataproc.googleapis.com |
Service Account Key | (Service account) Defines a JSON body that contains the required service account credentials to access GCP. You can use Secrets in Code to hide this value in the code. |
Connection timeout | Defines a timeout value, in seconds, for the trigger call to the Google Cloud Platform. Default: 20 |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false. Default: false |
ConnectionProfile:Hadoop
The Hadoop job connects to the Hadoop framework, which enables you to split up and process large data sets on clusters of commodity servers. You can expand your enterprise business workflows to include tasks that execute in your Big Data Hadoop cluster in Control-M with the different Hadoop-supported tools, including Pig, Hive, HDFS File Watcher, Map Reduce Jobs, and Sqoop.
The following examples show how to define a Hadoop connection profile for various types of Hadoop jobs.
Hadoop (All Types)
The following example defines the parameters required for all Hadoop types:
"HADOOP_CONNECTION_PROFILE":
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "edgenode",
"TargetCTM": "CTMHost"
}
The following table describes the Hadoop connection profile parameters.
Parameter | Description |
---|---|
TargetAgent | Determines the Agent where the connection profile deploys. |
TargetCTM | Determines the Control-M/Server where the connection profile deploys. If there is only one Control-M/Server, it defaults to that Control-M/Server. |
The following example defines the optional parameters for defining the user running the Hadoop job types and choosing between a local or centralized connection profile:
"HADOOP_CONNECTION_PROFILE":
{
"Type" : "ConnectionProfile:Hadoop",
"TargetAgent" : "edgenode",
"TargetCTM" : "CTMHost",
"RunAs": "",
"KeyTabPath":"",
"Centralized":true
}
The following table describes the Hadoop connection profile parameters shown in the example above.
Parameter | Description |
---|---|
RunAs | Defines the user of the account on which to run Hadoop jobs. Leave this field empty to run Hadoop jobs with the user account where the Agent was installed. If you define a specific RunAs user, the Agent must run as root. |
Centralized | Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. Valid Values: true, false. Default: false |
The following table describes the parameters that control security in the case of Kerberos security.
Parameter | Description |
---|---|
RunAs | Defines the Principal name of the user. |
KeyTabPath | Defines the Keytab file path for the target user. |
Apache Spark
An Apache Spark connection profile enables you to define access to a Spark server.
The following example defines an Apache Spark connection profile:
"SPARK_CONNECTION_PROFILE" :
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "docker-hadoop5",
"Spark":
{
"CustomPath": "/home"
}
}
The CustomPath parameter is optional.
Apache Oozie
Apache Oozie enables you to access an Oozie server for a job that submits an Oozie workflow.
The following example defines an Apache Oozie connection profile:
"OOZIE_CONNECTION_PROFILE" :
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "hdp-ubuntu",
"Centralized": false,
"Oozie":
{
"SslEnabled": false,
"Host": "hdp-centos",
"Port": "11000",
"ExtractionRules": [
{
"RuleName": "rule_name1",
"WorkFlowName": "work_flow_name1",
"WorkFlowUserName": "work_flow_user_name1",
"FolderName": "folder_name1",
"JobName": "job_name1"
},
{
"RuleName": "rule_name2",
"WorkFlowName": "work_flow_name2",
"WorkFlowUserName": "work_flow_user_name2",
"FolderName": "folder_name2",
"JobName": "job_name2"
} ]
}
}
The following table describes the Apache Oozie connection profile parameters.
Parameter | Description |
---|---|
Host | Defines the Oozie server host. |
Port | Defines the Oozie server port. Default: 11000 |
SslEnabled | Valid Values: true, false. Default: false |
ExtractionRules | (Optional) Defines the rules for filtering Oozie workflows. Each rule has the following definitions: |
RuleName | Defines the name of the rule. |
WorkFlowName | Defines the name of the Oozie workflow to get from the Oozie server. |
WorkFlowUserName | Defines the name of the user that runs the workflows from the Oozie server. |
FolderName | Defines the name of the folder that contains the Hadoop job of the Oozie Extractor, as defined in the Hadoop job template. |
JobName | Defines the name of the Hadoop job of the Oozie Extractor, as defined in the Hadoop job template. |
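When a profile has many extraction rules, generating them keeps the five fields consistent across rules. The following Python sketch builds the Oozie block from the example above. The helper name `extraction_rule` is hypothetical; the field names come from the table.

```python
import json

# The five rule fields, in the order listed in the table above.
RULE_KEYS = ("RuleName", "WorkFlowName", "WorkFlowUserName", "FolderName", "JobName")


def extraction_rule(*values):
    """Pair the five rule fields with their values, in order (hypothetical helper)."""
    if len(values) != len(RULE_KEYS):
        raise ValueError(f"expected {len(RULE_KEYS)} values, got {len(values)}")
    return dict(zip(RULE_KEYS, values))


oozie = {
    "SslEnabled": False,
    "Host": "hdp-centos",
    "Port": "11000",
    "ExtractionRules": [
        extraction_rule("rule_name1", "work_flow_name1", "work_flow_user_name1",
                        "folder_name1", "job_name1"),
        extraction_rule("rule_name2", "work_flow_name2", "work_flow_user_name2",
                        "folder_name2", "job_name2"),
    ],
}

# Print the block in the same shape as the "Oozie" object in the example.
print(json.dumps({"Oozie": oozie}, indent=2))
```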
Apache Sqoop
Apache Sqoop enables you to run a Sqoop job.
The following example shows a connection profile that defines a Sqoop data source and access credentials:
"SQOOP_CONNECTION_PROFILE" :
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "edgenode",
"TargetCTM": "CTMHost",
"Centralized": false,
"Sqoop":
{
"User": "username",
"Password": "userpassword",
"ConnectionString": "jdbc:mysql://mysql.server/database",
"DriverClass": "com.mysql.jdbc.Driver"
}
}
The following table describes the Sqoop connection profile parameters shown in the example above, as well as several additional optional parameters.
Parameter | Description |
---|---|
User | Defines a database user connected to the Sqoop server. |
Password | Defines a password for the specified user. To update an existing connection profile and keep the current password, type five asterisks, as follows: ***** |
ConnectionString | JDBC-compliant database: Defines the connection string used to connect to the database. |
DriverClass | JDBC-compliant database: Defines the driver class for the driver .jar file, which indicates the entry point to the driver. |
PasswordFile | (Optional) Defines the full path to a file located on the HDFS that contains the password to the database. To use a JCEKS file, include the .jceks file extension. |
DatabaseVendor | (Optional) Defines the database vendor of an automatically supported database used with Sqoop, one of the following: |
DatabaseName | (Optional) Defines the name of an automatically supported database used with Sqoop. |
DatabaseHost | (Optional) Defines the host server of an automatically supported database used with Sqoop. |
DatabasePort | (Optional) Defines the port number for an automatically supported database used with Sqoop. |
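For a JDBC-compliant database, the ConnectionString value follows the driver's URL syntax. As a minimal sketch, assuming a MySQL database as in the example above, the following Python helper assembles the string from its parts. The function name `mysql_connection_string` is hypothetical.

```python
def mysql_connection_string(host, database, port=None):
    """Assemble a MySQL JDBC URL of the form jdbc:mysql://host[:port]/database."""
    authority = f"{host}:{port}" if port is not None else host
    return f"jdbc:mysql://{authority}/{database}"


# Rebuild the Sqoop block from the example with a generated connection string.
sqoop = {
    "User": "username",
    "Password": "userpassword",
    "ConnectionString": mysql_connection_string("mysql.server", "database"),
    "DriverClass": "com.mysql.jdbc.Driver",
}
print(sqoop["ConnectionString"])
```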
Apache Tajo
Tajo is an advanced data warehousing system on top of HDFS.
The following example shows a connection profile that defines access to a Tajo server:
"TAJO_CP" :
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "edgenode",
"Tajo":
{
"BinaryPath": "$TAJO_HOME/bin/",
"DatabaseName": "myTajoDB",
"MasterServerName" : "myTajoServer",
"MasterServerPort": "26001"
}
}
The following table describes the Tajo connection profile parameters.
Parameter | Description |
---|---|
BinaryPath | Defines the path to the bin directory where the tsql utility is located. |
DatabaseName | Defines the name of the Tajo database. |
MasterServerName | Defines the host name of the server where the Tajo master is running. |
MasterServerPort | Defines the Tajo master port number. |
Apache Hive
Apache Hive enables you to run a Hive BeeLine job.
The following examples show how to define a Hadoop Hive connection profile.
- This JSON defines a connection profile with a Hive BeeLine endpoint and access credentials:
The parameters in the example translate to the following BeeLine command:
beeline -u jdbc:hive2://<Host>:<Port>/<DatabaseName>
"HIVE_CONNECTION_PROFILE" :
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "edgenode",
"TargetCTM": "CTMHost",
"Hive":
{
"Host": "hive_host_name",
"Port": "10000",
"DatabaseName": "hive_database"
}
}
- This JSON defines a connection profile with optional parameters for a Hadoop Hive type connection profile:
The parameters in the example translate to the following BeeLine command:
beeline -u jdbc:hive2://<Host>:<Port>/<DatabaseName>;principal=<Principal> -n <User> -p <Password>
"HIVE_CONNECTION_PROFILE1":
{
"Type": "ConnectionProfile:Hadoop",
"TargetAgent": "edgenode",
"TargetCTM": "CTMHost",
"Centralized": true,
"Hive":
{
"Host": "hive_host_name",
"Port": "10000",
"DatabaseName": "hive_database",
"User": "user_name",
"Password": "user_password",
"Principal": "Server_Principal_of_HiveServer2@Realm"
}
}
To update an existing connection profile and keep the current password, type five asterisks, as follows:
*****
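The mapping from the Hive parameters to the BeeLine command can be sketched in Python as follows. The function name `beeline_command` is hypothetical, and the logic only mirrors the two command templates shown above; it is not how Control-M itself builds the command.

```python
def beeline_command(hive):
    """Build the BeeLine command that a "Hive" block translates to,
    following the two command templates shown above."""
    url = f"jdbc:hive2://{hive['Host']}:{hive['Port']}/{hive['DatabaseName']}"
    if "Principal" in hive:
        # The server principal is appended to the JDBC URL itself.
        url += f";principal={hive['Principal']}"
    parts = ["beeline", "-u", url]
    if "User" in hive:
        parts += ["-n", hive["User"]]
    if "Password" in hive:
        parts += ["-p", hive["Password"]]
    return " ".join(parts)


basic = {"Host": "hive_host_name", "Port": "10000", "DatabaseName": "hive_database"}
print(beeline_command(basic))
# prints: beeline -u jdbc:hive2://hive_host_name:10000/hive_database
```

The returned string is for illustration only. In a real shell, the JDBC URL would need quoting because of the semicolon before principal.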
ConnectionProfile:OCI Data Flow
Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets.
The following examples show how to define an OCI Data Flow connection profile.
- This JSON-based connection profile authenticates with defined parameters:
"OCI_DATAFLOW":
{
"Type": "ConnectionProfile:OCI Data Flow",
"OCI Data Flow URL": "https://dataflow.region.oci.oraclecloud.com",
"OCI Region": "us-phoenix-1",
"Authentication": "DefineParameters",
"User OCID": "ocid1.user.oc1..aaaaaaaasxjplkxcnplaxxxxyutitixxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"Tenancy OCID": "ocid1.tenancy.oc1..aaaaaaaak4xxxhtutyuyxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"Fingerprint": "dc:50:2d:e1:ax:af:x7:x6:xe:x9:ax:cx:cb:x3:x3:6x",
"Private Key": " ----BEGIN PRIVATE KEY---- XXXXXXXXXX ----END PRIVATE KEY---- ",
"Connection Timeout": "30",
"Description": "",
"Centralized": true
}
- This JSON-based connection profile authenticates with a configuration file:
"OCI_DATAFLOW_CF":
{
"Type": "ConnectionProfile:OCI Data Flow",
"OCI Data Flow URL": "https://dataflow.region.oci.oraclecloud.com",
"Authentication": "ConfigurationFile",
"Config File Path": "/home/dbauser/config.example",
"Profile": "Default",
"Connection Timeout": "30",
"Description": "",
"Centralized": true
}
The following table describes the OCI Data Flow connection profile parameters.
Parameter |
Authentication Method |
Description |
---|---|---|
OCI Data Flow URL |
All methods |
Defines the OCI Data Flow URL in the following format: https://dataflow.<region>.oci.oraclecloud.com/20200129 |
OCI Region |
All methods |
Determines the region where OCI Data Flow is located. ap-melbourne-1 eu-madrid-1 |
Authentication |
NA |
Determines one of the following authentication methods:
|
User OCID |
Defined Parameters |
Defines an individual user within the OCI environment. |
Tenancy OCID |
Defined Parameters |
Defines the OCI Tenancy ID in an OCI Data Flow, which is a global unique identifier for this account within the OCI environment. |
Fingerprint |
Defined Parameters |
Defines a fingerprint which uniquely identifies and verifies the integrity of the associated certificate or key. |
Private Key |
Defined Parameters |
Defines the private key, a critical component of the set of API signing keys that are used for authentication and secure access to OCI resources. You can use Secrets in Code to hide this value in the code. |
Config File Path |
Configuration File |
Defines the path to the configuration file that contains authentication information. This file is stored on the Control-M/Agent. Examples: UNIX: home/user1/config/pem.pem; Windows: C:\Users\user1\config\pem.pem |
Profile |
Configuration File |
Defines the name of a specific section in the configuration file, such as DEFAULT or PROFILE2 in the Configuration File code sample. |
Connection Timeout |
All methods |
Determines the number of seconds to wait after Control-M initiates a connection request to OCI Data Flow before a timeout occurs. Default: 30 |
Centralized |
All methods |
Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false
Default: false |
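The URL format and the Profile parameter in the table above can be sketched in Python as follows. This is a minimal illustration, not part of Control-M. It assumes that the configuration file uses an INI-style layout with bracketed section names, and the sample file content and the function names `build_dataflow_url` and `profile_exists` are made up for the example.

```python
import configparser

# Path segment shown in the URL format in the parameter table.
DATAFLOW_API_VERSION = "20200129"


def build_dataflow_url(region: str) -> str:
    """Fill the region into the documented OCI Data Flow URL format."""
    return f"https://dataflow.{region}.oci.oraclecloud.com/{DATAFLOW_API_VERSION}"


def profile_exists(config_text: str, profile: str) -> bool:
    """Check whether an INI-style configuration contains the named section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # configparser keeps its default section out of sections(),
    # so that name is checked separately.
    if profile == parser.default_section:
        return bool(parser.defaults())
    return parser.has_section(profile)


# Made-up sample content (assumed layout, placeholder values).
SAMPLE_CONFIG = """\
[DEFAULT]
user = ocid1.user.oc1..exampleuser
tenancy = ocid1.tenancy.oc1..exampletenancy
region = us-phoenix-1

[PROFILE2]
user = ocid1.user.oc1..otheruser
"""

if __name__ == "__main__":
    print(build_dataflow_url("us-phoenix-1"))
    print(profile_exists(SAMPLE_CONFIG, "PROFILE2"))
```

The section lookup in this sketch is case-sensitive, so `Default` and `DEFAULT` are treated as different names. Whether Control-M applies the same rule is not stated here.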
ConnectionProfile:Snowflake
Snowflake is a cloud-computing platform that enables you to process, analyze, and store your data.
The following example shows how to define a Snowflake connection profile:
This connection profile uses token-based authentication. To authenticate with an Identity Provider (IdP), see ConnectionProfile:Snowflake IdP.
"SNOWFLAKE_CONNECTION_PROFILE":
{
"Type": "ConnectionProfile:Snowflake",
"Account Identifier": "{Account_ID}",
"Region": "us-east-1",
"Client ID": "DuHj****************",
"Client Secret": "*****",
"Refresh Token": "ver%******************",
"Redirect URI": "https%****************",
"Description": "",
"Centralized": true
}
The following table describes the Snowflake connection profile parameters.
Parameter |
Description |
---|---|
Account Identifier |
Defines the Snowflake account identifier. To obtain this string, run the Describe Security Integration command in Snowflake and copy the initial string from one of the authorization properties. For example, if OAUTH_AUTHORIZATION_ENDPOINT has the following value: https://abc123.us-east-1.snowflakecomputing.com/oauth/authorize then the account identifier is abc123. For more information about obtaining values for the parameters required by the connection profile, see Setting Up a Snowflake API Connection. |
Region |
Determines the region where the Snowflake jobs are located. Example: us-east-1 |
Client ID |
Defines the client ID assigned to the account in the Snowflake integration setup. |
Client Secret |
Defines the client secret assigned to the account in the Snowflake integration setup. You can use Secrets in Code to hide this value in the code. |
Refresh Token |
Defines the value for the refresh token. This string must be URL-encoded. |
Redirect URI |
Defines the redirect URI assigned to the account in the Snowflake integration setup. This string must be URL-encoded. |
Centralized |
Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false
Default: false |
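Two of the values in the table above need some preparation: the account identifier is the first label of the endpoint host name, and the Refresh Token and Redirect URI must be URL-encoded. The following Python sketch shows one way to derive them. It assumes that "URL-encoded" means percent-encoding of all reserved characters. The function names and the sample redirect URI are hypothetical.

```python
from urllib.parse import quote, urlsplit


def account_identifier(endpoint_url: str) -> str:
    """Return the first label of the host name, for example abc123."""
    host = urlsplit(endpoint_url).hostname or ""
    return host.split(".")[0]


def url_encode(value: str) -> str:
    """Percent-encode every reserved character, including ':' and '/'."""
    return quote(value, safe="")


if __name__ == "__main__":
    endpoint = "https://abc123.us-east-1.snowflakecomputing.com/oauth/authorize"
    print(account_identifier(endpoint))
    # Hypothetical redirect URI, used only to show the encoding.
    print(url_encode("https://localhost:8080/callback"))
```

The encoded redirect URI begins with `https%3A`, which matches the shape of the masked value in the example profile.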
ConnectionProfile:Snowflake IdP
Snowflake is a cloud-computing platform that enables you to process, analyze, and store your data. The Snowflake IdP connection profile authenticates through an Identity Provider (IdP).
The following example shows how to define a Snowflake connection profile with authentication based on an Identity Provider (IdP):
This connection profile authenticates with an Identity Provider (IdP). To use token-based authentication, see ConnectionProfile:Snowflake.
"SNOWFLAKE_IDP_CONNECTION_PROFILE":
{
"Type": "ConnectionProfile:Snowflake IdP",
"Account Identifier": "{Account_ID}",
"Region": "us-east-1",
"Client ID": "DuHj****************",
"Client Secret": "*****",
"IDP URL": "https://****************",
"Scope": "session:role:<custom_role>",
"Description": "",
"Centralized": true
}
The following table describes the Snowflake IdP connection profile parameters.
Parameter |
Description |
---|---|
Account Identifier |
Defines the Snowflake account identifier. To obtain this string, run the Describe Security Integration command in Snowflake and copy the initial string from one of the authorization properties. For example, if EXTERNAL_OAUTH_AUDIENCE_LIST has the following value: https://abc123.us-east-1.snowflakecomputing.com then the account identifier is abc123. For information about the values for the parameters required by the connection profile, see the IdP-specific External OAuth configuration instructions in the Snowflake documentation. |
Region |
Determines the region where the Snowflake jobs are located. Example: us-east-1 |
Client ID |
Defines the client ID assigned to the account in the Snowflake integration setup. |
Client Secret |
Defines the client secret assigned to the account in the Snowflake integration setup. You can use Secrets in Code to hide this value in the code. |
IDP URL |
Defines the authentication endpoint for Snowflake IdP. |
Scope |
Defines the scope, which limits the operations you can do and the roles you can use in the Snowflake IdP plug-in, in the following format: session:role:<custom_role> Example: session:role:sysadmin |
Centralized |
Determines whether to create a centralized connection profile, which is stored in the Control-M/EM database and is available to all Agents. You must set this parameter to true. Valid Values: true, false
Default: false |
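To show how the parameters in the table above fit together, the following Python sketch assembles a Snowflake IdP profile and builds the Scope value from a role name. It is an illustration under stated assumptions: the function `build_idp_profile` is hypothetical, and all argument values in the usage section are placeholders.

```python
import json


def build_idp_profile(name, account_id, region, client_id, client_secret, idp_url, role):
    """Assemble a Snowflake IdP connection profile as a Python dict."""
    return {
        name: {
            "Type": "ConnectionProfile:Snowflake IdP",
            "Account Identifier": account_id,
            "Region": region,
            "Client ID": client_id,
            "Client Secret": client_secret,
            "IDP URL": idp_url,
            # Scope format from the table: session:role:<custom_role>
            "Scope": f"session:role:{role}",
            "Description": "",
            # The table states that this parameter must be set to true.
            "Centralized": True,
        }
    }


if __name__ == "__main__":
    profile = build_idp_profile(
        name="SNOWFLAKE_IDP_CONNECTION_PROFILE",
        account_id="abc123",
        region="us-east-1",
        client_id="placeholder-client-id",
        client_secret="placeholder-secret",
        idp_url="https://idp.example.com/token",  # placeholder endpoint
        role="sysadmin",
    )
    print(json.dumps(profile, indent=2))
```

Serializing with `json.dumps` writes the Python value `True` as the JSON literal `true`, as the example profile requires.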