Data Processing and Analytics Connection Profiles

The following topics describe connection profiles for data processing and analytics platforms and services:

ConnectionProfile:AWS Athena

AWS Athena enables you to process, analyze, and store your data in the cloud.

The following examples show how to define an AWS Athena connection profile.

  • This JSON-based connection profile authenticates with an AWS access key and secret:

    "AWS_ATHENA":
    {
       "Type": "ConnectionProfile:AWS Athena",
       "AWS API Base URL": "https://athena.us-east-1.amazonaws.com",
       "AWS Region": "us-east-1",
       "Authentication": "SECRET",
       "AWS Access Key": "ABCDEF",
       "AWS Secret Key": "******",
       "Description": "",
       "Centralized": true
    }
  • This JSON-based connection profile authenticates with an AWS IAM role from inside an EC2 instance:

    "AWS_ATHENA":
    {
       "Type": "ConnectionProfile:AWS Athena",
       "AWS API Base URL": "https://athena.us-east-1.amazonaws.com",
       "AWS Region": "us-east-1",
       "Authentication": "NOSECRET",
       "IAM Role": "ATHENAIAMROLE",
       "Description": "",
       "Centralized": true
    }

The following table describes the AWS Athena connection profile parameters.

Parameter

Description

AWS API Base URL

Determines the authentication endpoint for AWS Athena, as follows:

https://athena.<AWSRegion>.amazonaws.com

For more information about regional endpoints available for the AWS Athena service, refer to the AWS documentation.

AWS Region

Determines the region where the AWS Athena jobs are located.

Authentication

Determines one of the following authentication methods:

  • SECRET: Authenticates with an AWS access key and secret.

  • NOSECRET: Authenticates with an AWS IAM role from within the AWS infrastructure.

AWS Access Key

Defines the AWS Athena account access key.

AWS Secret Key

Defines the AWS Athena account secret access key.

You can use Secrets in Code to not expose this value in the code.

IAM Role

Defines the Identity and Access Management (IAM) role for the AWS Athena connection.

Connection Timeout

Determines the number of seconds to wait after Control-M initiates a connection request to AWS Athena before a timeout occurs.

Default: 20

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
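The two authentication variants above differ only in a few keys, so a definition can be generated rather than hand-edited. The following is a minimal Python sketch, not part of Control-M: the function name `build_athena_profile` and its arguments are hypothetical, and the parameter names are taken from the examples above.

```python
import json


def build_athena_profile(region, access_key=None, secret_key=None, iam_role=None):
    """Return a dict shaped like the AWS Athena examples above (hypothetical helper)."""
    profile = {
        "Type": "ConnectionProfile:AWS Athena",
        # Endpoint pattern from the parameter table: https://athena.<AWSRegion>.amazonaws.com
        "AWS API Base URL": f"https://athena.{region}.amazonaws.com",
        "AWS Region": region,
    }
    if iam_role is not None:
        # NOSECRET: authenticate with an IAM role from within the AWS infrastructure.
        profile["Authentication"] = "NOSECRET"
        profile["IAM Role"] = iam_role
    elif access_key is not None and secret_key is not None:
        # SECRET: authenticate with an access key and secret.
        profile["Authentication"] = "SECRET"
        profile["AWS Access Key"] = access_key
        profile["AWS Secret Key"] = secret_key
    else:
        raise ValueError("provide either iam_role or both access_key and secret_key")
    profile["Description"] = ""
    profile["Centralized"] = True
    return profile


if __name__ == "__main__":
    definition = {"AWS_ATHENA": build_athena_profile("us-east-1", iam_role="ATHENAIAMROLE")}
    print(json.dumps(definition, indent=3))
```

Deriving the URL from the region keeps the two values from drifting apart when a profile is copied to another region.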

ConnectionProfile:AWS Data Pipeline

AWS Data Pipeline is a cloud-based extract, transform, load (ETL) service that enables you to automate the transfer, processing, and storage of your data.

The following examples show how to define an AWS Data Pipeline connection profile.

  • This JSON-based connection profile authenticates with an AWS access key and secret:

    "AWSDATAPIPELINESECRET":
    {
       "Type": "ConnectionProfile:AWS Data Pipeline",
       "Authentication": "SECRET",
       "AWS Access Key": "MYAWSACCESSKEY1234",
       "AWS Secret": "myAwsSecret12345",
       "AWS Region": "us-east-1",
       "Data Pipeline URL": "https://datapipeline.{{AWSRegion}}.amazonaws.com",
       "Connection Timeout": "30",
       "Description": "",
       "Centralized": true
    }
  • This JSON-based connection profile authenticates with an AWS IAM role from inside an EC2 instance:

    "AWSDATAPIPELINEIAM":
    {
       "Type": "ConnectionProfile:AWS Data Pipeline",
       "Authentication": "NOSECRET",
       "IAM Role": "IAMROLENAME",
       "AWS Region": "us-east-1",
       "Data Pipeline URL": "https://datapipeline.{{AWSRegion}}.amazonaws.com",
       "Connection Timeout": "30",
       "Description": "",
       "Centralized": true
    }

The following table describes the AWS Data Pipeline connection profile parameters.

Parameter

Description

Authentication

Determines one of the following authentication methods for the connection with AWS Data Pipeline:

  • SECRET: Authenticates with an AWS access key and secret.

  • NOSECRET: Authenticates with an AWS IAM role from inside an EC2 instance.

AWS Access Key

(SECRET authentication) Defines the AWS Data Pipeline account access key.

AWS Secret

(SECRET authentication) Defines the AWS Data Pipeline account secret access key. You can use Secrets in Code to not expose this value in the code.

IAM Role

(NOSECRET authentication) Defines the Identity and Access Management (IAM) role for connection to AWS.

AWS Region

Determines the region where the AWS Data Pipeline jobs are located.

AWS API Base URL

Defines the REST API URL for the AWS Data Pipeline regional endpoint, as follows:

https://datapipeline.<AWS_Region>.amazonaws.com

For more information about regional endpoints available for the AWS Data Pipeline service, refer to the AWS documentation.

Connection Timeout

Determines the number of seconds to wait before a timeout occurs after Control-M initiates a request to AWS Data Pipeline.

Default: 30

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
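The table above ties some parameters to a specific authentication method. A small pre-deployment check can catch a profile that selects one method but supplies the other method's credentials. The following Python sketch is illustrative only: `missing_parameters` is a hypothetical helper, and treating the annotated parameters as required is an assumption, not something the table states.

```python
# Assumption: parameters annotated "(SECRET authentication)" or
# "(NOSECRET authentication)" in the table are treated as required here.
REQUIRED_BY_AUTH = {
    "SECRET": ("AWS Access Key", "AWS Secret"),
    "NOSECRET": ("IAM Role",),
}


def missing_parameters(profile):
    """Return the authentication parameters that are absent or empty."""
    auth = profile.get("Authentication")
    if auth not in REQUIRED_BY_AUTH:
        raise ValueError(f"unknown Authentication value: {auth!r}")
    return [name for name in REQUIRED_BY_AUTH[auth] if not profile.get(name)]


if __name__ == "__main__":
    profile = {
        "Type": "ConnectionProfile:AWS Data Pipeline",
        "Authentication": "SECRET",
        "AWS Access Key": "MYAWSACCESSKEY1234",
    }
    print(missing_parameters(profile))
```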

ConnectionProfile:AWS DynamoDB

AWS DynamoDB is a NoSQL database service that enables you to create database tables, execute statements and transactions, and export and import data to and from the Amazon S3 storage service.

For more information, see Control-M for AWS DynamoDB.

The following examples show how to define an AWS DynamoDB connection profile.

  • This JSON-based connection profile authenticates with an AWS access key and secret:

    "AWS_DynamoDB":
    {
       "Type": "ConnectionProfile:AWS DynamoDB",
       "Authentication": "Secret",
       "AWS Secret": "*****",
       "AWS Region": "us-east-1",
       "AWS Access Key": "ZKIATY7B2LKB2JQ85I6D",
       "AWS Backup URL": "https://dynamodb.{{AWSRegion}}.amazonaws.com",
       "Description": "",
       "Connection Timeout": "100",
       "Centralized": true
    }
  • This JSON-based connection profile authenticates with an AWS IAM role:

    "AWS_ADY_IAM":
    {
       "Type": "ConnectionProfile:AWS DynamoDB",
       "Authentication": "NoSecret",
       "AWS Region": "us-east-1",
       "AWS Backup URL": "https://dynamodb.{{AWSRegion}}.amazonaws.com",
       "IAM Role": "arn:aws:iam::122343212345:role/Amazon12SSMRoleForInstancesQuickSetup",
       "Description": "",
       "Connection Timeout": "50",
       "Centralized": true
    }

The following table describes the AWS DynamoDB connection profile parameters.

Parameter

Description

AWS DynamoDB Login URL

Determines the AWS DynamoDB authentication endpoint base URL that includes the region that is defined for the AWS account.

Example: https://dynamodb.us-east-1.amazonaws.com

AWS Region

Determines the region where the AWS DynamoDB jobs are located.

Example: us-east-1

Authentication

Determines one of the following authentication methods:

  • Secret: Authenticates with an AWS access key and secret.
  • NoSecret: Authenticates with an AWS IAM role from inside an EC2 instance.

AWS Access Key

Defines the AWS account access key.

AWS Secret

Defines the AWS account secret access key. You can use Secrets in Code to not expose this value in the code.

IAM Role

Defines the Identity and Access Management (IAM) role for the AWS DynamoDB connection.

Connection Timeout

Determines the number of seconds to wait after Control-M initiates a connection request to AWS DynamoDB before a timeout occurs.

Default: 20

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:AWS EMR

AWS EMR is a managed cluster platform that enables you to execute big data frameworks, such as Apache Hadoop and Apache Spark, to process and analyze vast amounts of data.

The following example shows how to define an AWS EMR connection profile:

"AWS_EMR":
{
   "Type": "ConnectionProfile:AWS EMR",
   "AWSRegion": "us-east-1",
   "EMRAccessKey": "ABCDEF",
   "EMRSecretKey": "****",
   "Description": "",
   "Centralized": true
}

The following table describes the AWS EMR connection profile parameters.

Parameter

Description

AWSRegion

Determines the AWS region.

EMRAccessKey

Defines the token for the connection to AWS.

EMRSecretKey

Defines an additional security token for AWS. You can use Secrets in Code to not expose this value in the code.

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:Azure Databricks

Azure Databricks is a cloud-based data analytics platform that enables you to process and analyze large workloads of data.

The following example shows how to define an Azure Databricks connection profile:

"ADF_SERVPRINC":
{
   "Type": "ConnectionProfile:Azure Databricks",
   "Tenant ID": "tenantId",
   "Application ID": "4f477fa3-1a1g-4877-ca92-f39bb563f3b1",
   "Client Secret": "*****",
   "Databricks url": "https://adb-1111211144444680.0.azuredatabricks.net",
   "Azure Login url": "https://login.microsoftonline.com",
   "Connection Timeout": "50",
   "Description": "",
   "Centralized": true
}

The following table describes the Azure Databricks connection profile parameters.

Parameter

Description

Tenant ID

Defines the Azure Tenant ID in Azure AD.

Application ID

Defines the application (service principal) ID of the registered application.

The service principal must meet the following requirements:

  • It must be an Azure Databricks workspace user and admin. In the Databricks Admin Console, it must appear under users and also under admins.

  • It must be associated with a Contributor or Owner role.

Client Secret

Defines the client secret (password) associated with the Azure user and the application. You can use Secrets in Code to not expose this value in the code.

Databricks url

Defines the URL of your Databricks workspace.

Azure Login url

Defines the Azure AD authentication endpoint base URL.

Connection Timeout

Defines the timeout value, in seconds, for the trigger call made by Control-M to Azure Databricks.

Default: 50

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
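Because a connection profile is plain JSON, a single missing comma or a typographic quotation mark is enough to make a definition unparseable. The following Python sketch shows one way to check a definition before deploying it. It is an illustration only: `parse_profile_fragment` and the profile name `DATABRICKS_EXAMPLE` are hypothetical, and the check covers JSON syntax only, not whether Control-M accepts the parameter values.

```python
import json


def parse_profile_fragment(fragment):
    """Parse a '"NAME": { ... }' fragment by wrapping it in an outer JSON object."""
    return json.loads("{" + fragment + "}")


# Hypothetical, shortened definition in the same shape as the example above.
fragment = '''
"DATABRICKS_EXAMPLE":
{
   "Type": "ConnectionProfile:Azure Databricks",
   "Tenant ID": "tenantId",
   "Connection Timeout": "50",
   "Centralized": true
}
'''

if __name__ == "__main__":
    try:
        profiles = parse_profile_fragment(fragment)
        print("parsed profiles:", sorted(profiles))
    except json.JSONDecodeError as error:
        # The error carries the line and column of the first syntax problem.
        print(f"invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}")
```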

ConnectionProfile:Azure HDInsight

Azure HDInsight enables you to execute an Apache Spark batch job and perform big data analytics.

The following example shows how to define an Azure HDInsight connection profile:

"AZUREHDINSIGHT":
{
   "Type": "ConnectionProfile:Azure HDInsight",
   "Cluster Name": "hdcluster",
   "Cluster Username": "admin",
   "Cluster Password": "*****",
   "Description": "",
   "Centralized": true
}

The following table describes the Azure HDInsight connection profile parameters.

Parameter

Description

Cluster Name

Defines the name of the HDInsight cluster to connect to.

Cluster Username

Defines the name of the Administrator to use to connect to Azure HDInsight.

Cluster Password

Defines a password for the Administrator, as configured in Azure HDInsight. You can use Secrets in Code to not expose this value in the code.

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:Azure Synapse

Azure Synapse enables you to perform data integration and big data analytics.

The following examples show how to define an Azure Synapse connection profile.

  • This JSON-based connection profile authenticates with an Azure service principal:

    "AZURE_SYNAPSE_1":
    {
       "Type": "ConnectionProfile:Azure Synapse",
       "Authentication Method": "PRINCIPAL",
       "Tenant ID": "tenantId",
       "Azure AD url": "https://login.microsoftonline.com",
       "Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
       "Synapse Resource": "https://dev.azuresynapse.net/",
       "App ID": "4f477fa3-1a1g-4877-ca92-f39bb563f3b1",
       "Client Secret": "*****",
       "Connection Timeout": "50",
       "Description": "",
       "Centralized": true
    }
  • This JSON-based connection profile authenticates with a managed identity:

    Managed identity authentication is based on an Azure token that is valid, by default, for 24 hours. Token lifetime can be extended by Azure.

    "AZURE_SYNAPSE_2":
    {
       "Type": "ConnectionProfile:Azure Synapse",
       "Authentication Method": "MANAGEDID",
       "Specify Managed Identity Client ID": "&client_id=",
       "Managed Identity Client ID": "72d448f0-ac32-45ea-9158-f8653e4ee16",
       "Synapse url": "https://ncu-if-synapse.dev.azuresynapse.net",
       "Synapse Resource": "https://dev.azuresynapse.net/",
       "Connection Timeout": "50",
       "Description": "",
       "Centralized": true
    }

The following table describes the Azure Synapse connection profile parameters.

Parameter

Description

Authentication Method

Defines one of the following types of authentication to use for the connection with Azure Synapse analytics:

  • PRINCIPAL: Authenticates with a service principal.

  • MANAGEDID: Authenticates with managed identity.

To prepare for authentication with each of these methods:

  • Grant your managed identity or service principal access to your Synapse workspace through the Synapse Studio (Manage > Access Control).

  • Assign a contributor role to the Synapse workspace accessed by the managed identity or service principal.

Specify Managed Identity Client ID

(Managed identity) Determines whether the client ID for your managed identity is specified by the Managed Identity Client ID parameter.

Include this parameter only if you are using the managed identity authentication method and you have multiple managed identities defined on your Azure virtual machine. Set its value to &client_id=.

Managed Identity Client ID

(Managed identity) Determines which client ID to use as the managed identity.

This parameter requires a value only if you have multiple managed identities defined on your Azure virtual machine and you included the Specify Managed Identity Client ID parameter.

If you have only one managed identity, it is detected automatically.

Tenant ID

(Service principal authentication) Defines the Azure Tenant ID in Azure AD.

Azure AD url

(Service principal authentication) Defines the Azure AD authentication endpoint base URL.

Synapse url

Defines the workspace development endpoint.

https://myworkspace.dev.azuresynapse.net

Synapse Resource

Defines the resource parameter that serves as the identifier for the Azure Synapse login via Azure AD, as follows:

https://dev.azuresynapse.net/

App ID

Defines the application (service principal) ID of the registered application for the Azure Synapse service.

Client Secret

(Service principal authentication) Defines the client secret (password) associated with the Azure user and the application. You can use Secrets in Code to not expose this value in the code.

Connection Timeout

Defines a timeout value, in seconds, for the trigger call made by Control-M to Azure Synapse Analytics.

Default: 50

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
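The two managed identity parameters described above are only needed when several managed identities exist on the Azure virtual machine. The following Python sketch makes that rule explicit. The helper name `build_synapse_managed_identity_profile` and its arguments are hypothetical, and the parameter names and the `&client_id=` value are copied from the example and table above.

```python
def build_synapse_managed_identity_profile(synapse_url, client_id=None, timeout=50):
    """Build a MANAGEDID profile; add the client ID keys only when one is given."""
    profile = {
        "Type": "ConnectionProfile:Azure Synapse",
        "Authentication Method": "MANAGEDID",
    }
    if client_id is not None:
        # Multiple managed identities on the VM: name the one to use.
        profile["Specify Managed Identity Client ID"] = "&client_id="
        profile["Managed Identity Client ID"] = client_id
    # With a single managed identity, both keys are omitted and the identity
    # is detected automatically, as the table above describes.
    profile.update({
        "Synapse url": synapse_url,
        "Synapse Resource": "https://dev.azuresynapse.net/",
        "Connection Timeout": str(timeout),
        "Description": "",
        "Centralized": True,
    })
    return profile


if __name__ == "__main__":
    print(build_synapse_managed_identity_profile("https://myworkspace.dev.azuresynapse.net"))
```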

ConnectionProfile:Databricks

Databricks enables the integration of jobs created in the Databricks environment with your existing Control-M workflows.

The following example shows how to define a Databricks connection profile:

"DATABRICKS":
{
   "Type": "ConnectionProfile:Databricks",
   "Databricks workspace url": "https://dbc-7b944b32-faf0.cloud.databricks.com",
   "Databricks personal access token": "*****",
   "Connection Timeout": "50",
   "Description": "",
   "Centralized": true
}  

The following table describes the Databricks connection profile parameters.

Parameter

Description

Databricks workspace url

Defines the URL of your Databricks workspace.

Databricks personal access token

Defines a Databricks token for authentication of connections to the Databricks workspace. You can use Secrets in Code to not expose this value in the code.

Connection Timeout

Defines the timeout value, in seconds, for the REST calls made to Databricks.

Default: 50

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:DBT

dbt (Data Build Tool) is a cloud-based computing platform that enables you to develop, test, schedule, document, and analyze data models.

The following example shows how to define a dbt connection profile:

"DBT_CP":
{
   "Type": "ConnectionProfile:DBT",
   "DBT URL": "https://cloud.getdbt.com",
   "DBT Token": "*****",
   "Account ID": "123456",
   "Connection Timeout": "60",
   "Description": "",
   "Centralized": true
}

The following table describes the dbt connection profile parameters.

Parameter

Description

DBT URL

Defines the dbt authentication endpoint, as follows:

https://cloud.getdbt.com

DBT Token

Defines the authentication code that is used to create a connection to the dbt platform.

This code is located in the API Access section in the dbt cloud platform.

You can use Secrets in Code to not expose this value in the code.

Account ID

Defines the unique ID that is assigned to your dbt cloud account.

This ID is located in the Account Info section in the dbt cloud platform.

Connection Timeout

Determines the number of seconds to wait after Control-M initiates a connection request to dbt before a timeout occurs.

Default: 60

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:GCP BigQuery

GCP BigQuery is a Google Cloud Platform computing service that you can use for data storage, processing, and analysis.

The following example shows how to define a GCP BigQuery connection profile that authenticates with a service account:

"BIGQSA":
{
   "Type": "ConnectionProfile:GCP BigQuery",
   "Identity Type": "service_account",
   "Service Account Key": "*****",
   "BigQuery URL": "https://bigquery.googleapis.com",
   "Description": "",
   "Centralized": true
}

The following table describes the GCP BigQuery connection profile parameters.

Parameter

Description

Identity Type

Determines one of the following authentication types using GCP Access Control:

  • service_account: Authenticates with an application ID (service account) and client secret.

  • iam_user: Authenticates based on a detected IAM role, which removes the need to provide additional credentials.

BigQuery URL

Defines the Google Cloud Platform (GCP) authentication endpoint for BigQuery, as follows:

https://bigquery.googleapis.com

Service Account Key

(Service account) Defines a service account that is associated with an RSA key pair. You can use Secrets in Code to not expose this value in the code.

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:GCP DataFlow

Google Cloud Platform (GCP) Dataflow enables you to perform cloud-based data processing for batch and real-time, data-streaming applications.

The following example shows how to define a GCP Dataflow connection profile, based on a service account:

"GCPDATAFLOW":
{
   "Type": "ConnectionProfile:GCP DataFlow",
   "Identity Type": "service_account",
   "DataFlow URL": "https://dataflow.googleapis.com",
   "Service Account Key": "{\"type\":\"service_account\",\"project_id\":\"sso-gcp-dba-ctm1-priv-cc30752\",\"private_key_id\":\"5197d05c5b8212bea944985cec74a34d6c1868aa\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"bmc-wla-svc-02@sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\",\"client_id\":\"116650586827623521335\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\",  \"auth_provider_x509_cert_url\":\"https://www.googleapis.com/oauth2/v1/certs\",\"client_x509_cert_url\":\"https://www.googleapis.com/robot/v1/metadata/x509/bmc-wla-svc-02%40sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\"}",
   "Description": "",
   "Centralized": true
}

The following table describes the GCP Dataflow connection profile parameters.

Parameter

Description

Identity Type

Determines one of the following authentication types using GCP Access Control:

  • service_account: Authenticates with an application ID (service account) and client secret.

  • os_user: Authenticates based on a detected IAM role, which removes the need to provide additional credentials.

DataFlow URL

Defines the Google Cloud Platform (GCP) authentication endpoint for Dataflow.

https://dataflow.googleapis.com

Service Account Key

(Service account) Defines a JSON body that contains the required service account credentials to access GCP. You can use Secrets in Code to not expose this value in the code.

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
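In the example above, the Service Account Key value is a JSON document embedded as a string, which is why its quotation marks and newlines are escaped. Escaping by hand is error-prone, so the following Python sketch shows one way to produce the value by serializing twice. The key contents are a shortened, hypothetical stand-in for a real key file, and `service_account_key_value` is an illustrative helper name.

```python
import json

# Hypothetical, shortened stand-in for a key file downloaded from GCP.
key_file_contents = {
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
}


def service_account_key_value(key):
    """Serialize the key object to the compact JSON string used as the parameter value."""
    return json.dumps(key, separators=(",", ":"))


profile = {
    "GCPDATAFLOW": {
        "Type": "ConnectionProfile:GCP DataFlow",
        "Identity Type": "service_account",
        "DataFlow URL": "https://dataflow.googleapis.com",
        "Service Account Key": service_account_key_value(key_file_contents),
        "Description": "",
        "Centralized": True,
    }
}

# Serializing the outer document escapes the inner quotation marks and newlines.
profile_json = json.dumps(profile, indent=3)

if __name__ == "__main__":
    print(profile_json)
```

Parsing the output twice (first the profile, then the key string) returns the original key object, which is a quick way to confirm that nothing was lost in the escaping.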

ConnectionProfile:GCP Dataproc

Google Cloud Platform (GCP) Dataproc enables you to perform cloud-based big data processing and machine learning.

The following example shows how to define a GCP Dataproc connection profile that authenticates with a service account:

"GCPDATAPROC":
{
   "Type": "ConnectionProfile:GCP Dataproc",
   "Identity Type": "service_account",
   "Dataproc URL": "https://dataproc.googleapis.com",
   "Service Account Key": "{\"type\":\"service_account\",\"project_id\":\"sso-gcp-dba-ctm1-priv-cc30752\",\"private_key_id\":\"5197d05c5b8212bea944985cec74a34d6c1868aa\",\"private_key\":\"-----BEGIN PRIVATE KEY-----\\nprivate-key\\n-----END PRIVATE KEY-----\\n\",\"client_email\":\"bmc-wla-svc-02@sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\",\"client_id\":\"116650586827623521335\",\"auth_uri\":\"https://accounts.google.com/o/oauth2/auth\",\"token_uri\":\"https://oauth2.googleapis.com/token\",  \"auth_provider_x509_cert_url\":\"https://www.googleapis.com/oauth2/v1/certs\",\"client_x509_cert_url\":\"https://www.googleapis.com/robot/v1/metadata/x509/bmc-wla-svc-02%40sso-gcp-dba-ctm1-priv-cc30752.iam.gserviceaccount.com\"}",
   "Connection timeout": "20",
   "Description": "",
   "Centralized": true
}

The following table describes the GCP Dataproc connection profile parameters.

Parameter

Description

Identity Type

Determines one of the following authentication types using GCP Access Control:

  • service_account: Authenticates with an application ID (service account) and client secret.

  • os_user: Authenticates based on a detected IAM role, which removes the need to provide additional credentials.

Dataproc URL

Defines the Google Cloud Platform (GCP) authentication endpoint for Dataproc.

https://dataproc.googleapis.com

Service Account Key

(Service account) Defines a JSON body that contains the required service account credentials to access GCP. You can use Secrets in Code to not expose this value in the code.

Connection timeout

Defines a timeout value, in seconds, for the trigger call to the Google Cloud Platform.

Default: 20

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

ConnectionProfile:Hadoop

The Hadoop job connects to the Hadoop framework, which enables you to split up and process large data sets on clusters of commodity servers. You can expand your enterprise business workflows to include tasks that execute in your Big Data Hadoop cluster in Control-M with the different Hadoop-supported tools, including Pig, Hive, HDFS File Watcher, Map Reduce Jobs, and Sqoop.

The following examples show how to define a Hadoop connection profile for various types of Hadoop jobs.

Hadoop (All Types)

The following example defines the parameters required for all Hadoop types:

"HADOOP_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Hadoop",
   "TargetAgent": "edgenode",
   "TargetCTM": "CTMHost"
}

The following table describes the Hadoop connection profile parameters.

Parameter

Description

TargetAgent

Determines the Agent where the connection profile deploys.

TargetCTM

Determines the Control-M/Server where the connection profile deploys. If there is only one Control-M/Server, it defaults to that Control-M/Server.

The following example defines the optional parameters for defining the user running the Hadoop job types and choosing between a local or centralized connection profile:

"HADOOP_CONNECTION_PROFILE":
{
   "Type" : "ConnectionProfile:Hadoop",
   "TargetAgent" : "edgenode",
   "TargetCTM" : "CTMHost",
   "RunAs": "",
   "KeyTabPath":"",
   "Centralized":true
}

The following table describes the Hadoop connection profile parameters shown in the example above.

Parameter

Description

RunAs

Defines the user of the account on which to run Hadoop jobs.

Leave this field empty to run Hadoop jobs with the user account where the Agent was installed.

If you define a specific RunAs user, the Agent must run as root.

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents, versions 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false

The following table describes the parameters that control security in the case of Kerberos security.

Parameter

Description

RunAs

Defines the Principal name of the user.

KeyTabPath

Defines the Keytab file path for the target user.

Apache Spark

An Apache Spark connection profile enables you to define access to a Spark server.

The following example defines an Apache Spark connection profile:

"SPARK_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "TargetAgent": "docker-hadoop5",
   "Spark":
   {
      "CustomPath": "/home"
   }
}

The CustomPath parameter is optional.

Apache Oozie

Apache Oozie enables you to access an Oozie server for a job that submits an Oozie workflow.

The following example defines an Apache Oozie connection profile:

"OOZIE_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "TargetAgent": "hdp-ubuntu",
   "Centralized": false,
   "Oozie":
   {
      "SslEnabled": false,
      "Host": "hdp-centos",
      "Port": "11000",
      "ExtractionRules": [
      {
         "RuleName": "rule_name1",
         "WorkFlowName": "work_flow_name1",
         "WorkFlowUserName": "work_flow_user_name1",
         "FolderName": "folder_name1",
         "JobName": "job_name1"
      },
      {
         "RuleName": "rule_name2",
         "WorkFlowName": "work_flow_name2",
         "WorkFlowUserName": "work_flow_user_name2",
         "FolderName": "folder_name2",
         "JobName": "job_name2"
      } ]
   }
}

The following table describes the Apache Oozie connection profile parameters.

Parameter

Description

Host

Defines the Oozie server host.

Port

Defines the Oozie server port.

Default: 11000

SslEnabled

Valid Values:

  • true

  • false

Default: false

ExtractionRules

(Optional) Defines the rules for filtering Oozie workflows. Each rule has the following definitions:

RuleName

Defines the name of the rule.

WorkFlowName

Defines the name of the Oozie workflow to get from the Oozie server.

WorkFlowUserName

Defines the name of the user that runs the workflows from the Oozie server.

FolderName

Defines the name of the folder that contains the Hadoop job of the Oozie Extractor, as defined in the Hadoop job template.

JobName

Defines the name of the Hadoop job of the Oozie Extractor, as defined in the Hadoop job template.
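When a profile carries many extraction rules, generating the ExtractionRules array keeps every rule complete and consistently named. The following Python sketch is one possible approach. The helpers `extraction_rule` and `oozie_profile` are hypothetical, and the field names and the string-typed Port value follow the example above.

```python
RULE_FIELDS = ("RuleName", "WorkFlowName", "WorkFlowUserName", "FolderName", "JobName")


def extraction_rule(rule_name, workflow_name, workflow_user, folder_name, job_name):
    """Return one rule with all five fields described in the table above."""
    values = (rule_name, workflow_name, workflow_user, folder_name, job_name)
    return dict(zip(RULE_FIELDS, values))


def oozie_profile(target_agent, host, port=11000, ssl_enabled=False, rules=()):
    """Build a Hadoop connection profile with an Oozie block (hypothetical helper)."""
    oozie = {"SslEnabled": ssl_enabled, "Host": host, "Port": str(port)}
    if rules:
        # ExtractionRules is optional, so it is added only when rules exist.
        oozie["ExtractionRules"] = list(rules)
    return {
        "Type": "ConnectionProfile:Hadoop",
        "TargetAgent": target_agent,
        "Centralized": False,
        "Oozie": oozie,
    }


if __name__ == "__main__":
    rules = [
        extraction_rule(f"rule_name{i}", f"work_flow_name{i}", f"work_flow_user_name{i}",
                        f"folder_name{i}", f"job_name{i}")
        for i in (1, 2)
    ]
    print(oozie_profile("hdp-ubuntu", "hdp-centos", rules=rules))
```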

Apache Sqoop

Apache Sqoop enables you to run a Sqoop job.

The following example shows a connection profile that defines a Sqoop data source and access credentials:

"SQOOP_CONNECTION_PROFILE" :
{
   "Type": "ConnectionProfile:Hadoop",
   "TargetAgent": "edgenode",
   "TargetCTM": "CTMHost",
   "Centralized": false,
   "Sqoop":
   {
      "User": "username",
      "Password": "userpassword",
      "ConnectionString": "jdbc:mysql://mysql.server/database",
      "DriverClass": "com.mysql.jdbc.Driver"
   }
}

The following table describes the Sqoop connection profile parameters shown in the example above, as well as several additional optional parameters.

Parameter

Description

User

Defines a database user connected to the Sqoop server.

Password

Defines a password for the specified user.

To update an existing connection profile and keep the current password, type five asterisks (*), as follows:

*****

ConnectionString

JDBC-compliant database: Defines the connection string used to connect to the database.

DriverClass

JDBC-compliant database: Defines the driver class in the driver .jar file, which is the entry point to the driver.

PasswordFile

(Optional) Defines the full path to a file located on the HDFS that contains the password to the database.

To use a JCEKS file, include the .jceks file extension.

DatabaseVendor

(Optional) Defines the database vendor of an automatically supported database used with Sqoop, one of the following:

  • MySQL

  • Oracle (SID)

  • Oracle (Service Name)

  • PostgreSQL

DatabaseName

(Optional) Defines the name of an automatically supported database used with Sqoop.

DatabaseHost

(Optional) Defines the host server of an automatically supported database used with Sqoop.

DatabasePort

(Optional) Defines the port number for an automatically supported database used with Sqoop.
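
The Password description above can be illustrated with a short Python sketch that prepares an updated profile while keeping the stored password. The helper name keep_current_password and the new connection string are hypothetical. The sketch only builds the dictionary and does not deploy anything.

```python
import copy

# Five asterisks, as described for the Password parameter.
KEEP_PASSWORD = "*****"


def keep_current_password(profile, section="Sqoop"):
    """Return a copy of the profile whose Password is five asterisks."""
    updated = copy.deepcopy(profile)
    if "Password" in updated.get(section, {}):
        updated[section]["Password"] = KEEP_PASSWORD
    return updated


existing = {
    "Type": "ConnectionProfile:Hadoop",
    "TargetAgent": "edgenode",
    "TargetCTM": "CTMHost",
    "Centralized": False,
    "Sqoop": {
        "User": "username",
        "Password": "userpassword",
        "ConnectionString": "jdbc:mysql://mysql.server/database",
        "DriverClass": "com.mysql.jdbc.Driver",
    },
}

updated = keep_current_password(existing)
# Hypothetical change that motivates the update.
updated["Sqoop"]["ConnectionString"] = "jdbc:mysql://mysql.server/other_database"
print(updated["Sqoop"]["Password"])
```

The helper works on a deep copy, so the original dictionary, including its real password, is left unchanged.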

Apache Tajo

Apache Tajo is an advanced data warehouse system that runs on top of HDFS.

The following example shows a connection profile that defines access to a Tajo server:

"TAJO_CP" :
{
   "Type": "ConnectionProfile:Hadoop",
   "TargetAgent": "edgenode",
   "Tajo":
   {
      "BinaryPath": "$TAJO_HOME/bin/",
      "DatabaseName": "myTajoDB",
      "MasterServerName" : "myTajoServer",
      "MasterServerPort": "26001"
   }
}

The following table describes the Tajo connection profile parameters.

Parameter

Description

BinaryPath

Defines the path to the bin directory where the tsql utility is located.

DatabaseName

Defines the name of the Tajo database.

MasterServerName

Defines the host name of the server where the Tajo master is running.

MasterServerPort

Defines the Tajo master port number.

Apache Hive

Apache Hive enables you to run a Hive BeeLine job.

The following examples show how to define a Hadoop Hive connection profile.

  • This JSON defines a connection profile with a Hive BeeLine endpoint and access credentials:

    The parameters in the example translate to the following BeeLine command:

    beeline -u jdbc:hive2://<Host>:<Port>/<DatabaseName>

    "HIVE_CONNECTION_PROFILE" :
    {
       "Type": "ConnectionProfile:Hadoop",
       "TargetAgent": "edgenode",
       "TargetCTM": "CTMHost",
       "Hive":
       {
          "Host": "hive_host_name",
          "Port": "10000",
          "DatabaseName": "hive_database"
       }
    }
  • This JSON defines a Hadoop Hive connection profile that includes optional parameters:

    The parameters in the example translate to the following BeeLine command:

    beeline -u jdbc:hive2://<Host>:<Port>/<DatabaseName>;principal=<Principal> -n <User> -p <Password>

    "HIVE_CONNECTION_PROFILE1":
    {
       "Type": "ConnectionProfile:Hadoop",
       "TargetAgent": "edgenode",
       "TargetCTM": "CTMHost",
       "Centralized": true,
       "Hive":
       {
          "Host": "hive_host_name",
          "Port": "10000",
          "DatabaseName": "hive_database",
          "User": "user_name",
          "Password": "user_password",
          "Principal": "Server_Principal_of_HiveServer2@Realm"
       }
    }

To update an existing connection profile and keep the current password, type five asterisks (*), as follows:

*****
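
The two BeeLine command templates shown with the Hive examples can be sketched in Python as follows. The function name build_beeline_command is hypothetical, and handling each optional parameter independently is an assumption, because the examples show only the minimal and the full form. The result is a display string for comparison with the templates, not a shell-safe command line.

```python
def build_beeline_command(hive):
    """Build the BeeLine command string that the Hive parameters map to."""
    url = f"jdbc:hive2://{hive['Host']}:{hive['Port']}/{hive['DatabaseName']}"
    if hive.get("Principal"):
        url += f";principal={hive['Principal']}"
    command = f"beeline -u {url}"
    if hive.get("User"):
        command += f" -n {hive['User']}"
    if hive.get("Password"):
        command += f" -p {hive['Password']}"
    return command


minimal = {
    "Host": "hive_host_name",
    "Port": "10000",
    "DatabaseName": "hive_database",
}
full = dict(
    minimal,
    User="user_name",
    Password="user_password",
    Principal="Server_Principal_of_HiveServer2@Realm",
)

print(build_beeline_command(minimal))
print(build_beeline_command(full))
```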

ConnectionProfile:Snowflake

Snowflake is a cloud-computing platform that enables you to process, analyze, and store your data.

The following example shows how to define a Snowflake connection profile:

This connection profile uses token-based authentication. To authenticate with an Identity Provider (IdP), see ConnectionProfile:Snowflake IdP.

"SNOWFLAKE_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Snowflake",
   "Account Identifier": "{Account_ID}",
   "Region": "us-east-1",
   "Client ID": "DuHj****************",
   "Client Secret": "*****",
   "Refresh Token": "ver%******************",
   "Redirect URI": "https%****************",
   "Description": "",
   "Centralized": true
}

The following table describes the Snowflake connection profile parameters.

Parameter

Description

Account Identifier

Defines the Snowflake account identifier.

To obtain this string, run the Describe Security Integration command in Snowflake and copy the initial string from one of the authorization properties.

For example, OAUTH_AUTHORIZATION_ENDPOINT has the following value:

https://abc123.us-east-1.snowflakecomputing.com/oauth/authorize

In this value, the account identifier is the following string:

abc123

For more information about obtaining values for the parameters required by the connection profile, see Setting Up a Snowflake API Connection.

Region

Determines the region where the Snowflake jobs are located.

us-east-1

Client ID

Defines the client ID assigned to the account in the Snowflake integration setup.

Client Secret

Defines the client secret assigned to the account in the Snowflake integration setup. You can use Secrets in Code to not expose this value in the code.

Refresh Token

Defines the value for the refresh token.

This string must be URL-encoded.

Redirect URI

Defines the redirect URI assigned to the account in the Snowflake integration setup.

This string must be URL-encoded.

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents of version 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
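
Two of the parameter descriptions above lend themselves to a short Python sketch: reading the account identifier from the authorization endpoint, and URL-encoding the Refresh Token and Redirect URI values. The function names and the sample redirect URI are hypothetical. Taking the first label of the host name as the account identifier follows the abc123 example above and might not hold for every account URL format.

```python
from urllib.parse import quote, urlparse


def account_identifier_from_endpoint(endpoint_url):
    """Return the first label of the endpoint host name."""
    host = urlparse(endpoint_url).hostname
    if not host:
        raise ValueError(f"not an absolute URL: {endpoint_url!r}")
    return host.split(".")[0]


def url_encode(value):
    """Percent-encode all reserved characters, including ':' and '/'."""
    return quote(value, safe="")


endpoint = "https://abc123.us-east-1.snowflakecomputing.com/oauth/authorize"
print(account_identifier_from_endpoint(endpoint))

# Hypothetical redirect URI, for illustration only.
print(url_encode("https://example.com/callback"))
```

Passing safe="" matters here because quote leaves the slash character unencoded by default.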

ConnectionProfile:Snowflake IdP

Snowflake is a cloud-computing platform that enables you to process, analyze, and store your data, with authentication based on an Identity Provider (IdP).

The following example shows how to define a Snowflake connection profile with authentication based on an Identity Provider (IdP):

This connection profile authenticates with an Identity Provider (IdP). To use token-based authentication, see ConnectionProfile:Snowflake.

"SNOWFLAKE_IDP_CONNECTION_PROFILE":
{
   "Type": "ConnectionProfile:Snowflake IdP",
   "Account Identifier": "{Account_ID}",
   "Region": "us-east-1",
   "Client ID": "DuHj****************",
   "Client Secret": "*****",
   "IDP URL": "https://****************",
   "Scope": "session:role:<custom_role>",
   "Description": "",
   "Centralized": true
}

The following table describes the Snowflake IdP connection profile parameters.

Parameter

Description

Account Identifier

Defines the Snowflake account identifier.

To obtain this string, run the Describe Security Integration command in Snowflake and copy the initial string from one of the authorization properties.

EXTERNAL_OAUTH_AUDIENCE_LIST has the following value:

https://abc123.us-east-1.snowflakecomputing.com

abc123 is the account identifier.

For information about the values for the parameters required by the connection profile, see the IdP-specific External OAuth configuration instructions in the Snowflake documentation.

Region

Determines the region where the Snowflake jobs are located.

us-east-1

Client ID

Defines the client ID assigned to the account in the Snowflake integration setup.

Client Secret

Defines the client secret assigned to the account in the Snowflake integration setup. You can use Secrets in Code to not expose this value in the code.

IDP URL

Defines the authentication endpoint for Snowflake IdP.

Scope

Defines the scope, which limits the operations you can do and the roles you can use in the Snowflake IdP plug-in, as follows:

session:role:<custom_role>

Example: session:role:sysadmin

Centralized

Determines whether to create a centralized connection profile, which is stored in the Control-M database and is available to all Agents of version 9.0.20 or higher.

You must set this parameter to true.

Valid Values:

  • true: Creates a centralized connection profile.

  • false: Creates a local connection profile, which is associated with and stored on a specific Agent.

Default: false
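
As a minimal sketch of how the parameters above fit together, the following Python code assembles a Snowflake IdP connection profile and prints it as JSON. The function names, the IdP URL, and the credential placeholders are hypothetical. The scope string follows the session:role:<custom_role> pattern, and Centralized is fixed to true because the table states that it must be.

```python
import json


def build_scope(role):
    """Build a scope string in the session:role:<custom_role> pattern."""
    return f"session:role:{role}"


def build_snowflake_idp_profile(account_id, region, client_id,
                                client_secret, idp_url, role):
    """Assemble the profile body from the parameters in the table."""
    return {
        "Type": "ConnectionProfile:Snowflake IdP",
        "Account Identifier": account_id,
        "Region": region,
        "Client ID": client_id,
        "Client Secret": client_secret,
        "IDP URL": idp_url,
        "Scope": build_scope(role),
        "Description": "",
        "Centralized": True,  # serialized by json.dumps as true
    }


profile = build_snowflake_idp_profile(
    account_id="abc123",
    region="us-east-1",
    client_id="my_client_id",            # placeholder value
    client_secret="my_client_secret",    # placeholder value
    idp_url="https://idp.example.com/oauth2/token",  # hypothetical endpoint
    role="sysadmin",
)

document = json.dumps({"SNOWFLAKE_IDP_CONNECTION_PROFILE": profile}, indent=3)
print(document)
```

Generating the JSON from a dictionary avoids hand-editing mistakes such as missing commas between properties.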