Hadoop Connection Profile Parameters

The following table describes the Authentication parameters in a Hadoop connection profile.

Parameter

Description

Run as User: (Kerberos: Use Principal)

Defines the user (or, on a Kerberized cluster, the principal) under which the job runs.

For a non-kerberized cluster:

  • If the Agent runs as root, type a value in this field to run the tools as the specified user.
  • If the Agent does not run as root, leave this field empty. A message appears if the user tries to run a job on a non-root agent when this field has a value for the profile.

This parameter is not relevant to the Oozie job type. To run an Oozie job as a different user, add a user.name parameter either to the Oozie job properties file or to the Oozie job properties.

User's Keytab File Path

Defines the keytab file path for the target user.
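
For the Oozie case noted under Run as User, the user is set through a user.name entry in the job properties rather than through the connection profile. The following Python sketch shows one way to add or overwrite that entry in properties-file text. The helper name set_property and the sample values are hypothetical, and the sketch handles only simple key=value lines.

```python
def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties text with key set to value (hypothetical helper).

    Only plain 'key=value' lines are interpreted; comments and any other
    lines are passed through unchanged.
    """
    updated = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("#") and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                # Overwrite the existing entry instead of adding a second one.
                updated.append(f"{key}={value}")
                found = True
                continue
        updated.append(line)
    if not found:
        updated.append(f"{key}={value}")
    return "\n".join(updated) + "\n"


# Illustrative values only; "etl_user" and the host name are assumptions.
sample = "nameNode=hdfs://namenode.example.com:8020\nqueueName=default\n"
print(set_property(sample, "user.name", "etl_user"))
```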

Sqoop Connection Profile Parameters

The following table describes the Sqoop profile parameters when using Sqoop with Hadoop. Sqoop is designed to transfer bulk data between Apache Hadoop and structured datastores.

When you provide a connection string to Sqoop, it inspects the protocol scheme to determine the appropriate vendor-specific logic to use. If Sqoop recognizes the given database, it works automatically. Otherwise, the information must be manually entered.
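
The scheme inspection described above can be illustrated with a short Python sketch. It is not Sqoop's implementation: the KNOWN_SCHEMES mapping and both function names are assumptions, chosen to mirror the automatically supported vendors listed in this section.

```python
# Illustrative mapping based on the automatically supported vendors listed
# in this section; it is not Sqoop's internal table.
KNOWN_SCHEMES = {
    "mysql": "MySQL",
    "oracle": "Oracle",
    "postgresql": "PostgreSQL",
}


def jdbc_scheme(connection_string: str) -> str:
    """Return the sub-protocol of a JDBC URL, e.g. 'mysql' for 'jdbc:mysql://...'."""
    prefix = "jdbc:"
    if not connection_string.lower().startswith(prefix):
        raise ValueError("not a JDBC connection string: " + connection_string)
    return connection_string[len(prefix):].split(":", 1)[0].lower()


def recognized_vendor(connection_string: str):
    """Return the vendor name, or None when the details must be entered manually."""
    return KNOWN_SCHEMES.get(jdbc_scheme(connection_string))


# Sample connection strings; the host names are placeholders.
print(recognized_vendor("jdbc:mysql://db.example.com:3306/sales"))
print(recognized_vendor("jdbc:sqlserver://db.example.com:1433"))
```

A None result corresponds to the "Other JDBC-Compliant Database" case, where the connection string and driver class are supplied explicitly.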

Parameter

Description

Database User

Defines the database user that is connected to the Sqoop server

Database Password

Defines the database user password

Password File (HDFS Full Path)

Indicates the full path to a file located on the HDFS that contains the password to the database

To use a JCEKS file, the file name in this path must end with the .jceks extension

Automatically Supported Databases - Database Vendor

Determines which of the following automatically supported databases is used with the Sqoop tool:

  • MySQL
  • Oracle (SID)
  • Oracle (Service name)
  • PostgreSQL

Automatically Supported Databases - Database host

Indicates the database host server for Sqoop

Automatically Supported Databases - Database Port

Indicates the database port for Sqoop

Default Port: 1024

Automatically Supported Databases - Database Name

Indicates the database name for Sqoop

Other JDBC-Compliant Database - Connection String

Indicates the connection string that is used to connect to the database

Other JDBC-Compliant Database - Driver Class

Indicates the driver class for each driver .jar file, which indicates the entry-point to that driver
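
To show how the host, port, and database name fields relate to a connection string, the following Python sketch assembles the commonly used JDBC URL form for each automatically supported vendor. How the product builds the string internally is not stated here, so treat the function name build_jdbc_url, the templates, and the sample values as assumptions.

```python
# Commonly used JDBC URL forms for the vendors listed above (assumed here;
# check the driver documentation for the exact syntax of your version).
JDBC_TEMPLATES = {
    "MySQL": "jdbc:mysql://{host}:{port}/{name}",
    "Oracle (SID)": "jdbc:oracle:thin:@{host}:{port}:{name}",
    "Oracle (Service name)": "jdbc:oracle:thin:@//{host}:{port}/{name}",
    "PostgreSQL": "jdbc:postgresql://{host}:{port}/{name}",
}


def build_jdbc_url(vendor: str, host: str, port: int, name: str) -> str:
    """Build a JDBC URL from the Database Vendor, host, Port, and Name fields."""
    try:
        template = JDBC_TEMPLATES[vendor]
    except KeyError:
        raise ValueError("vendor is not in the supported list: " + vendor) from None
    return template.format(host=host, port=port, name=name)


# Sample values only; hosts, ports, and database names are placeholders.
print(build_jdbc_url("MySQL", "db.example.com", 3306, "sales"))
print(build_jdbc_url("Oracle (SID)", "ora.example.com", 1521, "ORCL"))
```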

HiveServer Connection Profile Parameters

The following table describes the HiveServer connection profile parameters that apply when you use HiveServer with Hadoop. HiveServer enables remote clients to execute queries against Hive and retrieve the results. It supports multi-client concurrency and authentication.

Parameter

Description

Connection Type

Determines the connection type, which is one of the following:

  • Connection properties: Connects to the HiveServer based on the connection properties that you define.

  • Connection string: Enables you to specify a connection string instead of entering all other properties.

Connection String

Defines a connection string for connecting to the HiveServer. No additional parameters are necessary.

Hive Host

Defines the Hive server host name

Hive Port

Determines the Hive port number

Default Port: 1024

Hive User

Defines the Hive user name

Database Name

Defines the Hive database name

Password

Defines the Hive user password

Hive Principal

Defines the HiveServer2 principal, which is required for Kerberos authentication
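
The two connection types carry the same information in different forms. As a rough illustration, the Python sketch below combines the individual properties into a HiveServer2 JDBC connection string of the usual jdbc:hive2:// form. The function name build_hive_url and the sample host, port, database, and principal are assumptions, not values taken from the product.

```python
from typing import Optional


def build_hive_url(host: str, port: int, database: str = "default",
                   principal: Optional[str] = None) -> str:
    """Combine Hive Host, Hive Port, Database Name, and Hive Principal."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:
        # The principal is only needed for Kerberos authentication.
        url += f";principal={principal}"
    return url


# Sample values only; replace them with the values for your cluster.
print(build_hive_url("hive.example.com", 10000, "sales",
                     "hive/_HOST@EXAMPLE.COM"))
```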

Oozie Connection Profile Parameters

The following table describes the Oozie connection profile parameters that apply when you use Oozie with Hadoop. Oozie is a workflow scheduling system used to manage Hadoop jobs.

Field

Description

Server Name

Defines the Oozie server host name/IP address

Server Port

Determines the Oozie server port number

Default: 11000

Use SSL

Determines whether Control-M communicates with the Oozie server over Secure Sockets Layer (SSL)

For Control-M for Hadoop to work with Oozie in SSL mode, do the following:

  • Configure your Oozie Server to use SSL (HTTPS), as described in Oozie documentation
  • Configure the Oozie Client where Control-M for Hadoop is installed to connect using SSL (HTTPS), as described in Oozie documentation

Oozie Extraction Rules

Lists the rules that determine which Oozie workflows to filter

You can add or update extraction rules, as described in Oozie Extraction Rules.
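
As an illustration of how Server Name, Server Port, and Use SSL fit together, the Python sketch below derives an Oozie base URL from the three fields. The function name oozie_base_url and the /oozie context path (Oozie's conventional default) are assumptions; the port you use in SSL mode depends on how your Oozie server is configured.

```python
def oozie_base_url(server_name: str, server_port: int = 11000,
                   use_ssl: bool = False) -> str:
    """Derive the Oozie base URL from the connection profile fields."""
    scheme = "https" if use_ssl else "http"
    # "/oozie" is assumed to be the context path; adjust if yours differs.
    return f"{scheme}://{server_name}:{server_port}/oozie"


# Sample values only; the host name and the SSL port are placeholders.
print(oozie_base_url("oozie.example.com"))
print(oozie_base_url("oozie.example.com", 11443, use_ssl=True))
```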

Oozie Extraction Rules

The following table describes the Oozie extraction rule parameters. You configure these parameters in the Hadoop connection profile when you use Oozie extraction rules with Hadoop.

Field

Description

Rule Name

Defines the rule name

Workflow Name

Defines the name of the Oozie workflow to get from the Oozie server

Workflow User Name

Defines the name of the user that runs the workflows from the Oozie server

Folder Name

Defines the folder name that contains the Hadoop job of the Oozie Extractor

The folder name must exactly match the name defined in the Hadoop job template of the Oozie Extractor

Job Name

Defines the name of the Hadoop job of the Oozie Extractor

The job name must exactly match the name defined in the Hadoop job template of the Oozie Extractor

Spark Connection Profile Parameters

The following table describes the Spark connection profile parameters that apply when you use Spark with Hadoop.

Parameter

Description

Spark Executable

Determines whether to use the default executable or a custom ‘spark-submit’ script to run the Spark job

The default executable is located through the ‘$PATH’ environment variable

Path

When the custom script option is chosen in the Spark Executable parameter, this parameter defines the full path to the custom ‘spark-submit’ script that will be used to run the job
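
The choice between the default executable and a custom script can be pictured with the Python sketch below. It assumes the default is found by searching $PATH for spark-submit, as described above. The function name resolve_spark_submit, its parameters, and the sample path are hypothetical.

```python
import shutil
from typing import Optional


def resolve_spark_submit(custom_path: Optional[str] = None,
                         search_path: Optional[str] = None) -> str:
    """Return the spark-submit script to run.

    custom_path  -- value of the Path parameter when a custom script is chosen
    search_path  -- directories to search; None means the process's $PATH
    """
    if custom_path:
        return custom_path
    found = shutil.which("spark-submit", path=search_path)
    if found is None:
        raise FileNotFoundError("spark-submit was not found on the search path")
    return found


# Hypothetical custom script location, for illustration only.
print(resolve_spark_submit("/opt/spark/bin/spark-submit"))
```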

Tajo Connection Profile Parameters

The following table describes the Tajo connection profile parameters that apply when you use Tajo with Hadoop. Tajo is an advanced data warehousing system on top of HDFS.

Parameter

Description

tsql Bin Directory

Determines the full path to the bin directory where the tsql utility is located

Database Name

Defines the database name to use

Tajo Master Server Name

Defines the host name of the server where the Tajo master is running

Tajo Master Server Port

Defines the Tajo master port number

Default Port: 26002
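
To show how these four parameters might map onto a tsql invocation, the Python sketch below builds an argument list. It assumes that tsql accepts -h for the host, -p for the port, -c for a single command, and the database name as a trailing argument; confirm the options with the help output of your tsql version. The function name build_tsql_command and the sample values are hypothetical.

```python
import os
from typing import List, Optional


def build_tsql_command(bin_dir: str, host: str, port: int = 26002,
                       database: Optional[str] = None,
                       sql: Optional[str] = None) -> List[str]:
    """Build a tsql argument list from the connection profile fields."""
    command = [os.path.join(bin_dir, "tsql"), "-h", host, "-p", str(port)]
    if sql is not None:
        # Assumed option: run one statement and exit.
        command += ["-c", sql]
    if database:
        command.append(database)
    return command


# Sample values only; the directory, host, and query are placeholders.
print(build_tsql_command("/opt/tajo/bin", "tajo-master.example.com",
                         database="default", sql="select 1"))
```

Returning a list rather than a single string keeps the arguments separate, so the result can be passed to subprocess.run without shell quoting.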