HDP

Home

HDP

This topic explains how to configure Unravel to retrieve additional data from Hive, Tez, Spark and Oozie, such as Hive queries, application timelines, Spark jobs, YARN resource management data, and logs. You'll do this by generating Unravel's JARs and distributing them to every node that runs queries in the cluster. Later, after JARs are distributed to the nodes, you'll integrate Hive, Tez, and Spark data with Unravel.

1. Generate and distribute Unravel's Hive Hook and Spark Sensor JARs

Create a directory, for example, /usr/local/unravel-jars, for the JARs.

mkdir /usr/local/unravel-jars
chmod 775 -R /usr/local/unravel-jars/
chown root:hadoop /usr/local/unravel-jars/

Generate the JARs and specify the directory where the Jars must be saved.
```
chmod +x <Installation directory>/install_bin/services/unravel/bin/install/cluster-setup-scripts/unravel_hdp_setup.py

cd <Installation directory>/install_bin/services/unravel/bin/install/cluster-setup-scripts/usr/local/unravel/install_bin/cluster-setup-scripts/

sudo python2 unravel_hdp_setup.py --sensor-only --unravel-server <unravel-host>:3000 --spark-version <spark-version> --hive-version <hive-version> --ambari-server <ambari-host> --btrace-dir /usr/local/unravel-jars/ --hive-hook-dir /usr/local/unravel-jars/
```
Replace the values for unravel-host, spark-version, hive-version, and ambari-host with appropriate values.
For example:
```
python2 unravel_hdp_setup.py --sensor-only --unravel-server xyz66:3000 --spark-version 2.3.0 --hive-version 1.2.1 --btrace-dir /usr/local/unravel-jars/ --hive-hook-dir /usr/local/unravel-jars/
```
Tip
For unravel-host, specify the protocol (HTTP or HTTPS) and use the fully qualified domain name (FQDN) or IP address of Unravel Server. For example, https://playground3.unraveldata.com:3000.
For spark-version, use a Spark version that is compatible with this version of Unravel. For example,
spark-2.0 for Spark 2.0.x
spark-2.1 for Spark 2.1.x
spark-2.2 for Spark 2.2.x
spark-2.3 for Spark 2.3.x
spark-2.4 for Spark 2.4.x
spark-3.0 for Spark 3.0.x
For hive-version, use a Hive version that is compatible with this version of Unravel. For example,
HDP 3.x
3.1.0 for Hive 3.1.0
HDP 2.x
1.2.0 for Hive 1.2.0 or 1.2.1
0.13.0 for Hive 0.13.0
Distribute /usr/local/unravel-jars to all worker, edge, and master nodes that run the queries.
For example,
```
scp -r /usr/local/unravel-jars root@hostname:/usr/local/
```
Make sure the node can reach port 4043 of Unravel Server.

2. Configure Ambari to work with Unravel

Hive configurations
1. Hive
  Click Hive > Configs > Advanced > Advanced hive-env. In the hive-env template, towards the end of line, add:
```
export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar 
```
2. Hive Hook
  In Ambari's general properties, append ,com.unraveldata.dataflow.hive.hook.UnravelHiveHook, to the following properties:
  Important
  Be sure to append with no space before or after the comma, for example, property=existingValue,newValue
  hive.exec.failure.hooks
  hive.exec.post.hooks
  hive.exec.pre.hooks
  For example, com.unraveldata.dataflow.hive.hook.UnravelHiveHook
```
hive.exec.failure.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
hive.exec.post.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
hive.exec.pre.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
```
3. Custom
  In Ambari's custom hive-site editor set com.unraveldata.host: to unravel-gateway-internal-IP-hostname
  For example,
4. Optional: Hive LLAP if it is enabled
  Tip
  Edit hive-site.xml manually, not through Ambari Web UI.
  1. Copy the settings in Custom hive-interactive-site and paste them into /etc/hive/conf/hive-site.xml.
  2. Copy the settings in Advanced hive-interactive-env and paste them into /etc/hive/conf/hive-site.xml.
Notice
If you have an Unravel version older than 4.5.1.0, create HDFS Hive Hook directories for Unravel:
```
hdfs dfs -mkdir -p /user/unravel/HOOK_RESULT_DIR
hdfs dfs -chown unravel:hadoop /user/unravel/HOOK_RESULT_DIR
hdfs dfs -chmod -R 777 /user/unravel/HOOK_RESULT_DIR
```
Configure HDFS.
Click HDFS > Configs > Advanced > Advanced hadoop-env. In the hadoop-env template, look for export HADOOP_CLASSPATH and append Unravel's JAR path as shown.
```
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:<Unravel installation directory>/unravel-jars/unravel-hive-1.2.0-hook.jar
```
Configure the BTrace agent for Tez
In the tez-site.xml configuration file, append the Java options below to tez.am.launch.cmd-opts and tez.task.launch.cmd-opts:
```
-javaagent:<Unravel installation directory>/unravel-jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043
```
Tip
In a Kerberos environment you need to modify tez.am.view-acls property with the "run as" user or *.

Configure the Application Timeline Server (ATS)

Note

From Unravel v4.6.1.6, this step is not mandatory.

In yarn-site.xml:

yarn.timeline-service.enabled=true
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes=org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl
yarn.timeline-service.version=1.5 or yarn.timeline-service.versions=1.5f,2.0f

If yarn.acl.enable is true, add unravel to yarn.admin.acl.
In hive-env.sh, add:
```
Use ATS Logging: true
```

In tez-site.xml, add:

tez.dag.history.logging.enabled=true
tez.am.history.logging.enabled=true
tez.history.logging.service.class=org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService
tez.am.view-acls=unravel-"run-as"-user or *

Note

From HDP version 3.1.0 onwards, this Tez configuration must be done manually.

Configure Spark-on-Yarn
Tip
For unravel-host, use Unravel Server's fully qualified domain name (FQDN) or IP address.
For spark-version, use a Spark version that is compatible with this version of Unravel. For example,
- spark-2.0 for Spark 2.0.x
- spark-2.1 for Spark 2.1.x
- spark-2.2 for Spark 2.2.x
- spark-2.3 for Spark 2.3.x
- spark-2.4 for Spark 2.4.x
- spark-3.0 for Spark 3.0.x
1. Add the location of the Spark JARs.
  Click Spark > Configs > Custom spark-defaults > Add Property and use Bulk property add mode, or edit spark-defaults.conf as follows:
  Tip
  - If your cluster has only one Spark 1.X version, spark-defaults.conf is in /usr/hdp/current/spark-client/conf.
  - If your cluster is running Spark 2.X, spark-defaults.conf is in /usr/hdp/current/spark2-client/conf.
  This example uses default locations for Spark JARs. Your environment may vary.
```
spark.unravel.server.hostport=unravel-host:4043
spark.driver.extraJavaOptions=-javaagent:/usr/local/unravel-jars/btrace-agent.jar=config=driver,libs=spark-version
spark.executor.extraJavaOptions=-javaagent:/usr/local/unravel-jars/btrace-agent.jar=config=executor,libs=spark-version
spark.eventLog.enabled=true 
```
2. Enable Spark streaming.
Configure Oozie
1. If you are launching Spark actions:
  Copy the JAR for the Spark version you are using, for example, spark-2.3. If you copy multiple Spark JARs, Oozie won't be to launch actions.
  Ensure that the Spark event log location is configured the same as the local Spark jobs event logs' directory. In other words, Oozie must be able to locate the event log directory to store its event history logs.
2. Make sure that oozie.libpath for the Oozie shared library in HDFS is defined.
3. Copy the Hive Hook JAR and the Btrace JAR to oozie.libpath. If you don't do this, jobs controlled by Oozie 2.3+ fail.

3. Configure the Unravel Host

Define the following properties in <Unravel installation directory>/data/conf/unravel.properties. If you do not find the properties add them.

Tez.

Property/Description	Set by user	Unit	Default
com.unraveldata.yarn.timeline-service.webapp.address The HTTP address of the Timeline service web application.	Optional	string (URL)	-
com.unraveldata.yarn.timeline-service.port Timeline service port.		number	8188

Property/Description

Set by user

Unit

Default

com.unraveldata.yarn.timeline-service.webapp.address

The HTTP address of the Timeline service web application.

Optional

string

(URL)

com.unraveldata.yarn.timeline-service.port

Timeline service port.

number

8188

Note

In a multi-cluster environment, you must add these properties to the Edge node.

Set these if the Application Timeline Server (ATS) requires authentication.

Property/Description	Set by user	Unit	Default
yarn.ats.webapp.username Username required for authentication to the Application Timeline Server (if authentication is required).	Optional	string	-
yarn.ats.webapp.password Password required for authentication to the Application Timeline Server (if authentication is required).	Optional	string	-

Property/Description

Set by user

Unit

Default

yarn.ats.webapp.username

Username required for authentication to the Application Timeline Server (if authentication is required).

Optional

string

yarn.ats.webapp.password

Password required for authentication to the Application Timeline Server (if authentication is required).

Optional

string

4. Optional: Confirm that Unravel UI shows Tez data.

Run <Unravel installation directory>/install_bin/hive_test_simple.sh on the HDP cluster or on any cloud environment where hive.execution.engine=tez.
Log into Unravel server and go to the Applications page. Check for Tez jobs.
Unravel UI may take a few seconds to load Tez data.

5. Add more configuration and instrumentation options

In this section:

Would you like to provide feedback? Just click here to suggest edits.

Home

HDP

1. Generate and distribute Unravel's Hive Hook and Spark Sensor JARs

Tip

2. Configure Ambari to work with Unravel

Important

Tip

Notice

Tip

Note

Note

Tip

Tip

3. Configure the Unravel Host

Note

4. Optional: Confirm that Unravel UI shows Tez data.

5. Add more configuration and instrumentation options

Search results