Part 2: Enabling additional instrumentation
This topic explains how to enable additional instrumentation on your gateway/edge/client nodes.
1. Run the setup script
On the Unravel host, run unravel_mapr_setup.py with the following values:

Tip

For unravel-host, use Unravel Server's fully qualified domain name (FQDN) or IP address.

For spark-version, use a Spark version that is compatible with this version of Unravel. For example:

spark-2.0.0 for Spark 2.0.x
spark-2.1.0 for Spark 2.1.x
spark-2.2.0 for Spark 2.2.x
spark-2.3.0 for Spark 2.3.x

For hive-version, use a Hive version that is compatible. For example:

1.2.0 for Hive 1.2.0 or 1.2.1
0.13.0 for Hive 0.13.0

For tez-version, use a Tez version that is compatible with this version of Unravel. For example:

0.8 for Tez 0.8
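If you are scripting the installation, a hedged sketch of the invocation follows. The script location and option names shown here (--unravel-host, --spark-version, --hive-version, --tez-version) are illustrative assumptions, not confirmed arguments; check the script's usage or help output for the exact syntax in your Unravel release.

# Illustrative sketch only: the path and option names below are assumptions; verify with the script's help output.
sudo /usr/local/unravel/install_bin/unravel_mapr_setup.py \
    --unravel-host unravel-host.example.com \
    --spark-version spark-2.1.0 \
    --hive-version 1.2.0 \
    --tez-version 0.8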
The setup script, unravel_mapr_setup.py, does the following:

It puts the Hive Hook JAR in /usr/local/unravel_client/.

It puts the resource metrics JAR in /usr/local/unravel-agent/.

For MapR 5.2 or MapR 6.0, it changes the contents of these configuration files:

/opt/mapr/spark/spark-spark-version/conf/spark-defaults.conf
/opt/mapr/hive/hive-hive-version/conf/hive-site.xml
/opt/mapr/hive/hive-hive-version/conf/hive-env.sh
/opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/yarn-site.xml
/opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/mapred-site.xml
/usr/local/unravel/etc/unravel.properties

It saves a copy of each original configuration file in the same directory. The copies are named *.preunravel. For example, /opt/mapr/hive/hive-1.2/conf/hive-site.xml.preunravel.
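For example, to see exactly what the script changed in a given file, or to roll it back, you can compare the file with its .preunravel copy (standard shell tools; the Hive path is just the example from above):

# show the differences introduced by the setup script
diff /opt/mapr/hive/hive-1.2/conf/hive-site.xml.preunravel /opt/mapr/hive/hive-1.2/conf/hive-site.xml

# roll back to the original, if needed
cp /opt/mapr/hive/hive-1.2/conf/hive-site.xml.preunravel /opt/mapr/hive/hive-1.2/conf/hive-site.xml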
Once the files are present on the Unravel host, you can compress them with the tar command and distribute them to other hosts, if that is more convenient than running the script on each host. All instrumented nodes must be able to open port 4043 on Unravel Server (host2 in a multi-host Unravel installation).
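A hedged sketch of that distribution step, using only standard tools (the node name is a placeholder):

# on the Unravel host: bundle the JAR directories the setup script populated
tar czf unravel-client-jars.tar.gz /usr/local/unravel_client /usr/local/unravel-agent

# copy and unpack on a gateway/edge/client node
scp unravel-client-jars.tar.gz gateway-node:/tmp/
ssh gateway-node 'cd / && sudo tar xzf /tmp/unravel-client-jars.tar.gz'

# from that node, confirm it can reach port 4043 on Unravel Server (requires nc to be installed)
nc -zv unravel-host 4043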
2. For Oozie, copy Unravel Hive Hook and BTrace JARs to the HDFS shared library path
Copy the Hive Hook JAR in /usr/local/unravel_client/ and the metrics JARs in /usr/local/unravel-agent/ to the shared library path specified by oozie.libpath. If you don't do this, jobs controlled by Oozie 2.3+ will fail.
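A hedged sketch of the copy, assuming oozie.libpath points at /user/oozie/share/lib (substitute the value configured in your environment) and that the hadoop client is available on the node:

# the oozie.libpath value below is a placeholder; use your cluster's actual setting
hadoop fs -put /usr/local/unravel_client/unravel-hive-*-hook.jar /user/oozie/share/lib/
hadoop fs -put /usr/local/unravel-agent/jars/*.jar /user/oozie/share/lib/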
3. Confirm that Unravel Web UI shows additional data
Run a Hive job using a test script provided by Unravel Server.
Tip
This is where you can see the effects of the instrumentation setup. Best practice is to run this test script on Unravel Server rather than on a gateway/edge/client node. That way you can verify that instrumentation is working first, and then enable instrumentation on other gateway/edge/client nodes.
Note

username must be a user that can create tables in the default database. If you need to use a different database, copy the script and edit it to change the target database.
This script creates a uniquely named table in the default database, adds some data, runs a Hive query on it, and then deletes the table.
It runs the query twice using different workflow tags so you can clearly see the two different runs of the same workflow in the Unravel UI.
sudo -u username /usr/local/unravel/install_bin/hive_test_simple.sh
4. Confirm and adjust the settings in yarn-site.xml
Check specific properties in /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml to be sure that these settings are present:

yarn.resourcemanager.webapp.address

<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>your-resource-manager-webapp-ip-address:8088</value>
  <source>yarn-site.xml</source>
</property>

yarn.log-aggregation-enable

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
  <description>For log aggregations</description>
</property>
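One way to spot-check these settings from the shell (a sketch using standard tools; the ResourceManager address is a placeholder):

# confirm the properties are present in the file
grep -A 2 'yarn.resourcemanager.webapp.address' /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml
grep -A 2 'yarn.log-aggregation-enable' /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml

# confirm the ResourceManager web app answers on the configured address
curl http://your-resource-manager-webapp-ip-address:8088/ws/v1/cluster/info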
5. Enable additional instrumentation on other hosts in the cluster
Run the shell script unravel_mapr_setup.sh on each node of the cluster, just like you ran it on Unravel Server, above.

Copy the newly edited yarn-site.xml to all nodes, as sketched below.

Do a rolling restart of HiveServer2.
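A hedged sketch of pushing the edited file with standard tools (node names are placeholders; use whatever mechanism you normally use to distribute configuration and to restart HiveServer2, for example MCS or maprcli):

# push the edited yarn-site.xml to every node in the cluster
for node in node1 node2 node3; do
    scp /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml \
        ${node}:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml
done
# then restart HiveServer2 on one node at a time (rolling restart)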
Note
To instrument more servers, you can use the setup script we provide, or you can see the effect it has and replicate that effect using your own automated provisioning system. If you already have a way to customize and deploy hive-site.xml, yarn-site.xml, and user-defined function JARs, you can add Unravel's changes and JARs to your existing mechanism.
6. Enable instrumentation manually
Enable instrumentation manually by updating hive-site.xml, hive-env.sh, spark-defaults.conf, hadoop-env.sh, mapred-site.xml, and tez-site.xml, as explained below.
Note
Once the files are updated on the Unravel host, you can use the scp command to copy them to other hosts. Back up your original files in case you need to roll back changes. In all cases, instrumented nodes must be able to open port 4043 on Unravel Server (host2 in a multi-host Unravel installation).
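For example, a minimal sketch of backing up a file before editing it and then copying the edited version to another host (standard tools; the node name is a placeholder):

# back up the original before editing
cp /opt/mapr/hive/hive-1.2/conf/hive-site.xml /opt/mapr/hive/hive-1.2/conf/hive-site.xml.bak

# after editing, push the file to another gateway/edge/client node
scp /opt/mapr/hive/hive-1.2/conf/hive-site.xml gateway-node:/opt/mapr/hive/hive-1.2/conf/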
Update hive-site.xml.

Append the contents of /usr/local/unravel/hive-hook/hive-site.xml.snip to /opt/mapr/hive/hive-hive-version/conf/hive-site.xml, right before </configuration>.

<property>
  <name>com.unraveldata.host</name>
  <value>unravel-host</value>
  <description>Unravel hive-hook processing host</description>
</property>
<property>
  <name>com.unraveldata.hive.hook.tcp</name>
  <value>true</value>
</property>
<property>
  <name>com.unraveldata.hive.hdfs.dir</name>
  <value>/user/unravel/HOOK_RESULT_DIR</value>
  <description>destination for hive-hook, Unravel log processing</description>
</property>
<property>
  <name>hive.exec.driver.run.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.pre.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.post.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
<property>
  <name>hive.exec.failure.hooks</name>
  <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
  <description>for Unravel, from unraveldata.com</description>
</property>
</configuration>
Update hive-env.sh.

In /opt/mapr/hive/hive-hive-version/conf/hive-env.sh, append these lines:

export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel_client/unravel-hive-hive-version-hook.jar
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:/usr/local/unravel_client
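A quick way to confirm that the hook settings from the two files above are being picked up (a sketch; assumes the hive CLI is on the PATH of the node you edited):

# should print hive.exec.pre.hooks=com.unraveldata.dataflow.hive.hook.UnravelHiveHook
hive -e "set hive.exec.pre.hooks;"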
Update spark-defaults.conf.

In /opt/mapr/spark/spark-spark-version/conf/spark-defaults.conf, append these lines (each property and its value must be on a single line):

spark.unravel.server.hostport unravel-host:4043
spark.eventLog.dir maprfs:///apps/spark
spark.history.fs.logDirectory maprfs:///apps/spark
spark.driver.extraJavaOptions -javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=spark-spark-version,config=driver
spark.executor.extraJavaOptions -javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=spark-spark-version,config=executor
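A couple of quick sanity checks on each node after the edit (standard shell tools only):

# the agent JAR referenced by the new settings should exist
ls -l /usr/local/unravel-agent/jars/btrace-agent.jar

# the appended properties should be present, each on its own line
grep -E 'unravel|extraJavaOptions' /opt/mapr/spark/spark-spark-version/conf/spark-defaults.conf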
Update hadoop-env.sh.

In /opt/mapr/hadoop/hadoop-<HADOOP_VERSION_X.Y.Z>/etc/hadoop/hadoop-env.sh, append this line:

export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/local/unravel_client/unravel-hive-<hive-version>-hook.jar
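To confirm that the Hive Hook JAR is now on the Hadoop classpath (a sketch using the standard hadoop classpath command):

hadoop classpath | tr ':' '\n' | grep unravel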
Update mapred-site.xml.

In /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/mapred-site.xml, append these lines:

<property>
  <name>mapreduce.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.task.profile.maps</name>
  <value>0-5</value>
</property>
<property>
  <name>mapreduce.task.profile.reduces</name>
  <value>0-5</value>
</property>
<property>
  <name>mapreduce.task.profile.params</name>
  <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr -Dunravel.server.hostport=unravel-host:4043</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr -Dunravel.server.hostport=unravel-host:4043</value>
</property>

Note

Make sure the original value of yarn.app.mapreduce.am.command-opts is preserved, by appending the Java agent setup rather than replacing the original value.
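For example (illustrative only, assuming the original value was the common default -Xmx1024m), the merged property would look like this:

<!-- illustrative: keep whatever value was already configured, then append the agent options -->
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1024m -javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr -Dunravel.server.hostport=unravel-host:4043</value>
</property>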
Update tez-site.xml.

In /opt/mapr/tez/tez-version/conf/tez-site.xml, append these lines:

<property>
  <name>tez.task.launch.cmd-opts</name>
  <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043</value>
  <description />
</property>
<property>
  <name>tez.am.launch.cmd-opts</name>
  <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043</value>
  <description />
</property>
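After the manual edits are in place, one way to confirm end to end is to rerun the test script from step 3 and check that the new runs appear in the Unravel UI:

sudo -u username /usr/local/unravel/install_bin/hive_test_simple.sh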