Part 2: Enabling additional instrumentation

This topic explains how to enable additional instrumentation on your gateway/edge/client nodes.

1. Run the setup script

On the Unravel host, run unravel_mapr_setup.py with the following values:

Tip

For unravel-host, use Unravel Server's fully qualified domain name (FQDN) or IP address.

For spark-version, use a Spark version that is compatible with this version of Unravel. For example,

  • spark-2.0.0 for Spark 2.0.x

  • spark-2.1.0 for Spark 2.1.x

  • spark-2.2.0 for Spark 2.2.x

  • spark-2.3.0 for Spark 2.3.x

For hive-version, use a Hive version that is compatible. For example,

  • 1.2.0 for Hive 1.2.0 or 1.2.1

  • 0.13.0 for Hive 0.13.0

For tez-version, use a Tez version that is compatible with this version of Unravel. For example,

  • 0.8 for Tez 0.8

The setup script, unravel_mapr_setup.py, does the following:

  • It puts the Hive Hook JAR in /usr/local/unravel_client/

  • It puts the resource metrics JAR in /usr/local/unravel-agent/

  • For MapR 5.2 or MapR 6.0, it changes the contents of these configuration files:

    /opt/mapr/spark/spark-spark-version/conf/spark-defaults.conf
    /opt/mapr/hive/hive-hive-version/conf/hive-site.xml
    /opt/mapr/hive/hive-hive-version/conf/hive-env.sh
    /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/yarn-site.xml
    /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/mapred-site.xml
    /usr/local/unravel/etc/unravel.properties
  • It saves a copy of each original configuration file in the same directory. Each copy is named *.preunravel. For example, /opt/mapr/hive/hive-1.2/conf/hive-site.xml.preunravel.

Once the files are present on the Unravel host, you can bundle them with the tar command and distribute them to other hosts, if that is more convenient than running the script on each node. All instrumented nodes must be able to reach port 4043 on Unravel Server (host2 in a multi-host Unravel installation).
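The bundling step can be sketched as follows. On a real gateway host you would tar /usr/local/unravel_client and /usr/local/unravel-agent directly; here the same commands are demonstrated against a staged temp directory with placeholder JAR names, so they can be tried anywhere.

```shell
# Stage a demo copy of the two instrumentation directories (placeholder JAR
# names); on a real host you would tar /usr/local/unravel_client and
# /usr/local/unravel-agent instead.
STAGE=$(mktemp -d)
mkdir -p "$STAGE/unravel_client" "$STAGE/unravel-agent/jars"
touch "$STAGE/unravel_client/unravel-hive-1.2.0-hook.jar"
touch "$STAGE/unravel-agent/jars/btrace-agent.jar"

# Bundle both directories into one archive for distribution (e.g. via scp).
tar -C "$STAGE" -czf "$STAGE/unravel-instrumentation.tar.gz" unravel_client unravel-agent

# List the archive contents to confirm both JARs are included.
tar -tzf "$STAGE/unravel-instrumentation.tar.gz"
```

On each target node you would extract with `tar -C /usr/local -xzf unravel-instrumentation.tar.gz` and then confirm connectivity to Unravel Server, for example with `nc -z unravel-host 4043`.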

2. For Oozie, copy Unravel Hive Hook and BTrace JARs to the HDFS shared library path

Copy the Hive Hook JAR from /usr/local/unravel_client/ and the resource metrics JARs from /usr/local/unravel-agent/ to the shared library path specified by oozie.libpath. If you skip this step, jobs controlled by Oozie 2.3+ will fail.
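A sketch of that copy, with hedges: the shared-library path below is an assumption (check the oozie.libpath value in your oozie-site.xml), and the loop only echoes the hadoop fs commands so you can review them before running.

```shell
# Assumption: oozie.libpath points here; verify in your oozie-site.xml.
OOZIE_LIBPATH=/user/oozie/share/lib

# Echo (dry run) the copy command for each instrumentation JAR; drop the
# `echo` to actually run them on a node with the hadoop CLI configured.
for jar in /usr/local/unravel_client/unravel-hive-*-hook.jar \
           /usr/local/unravel-agent/jars/*.jar; do
  echo hadoop fs -put "$jar" "$OOZIE_LIBPATH/"
done
```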

3. Confirm that Unravel Web UI shows additional data

Run a Hive job using a test script provided by Unravel Server.

Tip

This is where you can see the effects of the instrumentation setup. Best practice is to run this test script on Unravel Server rather than on a gateway/edge/client node. That way you can verify that instrumentation is working first, and then enable instrumentation on other gateway/edge/client nodes.

Note

username must be a user that can create tables in the default database. If you need to use a different database, copy the script and edit it to change the target database.

This script creates a uniquely named table in the default database, adds some data, runs a Hive query on it, and then deletes the table.

It runs the query twice using different workflow tags so you can clearly see the two different runs of the same workflow in Unravel UI.

sudo -u username /usr/local/unravel/install_bin/hive_test_simple.sh
4. Confirm and adjust the settings in yarn-site.xml

Check /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/yarn-site.xml to be sure that these settings are present:

  • yarn.resourcemanager.webapp.address

    <property> 
    <name>yarn.resourcemanager.webapp.address</name> 
    <value>your-resource-manager-webapp-ip-address:8088</value>
    <source>yarn-site.xml</source>
    </property>
  • yarn.log-aggregation-enable

    <property>
    <name>yarn.log-aggregation-enable</name> 
    <value>true</value>
    <description>For log aggregations</description>
    </property>
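You can check for these properties from the command line. The snippet below writes a small sample yarn-site.xml to a temp file purely to demonstrate the check; on a real node you would point the grep at /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/yarn-site.xml instead.

```shell
# Demo file standing in for the real yarn-site.xml.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<configuration>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>10.0.0.1:8088</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
</configuration>
EOF

# Show each property name together with the value line that follows it.
grep -A1 -E 'yarn.resourcemanager.webapp.address|yarn.log-aggregation-enable' "$CONF"
```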
5. Enable additional instrumentation on other hosts in the cluster
  1. Run the setup script unravel_mapr_setup.py on each node of the cluster, just as you ran it on Unravel Server, above.

  2. Copy the newly edited yarn-site.xml to all nodes.

  3. Do a rolling restart of HiveServer2.

    Note

    To instrument more servers, you can use the setup script we provide, or inspect the changes it makes and replicate them with your own automated provisioning system. If you already have a mechanism for customizing and deploying hive-site.xml, yarn-site.xml, and user-defined function JARs, you can add Unravel's changes and JARs to that mechanism.

6. Enable instrumentation manually

Enable instrumentation manually by updating hive-site.xml, hive-env.sh, spark-defaults.conf, hadoop-env.sh, mapred-site.xml, and tez-site.xml, as explained below.

Note

Once the files are updated on the Unravel host, you can use the scp command to copy them to other hosts. Back up your original files in case you need to roll back the changes. In all cases, instrumented nodes must be able to reach port 4043 on Unravel Server (host2 in a multi-host Unravel installation).

  1. Update hive-site.xml.

    Append the contents of /usr/local/unravel/hive-hook/hive-site.xml.snip to /opt/mapr/hive/hive-hive-version/conf/hive-site.xml right before </configuration>.

    <property>
    <name>com.unraveldata.host</name>
    <value>unravel-host</value>
    <description>Unravel hive-hook processing host</description>
    </property>
    <property>
    <name>com.unraveldata.hive.hook.tcp</name>
    <value>true</value>
    </property>
    <property>
    <name>com.unraveldata.hive.hdfs.dir</name>
    <value>/user/unravel/HOOK_RESULT_DIR</value>
    <description>destination for hive-hook, Unravel log processing</description>
    </property>
    <property>
    <name>hive.exec.driver.run.hooks</name>
    <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
    <description>for Unravel, from unraveldata.com</description>
    </property>
    <property>
    <name>hive.exec.pre.hooks</name>
    <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
    <description>for Unravel, from unraveldata.com</description>
    </property>
    <property>
    <name>hive.exec.post.hooks</name>
    <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
    <description>for Unravel, from unraveldata.com</description>
    </property>
    <property>
    <name>hive.exec.failure.hooks</name>
    <value>com.unraveldata.dataflow.hive.hook.UnravelHiveHook</value>
    <description>for Unravel, from unraveldata.com</description>
    </property>
    </configuration>
  2. Update hive-env.sh.

    In /opt/mapr/hive/hive-hive-version/conf/hive-env.sh, append these lines:

    export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel_client/unravel-hive-hive-version-hook.jar 
    export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:/usr/local/unravel_client
  3. Update spark-defaults.conf.

    In /opt/mapr/spark/spark-spark-version/conf/spark-defaults.conf, append these lines:

    spark.unravel.server.hostport unravel-host:4043
    spark.eventLog.dir maprfs:///apps/spark
    spark.history.fs.logDirectory maprfs:///apps/spark
    spark.driver.extraJavaOptions -javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=spark-spark-version,config=driver
    spark.executor.extraJavaOptions -javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=spark-spark-version,config=executor
  4. Update hadoop-env.sh.

    In /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/hadoop-env.sh, append this line:

    export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/local/unravel_client/unravel-hive-hive-version-hook.jar
  5. Update mapred-site.xml.

    In /opt/mapr/hadoop/hadoop-hadoop-version/etc/hadoop/mapred-site.xml, append these lines:

    <property>
    <name>mapreduce.task.profile</name>
    <value>true</value>
    </property>
    <property>
    <name>mapreduce.task.profile.maps</name>
    <value>0-5</value>
    </property>
    <property>
    <name>mapreduce.task.profile.reduces</name>
    <value>0-5</value>
    </property>
    <property>
    <name>mapreduce.task.profile.params</name>
    <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr -Dunravel.server.hostport=unravel-host:4043</value>
    </property>
    <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr -Dunravel.server.hostport=unravel-host:4043</value>
    </property>

    Note

    Make sure the original value of yarn.app.mapreduce.am.command-opts is preserved: append the Java agent options to the existing value rather than replacing it.

  6. Update tez-site.xml.

    In /opt/mapr/tez/tez-version/conf/tez-site.xml, append these lines:

    <property>
      <name>tez.task.launch.cmd-opts</name>
      <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043</value>
      <description />
    </property>
    
    <property>
      <name>tez.am.launch.cmd-opts</name>
      <value>-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043</value>
      <description />
    </property>
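Step 1 above says to append the snippet contents right before the closing </configuration> tag. A sketch of doing that safely, with the pre-Unravel backup this guide recommends, demonstrated on temp stand-in files (on a real node the inputs are hive-site.xml and /usr/local/unravel/hive-hook/hive-site.xml.snip under the paths shown above):

```shell
# Demo stand-ins for hive-site.xml and the Unravel snippet file.
DIR=$(mktemp -d)
printf '<configuration>\n<property><name>existing</name><value>1</value></property>\n</configuration>\n' > "$DIR/hive-site.xml"
printf '<property><name>com.unraveldata.host</name><value>unravel-host</value></property>\n' > "$DIR/hive-site.xml.snip"

# Keep a pre-Unravel backup, mirroring the setup script's *.preunravel copies.
cp "$DIR/hive-site.xml" "$DIR/hive-site.xml.preunravel"

# Insert the snippet's contents immediately before the </configuration> line.
awk -v snip="$DIR/hive-site.xml.snip" '
  /<\/configuration>/ {
    while ((getline line < snip) > 0) print line
  }
  { print }
' "$DIR/hive-site.xml" > "$DIR/hive-site.xml.new" \
  && mv "$DIR/hive-site.xml.new" "$DIR/hive-site.xml"

# The file now ends with the Unravel properties followed by </configuration>.
cat "$DIR/hive-site.xml"
```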