Home

Part 2: Enabling Additional Instrumentation

This topic explains how to configure Unravel to retrieve additional data from Hive, Tez, Spark and Oozie, such as Hive queries, application timelines, Spark jobs, YARN resource management data, and logs.

1. Generate and Distribute Unravel's Hive Hook and Spark Sensor JARs

To enable additional instrumentation, first generate Unravel's JARs and distribute them to every node in the cluster that runs queries. After the JARs are distributed, we'll walk you through integrating Hive, Tez, and Spark with Unravel.

  1. On Unravel Server, log in as root and confirm that wget is installed.

    yum install -y wget
  2. Run the following commands:

    mkdir /usr/local/unravel-jars
    chmod -R 775 /usr/local/unravel-jars/
    chown root:hadoop /usr/local/unravel-jars/
    
    cd /usr/local/unravel/install_bin/cluster-setup-scripts/
    chmod +x /usr/local/unravel/install_bin/cluster-setup-scripts/unravel_hdp_setup.py
    
    sudo python2 unravel_hdp_setup.py --sensor-only --unravel-server unravel-host:3000 --spark-version spark-version --hive-version hive-version --ambari-server ambari-host
    
    cp /usr/local/unravel_client/unravel-hive-1.2.0-hook.jar /usr/local/unravel-jars/
    
    cp -pr /usr/local/unravel-agent/jars/* /usr/local/unravel-jars/
    

    Tip

    For unravel-host, specify the protocol (http or https) and use the fully qualified domain name (FQDN) or IP address of Unravel Server. For example, https://playground3.unraveldata.com:3000.

    Files are installed locally in these two directories:

    • /usr/local/unravel_client (Hive Hook JAR)

    • /usr/local/unravel-agent/jars/ (Resource metrics sensor JARs)

    Make a note of these directories because you'll need to specify them in Ambari.

  3. Distribute /usr/local/unravel_client and /usr/local/unravel-agent/jars/ to all worker, edge, and master nodes which run queries.

    For example (run once per host; the hook JAR lands in /usr/local/unravel-jars/, and the sensor JARs keep their /usr/local/unravel-agent/jars/ path, so the paths used in later steps resolve):

    scp /usr/local/unravel_client/unravel-hive-1.2.0-hook.jar root@every-cluster-host:/usr/local/unravel-jars/
    
    scp -r /usr/local/unravel-agent/ root@every-cluster-host:/usr/local/
  4. On each instrumented node, do the following:

    1. Make sure the node can reach port 4043 of Unravel Server.

    2. Add Unravel's binaries to the *CLASSPATH environment variables.

      export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar 
      export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar
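Steps 3 and 4 can be scripted for convenience. The sketch below is illustrative only: the host list and Unravel hostname are hypothetical, and it assumes passwordless SSH as root. It mirrors the JAR layout used in the rest of this guide and runs the port-reachability check from step 4.

```shell
#!/bin/sh
# Hypothetical list of nodes that run queries; replace with your own hosts.
NODES="worker1.example.com worker2.example.com edge1.example.com"
UNRAVEL_HOST="unravel-host"   # Unravel Server FQDN or IP (assumption)

for node in $NODES; do
  # Create the common JAR directory on the node.
  ssh "root@${node}" 'mkdir -p /usr/local/unravel-jars'
  # Copy the Hive Hook JAR into /usr/local/unravel-jars/.
  scp /usr/local/unravel_client/unravel-hive-1.2.0-hook.jar "root@${node}:/usr/local/unravel-jars/"
  # Mirror the sensor JARs, preserving the /usr/local/unravel-agent/jars/ path.
  scp -r /usr/local/unravel-agent/ "root@${node}:/usr/local/"
  # Verify the node can reach Unravel Server's sensor port (step 4a).
  ssh "root@${node}" "nc -z -w 5 ${UNRAVEL_HOST} 4043" \
    && echo "${node}: port 4043 reachable" \
    || echo "${node}: cannot reach ${UNRAVEL_HOST}:4043"
done
```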
2. For Oozie, Copy the Hive Hook and BTrace JARs to the HDFS Shared Library Path

Copy the Hive Hook JAR (/usr/local/unravel_client/unravel-hive-1.2.0-hook.jar) and the BTrace JAR (/usr/local/unravel-agent/jars/btrace-agent.jar) to the HDFS shared library path specified by oozie.libpath. If you skip this step, jobs controlled by Oozie 2.3+ will fail.
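A minimal sketch of the copy, assuming oozie.libpath points at the default share/lib location; check oozie-site.xml for your cluster's actual value before running this:

```shell
# Assumed oozie.libpath value; substitute the value from your oozie-site.xml.
OOZIE_LIBPATH=/user/oozie/share/lib

# Copy both JARs to the shared library path (-f overwrites existing copies).
hdfs dfs -put -f /usr/local/unravel_client/unravel-hive-1.2.0-hook.jar "${OOZIE_LIBPATH}/"
hdfs dfs -put -f /usr/local/unravel-agent/jars/btrace-agent.jar "${OOZIE_LIBPATH}/"

# Confirm both JARs are present.
hdfs dfs -ls "${OOZIE_LIBPATH}" | grep -E 'unravel-hive|btrace'
```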

3. Add Unravel's Binaries to CLASSPATH in Ambari

Adjust the Hive and Hadoop configurations in Ambari so that Hive and Hadoop pick up these JARs.

  1. Add AUX_CLASSPATH to Ambari's hive-env template:

    • In Ambari, click Hive | Configs | Advanced | Advanced hive-env.

    • Inside the Advanced hive-env template, near the end of the template, add:

      export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar 
      hdp-part2-image-001.png
  2. Add HADOOP_CLASSPATH to Ambari's hadoop-env template:

    • In Ambari, click HDFS | Configs | Advanced | Advanced hadoop-env.

    • Inside the hadoop-env template, look for export HADOOP_CLASSPATH and append Unravel's JAR path:

      export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar
      hdp-part2-image-002.png
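After restarting the affected services, you can confirm the hook JAR is visible on the Hadoop classpath from any instrumented node. This is a quick sanity check, not part of the official procedure:

```shell
# Print each classpath entry on its own line and look for the Unravel hook JAR;
# HADOOP_CLASSPATH entries appear in the output of `hadoop classpath`.
hadoop classpath | tr ':' '\n' | grep -i unravel
```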
4. Connect Unravel Server to Hive

Warning

Completion of this step requires a restart of all affected Hive services in Ambari Web UI.

  1. In Ambari's general properties, append a comma (with no space) followed by com.unraveldata.dataflow.hive.hook.UnravelHiveHook to each of the following properties:

    hive.exec.failure.hooks
    hive.exec.post.hooks
    hive.exec.pre.hooks

    For example,

    hive.exec.pre.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
    hive.exec.post.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
    hive.exec.failure.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
    
    ambari-hive-hook.png
  2. In Ambari's custom hive-site editor, do the following:

    1. Add a property, hive.exec.driver.run.hooks, and set its value to com.unraveldata.dataflow.hive.hook.UnravelHiveHook.

    2. Set com.unraveldata.host to unravel-gateway-internal-IP-hostname.

    3. Set com.unraveldata.hive.hook.tcp to true.

    4. If you have an Unravel version older than 4.5.1.0, set com.unraveldata.hive.hdfs.dir to /user/unravel/HOOK_RESULT_DIR.

    For example,

    ambari-hive-hook-hive-site.png
  3. If LLAP is enabled, copy the above settings in Custom hive-interactive-site to /etc/hive/conf/hive-site.xml.

    Tip

    Edit hive-site.xml manually, not through Ambari Web UI.

  4. If LLAP is enabled, copy the settings from Advanced hive-interactive-env to /etc/hive/conf/hive-site.xml.

    Tip

    Edit hive-site.xml manually, not through Ambari Web UI.

  5. If you have an Unravel version older than 4.5.1.0, create an HDFS Hive hook directory for Unravel:

    hdfs dfs -mkdir -p /user/unravel/HOOK_RESULT_DIR
    hdfs dfs -chown unravel:hadoop /user/unravel/HOOK_RESULT_DIR
    hdfs dfs -chmod -R 777 /user/unravel/HOOK_RESULT_DIR
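For pre-4.5.1.0 setups, you can verify the hook directory was created with the expected owner and permissions (an optional check, not part of the official procedure):

```shell
# Confirm the hook directory exists and inspect its owner and mode.
hdfs dfs -test -d /user/unravel/HOOK_RESULT_DIR && echo "directory exists"
hdfs dfs -ls -d /user/unravel/HOOK_RESULT_DIR
```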
5. Configure the Application Timeline Server (ATS)

If you're not running Tez, skip this step.

  1. If ATS requires authentication, set these properties in /usr/local/unravel/etc/unravel.properties:

  2. In Ambari, add these settings:

    • In yarn-site.xml:

      yarn.timeline-service.enabled: true
      yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes: org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl
    • If yarn.acl.enable is true, add unravel to yarn.admin.acl.

    • In hive-site.xml, append these values to the following parameters:

      hive.exec.failure.hooks: org.apache.hadoop.hive.ql.hooks.ATSHook
      hive.exec.post.hooks: org.apache.hadoop.hive.ql.hooks.ATSHook
      hive.exec.pre.hooks: org.apache.hadoop.hive.ql.hooks.ATSHook
    • In hive-env.sh, add:

      Use ATS Logging: true
    • In tez-site.xml, add:

      tez.history.logging.service.class: org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService
      tez.am.view-acls: unravel-"run-as"-user or *
  3. In Ambari, restart all services whose components you changed.
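After the restart, you can confirm the Timeline Server is serving data through its REST endpoint. The hostname below is hypothetical, and 8188 is the default yarn.timeline-service.webapp.address port; adjust both for your cluster:

```shell
# Query the Timeline Server REST API; an HTTP 200 indicates ATS is up.
curl -s -o /dev/null -w '%{http_code}\n' "http://ats-host.example.com:8188/ws/v1/timeline/"

# On a Kerberized cluster, authenticate first and use SPNEGO:
#   kinit <principal>
#   curl --negotiate -u : "http://ats-host.example.com:8188/ws/v1/timeline/"
```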

6. Connect Unravel Server to Tez

If you're not running Tez, skip this step.

  1. Add these properties to /usr/local/unravel/etc/unravel.properties:

  2. Confirm that hive.execution.engine is set to tez.

    set hive.execution.engine=tez;
  3. Using Ambari, configure the BTrace agent for Tez:

    • Append the Java options below to tez.am.launch.cmd-opts and tez.task.launch.cmd-opts:

      -javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043

      For example,

      tez-am-launch-cmd-opts.png
      tez-task-launch-cmd-opts.png
    • Restart the affected component(s). The screenshot below illustrates this change.

      Note

      In a Kerberos environment, set the tez.am.view-acls property to the "run as" user or *.

      hdp-part2-ambari-tez.png
  4. Confirm that Unravel UI shows Tez data.

    1. Run /usr/local/unravel/install_bin/hive_test_simple.sh on the HDP cluster or on any cloud environment where hive.execution.engine=tez.

    2. Check Unravel UI for Tez data.

      Unravel UI may take a few seconds to load Tez data.

  5. In Ambari, restart all affected Hive services.
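One way to confirm the BTrace agent options took effect is to inspect the deployed Tez configuration on a node after the restart. The path below assumes a standard HDP layout:

```shell
# Check that the javaagent flag is present in the merged tez-site.xml.
grep -A1 'tez.am.launch.cmd-opts' /etc/tez/conf/tez-site.xml | grep btrace-agent \
  && echo "BTrace agent configured for the Tez AM"
```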

7. (Optional) Connect Unravel Server to Spark-on-YARN

Warning

Completion of this step requires a restart of all affected Spark services in Ambari UI.

Tip

For unravel-host, use Unravel Server's fully qualified domain name (FQDN) or IP address.

  1. Tell Ambari where the Spark JARs are:

    In Ambari Web UI, on the left side, click Spark | Configs | Custom spark-defaults | Add Property and use Bulk property add mode, or edit spark-defaults.conf as follows:

    Tip

    • If your cluster only has one Spark 1.X version, spark-defaults.conf is in /usr/hdp/current/spark-client/conf.

    • If your cluster is running Spark 2.X, spark-defaults.conf is in /usr/hdp/current/spark2-client/conf.

    The example below uses default locations for Spark JARs. Your environment may vary.

    spark.unravel.server.hostport=unravel-host:4043
    spark.driver.extraJavaOptions=-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=config=driver,libs=spark-version
    spark.executor.extraJavaOptions=-javaagent:/usr/local/unravel-agent/jars/btrace-agent.jar=config=executor,libs=spark-version
    spark.eventLog.enabled=true
  2. Enable Spark streaming.

    Spark streaming is disabled by default. To enable it, edit spark-defaults.conf as follows.

    Search for spark.driver.extraJavaOptions and set it to the following value. Be sure to substitute the correct version of Spark for spark-version.

    spark.driver.extraJavaOptions=-javaagent:${unravel-sensor-path}/btrace-agent.jar=script=DriverProbe.class:SQLProbe.class:StreamingProbe.class,libs=spark-version

    Note

    The value of spark.driver.extraJavaOptions changes based on whether Spark streaming is enabled or disabled.

    Unravel supports the Spark streaming feature for Spark 1.6.x, 2.0.x, 2.1.x and 2.2.x only.

    Unravel has limited support for Spark apps using the Structured Streaming API introduced in Spark 2.
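The two forms of spark.driver.extraJavaOptions differ only in the agent arguments. The sketch below assembles both strings for comparison; the sensor path and Spark version are assumptions (it takes the spark-version placeholder to expand to a value like spark-2.2), so substitute your own values:

```shell
SENSOR_PATH=/usr/local/unravel-agent/jars   # assumed sensor JAR location
SPARK_VERSION=2.2                           # substitute your Spark version

# Default driver options (Spark streaming disabled):
DRIVER_OPTS="-javaagent:${SENSOR_PATH}/btrace-agent.jar=config=driver,libs=spark-${SPARK_VERSION}"

# Driver options with the streaming probes enabled:
STREAMING_OPTS="-javaagent:${SENSOR_PATH}/btrace-agent.jar=script=DriverProbe.class:SQLProbe.class:StreamingProbe.class,libs=spark-${SPARK_VERSION}"

echo "$DRIVER_OPTS"
echo "$STREAMING_OPTS"
```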

Next Steps

For additional configuration and instrumentation options, see Next steps.