Part 2: Enabling additional instrumentation

This topic explains how to configure Unravel to retrieve additional data from Hive, Tez, Spark, and Oozie, such as Hive queries, application timelines, Spark jobs, YARN resource management data, and logs. You do this by generating Unravel's JARs and distributing them to every node in the cluster that runs queries. After the JARs are distributed, you integrate Hive, Tez, and Spark data with Unravel.

1. Generate and distribute Unravel's Hive Hook and Spark Sensor JARs
2. Configure Ambari to work with Unravel
  1. Hive configurations

    1. Hive

      Click Hive > Configs > Advanced > Advanced hive-env. Near the end of the hive-env template, add:

      export AUX_CLASSPATH=${AUX_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar 
    2. Hive Hook

      In Ambari's general properties, append ,com.unraveldata.dataflow.hive.hook.UnravelHiveHook to each of the following properties:

      Important

      Append with no space before or after the comma, for example: property=existingValue,newValue

      hive.exec.failure.hooks
      hive.exec.post.hooks
      hive.exec.pre.hooks

      For example:

      hive.exec.failure.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
      hive.exec.post.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
      hive.exec.pre.hooks=existing-value,com.unraveldata.dataflow.hive.hook.UnravelHiveHook
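
The append rule above (comma-separated, no spaces, existing values kept) can be sketched as a small shell helper. This is only an illustration, not part of Unravel or Ambari: the function name append_hook is hypothetical, and it only builds the new property value, which you would still enter in Ambari yourself.

```shell
#!/bin/sh
# Hypothetical helper: build the new value for a hive.exec.*.hooks property.
UNRAVEL_HOOK="com.unraveldata.dataflow.hive.hook.UnravelHiveHook"

append_hook() {
    existing="$1"
    case ",$existing," in
        *",$UNRAVEL_HOOK,"*)
            # Hook already present: leave the value unchanged.
            printf '%s\n' "$existing" ;;
        ",,")
            # Property was empty: the hook becomes the only value.
            printf '%s\n' "$UNRAVEL_HOOK" ;;
        *)
            # Append with a single comma and no surrounding spaces.
            printf '%s,%s\n' "$existing" "$UNRAVEL_HOOK" ;;
    esac
}

# "existing-value" is the guide's placeholder for whatever the property already holds.
append_hook "existing-value"
```

Running the helper a second time on its own output returns the value unchanged, so the hook is never listed twice.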
      
    3. Custom

      In Ambari's Custom hive-site editor, set com.unraveldata.host to unravel-gateway-internal-IP-hostname.

    4. Optional: Hive LLAP, if enabled

      Tip

      Edit hive-site.xml manually, not through Ambari Web UI.

      1. Copy the settings in Custom hive-interactive-site and paste them into /etc/hive/conf/hive-site.xml.

      2. Copy the settings in Advanced hive-interactive-env and paste them into /etc/hive/conf/hive-site.xml.

    Notice

    If you have an Unravel version older than 4.5.1.0, create HDFS Hive Hook directories for Unravel:

    hdfs dfs -mkdir -p /user/unravel/HOOK_RESULT_DIR
    hdfs dfs -chown unravel:hadoop /user/unravel/HOOK_RESULT_DIR
    hdfs dfs -chmod -R 777 /user/unravel/HOOK_RESULT_DIR
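
Whether the notice above applies depends on a version comparison against 4.5.1.0. A minimal sketch of that check follows, assuming GNU sort with version ordering (sort -V) is available. The function name version_lt and the UNRAVEL_VERSION variable are hypothetical, and 4.5.0.3 is only a sample value; supply your own installed version.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when version $1 is strictly older than $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Assumption: you set UNRAVEL_VERSION yourself; 4.5.0.3 is only a sample default.
UNRAVEL_VERSION="${UNRAVEL_VERSION:-4.5.0.3}"

if version_lt "$UNRAVEL_VERSION" "4.5.1.0"; then
    echo "Unravel $UNRAVEL_VERSION: create the HDFS Hive Hook directories"
else
    echo "Unravel $UNRAVEL_VERSION: the notice about Hive Hook directories does not apply"
fi
```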
  2. Configure HDFS.

    Click HDFS > Configs > Advanced > Advanced hadoop-env. In the hadoop-env template, look for export HADOOP_CLASSPATH and append Unravel's JAR path as shown.

    export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar
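
After the change is deployed, a quick check on a node can confirm that the JAR is both present and on the classpath. The sketch below is an assumption-laden illustration: classpath_contains is a hypothetical helper, it only matches exact entries (a wildcard entry such as a directory followed by /* would not be detected), and it inspects the HADOOP_CLASSPATH visible to the current shell, which may differ from what the Hadoop services see.

```shell
#!/bin/sh
# Path taken from the export line in this guide.
UNRAVEL_HOOK_JAR="/usr/local/unravel-jars/unravel-hive-1.2.0-hook.jar"

# Hypothetical helper: succeed when colon-separated classpath $1 has the exact entry $2.
classpath_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# HADOOP_CLASSPATH may be unset in this shell, so default it to an empty string.
if classpath_contains "${HADOOP_CLASSPATH:-}" "$UNRAVEL_HOOK_JAR"; then
    echo "HADOOP_CLASSPATH includes the Unravel hook JAR"
else
    echo "HADOOP_CLASSPATH does not include the Unravel hook JAR"
fi

# The JAR must also exist and be readable on this node.
[ -r "$UNRAVEL_HOOK_JAR" ] || echo "warning: $UNRAVEL_HOOK_JAR is not readable on this node"
```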
  3. Configure the BTrace agent for Tez

    In the tez-site.xml configuration file, append the following Java options to tez.am.launch.cmd-opts and tez.task.launch.cmd-opts:

    -javaagent:/usr/local/unravel-jars/btrace-agent.jar=libs=mr,config=tez -Dunravel.server.hostport=unravel-host:4043

    Tip

    In a Kerberos environment, set the tez.am.view-acls property to the "run as" user or *.
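
Because the same option string goes into two properties and differs only in the Unravel host, it can help to generate it once. The following is a minimal sketch: tez_unravel_opts is a hypothetical helper name, the port defaults to the 4043 shown above, and unravel-host is the guide's placeholder for your Unravel server.

```shell
#!/bin/sh
# Hypothetical helper: print the Unravel Java options for Tez.
# $1 = Unravel host, $2 = port (optional, defaults to 4043 as used in this guide)
tez_unravel_opts() {
    host="$1"
    port="${2:-4043}"
    printf '%s %s\n' \
        "-javaagent:/usr/local/unravel-jars/btrace-agent.jar=libs=mr,config=tez" \
        "-Dunravel.server.hostport=${host}:${port}"
}

opts="$(tez_unravel_opts unravel-host)"

# Show what to append to each of the two Tez properties.
for prop in tez.am.launch.cmd-opts tez.task.launch.cmd-opts; do
    printf 'Append to %s:\n  %s\n' "$prop" "$opts"
done
```

Keep whatever options the two properties already contain and add the generated string after them, separated by a space.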

  4. Configure the Application Timeline Server (ATS).

    Note

    From Unravel v4.6.1.6, this step is not mandatory.

    1. In yarn-site.xml:

      yarn.timeline-service.enabled=true
      yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes=org.apache.tez.dag.history.logging.ats.TimelineCachePluginImpl
      yarn.timeline-service.version=1.5 or yarn.timeline-service.versions=1.5f,2.0f
    2. If yarn.acl.enable is true, add unravel to yarn.admin.acl.

    3. In hive-env.sh, add:

      Use ATS Logging: true
    4. In tez-site.xml, add:

      tez.dag.history.logging.enabled=true
      tez.am.history.logging.enabled=true
      tez.history.logging.service.class=org.apache.tez.dag.history.logging.ats.ATSV15HistoryLoggingService
      tez.am.view-acls=unravel-"run-as"-user or *

      Note

      From HDP version 3.1.0 onwards, this Tez configuration must be done manually.

  5. Configure Spark-on-Yarn

    Tip

    For unravel-host, use Unravel Server's fully qualified domain name (FQDN) or IP address.

    1. Add the location of the Spark JARs.

      Click Spark > Configs > Custom spark-defaults > Add Property and use Bulk property add mode, or edit spark-defaults.conf as follows:

      Tip

      • If your cluster has only one Spark 1.X version, spark-defaults.conf is in /usr/hdp/current/spark-client/conf.

      • If your cluster is running Spark 2.X, spark-defaults.conf is in /usr/hdp/current/spark2-client/conf.

      This example uses default locations for Spark JARs. Your environment may vary.

      spark.unravel.server.hostport=unravel-host:4043
      spark.driver.extraJavaOptions=-javaagent:/usr/local/unravel-jars/btrace-agent.jar=config=driver,libs=spark-version
      spark.executor.extraJavaOptions=-javaagent:/usr/local/unravel-jars/btrace-agent.jar=config=executor,libs=spark-version
      spark.eventLog.enabled=true 
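
The four entries above vary only in the Unravel host and the libs= value, so they can be generated from those two inputs. This is a hedged sketch, not an Unravel tool: spark_unravel_defaults is a hypothetical function name, and both arguments in the sample call are the guide's own placeholders that you must replace with real values.

```shell
#!/bin/sh
# Hypothetical helper: print the Unravel entries for spark-defaults.conf.
# $1 = Unravel host (FQDN or IP address)
# $2 = value for libs= (the guide's "spark-version" placeholder)
spark_unravel_defaults() {
    host="$1"
    libs="$2"
    agent="-javaagent:/usr/local/unravel-jars/btrace-agent.jar"
    cat <<EOF
spark.unravel.server.hostport=${host}:4043
spark.driver.extraJavaOptions=${agent}=config=driver,libs=${libs}
spark.executor.extraJavaOptions=${agent}=config=executor,libs=${libs}
spark.eventLog.enabled=true
EOF
}

# Both arguments are placeholders from this guide; substitute your own values.
spark_unravel_defaults "unravel-host" "spark-version"
```

The output uses the key=value form, one property per line, so it can be reviewed before you paste it into Ambari or merge it into spark-defaults.conf. If spark.driver.extraJavaOptions or spark.executor.extraJavaOptions already has a value on your cluster, merge the options rather than replacing them.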
    2. Enable Spark streaming.

  6. Configure Oozie

3. Configure the Unravel Host

Define the following properties in /usr/local/unravel/etc/unravel.properties. If a property is missing, add it.

  1. Tez.

  2. Set these if the Application Timeline Server (ATS) requires authentication.

4. Optional: Confirm that Unravel UI shows Tez data.
  1. Run /usr/local/unravel/install_bin/hive_test_simple.sh on the HDP cluster or on any cloud environment where hive.execution.engine=tez.

  2. Log in to Unravel Server and go to the Applications page. Check for Tez jobs.

    Unravel UI may take a few seconds to load Tez data.