Monitoring individual Spark apps

This topic explains how to set up per-app monitoring of Spark (also called "dev mode"), as opposed to cluster-wide monitoring. To monitor individual Spark apps, you must submit them with spark-submit.

The information here applies to Spark versions 1.5.x through 3.0.x.

Note

Spark 3.0 is supported in Unravel v4.6.1.6 and later.

Throughout this topic, unravel-host is the Unravel Server host and must be a fully qualified domain name or IP address.

  1. Get Unravel's Spark sensor.

    The sensor is included in the Unravel Server RPM installation. After installing the Unravel Server RPM on unravel-host, obtain the sensor either from the file system on the Unravel Server host (/usr/local/unravel/webapps/ROOT/hh/unravel-agent-pack-bin.zip), or from http://unravel-host:3000/hh/unravel-agent-pack-bin.zip.
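    If you script this step, a minimal bash sketch follows. It assumes wget is installed, copies the sensor from the Unravel Server's file system when that file exists (that is, when run on unravel-host), and otherwise downloads it over HTTP on port 3000. The functions sensor_url and fetch_sensor are hypothetical helpers, not part of Unravel.

    ```shell
    #!/usr/bin/env bash
    # Hypothetical helpers (not shipped with Unravel) for fetching the Spark sensor.

    SENSOR_ZIP="unravel-agent-pack-bin.zip"
    # Location of the sensor on the Unravel Server host after the RPM install.
    SENSOR_LOCAL_COPY="/usr/local/unravel/webapps/ROOT/hh/${SENSOR_ZIP}"

    # Print the HTTP URL from which unravel-host serves the sensor.
    sensor_url() {
      local host="$1"
      printf 'http://%s:3000/hh/%s\n' "$host" "$SENSOR_ZIP"
    }

    # Place the sensor .zip in directory $2: copy the local file if it exists
    # (i.e. when running on the Unravel Server host), otherwise download it.
    fetch_sensor() {
      local host="$1" dest="$2"
      mkdir -p "$dest" || return 1
      if [ -f "$SENSOR_LOCAL_COPY" ]; then
        cp "$SENSOR_LOCAL_COPY" "$dest/$SENSOR_ZIP"
      else
        wget -q -O "$dest/$SENSOR_ZIP" "$(sensor_url "$host")"
      fi
    }
    ```

    For example, fetch_sensor unravel.example.com /usr/local/unravel-spark, where unravel.example.com stands in for unravel-host.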

  2. If you run Spark apps in YARN-cluster mode (default):

    1. Put the sensor where spark-submit can reach it. Start by creating a destination directory, unravel-sensor-path, that is readable by all users on the host from which you run spark-submit.

      Tip

      We suggest that unravel-sensor-path be /usr/local/unravel-spark.

      • If spark-submit is used from a single client node:

        mkdir unravel-sensor-path
        cd unravel-sensor-path
        wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
      • If spark-submit is used from multiple client nodes, copy the sensor .zip file to HDFS instead of to every client node, and set UNRAVEL_SENSOR_PATH accordingly. For example, to copy it to hdfs:///tmp:

        mkdir unravel-sensor-path
        cd unravel-sensor-path
        wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
        # Upload the sensor to HDFS so that every client node can reference it
        hdfs dfs -copyFromLocal unravel-agent-pack-bin.zip /tmp
        export UNRAVEL_SENSOR_PATH="hdfs:///tmp"
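      Before submitting, you may want to confirm that the sensor is where UNRAVEL_SENSOR_PATH points, whether that is a local directory or an HDFS location. The sketch below is an illustration, not an Unravel tool: sensor_archive and check_sensor are hypothetical names, and the HDFS branch assumes the hdfs client is on your PATH.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical helpers: resolve and check the sensor archive under UNRAVEL_SENSOR_PATH.

      # Print the full path of the sensor archive under a local or hdfs:// directory,
      # dropping any trailing slash on the directory first.
      sensor_archive() {
        local dir="${1%/}"
        printf '%s/unravel-agent-pack-bin.zip\n' "$dir"
      }

      # Return 0 if the sensor archive exists under directory $1.
      # hdfs:// locations are checked with `hdfs dfs -test -e`; others with test -r.
      check_sensor() {
        local archive
        archive="$(sensor_archive "$1")"
        case "$archive" in
          hdfs://*) hdfs dfs -test -e "$archive" ;;
          *)        [ -r "$archive" ] ;;
        esac
      }
      ```

      Usage: check_sensor "$UNRAVEL_SENSOR_PATH" || echo "sensor not found under $UNRAVEL_SENSOR_PATH" >&2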
    2. Define spark.driver.extraJavaOptions and spark.executor.extraJavaOptions as part of your spark-submit command.

      Substitute your local values for:

      • unravel-sensor-path: Parent directory of the Unravel Sensor .zip file, unravel-agent-pack-bin.zip. If you put this file on HDFS, unravel-sensor-path is the parent directory on HDFS.

      • unravel-host-ip-port: IP address and port of the Unravel log_receiver service, in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.

      • spark-event-log-dir: Location of the event log directory on HDFS, S3, or the local file system. If a remote address is used, include the NameNode IP address and port.

      • spark-sample-jar-path: Absolute path to the jar file used in the spark-submit command.

      • spark-version: Spark version to be instrumented. Valid options are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x and 3.0 for Spark 3.0.x.

      export UNRAVEL_SENSOR_PATH=unravel-sensor-path
      export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
      export SPARK_EVENT_LOG_DIR=spark-event-log-dir
      export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
      export SPARK_VERSION=spark-version
      
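      # With --archives, YARN extracts the .zip into a directory named
      # unravel-agent-pack-bin.zip in each container's working directory,
      # so the agent paths below are relative to that directory.
      export ENABLED_SENSOR_FOR_DRIVER="spark.driver.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=driver"

      export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"

      spark-submit \
          --class org.apache.spark.examples.sql.RDDRelation \
          --master yarn \
          --deploy-mode cluster \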
          --archives $UNRAVEL_SENSOR_PATH/unravel-agent-pack-bin.zip \
          --conf "$ENABLED_SENSOR_FOR_DRIVER" \
          --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
          --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
          --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
          --conf "spark.eventLog.enabled=true" \
          $PATH_TO_SPARK_EXAMPLE_JAR
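      The spark-version value is the major.minor part of the Spark version you run (spark-submit --version reports it). A minimal sketch that derives it from a full version string, rejects versions outside the 1.5.x to 3.0.x range covered here, and loosely checks the ip:port format of unravel-host-ip-port might look like this. Both function names are hypothetical.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical helpers for deriving SPARK_VERSION and checking UNRAVEL_SERVER_IP_PORT.

      # Map a full Spark version (e.g. 2.4.5) to the sensor's spark-version value (2.4).
      # Fails for versions outside the 1.5.x to 3.0.x range listed in this topic.
      unravel_spark_lib() {
        local major_minor
        major_minor="$(printf '%s\n' "$1" | cut -d. -f1,2)"
        case "$major_minor" in
          1.5|1.6|2.0|2.1|2.2|2.3|2.4|3.0) printf '%s\n' "$major_minor" ;;
          *) echo "unsupported Spark version: $1" >&2; return 1 ;;
        esac
      }

      # Loosely check that a value looks like host:port with a numeric port.
      is_host_port() {
        [[ "$1" =~ ^[^:]+:[0-9]+$ ]]
      }
      ```

      Usage: SPARK_VERSION="$(unravel_spark_lib 2.4.5)" && export SPARK_VERSION. The host:port check is deliberately loose: it accepts a host name as well as an IP address and does not validate the port range.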
  3. If you run Spark apps in YARN-client mode:

    In yarn-client mode the driver runs on the client node rather than in a YARN container, so to intercept these apps you must unzip the Unravel Sensor .zip file on the client node, at a location readable by all users. This location is referred to below as unzipped-archive-dest. We suggest /usr/local/unravel-spark.

    Important

    Keep the original unravel-agent-pack-bin.zip file inside unzipped-archive-dest. The spark-submit command below ships it to the executors with --archives.

    1. On the client node (if you use multiple hosts as clients, on each client), run:

      mkdir unzipped-archive-dest
      cd unzipped-archive-dest
      wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
      unzip unravel-agent-pack-bin.zip
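      To confirm that unzipped-archive-dest is ready, you can check that both the original .zip and btrace-agent.jar are readable there. This sketch assumes btrace-agent.jar sits at the top level of the sensor .zip, which is what the unravel-agent-pack-bin.zip/btrace-agent.jar paths in this topic imply. check_unzipped_sensor is a hypothetical name, and the check only reflects the permissions of the user running it.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical check that unzipped-archive-dest is ready for yarn-client mode.
      # Assumes btrace-agent.jar is at the top level of the sensor .zip.

      check_unzipped_sensor() {
        local dest="${1%/}" missing=0 f
        # The .zip is still needed for --archives; the jar is loaded by the driver.
        for f in unravel-agent-pack-bin.zip btrace-agent.jar; do
          if [ ! -r "$dest/$f" ]; then
            echo "missing or unreadable: $dest/$f" >&2
            missing=1
          fi
        done
        return "$missing"
      }
      ```

      Usage: chmod -R a+rX unzipped-archive-dest makes the directory readable by all users; then run check_unzipped_sensor unzipped-archive-dest.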
    2. Define spark.executor.extraJavaOptions and the driver's -javaagent option (passed with --driver-java-options) as part of your spark-submit command.

      To use the example below, substitute your local values for:

      • unzipped-archive-dest: directory of the unzipped Unravel Sensor files.

      • unravel-host-ip-port: IP address and port of the Unravel log_receiver service, in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.

      • spark-event-log-dir: Location of the event log directory on HDFS, S3, or the local file system. If a remote address is used, include the NameNode IP address and port.

      • spark-sample-jar-path: Absolute path to the jar file used in the spark-submit command.

      • spark-version: Spark version to be instrumented. Valid options are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x and 3.0 for Spark 3.0.x.

      export UNZIPPED_ARCHIVE_DEST=unzipped-archive-dest
      export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
      export SPARK_EVENT_LOG_DIR=spark-event-log-dir
      export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
      export SPARK_VERSION=spark-version
      
      export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
      
      # In yarn-client mode the driver runs locally, so it loads the agent from
      # the unzipped copy; executors still receive the .zip through --archives.
      spark-submit \
          --class org.apache.spark.examples.sql.RDDRelation \
          --master yarn \
          --deploy-mode client \
          --archives $UNZIPPED_ARCHIVE_DEST/unravel-agent-pack-bin.zip \
          --driver-java-options "-javaagent:$UNZIPPED_ARCHIVE_DEST/btrace-agent.jar=config=driver,libs=spark-$SPARK_VERSION" \
          --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
          --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
          --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
          --conf "spark.eventLog.enabled=true" \
          $PATH_TO_SPARK_EXAMPLE_JAR
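      To review the yarn-client command before running it, a dry-run sketch can assemble the same arguments from the variables exported above and print them one per line. It assumes bash 4 or later, loads the driver agent from btrace-agent.jar under unzipped-archive-dest (the file unpacked in the previous step), and uses --master yarn --deploy-mode client. client_submit_args is a hypothetical name.

      ```shell
      #!/usr/bin/env bash
      # Hypothetical dry-run helper: print the yarn-client spark-submit arguments,
      # one per line, without running spark-submit.
      # Fails if a required variable is unset or empty.

      client_submit_args() {
        local v
        for v in UNZIPPED_ARCHIVE_DEST UNRAVEL_SERVER_IP_PORT SPARK_EVENT_LOG_DIR \
                 PATH_TO_SPARK_EXAMPLE_JAR SPARK_VERSION; do
          if [ -z "${!v:-}" ]; then
            echo "$v is not set" >&2
            return 1
          fi
        done
        local dest="${UNZIPPED_ARCHIVE_DEST%/}"
        local args=(
          --class org.apache.spark.examples.sql.RDDRelation
          --master yarn --deploy-mode client
          --archives "$dest/unravel-agent-pack-bin.zip"
          # The driver runs on this node, so it uses the unzipped jar directly.
          --driver-java-options "-javaagent:$dest/btrace-agent.jar=config=driver,libs=spark-$SPARK_VERSION"
          --conf "spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
          --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT"
          --conf "spark.eventLog.dir=$SPARK_EVENT_LOG_DIR"
          --conf "spark.eventLog.enabled=true"
          "$PATH_TO_SPARK_EXAMPLE_JAR"
        )
        printf '%s\n' "${args[@]}"
      }
      ```

      Usage: args_text="$(client_submit_args)" && mapfile -t args <<< "$args_text" && spark-submit "${args[@]}". Printing one argument per line keeps values that contain spaces intact, as long as no value contains a newline.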