Monitoring individual Spark apps
This topic explains how to set up per-app monitoring of Spark (also called "dev mode"). This is different from cluster-wide monitoring. To monitor individual Spark apps, you must submit them through spark-submit.
The information here applies to Spark versions 1.5.x through 3.0.x.
Note
Spark 3.0 is supported from Unravel version v4.6.1.6 onwards.
unravel-host must be a fully qualified domain name or IP address.
- Get Unravel's Spark sensor.
  The sensor is included in the Unravel Server RPM installation. After installing the Unravel Server RPM on unravel-host, obtain the sensor either from the file system on the Unravel Server host (/usr/local/unravel/webapps/ROOT/hh/unravel-agent-pack-bin.zip) or from http://unravel-host:3000/hh/unravel-agent-pack-bin.zip.
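The download location described above can be sketched as a pair of small shell helpers. This is only an illustration: the function names `sensor_url` and `fetch_sensor` are hypothetical and not part of Unravel, and the sketch assumes `wget` is installed on the client host.

```shell
# Hypothetical helper: print the sensor download URL for a given unravel-host
# (a fully qualified domain name or IP address).
sensor_url() {
  local host="$1"
  if [ -z "$host" ]; then
    echo "usage: sensor_url <unravel-host>" >&2
    return 1
  fi
  printf 'http://%s:3000/hh/unravel-agent-pack-bin.zip\n' "$host"
}

# Hypothetical helper: download the sensor into a destination directory.
# Requires network access to unravel-host on port 3000.
fetch_sensor() {
  local host="$1" dest="$2" url
  url="$(sensor_url "$host")" || return 1
  mkdir -p "$dest" || return 1
  # -P tells wget which directory to save the file in.
  wget -P "$dest" "$url"
}
```

For example, `fetch_sensor unravel.example.com /usr/local/unravel-spark` would place the .zip file in the suggested directory, where `unravel.example.com` stands in for your own unravel-host.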
- If you run Spark apps in YARN-cluster mode (default):
  - Put the sensor on the host node(s) from which you will run spark-submit. First create a destination directory that is readable by all users.
    Tip: We suggest that unravel-sensor-path be /usr/local/unravel-spark.
    - If spark-submit is used from a single client node:
        mkdir unravel-sensor-path
        cd unravel-sensor-path
        wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
    - If spark-submit is used from multiple client nodes, copy the sensor .zip file to HDFS instead of copying it to every client node, and set UNRAVEL_SENSOR_PATH accordingly. For example, to copy it to hdfs:///tmp:
        mkdir unravel-sensor-path
        cd unravel-sensor-path
        wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
        hdfs dfs -copyFromLocal unravel-agent-pack-bin.zip /tmp
        export UNRAVEL_SENSOR_PATH="hdfs:///tmp"
 
  - Define spark.driver.extraJavaOptions and spark.executor.extraJavaOptions as part of your spark-submit command. Substitute your local values for:
    - unravel-sensor-path: Parent directory of the Unravel Sensor .zip file, unravel-agent-pack-bin.zip. If you put this file on HDFS, unravel-sensor-path is the parent directory on HDFS.
    - unravel-host-ip-port: IP address and port of the log_receiver service in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.
    - spark-event-log-dir: Location of the event log directory on HDFS, S3, or the local file system. If a remote address is used, include the NameNode IP address and port.
    - spark-sample-jar-path: Absolute path to the jar file used in the spark-submit command.
    - spark-version: Spark version to be instrumented. Valid options are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x, and 3.0 for Spark 3.0.x.
        export UNRAVEL_SENSOR_PATH=unravel-sensor-path
        export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
        export SPARK_EVENT_LOG_DIR=spark-event-log-dir
        export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
        export SPARK_VERSION=spark-version
        export ENABLED_SENSOR_FOR_DRIVER="spark.driver.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=driver"
        export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
        spark-submit \
            --class org.apache.spark.examples.sql.RDDRelation \
            --master yarn-cluster \
            --archives $UNRAVEL_SENSOR_PATH/unravel-agent-pack-bin.zip \
            --conf "$ENABLED_SENSOR_FOR_DRIVER" \
            --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
            --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
            --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
            --conf "spark.eventLog.enabled=true" \
            $PATH_TO_SPARK_EXAMPLE_JAR
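Because spark-version accepts only the major.minor values listed above, it may help to derive it from a full release number and to build the two extraJavaOptions strings from one template. The following is a minimal sketch under that assumption; `spark_sensor_version` and `sensor_java_option` are hypothetical names introduced here, not Unravel or Spark commands.

```shell
# Hypothetical helper: map a full Spark version (for example 2.4.5) to the
# major.minor value expected for spark-version, rejecting unlisted releases.
spark_sensor_version() {
  local full="$1" short
  # Keep only the first two dot-separated components.
  short="$(printf '%s' "$full" | cut -d. -f1,2)"
  case "$short" in
    1.5|1.6|2.0|2.1|2.2|2.3|2.4|3.0) printf '%s\n' "$short" ;;
    *) echo "unsupported Spark version: $full" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the extraJavaOptions setting for a role
# (driver or executor) in the same shape as the example command above.
sensor_java_option() {
  local role="$1" version="$2"
  case "$role" in
    driver|executor) ;;
    *) echo "role must be driver or executor" >&2; return 1 ;;
  esac
  printf 'spark.%s.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-%s,config=%s\n' \
    "$role" "$version" "$role"
}
```

A possible use is `SPARK_VERSION="$(spark_sensor_version 2.4.5)"` followed by `--conf "$(sensor_java_option driver "$SPARK_VERSION")"`, which fails early if the cluster runs a release outside the supported range.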
 
- If you run Spark apps in YARN-client mode:
  - To intercept Spark apps running in yarn-client mode, unzip the Unravel Sensor .zip file on the client node at a location readable by all users, referred to as unzipped-archive-dest below. We suggest /usr/local/unravel-spark.
    Important: Keep the original unravel-agent-pack-bin.zip file inside unzipped-archive-dest.
    If you use multiple hosts as clients, run these commands on each client.
        mkdir unzipped-archive-dest
        cd unzipped-archive-dest
        wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
        unzip unravel-agent-pack-bin.zip
  - Define spark.executor.extraJavaOptions as part of your spark-submit command. To use the example below, substitute your local values for:
    - unzipped-archive-dest: Directory of the unzipped Unravel Sensor files.
    - unravel-host-ip-port: IP address and port of the log_receiver service in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.
    - spark-event-log-dir: Location of the event log directory on HDFS, S3, or the local file system. If a remote address is used, include the NameNode IP address and port.
    - spark-sample-jar-path: Absolute path to the jar file used in the spark-submit command.
    - spark-version: Spark version to be instrumented. Valid options are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x, and 3.0 for Spark 3.0.x.
        export UNZIPPED_ARCHIVE_DEST=unzipped-archive-dest
        export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
        export SPARK_EVENT_LOG_DIR=spark-event-log-dir
        export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
        export SPARK_VERSION=spark-version
        export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
        # In yarn-client mode the driver runs on the client host, so it loads
        # the agent jar from the unzipped local copy of the sensor.
        spark-submit \
            --class org.apache.spark.examples.sql.RDDRelation \
            --master yarn-client \
            --archives $UNZIPPED_ARCHIVE_DEST/unravel-agent-pack-bin.zip \
            --driver-java-options "-javaagent:$UNZIPPED_ARCHIVE_DEST/btrace-agent.jar=config=driver,libs=spark-$SPARK_VERSION" \
            --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
            --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
            --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
            --conf "spark.eventLog.enabled=true" \
            $PATH_TO_SPARK_EXAMPLE_JAR
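Before submitting in yarn-client mode, the requirements above (the original .zip kept next to the unzipped files, and an ip:port value for the log_receiver service) could be checked with a short pre-flight script. This is a sketch only: the function names are hypothetical, and it assumes btrace-agent.jar sits at the top level of the unzipped archive, as the -javaagent paths in the examples suggest.

```shell
# Hypothetical check: the destination directory must hold both the original
# .zip file and the unzipped agent jar, and both must be readable.
check_unzipped_dest() {
  local dest="$1" missing=0 f
  for f in unravel-agent-pack-bin.zip btrace-agent.jar; do
    if [ ! -r "$dest/$f" ]; then
      echo "missing or unreadable: $dest/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical check: accept a value of the form host:port where the port
# is a whole number between 1 and 65535.
valid_hostport() {
  local value="$1" host port
  case "$value" in
    *:*) ;;
    *) return 1 ;;
  esac
  host="${value%:*}"    # everything before the last colon
  port="${value##*:}"   # everything after the last colon
  [ -n "$host" ] || return 1
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}
```

One way to use these is to run `check_unzipped_dest "$UNZIPPED_ARCHIVE_DEST" && valid_hostport "$UNRAVEL_SERVER_IP_PORT"` just before spark-submit and stop if either check fails.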