Monitoring individual Spark apps
This topic explains how to set up per-app monitoring of Spark (also called "dev mode"). This is different from cluster-wide monitoring. To monitor individual Spark apps, you must submit them through spark-submit.
The information here applies to Spark versions 1.5.x through 3.0.x.
Note
Spark 3.0 is supported in Unravel v4.6.1.6 and later.
unravel-host must be a fully qualified domain name or IP address.
Get Unravel's Spark sensor.
The sensor is included in the Unravel Server RPM installation. After installing the Unravel Server RPM on unravel-host, obtain the sensor either from the file system on the Unravel Server host (/usr/local/unravel/webapps/ROOT/hh/unravel-agent-pack-bin.zip) or from http://unravel-host:3000/hh/unravel-agent-pack-bin.zip.
If you run Spark apps in YARN-cluster mode (default):
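If you script the setup, the sketch below shows one way to obtain the sensor from either source described above. It assumes bash and curl are available; the function names unravel_sensor_url, is_zip_archive, and fetch_unravel_sensor are hypothetical helpers, not part of Unravel. The check relies on the fact that .zip files begin with the two bytes "PK", so an HTML error page saved under the .zip name is caught before it is distributed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Unravel) for obtaining the Spark sensor.

# Print the sensor download URL for a given unravel-host (FQDN or IP address).
unravel_sensor_url() {
  printf 'http://%s:3000/hh/unravel-agent-pack-bin.zip\n' "$1"
}

# Succeed only if the file exists and starts with the ZIP signature "PK".
is_zip_archive() {
  [ -f "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

# Copy the sensor from the local Unravel install if it exists on this host,
# otherwise download it from unravel-host; then verify it is a ZIP file.
fetch_unravel_sensor() {
  local host="$1" dest_dir="$2"
  local local_copy=/usr/local/unravel/webapps/ROOT/hh/unravel-agent-pack-bin.zip
  local target="$dest_dir/unravel-agent-pack-bin.zip"
  if [ -f "$local_copy" ]; then
    cp "$local_copy" "$target" || return 1
  else
    # -f makes curl fail on HTTP errors instead of saving the error page.
    curl -fsSL -o "$target" "$(unravel_sensor_url "$host")" || return 1
  fi
  is_zip_archive "$target"
}
```

For example, fetch_unravel_sensor unravel-host /usr/local/unravel-spark places the sensor in the suggested directory and returns a non-zero status if the result is not a ZIP file.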
Put the sensor on the host node(s) from which you run spark-submit. First, create a destination directory that is readable by all users.
Tip
We suggest that unravel-sensor-path be /usr/local/unravel-spark.
If spark-submit is used from a single client node:
    mkdir unravel-sensor-path
    cd unravel-sensor-path
    wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
If spark-submit is used from multiple client nodes, copy the sensor .zip file to HDFS instead of copying it to every client node, and set UNRAVEL_SENSOR_PATH accordingly. For example, to copy it to hdfs:///tmp:
    mkdir unravel-sensor-path
    cd unravel-sensor-path
    wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
    hdfs dfs -copyFromLocal unravel-agent-pack-bin.zip /tmp
    export UNRAVEL_SENSOR_PATH="hdfs:///tmp"
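Because spark-submit's --archives option accepts both local paths and hdfs:// URIs, the same submit command works whether UNRAVEL_SENSOR_PATH points to a local directory or to an HDFS directory. The hypothetical helper below sketches how a wrapper script could build that --archives value; rejecting relative local paths is a design choice of this sketch, not a Spark requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --archives value from UNRAVEL_SENSOR_PATH.
# Accepts an absolute local directory or an hdfs:// directory URI.
sensor_archive_arg() {
  local dir="${1%/}"   # drop one trailing slash, e.g. "hdfs:///tmp/" -> "hdfs:///tmp"
  case "$dir" in
    hdfs://*|/*) printf '%s/unravel-agent-pack-bin.zip\n' "$dir" ;;
    *) echo "expected an absolute path or hdfs:// URI: $1" >&2; return 1 ;;
  esac
}
```

A submit script could then pass --archives "$(sensor_archive_arg "$UNRAVEL_SENSOR_PATH")" regardless of where the sensor was stored.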
Define spark.driver.extraJavaOptions and spark.executor.extraJavaOptions as part of your spark-submit command. Substitute your local values for:
unravel-sensor-path: Parent directory of the Unravel Sensor .zip file, unravel-agent-pack-bin.zip. If you put this file on HDFS, unravel-sensor-path is the parent directory on HDFS.
unravel-host-ip-port: IP address and port of the log_receiver service in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.
spark-event-log-dir: Location of the event log directory on HDFS, S3, or the local file system. If a remote address is used, include the NameNode IP address and port.
spark-sample-jar-path: Absolute path to the JAR file used in the spark-submit command.
spark-version: Spark version to be instrumented. Valid values are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x, and 3.0 for Spark 3.0.x.
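The spark-version value is the major.minor part of the installed Spark version, restricted to the releases listed above. A minimal sketch of that mapping, assuming bash; unravel_spark_version is a hypothetical helper name. Splitting on dots also means vendor strings such as 2.4.0-cdh6.3.2 reduce to 2.4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a full Spark version (e.g. "2.4.5") to the
# spark-version value the sensor expects (e.g. "2.4").
unravel_spark_version() {
  local major minor rest
  # Split on dots; everything after the second dot lands in "rest".
  IFS=. read -r major minor rest <<< "$1"
  case "$major.$minor" in
    1.5|1.6|2.0|2.1|2.2|2.3|2.4|3.0) printf '%s\n' "$major.$minor" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}
```

For example, export SPARK_VERSION=$(unravel_spark_version 2.4.5) sets SPARK_VERSION to 2.4, while 3.1.2 is rejected because it is outside the supported range.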
    export UNRAVEL_SENSOR_PATH=unravel-sensor-path
    export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
    export SPARK_EVENT_LOG_DIR=spark-event-log-dir
    export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
    export SPARK_VERSION=spark-version
    export ENABLED_SENSOR_FOR_DRIVER="spark.driver.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=driver"
    export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
    spark-submit \
      --class org.apache.spark.examples.sql.RDDRelation \
      --master yarn-cluster \
      --archives $UNRAVEL_SENSOR_PATH/unravel-agent-pack-bin.zip \
      --conf "$ENABLED_SENSOR_FOR_DRIVER" \
      --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
      --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
      --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
      --conf "spark.eventLog.enabled=true" \
      $PATH_TO_SPARK_EXAMPLE_JAR
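In YARN-cluster mode the driver and the executors both run in YARN containers, and the archive passed with --archives is extracted into each container's working directory under a directory named after the archive. That is why the -javaagent path in the command above can be relative. The hypothetical function below sketches how the two extraJavaOptions values are assembled from the role (driver or executor) and the spark-version value; it mirrors the strings in the example rather than any documented Unravel API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the extraJavaOptions --conf value for one role.
# $1 = driver | executor, $2 = spark-version value such as 2.4
unravel_agent_conf() {
  local role="$1" version="$2"
  case "$role" in
    driver|executor) ;;
    *) echo "role must be driver or executor" >&2; return 1 ;;
  esac
  # The relative jar path resolves inside the container's working directory,
  # where YARN has extracted unravel-agent-pack-bin.zip.
  printf 'spark.%s.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-%s,config=%s\n' \
    "$role" "$version" "$role"
}
```

With this helper, export ENABLED_SENSOR_FOR_DRIVER="$(unravel_agent_conf driver "$SPARK_VERSION")" produces the same value as the hand-written export above.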
If you run Spark apps in YARN-client mode:
To intercept Spark apps running in yarn-client mode, you need to unzip the Unravel Sensor .zip file on the client node at a location readable by all users, referred to as unzipped-archive-dest below. We suggest /usr/local/unravel-spark.
Important
Keep the original unravel-agent-pack-bin.zip file inside unzipped-archive-dest.
Run the following on the client node (if you use multiple hosts as clients, run it on each client):
    mkdir unzipped-archive-dest
    cd unzipped-archive-dest
    wget http://unravel-host:3000/hh/unravel-agent-pack-bin.zip
    unzip unravel-agent-pack-bin.zip
Define spark.executor.extraJavaOptions as part of your spark-submit command, and pass the driver's Java agent with --driver-java-options.
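Before running spark-submit, it can help to confirm that unzipped-archive-dest has the expected layout: the original unravel-agent-pack-bin.zip (kept as noted above) and the extracted btrace-agent.jar at the top level, both readable by all users. The assumption that btrace-agent.jar sits at the root of the archive is inferred from the unravel-agent-pack-bin.zip/btrace-agent.jar path used in the executor option; check_unravel_sensor_dir is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the directory holds both the original
# sensor archive and the extracted agent jar, each with the "others" read bit set.
check_unravel_sensor_dir() {
  local dir="$1" f
  for f in unravel-agent-pack-bin.zip btrace-agent.jar; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
    # find prints the path only if its mode includes o+r (-perm -004).
    if [ -z "$(find "$dir/$f" -perm -004)" ]; then
      echo "not readable by all users: $dir/$f" >&2
      return 1
    fi
  done
}
```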
To use the example below, substitute your local values for:
unzipped-archive-dest: Directory of the unzipped Unravel Sensor files.
unravel-host-ip-port: IP address and port of the log_receiver service in the format ip:port. The default port is 4043. Sample value: 10.0.0.142:4043.
spark-event-log-dir: Location of the event log directory on HDFS, S3, or the local file system. If a remote address is used, include the NameNode IP address and port.
spark-sample-jar-path: Absolute path to the JAR file used in the spark-submit command.
spark-version: Spark version to be instrumented. Valid values are 1.5 for Spark 1.5.x, 1.6 for Spark 1.6.x, 2.0 for Spark 2.0.x, 2.1 for Spark 2.1.x, 2.2 for Spark 2.2.x, 2.3 for Spark 2.3.x, 2.4 for Spark 2.4.x, and 3.0 for Spark 3.0.x.
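Since unravel-host-ip-port defaults to port 4043, a wrapper script might accept just the host and fill in the port. A minimal sketch, assuming a hostname or IPv4 address (an IPv6 literal, which itself contains colons, is not handled); unravel_host_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the default log_receiver port (4043) when only
# a host or IPv4 address is given; pass host:port values through unchanged.
unravel_host_port() {
  case "$1" in
    "") echo "unravel host is required" >&2; return 1 ;;
    *:*) printf '%s\n' "$1" ;;
    *) printf '%s:4043\n' "$1" ;;
  esac
}
```

For example, export UNRAVEL_SERVER_IP_PORT=$(unravel_host_port 10.0.0.142) yields the sample value 10.0.0.142:4043.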
    export UNZIPPED_ARCHIVE_DEST=unzipped-archive-dest
    export UNRAVEL_SERVER_IP_PORT=unravel-host-ip-port
    export SPARK_EVENT_LOG_DIR=spark-event-log-dir
    export PATH_TO_SPARK_EXAMPLE_JAR=spark-sample-jar-path
    export SPARK_VERSION=spark-version
    export ENABLED_SENSOR_FOR_EXECUTOR="spark.executor.extraJavaOptions=-javaagent:unravel-agent-pack-bin.zip/btrace-agent.jar=libs=spark-$SPARK_VERSION,config=executor"
    spark-submit \
      --class org.apache.spark.examples.sql.RDDRelation \
      --master yarn-client \
      --archives $UNZIPPED_ARCHIVE_DEST/unravel-agent-pack-bin.zip \
      --driver-java-options "-javaagent:$UNZIPPED_ARCHIVE_DEST/btrace-agent.jar=config=driver,libs=spark-$SPARK_VERSION" \
      --conf "$ENABLED_SENSOR_FOR_EXECUTOR" \
      --conf "spark.unravel.server.hostport=$UNRAVEL_SERVER_IP_PORT" \
      --conf "spark.eventLog.dir=${SPARK_EVENT_LOG_DIR}" \
      --conf "spark.eventLog.enabled=true" \
      $PATH_TO_SPARK_EXAMPLE_JAR
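In yarn-client mode the driver runs inside the spark-submit JVM on the client node, so files shipped with --archives are extracted only in the YARN containers, not for the driver. Spark's configuration documentation also notes that in client mode driver JVM options must be passed with --driver-java-options or the default properties file, because the driver JVM has already started by the time the application's SparkConf is read. The driver's agent therefore needs an absolute path on the client node. The hypothetical helper below sketches that value, assuming btrace-agent.jar sits at the top level of unzipped-archive-dest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --driver-java-options value for yarn-client mode.
# $1 = unzipped-archive-dest (absolute path), $2 = spark-version value such as 2.4
unravel_driver_java_options() {
  local dest="${1%/}" version="$2"
  case "$dest" in
    /*) ;;
    *) echo "unzipped-archive-dest must be an absolute path" >&2; return 1 ;;
  esac
  # "--" stops printf from treating the leading "-javaagent" as an option.
  printf -- '-javaagent:%s/btrace-agent.jar=config=driver,libs=spark-%s\n' "$dest" "$version"
}
```

A submit script could pass --driver-java-options "$(unravel_driver_java_options "$UNZIPPED_ARCHIVE_DEST" "$SPARK_VERSION")" so the driver loads the agent from the files unzipped on the client node.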