Uploading Spark programs to Unravel
Unravel UI displays the Spark programs you upload, provided you upload them as Java, Scala, Python, or R source code rather than as JVM bytecode.
You can upload Spark programs either as individual source files or as a single archive (.zip or .egg).
Note
The sensor uploads the source files only if at least one stage name points to one of the source files in the source zip.
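Spark typically names each stage after its call site, such as count at QueryDriver.scala:324, so the condition in this note can be checked before you submit. The sketch below is a hypothetical pre-flight check, not part of Unravel or Spark: stage_source_file and zip_has_file are illustrative names, python3 is assumed to be installed (its built-in zipfile module lists the archive), and entry names are assumed to contain no spaces.

```shell
# Hypothetical pre-flight helpers (not part of Unravel or Spark).

# Print the source file named in a Spark call-site string, e.g.
# "count at QueryDriver.scala:324" prints "QueryDriver.scala".
# Prints nothing if the string does not look like "<op> at <file>:<line>".
stage_source_file() {
  printf '%s\n' "$1" | sed -n 's/.* at \([^: ]*\):[0-9][0-9]*.*/\1/p'
}

# Return 0 if the zip archive $1 contains an entry whose base name is $2.
# "python3 -m zipfile -l" prints a header line, then one entry per line
# with the entry name in the first column (assumed to contain no spaces).
zip_has_file() {
  python3 -m zipfile -l "$1" |
    awk -v want="$2" '
      NR > 1 { n = $1; sub(/.*\//, "", n); if (n == want) found = 1 }
      END { exit !found }'
}
```

For example, zip_has_file "$SRC_ZIP_PATH" "$(stage_source_file 'count at QueryDriver.scala:324')" succeeds only if the archive holds a file named QueryDriver.scala.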
Uploading individual source files
Upload Spark source files and specify their location on the spark-submit command.
Note
The default value of spark.unravel.program.dir is the current directory (the application's home directory).
yarn-client mode (master yarn and deploy-mode client)
Upload the source files to any local directory accessible to the application's driver, and specify their path with --conf "spark.unravel.program.dir=$PROGRAM_DIR" on the spark-submit command:
export PROGRAM_DIR=fully-qualified-path-to-local-file-directory
export PATH_TO_SPARK_EXAMPLE_JAR=fully-qualified-jar-path
spark-submit \
  --class org.apache.spark.examples.sql.RDDRelation \
  --conf "spark.unravel.program.dir=$PROGRAM_DIR" \
  --deploy-mode client \
  --master yarn \
  $PATH_TO_SPARK_EXAMPLE_JAR
yarn-cluster mode (master yarn and deploy-mode cluster)
Upload the source files to the application's home directory by listing them with --files comma-separated-list-of-source-files on the spark-submit command:
export PROGRAM_DIR=fully-qualified-path-to-local-file-directory
export PATH_TO_SPARK_EXAMPLE_JAR=fully-qualified-jar-path
spark-submit \
  --class org.apache.spark.examples.sql.RDDRelation \
  --files comma-separated-list-of-source-files \
  --deploy-mode cluster \
  --master yarn \
  $PATH_TO_SPARK_EXAMPLE_JAR
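Building the comma-separated list by hand gets error-prone as the number of source files grows. A minimal sketch, assuming the driver sources sit directly in $PROGRAM_DIR and use the .scala, .java, .py, or .R extensions; source_file_list is a hypothetical helper, not an Unravel or Spark command.

```shell
# Hypothetical helper: print the source files directly under directory $1,
# joined with commas, in the form spark-submit --files expects.
# Matches *.scala, *.java, *.py and *.R; file names must not contain commas.
source_file_list() {
  for f in "$1"/*.scala "$1"/*.java "$1"/*.py "$1"/*.R; do
    # A glob with no match stays literal, so skip anything that is not a file.
    [ -f "$f" ] && printf '%s\n' "$f"
  done | paste -s -d, -
}
```

With this helper, the --files argument can be written as --files "$(source_file_list "$PROGRAM_DIR")".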
Uploading an archive
You can upload Spark programs by providing either a zip or an egg archive file.
Package all relevant source files into a zip or egg archive. Keep the archive small by including only the relevant driver source files.
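A minimal sketch of the packaging step, with package_sources as a hypothetical helper name. It relies on Python's built-in zipfile command-line interface (python3 -m zipfile -c), which stores each listed file under its base name and silently skips paths that do not exist, hence the explicit check. Where the zip utility is installed, zip -j gives a similar flat archive. The flat layout is an assumption; this page does not specify a directory structure inside the archive.

```shell
# Hypothetical helper: package_sources ARCHIVE FILE...
# Writes ARCHIVE (a .zip) containing only the listed driver source files,
# each stored under its base name; an existing ARCHIVE is overwritten.
package_sources() {
  archive="$1"
  shift
  for src in "$@"; do
    # python3 -m zipfile skips missing paths without an error, so fail loudly.
    if [ ! -f "$src" ]; then
      echo "package_sources: no such file: $src" >&2
      return 1
    fi
  done
  python3 -m zipfile -c "$archive" "$@"
}
```

For example, package_sources "$SRC_ZIP_PATH" "$PROGRAM_DIR/SparkPi.scala" would build the archive that the spark2-submit commands on this page point to (the source path is illustrative).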
In yarn-client mode, upload the zip archive to any local directory accessible to the application's driver, and specify its path with --conf "spark.unravel.program.zip=$SRC_ZIP_PATH" on the spark-submit command:
export PROGRAM_DIR=/home/user1/spark-examples # Location of source zip
export PATH_TO_SPARK_EXAMPLE_JAR=$PROGRAM_DIR/spark-examples.jar # Example jar file
export SRC_ZIP_PATH=$PROGRAM_DIR/spark-example-src.zip # Full path to the source zip file
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf "spark.unravel.program.zip=$SRC_ZIP_PATH" \
  --deploy-mode client \
  --master yarn \
  $PATH_TO_SPARK_EXAMPLE_JAR \
  1000
In yarn-cluster mode, upload the zip archive to the application's home directory by specifying its path with --files $SRC_ZIP_PATH and its file name with --conf "spark.unravel.program.zip=src-zip-name" on the spark-submit command:
export PROGRAM_DIR=/home/user1/spark-examples # Location of source zip
export PATH_TO_SPARK_EXAMPLE_JAR=$PROGRAM_DIR/spark-examples.jar # Example jar file
export SRC_ZIP_PATH=$PROGRAM_DIR/spark-example-src.zip # Full path to the source zip file
export SRC_ZIP_NAME=spark-example-src.zip # Name of the source zip file
spark2-submit \
  --class org.apache.spark.examples.SparkPi \
  --files $SRC_ZIP_PATH \
  --conf "spark.unravel.program.zip=$SRC_ZIP_NAME" \
  --deploy-mode cluster \
  --master yarn \
  $PATH_TO_SPARK_EXAMPLE_JAR \
  1000
Tip
Unravel searches for source files in this order:
spark.unravel.program.dir (Option 1)
Application home directory (Option 1)
Zip archive provided as spark.unravel.program.zip (Option 2)
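To predict which location will supply a given file, the search order above can be mirrored in a short function. This is a hypothetical illustration of the documented order, not Unravel's implementation; resolve_source and its four arguments are assumptions, and the zip check assumes python3 is installed and the file is stored at the archive root.

```shell
# Hypothetical illustration of the documented search order:
#   1. spark.unravel.program.dir  2. application home directory  3. source zip
# resolve_source NAME PROGRAM_DIR HOME_DIR SRC_ZIP
# Prints the first location that holds NAME, or returns 1 if none does.
resolve_source() {
  name="$1" program_dir="$2" home_dir="$3" src_zip="$4"
  if [ -f "$program_dir/$name" ]; then
    echo "program.dir: $program_dir/$name"
  elif [ -f "$home_dir/$name" ]; then
    echo "home: $home_dir/$name"
  elif [ -f "$src_zip" ] &&
       python3 -m zipfile -l "$src_zip" |
         awk -v want="$name" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'; then
    echo "zip: $src_zip"
  else
    return 1
  fi
}
```

Because spark.unravel.program.dir defaults to the current directory, the first two checks point at the same place unless you override that setting.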
After the Spark application completes, you can see the Spark program(s) in Unravel UI on the Program tab of the Spark Application Manager. When you click an RDD node, Unravel UI highlights the line of code that corresponds to that RDD node in the execution graph. For example, in the screenshot below, Unravel UI highlights line 324 of QueryDriver.scala for the MapPartitionsRDD node.