Uploading Spark programs to Unravel

Unravel UI displays uploaded Spark programs only if they are submitted as Java, Scala, Python, or R source code, not as JVM bytecode.

You can upload Spark programs either by uploading individual source files or by uploading a .zip file.

Note

The sensor uploads a source file only if one of the stage names points to one of the source files within the source zip archive.

Uploading individual source files

Upload Spark source files and specify their location on the spark-submit command.

Examples

Note

The default value of spark.unravel.program.dir is the current directory (the application's home directory).

yarn-client mode (master yarn and deploy-mode client)

Upload the source files to any local directory accessible to the application's driver, and specify their path with --conf "spark.unravel.program.dir=$PROGRAM_DIR" on the spark-submit command:

export PROGRAM_DIR=fully-qualified-path-to-local-file-directory                                       
export PATH_TO_SPARK_EXAMPLE_JAR=fully-qualified-jar-path   

spark-submit \
    --class org.apache.spark.examples.sql.RDDRelation \
    --conf "spark.unravel.program.dir=$PROGRAM_DIR" \
    --deploy-mode client \
    --master yarn \
    $PATH_TO_SPARK_EXAMPLE_JAR

yarn-cluster mode (master yarn and deploy-mode cluster)

Upload the source files to the application's home directory, and specify their path with --files comma-separated-list-of-source-files on the spark-submit command:

export PATH_TO_SPARK_EXAMPLE_JAR=fully-qualified-jar-path

spark-submit \
    --class org.apache.spark.examples.sql.RDDRelation \
    --files {comma-separated-list-of-source-files} \
    --deploy-mode cluster \
    --master yarn \
    $PATH_TO_SPARK_EXAMPLE_JAR
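
As one illustrative way to build that comma-separated list, the snippet below collects every .scala file under a source directory. The directory and file names here are placeholders, not part of the examples above; substitute your own driver sources.

```shell
# Hypothetical driver sources; substitute your own paths and file names.
mkdir -p /tmp/files-demo/src
touch /tmp/files-demo/src/Helpers.scala /tmp/files-demo/src/RDDRelation.scala

# Build the comma-separated list expected by --files.
SRC_FILES=$(ls /tmp/files-demo/src/*.scala | paste -s -d, -)
echo "$SRC_FILES"

# You would then pass it on the spark-submit command as: --files "$SRC_FILES"
```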
Uploading an archive

You can also upload Spark programs by providing a zip or egg archive.

Package all relevant source files into a zip or egg archive. Keep the archive small by including only the relevant driver source files.
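
For example, a driver source archive might be created as follows. The file names here are placeholders; Python's standard-library zipfile command-line interface is used so no extra tools are required, though plain zip works just as well.

```shell
# Hypothetical driver sources; substitute your own file names.
mkdir -p /tmp/zip-demo && cd /tmp/zip-demo
printf 'object QueryDriver\n' > QueryDriver.scala
printf 'object Helpers\n' > Helpers.scala

# Create an archive containing only the driver sources, then list its contents.
python3 -m zipfile -c spark-example-src.zip QueryDriver.scala Helpers.scala
python3 -m zipfile -l spark-example-src.zip
```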

Examples

In yarn-client mode, upload the zip archive to any local directory accessible to the application's driver, and specify its path with --conf "spark.unravel.program.zip=$SRC_ZIP_PATH" on the spark-submit command:

export PROGRAM_DIR=/home/user1/spark-examples # Location of source zip
export PATH_TO_SPARK_EXAMPLE_JAR=$PROGRAM_DIR/spark-examples.jar # Example jar file
export SRC_ZIP_PATH=$PROGRAM_DIR/spark-example-src.zip # Full path to the source zip file

spark2-submit \
    --class org.apache.spark.examples.SparkPi \
    --conf "spark.unravel.program.zip=$SRC_ZIP_PATH" \
    --deploy-mode client \
    --master yarn \
    $PATH_TO_SPARK_EXAMPLE_JAR \
    1000

In yarn-cluster mode, upload the zip archive to the application's home directory by specifying its path with --files $SRC_ZIP_PATH and its filename with --conf "spark.unravel.program.zip=src-zip-name" on the spark-submit command:

export PROGRAM_DIR=/home/user1/spark-examples # Location of source zip
export PATH_TO_SPARK_EXAMPLE_JAR=$PROGRAM_DIR/spark-examples.jar # Example jar file
export SRC_ZIP_PATH=$PROGRAM_DIR/spark-example-src.zip # Full path to the source zip file
export SRC_ZIP_NAME=spark-example-src.zip # Name of the source zip file

spark2-submit \
    --class org.apache.spark.examples.SparkPi \
    --files $SRC_ZIP_PATH \
    --conf "spark.unravel.program.zip=$SRC_ZIP_NAME" \
    --deploy-mode cluster \
    --master yarn \
    $PATH_TO_SPARK_EXAMPLE_JAR \
    1000

Tip

Unravel searches for source files in this order:

  • The directory given by spark.unravel.program.dir (individual source files)

  • The application's home directory (individual source files)

  • The zip archive given by spark.unravel.program.zip (archive upload)

After the Spark application completes, you can see the Spark program(s) in Unravel UI under the Spark Application Manager's Program tab. When you click an RDD node, Unravel UI highlights the line of code corresponding to that node in the execution graph. For example, in the screenshot below, Unravel UI highlights the MapPartitionsRDD node at line 324 of QueryDriver.scala.

[Screenshot: program.png]