Spark

Property/Description	Unit	Default
com.unraveldata.spark.live.pipeline.enabled Specifies whether Unravel processes the live job data coming from the sensor. true: Live job data is processed as soon as it is received. false: Live job data is not processed.	boolean	true
com.unraveldata.spark.live.pipeline.maxStoredStages Maximum number of jobs/stages stored in the DB. If an application has more jobs/stages than `maxStoredStages`, only the last `maxStoredStages` are stored. This setting affects only the live pipeline. It is not considered when the event log file is processed after the application has completed its execution.	count	5000
com.unraveldata.spark.master Default Spark master mode to be used if it is not available from the sensor. Possible values: local, standalone, or yarn (default).	set member	yarn
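
The "only the last `maxStoredStages` are stored" behavior can be pictured as a bounded window over the incoming jobs/stages. The following is a minimal sketch of that idea, not Unravel's implementation; the function name `keep_last` and the sample stage IDs are hypothetical.

```python
from collections import deque


def keep_last(items, max_stored):
    """Return only the last `max_stored` items, in their original order."""
    # A deque with maxlen silently drops the oldest entry when it is full.
    window = deque(maxlen=max_stored)
    for item in items:
        window.append(item)
    return list(window)


# Hypothetical example: 12 stage IDs arrive, but only 5 may be stored.
stage_ids = list(range(12))
print(keep_last(stage_ids, 5))  # [7, 8, 9, 10, 11]
```

With the default of 5000, an application with fewer than 5000 jobs/stages is unaffected by this limit.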

Property/Description	Unit	Default
com.unraveldata.spark.eventlog.location All the possible locations of the event log files. Multiple locations are supported as a comma-separated list of values. This property is used only when the Unravel sensor is not enabled. When the sensor is enabled, the event log path is taken from the application configuration at runtime.	string	`hdfs:///user/spark/applicationHistory/`
com.unraveldata.spark.eventlog.maxSize Maximum size of the event log file that will be processed by the Spark worker daemon. Event logs larger than `maxSize` will not be processed.	bytes	1000000000 (~1 GB)
com.unraveldata.spark.eventlog.appDuration.mins Maximum duration (in minutes) of an application for which the Spark event log is pulled.	min	1440 (1 day)
com.unraveldata.spark.hadoopFsMulti.useFilteredFiles Specifies how to search for the event log files. `true`: prefix search. `false`: prefix + suffix search. Prefix + suffix search is faster because it avoids the listFiles() API, which may take a long time for large directories on HDFS. This search requires that all possible suffixes for the event log files are known. Possible suffixes are specified by com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes.	boolean	false
com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes Specifies the suffixes used for prefix + suffix search of the event logs when com.unraveldata.spark.hadoopFsMulti.useFilteredFiles=`false`. NOTE: The empty suffix (,,) must be part of this value for uncompressed event log files.	CSL	,,.lz4,.snappy,.inprogress
com.unraveldata.spark.appLoading.maxAttempts Maximum number of attempts for loading the event log file from HDFS, S3, ADL, WASB, etc.	count	3
com.unraveldata.spark.appLoading.delayForRetry Delay used between consecutive retries when loading the event log files. The actual delay is not constant; it increases progressively as 2^attempt * delayForRetry.	ms	2000 (2 s)
com.unraveldata.spark.tasks.inMemoryLimit Number of tasks to be kept in memory and DB per stage. All stats are calculated for all the task attempts but only the configured number of tasks will be kept in memory/DB.	count	1000
com.unraveldata.process.event.log Specifies whether event logs are processed. If set to False, the event logs are not processed.	boolean	True
Events Related
com.unraveldata.spark.events.enableCaching Enables logic for executing caching events.	boolean	false
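
Two of the properties above describe small computations: the progressive retry delay (2^attempt * delayForRetry) and the prefix + suffix search driven by a comma-separated suffix list. The sketch below illustrates both under stated assumptions and is not Unravel's code. It assumes that `attempt` is the 1-based number of the attempt that just failed, that no wait follows the final attempt, and that the empty suffix is written as an empty entry in the list. All function names, the shortened suffix list, and the `app-0001` file name are hypothetical.

```python
def retry_delays_ms(max_attempts=3, delay_for_retry_ms=2000):
    """Delays between consecutive attempts, using 2**attempt * delayForRetry.

    Assumption: `attempt` is the 1-based number of the attempt that just failed.
    """
    # No wait follows the final attempt, so there are max_attempts - 1 delays.
    return [(2 ** attempt) * delay_for_retry_ms for attempt in range(1, max_attempts)]


def parse_suffixes(csl):
    """Split a comma-separated suffix list, keeping each suffix (and '') once."""
    # dict.fromkeys removes duplicates while preserving the original order.
    return list(dict.fromkeys(part.strip() for part in csl.split(",")))


def candidate_paths(prefix, suffixes):
    """Build every path the prefix + suffix search would probe directly."""
    return [prefix + suffix for suffix in suffixes]


print(retry_delays_ms())  # [4000, 8000]

suffixes = parse_suffixes(",,.lz4,.snappy")  # shortened example value
print(suffixes)  # ['', '.lz4', '.snappy']

for path in candidate_paths("hdfs:///user/spark/applicationHistory/app-0001", suffixes):
    print(path)
```

Probing a short, known list of candidate paths is what lets the prefix + suffix search skip a full directory listing. It also shows why the empty suffix has to be in the list: without it, the uncompressed log path would never be tried.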

Property/Description	Set by user	Unit	Default
com.unraveldata.spark.appLoading.maxConcurrentApps The number of applications for which Unravel keeps metadata in the Spark worker daemon's memory.		count	5
com.unraveldata.spark.time.histogram Specifies whether the timeline histogram is generated or not. Note: Timeline histogram generation is memory intensive.		boolean	false

Properties defined in spark-defaults.conf

Property/Description	Set by user	Unit	Default
spark.shutdown.delay.ms Amount of time to delay shutdown so that the last messages are processed (allows the BTrace sensor to send all the data before the Spark driver exits).		ms	0
com.unraveldata.spark.live.interval.sec The interval, in seconds, after which live application data is updated. It allows for tracking of Spark tasks. The Spark APM updates on task completion, in addition to job start and job and stage completion.		s	60
