Spark
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.spark.live.pipeline.enabled Specifies whether Unravel processes the live job data coming from the sensor. true: live job data is processed as soon as it is received. false: live job data is not processed. | | boolean | true |
com.unraveldata.spark.live.pipeline.maxStoredStages Maximum number of jobs/stages stored in the DB. If an application has … This setting affects only the live pipeline; it is not considered when the event log file is processed after the application has completed. | | count | 1000 |
com.unraveldata.spark.master Default Spark master mode to use if it is not available from the sensor. Possible values: local, standalone, or yarn (default). | | set member | yarn |
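
The fallback described for com.unraveldata.spark.master can be sketched as follows. This is a minimal illustration, not Unravel's implementation: the function name `resolve_master` is hypothetical, and it assumes that a missing or empty sensor value means "not available from the sensor", in which case the configured default is used.

```python
# Hypothetical sketch of the com.unraveldata.spark.master fallback.
# Assumption: an empty or missing sensor value means the master mode is
# "not available from the sensor", so the configured default is used.
from typing import Optional

ALLOWED_MASTERS = {"local", "standalone", "yarn"}


def resolve_master(sensor_master: Optional[str], configured_default: str = "yarn") -> str:
    """Return the sensor-reported master if present, else the configured default."""
    if configured_default not in ALLOWED_MASTERS:
        raise ValueError(f"unsupported master mode: {configured_default!r}")
    if sensor_master:  # None or "" means the sensor did not report it
        return sensor_master
    return configured_default


print(resolve_master(None))     # prints yarn (falls back to the default)
print(resolve_master("local"))  # prints local (the sensor value wins)
```

Validating only the configured default keeps the sketch limited to what the table states: the set of possible values applies to the property, not to whatever the sensor reports.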

Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.spark.eventlog.location All possible locations of the event log files, given as a comma-separated list. This property is used only when the Unravel sensor is not enabled; when the sensor is enabled, the event log path is taken from the application configuration at runtime. | | string | |
com.unraveldata.spark.eventlog.maxSize Maximum size of an event log file that the Spark worker daemon processes. Event logs larger than this size are not processed. | | bytes | 1000000000 (~1 GB) |
com.unraveldata.spark.eventlog.appDuration.mins Maximum application duration, in minutes, for which the Spark event log is pulled. | | min | 1440 (1 day) |
com.unraveldata.spark.hadoopFsMulti.useFilteredFiles Specifies how to search for event log files. true: prefix + suffix search. false: directory listing with the listFiles() API. Prefix + suffix search is faster because it avoids listFiles(), which can take a long time on large HDFS directories, but it requires that all possible event log file suffixes are known. The suffixes are specified by com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes. | | boolean | false |
com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes Suffixes used for the prefix + suffix search of event logs when com.unraveldata.spark.hadoopFsMulti.useFilteredFiles=true. NOTE: the empty suffix (,,) must be part of this value so that uncompressed event log files are found. | | comma-separated list | ,,.lz4,.snappy,.inprogress |
com.unraveldata.spark.appLoading.maxAttempts Maximum number of attempts to load the event log file from HDFS, S3, ADL, WASB, etc. | | count | 3 |
com.unraveldata.spark.appLoading.delayForRetry Delay between consecutive retries when loading event log files. The delay is not constant; it increases progressively as 2^attempt * delayForRetry. | | ms | 2000 (2 s) |
com.unraveldata.spark.tasks.inMemoryLimit Number of tasks per stage kept in memory and in the DB. Statistics are calculated over all task attempts, but only the configured number of tasks is kept in memory/DB. | | count | 1000 |
Events Related | |||
com.unraveldata.spark.events.enableCaching Enables the logic for handling caching events. | | boolean | false |
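
The prefix + suffix search controlled by com.unraveldata.spark.hadoopFsMulti.useFilteredFiles and com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes can be sketched as below. This is an illustration under stated assumptions, not Unravel's code: the helper names `candidate_paths` and `find_event_log` are hypothetical, the file name is assumed to be `<prefix><suffix>` with a known prefix (here a made-up application ID), and `exists` stands in for a cheap per-path existence check on the file system. The example also shows why the empty suffix matters.

```python
# Hypothetical sketch: prefix + suffix lookup of an event log file, as an
# alternative to listing the whole directory with listFiles().
# Assumptions: the file name is <prefix><suffix>, the prefix (here an
# application ID) is known, and `exists` is a cheap per-path existence check.
from typing import Callable, List, Optional


def candidate_paths(directory: str, prefix: str, suffixes_csv: str) -> List[str]:
    """Expand the comma-separated suffix list into candidate paths, in order, without duplicates."""
    paths: List[str] = []
    for suffix in suffixes_csv.split(","):
        # An empty entry yields the bare prefix, i.e. the uncompressed file.
        path = f"{directory.rstrip('/')}/{prefix}{suffix.strip()}"
        if path not in paths:
            paths.append(path)
    return paths


def find_event_log(directory: str, prefix: str, suffixes_csv: str,
                   exists: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that exists, or None if no suffix matches."""
    for path in candidate_paths(directory, prefix, suffixes_csv):
        if exists(path):
            return path
    return None


# Usage with an in-memory stand-in for the file system (hypothetical paths).
existing = {"/spark-logs/application_1_0001.lz4"}
print(find_event_log("/spark-logs", "application_1_0001", ",,.lz4,.snappy", existing.__contains__))
# prints /spark-logs/application_1_0001.lz4
```

If the suffix list omits the empty entry, an uncompressed file named exactly after the prefix is never probed, so `find_event_log` returns None even though the file exists.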
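
The retry schedule for loading event logs (com.unraveldata.spark.appLoading.maxAttempts and delayForRetry) can be worked out with a short sketch. Two details are assumptions, not stated in the table: that maxAttempts counts the first load attempt, and that the attempt counter in 2^attempt starts at 0 for the first retry. Under those assumptions the defaults (3 attempts, 2000 ms) give waits of 2000 ms and then 4000 ms. The function names and the use of `OSError` as the failure type are hypothetical.

```python
# Sketch of the retry schedule for com.unraveldata.spark.appLoading.maxAttempts
# and com.unraveldata.spark.appLoading.delayForRetry.
# Assumptions (not confirmed by the documentation): maxAttempts counts the
# first load attempt, and the attempt counter in 2^attempt starts at 0 for
# the first retry. Load failures are assumed to surface as OSError.
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_delays_ms(max_attempts: int = 3, delay_for_retry_ms: int = 2000) -> List[int]:
    """Delays (ms) waited between consecutive attempts: 2^attempt * delayForRetry."""
    return [(2 ** attempt) * delay_for_retry_ms for attempt in range(max_attempts - 1)]


def load_with_retry(load: Callable[[], T], max_attempts: int = 3, delay_for_retry_ms: int = 2000,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call load() up to max_attempts times, backing off exponentially between failures."""
    delays = retry_delays_ms(max_attempts, delay_for_retry_ms)
    for attempt in range(max_attempts):
        try:
            return load()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt] / 1000.0)  # time.sleep takes seconds
    raise ValueError("max_attempts must be at least 1")


print(retry_delays_ms())  # prints [2000, 4000]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in production code the default `time.sleep` applies.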
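
The behavior of com.unraveldata.spark.tasks.inMemoryLimit (statistics over every task attempt, but only a bounded number of task records retained) can be illustrated with a small aggregator. This is a hypothetical sketch: the class `StageTaskStats` and its fields are illustrative names, and keeping the *first* N records is an assumption, since the table does not say which tasks are retained.

```python
# Hypothetical sketch: per-stage aggregation where statistics cover every
# task attempt but only a bounded number of task records is retained,
# mirroring com.unraveldata.spark.tasks.inMemoryLimit.
# Assumption: the first N records are kept; the documentation does not say
# which tasks are retained.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StageTaskStats:
    in_memory_limit: int = 1000
    attempt_count: int = 0
    total_duration_ms: int = 0
    max_duration_ms: int = 0
    kept_tasks: List[Tuple[int, int]] = field(default_factory=list)  # (task_id, duration_ms)

    def add_attempt(self, task_id: int, duration_ms: int) -> None:
        # Statistics are updated for every attempt, regardless of the limit.
        self.attempt_count += 1
        self.total_duration_ms += duration_ms
        self.max_duration_ms = max(self.max_duration_ms, duration_ms)
        # Only the first `in_memory_limit` records are retained.
        if len(self.kept_tasks) < self.in_memory_limit:
            self.kept_tasks.append((task_id, duration_ms))

    @property
    def mean_duration_ms(self) -> float:
        return self.total_duration_ms / self.attempt_count if self.attempt_count else 0.0


stats = StageTaskStats(in_memory_limit=2)
for task_id, duration in [(0, 100), (1, 300), (2, 200)]:
    stats.add_attempt(task_id, duration)
print(stats.attempt_count, stats.max_duration_ms, len(stats.kept_tasks))  # prints 3 300 2
```

The running aggregates use constant memory, so raising or lowering the limit changes only how many individual records are stored, not the accuracy of the stage-level statistics.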

Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.spark.appLoading.maxConcurrentApps Number of applications whose metadata Unravel keeps in Spark worker daemon memory. | | count | 5 |
com.unraveldata.spark.time.histogram Specifies whether the timeline histogram is generated. Note: timeline histogram generation is memory intensive. | | boolean | false |
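
com.unraveldata.spark.appLoading.maxConcurrentApps bounds how many applications' metadata is held in the Spark worker daemon's memory. A bounded store of that kind can be sketched as follows. The class name `AppMetadataCache` is hypothetical, and least-recently-used eviction is an assumption: the table does not say which application's metadata is dropped once the limit is reached.

```python
# Hypothetical sketch of a bounded in-memory metadata store, illustrating
# com.unraveldata.spark.appLoading.maxConcurrentApps (default 5).
# Assumption: least-recently-used eviction; the documentation does not state
# which application's metadata is dropped when the limit is reached.
from collections import OrderedDict
from typing import Any, Optional


class AppMetadataCache:
    def __init__(self, max_concurrent_apps: int = 5) -> None:
        self.max_concurrent_apps = max_concurrent_apps
        self._apps: "OrderedDict[str, Any]" = OrderedDict()

    def put(self, app_id: str, metadata: Any) -> None:
        self._apps[app_id] = metadata
        self._apps.move_to_end(app_id)  # mark as most recently used
        while len(self._apps) > self.max_concurrent_apps:
            self._apps.popitem(last=False)  # evict the least recently used app

    def get(self, app_id: str) -> Optional[Any]:
        if app_id not in self._apps:
            return None
        self._apps.move_to_end(app_id)  # a read also counts as a use
        return self._apps[app_id]

    def __len__(self) -> int:
        return len(self._apps)
```

For example, with a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently.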

spark-defaults.conf
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.spark.shutdown.delay.ms Amount of time by which shutdown is delayed so that the last messages are processed. This allows the BTrace sensor to send all its data before the Spark driver exits. | | ms | 0 |
com.unraveldata.spark.live.interval.sec Interval, in seconds, at which live application data is updated. This enables tracking of Spark tasks: the Spark APM is updated on task completion in addition to job start, job completion, and stage completion. | | s | 60 |
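
Entries in spark-defaults.conf are written one per line as a key and a value separated by whitespace, with `#` starting a comment line. The sketch below reads the two properties in this table from text in that format and falls back to the documented defaults when a key is absent. It is a generic parser for illustration only, not how Unravel or Spark loads the file; the function names are hypothetical, and it assumes the keys appear verbatim as listed above.

```python
# Sketch: read the two Unravel properties from spark-defaults.conf-style text,
# falling back to the documented defaults when a key is absent.
# Assumptions: keys appear verbatim as in the table, one "key value" pair per
# line (whitespace-separated), and '#' starts a comment line.
from typing import Dict

DEFAULTS: Dict[str, int] = {
    "com.unraveldata.spark.shutdown.delay.ms": 0,
    "com.unraveldata.spark.live.interval.sec": 60,
}


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse whitespace-separated key/value lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def unravel_settings(text: str) -> Dict[str, int]:
    """Return each known property as an int, using the default if it is not set."""
    props = parse_spark_defaults(text)
    return {key: int(props[key]) if key in props else default
            for key, default in DEFAULTS.items()}


conf = """
# Unravel sensor settings
com.unraveldata.spark.live.interval.sec  30
"""
print(unravel_settings(conf))
# prints {'com.unraveldata.spark.shutdown.delay.ms': 0, 'com.unraveldata.spark.live.interval.sec': 30}
```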