Spark

Home

Spark

Property/Description	Set by user	Unit	Default
com.unraveldata.spark.live.skip.small.jobs When set to `true`, skips the processing of small insignificant Spark jobs to improve Spark worker throughput only for a live pipeline.	Optional	boolean	false
com.unraveldata.spark.live.min.jobs.stored.count Indicates the count of a minimum number of Spark jobs to persist for an application (irrespective of the duration) in a live pipeline. Considered only when com.unraveldata.spark.live.skip.small.jobs is set to `true`.	Optional	count	500
com.unraveldata.spark.live.small.jobs.duration.threshold Indicates the minimum duration threshold of Spark jobs to skip insignificant jobs in a live pipeline. Specify the duration in milliseconds. Considered only when com.unraveldata.spark.live.skip.small.jobs is set to `true`.	Optional	milliseconds	1000
com.unraveldata.spark.live.pipeline.enabled Specifies if Unravel should process the live job data coming from the sensor or not. `true`: The live job data will be processed as soon as it is received. `false`: Live job data will not be processed.	Optional	boolean	true
com.unraveldata.spark.live.pipeline.maxStoredStages A maximum number of jobs/stages is stored in the database. If an application has `(# jobs/stages)` > `maxStoredStages` only the last `maxStoredStages` are stored. This setting affects only the live pipeline. This property is not considered when processing the event log file (after the application has completed its execution).	Optional	count	2500
com.unraveldata.spark.master Default spark master mode to be used if not available from Sensor. Possible values: local, standalone, or yarn (default)	Optional	set member	yarn

Property/Description	Set by user	Unit	Default
com.unraveldata.yarn.log.delayForRetry Delay in milliseconds for consecutive retries when loading the yarn log files.		milliseconds	2000
com.unraveldata.yarn.log.maxAttempts A maximum number of attempts for loading the yarn log files.		count	3

Property/Description

Set by user

Unit

Default

com.unraveldata.yarn.log.delayForRetry

Delay in milliseconds for consecutive retries when loading the yarn log files.

milliseconds

2000

com.unraveldata.yarn.log.maxAttempts

A maximum number of attempts for loading the yarn log files.

count

Property/Description	Unit	Default
com.unraveldata.spark.query.size.max Specify the threshold after which a query can be truncated.	count	200000
com.unraveldata.jobtime.to.apptime.ratio.threshold Defines the threshold of the ratio of jobtime to apptime and is used to determine if SQL analysis of queries should be done for an app.	percent	0.6
com.unraveldata.num.sql.queries.to.analyze This property defines the maximum number of queries to analyze in a spark app.	count	20
spark.unravel.sql.op.timing.enabled This property is a unravel specific Spark conf which can be defined in spark-submit command to enable or disable the capturing of SQL timing data.	boolean	true
com.unraveldata.sqloperator.to.query.ratio.threshold This property defines the ratio of operatorTime/querytime and determines whether the operator is slow.	percent	0.2
com.unraveldata.gctime.to.querytime.ratio.threshold This property defines the ratio of gcTime/querytime and determines if the query is spending significant time in GC.	percent	0.2
com.unraveldata.query.to.app.ratio.threshold This property defines the ratio of queryTime/appTime and identifies the most significant query in the app.	percent	0.2

Property/Description	Unit	Default
com.unraveldata.spark.eventlog.location All the possible locations of the event log files. Multiple locations are supported as a comma-separated list of values. This property is used only when the Unravel sensor is not enabled. When the sensor is enabled, the event log path is taken from the application configuration at runtime.	string	`hdfs:///user/spark/applicationHistory/`
com.unraveldata.spark.eventlog.maxSize Maximum size of the event log file that will be processed by the Spark worker daemon. Event logs larger than `MaxSize` will not be processed.	bytes	1000000000 (~1GB)
com.unraveldata.spark.eventlog.appDuration.mins Maximum duration (in minutes) of application to pull Spark event log.	min	1440 (1 day)
com.unraveldata.spark.hadoopFsMulti.useFilteredFiles Specifies how to search the event log files. `true`: prefix search `false`: prefix + suffix search Prefix + suffix search is faster as it avoids listFiles() API which may take a long time for large directories on HDFS. This search requires that all the possible suffixes for the event log files are known. Possible suffixes are specified by com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes..	boolean	false
com.unraveldata.spark.hadoopFsMulti.eventlog.suffixes Specifies suffixes used for prefix+suffix search of the event logs when com.unraveldata.spark.hadoopFsMulti.useFilteredFiles=`false`. NOTE: the empty suffix (,,) be part of this value for uncompressed event log files.	CSL	,,.lz4,.snappy,.inprogres
com.unraveldata.spark.appLoading.maxAttempts Maximum number of attempts for loading the event log file from HDFS/S3/ ADL/WASB etc.	count	3
com.unraveldata.spark.appLoading.delayForRetry Delay used among consecutive retries when loading the event log files. The actual delay is not constant, it increases progressively by 2^attempt * delayForRetry.	ms	2000 (2 s)
com.unraveldata.spark.tasks.inMemoryLimit Number of tasks to be kept in memory and DB per stage. All stats are calculated for all the task attempts but only the configured number of tasks will be kept in memory/DB.	count	1000

Events Related
com.unraveldata.spark.events.enableCaching Enables logic for executing caching events.	boolean	false

Property/Description	Set by user	Unit	Default
com.unraveldata.spark.appLoading.maxConcurrentApps The number of applications Unravel keep metadata in Spark worker daemon memory.		count	5
com.unraveldata.spark.time.histogram Specifies whether the timeline histogram is generated or not. Note: Timeline histogram generation is memory intensive.		boolean	false

Property/Description

Set by user

Unit

Default

com.unraveldata.spark.appLoading.maxConcurrentApps

The number of applications Unravel keep metadata in Spark worker daemon memory.

count

com.unraveldata.spark.time.histogram

Specifies whether the timeline histogram is generated or not.

Note: Timeline histogram generation is memory intensive.

boolean

false

Properties defined in spark-default.conf

Property/Description	Set by user	Unit	Default
spark.unravel.shutdown.delay.ms Amount of time to delay shutdown so the last messages are processed (allows Btrace sensor to send all the data before the spark driver exits).		ms	300
spark.unravel.live.update.interval.sec This is the interval in seconds after which live application data is updated. It allows for the tracking of Spark tasks. The Spark APM updates on Task completion in addition Job start, and Job and Stage completion.		s	60

Property/Description

Set by user

Unit

Default

spark.unravel.shutdown.delay.ms

Amount of time to delay shutdown so the last messages are processed (allows Btrace sensor to send all the data before the spark driver exits).

300

spark.unravel.live.update.interval.sec

This is the interval in seconds after which live application data is updated. It allows for the tracking of Spark tasks. The Spark APM updates on Task completion in addition Job start, and Job and Stage completion.

Property/Description	Unit	Default
com.unraveldata.job.collector.running.load.conf When set to true Running MR jobs are linked to corresponding Hive app if Hive-on-MR app. AutoAction metrics for running hive queries will be sent to AA2 backend.	boolean	false
com.unraveldata.job.collector.hive.queries.cache.size This is used to improve the Hive-MR pipeline by caching data so it can be retrieved from cache instead of external API. You should not have to change this value.	count	1000
com.unraveldata.max.attempt.log.dir.size.in.bytes Maximum size of the aggregated executor log that are imported and processed by the Spark worker for a successful application.	byte	500000000 (~500 MB)
com.unraveldata.max.failed.attempt.log.dir.size.in.bytes Maximum size of the aggregated executor log that are imported and processed by the Spark worker for a failed application.	byte	2000000000 (~2 GB)
com.unraveldata.min.job.duration.for.attempt.log Minimum duration of a successful application or which executor logs are processed (in milliseconds).	ms	600000 (10 mins)
com.unraveldata.min.failed.job.duration.for.attempt.log Minimum duration of failed/killed application for which executor logs are processed (in milliseconds).	ms	60000
com.unraveldata.attempt.log.max.containers Maximum number of containers for the application. If application has more than configured number of containers then the aggregated executor log is processed for the application.	ms	500
com.unraveldata.spark.master Default master for spark applications. (Used to download executor log using correct APIs.) Valid Options: `yarn`, `mesos`, `standalone`.	string	yarn
com.unraveldata.process.executor.log Set the flag to process the executor logs. `true`: The executor logs are processed. `false`: the executor logs are not processed even using `Load Logs`.	boolean	true

Property/Description	Set by user	Unit	Default
com.unraveldata.s3.profile.config.file.path The path to the s3 profile file, e.g., `/usr/local/unravel/etc/s3ro.properties`.		string	-
com.unraveldata.spark.s3.profilesToBuckets Comma separated list of profile to bucket mappings in the following format: <s3_profile>:<s3_bucket>, for example, com.unraveldata.spark.s3.profileToBuckets=profile-prod:com.unraveldata.dev,profile-dev:com.unraveldata.dev. Note Important Ensure that the profiles defined in the property above are actually present in the s3 properties file and that each profile has associated a corresponding pair of credentials `aws_access_key` and `aws_secret_access_key`. The old format: `access_key/secretKey` is no longer supported.)		CSL	-

Property/Description

Set by user

Unit

Default

com.unraveldata.s3.profile.config.file.path

The path to the s3 profile file, e.g., /usr/local/unravel/etc/s3ro.properties.

string

com.unraveldata.spark.s3.profilesToBuckets

Comma separated list of profile to bucket mappings in the following format: <s3_profile>:<s3_bucket>, for example, com.unraveldata.spark.s3.profileToBuckets=profile-prod:com.unraveldata.dev,profile-dev:com.unraveldata.dev.

Note

Important

Ensure that the profiles defined in the property above are actually present in the s3 properties file and that each profile has associated a corresponding pair of credentials aws_access_key and aws_secret_access_key. The old format: access_key/secretKey is no longer supported.)

CSL

Property/Description	Unit	Default
com.unraveldata.tagging.enabled Enables tagging functionality.	boolean	true
com.unraveldata.tagging.script.enabled Enables tagging.	boolean	false
com.unraveldata.app.tagging.script.path Specifies tagging script path to use when enabled=true.	string (path)	/usr/local/unravel/etc/apptag.py
com.unraveldata.app.tagging.script.method.name The name of the method in the python script that generates the tagging dictionary.	string	generate_unravel_tags

Property/Description	Set by user	Unit	Default
com.unraveldata.hdinsight.storage-account.`X` Storage account name that a HDInsight cluster uses. You must define this property for each storage account. `X` starts with 1 and then is incremented by 1 for each additional account. The account numbers must be consecutive.	Optional	string	Azure storage account name.
com.unraveldata.hdinsight.access-key.`X` Storage account key. For each storage-account.`X` you must define access-key.`X` If you have two access keys, pick one to use here.	Optional	string	Azure storage account key.

Property/Description

Set by user

Unit

Default

com.unraveldata.hdinsight.storage-account.X

Storage account name that a HDInsight cluster uses.

You must define this property for each storage account. X starts with 1 and then is incremented by 1 for each additional account. The account numbers must be consecutive.

Optional

string

Azure storage account name.

com.unraveldata.hdinsight.access-key.X

Storage account key.

For each storage-account.X you must define access-key.X If you have two access keys, pick one to use here.

Optional

string

Azure storage account key.

Property/Description	Set by user	Unit	Default
com.unraveldata.adl.accountFQDN The data lake's fully qualified domain name, for example, mydatalake.azuredatalakestore.net.	Optional	string	Azure storage account name.
com.unraveldata.adl.clientId An application ID. An application registration has to be created in the Azure Active Directory.	Optional	string	Azure application id.
com.unraveldata.adl.clientKey An application access key which can be created after registering an application.	Optional	string	Azure storage access key.
com.unraveldata.adl.accessTokenEndpoint The OAUTH 2.0 Access Token Endpoint. It is obtained from the application registration tab on Azure portal.	Optional	string	Azure OAUTH 2.0 token endpoint
com.unraveldata.adl.clientRootPath The path in the Data lake store where the target cluster has been given access.	Optional	string URL	Azure CONTAINER/DIRECTORY path for storage account name.

In this section:

Would you like to provide feedback? Just click here to suggest edits.

Home

Spark

Important

Search results