The Appstore application monitors long-running streaming applications and notifies administrators about them. It fetches data from a Spark application index and identifies streaming applications that have been running longer than a user-specified threshold. For each application, the notification includes a link to the Unravel UI for further investigation.
Streaming applications are identified by the "type" field in Spark hitdocs. An application is classified as a streaming application if its "type" value matches any of the following:
structured-streaming
spark-streaming
streaming-sql
streaming
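The rule above amounts to a set-membership test on the "type" value. A minimal Python sketch, assuming each hitdoc has already been parsed into a dict with the application fields under "_source" (as in the example that follows). The names STREAMING_TYPES and is_streaming_app are hypothetical, and exact, case-sensitive matching is an assumption.

```python
# Type values that mark a Spark application as streaming (from the list above).
STREAMING_TYPES = frozenset({
    "structured-streaming",
    "spark-streaming",
    "streaming-sql",
    "streaming",
})


def is_streaming_app(hitdoc: dict) -> bool:
    """Return True if the hitdoc's _source.type is one of the streaming types.

    Hypothetical helper: assumes hitdoc is a parsed dict with a "_source"
    object, and matches the type value exactly (case-sensitive).
    """
    app_type = hitdoc.get("_source", {}).get("type")
    return app_type in STREAMING_TYPES


# Usage
print(is_streaming_app({"_source": {"type": "structured-streaming"}}))  # True
print(is_streaming_app({"_source": {}}))  # False (no type field)
```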
Example of a Spark hitdoc:
{
  "_index": "app-20240818_24",
  "_type": "apps",
  "_id": "app-20240822162019-0000-0822-161547-utt2zva2",
  "_version": 10404,
  "_score": 3,
  "_routing": "app-20240822162019-0000",
  "_source": {
    "kind": "spark",
    "appt": "wfi",
    "id": "app-20240822162019-0000",
    "name": "job-284887302010684-run-784605153770294-Job_cluster_sql",
    "appid": "app-20240822162019-0000-14572172431119038908dbx",
    "startTime": "2024-08-22T16:20:12.562Z",
    "finishedTime": "2024-08-22T17:57:20.265Z",
    "duration": 5827703,
    "clusterId": "job-284887302010684-run-784605153770294-Job_cluster_sql",
    "clusterUid": "0822-161547-utt2zva2",
    "clusterTg": "adb-4202288953632492.12.azuredatabricks.net",
    "status": "S",
    "userName": "mjose@unraveldata.com",
    "queue": "redteam-4795hf",
    "user": "mjose@unraveldata.com",
    "wn": "284887302010684_4202288953632492",
    "wt": "app-20240822162019-0000",
    "wi": "app-20240822162019-0000-14572172431119038908dbx",
    "nick": "spark",
    "totalDfsBytesRead": 0,
    "totalDfsBytesWritten": 0,
    "numEvents": 0,
    "cents": 0.6442849636077881,
    "aid": "",
    "userType": "",
    "inputTables": [],
    "outputTables": [],
    "type": "structured-streaming",
    "db": ["default"],
    "instances": [],
    "vcoreSeconds": 0,
    "memorySeconds": 0,
    "key": "YARN",
    "numApps": 0,
    "numMRJobs": 1430,
    "mrJobIds": [],
    "totalMapTasks": 5720,
    "totalReduceTasks": 0,
    "totalMapSlotDuration": 136506,
    "totalReduceSlotDuration": 0,
    "sm": 5720,
    "km": 0,
    "fm": 0,
    "sr": 0,
    "kr": 0,
    "fr": 0,
    "numSparkApps": 1430,
    "totalSparkTasks": 5720,
    "ss": 5720,
    "ks": 0,
    "fs": 0,
    "totalSparkSlotDuration": 136506,
    "shuffleBytesRead": 0,
    "shuffleBytesWritten": 0,
    "processingDelay": 0,
    "totalDelay": 0,
    "jobId": 284887302010684,
    "runId": 784605153770294,
    "runName": "structured streaming job",
    "wsId": 4202288953632492,
    "wsName": "redteam-4795hf",
    "wsInstance": "adb-4202288953632492.12.azuredatabricks.net",
    "setupDuration": 299000,
    "cleanupDuration": 0,
    "clusterType": "AUTOMATED",
    "dbuCost": 0.2428209,
    "dbus": 1.6188059
  }
}
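The example hitdoc also shows how "duration" relates to the timestamps: finishedTime minus startTime is 1 h 37 min 7.703 s, or 5,827,703 ms, which equals the "duration" value, so that field appears to be in milliseconds. The standard-library Python check below reproduces the arithmetic; parse_ts is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Fields copied from the example hitdoc above.
source = {
    "startTime": "2024-08-22T16:20:12.562Z",
    "finishedTime": "2024-08-22T17:57:20.265Z",
    "duration": 5827703,
}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp ending in 'Z' into an aware datetime."""
    # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11,
    # so swap it for an explicit UTC offset to support older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


elapsed = parse_ts(source["finishedTime"]) - parse_ts(source["startTime"])
# Integer division by one millisecond gives an exact whole-millisecond count.
elapsed_ms = elapsed // timedelta(milliseconds=1)

print(elapsed)                           # 1:37:07.703000
print(elapsed_ms == source["duration"])  # True
```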
Refer to the App installation instructions.
Refer to the Launch instructions.
The Appstore application takes the following inputs to configure the notification settings:
Time: The time threshold for filtering long-running applications.
TopX: The number of top long-running applications to include in the email notification.
Email: The recipient email address for notifications.
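How the Time and TopX inputs combine can be sketched in Python as below. This is a minimal illustration, not the Appstore implementation: it assumes the threshold is given in minutes, that a running application's age is measured from its startTime to the current time, and that applications are ranked longest-running first. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Same streaming type values as listed above.
STREAMING_TYPES = {"structured-streaming", "spark-streaming", "streaming-sql", "streaming"}


def parse_ts(value: str) -> datetime:
    # Replace the trailing 'Z' so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def top_long_running(hitdocs, threshold_minutes, top_x, now=None):
    """Return up to top_x (app id, running time) pairs above the threshold.

    Hypothetical helper. Assumptions: the threshold is in minutes, running
    time is now - startTime, and results are sorted longest first.
    """
    now = now or datetime.now(timezone.utc)
    threshold = timedelta(minutes=threshold_minutes)
    candidates = []
    for doc in hitdocs:
        src = doc.get("_source", {})
        # Skip non-streaming applications and documents without a start time.
        if src.get("type") not in STREAMING_TYPES or "startTime" not in src:
            continue
        running = now - parse_ts(src["startTime"])
        if running > threshold:
            candidates.append((src.get("id"), running))
    # Longest-running first, then keep only the top X entries.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_x]


# Usage with two hypothetical applications and a fixed "now".
docs = [
    {"_source": {"id": "app-a", "type": "structured-streaming",
                 "startTime": "2024-08-22T10:00:00.000Z"}},
    {"_source": {"id": "app-b", "type": "spark-streaming",
                 "startTime": "2024-08-22T15:30:00.000Z"}},
]
now = datetime(2024, 8, 22, 16, 0, tzinfo=timezone.utc)
print(top_long_running(docs, threshold_minutes=60, top_x=5, now=now))  # only app-a qualifies
```

With a 60-minute threshold, app-b (running for 30 minutes) is filtered out; lowering the threshold would let both through, and TopX then caps how many reach the email.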
Once configured and triggered, the application sends the email notification and records each email in the Email Log Table, which has the following fields:
Email Sent Time: Timestamp of when the email was sent.
To User: Email address of the notification recipient.
TopK: The number of top applications included in the email.
Time: The time threshold for long-running applications.
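An entry in the email log can be modeled as a small immutable record. This Python sketch is illustrative only: the class, field, and function names are hypothetical, and treating Time as a whole number of minutes is an assumption, since the unit is not specified.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EmailLogEntry:
    """One row of the email log table (hypothetical field names)."""
    email_sent_time: datetime  # Email Sent Time
    to_user: str               # To User
    top_k: int                 # TopK: number of applications in the email
    time: int                  # Time: threshold used (assumed to be minutes)


def log_email(to_user, top_k, time_threshold, sent_at=None):
    """Build a log entry for an email that was just sent (defaults to now, UTC)."""
    return EmailLogEntry(
        email_sent_time=sent_at or datetime.now(timezone.utc),
        to_user=to_user,
        top_k=top_k,
        time=time_threshold,
    )


entry = log_email("admin@example.com", top_k=5, time_threshold=60)
print(entry.to_user, entry.top_k, entry.time)  # admin@example.com 5 60
```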
