AutoActions
AutoActions automate the monitoring of your compute cluster by letting you define complex, actionable policies for different cluster metrics. You can set an AutoAction policy to generate alerts for Databricks clusters and job runs that exceed cost and duration thresholds.
For example, you can use an AutoAction to:
Apps: Notify you about a situation that requires manual intervention, such as resource contention or stuck jobs.
Clusters: Monitor Databricks clusters based on cost and duration.
Consider a scenario where a DevOps team monitors clusters. The Unravel server processes AutoActions by:
Collecting various metrics from the cluster
Aggregating the metrics according to user-defined triggers (rules) and scope
Detecting violations
Executing the defined actions for each rule violation
Each rule consists of the following:
Trigger Conditions: The settings that determine when an AutoAction is violated.
For example, jobs running for more than X hours or clusters costing more than $X.
Scope: Refinements that limit which apps or clusters the rule applies to.
For example, the app is owned by User X on Workspace Y.
Actions: The automated actions taken when the user-defined trigger conditions are met.
For example, email User X when the trigger conditions are met.
The DevOps team can then act quickly to control costs, for example by terminating the cluster if necessary. Using Unravel to detect violations and send alerts enables real-time monitoring and cost management.
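The rule structure and processing steps described above can be sketched in a few lines of Python. This is an illustrative model only, not Unravel's implementation or API: the names ClusterMetrics, Rule, evaluate, and email_owner are hypothetical, the thresholds (100 USD, 8 hours) stand in for the unspecified "X" values, and the metrics are assumed to be already collected and aggregated.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical types for illustration; not part of any Unravel API.
@dataclass
class ClusterMetrics:
    cluster_id: str
    owner: str
    workspace: str
    cost_usd: float
    duration_hours: float


@dataclass
class Rule:
    name: str
    trigger: Callable[[ClusterMetrics], bool]        # when the rule is violated
    scope: Callable[[ClusterMetrics], bool]          # which clusters the rule covers
    actions: List[Callable[[ClusterMetrics], str]]   # what to do on a violation


def evaluate(rules: List[Rule], metrics: List[ClusterMetrics]) -> List[str]:
    """Detect violations and run each violated rule's actions."""
    results = []
    for m in metrics:
        for rule in rules:
            # A violation requires the cluster to be in scope AND the trigger to fire.
            if rule.scope(m) and rule.trigger(m):
                for action in rule.actions:
                    results.append(action(m))
    return results


def email_owner(m: ClusterMetrics) -> str:
    # Stand-in for sending an email: returns the message instead of sending it.
    return f"email to {m.owner}: cluster {m.cluster_id} cost ${m.cost_usd:.2f}"


# Assumed thresholds: cost above 100 USD or duration above 8 hours.
cost_rule = Rule(
    name="cost-or-duration-exceeded",
    trigger=lambda m: m.cost_usd > 100.0 or m.duration_hours > 8.0,
    scope=lambda m: m.owner == "user_x" and m.workspace == "workspace_y",
    actions=[email_owner],
)

sample = [
    ClusterMetrics("c1", "user_x", "workspace_y", 150.0, 2.0),   # in scope, over cost
    ClusterMetrics("c2", "user_x", "workspace_y", 20.0, 1.0),    # in scope, under limits
    ClusterMetrics("c3", "user_z", "workspace_y", 500.0, 12.0),  # out of scope
]

alerts = evaluate([cost_rule], sample)
for alert in alerts:
    print(alert)
```

In this sketch only cluster c1 produces an alert: c2 stays under both thresholds, and c3 exceeds them but belongs to a different owner, so the scope excludes it. Keeping trigger, scope, and actions as separate fields mirrors the three parts of a rule listed above.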