Insights
The Insights tab provides a single location that summarizes the insights Unravel aggregates for all unique jobs. Use it to identify untapped savings, track realized cost savings, find inefficient events at the signature level, and more.
You can perform the following functions on the Insights page:
Identify jobs based on total cost.
View the jobs with the potential for productivity boost.
Identify jobs that can provide untapped cost savings.
Identify jobs with realized cost savings.
Determine the total ROI for the jobs.
Additionally, you can identify inefficiencies such as uneven load in executors, slow garbage collection, excessive I/O activity, executor idle time, node rightsizing needs, and slow tasks in shuffle. You can detect bottlenecks such as resource contention in the driver and slow SQL Spark operator execution. You can also identify failures, including driver and executor errors due to Out of Memory (OOM).
The insights on this page fall into the following categories: Bottlenecks, Failures, Inefficiencies, and Over-provisioned.
Let us now explore how the Insights page helps you work with jobs, using a use case for each category.
Bottlenecks
Insights in this category highlight bottlenecks in job performance, such as resource contention in the driver, uneven load detected in executors, slow garbage collection, and excessive I/O activity.
Use case
As a data engineer responsible for optimizing data processing workflows on Databricks, you detect a Slow SQL Spark operator insight in a workflow where it is causing a bottleneck. Using the Insights tab, you can identify the SQL Spark operator responsible for the slowdown. By implementing Unravel's recommendations, you can improve the efficiency of that operator, which leads to faster query execution and reduced resource consumption. This improves workflow performance and lowers costs.
Failures
This category includes insights related to failures in job execution, such as executor errors due to out-of-memory issues.
Use case
As a data engineer responsible for managing data processing workflows on Databricks, you can detect failures such as out-of-memory events at the driver or executor level in a workflow. Such failures can cause jobs to fail and disrupt workflow execution. By addressing them, you can ensure the stability and reliability of your data processing workflows on Databricks, minimize downtime, and maximize productivity.
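One possible response to an OOM insight is to raise the driver or executor memory in the job's Spark configuration. The following is a minimal sketch under stated assumptions, not Unravel's documented recommendation: spark.driver.memory and spark.executor.memory are standard Spark properties, while the helper name with_memory_settings and the memory values are hypothetical and chosen only for illustration.

```python
def with_memory_settings(spark_conf, driver_memory=None, executor_memory=None):
    """Return a copy of a Spark configuration with updated memory settings.

    Hypothetical helper: only the two property names are standard Spark
    settings; the values passed in are up to the reader.
    """
    updated = dict(spark_conf)  # copy so the original configuration is untouched
    if driver_memory is not None:
        updated["spark.driver.memory"] = driver_memory
    if executor_memory is not None:
        updated["spark.executor.memory"] = executor_memory
    return updated


# Illustrative values only; they are not taken from any real job.
current_conf = {"spark.driver.memory": "4g", "spark.executor.memory": "8g"}
new_conf = with_memory_settings(current_conf, driver_memory="8g")
print(new_conf)
```

Whether more memory is the right fix depends on the details shown for the specific insight; a larger instance type or a change to the job's logic may be more appropriate.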
Inefficiencies
Insights in this category point out inefficiencies in the job's execution, such as slow tasks detected in shuffle.
Use case
As a data engineer responsible for optimizing data processing workflows on Databricks, you can identify inefficiencies such as uneven load in executors and slow garbage collection using the Insights tab. By addressing these inefficiencies, you can improve workflow performance, enhance resource utilization, and maximize productivity on Databricks.
As a DevOps engineer responsible for maintaining and optimizing data processing infrastructure on Databricks, you can identify excessive I/O activity and executor idle time using the Insights tab. By optimizing data storage configurations to reduce I/O overhead, or by adjusting resource allocation settings to minimize executor idle time, you can optimize resource utilization, reduce processing costs, and enhance workflow efficiency on Databricks.
Over-provisioned
This insight category indicates that the job is over-provisioned, meaning that it uses more resources than necessary. It suggests potential cost savings from resizing the instances used for the job.
Use case
As a cloud architect overseeing infrastructure management for data processing on Databricks, you identify an Over-provisioned insight for a node resizing event in your Databricks infrastructure. By resizing the oversized nodes to smaller instances, you can reduce costs, improve resource utilization, and optimize performance on Databricks. The next steps for this insight include information about instance resizing for the driver and worker instances, which you can use to implement the necessary changes.
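The driver and worker resizing described above can be sketched as an update to a Databricks cluster specification. This is an illustrative sketch only: node_type_id, driver_node_type_id, num_workers, and spark_version are field names from the Databricks cluster specification, while the helper apply_resize and all instance type values are hypothetical placeholders, not recommendations from Unravel.

```python
import copy


def apply_resize(cluster_spec, driver_node_type=None, worker_node_type=None):
    """Return a copy of a cluster spec with resized driver and worker nodes.

    Hypothetical helper: it only swaps instance types and leaves every
    other field of the specification as it was.
    """
    resized = copy.deepcopy(cluster_spec)  # never mutate the caller's spec
    if driver_node_type:
        resized["driver_node_type_id"] = driver_node_type
    if worker_node_type:
        resized["node_type_id"] = worker_node_type
    return resized


# Placeholder values for illustration; substitute the instance types
# listed in the next steps of the insight.
current = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "m5.4xlarge",
    "driver_node_type_id": "m5.4xlarge",
    "num_workers": 4,
}
resized = apply_resize(current, driver_node_type="m5.xlarge", worker_node_type="m5.2xlarge")
print(resized)
```

Returning a modified copy keeps the original specification available for comparison or rollback if the smaller instances turn out to be insufficient.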
Viewing the insights
In the Unravel UI, navigate to Workflows > Insights.
From the dropdown list in the top-right corner of the page, select the time period for which to generate the insights.
In the expandable left pane, use the slider to adjust the filters, and then click Apply Filters. The insights are generated with the filters applied.
The insights are categorized into Bottlenecks, Failures, Inefficiencies, and Over-provisioned. Click a category to filter further for the required insights.
The following insights are available for each of the categories:
Bottlenecks
  Resource contention for CPU
  Resource contention in Driver
  Excessive I/O activity detected
  Executor idle time detected
  Slow SQL Spark operator detected
  Slow tasks detected in shuffle
Failures
  Driver error due to OOM
  Executor error due to OOM
Inefficiencies
  Data skew detected
  Uneven load detected in executors
  Slow garbage collection detected
  Join condition is inefficient
  Join type is inefficient
  Partitions not pruned
  Inefficient storage format
  Missing statistics
  Too many small files scanned
  Input split size is inefficient
Over-provisioned
  Node right sizing is needed
  Idle Time/Wasted Cost
The generated insights are displayed in a table with the following fields:

| Field | Description |
|---|---|
| Job | Lists the jobs associated with the insights. Click the job ID to navigate to the Databricks Jobs page. Click the copy icon beside the job to copy the job ID. |
| Workspace | Lists the workspace associated with the job. |
| Started | Displays the start time of the job. |
| User | Gives the user ID associated with the job. |
| Runs | Lists the number of times the job has been run. |
| Insight type | Gives information on the insights available for the job. |
| Total Cost (USD) | Gives the total cost of the job in USD. |
| Untapped Cost Savings (USD) | Gives the amount, in USD, that can be saved if action is taken on the insight. |
| Realized Cost Savings (USD) | Gives the amount, in USD, saved to date through action taken on the insight. |
| Total ROI (USD) | Gives an approximation of the total ROI in USD. |
| Productivity boost (hrs) | Gives the number of hours of productivity boost achieved through action on the insight. |
| Actions | Click View More to view more details for the job in the Job Run tab. |
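To show how these fields can guide prioritization, here is a minimal sketch that ranks jobs by untapped cost savings. It assumes the table rows have been transcribed into Python by hand, since this page does not describe an export feature. The InsightRow class, the top_untapped function, and all sample rows are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InsightRow:
    """A subset of the table fields, transcribed by hand (hypothetical type)."""
    job: str
    insight_type: str
    total_cost_usd: float
    untapped_savings_usd: float


def top_untapped(rows, insight_type=None, limit=3):
    """Return up to `limit` rows with the highest untapped cost savings.

    If `insight_type` is given, only rows of that type are considered.
    """
    selected = [
        r for r in rows
        if insight_type is None or r.insight_type == insight_type
    ]
    return sorted(selected, key=lambda r: r.untapped_savings_usd, reverse=True)[:limit]


# Invented sample rows; these are not real Unravel data.
rows = [
    InsightRow("job-a", "Over-provisioned", 120.0, 40.0),
    InsightRow("job-b", "Bottlenecks", 300.0, 15.0),
    InsightRow("job-c", "Over-provisioned", 80.0, 55.0),
]
ranked = top_untapped(rows, insight_type="Over-provisioned")
print([r.job for r in ranked])
```

Sorting by untapped savings instead of total cost surfaces the jobs where acting on an insight is expected to pay off most, which may differ from the most expensive jobs.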