Insights

The Insights tab provides a single location that summarizes the insights Unravel aggregates for all unique jobs. You can use these insights to identify untapped savings, track realized cost savings, detect inefficient events at the signature level, and more.

Here are some of the functions you can perform on the Insights page:

  • Identify jobs based on total cost.

  • View jobs with the potential for a productivity boost.

  • Identify jobs that can provide untapped cost savings.

  • Identify jobs with realized cost savings.

  • Determine the total ROI for the jobs.

Additionally, you can identify inefficiencies such as uneven load in executors, slow garbage collection, excessive I/O activity, executor idle time, node rightsizing needs, and slow tasks in shuffle. You can detect bottlenecks such as resource contention in the driver and slow SQL Spark operator execution. You can also identify failures, including driver and executor errors due to Out of Memory (OOM) conditions.

The insights fall into four categories: Bottlenecks, Failures, Inefficiencies, and Over-provisioned. Let us now explore how the Insights page helps you work with jobs, with a few use cases as examples.

Bottlenecks

Insights in this category highlight bottlenecks in job performance, such as resource contention in the driver, uneven load detected in executors, slow garbage collection, and excessive I/O activity.

Use case

  • As a data engineer responsible for optimizing data processing workflows on Databricks, you may detect a Slow SQL Spark operator insight that creates a bottleneck in your workflow. Using the Insights tab, you can identify the SQL Spark operator causing the slowdown. By implementing Unravel's recommendations, you can improve the efficiency of that operator, leading to faster query execution, reduced resource consumption, better workflow performance, and lower costs.
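As a rough illustration of what this kind of insight surfaces, the sketch below picks the dominant operator from per-operator durations of a Spark SQL plan. The function name and input shape are hypothetical, not Unravel's API:

```python
# Hypothetical sketch: given per-operator durations from a Spark SQL plan
# (as a Slow SQL Spark operator insight might surface them), find the
# operator that dominates the query's runtime.
def slowest_operator(op_durations_ms):
    """Return (operator, duration_ms, share_of_total) for the slowest op."""
    total = sum(op_durations_ms.values())
    op, dur = max(op_durations_ms.items(), key=lambda kv: kv[1])
    return op, dur, dur / total
```

For example, if a SortMergeJoin accounts for 90% of the plan's time, it is the natural target for a recommendation such as a broadcast join.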

Failures

This category includes insights related to failures in job execution, such as driver or executor errors due to out-of-memory (OOM) issues.

Use case

As a data engineer responsible for managing data processing workflows on Databricks, you can detect failures such as out-of-memory events at the driver or executor level. These failures can lead to job failures and disruptions in workflow execution. By addressing them, you can ensure the stability and reliability of your data processing workflows, minimize downtime, and maximize productivity.
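A common remediation for driver or executor OOM errors is to increase Spark memory settings. The sketch below builds such a configuration; the scaling factor and the specific values are illustrative assumptions, not Unravel's recommendation logic (the property names are standard Spark configuration keys):

```python
# Illustrative sketch only: scale up driver/executor memory to address
# OOM failures. The 1.5x headroom factor is a hypothetical choice.
def oom_remediation(current_executor_memory_gb, headroom=1.5):
    """Return a Spark conf dict with scaled-up memory settings."""
    new_mem_gb = int(current_executor_memory_gb * headroom)
    return {
        "spark.executor.memory": f"{new_mem_gb}g",
        "spark.driver.memory": f"{new_mem_gb}g",
    }
```

For example, starting from 8 GB executors, this sketch would suggest 12 GB for both the driver and the executors.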

Inefficiencies

Insights in this category point out inefficiencies in the job's execution, such as slow tasks detected in shuffle.

Use case

  • As a data engineer responsible for optimizing data processing workflows on Databricks, you can identify inefficiencies such as uneven load in executors and slow garbage collection using the Insights tab. By addressing these inefficiencies, you can improve workflow performance, enhance resource utilization, and maximize productivity on Databricks.

  • As a DevOps engineer responsible for maintaining and optimizing data processing infrastructure on Databricks, you can identify excessive I/O activity and executor idle time using the Insights tab. By optimizing data storage configurations to reduce I/O overhead or adjusting resource allocation settings to minimize executor idle time, you can optimize resource utilization, reduce processing costs, and enhance workflow efficiency on Databricks.
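To make the "uneven load detected in executors" idea concrete, the sketch below flags executors whose total task time deviates substantially from the mean. The function name, input shape, and 50% threshold are hypothetical, not how Unravel actually computes this insight:

```python
# Hypothetical sketch of how an uneven-load check could work:
# compare each executor's total task time against the mean and
# flag the outliers.
def detect_uneven_load(executor_task_ms, threshold=0.5):
    """Return executor IDs whose total task time deviates from the
    mean by more than `threshold` (as a fraction of the mean)."""
    mean = sum(executor_task_ms.values()) / len(executor_task_ms)
    return sorted(
        ex for ex, ms in executor_task_ms.items()
        if abs(ms - mean) / mean > threshold
    )
```

An executor doing several times the work of its peers is a typical candidate for repartitioning the data or addressing skew.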

Over-provisioned

This insight category indicates that a job is over-provisioned, that is, using more resources than necessary. It suggests potential cost savings by resizing the instances used for the job.

Use case

As a cloud architect overseeing infrastructure management for data processing on Databricks, you may identify an Over-provisioned insight for a node resizing event within your Databricks infrastructure. By resizing the oversized nodes to smaller instances, you can reduce costs, improve resource utilization, and optimize performance. The next steps for this insight include information about instance resizing for the driver and worker instances, which you can use to implement the necessary changes.
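The arithmetic behind a node-resizing saving is straightforward; the sketch below estimates it from hourly instance rates. The rates, node counts, and 730-hour month are illustrative assumptions, not Unravel's pricing model:

```python
# Illustrative arithmetic only: estimate monthly savings from moving
# over-provisioned nodes to a cheaper instance type. All inputs are
# hypothetical examples.
def resize_savings(node_count, current_rate_hr, target_rate_hr, hours=730):
    """Estimated savings in USD over `hours` of runtime (~1 month)."""
    return node_count * (current_rate_hr - target_rate_hr) * hours
```

For example, downsizing four nodes from a $1.00/hr to a $0.50/hr instance type would save roughly $1,460 per month under these assumptions.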

Viewing the insights
  1. On the Unravel UI, navigate to Workflows > Insights.

  2. From the dropdown list in the top-right corner of the page, select the time period to generate the insights.

    timeperiod.png
  3. From the expandable left pane, use the sliders to adjust the filters and click Apply Filters. The insights are generated with the filters applied.

    Filters.png
    • The insights are categorized into Bottlenecks, Failures, Inefficiencies, and Over-provisioned. You can click each category to filter further for the required insights.

      categories.png

      The following insights are available for each of the categories:

      Bottlenecks

      • Resource contention for CPU

      • Resource contention in Driver

      • Excessive I/O activity detected

      • Executor idle time detected

      • Slow SQL Spark operator detected

      • Slow tasks detected in shuffle

      Failures

      • Driver error due to OOM

      • Executor error due to OOM

      Inefficiencies

      • Data skew detected

      • Uneven load detected in executors

      • Slow garbage collection detected

      • Join condition is inefficient

      • Join type is inefficient

      • Partitions not pruned

      • Inefficient storage format

      • Missing statistics

      • Too many small files scanned

      • Input split size is inefficient

      Over-provisioned

      • Node right sizing is needed

      • Idle Time/Wasted Cost

    • The generated insights are displayed in the form of a table.

      insights.png

      The following fields are available in the table:

      • Job: Lists the jobs associated with the insights. Click the job ID to navigate to the Databricks Jobs page. You can also click the copy icon beside the job to copy the job ID.

      • Workspace: Lists the workspace associated with the job.

      • Started: Displays the start time of the job.

      • User: Displays the user ID associated with the job.

      • Runs: Lists the number of times the job has been run.

      • Insight type: Lists the insights available for the job.

      • Total Cost (USD): Displays the total cost for the job in USD.

      • Untapped Cost Savings (USD): Displays the amount of cost, in USD, that can be saved if action is taken on the insight.

      • Realized Cost Savings (USD): Displays the amount of cost, in USD, saved to date by acting on the insight.

      • Total ROI (USD): Displays an approximation of the total ROI in USD.

      • Productivity boost (hrs): Displays the number of hours of productivity gained by acting on the insight.

      • Actions: Click View More to view more details for the job in the Job Run tab.
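To show how the cost columns relate across jobs, the sketch below aggregates them from table-like rows. The field names and row shape are hypothetical stand-ins for the table's columns, not an Unravel API:

```python
# Hypothetical sketch: aggregate the cost columns of the insights table
# across job rows. Keys are illustrative stand-ins for the UI columns.
def summarize(jobs):
    """Sum Total Cost, Untapped and Realized Cost Savings over job rows."""
    return {
        "total_cost": sum(j["total_cost"] for j in jobs),
        "untapped_savings": sum(j["untapped"] for j in jobs),
        "realized_savings": sum(j["realized"] for j in jobs),
    }
```

Such a roll-up mirrors what the page does when it reports untapped and realized savings across all filtered jobs.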