Databricks Release Notes
v4.7.9.6 Release notes
Release information
Release date | August 2024 |
4796 Software | |
Configuration properties |
New features
Insights
This release brings backend improvements that enhance performance, efficiency, and the accuracy of insights across the platform. It brings more cost-saving recommendations focusing on workload rightsizing and optimization. The following insights are included in this release:
Rightsizing for Streaming Workloads:
For fixed-sized clusters, recommendations are provided to optimize cost and performance.
For autoscaling clusters, recommendations are provided to ensure efficient resource allocation during scaling events.
Rightsizing Recommendations for Autoscaling Scenarios:
For long-running tasks within a job run, optimizing resource usage throughout task execution.
For parallel executions of tasks within a job, enhancing efficiency for simultaneous processes.
For contention on driver scenarios, reducing resource competition on the driver node to improve performance.
Conservative Savings Model: A new savings model provides cost-saving insights across all categories, even when direct cost bindings are unavailable.
Instance type whitelisting has been implemented to provide enhanced control over the types of instances used for receiving insights, ensuring that only approved instance types are utilized.
Home page
The Top-X page now includes a "Most Wasted Cost" widget, highlighting the costs associated with failed and killed jobs, allowing users to quickly identify and address inefficiencies.
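As a rough illustration of what a "Most Wasted Cost" ranking involves, the following Python sketch sums the cost of failed and killed runs per job and sorts jobs by that total. The record fields (`job`, `status`, `cost`), the status labels, and the function name are hypothetical; this is not Unravel's implementation.

```python
from collections import defaultdict

# Assumed status labels for runs whose cost counts as wasted.
WASTED_STATES = {"FAILED", "KILLED"}


def wasted_cost_by_job(runs):
    """Return (job, wasted_cost) pairs, highest wasted cost first."""
    totals = defaultdict(float)
    for run in runs:
        if run["status"] in WASTED_STATES:
            totals[run["job"]] += run["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data for the sketch.
runs = [
    {"job": "etl_daily", "status": "FAILED", "cost": 12.5},
    {"job": "etl_daily", "status": "SUCCESS", "cost": 30.0},
    {"job": "ml_train", "status": "KILLED", "cost": 40.0},
    {"job": "etl_daily", "status": "KILLED", "cost": 2.5},
]
ranking = wasted_cost_by_job(runs)
```

Successful runs are excluded from the total, so a job with high overall spend but few failures ranks below a cheaper job that fails often.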
Customer ValueGen app
This release introduces a new Customer ValueGen app with comprehensive workload analysis, ROI computation, and productivity insights for Databricks environments. The app enables users to analyze historical data, benchmark costs, and identify optimization opportunities to drive enhanced cost efficiency and operational improvements. It includes features for evaluating savings potential, monitoring the cost and savings funnel, and running various diagnostics on the status of insights. Additionally, users can conduct productivity boost analysis and mark applications as interesting to focus on key areas for optimization.
Databricks Partner Connect
Unravel Data is now integrated into the Databricks Partner Connect page, allowing end users to directly create trial accounts for greater flexibility.
Data Pipelines and Improved Cost Insights
Enhancements to secondary data pipelines have been made to ensure better performance, improved monitoring, and greater consistency across the platform. Additionally, backend changes on the Cost Explorer page now deliver more accurate insights, providing you with more precise real-time costs in trends, drill-downs, and budget tabs.
UI Updates
This release includes several UI improvements to enhance clarity and readability across the platform. These updates include refined color schemes and label adjustments to ensure better visual contrast and a more intuitive user experience. In addition, sorting and search issues as well as consistency and accuracy problems are addressed, ensuring a more seamless and accurate experience across the platform.
Security, Infrastructure and Compliance
Critical security vulnerabilities related to Public/Internal Services have been addressed, and the handling of encrypted sensitive values has been improved.
Activities related to CentOS 7’s end-of-life in June have been completed.
Configuration management capabilities for non-Kubernetes environments have been enhanced, and the installation and build processes have been streamlined for faster deployments.
The following table contains key issues addressed in the 4.7.9.6 release.
ID | Description |
---|---|
App Store | |
APP-614 | App Store tasks fail to start with SSL enabled on the MySQL database. |
Auto Action | |
DT-3033 | The date-time format in the policy-violation downloaded CSV file is incorrect. |
Billing | |
DT-3186 | The Billed Cost Over Time graph includes the first day of the next month on the X-axis, despite displaying only the data for the selected period. This issue does not affect functionality or data accuracy. |
DT-3171 | The downloaded CSV file name should be suffixed with the month of the year for which billing data is downloaded. |
Cost | |
DT-2981 | There is a significant cost deviation between dbx_cluster and Spark hitdocs. |
DT-2125 | The cost displayed for the "Executor Idle Time Detected" insight is incorrect. |
UIX-6305 | The Others category is displayed twice in legends when the number of clusters exceeds 1000 in the Chargeback page. |
DT-3130 | In the Trends tab, the Last X days selection displays a time that is later than the current time. |
Compute | |
DT-2083 | The Total Allocated Key Performance Indicators (KPIs) for Vcore and memory are not visible in the Compute > Trends page. |
 | All jobs in the running status are displayed in the Finished tab under Job Runs instead of showing only the finished jobs. |
Home | |
DT-3095 | The 'Wasted Cost' and 'Untapped Savings' values are inconsistent with the 'Total Cost' value under each section in the Home > TopX tab. |
Spark | |
UIX-6567 | The filters are not retained when switching between the Inefficient and All tabs in the Databricks UI. |
DT-4040 | The percentage shown in "Potential Savings per Run" for the Node Downsizing Insight is incorrect. |
DT-3099 | The Task Time and App Time fields are empty for certain jobs. |
DT-2981 | Some cost deviations are observed between Spark hitdocs and Databricks clusters. |
UI | |
UIX-6321 | The Workflow section currently displays jobs running within the specified duration instead of only jobs completed within the selected time frame. |
Unravel Assistant | |
AI-134, AI-138 | The Unravel Assistant provides incorrect responses to certain questions related to productivity and data skew jobs. The team is actively working on resolving this issue. |
Workflows | |
CPLANE-3614 | Audit logs are not getting written in the database. |
The upcoming releases will include the following key fixes to enhance user experience. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
ID | Description |
---|---|
Cost | |
DT-2968, DT-2967 | Incorrect filters applied while redirecting from budgets to chargeback. |
Data | |
DT-3123 | In the Data page section, the default enabled checkbox is incorrectly disabled under tables. |
Home | |
DT-3096, DT-3095, DT-2837 | The Annualized savings value under the Most Savings sections in the TopX tab is incorrectly displayed as zero dollars in certain scenarios. |
DT-2920, DT-3060, DT-2821 | Instead of displaying No data present for selected date range when data is unavailable for a selected date range, empty widgets are shown. |
DT-2836 | Wastage cost is incorrectly showing more than the Cluster cost in some cases. |
Insights | |
DT-2006 | Recommendations are provided for a failed pipeline when users utilize multiple tasks with shared job clusters, and one of the tasks fails. |
DT-2125 | The UI shows a cost discrepancy for the Executor Idle Time Detected insight in Databricks version 14.2 with Photon enabled. |
DT-3037 | Node right sizing filters are not being applied on the Insights Preview page. |
DT-3012 | The insight preview page displays incorrect costs for job signatures. |
Reports | |
DT-3122 | Incorrect count of events shown on TopX Report. |
Spark | |
DT-3104 | Multiple instances of the same events are being generated for a single Spark application cluster. |
DT-3103, DT-3026 | There is a mismatch in the executor count when garbage collection (GC) events occur. |
SaaS (Free) | |
DT-2037 | In the Databricks Standard (free) environment, there is an issue where the User Flow badge obstructs pagination. |
UI | |
UIX-6281 | The cost comparison for all the instances is not displayed on the Pipeline detail page. |
Workflows | |
DT-2104 | Sorting is incorrect when the list contains strings starting with both uppercase and lowercase letters. |
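For readers unfamiliar with the symptom described in DT-2104, a common cause of this kind of ordering problem is a case-sensitive comparison, which places all uppercase-initial strings before lowercase ones. The release notes do not state the root cause, so the Python sketch below is only an illustration with hypothetical job names.

```python
# Hypothetical job names mixing uppercase and lowercase initials.
names = ["beta_job", "Alpha_job", "alpha_etl", "Zeta_job"]

# Default string comparison is case-sensitive: uppercase sorts first.
case_sensitive = sorted(names)

# Comparing case-folded keys gives the alphabetical order users expect.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

The second call keeps the original strings and changes only the comparison key, so the displayed values are unaffected.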
Azure bill integration
Actual bill data is not live; once configured, it will be available from Azure once every day. The cost for a few clusters may be updated after a couple of days in the bill.
One record per cluster per day is maintained, even if the cluster is restarted multiple times within a day or cluster sessions span across multiple days.
If there are issues with Azure billing, the cost data will not be updated on Unravel.
Tags on cost pages come from actual cost data, while tags on the compute page come from Spark configuration. There is a possibility that the tags on these pages don’t match. This issue will be fixed in the upcoming release.
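To illustrate the one-record-per-cluster-per-day behavior described above, the following Python sketch groups cluster sessions by cluster and UTC day. The session tuples, the function name, and the choice to split a multi-day session at midnight are assumptions made for the example; they do not describe Unravel's actual pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def daily_cluster_records(sessions):
    """Collapse (cluster_id, start, end) sessions into one entry per cluster per day.

    Returns a dict mapping (cluster_id, date) to total runtime in seconds.
    All timestamps are assumed to be timezone-aware and in UTC.
    """
    totals = defaultdict(float)
    for cluster_id, start, end in sessions:
        cursor = start
        while cursor < end:
            # Midnight that closes the day containing the cursor.
            next_midnight = datetime.combine(
                cursor.date() + timedelta(days=1),
                datetime.min.time(),
                tzinfo=cursor.tzinfo,
            )
            segment_end = min(end, next_midnight)
            totals[(cluster_id, cursor.date())] += (segment_end - cursor).total_seconds()
            cursor = segment_end
    return dict(totals)


utc = timezone.utc
# Hypothetical sessions: one restart within a day, one session crossing midnight.
sessions = [
    ("c1", datetime(2024, 8, 1, 9, 0, tzinfo=utc), datetime(2024, 8, 1, 10, 0, tzinfo=utc)),
    ("c1", datetime(2024, 8, 1, 23, 0, tzinfo=utc), datetime(2024, 8, 2, 1, 0, tzinfo=utc)),
]
records = daily_cluster_records(sessions)
```

In this example the two sessions produce two records for cluster `c1`, one for each day, regardless of how many times the cluster restarted.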
Billing
Some discrepancies may occur in cost calculations due to differences between the user time zone displayed on the Compute page and the UTC-based aggregation on the Billing page. (DT-2350)
In certain scenarios, the budget status may inaccurately display as Ok even when the budget has been exceeded. This discrepancy occurs when Azure billing is enabled and is a known limitation. Notifications for budget can be delayed by two or three days due to the minimum 24-hour delay in receiving bills. (DT-3091)
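The time zone discrepancy noted in DT-2350 can be seen with a small Python sketch: the same instant falls on different calendar days depending on whether it is bucketed in UTC or in the user's time zone. The UTC-07:00 offset, the sample timestamp, and the function name are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical user time zone: a fixed UTC-07:00 offset.
USER_TZ = timezone(timedelta(hours=-7))


def aggregation_days(ts_utc):
    """Return the (UTC day, user-local day) that a UTC timestamp is bucketed into."""
    return ts_utc.date(), ts_utc.astimezone(USER_TZ).date()


# A cost event at 03:30 UTC on 2 August is still 1 August for this user.
event = datetime(2024, 8, 2, 3, 30, tzinfo=timezone.utc)
utc_day, local_day = aggregation_days(event)
```

Costs incurred in the hours around midnight UTC can therefore appear under adjacent days on pages that use different time bases, even though the underlying amounts are the same.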
Compute
Jobs by status graphs in the Trends tab display Spark application details and not the job details. Our development team is actively looking into this design limitation and plans to address it in a future update. (DT-2008)
Data is inconsistent between the Compute page and the Cost page in some cases when Azure bill is enabled. This is a known limitation because Azure bill data is not live. Our team is actively looking into this limitation and plans to address it in a future update. (DT-3078)
Data
If tables are created with the same name, accessed, deleted, and re-created, and if those tables are re-accessed, then their query and app count do not match. (DATAPAGE-502)
Home
Home page does not display alerts on the UI when there is missing ROI data for a single day. (DT-2509)
Hovering on Total Cost Trend on the Summary tab of the Home page may display inaccurate date information. (DT-2408)
Insights Preview
In some cases, there is a discrepancy in the runs displayed in the Job Runs page for the selected job. When you click View runs, only runs of dbx_cluster kind that have a Spark app ID are listed. The discrepancy is due to data inconsistency between entries of dbx_cluster kind and db kind, and this is expected behavior. (DT-3042)
Workflows
The current implementation has a limitation where the wrong run count is displayed for the job ID when sorting by run count in the Workflows > Jobs section. This discrepancy is currently under investigation by our development team, and active efforts are being taken to resolve this issue. (UIX-6526)
Our development team is actively investigating the following known issues and is working towards resolving them. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
Bug ID | Description | Workaround |
---|---|---|
Compute | ||
PIPELINE-1636 | Inconsistent data is displayed for the cluster Duration and Start Time on the Compute page. | NA |
CUSTOMER-3017 | The job duration displayed in the TopX section of the "Longest Running Jobs" on the Job Compute graph is incorrect. | NA |
Cost | ||
UIX-5624 | Data is not displayed when you click the Optimize button corresponding to OTHERS for the Cost > Chargeback results shown in the table. | NA |
DT-1094 | The No data available message is displayed on the Compute page after navigating from the Trends and Chargeback pages with Tag filters. | NA |
Datapage | ||
DATAPAGE-473 | For Hive metastore 3.1.0 or earlier versions, the creation time of partitions is not captured if a partition is created dynamically. Therefore, the Last Day KPI for the partition section is not shown in Unravel. | NA |
Insights | ||
DT-1987 | There is a mismatch in the computation of costs for fleet and spot instances in Databricks clusters. This issue arises due to the unavailability of the exact node type in the cluster info response. | NA |
Performance | ||
ASI-933 | In the Lag setup, the Duration is not updated for running applications. The Duration should be updated every 15 minutes. | NA |
ASI-936 | In the Lag setup, the App Time data is missing in the Timing tab of many applications. | NA |
Spark | ||
PIPELINE-1616 | If the Spark job is not running for Databricks, the values for the Duration and End time fields are not updated on the Databricks Run Details page. | NA |
DT-2012 | Incorrect details are displayed on the AppSummary > Job Run page when a user repairs a previously failed job. The displayed information may not accurately reflect the repaired job's details. | NA |
DT-1742 | The timezone for the NodeRightSizing insight event is inconsistent in the Spark details page. | NA |
DT-2029 | Applications in a success state may inaccurately display an associated job in a running state instead of transitioning to a failed state. | NA |
DT-3122 | The TopX Report displays an incorrect count of events. | |
UI | ||
PIPELINE-1935 | In the Pipeline details page, when you select the data for a specific date, all instances are displayed instead of displaying only the instances within a selected date. | NA |
UIX-6263 | The cross button on the Pipeline details page does not close the detail page when you click the bars inside the Gantt chart. | NA |
Workflows | ||
DT-1461, PIPELINE-1939, PIPELINE-1940, DT-1093, PIPELINE-1924 | The UI and data exhibit inconsistencies, including problems with job run details, issues related to multiple workflow runs and UTC timestamps, empty content in workflow job pages, and issues with filter values and duration display. | NA |
Release information
Release Date | October 2024 |
Software Download |
New Features
A new Insight actionability card offers a streamlined, user-friendly interface that lets users access remediation steps within two clicks. The Summary, Remediation, and Impacted Runs sections of the card give a detailed analysis of the insight and display the symptoms causing the issues along with the savings potential for each run. This guides users toward optimal job performance and resource utilization.
Improvements
Unravel now supports Databricks Runtime 15.0.
RBAC support is enhanced to include Cost Explorer and Job Insights.
The CICD UI has been updated to include Contended Driver Insights under the Code Insights dropdown.
This release includes backend improvements to support Python 3.9, along with Unravel-dependent libraries, in a custom Kestra image.
This release includes backend updates to improve insights performance in Unravel.
Support for migration savings and wasted cost widgets is included in 4796-hotfix.
Note
With each Unravel upgrade, the init script and sensor jars must be updated in all configured workspaces.
Issues Fixed
ID | Description |
---|---|
DT-4303 | The stages mentioned in the insight are missing from the SQL tab. |
DT-4336 | There is an issue with downloading SaaS Azure cost bills. |
v4.7.9.5 Release notes
Release information
Release date | April 2024 |
Hotfix release date | August 2024 |
4795 Software | |
4795 Hotfix Software | |
Configuration properties |
New features
New Insights preview
A new Insights preview tab is introduced in the Workflow page. This tab offers visibility into your job performance, enabling you to identify inefficiencies, bottlenecks, failures, and optimization opportunities with precision. Equipped with detailed metrics and actionable insights, the insights tab enhances workflow efficiency and promotes operational success.
Unravel Assistant
Introducing Unravel Assistant, your AI assistant for optimizing Databricks jobs. The Unravel Assistant is integrated with the Job Insights tab under the Workflow page and provides a convenient way to dive deeper into your workloads and spending to get insights and recommendations from Unravel to improve performance and reduce costs. Unravel Assistant answers questions in plain language, allowing you to effortlessly analyze performance and cost metrics and make informed decisions to enhance productivity and efficiency in your Databricks workflows.
Home page updates
A new Productivity meter is introduced in the Summary tab, offering a detailed assessment of the operational efficiency within your workflow. This meter provides an indication of the productivity levels, ranging from Snail paced to Lightning, allowing you to identify areas for improvement and optimize resource utilization. The Productivity meter is a valuable tool for enhancing operational effectiveness and achieving peak performance across your workflow.
New dashboards are introduced in the Topx tab, helping you to efficiently manage resources by identifying under-utilized clusters and jobs, monitoring long-running processes, and optimizing costs within both All-Purpose Compute and Job Compute categories.
New metrics for Total ROI and Productivity boost are introduced in the Optimize tab, providing insights into the potential return on investment and productivity gains achievable through resource optimization. Additionally, new visuals for Code optimization allow you to visualize potential productivity boosts in hours, allowing you to make informed decisions, enhance performance, and maximize cost savings.
Unravel Billing page update
Unravel's billing page has been updated to introduce the new Pro Plan with Pay-In-Advance (PIA). This latest pricing model offers you the flexibility to purchase credits upfront with discounts. With PIA, you can monitor your purchased credits, opening and closing credit balances, total credit usage, and adjusted credits throughout the month. The transition to PIA ensures that you have access to more accurate billing, aligned with your individual usage patterns and requirements.
Cost Explorer page
Backend changes have been implemented in the Cost Explorer page to improve the accuracy of insights. You now receive more precise real-time costs in your trends, drill-downs, and budget tabs.
The following table contains key issues addressed in the 4.7.9.5 release.
ID | Description |
---|---|
Cost | |
DT-2116 | On the budget tab under the cost page, the search bar is displaying numeric values with the equal sign after selecting filters. |
Home | |
DT-2672 | The trends widget displays an incorrect savings percentage. |
DT-2778 | The ROI pipeline is failing for the last 7 days due to missing data. |
DT-2851 | Incorrect untapped / annualized savings displayed on the Optimize tab. |
Spark | |
UIX-6523 | The Sort by Write feature is currently not functioning as expected in the Spark details page. |
DT-2123 | On the Spark detail page, the date is overlapping with the drop-down icon. |
Workflows | |
CUSTOMER-2712 | RBAC filter is not functioning properly on the workflow - job runs page, job details page, and compute details page. |
DT-2124 | The timing displayed on the workflows page becomes incorrect when a user navigates from the chargeback page to the workflows page. |
UIX-6274 | In the Job Runs tab, the Cost and Duration filter values remain unchanged even after modifying filters in the left panel or switching between the All, Finished, and Running options. |
The upcoming releases will include the following key fixes to enhance user experience. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
ID | Description |
---|---|
Auto Action | |
DT-3033 | The date-time format in the policy-violation downloaded CSV file is incorrect. |
Billing | |
DT-3186 | The Billed Cost Over Time graph includes the first day of the next month on the X-axis, despite displaying only the data for the selected period. This issue does not affect functionality or data accuracy. |
DT-3171 | The downloaded CSV file name should be suffixed with the month of the year for which billing data is downloaded. |
Cost | |
UIX-6305 | The Others category is displayed twice in legends when the number of clusters exceeds 1000 in the Chargeback page. |
DT-3130 | In the Trends tab, the time displayed in the last x days selection has a time value greater than the current time on the trends page. |
DT-2968, DT-2967 | Incorrect filters applied while redirecting from budgets to chargeback. |
Compute | |
DT-2083 | The Total Allocated Key Performance Indicators (KPIs) for Vcore and memory are not visible in the Compute > Trends page. |
UIX-6321 | All jobs in the running status are displayed in the Finished tab under Job Runs instead of showing only the finished jobs. |
Data | |
DT-3123 | In the Data page section, the default enabled checkbox is incorrectly disabled under tables. |
Home | |
DT-3096, DT-3095, DT-2837 | The Annualized savings value under the Most Savings sections in the TopX tab is incorrectly displayed as zero dollars in certain scenarios. |
DT-2920, DT-3060, DT-2821 | Instead of displaying No data present for selected date range when data is unavailable for a selected date range, empty widgets are shown. |
DT-2836 | Wastage cost is incorrectly showing more than the Cluster cost in some cases. |
Insights | |
DT-2006 | Recommendations are provided for a failed pipeline when users utilize multiple tasks with shared job clusters, and one of the tasks fails. |
DT-2125 | The UI shows a cost discrepancy for the Executor Idle Time Detected insight in Databricks version 14.2 with Photon enabled. |
DT-3037 | Node right sizing filters are not being applied on the Insights Preview page. |
DT-3012 | The insight preview page displays incorrect costs for job signatures. |
Reports | |
DT-3122 | Incorrect count of events shown on TopX Report. |
Spark | |
UIX-6567 | Filters are not retained when switching between the Inefficient and All tabs. |
DT-3104 | Multiple instances of the same events are being generated for a single Spark application cluster. |
DT-3103, DT-3026 | There is a mismatch in the executor count when garbage collection (GC) events occur. |
DT-3099 | The Task Time and App Time fields are empty for certain jobs. |
DT-2981 | Some cost deviations are observed between Spark hitdocs and Databricks clusters. |
SaaS (Free) | |
DT-2037 | In the Databricks Standard (free) environment, there is an issue where the User Flow badge obstructs pagination. |
UI | |
UIX-6281 | The cost comparison for all the instances is not displayed on the Pipeline detail page. |
Unravel Assistant | |
AI-134, AI-138 | The Unravel Assistant provides incorrect responses to certain questions related to productivity and data skew jobs. The team is actively working on resolving this issue. |
Workflows | |
DT-2104 | Sorting is incorrect when the list contains strings starting with both uppercase and lowercase letters. |
Azure bill integration
Actual bill data is not live; once configured, it will be available from Azure once every day. The cost for a few clusters may be updated after a couple of days in the bill.
One record per cluster per day is maintained, even if the cluster is restarted multiple times within a day or cluster sessions span across multiple days.
If there are issues with Azure billing, the cost data will not be updated on Unravel.
Tags on cost pages come from actual cost data, while tags on the compute page come from Spark configuration. There is a possibility that the tags on these pages don’t match. This issue will be fixed in the upcoming release.
Billing
Some discrepancies may occur in cost calculations due to differences between the user time zone displayed on the Compute page and the UTC-based aggregation on the Billing page. (DT-2350)
In certain scenarios, the budget status may inaccurately display as Ok even when the budget has been exceeded. This discrepancy occurs when Azure billing is enabled and is a known limitation. Notifications for budget can be delayed by two or three days due to the minimum 24-hour delay in receiving bills. (DT-3091)
Compute
Jobs by status graphs in the Trends tab display Spark application details and not the job details. Our development team is actively looking into this design limitation and plans to address it in a future update. (DT-2008)
Data is inconsistent between the Compute page and the Cost page in some cases when Azure bill is enabled. This is a known limitation because Azure bill data is not live. Our team is actively looking into this limitation and plans to address it in a future update. (DT-3078)
Data
If tables are created with the same name, accessed, deleted, and re-created, and if those tables are re-accessed, then their query and app count do not match. (DATAPAGE-502)
Home
Home page does not display alerts on the UI when there is missing ROI data for a single day. (DT-2509)
Hovering on Total Cost Trend on the Summary tab of the Home page may display inaccurate date information. (DT-2408)
Insights Preview
In some cases, there is a discrepancy in the runs displayed in the Job Runs page for the selected job. When you click View runs, only runs of dbx_cluster kind that have a Spark app ID are listed. The discrepancy is due to data inconsistency between entries of dbx_cluster kind and db kind, and this is expected behavior. (DT-3042)
Workflows
The current implementation has a limitation where the wrong run count is displayed for the job ID when sorting by run count in the Workflows > Jobs section. This discrepancy is currently under investigation by our development team, and active efforts are being taken to resolve this issue. (UIX-6526)
Our development team is actively investigating the following known issues and is working towards resolving them. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
Bug ID | Description | Workaround |
---|---|---|
Compute | ||
PIPELINE-1636 | Inconsistent data is displayed for the cluster Duration and Start Time on the Compute page. | NA |
CUSTOMER-3017 | The job duration displayed in the TopX section of the "Longest Running Jobs" on the Job Compute graph is incorrect. | NA |
Cost | ||
UIX-5624 | Data is not displayed when you click the Optimize button corresponding to OTHERS for the Cost > Chargeback results shown in the table. | NA |
DT-1094 | The No data available message is displayed on the Compute page after navigating from the Trends and Chargeback pages with Tag filters. | NA |
Datapage | ||
DATAPAGE-473 | For Hive metastore 3.1.0 or earlier versions, the creation time of partitions is not captured if a partition is created dynamically. Therefore, the Last Day KPI for the partition section is not shown in Unravel. | NA |
Insights | ||
DT-1987 | There is a mismatch in the computation of costs for fleet and spot instances in Databricks clusters. This issue arises due to the unavailability of the exact node type in the cluster info response. | NA |
Performance | ||
ASI-933 | In the Lag setup, the Duration is not updated for running applications. The Duration should be updated every 15 minutes. | NA |
ASI-936 | In the Lag setup, the App Time data is missing in the Timing tab of many applications. | NA |
Spark | ||
PIPELINE-1616 | If the Spark job is not running for Databricks, the values for the Duration and End time fields are not updated on the Databricks Run Details page. | NA |
DT-2012 | Incorrect details are displayed on the AppSummary > Job Run page when a user repairs a previously failed job. The displayed information may not accurately reflect the repaired job's details. | NA |
DT-1742 | The timezone for the NodeRightSizing insight event is inconsistent in the Spark details page. | NA |
DT-2029 | Applications in a success state may inaccurately display an associated job in a running state instead of transitioning to a failed state. | NA |
DT-3122 | The TopX Report displays an incorrect count of events. | |
UI | ||
PIPELINE-1935 | In the Pipeline details page, when you select the data for a specific date, all instances are displayed instead of displaying only the instances within a selected date. | NA |
UIX-6263 | The cross button on the Pipeline details page does not close the detail page when you click the bars inside the Gantt chart. | NA |
Workflows | ||
DT-1461, PIPELINE-1939, PIPELINE-1940, DT-1093, PIPELINE-1924 | The UI and data exhibit inconsistencies, including problems with job run details, issues related to multiple workflow runs and UTC timestamps, empty content in workflow job pages, and issues with filter values and duration display. | NA |
Implemented instance type whitelisting for enhanced control over the instance types used for receiving insights.
Enabled Service Principal Name (SPN) with Role-Based Access Control (RBAC) for seamless integration of Azure billing with Unravel.
Formatted DBX downloaded CSV file data to improve usability in configuring Spark jobs.
Redesigned the Diagnostics page UI and refined textual content for improved clarity and usability.
Enhanced authentication flexibility with a secondary NGUI for alternate authentication methods in Unravel support.
Improved readability of insights within Databricks Structured Streaming by formatting performance trend values.
Enhanced data filtering capabilities by synchronizing tag-based and date-based filtering options for improved data navigation.
Enhanced sensitivity of sunburst visualizations to accommodate all applied filters and features, providing comprehensive data insights.
Increased visibility of contended driver savings information on the homepage for improved user awareness and monitoring capabilities.
Optimized homepage user interface elements to improve navigation and user interaction.
Implemented cluster event generation for systems utilizing Azure Active Directory (AAD) authentication.
Enhanced handling and processing of billing files based on size for Databricks on Azure.
Bug ID | Description |
---|---|
DT-3144 | Not able to open apps on SAAS setup after the setup is upgraded. |
DT-3662 | Data unavailable on Jobs/Jobs Runs Page and Cluster status displayed as UNKNOWN state in Unravel UI. |
DT-3661 | Data not available on workflows Page whereas on Compute page data is available. |
DT-3656 | NullPointerException in connectors.DbxClusterConnector. |
DT-3632 | Data not available on workflows Page whereas on Compute page data is available. |
DT-3630 | DBX Mission Wired: Streaming data not available for first streaming query of Streaming job. |
DT-3604 | No data available for Workflows Page. |
DT-3603 | No data is shown in cost explorer and home page. |
DT-3505 | Resource utilization (RM data) is missing for all apps. |
DT-3483 | Multiple cluster sessions pointing to same spark application. |
DT-3409 | Node DownSizing Event is not generated for a particular job. |
DT-3408 | Auto-Action notification links are not working. |
DT-3406 | No data available for selected timeframes on the Home page. |
DT-3404 | The drill-downs label is shown incorrectly for 90 days. |
DT-3403 | Unravel LLM app not working. |
DT-3402 | Signature ROI pipeline failing. |
DT-3375 | Databricks jobs are not reflected on the Workflows page because Kafka consumers were down due to 'java.util.ConcurrentModificationException'. |
SUPPORT-3069 | LDAP login fails when the user is part of hundreds of groups. |
SUPPORT-3024 | Billing page is not loading data. |
SUPPORT-3021 | Unravel_sensor taking time to start. |
SUPPORT-3020 | Home Page is not loading. |
SUPPORT-3019, SUPPORT-3018 | ElasticSearch causing stability and resource loading issues. |
PLATFORM-3252 | Cluster metrics not loading after upgrade. |
PIPELINE-2077 | When grouping cluster session by Tag Key, null values showed up on the chargeback page. |
PIPELINE-2070 | Improve Metrics cache performance in insightworker. |
DT-3691 | The Untapped Cost Savings (USD) value exceeds the Total Cost (USD) value for some signatures shown on the Insights Preview page. |
DT-3669 | Document Unravel property to restrict Insights for the Whitelisted instances. |
DT-3661 | Data not available on workflows Page whereas on Compute page data is available. |
DT-3628 | No data available for Workflows Page. |
DT-3600 | No Consumer Topic found for Consumer groups. |
DT-3577 | Analysis API error causing Analysis tab to be empty. |
DT-3573 | Jobs are not shown for some of the insights that are listed under Inefficient tab. |
DT-3562 | The majority of the clusters displayed in the 'Top 50 clusters with the selected inefficient events' list for the Node Down Sizing event do not have a populated Analysis tab. |
DT-3414 | Add support to the cost pipeline to process daily cost files from a local directory. |
DT-3410 | Out of order data is not handled properly for billing data generation. |
DT-3391 | Need to disable the migration savings top-X widget. |
DT-3648 | DATABRICKS_RUNTIME_VERSION environment variable is not set. |
v4.7.9.4 Release notes
Release information
Release date | 22 February 2024 |
Software | |
Configuration properties |
Announcements
Postgres upgrade to version 15.5
The bundled Postgres database has been upgraded to version 15.5. This version supports new installations and upgrades on all platforms. If any database errors are encountered during installation or upgrade, please reach out to Unravel support for assistance.
New features
Seamless integration with Azure billing
Unravel has transitioned from using an approximation algorithm to compute costs for all Databricks entities to integrating with Azure billing APIs. Starting with this release, integration with Azure billing APIs provides DBU usage, DBU cost, and VM costs that match the cloud provider's values.
New Home page with insights
A new Home page is introduced in this release, offering detailed insights into your cloud environment.
Easily assess your cloud spending, resource utilization, and potential savings opportunities at a glance.
With intuitive visuals and actionable data, make informed decisions to optimize your cloud resources effectively.
Explore TopX to identify top cost drivers and performance bottlenecks, and delve into optimization strategies to enhance efficiency and drive savings.
New Unravel Billing page
Unravel has introduced a new Billing page in this release to support its new pricing model. Previously, Unravel employed a flat pricing model for different compute types, charging customers a fixed rate. However, recognizing the evolving landscape of compute type usage and the need for greater flexibility and accuracy in billing, Unravel has introduced a new pricing strategy aligned with market trends. Unravel now charges customers a different price for each compute type.
UI enhancements in the Cost Explorer page
Experience more intuitive navigation in the Cost Explorer page, previously known as the Cost page. Pages have been renamed and table headers refined to make cost-related insights easier to find. Drill down into your cloud spending data and delve deeper into cost allocation and resource utilization.
The following table contains key issues addressed in the 4.7.9.4 release.
ID | Description |
---|---|
App Store | |
APP-774 | External Elasticsearch integration not supported within the App Store environment. |
APP-775 | Databricks cost anomaly detection did not function as expected when integrated with external Elasticsearch. |
Cost | |
UIX-6310 | On the Chargeback page, when tag is not provided for any application, NULL is displayed. Upon redirection from the Optimize link for the NULL tag, the Compute page shows all applications for the selected duration instead of specifically displaying applications with no tags. |
Compute | |
DT-2094 | The cluster ID is displayed instead of the cluster name for certain clusters. |
Spark | |
DT-2141 | In the Program tab, when you click the line number, the actual line of code is not highlighted in the Spark details App Summary page. |
DT-1404 | Jobs created for the PySpark application using User-Defined Functions on a job cluster fail after applying the recommendations for node downsizing. |
PLATFORM-2764 | There is a lag for SQL Streaming applications. |
UX-632 | The timeline histogram does not generate correctly on the Spark application details page. |
Workflows | |
PIPELINE-1626, PIPELINE-1946 | The Unravel user interface may experience issues where certain Azure Databricks jobs are missing and duplicate entries appear in Databricks workflows under specific circumstances. |
The upcoming releases will include the following key fixes to enhance user experience. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
ID | Description |
---|---|
Cost | |
UIX-6305 | The Others category is displayed twice in legends when the number of clusters exceeds 1000 in the Chargeback page. |
Compute | |
DT-2083 | The Total Allocated Key Performance Indicators (KPIs) for Vcore and memory are not visible in the Compute > Trends page. |
UIX-6321 | All jobs in the running status are displayed in the Finished tab under Job Runs instead of showing only the finished jobs. |
Insights | |
DT-2006 | Recommendations are provided for a failed pipeline when users utilize multiple tasks with shared job clusters, and one of the tasks fails. |
DT-2125 | The UI shows a cost discrepancy for the Executor Idle time detected insight in the Databricks version 14.2 with Photon enabled. |
Spark | |
DT-1742 | The timezone for the NodeRightSizing insight event is inconsistent in the Spark details page. |
DT-2029 | Applications in a success state may inaccurately display an associated job in a running state instead of transitioning to a failed state. |
UIX-6523 | The Sort by Write feature is currently not functioning as expected in the Spark details page. |
SaaS (Free) | |
DT-2037 | In the Databricks Standard (free) environment, there is an issue where the User Flow badge obstructs pagination. |
UI | |
UIX-6281 | The cost comparison for all the instances is not displayed on the Pipeline detail page. |
Workflows | |
DT-2104 | Sorting is incorrect when the list contains strings starting with both uppercase and lowercase letters. |
Billing
Some discrepancies may occur in cost calculations due to differences between the user time zone displayed on the Compute page and the UTC-based aggregation on the Billing page. (DT-2350)
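The discrepancy described above can be illustrated with a small sketch. It assumes a hypothetical user time zone fixed at UTC-8 and a hypothetical helper `bucket_days`; it is not Unravel's aggregation code, only a demonstration of how the same usage timestamp can be attributed to different calendar days depending on the time zone used for bucketing.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical user time zone for illustration: a fixed UTC-8 offset.
USER_TZ = timezone(timedelta(hours=-8))


def bucket_days(start_utc: datetime, user_tz: timezone) -> tuple[str, str]:
    """Return (UTC day, user-local day) that a usage timestamp falls into."""
    local = start_utc.astimezone(user_tz)
    return start_utc.date().isoformat(), local.date().isoformat()


# A cluster session that starts at 02:30 UTC on 2 March 2024
# is still on 1 March for a user at UTC-8.
start = datetime(2024, 3, 2, 2, 30, tzinfo=timezone.utc)
utc_day, local_day = bucket_days(start, USER_TZ)
print(utc_day, local_day)  # 2024-03-02 2024-03-01
```

Cost that a UTC-based daily aggregate assigns to 2 March would therefore appear under 1 March in a view that uses the user's time zone, which is one way daily totals can differ between the two pages.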
Compute
Jobs by status graphs in the Trends tab display Spark application details and not the job details. Our development team is actively looking into this design limitation, and efforts are underway to address it in future updates. (DT-2008)
Data
If tables are created with the same name, accessed, deleted, and re-created, and if those tables are re-accessed, then their query and app count do not match. (DATAPAGE-502)
Home
Home page does not display alerts on the UI when there is missing ROI data for a single day. (DT-2509)
Hovering on Total Cost Trend on the Summary tab of the Home page may display inaccurate date information. (DT-2408)
Workflows
The current implementation has a limitation where the wrong run count is displayed for the job ID when sorting by run count in the Workflows > Jobs section. This discrepancy is currently under investigation by our development team, and active efforts are being taken to resolve this issue. (UIX-6526)
Our development team is actively investigating the following known issues and is working towards resolving them. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
Bug ID | Description | Workaround |
---|---|---|
Compute | ||
PIPELINE-1636 | Inconsistent data is displayed for the cluster Duration and Start Time on the Compute page. | NA |
CUSTOMER-3017 | The job duration displayed in the TopX section of the "Longest Running Jobs" on the Job Compute graph is incorrect. | NA |
Cost | ||
UIX-5624 | Data is not displayed when you click the Optimize button corresponding to OTHERS for the Cost > Chargeback results shown in the table. | NA |
DT-1094 | The No data available message is displayed on the Compute page after navigating from the Trends and Chargeback pages with Tag filters. | NA |
Datapage | ||
DATAPAGE-473 | For Hive metastore 3.1.0 or earlier versions, the creation time of partitions is not captured if a partition is created dynamically. Therefore, the Last Day KPI for the partition section is not shown in Unravel. | NA |
Insights | ||
DT-1987 | There is a mismatch in the computation of costs for fleet and spot instances in Databricks clusters. This issue arises due to the unavailability of the exact node type in the cluster info response. | NA |
Performance | ||
ASI-933 | In the Lag setup, the Duration is not updated for running applications. The Duration should be updated every 15 minutes. | NA |
ASI-936 | In the Lag setup, the App Time data is missing in the Timing tab of many applications. | NA |
Spark | ||
PIPELINE-1616 | If the Spark job is not running for Databricks, the values for the Duration and End time fields are not updated on the Databricks Run Details page. | NA |
DT-2012 | Incorrect details are displayed on the AppSummary > Job Run page when a user repairs a previously failed job. The displayed information may not accurately reflect the repaired job's details. | NA |
DT-1742 | The timezone for the NodeRightSizing insight event is inconsistent in the Spark details page. | NA |
DT-2029 | Applications in a success state may inaccurately display an associated job in a running state instead of transitioning to a failed state. | NA |
DT-3122 | The TopX Report displays an incorrect count of events. | |
UI | ||
PIPELINE-1935 | In the Pipeline details page, when you select the data for a specific date, all instances are displayed instead of displaying only the instances within a selected date. | NA |
UIX-6263 | The cross button on the Pipeline details page does not close the detail page when you click the bars inside the Gantt chart. | NA |
Workflows | ||
DT-1461, PIPELINE-1939, PIPELINE-1940, DT-1093, PIPELINE-1924 | The UI and data exhibit inconsistencies, including problems with job run details, issues related to multiple workflow runs and UTC timestamps, empty content in workflow job pages, and issues with filter values and duration display. | NA |
v4.7.9.3 Release notes
Software version
Release date: January 25, 2024
See v4.7.9.3 for download information.
See also Unity App release notes
Software upgrade support
The following upgrade paths are supported:
4.7.9.2 → 4.7.9.3
4.7.8.0 Hotfix → 4.7.9.3
4.7.8.0 → 4.7.9.3
4.7.x (Databricks) → 4.7.9.3
For instructions to upgrade to Unravel v 4.7.9.3, see Upgrade to Unravel 4793
For fresh installations, see Deploy Unravel
Announcements
End of Support Announcement for RHEL 6
Red Hat Enterprise Linux 6 (RHEL 6) is no longer supported with Unravel. If you are currently using RHEL 6, Unravel recommends that you plan an upgrade to a supported operating system to continue receiving updates and support. Contact support for any further assistance.
CPU Speed host metrics collection is not supported
Starting from the 4793 release, we have deprecated the collection of CPU Speed host metrics.
Certified platforms
The following platforms are tested and certified in this release:
Databricks (Azure, AWS)
Review your platform's compatibility matrix before you install Unravel.
Updates to Unravel's configuration properties
See 4.7.9.3 - Updates to Unravel properties.
Updates to upgrading Unravel to v4.7.9.3
Go to {unravel_install_dir}/versions/{unravel_version}/core/etc/dbx/cost
Copy the following files:
prices_workload_tier_aws.tsv
prices_workload_tier_azure.tsv
Paste the copied files and replace the existing files in this location:
{unravel_install_dir}/data/conf/cost
The insight_upgrade.sh script must be run after the upgrade. This script performs the following tasks:
Deletes older RealTimeLightProcessorEvent entries from the database and elasticsearch index.
Regenerates new NodeRightSizing events for certain clusters.
Go to {unravel_install_dir}/unravel/services/insights_worker_1_1
Run the insights_upgrade.sh script.
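The file-copy step above could be scripted. The following is a minimal sketch, assuming the source and destination directory layout quoted in the steps; the function name `copy_price_files` and its parameters are hypothetical and are not part of Unravel's tooling.

```python
import shutil
from pathlib import Path

# File names taken from the upgrade steps above.
PRICE_FILES = ("prices_workload_tier_aws.tsv", "prices_workload_tier_azure.tsv")


def copy_price_files(install_dir: str, version: str) -> list[Path]:
    """Copy the price files from the versioned directory over the existing
    copies in data/conf/cost. Both directories are expected to exist."""
    src = Path(install_dir) / "versions" / version / "core" / "etc" / "dbx" / "cost"
    dst = Path(install_dir) / "data" / "conf" / "cost"
    if not dst.is_dir():
        # Fail loudly rather than create a config directory in the wrong place.
        raise FileNotFoundError(f"destination directory not found: {dst}")
    copied = []
    for name in PRICE_FILES:
        target = dst / name
        shutil.copy2(src / name, target)  # replaces the existing file
        copied.append(target)
    return copied
```

For example, `copy_price_files("/opt/unravel", "<unravel_version>")` would copy both files, where the install directory and the version directory name are placeholders to be replaced with the values for your deployment.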
New features
A new Healthcheck ROI report is launched and is available as an App Store app. The app provides a comprehensive view of the Databricks environment, focusing on performance, costs, and potential savings. With this app, you can get insights into daily costs, hierarchical cost distribution, user, workspace, cluster, and job metrics. You can identify opportunities for workload optimization, worker resource classification, and migration savings through detailed analytics and recommendations. You can also have a holistic view of cluster metrics, including session costs, wastage analysis, and potential migration savings.
Support for Databricks 13.x and above
Databricks Runtime 13.x and above is supported from this release.
Note
Databricks does not provide Ganglia metrics for Databricks Runtime 13 and above. Unravel now gathers all host-level metrics in real time from the /proc filesystem. There might be variations in the metrics collection approach of Unravel and Databricks itself.
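As an illustration of reading host-level metrics from /proc, the sketch below parses text in the /proc/meminfo format and derives a memory-used fraction. It is not Unravel's collection code; the function names and the sample values are hypothetical, and only the `MemTotal` and `MemAvailable` fields of the standard Linux /proc/meminfo layout are relied on.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.
    Most fields are reported in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Field: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_used_fraction(meminfo: dict[str, int]) -> float:
    """Fraction of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


# Made-up sample in the /proc/meminfo format; on a Linux host the text
# would come from open("/proc/meminfo").read().
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:    4000000 kB
"""
print(memory_used_fraction(parse_meminfo(SAMPLE)))  # 0.75
```

Because a collector like this samples the kernel's counters directly, its values can differ from metrics that another tool computes with a different sampling interval or formula.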
Observability on Databricks SaaS is available with Standard (Free) tier
Unravel has introduced observability on Databricks SaaS for free. You can now access essential observability features at no cost, allowing you to monitor your Databricks environment without incurring additional charges.
Improvements and enhancements
Improved On-demand Insights
The on-demand Insights feature is now significantly faster, providing users with access to the most recent and relevant insights within 10 seconds, and enabling an intuitive comparison of resources that facilitates quick decision making. This update improves the user experience by streamlining the process of obtaining valuable insights.
Python upgrade
In this release, Python is upgraded to version 3.8.12.
Backend updates to improve performance
This release includes significant backend improvements aimed at enhancing overall system performance. These updates contribute to a more responsive and efficient system, ensuring a smoother experience.
The following table contains key issues addressed in the 4.7.9.3 release.
ID | Description |
---|---|
App Store | |
IMP-1089 | Incorrect duration values are noted for the Interesting Apps data in specific applications. |
Compute | |
IMP-1239 | Modify the parsing logic for driver host metrics in the Spark Details page of Compute. |
Insights | |
DT-1519 | The Nodedownsizing event recommends a $0 cost saving for a successful job. |
IMP-1217 | Streaming applications are incorrectly generating RealtimeLightProcessor insights. |
IMP-1272 | An exception occurs while fetching feature data from the feature store. |
Jobs | |
PIPELINE-1982 | In a Spark application, there is a discrepancy in the displayed name on the Jobs page. |
Kafka | |
CPLANE-2649 | The Refresh Kafka command failed to start Zookeeper before initiating Kafka, resulting in an incomplete initialization. |
Security | |
CUSTOMER-2584 | Bind password is exposed in plain text within the AutoAction (AA) logs. |
Sensor | |
CPLANE-3427 | In the Unravel sensor logs, there is an occurrence of java.lang.NumberFormatException. |
Workflows | |
CUSTOMER-2544 | The sort functionality in the cost filter under the Workflow tab is not functioning as expected. |
PIPELINE-2021 | The cost filter under the Workflow tab is not functioning as expected. |
The upcoming releases will include the following key fixes to enhance user experience. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
ID | Description |
---|---|
Cost | |
DT-1879, DT-1871, DT-1853 | The following issues are observed in the Budget page. |
UIX-6305 | The Others category is displayed twice in legends when the number of clusters exceeds 1000 in the Chargeback page. |
Compute | |
DT-2094 | The cluster ID is displayed instead of the Cluster name for certain clusters. |
DT-2079 | The cluster cost displayed does not match the Azure billing report in some scenarios. |
DT-2083 | The Total Allocated Key Performance Indicators (KPIs) for Vcore and memory are not visible in the Compute > Trends page. |
UIX-6321 | All jobs in the running status are displayed in the Finished tab under Job Runs instead of showing only the finished jobs. |
Insights | |
DT-2006 | Recommendations are provided for a failed pipeline when users utilize multiple tasks with shared job clusters, and one of the tasks fails. |
DT-2125 | The UI shows a cost discrepancy for the Executor Idle time detected insight in the Databricks version 14.2 with Photon enabled. |
Reports | |
DT-1841 | The TopX Report displays an incorrect count of events. |
Spark | |
DT-1742 | The timezone for the NodeRightSizing insight event is inconsistent in the Spark details page. |
DT-2012 | Incorrect details are displayed on the AppSummary > Job Run page when a user repairs a previously failed job. The displayed information may not accurately reflect the repaired job's details. |
DT-2029 | Applications in a success state may inaccurately display an associated job in a running state instead of transitioning to a failed state. |
DT-2141 | Clicking the line number in the Program tab does not highlight the actual line of code in the Spark details App Summary page. |
UIX-6523 | The Sort by Write feature is currently not functioning as expected in the Spark details page. |
SaaS (Free) | |
DT-2037 | In the Databricks Standard (free) environment, there is an issue where the User Flow badge obstructs pagination. |
Workflows | |
DT-2104 | Sorting is incorrect when the list contains strings starting with both uppercase and lowercase letters. |
Compute
Jobs by status graphs in the Trends tab display Spark application details and not the job details. Our development team is actively looking into this design limitation, and efforts are underway to address it in future updates. (DT-2008)
Workflows
The current implementation has a limitation where the wrong run count is displayed for the job ID when sorting by run count in the Workflows > Jobs section. This discrepancy is currently under investigation by our development team, and active efforts are being taken to resolve this issue. (UIX-6526)
Our development team is actively investigating the following known issues and is working towards resolving them. It is important to note that while these issues exist, there is no immediate critical impact on using the product, and users can continue to utilize its functionality with confidence.
Bug ID | Description | Workaround |
---|---|---|
App Store | ||
APP-614 | App Store tasks fail to start with SSL enabled on the MySQL database. | |
Compute | ||
PIPELINE-1636 | Inconsistent data is displayed for the cluster Duration and Start Time on the Compute page. | NA |
Cost | ||
UIX-5624 | Data is not displayed when you click the Optimize button corresponding to OTHERS for the Cost > Chargeback results shown in the table. | NA |
DT-1094 | The No data available message is displayed on the Compute page after navigating from the Trends and Chargeback pages with Tag filters. | NA |
UIX-6310 | On the Chargeback page, when no tag is provided for any application, NULL is displayed. Upon redirection from the Optimize link for the NULL tag, the Compute page shows all applications for the selected duration instead of specifically displaying applications with no tags. | NA |
Datapage | ||
DATAPAGE-502 | If tables are created with the same name, accessed, deleted, and re-created, and if those tables are re-accessed, then their query and app count do not match. | NA |
DATAPAGE-740 | The query to fetch tableDailyKPIs is getting timed out when dealing with a huge table partition of 27 million records. From a threshold perspective, it has been verified that the API functions without issues for partition sizes up to 18 million. | NA |
DATAPAGE-473 | For Hive metastore 3.1.0 or earlier versions, the creation time of partitions is not captured if a partition is created dynamically. Therefore, the Last Day KPI for the partition section is not shown in Unravel. | NA |
Insights | ||
DT-1987 | There is a mismatch in the computation of costs for fleet and spot instances in Databricks clusters. This issue arises due to the unavailability of the exact node type in the cluster info response. | NA |
UIX-5127, INSIGHTS-324, UIX-4176 | Link re-direction issues, such as incorrect data filters for viewing Top Groups by Cost and Top Clusters by Cost, as well as missing re-direction links in the App Acceleration section. | NA |
Performance | ||
PIPELINE-1926 | The Insight Worker daemon is experiencing performance lag, causing delays in processing insights and data analytics tasks. | NA |
ASI-933 | In the Lag setup, the Duration is not updated for running applications. The Duration should be updated every 15 minutes. | NA |
ASI-936 | In the Lag setup, the App Time data is missing in the Timing tab of many applications. | NA |
Spark | ||
DT-1404 | Jobs created for the PySpark application using User-Defined Functions on a job cluster fail after applying the recommendations for node downsizing. | |
PIPELINE-1616 | If the Spark job is not running for Databricks, the values for the Duration and End time fields are not updated on the Databricks Run Details page. | NA |
PLATFORM-2764 | You can see a lag for SQL Streaming applications. | NA |
UX-632 | The timeline histogram needs to be generated correctly on the Spark application details page. | NA |
PIPELINE-626 | For PySpark applications, the | NA |
UI | ||
UIX-5581 | The job run count displayed on the Chargeback page differs from the job count shown on the Workflow page. | NA |
PIPELINE-1935 | In the Pipeline details page, when you select the data for a specific date, all instances are displayed instead of displaying only the instances within a selected date. | NA |
UIX-6281 | The cost comparison for all the instances is not displayed on the Pipeline detail page. | NA |
PIPELINE-1934 | On the Pipeline details page, the arrows must point only to the latest run instead of all the runs. | NA |
UIX-6321 | In the Workflow section, instead of displaying only jobs completed within the selected time frame, it currently displays jobs running within the selected duration. | NA |
UIX-6263 | The cross button on the Pipeline details page does not close the detail page when you click the bars inside the Gantt chart. | NA |
UIX-3536 | In the App summary page for Impala, the Query> Operator view is visible after scrolling down. | NA |
Workflows | ||
DT-1461, PIPELINE-1939, PIPELINE-1940, DT-1093, UIX-6274, PIPELINE-1924 | The UI and data exhibit inconsistencies, including problems with job run details, issues related to multiple workflow runs and UTC timestamps, empty content in workflow job pages, and issues with filter values and duration display. | NA |
PIPELINE-1626, PIPELINE-1946 | The Unravel UI has the issue of missing some Azure Databricks jobs and duplicate entries in Databricks workflow in certain scenarios. | NA |
App Store tasks fail to start with SSL enabled on the MySQL database. (APP-614)
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Use an editor to open the <Installation_directory>/unravel/data/conf/unravel.yaml file.
In the unravel.yaml file, under the database > advanced > python_flags block, enter the path to the trusted certificates. For example, if Unravel is installed at /opt/unravel, you must edit the unravel.yaml file as follows:
unravel:
  ...snip...
  database:
    ...snip...
    advanced:
      python_flags:
        ssl_ca: /opt/unravel/data/certificates/trusted_certs.pem
Use the manager utility to upload the certificates.
<Unravel installation directory>/manager config tls trust add --pem /path/to/certificate
For example: /opt/unravel/manager config tls trust add --pem /path/to/certificate
Enable the Truststore.
<Unravel installation directory>/manager config tls trust enable
Apply the changes and restart Unravel.
<Unravel installation directory>/unravel/manager config apply --restart
Jobs created for the PySpark application using User-Defined Functions on a job cluster fail after applying the recommendations for node downsizing. (DT-1404)
In your Databricks workspace, go to Configure Cluster > Advanced Options > Spark config.
Add the following property, set to true, to the spark.driver.extraJavaOptions and spark.executor.extraJavaOptions Spark configurations:
-Dcom.unraveldata.metrics.proctree.enable=true
For example:
spark.executor.extraJavaOptions -Dcom.unraveldata.metrics.proctree.enable=true -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-3.0
spark.driver.extraJavaOptions -Dcom.unraveldata.metrics.proctree.enable=true -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-3.0
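When cluster configurations are generated by a script, the flag can be added to an existing options string without duplicating it. The helper below is a hypothetical sketch, not an Unravel or Databricks API, and it assumes the options string contains no quoted values with embedded spaces.

```python
# Flag taken from the workaround above.
PROCTREE_FLAG = "-Dcom.unraveldata.metrics.proctree.enable=true"


def with_proctree_flag(java_options: str) -> str:
    """Return java_options with the proctree flag set to true exactly once.
    Assumes options are separated by whitespace (no quoted spaces)."""
    tokens = java_options.split()
    if PROCTREE_FLAG in tokens:
        return java_options  # already enabled, leave the string unchanged
    # Drop any other setting of the same property, for example "=false".
    prefix = PROCTREE_FLAG.split("=")[0] + "="
    tokens = [t for t in tokens if not t.startswith(prefix)]
    return " ".join([PROCTREE_FLAG, *tokens])


existing = "-javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-3.0"
print(with_proctree_flag(existing))
```

The result is the value to set for spark.executor.extraJavaOptions; the same call applies to the driver options string.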
App-store does not support PostgreSQL over SSL.
Sustained Violation is not supported in AutoActions for Databricks. This is a type of violation that triggers the AutoAction.
Only the TopX report is supported on Databricks; all other reports are not supported.
Red Hat Enterprise Linux 6 (RHEL 6) is no longer supported with Unravel. If you are currently using RHEL 6, Unravel recommends that you plan an upgrade to a supported operating system to continue receiving updates and support. Contact support for any further assistance.
Starting from the 4793 release, the collection of CPU Speed host metrics is deprecated.