v4.7.8.0 Release notes

Software version

Release date: 20/February/2023

See v4.7.8.0 for download information.

Software upgrade support

The following upgrade paths are supported:

4.7.x.x → 4.7.8.0
4.6.1.9 → 4.7.8.0
4.6.1.8 or earlier → 4.6.1.9 → 4.7.8.0
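
The paths above can be read as a small decision rule. The following is a minimal sketch of that rule, not an Unravel tool: the function names `parse` and `upgrade_path` are hypothetical, and versions that the list does not mention (for example, anything between 4.6.1.9 and 4.7.0.0) are treated as unsupported by assumption.

```python
from typing import List, Tuple

TARGET = "4.7.8.0"
INTERMEDIATE = "4.6.1.9"


def parse(version: str) -> Tuple[int, ...]:
    """Turn '4.6.1.9' or 'v4.6.1.9' into a comparable tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(current: str) -> List[str]:
    """Return the versions to install, in order, to reach TARGET."""
    cur = parse(current)
    if cur >= parse(TARGET):
        return []  # already at or past the target version
    if cur[:2] == (4, 7) or cur == parse(INTERMEDIATE):
        return [TARGET]  # direct upgrade
    if cur < parse(INTERMEDIATE):
        return [INTERMEDIATE, TARGET]  # two-step upgrade
    # Assumption: versions not covered by the list are rejected.
    raise ValueError(f"no upgrade path listed for {current}")


print(upgrade_path("4.6.1.8"))  # ['4.6.1.9', '4.7.8.0']
```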

For instructions to upgrade to Unravel v4.6.1.9, see Upgrading Unravel server.

For instructions to upgrade to Unravel v4.7.8.x, see Upgrading Unravel.

For fresh installations, see Installing Unravel.

Certified platforms

The following platforms are tested and certified in this release:

Amazon EMR
Databricks (Azure, AWS)

Review your platform's compatibility matrix before you install Unravel. See Compatibility Matrix.

Updates to Unravel's configuration properties

See 4.7.x - Updates to Unravel properties.

Updates to upgrading Unravel to v4.7.8.0

An existing license for any version earlier than 4.7.7.x does not work with the newer version of Unravel. Therefore, before upgrading Unravel, you must obtain a license file from Unravel Customer Support. For information about setting the license, see the Upgrading Unravel from version 4.7.x to 4.7.8.x section in Upgrading Unravel.
Optionally, you can regroup multiple Spark worker instances for enhanced performance after upgrading to v4.7.8.0.
Caution
This task requires planning and can be performed only in collaboration with the Unravel support team. This is a one-time task.

New features

Data quality integration with Great Expectations
Great Expectations is a data quality tool that lets you validate a data asset by running an Expectation Suite (a set of quality assertions) against it. When integrated with Unravel, Great Expectations extends the measure of data quality into Unravel, and Unravel provides unified visibility of the expectations validated while running the Expectation Suite. This adds data quality insights to Unravel's existing single-pane data monitoring.
You can view the Data Quality insights of Great Expectations from the Unravel UI > Data > Tables detail page > Analysis tab and from the Unravel UI > Jobs > Job details page > Analysis tab.
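
To make the idea of an Expectation Suite concrete, here is a toy illustration in plain Python. It deliberately does not use the Great Expectations library or its API, and every name in it (`expect_no_nulls`, `expect_between`, `validate`, the sample rows) is hypothetical. It only shows the concept of running a named set of assertions against a data asset and collecting a pass/fail result for each one.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Expectation = Callable[[List[Row]], bool]


def expect_no_nulls(column: str) -> Expectation:
    """Passes when every row has a non-None value in the column."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def expect_between(column: str, low: float, high: float) -> Expectation:
    """Passes when every value in the column lies in [low, high]."""
    return lambda rows: all(low <= row[column] <= high for row in rows)


def validate(rows: List[Row], suite: Dict[str, Expectation]) -> Dict[str, bool]:
    """Run each named expectation and report whether it passed."""
    return {name: check(rows) for name, check in suite.items()}


rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 250.0}]
suite = {
    "id_not_null": expect_no_nulls("id"),
    "amount_in_range": expect_between("amount", 0, 100),
}
print(validate(rows, suite))  # {'id_not_null': True, 'amount_in_range': False}
```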
Multi-node deployment of Spark workers for high-volume data processing
You can deploy additional Spark workers on a separate server, other than the server where Unravel is installed, with services to process high-volume data.

Notification channels

A new Notification channels option has been added to the Manage menu. Use it to set up notification channels that send alerts to users or user groups, by email or Slack message, when certain conditions are triggered.

For information about notification channels, see the following topics:

Topics	Guide name
New topics: Notification channels; Creating a notification channel; Modifying the existing notification channel; Viewing notification channels	User Guide
Updates to the existing topics, Cost Budget (EMR): Creating a budget; Viewing a budget and its details	User Guide
Updates to the existing topics, Cost Budget (Databricks): Setting a budget	User Guide

AutoActions support EMR apps and clusters to optimize cost
AutoActions can now monitor EMR apps and clusters. You can set an AutoAction policy to generate alerts for them. For EMR clusters, alerts can be based on cost, duration, and idle checks.
For more information, refer to the AutoActions > AutoActions (EMR) topic in the User Guide.
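
The three cluster checks (cost, duration, idle) can be pictured as threshold comparisons. The sketch below is illustrative only and is not Unravel's implementation or configuration format: the `ClusterSnapshot` and `Policy` classes, their field names, and the sample values are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClusterSnapshot:
    """Hypothetical point-in-time view of one EMR cluster."""
    cluster_id: str
    cost_usd: float
    running_hours: float
    idle_hours: float


@dataclass
class Policy:
    """Hypothetical thresholds; None disables a check."""
    max_cost_usd: Optional[float] = None
    max_running_hours: Optional[float] = None
    max_idle_hours: Optional[float] = None


def violations(cluster: ClusterSnapshot, policy: Policy) -> List[str]:
    """Return the names of the checks that the cluster exceeds."""
    checks = [
        ("cost", cluster.cost_usd, policy.max_cost_usd),
        ("duration", cluster.running_hours, policy.max_running_hours),
        ("idle", cluster.idle_hours, policy.max_idle_hours),
    ]
    return [name for name, value, limit in checks
            if limit is not None and value > limit]


policy = Policy(max_cost_usd=500.0, max_idle_hours=2.0)
cluster = ClusterSnapshot("j-EXAMPLE", cost_usd=120.0, running_hours=30.0, idle_hours=3.5)
print(violations(cluster, policy))  # ['idle']
```

An alert would be sent for each non-empty result. Leaving a threshold unset skips that check, which is why the long-running cluster above is flagged only for idleness.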

Improvements and enhancements

Databricks enhancements

A Databricks Job can be associated with multiple clusters. Each job entry now corresponds to a Databricks job. The following enhancements have been made to the Databricks Workflows > Jobs page (DT-1187):

Workflows > Jobs
- Removed the Clusters Name column.

Workflows > Job Runs
- Removed the Clusters Name and Cluster Type columns.
- Removed the Job name link from the Run Name / ID column.
- Renamed the Run Name / ID column to Job name / ID.
- Provided a link to the Run ID. After you click the Run ID, the job run details page is displayed.
- Updated the Search by ID, Keyword field to Search by keyword. You can search for a job name by typing a keyword.
- Changed the Filter by Cluster Name search to Filter by Job name or ID.

The following enhancement has been made to the Resources tab on the Spark details page (DT-1456):

Compute > Spark > Resources > Host Metrics and Workflow > Job > Task > Resources > Host Metrics

The following new metrics are added to Host Metrics:
- Total memory
- Free memory

You can use these metrics to evaluate the memory spent on processes other than those of Spark.
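
As a worked example of that evaluation, the sketch below subtracts free memory and Spark's own usage from total memory. It assumes you already have a figure for the memory used by Spark on the host from elsewhere; the function name and the sample numbers are hypothetical.

```python
GIB = 1024 ** 3


def non_spark_memory(total_bytes: int, free_bytes: int, spark_bytes: int) -> int:
    """Memory in use on the host that is not attributed to Spark."""
    used_bytes = total_bytes - free_bytes
    # Clamp at zero in case the Spark figure was sampled at a different time.
    return max(used_bytes - spark_bytes, 0)


# 64 GiB total, 20 GiB free, 36 GiB used by Spark
print(non_spark_memory(64 * GIB, 20 * GIB, 36 * GIB) / GIB)  # 8.0
```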

For more information, see the User Guide.

Other enhancements
- Node count and duration values are provided for the aggregated cost savings for each recommended node type. (EMR-620)
- A new Account Id column has been added to the AWS Account Settings page so that you can view the configured AWS account ID in the Unravel UI. (UIX-5469)
- On the Clusters page, the ID filter has been moved to the top as a separate filter. You cannot combine other filters (such as Date and time range) with an ID search. (UIX-5332)
  For more information, see the Monitor EMR clusters section in the User Guide.
- The MySQL client library has been updated to the 12.0 version on the user interface. (UIX-5383)
- Enhanced performance by reducing the lag in the Impala pipeline. (ASP-1677)
- Added a download-as-CSV option to the EMR Clusters and EMR AutoAction pages. (UIX-4853)
  For more information, see the Viewing AutoAction and its details and Monitor EMR clusters sections in the User Guide.
- Support for the EMR cluster idle state. (EMR-465)
  Unravel now supports AutoActions for the idle state of a cluster. You can set an AutoAction to trigger when an EMR cluster exceeds the idle duration threshold. For more information, see Creating AutoActions in the User Guide.

Unsupported

Data

On the Data page, File Reports, Small File reports, and file size information are not supported for Dataproc clusters.

Platforms

Data

In GCP - BigQuery, for the Data page, a count of more than 100 projects is not supported.

BigQuery pricing

For BigQuery pricing, Unravel supports only On-demand analysis pricing. Flat-rate analysis pricing and Storage pricing (Active and Long Term storage) are not supported.

Bug fixes

AutoActions
- When multiple AutoActions policies are created with the Overlapping ruleset and scopes, only one of the AutoAction policies is triggered. (AA-498)
Databricks
- Duplicate job runs (with the same run IDs) are generated on the Job Runs page. (DT-1190)
- On the Compute page, inaccurate information is displayed for clusters in the Inefficient category. (UIX-5064)
- The downloaded TopX Report (in JSON format) lists the incorrect type of Spark app. (REPORT-2094)
- In Databricks, when a job in a workflow fails and a new job is launched instead of a new attempt, the new job cannot be part of the same workflow. (PG-269)
- On the Chargeback page, when you group by clusters, Unravel has a limitation of only grouping a maximum of 1000 clusters. (SUPPORT-1570)
EMR
- After clicking the Hive Query link on a cluster using the bootstrap script, the No apps found with the Id message is displayed. (CLOUD-532)
- On the Clusters page, search by cluster name returns incomplete search results. (UIX-5345)
- On the Clusters page, the Name and Cluster tags filters return incomplete search results. (EMR-595)
- On the Clusters page, the following issues are observed (EMR-588):
  - The cluster list omits clusters with a zero cost when the custom date range is selected
  - The cluster list omits the latest cluster cost when the custom date range is selected
- If clusters terminate with errors without generating NodeDownSizingEvent, then such clusters are displayed in the Inefficient category on the Clusters page. (EMR-542)
- The Spark sensor fails to start. (EMR-485)
- On the Clusters page, the cluster IDs displayed in the ID drop-down list do not match the cluster category selected in the left panel. (EMR-435)
- For clusters terminated with errors, the node downsizing recommendations are shown. (EMR-422)
Insights
- Clicking the links for operators and stages in the SQLTooManyGroupByEvent does not result in any action. (INSIGHTS-355)
- An exception occurs when generating memory insights for a Spark application. (INSIGHTS-363)
Installation
- Databricks Healthcheck App Store celery daemon fails to start. (INSTALL-2945)
- Installing Unravel fails when connecting with SSL-enabled MariaDB. (INSTALL-3071)
Spark
- A blank page is displayed on the Databricks Run Details page for Spark structured streaming applications. (ASP-1629, UIX-5124)
UI
- On the Clusters page, a discrepancy exists between the cost of clusters and the minimum and maximum cost displayed in the left pane. (UIX-5270)
- From the Clusters page, after clicking the Spark action, refreshing the Spark details page takes longer than expected. (UIX-5247)
- When you return from the application details > SQL tab > Stage page to the application details > Attempt page, the Duration, Data I/O, and Jobs Count fields are not displayed. (UIX-5048)

Known issues

Applications

Event logs and YARN logs are not loaded for some applications in Google Dataproc clusters. (ASP-1372)

AutoActions

AutoActions stop responding due to an invalid or unsupported HTTP URL or webhook. (AA-575)

BigQuery

On the Application details page, the original query link is missing for some cached queries due to the parallel processing of original and cached queries. (BIGQ-61)
Issue: Sometimes, when you process a large number of BigQuery projects with the manager config bigquery integrate command, you may see the following error:
Provider produced inconsistent result after apply
Workaround: Wait for a few minutes and re-run the command. (INSTALL-2860, INSTALL-2934)
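
If you script this step, the workaround amounts to a retry with a pause between attempts. The following is a minimal sketch under several assumptions: that `manager` is on the PATH, that the command takes no further arguments in your setup, and that it exits with a non-zero status when this error occurs. The helper names `run_with_retry` and `integrate` are hypothetical.

```python
import subprocess
import time
from typing import Callable


def run_with_retry(run: Callable[[], int], attempts: int = 3,
                   wait_seconds: float = 300.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call run() until it returns 0 or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        if run() == 0:
            return True
        if attempt < attempts:
            sleep(wait_seconds)  # wait a few minutes before re-running
    return False


def integrate() -> int:
    """Run the integrate command and return its exit status."""
    # Assumption: `manager` is on PATH and needs no extra arguments here.
    return subprocess.run(["manager", "config", "bigquery", "integrate"]).returncode
```

Calling `run_with_retry(integrate)` would then try the command up to three times, five minutes apart. Both values are arbitrary defaults chosen for the sketch.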

Data page

If tables with the same name are created, accessed, deleted, re-created, and then accessed again, their query and app counts do not match. (DATAPAGE-502)
For Hive metastore 3.1.0 or earlier versions, the create time of a partition is not captured if the partition is created dynamically. Therefore, in Unravel, the Last Day KPIs for the partition section are not shown. (DATAPAGE-473)

Dataproc

Google Cloud Dataproc: Executor Logs are not loaded for Spark applications. (ASP-1371)

Installation

Issue: You can encounter a NoIndexFound exception for fresh installations of Unravel on GCP-BigQuery. (BIGQ-104)
Workaround: Run the following curl command on the Unravel node after the installation:
```
curl -XPUT http://localhost:4171/app-19700101_07
```

Kerberos

Kerberos can only be disabled manually from the unravel.yaml file:
```
 kerberos:
      enabled: False
```

Reports

Tez

The SQL events generator generates a SQL Like clause event if the query contains a LIKE pattern, even when the pattern appears only inside a literal. (TEZLLAP-349)

Upgrade

After upgrading from v4.7.1.1 to v4.7.5.0, the Hive jobs running with the Tez application as an execution engine are not linked. (EMR-406)
After upgrading to v4.7.1.0, Notebooks do not work. You can configure them separately. (REPORT-1895)
After upgrading from v4.6.x to v4.7.1.0, the Tez application details page does not initially show DAG data. The DAG data is visible only after you refresh the page. (ASP-1126)

Workflow

Jobs are falsely labeled as a Tez App for Oozie Sqoop and Shell actions. (PLATFORM-2403)

Support

For support issues, contact Unravel Support.
