v4.7.3.0 Release notes

Software version

Release Date: 28/Jan/2022

See 4.7.3.0 for download information.

Software upgrade support

Fresh installations are supported along with the following upgrade paths:

v4.7.0.x, v4.7.1.x, v4.7.2.x → v4.7.3.0
v4.6.2.x → v4.7.3.0
v4.6.1.9 → v4.7.3.0
v4.6.1.8 or earlier → v4.6.1.9 → v4.7.3.0
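
The upgrade paths above can be sketched as a small lookup. This is only an illustration of the rule, not an Unravel tool: the function names `parse_version` and `upgrade_path` are hypothetical, and versions that the list does not mention are rejected rather than guessed.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v4.6.1.8' or '4.6.1.8' into (4, 6, 1, 8)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(current: str) -> list[str]:
    """Return the versions to install, in order, to reach v4.7.3.0."""
    target = "v4.7.3.0"
    installed = parse_version(current)
    if installed == parse_version(target):
        return []  # already on the target release
    if installed <= (4, 6, 1, 8):
        # v4.6.1.8 or earlier must go through v4.6.1.9 first.
        return ["v4.6.1.9", target]
    direct_lines = {(4, 6, 2), (4, 7, 0), (4, 7, 1), (4, 7, 2)}
    if installed == (4, 6, 1, 9) or installed[:3] in direct_lines:
        return [target]
    # Assumption: anything else is outside the documented paths.
    raise ValueError(f"{current} is not covered by the documented upgrade paths")


print(upgrade_path("v4.6.1.8"))
print(upgrade_path("v4.7.2.1"))
```

The first call reports two steps (v4.6.1.9, then v4.7.3.0) and the second reports a direct upgrade.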

Refer to Upgrading Unravel server for instructions on upgrading to Unravel version 4.6.1.9.

Refer to Upgrading Unravel for instructions on upgrading to Unravel version 4.7.3.0.

Refer to Installing Unravel for fresh installations.

Sensor upgrade

Sensor upgrade is mandatory.
Refer to Upgrading Sensors.

Certified platforms

The following platforms are tested and certified in this release:

Cloudera Distribution of Apache Hadoop (CDH)
Cloudera Data Platform (CDP)
Hortonworks Data Platform (HDP)
Amazon Elastic MapReduce (EMR)
Databricks (Azure Databricks and AWS Databricks)
Google Cloud Platform (Dataproc, BigQuery)

Review your platform's compatibility matrix before you install Unravel.

Updates to Unravel's configuration properties

Refer to 4.7.x - Updates to Unravel properties.

New features

App store
The App store page is added, from where you can manage all the Unravel apps. From the App store, you can install apps, run the administrative tasks for managing them, navigate between apps, and open and run them.
Billing service - EMR
The Billing service is now supported for the EMR platform. The Billing tab in Unravel shows the charges of Unravel for its support for EMR. The following pricing plans are supported.
- pay-as-you-go
  As per this plan, Unravel tracks the number of instance hours that the user has incurred and shows the charges based on the usage.
- pay-in-advance
  As per this plan, you can pay in advance for a specific number of instance hours. Credits are deducted based on the usage, and the remaining credits are shown daily on the Billing tab. Users can therefore monitor when they will run out of credits.
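The pay-in-advance arithmetic can be illustrated with a minimal sketch. It assumes that one credit equals one instance hour and that usage is reported once per day; the function name and the numbers are hypothetical and do not reflect Unravel's actual pricing.

```python
from __future__ import annotations


def remaining_credits(prepaid_hours: float, daily_usage_hours: list[float]) -> list[float]:
    """Return the credit balance after each day of usage."""
    balance = prepaid_hours
    balances = []
    for used in daily_usage_hours:
        balance -= used  # credits are deducted based on usage
        balances.append(balance)
    return balances


# 1000 prepaid instance hours, three days of usage (made-up figures).
print(remaining_credits(1000, [120, 150, 130]))  # [880, 730, 600]
```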
Cost page
From the Cost page, you can govern your spending for the cloud and optimize your jobs to manage the cost. This page is accessible only for users with admin or read-only admin permissions. The cost page has the following sections:
- Trends
  You can view the trends of DBUs, cost (in $), and the number of clusters for the total usage. These trends help you identify periods with anomalies (for example, a sudden spike in the cost).
- Chargeback
  You can view the Chargeback cost attributed to cluster creators and job run creators for the specified time range and filters, which are organized based on the selected Group by option.
- Budget
  The budget page helps you to control costs by comparing the target budget to the actual incurred cost. You can set a target budget (DBUs) based on workspaces, users, clusters, and tags and check if an incurred cost is approaching or has already exceeded the target budget.
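The budget comparison described above amounts to a ratio check. The sketch below is illustrative only: `budget_status` and its `warn_ratio` threshold are hypothetical names, and the 80% default is an assumption rather than a documented Unravel setting.

```python
def budget_status(target_dbus: float, incurred_dbus: float, warn_ratio: float = 0.8) -> str:
    """Classify incurred usage against a target budget, both in DBUs."""
    if target_dbus <= 0:
        raise ValueError("target budget must be positive")
    used = incurred_dbus / target_dbus
    if used > 1.0:
        return "exceeded"
    if used >= warn_ratio:  # assumed threshold for "approaching"
        return "approaching"
    return "within budget"


print(budget_status(1000, 300))   # within budget
print(budget_status(1000, 850))   # approaching
print(budget_status(1000, 1200))  # exceeded
```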
Support for Delta Lake on Databricks
You can configure Unravel to fetch the metadata of the Delta tables and monitor them from the Data page on Unravel UI.
Google Cloud BigQuery monitoring (Private Preview)
A private preview of support for BigQuery, which includes:
- Observability
  - View the jobs running on the cluster
  - View details of errors encountered by the jobs
- Governance
  - Chargeback view of resources used in BigQuery.
  - Associating jobs with business priority tags
- Optimization
  - Analysis of the job execution based on the resource usage and time spent
  - Visibility into data/tables usage (hot, warm, cold)
Interactive Precheck
Interactive Precheck can be used to validate the configuration before installing Unravel and to reuse that information to bootstrap an Unravel install. The software guides you through a series of questions to collect the basic configuration information and verify your environment.
Interactive Precheck is available as a standalone package and as part of the full Unravel package.
Multi-cluster support
- Multi-cluster support is now available for the Dashboard app on the App store.
- Multi-cluster support is now available for Notebooks.
Reports
TopX reports are now supported on EMR and Databricks platforms.
Support multiple LDAP servers
Unravel now supports users on multiple LDAP servers.

Improvements and enhancements

Applications
- RM tracking URL is shown under the Logs tab of the application details page. (ASP-1300)
- Live resource metrics on the Application details page for running applications. (ASP-1270)
- The application name is shown on the Spark application details page.
Databricks
- Cloud Clusters page renamed to Compute. (DT-1017)
- Capture cost of untracked Databricks clusters. (DT-964)
- Capture cost of a Databricks cluster in a running state. (DT-965)
- Under the new Compute tab, the UIX is improved to have pages for each cluster showing metadata, KPI, analysis, and trends.
- Unravel Init script for Databricks as global init script. (DT-1017)
- Support Chargeback by Databricks tags. (DT-967, CUSTOMER-1881)
- Azure AD: Microsoft Graph API: fetch group names API fetches only the required fields. (CDI-333)
Healthcheck
- Healthcheck implementation for App store: Check if appstore is running. (APP-490)
Impala
- Properties added that let you control the following timeouts for the Impala CM connector:
  - HTTP connection timeout for the Impala connector.
  - HTTP read timeout for the Impala connector.
  - HTTP client backoff time for the Impala connector, which is the time for which the HTTP client sleeps before retrying after an unsuccessful read attempt. (ASP-1354)
- Properties added that let you specify the number of retries made to fetch the profile tree for any Impala query. (DOC-1066)
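How a timeout, a backoff time, and a retry count interact can be shown with a generic sketch. This is not Unravel's connector code: the function and parameter names are hypothetical, the defaults are arbitrary, and a single timeout value stands in for both the connection and read timeouts because that is what `urllib` exposes.

```python
import time
import urllib.request


def fetch_with_retries(url, timeout_s=5.0, backoff_s=2.0, max_retries=3,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a URL, sleeping backoff_s between attempts after a failure."""
    last_error = None
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        try:
            with opener(url, timeout=timeout_s) as response:
                return response.read()
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            last_error = exc
            if attempt < max_retries:
                sleep(backoff_s)  # back off before the next attempt
    raise last_error
```

The `opener` and `sleep` parameters exist only so that the behaviour can be exercised without a network call or real delays.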
Insights
- Add links to Operator and Stage ID for all Insight events. (INSIGHTS-219)
- Process the Metadata file and update the Table info table. (INSIGHTS-198)
Migration
- Handle multiple name services for HDFS connector. (MIG-180)
- Add cdp-7.1.7 service definitions for Migration reports. (MIG-177)
- Cluster Discovery report
  - Show jobs with unknown queues. (MIG-160)
- Workload Fit report
  - DFS and Non-DFS: Minimum storage selected should be the default storage. (MIG-156)
- Cloud Mapping Per Host report
  - Format numbers for cost values. (MIG-138)
Platform
- Log Receiver (LR): use DocumentStorage by default instead of the file system. (CDI-329)
- Log processing for Spark and Tez moved to a task_worker for better performance. (ASP-1212)
Reports
- Support customizable ports for ondemand. (REPORT-1477)
RBAC
- Move LDAP APIs to datastore. (RBAC-64)
- Support for custom roles and permissions. (RBAC-29)
  - Define custom roles beyond the defaults: admin, read-only, and user. (RBAC-54)
  - Define views that a role can see. (RBAC-60)
  - Define data filters to apply. You can choose from user tags, app tags, and app data fields, or write an ES query filter to meet your requirements. (RBAC-68, RBAC-69, RBAC-70, RBAC-71)
  - Generate user tags using user tagging script. (RBAC-73)
Spark
- Storage and performance for Spark SQL data. (ASP-667)
UI
- Add headers for Manage pages. (UIX-4449)
- The Last 1 hour filter is fixed for the Databricks Compute page. (UIX-4438)
- App store icon placement enhancement. (UIX-4435)
- Bring back the API Token in the User profile to copy the current token. (UIX-4423)
- Move Manage page items to the top-right header section on Unravel UI. (UIX-4414)
- Show Platform information in the Help Center dropdown. (UIX-4411)
- Show size column in data table and size KPI in the Data details page for Databricks cluster. (UIX-4404)
- Lint fixes for Manage views #2. (UIX-4074)
- Upgraded the software to use NodeJS 14.17.6. (UIX-3955, UIX-3786)
- Support to provide feedback within the product. (UIX-3880)
- Spark App Name/Id should be displayed on the Spark application page. (UIX-2137)
Utility Upgrade
- Upgrade log4j2 from 2.17.0 to 2.17.1. (CDI-419)
- Upgrade Kafka from 2.2.0 to 3.0.0. (CDI-384)

Unsupported

Billing

Unravel does not support Billing on on-prem platforms.

Data

On the Data page, File Reports, Small File reports, and file size information are not supported for MapR and cloud (EMR, Databricks, GCP) clusters.

Jobs

Impala jobs are not supported on the HDP platform.

Healthcheck

Monitoring the expiration of SSL certificates and Kerberos principals is not supported in Unravel multi-cluster deployments.

Platforms

MapR

The following features are not supported for MapR:

Impala applications
Kerberos
The following features are not supported on the Data page:
- Forecasting
- Small Files
- File Reports
The following reports are not supported on MapR:
- File Reports
- Small Files Report
- Capacity Forecasting
- Migration Planning
The Tuning report is supported only for MR jobs.
Migration Planning
AutoAction is not supported for Impala applications
Migration
Billing
Insights Overview

Migration Planning

Migration Planning is not supported for the following regions for Azure Data Lake:
- Germany Central (Sovereign)
- Germany Northeast (Sovereign)
Forecasting and Migration: In a multi-cluster environment, you can configure only a single cluster at a time. Hence reports are generated only for that single cluster.
Migration Planning is not supported for MapR.

Multi-cluster deployment

Unravel does not support multi-cluster management of combined on-prem and cloud clusters.

Pipeline

In a multi-cluster environment, Unravel does not support pipelines whose apps are sourced from different clusters. A pipeline can only contain apps that belong to the same cluster.

Reports

No reports other than the TopX report are supported on Databricks and EMR.
Memory and CPU usage metrics are not supported for TopX reports on Databricks.

Sessions

In Jobs > Sessions, the feature of applying recommendations and then running the newly configured app is not supported.

UI

Pig and Cascading applications are not supported.

Bug fixes

Applications
- The computation of the number of output rows under the App Summary > SQL tab is incorrect. (ASP-1088)
- On the Chargeback page, no applications are listed in the table. (CD1-429)
- Tez apps with insights do not show the Insight icon on the job listing page. (INSIGHTS-113)
Impala
- The Impala pipeline now retries when the CM API fails to return query profile data. (ASP-1322)
Insights
- The table involved in the join parameter is null if the operator is joining the output of other joins. (INSIGHTS-138)
- Recommendation to set 'hive.exec.reducers.bytes.per.reducer' value keeps oscillating. (INSIGHTS-153)
- Incorrect dot displayed after the DAG ID in App Summary page > Analysis tab for a Hive query. (INSIGHTS-215)
- NullPointerException (ERROR events.SparkEvents: Event for generator com.unraveldata.spark.events.SparkSQLEventGenerator could not be generated due to {} java.lang.NullPointerException at SparkEvents.generateEvents()) is resolved. (INSIGHTS-227)
- Eradicated duplicate table names involved in InefficientJoinConditionEvent. (INSIGHTS-236)
Migration
- PDFs downloaded for migration are incomplete. (MIG-153)
- Workload Fit report:
  - The error message is shown on UI when the report is running. (MIG-154)
  - Cost is not displayed for Azure (Australia Central). (MIG-178)
  - The pie chart does not work appropriately. (MIG-223)
- Cloud Mapping per Host report:
  - Host resource usages are shown as 0 for HDP. (MIG-182)
- An error message is not shown when the backend returns 500 Internal Server Error. (MIG-176)
- Cluster Discovery report: The pie chart shows some issues. (MIG-211)
- The costs available in AWS EC2 and the cost in Unravel are different. (MIG-231)

Known issues

Applications

The Workflow/Jobs page displays empty Analysis, Resources, Daggraph, and Errors tabs. (DT-1093)
Event logs and YARN logs are not loaded for some applications in Google Dataproc clusters. (PG-170)

Data page

Incorrect data is displayed in the Number of Queries KPI/Trend graph on the Overview page. (DATAPAGE-502)
The create time of a partition is not captured in the Hive metastore if the partition is created dynamically. This limits Unravel's ability to show Last Day KPIs in the partition section.
Wrong data is displayed for the Number of Partitions Created KPI/trend graph under the Partitions KPIs - Last Day section on the Data page. (DATAPAGE-473)

Databricks

Table names are not captured properly in some scenarios for Databricks runtime 8.x and above. (PG-252)
Databricks jobs are missed intermittently in Unravel. (PG-232)

Dataproc

Google Cloud Dataproc: Executor logs are not loaded for Spark applications. (PG-229)

EMR

The exception "Problem when retrieving bootstrap actions for cluster" is seen in the aws_worker daemon logs.

Workaround: While creating an AWS account for EMR Chargeback/Insights overview feature, you must include an additional entry in the Policy JSON file for "elasticmapreduce:ListBootstrapActions", as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "pricing:GetProducts",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListInstanceFleets",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ListInstances",
                "ec2:DescribeSpotPriceHistory"
            ],
            "Resource": "*"
        }
    ]
}

Even if the AWS account was already created without this entry (elasticmapreduce:ListBootstrapActions), you can add it to the policy later.
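
To check whether an existing policy document already grants every action listed in the workaround, a small script along the following lines can help. It is a sketch under assumptions: the helper name `missing_actions` is hypothetical, the policy is assumed to be available as a JSON string, and wildcard actions such as `elasticmapreduce:*` are not expanded.

```python
import json

# Actions taken from the policy shown in the workaround above.
REQUIRED_ACTIONS = {
    "pricing:GetProducts",
    "elasticmapreduce:ListClusters",
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:ListInstanceFleets",
    "elasticmapreduce:ListInstanceGroups",
    "elasticmapreduce:ListBootstrapActions",
    "elasticmapreduce:ListInstances",
    "ec2:DescribeSpotPriceHistory",
}


def missing_actions(policy_json: str) -> set:
    """Return the required actions that no Allow statement grants explicitly."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    allowed = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a plain string
            actions = [actions]
        allowed.update(actions)
    return REQUIRED_ACTIONS - allowed
```

An empty result means that the policy lists all the required actions by name.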

Email

Unravel node fails to send email notifications. (INSTALL-1694)

Insights Overview

The Insights Overview tab uses UTC as the timezone while other pages use local time. Hence, the date and time that are shown on the Insights Overview tab and the other pages after redirection can be different. (UIX-4176)
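
The effect can be reproduced with a short sketch. The timestamp and the UTC-08:00 offset below are made-up values chosen for illustration; they stand in for an event time and a user's local timezone.

```python
from datetime import datetime, timedelta, timezone

# One instant, rendered the way a UTC page and a local-time page would show it.
event_utc = datetime(2022, 1, 28, 2, 30, tzinfo=timezone.utc)
local_zone = timezone(timedelta(hours=-8))  # hypothetical local timezone
event_local = event_utc.astimezone(local_zone)

print(event_utc.strftime("%Y-%m-%d %H:%M"))    # 2022-01-28 02:30
print(event_local.strftime("%Y-%m-%d %H:%M"))  # 2022-01-27 18:30
```

Both values describe the same instant, but the calendar date differs, which is why a date shown on the Insights Overview tab may not match the date on the page you are redirected to.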

Kerberos

Kerberos can only be disabled manually from the unravel.yaml file.
```
 kerberos:
      enabled: False
```

Migration

Cloud Mapping Per Host report: Failure to get instance list for certain cloud providers. (MIG-171)
Workaround:
1. Run dbcli.
```
<Unravel installation directory>/unravel/manager run dbcli
```
2. Make the following change in the database schema:
```
ALTER TABLE celery_taskmeta CHANGE COLUMN result result MEDIUMBLOB;
```

Reports

Cluster discovery
- If the metric retrieval for a host fails, then the CPU and memory capacity/usage graphs and heatmaps are not displayed.
  This happens on a CDH cluster when the Cloudera Manager agent of a host does not send any heartbeats to the Cloudera Manager server. Such a host is shown as Bad Health in Cloudera Manager. (REPORT-1706)
  Workaround: Ensure that the Cloudera Manager agent sends heartbeats to the Cloudera Manager on all hosts and that none of the hosts are shown as Bad Health.
- The On-prem Cluster Identity may show an incorrect Spark version on CDH. The report may incorrectly show Spark 1 when Spark 2 is installed on the CDH cluster. (REPORT-1702)
When using PostgreSQL, the % sign is duplicated and displayed in the Workload Fit report > Map to single cluster tab. (MIG-42)
Cloud Mapping Per Host report scheduled in v4.6.1.x will not work in v4.7.1.0. Users must schedule a new report. (REPORT-1886)
The TopX report email contains a link to the Unravel TopX report instead of showing the report content in the email as in the old reports.
Queue analysis: The log file name (unravel_us_1.log) displayed in the error message is incorrect. The correct name of the log file is unravel_sensor.log. (REPORT-1663)

Sensor

The sensor setup script fails with unrecognized arguments. (INSTALL-1667)

Spark

There is a lag seen for SQL Streaming applications. (PLATFORM-2764)

Security

If the customer uses Active Directory for Kerberos and the samAccountName and principal do not match, this can cause errors when accessing HDFS. (DOC-755)
In AAD login mode, when an external logout happens, the user still has access to the UI session that is currently logged in. (UIX-4125)

Spark insights

For PySpark applications, the processCPUTime and the processCPULoad are not captured properly. (ASP-626)

Tez

The SQL events generator generates a SQL Like clause event if the query contains a LIKE pattern, even when the pattern appears only inside a literal. (TEZLLAP-349)

Upgrade

Notebooks will not work after upgrading to v4.7.1.0. You can configure them separately. (REPORT-1895)

If you have configured a single-cluster deployment for Unravel and the cluster name is not default, the Data page feature may not work properly.

In this case, you must explicitly set the following property after upgrading. (INSTALL-2151)

<Unravel installation directory>/unravel/manager stop
<Unravel installation directory>/unravel/manager config properties set hive.metastore.cluster.ids=<cluster-name>
<Unravel installation directory>/unravel/manager apply
<Unravel installation directory>/unravel/manager start

After you upgrade from v4.6.x to v4.7.1.0, the Tez application details page does not initially show DAG data. The DAG data is visible only after you refresh the page. (ASP-1126)

UI

On the Manage page, the DB Stats are not displayed for untracked clusters. (UIX-4171)
The new user interface (UI) can be accessed only from Chrome.
On the App summary page for Impala, the Query > Operator view is visible only after scrolling down. (UIX-3536)

Workflow

Jobs are falsely labeled as Tez apps for Oozie Sqoop and Shell actions. (PLATFORM-2403)

Support

For support issues, contact Unravel Support.
