v4.6.1.0 Release notes
Software version
Release Date: 05/18/2020
See v4.6.1.0 for download information
Software upgrade support
All that is required is an RPM upgrade. The following upgrade paths are supported for v4.6.1.0:
- 4.5.4.x to 4.6.1.0 
- 4.5.5.x to 4.6.1.0 
- 4.5.3.x to 4.6.1.0 
- 4.3.1.x to 4.5.0.x to 4.6.1.0 
Sensor upgrade
- A sensor upgrade is mandatory. 
Certified platforms
You must review your platform's compatibility matrix before you upgrade or install.
Updates to Unravel's configuration properties
- Refer to v4.6.x - Updates to Unravel properties. 
Unsupported
- AutoAction 
- Databricks jobs orchestration via services like ADF 
- Notebooks on interactive clusters 
- Spark Program / Query Graph for Notebook and Python tasks 
- Chargeback view by custom tags 
- Cost and Instance Recommendations for Jobs on AWS Databricks 
- Unravel's APIs 
- Sessions 
- Role-based Access Control (RBAC) 
- HDFS path support in the Spark source code display feature.
- Data Insights for:
  - Workload
  - Spark
- Reports:
  - Small files
  - Cluster optimization
  - Notebooks
  - Top-X
  - Forecasting
  - Migration planning
  - Queue analysis
- Datapage:
  - Size created of the table
  - Total size
  - Accessed partitions
  - Size created of the partitions
- Oozie on EMR 
- Missing table and column statistics events. 
Migration Planning is not supported for the following regions for Azure Data Lake:
- US DoD East 
- US DoD Central 
- Germany Central (Sovereign) 
- Germany Northeast (Sovereign) 
New features
Unravel for Databricks (Azure and AWS)
- Support for jobs running on interactive clusters. 
- Support for visibility into costs incurred by applications on Azure Databricks.
- New recommendations on cost and instance sizing on Azure Databricks. 
- Support for Spark Streaming apps. 
- Support for Unravel tagging to add Custom Application Tags. 
Reporting
- Added property com.unraveldata.report.excluded.queues that allows you to exclude some queues from metrics collection and reports in the Queue Analysis report. 
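As a rough illustration of what this exclusion amounts to, the sketch below filters per-queue metrics against the property's value. It assumes the value is a comma-separated list of queue names in a Java-style properties line. That value format, the sample queue names, and the helper functions are assumptions made for illustration and are not taken from Unravel's documentation.

```python
PROPERTY = "com.unraveldata.report.excluded.queues"


def parse_excluded_queues(properties_text):
    """Return the set of queue names listed for PROPERTY.

    Assumes a 'key=value' line whose value is comma-separated (an assumption).
    """
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROPERTY:
            return {q.strip() for q in value.split(",") if q.strip()}
    return set()


def filter_queue_metrics(metrics, excluded):
    """Drop the metrics of every queue named in `excluded`."""
    return {queue: m for queue, m in metrics.items() if queue not in excluded}


# Hypothetical sample data: queue names and metric values are made up.
sample_properties = "com.unraveldata.report.excluded.queues=root.tmp, root.adhoc\n"
sample_metrics = {"root.etl": 120, "root.tmp": 5, "root.adhoc": 9}

excluded = parse_excluded_queues(sample_properties)
print(filter_queue_metrics(sample_metrics, excluded))
```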
Spark Data Pipeline
- Stage timeline data is displayed live. 
- Added properties that allow skipping event log loading and executor log loading.
- Support for loading the Spark program zip from an HDFS path with a new Spark property: spark.unravel.program.path.
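One way the new property might be passed is through spark-submit's standard --conf key=value option. The sketch below only assembles such a command line. The HDFS path, the application file name, and the helper function are hypothetical, and whether the property expects the zip file itself or its parent directory is not stated here, so treat the value as a placeholder.

```python
import shlex


def build_spark_submit(app, program_path, extra_conf=None):
    """Assemble spark-submit arguments that set spark.unravel.program.path."""
    conf = {"spark.unravel.program.path": program_path}
    conf.update(extra_conf or {})
    args = ["spark-submit"]
    for key in sorted(conf):  # sorted for a stable, readable command line
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app)
    return args


# Hypothetical application and HDFS location.
cmd = build_spark_submit("etl_job.py", "hdfs:///apps/unravel/programs/etl_job.zip")
print(shlex.join(cmd))
```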
Spark Insights
- Added Timings tab in Spark APM, which shows the time breakdown of apps and tasks. 
- Added Bottleneck Analyzer, which detects the application area that is consuming the most time and provides recommendations. 
Data Insights
- Spark implementation for FSImage (Small Files) processing. 
Datapage
- Register table accesses from Spark and SparkSQL apps in the Data Insights framework. 
- Missing table and column statistics events. 
- Register table accesses from Hive queries in the Data Insights framework. 
- Provide insights on compression in the inefficient table format event.
- Generate the small files event in the Data Insights framework.
- Get table and partition sizes from Fsimage to dashboard_summaries and table__info. 
- Provide insights/recommendations on file formats in case of an inefficient format event. 
User Interface (UI)
Added an interface to the Manage section to manage API tokens. An administrator can:
- View the list of tokens that are authorized for third parties.
- Generate the API tokens. 
- Delete the API tokens. 
Sensor
- Added support in the MR and Tez sensors to read the cluster ID from extraJavaOptions and env.
- Provided a way to set the cluster ID used in the LR URL to which BTrace sensors POST their data.
Operations
Unravel now detects inefficient and badly formed tables. This information is shown in the Inefficient Tables tile under Operations > Dashboard. An app version of the table events is also shown in the Hive-MR, Hive-Tez, Impala, and Spark APMs.
Limitation: Only the following are supported:
- On-premise (CDH, HDP) 
- HDFS file system 
- Hive metastore 
Improvements and Enhancements
Data page
- Added paging for scalable retrieval of Impala queries when building the data page.
- Added an index on column d2 of table dashboard_summaries.
- Use the start_time field instead of created_at in retrieveImpalaQueryInfoByTimeRange().
- Group table events by table. 
- Support External Tables/Partitions and Nested Partitions in Fsimage size calculation. 
- Minimize exception logging when getting the DFS path, plus miscellaneous changes.
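The paging and start_time items above can be pictured with a small sketch. It uses keyset paging over an in-memory SQLite table, which is only one possible paging technique. These notes do not say how Unravel implements it. The table name impala_queries, its columns, and the function name are hypothetical.

```python
import sqlite3


def iter_queries_by_time_range(conn, start, end, page_size=2):
    """Yield (id, start_time) rows with start <= start_time < end, page by page.

    Keyset paging: each page resumes after the last (start_time, id) seen,
    so no page has to skip over rows that were already returned.
    """
    last_time, last_id = start, -1  # assumes non-negative ids
    while True:
        rows = conn.execute(
            "SELECT id, start_time FROM impala_queries "
            "WHERE start_time < ? "
            "AND (start_time > ? OR (start_time = ? AND id > ?)) "
            "ORDER BY start_time, id LIMIT ?",
            (end, last_time, last_time, last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id, last_time = rows[-1]


# Hypothetical schema and data, held in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impala_queries (id INTEGER PRIMARY KEY, start_time INTEGER)")
conn.execute("CREATE INDEX idx_start_time ON impala_queries (start_time, id)")
conn.executemany(
    "INSERT INTO impala_queries VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 105), (4, 110), (5, 200)],
)

print([row[0] for row in iter_queries_by_time_range(conn, 100, 200)])
```

Filtering and ordering on the same indexed column is what lets each page be fetched with a bounded scan.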
Reporting
- Improved Capacity Forecasting graph UX. Clear demarcation of actual vs predicted; capacity stays flat rather than predicting capacity increases. 
- Enable updating of sizes to table-info. 
- Implement Fsimage (aka Small Files) using Spark instead of Hive. 
Impala
- Added support for tagging Impala queries with workflows for CDH versions that are later than 5.13. 
- Impala Operator status and KPIs are captured correctly and are more consistent.
- Various improvements in Impala events. 
- Certified for CDP. 
- Changed the default skew duration threshold for the time skew event from 0.5 s to 3 s.
- Recommend using LIMIT only for a large number of rows.
- Consider absolute time, in addition to the ratio, when identifying "long row fetch".
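The "long row fetch" change can be read as combining a relative test with an absolute one, so that short queries are not flagged merely because fetching dominates their tiny runtime. A minimal sketch of that idea follows. The function name and both threshold values are hypothetical and do not reflect Unravel's actual defaults.

```python
def is_long_row_fetch(fetch_seconds, total_seconds, min_ratio=0.5, min_seconds=10.0):
    """Flag a query only if row fetching is large both relatively and absolutely.

    min_ratio and min_seconds are illustrative values, not product defaults.
    """
    if total_seconds <= 0:
        return False
    ratio = fetch_seconds / total_seconds
    return ratio >= min_ratio and fetch_seconds >= min_seconds


# High ratio but negligible absolute time: not flagged.
print(is_long_row_fetch(0.4, 0.5))
# High ratio and substantial absolute time: flagged.
print(is_long_row_fetch(45.0, 60.0))
# Substantial absolute time but small share of the query: not flagged.
print(is_long_row_fetch(12.0, 600.0))
```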
AutoAction
- Added an AutoAction violation badge for Hive.
Spark Data Pipeline
- Store the processed SQL data from the event log if not all live data is received by SW.
- Improvements to fetch query plan. 
- Job group is displayed with the job ID if present. 
- Added a sensor configuration property for a shutdown delay (spark.unravel.shutdown.delay.ms).
- Application end time and duration are updated with the live BTrace sensor data.
- Spark pipeline improvements to process executor logs with a new Kafka message. 
Kafka
- Add version tag in RequestMetrics:RequestsPerSec. 
- Support for Kafka 2.x. 
Workflow
- Stability fixes for Workflow.
- # OF APPS is displayed correctly for Tagged and Oozie workflows.
- Improved the workflow status updating logic (Stale workflow).
- Support for Oozie 5.0+. 
- Added the Stale Oozie Workflow Task to update the Workflows that are stuck in RUNNING state. 
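The stale-workflow idea can be sketched as a periodic check that picks out workflows still marked RUNNING long after their last update. The record layout, the 24-hour cutoff, and the function name below are assumptions for illustration. The notes do not describe how the actual task decides that a workflow is stale.

```python
from datetime import datetime, timedelta


def find_stale_workflows(workflows, now, max_idle=timedelta(hours=24)):
    """Return ids of workflows that are RUNNING but idle for longer than max_idle."""
    return [
        wf["id"]
        for wf in workflows
        if wf["status"] == "RUNNING" and now - wf["last_update"] > max_idle
    ]


# Hypothetical workflow records.
now = datetime(2020, 5, 18, 12, 0, 0)
workflows = [
    {"id": "wf-1", "status": "RUNNING", "last_update": now - timedelta(days=3)},
    {"id": "wf-2", "status": "RUNNING", "last_update": now - timedelta(minutes=5)},
    {"id": "wf-3", "status": "SUCCEEDED", "last_update": now - timedelta(days=9)},
]

print(find_stale_workflows(workflows, now))
```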
Data Insights
- Data page skips calculating the size of non-existing paths and prevents exceptions as a result. 
- Paging retrieval of Impala queries gets the app access info. 
- Reduce the frequency of checking Kerberos authentication in size calculation. 
Spark Insights
- Improved number of cores and executor recommendations. 
Migration Planning
- Added new properties for Migration Planning workload fit report and heat map generation:
  - com.unraveldata.migrationplanning.workloadfit.max.apps - Configure the limit on the maximum number of apps that can be shown in a single slice of workload fit report and heat map generation.
  - com.unraveldata.migrationplanning.workloadfit.timeout - Configure the timeout for workload fit report and heat map generation.
Customer fixes
- API Call Consolidation: Bulk endpoint for apps with the summary. (CUSTOMER-676) 
- Users are not seeing all Impala applications that are based on the tagging script. (EAR-52) 
- Distinguishing which AutoAction triggered an event for a given application does not work. (CUSTOMER-942)
- AutoAction send email action with & in the email address results in the error "The email address is not valid." (CUSTOMER-1143)
- AutoAction Templates - Change the Elapsed time to Seconds. (CUSTOMER-841) 
- Add queue dropdown in AA filters. (CUSTOMER-954) 
- AutoAction for failed spark jobs is not working (CUSTOMER-1160) 
- Connecting HDI Clusters across Azure subscriptions to a single Unravel. (CUSTOMER-676) 
- HitdocLoader does not start due to: [ERROR:ELASTIC] Elastic daemon mismatched indexes. (CUSTOMER-1265) 
- The Data Insights Details page is blank in the customer's large production cluster. (CUSTOMER-185)
- Data Insights are not working appropriately. (CUSTOMER-754) 
- Additional metrics to the data insights - detailed report. (CUSTOMER-178) 
- Add support for Oracle Hive Metastore (Hive libs do more operations than just read-only). (CUSTOMER-265) 
- Data Insights not fully showing metrics against Oracle Hive Metastore seeing lots of ORA-01031: insufficient permissions. (CUSTOMER-479) 
- Data Insights report - feature requests. (CUSTOMER-546) 
- Data insight details page is not showing any data. (CUSTOMER-1153) 
- Sorting by partitions in DataInsights Details is unusable. (CUSTOMER-1164) 
- Data insights column sort in playground 3 broken (Partitions and RP) (CUSTOMER-914) 
- Data page: Display each table's data storage format (ORC/Parquet etc.) (CUSTOMER-391) 
- Complete documentation for setting configs around cluster type, and name vs id. (CUSTOMER-610) 
- Race condition while processing the hive queries. (CUSTOMER-1196) 
- Unravel attempts to write a DELETEME table to Hive Metastore (CUSTOMER-466) 
- The dependency issue on HiveHook jar is causing sqoop job to fail. (CUSTOMER-1196) 
- Need to support ADLS Gen 2. (CUSTOMER-888) 
- Filter Impala queries by cluster in multi-cluster CM deployments. (CUSTOMER-1216) 
- Impala apps ignored due to error: DocFieldValue of "counters" is too large <= 32766 (CUSTOMER-1007)
- On-demand Install fails because SUDO does not exist and ondemand_quick_install.sh and ondemand_install.sh contain SUDO. (CUSTOMER-1303) 
- Missing Swagger entries and documentation about Unravel APIs for Kafka. (CUSTOMER-906) 
- Display the latest cumulative value for the metrics across all Kafka brokers. (CUSTOMER-767)
- Merged forever_ngui.log with unravel_ngui.log. (CUSTOMER-1246) 
- LDAP configuration is correct yet users cannot see Admin/Manage functionality. (EAR-51) 
- MapReduce Application detail page does not load for non-admin users. (CUSTOMER-1101) 
- Unravel sensor script assumes Spark is installed. (CUSTOMER-404) 
- SENSOR: Unravel Spark sensor appears to be causing java.lang.ClassCircularityError errors when firing against Data Profiler Agent spark jobs. (EAR-18) 
- Restrict TLS protocols for Log Receiver communication. (CUSTOMER-669) 
- Pushlogs.sh fails to create a tar file for log file. (CUSTOMER-1126) 
- LDAP: Slow group queries (CUSTOMER-1176) 
- OpenJDK8U-jre_x64_linux_hotspot_8u232b09 in 4601 branch. (CUSTOMER-1244, CUSTOMER-1245) 
- Support cron job that updates Unravel keytab without requiring Unravel restart for password rotation. (CUSTOMER-828) 
- Streamline Installation Process for On-Premise Hadoop. (CUSTOMER-87) 
- Tagging based on Database Names in SQL coming from Hive, Impala, and SparkSQL. (CUSTOMER-1005) 
- Change connector log entries for missing/incorrect configs to ERROR. (CUSTOMER-1030) 
- Ability to tag based on tables, databases used by applications. (CUSTOMER-1080) 
- Realuser tagging configured during installation. (CUSTOMER-116) 
- Many applications (Spark, MapReduce) are missing logs and metrics within the cluster. (CUSTOMER-152) 
- Unravel showing MapReduce job in RUNNING state, days after the job actually completed. (CUSTOMER-228) 
- Add the ability to retain ES data longer and ensure it is accessible from UI and is responsive. (CUSTOMER-266) 
- Unravel logs are filling up the disk. (CUSTOMER-982) 
- Fix for benchmarks to work with HDInsight, EMR platforms. (CUSTOMER-1106) 
- UnravelListener throws an exception. (CUSTOMER-1118) 
- No Hive jobs are showing up within the cluster and errors are seen within unravel_lr (CUSTOMER-506) 
- Support custom location for Kerberos client configuration. (CUSTOMER-774) 
- SmallFiles Report Failed with unravel-udf-0.2.jar missing error. (CUSTOMER-676) 
- Added Cluster Name to identify Cluster KPI reports. (CUSTOMER-1079) 
- Small files: Concerns on configuring Small Files / Files Report. (CUSTOMER-783) 
- Provide reporting on I/O usage. (CUSTOMER-1336) 
- In the Cluster Compare tab, the Time Range and the Compare With Range are both set to a default of 7 days. (CUSTOMER-928)
- Small Files - Recommendations. (CUSTOMER-818) 
- Queue Analysis tab is throwing error SQL if queue names have quotes or special characters. (CUSTOMER-657) 
- Migration Planning reports fail to generate when RM HA is enabled (success or failure depends on which RM is active). (CUSTOMER-1304) 
- Hive / Spark analysis notebooks: CLI report generation fails with the cryptic error message "there were no matching apps in the filter criteria". (CUSTOMER-1339)
- Spark Workload Analysis Notebook: allow custom processing of app names to handle a specific app name convention to group runs together. (CUSTOMER-1340) 
- Capacity Forecasting Report showing incorrect total HDFS Capacity. (EAR-57) 
- Reports load slowly in Anthem environment. (CUSTOMER-181) 
- Incorporate Hive table to a path in small files report (CUSTOMER-461) 
- 0 apps in the heatmap is too red in the playground 3, which misleads the user to assume that the cluster is hot (CUSTOMER-908) 
- Creating/analyzing Sessions periodically fails with the message: UnravelLogger is not callable. (CUSTOMER-1233)
- Spark application hangs and fails to exit after Unravel Sensor is installed. (CUSTOMER-1201) 
- Timeline for Spark Stages does not render. (CUSTOMER-1200)
- The user is unable to go to spark-shell when Unravel-Kafka is down. (CUSTOMER-1152) 
- Spark recommendations default to spark.default.parallelism. (CUSTOMER-1177) 
- Informatica Integration: Spark job run by Informatica is showing no data in Unravel. (CUSTOMER-1053)
- Do not show Athena Preview from the Applications List if Athena is not set up for that Unravel deployment. (CUSTOMER-930) 
- The Result section displays the default web page for the Web server. (CUSTOMER-1287) 
- Need for a property that disables the Kill/Move feature in UI. (CUSTOMER-889) 
- Usability concerns with date pickers. (CUSTOMER-474) 
- Pig jobs show 0 as the duration for all MR jobs. However, the Gantt chart view shows the duration. (CUSTOMER-1235) 
- Added error handling for generate_app_token.sh. (CUSTOMER-1306) 
- Enabled RBAC for Cluster summary and compare for user role. (CUSTOMER-165) 
- Auto-refresh should save state and can be configurable (New UX Preview). (CUSTOMER-174) 
- Selecting timeframe clears previously selected parameters in the Usage Details Report (New UX Preview). (CUSTOMER-206) 
- More advanced search capabilities and consistent behavior across all search boxes (New UX Preview). (CUSTOMER-341) 
- Keep user preferences in UI (New UX Preview). (CUSTOMER-744) 
- Save page filters for the duration of a session (New UX Preview). (CUSTOMER-489) 
- The Chargeback report's CSV download only gets 1000 pages. (CUSTOMER-748) 
- Running and submitted applications are both blue in the UI. A better color can be chosen instead. (CUSTOMER-823) 
- Updates on the Applications page remove any customizations to the table (New UX Preview). (CUSTOMER-84)
- Ability to list all events in application screen for recommended settings or run report vs clicking on individual jobs. (New UX Preview) (CUSTOMER-898) 
- Spark - Job-id search inside Spark Navigation tab is not working. (CUSTOMER-1327) 
- Better ergonomics within the UI when viewing Spark applications (pySpark). (CUSTOMER-299) 
- Ability to search inefficient applications list by app name, user, tables, etc. (CUSTOMER-581) 
- Application filter ignored when FILTER BY APP NAME is also used. (CUSTOMER-807) 
- Persist the selected time period when navigating between tabs within Unravel. (CUSTOMER-86) 
- Data Correctness - Unravel UI doesn't match the resource manager. (CUSTOMER-395) 
- Support API Tokens when using SAML Authentication for UI. (CUSTOMER-908) 
- Improved information when the sensor metrics are missing, especially when the sensor data is not live or sensor configuration was overwritten. (CUSTOMER-831) 
- Workflow search still not working. (CUSTOMER-997) 
- Many completed workflows are stuck in RUNNING state, and the workflow duration statistics are incorrect. (CUSTOMER-1251) 
- Spark and Tez apps not showing in Oozie workflows. (CUSTOMER-1049) 
- Workflow tagging for Impala queries does not work post CDH 5.13. (EAR-38) 
Bug fixes
Data page
- Added paging for scalable retrieval of Impala queries when building the data page, added an index on column d2 of table dashboard_summaries, and switched retrieveImpalaQueryInfoByTimeRange() to use the start_time field instead of created_at.
- Reduced the thread sleep time in unravel_tw.
Reporting
- Implement retention for values in Master Fsimage. (REPORT-1270) 
- Memory and CPU data are missing for Hive on MR apps in the TopX report. (REPORT-1165) 
- get_hive_query_status(): internal error message not found in unravel_ondemand.out. (REPORT-732) 
- One of the small files reports shows No Data Found when small files reports are run in parallel (race condition). (REPORT-660)
- Queue analysis now receives the correct metrics when the secondary resource manager is active in the HA configuration. (REPORT-393) 
- Queue analysis failed with list indices error when a specific cluster is selected. (REPORT-370) 
- Queue analysis report graphs are not legible. (REPORT-342) 
- Analyzing queues for multiple clusters may cause overlapping of metrics. (REPORT-297) 
- Capacity forecasting, cluster discovery, and migration planning reports failed if the value of the property com.unraveldata.cluster.name did not match the actual cluster name/ID. A new property, unravel.python.reporting.cluster.name, indicates the cluster for which these reports should be run. (REPORT-1424)
- Queues missing in Queue analysis report in HDP environments. (REPORT-294) 
Impala
- A loop is executed while generating events for failed Impala queries. (IMPALA-209) 
- Killed Impala queries are incorrectly classified as failed. (IMPALA-206) 
Platform
- The tagged workflow shows the non-tagged Tez application. (PLATFORM-2158) 
- Exception while downloading event log on clusters where Kerberos is not enabled. (PLATFORM-1646) 
- Queue Metrics Sensor stops polling after some time when higher polling rates are set. (PLATFORM-1563)
- Operations dashboards do not support multi-cluster and have incorrect aggregations. 
- Operations Nodes Dashboard does not capture cluster inactivity in graphs. 
- Spark Application with the same application ID is captured as one. 
- Spark Program / Query graph for Notebook and Python tasks is not supported. 
- Spark default Databricks extraJavaOptions are overwritten by Unravel for spark-submit tasks. 
- DriverOOME and ExecutorOOME events are not generated for the Databricks notebook task. 
- Recommended Azure instances available in Cluster page but not at run time. 
- Recommended Azure instances could be in Beta mode only. 
- Instance recommendation is missing when EMDB is used. 
- The Violation Badge functionality for AutoAction is not working for Impala queries (Running, Killed). (AA-44) 
- EMR: Hive metrics are not published in RUNNING state. (HIVE-135) 
- Latency in fetching the data for MR jobs. (PLATFORM-1613) 
- API connection error while Polling impalad metrics from CM. (PLATFORM-1567) 
- 'conflicted ephemeral node' or 'Corrupt index found' errors. (PLATFORM-702)
- gc load metric sensor for MR application will not load on EMR. 
- For PySpark applications, the processCPUTime and the processCPULoad are not captured properly. (USPARK-626) 
- Partition size 0 is shown in the insight message on the timings tab. (USPARK-647) 
- # of Apps is incorrect (PLATFORM-2403)