Operational insights

4530-Rep-OpInsight-Title.png
  • Chargeback YARN - chargeback reports for YARN jobs.

  • Chargeback Impala - chargeback reports for Impala jobs.

  • Cluster summary - summary reports for cluster usage.

  • Cluster compare - compares cluster activity between two time periods on the same cluster.

  • Cluster optimization - analyzes the cluster performance and provides fine-tuning insights/recommendations.

  • Queue analysis - analyzes queue activity by apps, vCores and memory.

  • Cluster workload - shows the aggregated workload for all clusters.

  • Top X - the top X applications by various metrics, for example the longest duration and most memory.

  • Cluster KPIs - lists the basic KPIs for the cluster, including node health, apps and events over the time period.

When you can specify a date range or cluster, the pull-down menu for it is on the right-hand side of the Operational Insights title bar. By default, the page opens showing the Chargeback tab grouped by Application Type for all clusters over the last 24 hours.

Note

Click here for common features used throughout Unravel's UI.

Chargeback YARN/Impala

The Chargeback YARN and Impala tabs are identical, except that the reports are limited to YARN and Impala jobs respectively.

You can generate chargeback reports for multi-tenant cluster usage costs, sorted by the Group By options: Application Type, Real User, User, Queue, and Other (tags, tables, and realuser). The default filter is Application Type.

The tab is divided into three sections:

  • Donut graphs showing the top results for the Group By selection.

  • Chargeback report showing costs, filtered and sorted by the Group By choice.

  • List of all YARN or Impala applications.

Generate chargeback report

You can set the date range and the cluster to use for the report in the Operational Insights title bar. Use Group By to filter the information based on your selection. You must select one Group By option and may select up to two. Each time you select an Other option, it is added to the Group By options. If two Group By options are selected, the sort priority is noted. Click an option to deselect it. In this example, the report is filtered and then sorted first on Application and then on the tag project. Note that while you can Group By tags, you cannot Group By tag values. For instance, given <project, projname> you can Group By <project> but not <projname>.

Clicking a Group By selection toggles it and changes the sort priority. If only one Group By option is selected, you can't deselect it until you add another one, i.e., there must always be one Group By choice selected. Using this example, if you deselect Application, the tag project becomes the first priority and you cannot deselect it until you add a second choice. To specify fractional vCore/Hour and Memory MB/Hour costs, enter them directly into the text box. Hovering over the chart brings up a tooltip for that selection. Click Update Report to generate the report.

A new chargeback report is generated each time you change the Group By filters. However, if you change the base costs, you must click Update Report to apply them. The image below is a Chargeback YARN report. Click Download CSV above the table to download it as a CSV file.

Important

When downloading the Impala chargeback applications:

  • v4.5.5.1: Only the first 10,000 apps are downloaded.

  • v4.5.5.0 and earlier: Only the first 5,000 apps are downloaded.

4530 Reports, Operational Insights, Chargeback
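
As a rough illustration of how a two-level Group By and the per-hour costs combine, the sketch below groups hypothetical app records by application type and then by the value of the tag key project, and prices each group. The record fields, the rates, and the linear cost formula (vCore hours times the vCore/Hour cost plus memory MB hours times the Memory MB/Hour cost) are assumptions made for illustration, not Unravel's documented calculation.

```python
from collections import defaultdict

def chargeback(apps, group_by, vcore_rate, memory_rate):
    """Sum usage per Group By key and price it with the given hourly rates.

    The field names and the linear cost formula are assumptions.
    """
    totals = defaultdict(lambda: {"vcore_hours": 0.0, "memory_mb_hours": 0.0})
    for app in apps:
        # The key follows the sort priority: first group_by[0], then group_by[1].
        key = tuple(app.get(field, "unknown") for field in group_by)
        totals[key]["vcore_hours"] += app["vcore_hours"]
        totals[key]["memory_mb_hours"] += app["memory_mb_hours"]

    report = []
    for key in sorted(totals):
        usage = totals[key]
        cost = (usage["vcore_hours"] * vcore_rate
                + usage["memory_mb_hours"] * memory_rate)
        report.append((key, round(cost, 4)))
    return report

# Hypothetical apps; "project" is a tag key, its value is the project name.
apps = [
    {"app_type": "mr", "project": "etl", "vcore_hours": 2.0, "memory_mb_hours": 4096.0},
    {"app_type": "mr", "project": "etl", "vcore_hours": 1.0, "memory_mb_hours": 2048.0},
    {"app_type": "spark", "project": "ml", "vcore_hours": 4.0, "memory_mb_hours": 8192.0},
]

# Fractional rates are passed directly, as in the cost text boxes.
report = chargeback(apps, ["app_type", "project"], vcore_rate=0.5, memory_rate=0.001)
for key, cost in report:
    print(key, cost)
```

Grouping is done on the tag key (project), so the tag values (etl, ml) only appear as members of that group, which mirrors the restriction that you can Group By a tag but not by a tag value.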
Cluster summary

You can group the Cluster Summary by Applications, User, or Queue. You can choose the date range and cluster in the title bar. By default, the tab opens displaying the User information. If you group by Applications, you must then choose to Sort by vCore Seconds or Memory Seconds. Click Download Report As to download the report currently displayed, and choose either JSON or CSV format. Note that for User or Queue this is the complete report, but for Applications it is only the vCore or Memory portion of the report (whichever is displayed).

Applications

You can sort applications on vCore or Memory seconds.

4.4-Rep-ClSum-AppVcores.png
User
4.4-Rp-ClSum-User.png
Queue
4.4-Rep-ClSum-Q.png
Cluster compare

This tab opens displaying the cluster grouped by User with the Time Range and Compare With Range both set to the Last 7 Days, i.e., no comparison is displayed.

Use Group By to generate the report by User or Queue. Use the Time Range and Compare With Range pull-down menus to specify the time ranges.

Any deviation in metrics across the time ranges is highlighted (3). A green highlight with an upward arrow indicates an increase in usage, while a red highlight with a downward arrow denotes a decrease. If the Time Range or Compare With Range is invalid for the Group By choice, the row for that time range is dashed (2).

4.4-Rep-ClCmp-U.png
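
The comparison can be pictured with a small sketch that labels each user as an increase, a decrease, or unchanged between the two ranges. The metric values, the labels, and the "n/a" handling for a user that appears in only one range are assumptions made for illustration; the report's own comparison logic may differ.

```python
def compare_ranges(time_range, compare_with):
    """Label each key's change from the Compare With range to the Time Range."""
    result = {}
    for key in sorted(set(time_range) | set(compare_with)):
        if key not in time_range or key not in compare_with:
            result[key] = "n/a"   # assumption: nothing to compare against
        elif time_range[key] > compare_with[key]:
            result[key] = "up"    # increase in usage
        elif time_range[key] < compare_with[key]:
            result[key] = "down"  # decrease in usage
        else:
            result[key] = "same"
    return result

# Hypothetical app counts per user for the two ranges.
time_range = {"hdfs": 120, "root": 8, "hive": 30}
compare_with = {"hdfs": 100, "root": 11, "hive": 30, "spark": 5}

deviations = compare_ranges(time_range, compare_with)
print(deviations)
```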
Cluster optimization

Note

The OnDemand package must be installed to use this report.

This report analyzes your cluster workload over a specified period. It provides insights and configuration recommendations to optimize throughput, resources, and performance. Currently, this feature only supports Hive on MapReduce.

You can use these reports to:

  • Fine-tune your cluster to maximize its performance and minimize your costs.

  • Compare your cluster's performance between two time periods.

Reports are generated on an ad hoc or scheduled basis. All reports are archived and can be accessed via the Reports Archive tab. The tab opens displaying the last report, if any, generated.

Download or generate a report

Click Download JSON to download the displayed report in JSON. To download a prior report go to Reports Archive.

4530 OpInsights - Cluster KPIs Sched

Click Generate New Report. The default date range is one day. To change it, click the date to open the date picker and select a new Date Range. Click Run. Running replaces Run and a countdown is displayed until Unravel starts collecting the data. Generate New Report pulsates blue until the report is completed. When the report is successfully generated, a light green bar is displayed.

4.4-Rep-ClOpt-GreenBar.png

Click Schedule instead of Run to schedule the report for some future date and time. You can schedule your report to run once or regularly. All reports (successful or failed attempts) are in the Reports Archive.

Optimization report

The report has three sections.

  • Header - contains the basic report information: author, time run, and the dates used to generate the report.

44 Reps OpInsight ClstOp-Hdr.png
  • KPIs

    • Number of Jobs: Per day average

    • Number of vCore Hours: Per day average

    • Number of MapReduce Containers

    • % containers for Map

    • % containers for Reduce

    • Amount of memory (in MB) from MapReduce containers

    • % containers from Map containers

    • % containers from Reduce containers

The KPIs are a per-day average for the number of days in the report. All the insights/recommendations are based upon the analysis of all jobs.

  • Insights/Recommendations

    This section contains a tab for each app type with the relevant properties under consideration for tuning. These are cluster-wide properties, and they are the defaults for all apps. However, you can override these properties on an app-by-app basis.

    You can expand the insight tile and further drill-down for more details.

Queue analysis

Note

The OnDemand package must be installed to use this report.

You can generate a report of active queues for all your clusters or just one. The report analyzes queue activity by apps, vCores, memory, and disk. As with all reports, it can be generated on an ad hoc or scheduled basis. The tab opens displaying the last report, if any, generated. Reports are archived and can be accessed via the Reports Archive tab.

Cluster workload

Displays your cluster's YARN apps' workload across a date range using the following views:

  • Month - by date, for example, October 10.

  • Hour - by hour regardless of date, for example, 10.00 - 11.00.

  • Day - by weekday regardless of date, for example, Tuesday.

  • Hour/Day - by hour for a given weekday, for example, 10.00 - 11.00 on Tuesday.

You can filter each view by App Count, vCores Hour, and Memory Hour.

Note

Measuring vCores Hour or Memory Hour usage is straightforward: at any given point the memory or vCore is either being used or not.

The App Count isn't a count of unique app instances because apps can span boundaries, i.e., begin and end in different hours/days.

The App Count reflects the apps that were running within that interval up to and including the boundary, i.e., date, hour, day. Therefore, an app can be counted multiple times across time slices:

  • On multiple dates, for example, October 11 and 12.

  • In multiple hours, for example, 10 PM, 11 PM & 12 AM.

  • On multiple days, for example, Thursday & Friday.

  • In multiple hour/day slots.

This results in anomalies where the Sum(24 hours in Hour/Day App Count) > Sum(App Counts for dates representing the day). For instance, in the below example:

  • App Count for Wednesdays (October 10, 17 and 24) = 2492, and

  • App Count across Hour/Day intervals for Wednesday = 2526.

We point this out not because it necessarily has a significant impact in how you can use the data, but to inform you such variations exist.
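
A minimal sketch of why the counts differ, assuming each app is counted once in every date and once in every hour/day slot in which it was running. The app start and end times are hypothetical, and the counting rule is a simplification of the behavior described above, not Unravel's implementation.

```python
from collections import Counter
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday(): Monday is 0

def hours_touched(start, end):
    """Yield the start of every clock hour in which the app was running."""
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield current
        current += timedelta(hours=1)

def app_counts(apps):
    """Count each app once per date and once per (weekday, hour) slot."""
    by_date, by_hour_day = Counter(), Counter()
    for start, end in apps:
        slots = list(hours_touched(start, end))
        for day in {slot.date() for slot in slots}:
            by_date[day] += 1
        for key in {(slot.weekday(), slot.hour) for slot in slots}:
            by_hour_day[key] += 1
    return by_date, by_hour_day

# Three hypothetical apps that start on Wednesday, October 10, 2018.
apps = [
    (datetime(2018, 10, 10, 9, 15), datetime(2018, 10, 10, 9, 45)),   # one hour slot
    (datetime(2018, 10, 10, 9, 50), datetime(2018, 10, 10, 11, 5)),   # hours 9, 10, 11
    (datetime(2018, 10, 10, 23, 30), datetime(2018, 10, 11, 0, 20)),  # crosses midnight
]

by_date, by_hour_day = app_counts(apps)
wednesday_by_date = by_date[datetime(2018, 10, 10).date()]
wednesday_by_slots = sum(
    count for (weekday, _hour), count in by_hour_day.items() if weekday == WEDNESDAY
)
print(wednesday_by_date, wednesday_by_slots)
```

Here the three apps count as 3 for the date but as 5 across Wednesday's hour slots, because the second app is counted in three different hours. The third app is also counted again on Thursday, both by date and in the midnight slot.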

The tab opens in the Month view filtered on App Count for the past 24 hours. Use the Date Range date picker to change the range. We suggest using a short range, because the longer the range, the more processing time is consumed. Click App Count, vCores Hour, or Memory Hour to change the display metric. The metric you select is used for all subsequent views until changed. Click View By to change between views. The metric being used for the time range is noted immediately above the graph. When the date range is greater than one day, the Hour, Day, and Hour/Day views let you display the data as either an Average or a Sum.

See Drilling Down below for information on how to retrieve the detailed information within each view.

Month view

Displays the jobs run on a particular date. The color indicates how the day's load compares with the other days within the date range. The day with the fewest jobs/hours is Wrkld-1.png, while the days with the highest load are Wrkld-5.png. Therefore, the color of any particular day depends on the other days being displayed, e.g., when only one day is displayed it is colored Wrkld-5.png. Use Previous and Next in the month's title bar to navigate between months.

Report Clst Workload Monday
Hour, day and hour/day view

These graphs don't link jobs to any specific date at the graph level. For instance, the Hour graph shows that 856 jobs ran at 2 AM (between 2 AM and 3 AM), the Day graph that 2,492 jobs ran on Wednesday, and the Hour/Day graph that 68 jobs ran at 2 AM on a Wednesday. But none of these graphs directly indicates the date these jobs ran on. Only the Month view visually links job counts to a specific date; above we see October 10 had an app count of 822.

Each view opens using the metric selected for the prior view. For instance, if vCores Hour is used to display Month and you switch to Day it is filtered using vCores Hour.

When the DATE RANGE spans multiple days, you have the choice to display the data as either the:

  • Sum - aggregated sum of job count, vCore or memory hour during the time range (default view).

  • Average - Sum / (# of Days in Date Range).
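
The relationship between the two display modes can be sketched as follows. The per-day values are hypothetical, and the day count assumes that both the first and last date of the range are included.

```python
from datetime import date

def sum_and_average(daily_values, start, end):
    """Return (Sum, Average), where Average = Sum / number of days in the range."""
    days = (end - start).days + 1  # assumption: the range is inclusive
    total = sum(daily_values)
    return total, total / days

# Hypothetical app counts for the 2 AM slot on three consecutive days.
total, average = sum_and_average(
    [30, 42, 18], start=date(2018, 10, 10), end=date(2018, 10, 12)
)
print(total, average)
```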

Hour view

Breaks out information by hour. The interval label indicates the start, i.e., 2 AM is 2 AM - 3 AM. Hover over an interval for its details. Click the interval to drill down into it.

Rp_ClWrkld-24Hour-2am.png
Day view

Displays the jobs run on a specific weekday. Hover over an interval for its details. Click the interval to drill down into it.

Report Clst Workload Day Wednesday
Hour/Day view

This view shows the intersection of the Hour and Day graphs. The Hour graph showed that 856 jobs ran between 2 AM and 3 AM, while the Day graph (immediately above) showed that 2,492 jobs ran on Wednesday. Below we see that 68 of Wednesday's jobs (2.7%) were running between 2 AM and 3 AM.

Report Clst Workload Day 2ndWednesday
Drilling down in a workload view

Click an interval to bring up its information. In our example, we selected October 11 in the Month view which was filtered on App Count (921 apps). A list breaking out the jobs by app type is displayed. Below we see all 921 were MR jobs.

DetailsBar.png

Click closedBlock.png to display User and Queue details. User is displayed by default; click Queue to see all the queues. In this case there are two users, HDFS (910 jobs) and ROOT (11 jobs). Click Details12.png (job details) to see the running apps for that row. When there are multiple choices shown, Unravel notes which detail is being displayed by highlighting the row. Below there are three options:

  • App Type: MR

    • User: HDFS

    • User: ROOT

We selected the user ROOT, so its row is highlighted. What is being displayed is noted immediately above the table. See Applications > Applications for more information on the table. Click an app to bring it up in its APM. When you change the metric (App Count, vCores Hour, or Memory Hour), the window reverts to displaying the graph.

Report Clst Workload Month WithApps
Top X

Note

The OnDemand package must be installed to use this report.

This tab generates two reports:

  • The top X Hive, Impala, and Spark apps for the following categories:

    • Longest Duration: Time to completion.

    • Highest Disk I/O: Summary of total dfs bytes read and written.

    • Highest Cluster Usage: Summary of map/reduce slot duration.

    • Highest CPU Usage: vCore seconds (Hive on Tez not supported)

    • Highest Memory Usage: Memory seconds (Hive on Tez not supported)

  • User Report: a Top X report mailed to specific users. (Available in Unravel 4.5.3.0 and later.)

Executive KPIs
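
Conceptually, each category is a "largest X by one metric" selection. The sketch below shows that idea with Python's heapq.nlargest; the app records and field names are hypothetical and are not taken from Unravel's data model.

```python
import heapq

def top_x(apps, metric, x):
    """Return the x apps with the largest value for the given metric."""
    return heapq.nlargest(x, apps, key=lambda app: app[metric])

# Hypothetical app records.
apps = [
    {"id": "app-1", "duration_s": 120, "memory_seconds": 9000},
    {"id": "app-2", "duration_s": 840, "memory_seconds": 4000},
    {"id": "app-3", "duration_s": 300, "memory_seconds": 15000},
    {"id": "app-4", "duration_s": 60, "memory_seconds": 500},
]

longest = [app["id"] for app in top_x(apps, "duration_s", 2)]
most_memory = [app["id"] for app in top_x(apps, "memory_seconds", 2)]
print(longest, most_memory)
```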

Note

The OnDemand package must be installed to use this report.

This report lets you quickly see the overall health of your cluster. You can also schedule it to be emailed to users on a regular basis. The report has six tiles.

  • Overall Health of Platform

  • Resources

  • Nodes

  • Usage (HDFS) across the cluster

  • YARN Consumers

  • Impala Consumers

By default, this page displays the data for the past hour. You can change the time range by clicking Schedule Report, then clicking History (Date Range) and selecting an available time period. Click Apply. The report is updated for the time period selected. If you wish to schedule a report, select your date range and then click Schedule. Once the report is generated, it is sent to all listed recipients. See Scheduling for an explanation.

4530-Rep-OpIn-ClstKPIs-SelTimeRange.png
Overall health of the cluster

KPIs across the entire cluster.

4530-Rep-OpIn-ClstKPI-OverallHealth.png
Resources

Graphs the available and allocated vCores and memory for the entire cluster.

4530-Rep-OpIn-ClstKPI-Res.png
Nodes

Graphs the total number of nodes and the breakdown by node status: active, lost, unhealthy, decommissioned, and rebooted.

Total = Active + Unhealthy

Where:

Active: currently running and healthy nodes.
Unhealthy: currently running and unhealthy nodes.
4530-Rep-OpIn-ClstKPI-Node.png
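
The total can be reproduced from a list of node statuses as in the sketch below. The status strings and the sample cluster are hypothetical; only the formula Total = Active + Unhealthy comes from the description above.

```python
from collections import Counter

def node_summary(statuses):
    """Count nodes per status and compute Total = Active + Unhealthy."""
    counts = Counter(statuses)
    total = counts["active"] + counts["unhealthy"]
    return counts, total

# Hypothetical cluster: lost and decommissioned nodes are not part of the total.
statuses = ["active"] * 8 + ["unhealthy"] * 2 + ["lost", "decommissioned"]
counts, total = node_summary(statuses)
print(dict(counts), total)
```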
Usage (HDFS) across the cluster (forecasting report)

This is the most recent Reports > Data Insights > Forecasting report for HDFS disk capacity.

4530-Rep-OpIn-ClstKPI-Fore.png
YARN and Impala consumers

These tables show the databases and all the tags associated with the YARN or Impala jobs.

The following examples have eight sections.

  • By DBS: Databases the YARN or Impala apps use.

  • By Dept: Tag key.

  • By Inputtables: Input tables.

  • By Outputtables: Output tables.

  • By Project: Tag key.

  • By Realuser: User who submits the app.

  • By unravel.app.name: App name.

  • By User: User who runs the app.

Each section of the table lists all its members. For instance, DBS contains a row for each database while Project contains all the values for the tag.

When no data is found, only the By User tables are shown for YARN and Impala, since the user is the basis of the YARN and Impala tables.

YARN consumers

Table columns

  • Section Name: Each row lists a section member, for example, DBS will list all the databases.

  • App Count: Number of apps which accessed the section member.

  • CPU Hours: Aggregated CPU hours across all the apps accessing the section member.

  • % Total CPU Hours: Percentage of the (Total CPU Hours)/(Aggregated CPU Hours across all apps).

  • Memory Hours: Aggregated memory hours across all the apps accessing the section member.

  • % Memory Hours: Percentage of the (Total Memory Hours)/(Aggregated Memory Hours across all apps).

Reports OpInsights Cluster KPIs-YARN
Impala consumers

Table Columns

  • Section Name: Each row lists a section member, for example, DBS has a row for each database.

  • App Count: Number of apps which accessed the section member.

  • Total Processing Time Hours: Aggregated processing hours across all the apps accessing the section member.

  • % Total Processing Hours: Percentage of the (Total Processing Time Hours)/(Aggregated Processing Time Hours across all apps).

  • Memory Hours: Aggregated memory hours across all the apps accessing the section member.

  • % Memory Hours: Percentage of the (Total Memory Hours)/(Aggregated Memory Hours across all apps).

Reports OpInsights Cluster KPIs-Impala
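
The percentage columns in both consumer tables can be read as each section member's share of the cluster-wide total. The sketch below follows that reading for the YARN columns; the interpretation, the field names, and the sample figures are assumptions, and the totals are passed in separately because one app can access several section members.

```python
def percent_of_total(member_hours, total_hours):
    """Return the member's share of the total as a percentage."""
    if total_hours == 0:
        return 0.0
    return round(100.0 * member_hours / total_hours, 2)

def consumer_table(section, total_cpu_hours, total_memory_hours):
    """Build one row per section member with its share of the totals."""
    rows = []
    for name, usage in sorted(section.items()):
        rows.append({
            "section_member": name,
            "app_count": usage["app_count"],
            "cpu_hours": usage["cpu_hours"],
            "pct_cpu_hours": percent_of_total(usage["cpu_hours"], total_cpu_hours),
            "memory_hours": usage["memory_hours"],
            "pct_memory_hours": percent_of_total(usage["memory_hours"], total_memory_hours),
        })
    return rows

# Hypothetical "By DBS" section with two databases.
by_dbs = {
    "sales": {"app_count": 12, "cpu_hours": 25.0, "memory_hours": 10.0},
    "default": {"app_count": 40, "cpu_hours": 60.0, "memory_hours": 20.0},
}
rows = consumer_table(by_dbs, total_cpu_hours=100.0, total_memory_hours=40.0)
for row in rows:
    print(row)
```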