Operational insights

The following reports are included:

  • Chargeback YARN - chargeback reports for YARN jobs.

  • Chargeback Impala - chargeback reports for Impala jobs.

  • Cluster summary - summary reports for cluster usage.

  • Cluster compare - compares cluster activity between two time periods on the same cluster.

  • Cluster optimization - analyzes the cluster performance and provides fine-tuning insights/recommendations.

  • Queue analysis - analyzes queue activity by apps, vCores and memory.

  • Cluster Yarn workload - shows the aggregated workload for all clusters for Yarn applications.

  • Cluster Impala workload - shows the aggregated workload for all clusters for Impala applications. This is supported only for CDH platforms in Unravel version 4.6.17.

  • Schedule Jobs - predicts the best-suited time slot to schedule a job.

  • Top X - the top X applications by various metrics, for example, the longest duration and most memory.

  • Cluster KPIS - lists the basic KPIs for the cluster, including node health, apps, and events over the time period.

When you can specify a date range or cluster, the pull-down menu for it is on the right-hand side of the Operational Insights title bar. By default, Operational Insights opens showing the Chargeback tab grouped by Application Type for all clusters over the last 24 hours.

Note

See Common UI Features for general information and common features about Unravel's UI.

Chargeback YARN/Impala

The Chargeback YARN and Impala tabs are identical except the reports are limited to YARN and Impala jobs respectively.

You can generate chargeback reports for multi-tenant cluster usage costs, sorted by the Group By options: Application Type, Real User, User, Queue, and Other (tags, tables, and realuser). The default is Application Type.

The tab is divided into three sections:

  • Donut graphs showing the top results for the Group By selection.

  • Chargeback report showing costs, filtered and sorted by the Group By choice.

  • List of all YARN or Impala applications.

Generate chargeback report

You can set the date range and the cluster to use for the report in the Operational Insights title bar. Use Group By to filter the information based on your selection. You must select at least one Group By option and may select up to two. Each time you select an Other option, it is added to the Group By options. If two Group By options are selected, the sort priority is noted. Click an option to deselect it. In this example, the report is filtered and then sorted first on Application, then on the tag project. Note that while you can Group By tags, you cannot Group By tag values. For instance, given <project, projname>, you can Group By <project> but not <projname>.

Clicking a Group By selection toggles it and changes the sort priority. If you have only one Group By option selected, you can't deselect it until you add another one, i.e., there must always be one Group By choice selected. Using this example, if you deselect Application, the tag project becomes the first priority and you cannot deselect it until you add a second choice. To specify fractional vCore/Hour and Memory MB/Hour costs, enter them directly into the text box. Hovering over the chart brings up a tooltip for that selection. Click Update Report to generate the report.

A new chargeback report is generated each time you change the Group By filters. However, if you change the base costs, you must click Update Report to apply them. The image below is a Chargeback YARN report. Click Download CSV above the table to download it as a CSV file.
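
The arithmetic behind such a report can be sketched as follows. This is a minimal illustration, not Unravel's implementation: the record fields (`app_type`, `project`, `vcore_hours`, `memory_mb_hours`), the function name `chargeback`, and the linear cost formula are all assumptions made for the example.

```python
from collections import defaultdict

def chargeback(apps, group_by, vcore_rate, memory_rate):
    """Aggregate usage and cost per group (hypothetical sketch).

    apps: iterable of dicts with the assumed fields named in the lead-in.
    group_by: one or two field names, in sort-priority order.
    vcore_rate / memory_rate: cost per vCore hour and per memory MB hour.
    """
    if not 1 <= len(group_by) <= 2:
        raise ValueError("select one or two Group By options")
    totals = defaultdict(lambda: {"vcore_hours": 0.0, "memory_mb_hours": 0.0})
    for app in apps:
        key = tuple(app[field] for field in group_by)
        totals[key]["vcore_hours"] += app["vcore_hours"]
        totals[key]["memory_mb_hours"] += app["memory_mb_hours"]
    report = []
    for key in sorted(totals):  # first Group By has priority, then the second
        usage = totals[key]
        cost = (usage["vcore_hours"] * vcore_rate
                + usage["memory_mb_hours"] * memory_rate)
        report.append((key, round(cost, 2)))
    return report

# Made-up sample data, not taken from a real cluster.
apps = [
    {"app_type": "spark", "project": "etl", "vcore_hours": 10.0, "memory_mb_hours": 2000.0},
    {"app_type": "mr", "project": "etl", "vcore_hours": 4.0, "memory_mb_hours": 1000.0},
    {"app_type": "spark", "project": "ads", "vcore_hours": 6.0, "memory_mb_hours": 500.0},
]
report = chargeback(apps, ["app_type", "project"], vcore_rate=0.5, memory_rate=0.001)
```

Changing either rate only changes the cost column, which mirrors why new base costs need an explicit Update Report while a new Group By regroups the same usage data.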

Important

When downloading the Impala chargeback applications:

  • v4.5.5.1: Only the first 10,000 apps are downloaded.

  • v4.5.5.0 and earlier: Only the first 5,000 apps are downloaded.

4530 Reports, Operational Insights, Chargeback
Cluster summary

You can group the Cluster Summary by Applications, User, or Queue. You can choose the date range and cluster in the title bar. By default, the tab opens displaying the User information. If you group by Applications, you must then choose to Sort by vCore Seconds or Memory Seconds. Click Download Report As to download the report currently displayed and choose either JSON or CSV format. Note that for User or Queue this is the complete report, but for Applications it is only the vCore or Memory portion of the report (whichever is displayed).

Applications

You can sort applications on vCore or Memory seconds.

4.4-Rep-ClSum-AppVcores.png
User
4.4-Rp-ClSum-User.png
Queue
4.4-Rep-ClSum-Q.png
Cluster compare

This tab opens displaying the cluster grouped by User with the Time Range and Compare With Range both set to the Last 7 Days, i.e., no comparison is displayed.

Use Group By to generate the report by User or Queue. Use the Time Range and Compare With Range pull-down menus to specify the time ranges.

Any deviation in metrics across the time ranges is highlighted (3). A green highlight with an upward arrow indicates an increase in usage, while red with a down arrow denotes a decrease. If the Time or Compare With range is invalid for the Group By choice the row for that time range is dashed (2).
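
The comparison logic can be pictured with a short sketch. It is only an illustration under assumptions: the function name `compare_ranges`, the dict-based inputs, and the use of `None` for a group that has no data in one of the ranges are hypothetical and are not taken from Unravel.

```python
def compare_ranges(current, baseline):
    """Return the per-group deviation between two time ranges.

    current / baseline: dicts mapping a group (user or queue) to a metric value.
    A group missing from either range maps to None (nothing to compare).
    """
    result = {}
    for group in sorted(set(current) | set(baseline)):
        if group not in current or group not in baseline:
            result[group] = None
            continue
        delta = current[group] - baseline[group]
        direction = "up" if delta > 0 else "down" if delta < 0 else "same"
        result[group] = (delta, direction)
    return result

# Made-up app counts per user for the two ranges.
deviation = compare_ranges(
    {"hdfs": 120, "root": 8, "etl": 5},
    {"hdfs": 100, "root": 11},
)
```

Here `hdfs` would get the upward (increase) marker, `root` the downward (decrease) marker, and `etl` has no counterpart to compare against.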

4.4-Rep-ClCmp-U.png
Cluster optimization

Note

The OnDemand package must be installed to use this report.

This report analyzes your cluster workload over a specified period. It provides insights and configuration recommendations to optimize throughput, resources, and performance. Currently, this feature only supports Hive on MapReduce.

You can use these reports to:

  • Fine-tune your cluster to maximize its performance and minimize your costs.

  • Compare your cluster's performance between two time periods.

Reports are generated on an ad hoc or scheduled basis. All reports are archived and can be accessed via the Reports Archive tab. The tab opens displaying the last report, if any, generated.

Download or generate a report

Click Download JSON to download the displayed report in JSON. To download a prior report go to Reports Archive.

4530 OpInsights - Cluster KPIs Sched

Click Generate New Report. The default date range is one day; to change it, click the date to open the date picker and select a new Date Range. Click Run. The Run button changes to Running and a countdown is displayed until Unravel starts collecting the data. Generate New Report pulsates blue until the report is completed. When the report is successfully generated, a light green bar is displayed.

4.4-Rep-ClOpt-GreenBar.png

Click Schedule instead of Run to schedule the report for a future date and time. You can schedule your report to run once or regularly. All reports (successful or failed attempts) are in the Reports Archive.

Optimization report

The report has three sections.

  • Header: Contains the basic report information: author, time run, and dates used to generate the report.

44 Reps OpInsight ClstOp-Hdr.png
  • KPIs

    • Number of Jobs: Per day average

    • Number of vCore Hours: Per day average

    • Number of MapReduce Containers

    • % containers for Map

    • % containers for Reduce

    • Amount of memory (in MB) from MapReduce containers

    • % containers from Map containers

    • % containers from Reduce containers

The KPIs are a per-day average for the number of days in the report. All the insights/recommendations are based upon the analysis of all jobs.

  • Insights/Recommendations

    This section contains a tab for each app type with the relevant properties under consideration for tuning. These are cluster-wide properties, and they are the defaults for all apps. However, you can override these properties on an app by app basis.

    You can expand the insight tile and further drill-down for more details.

Queue analysis

Note

The OnDemand package must be installed to use this report.

You can generate a report of active queues for all your clusters or just one. The report analyzes queue activity by apps, vCores, memory, and disk. As with all reports, it can be generated on an ad hoc or scheduled basis. The tab opens displaying the last report, if any, generated. Reports are archived and can be accessed via the Reports Archive tab.

Cluster Yarn Workload

Displays your cluster's YARN apps' workload across a date range using the following views:

  • Month - by date, for example, October 10.

  • Hour - by hour regardless of date, for example, 10.00 - 11.00.

  • Day - by weekday regardless of date, for example, Tuesday.

  • Hour/Day - by hour for a given weekday, for example, 10.00 - 11.00 on Tuesday.

main-menu-cluster-yarn-workload.png

You can filter each view by Jobs, vCores Hour, and Memory Hour.

See Drilling Down in a Workload view for information on how to retrieve the detailed information within each view.

Note

Measuring vCores Hour or Memory Hour usage is straightforward; at any given point, the memory or vCore is either being used or not.

The App Count is not a count of unique app instances because apps can span boundaries, i.e., begin and end in different hours/days.

Jobs reflect the apps that were running within that interval up to and including the boundary, i.e., date, hour, day. Therefore, an app can be counted in more than one time slice:

  • On multiple dates, for example, October 11 and 12.

  • In multiple hours, for example, 10 PM, 11 PM, and 12 AM.

  • On multiple days, for example, Thursday and Friday.

  • In multiple hour/day slots.

This results in anomalies where the Sum(24 hours in Hour/Day App Count) > Sum(App Counts for dates representing the day). For instance, in the below example:

  • App Count for Wednesdays (October 10, 17, and 24) = 2492, and

  • App Count across Hour/Day intervals for Wednesday = 2526.

This is pointed out not because it necessarily has a significant impact on how you can use the data but to inform you such variations exist.
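
A small sketch can make the double counting concrete. It is an illustration under assumptions, not Unravel's counting code: the helper `slots_touched`, the sample app, and the year 2018 (chosen so that October 10 falls on a Wednesday) are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WEDNESDAY = 2  # datetime.weekday(): Monday is 0

def slots_touched(start, end):
    """Yield the start of every hour slot in which the app was running.

    The boundary is inclusive, following "up to and including the boundary".
    """
    slot = start.replace(minute=0, second=0, microsecond=0)
    while slot <= end:
        yield slot
        slot += timedelta(hours=1)

# One hypothetical app: starts Wednesday 22:30, ends Thursday 00:15.
apps = [(datetime(2018, 10, 10, 22, 30), datetime(2018, 10, 11, 0, 15))]

by_date = Counter()      # Month view: one count per calendar date touched
by_hour_day = Counter()  # Hour/Day view: one count per (weekday, hour) touched
for start, end in apps:
    slots = list(slots_touched(start, end))
    for day in {s.date() for s in slots}:
        by_date[day] += 1
    for s in slots:
        by_hour_day[(s.weekday(), s.hour)] += 1

wednesday_by_date = sum(n for d, n in by_date.items() if d.weekday() == WEDNESDAY)
wednesday_by_slot = sum(n for (wd, _), n in by_hour_day.items() if wd == WEDNESDAY)
```

The single app contributes 1 to the Wednesday date count but 2 to the Wednesday Hour/Day slots (10 PM and 11 PM), which is the same kind of gap as 2492 versus 2526 above.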

The tab opens in the Month view filtered on App Count for the past 24 hours.

  1. Go to the Reports > Operational Insights > Cluster Yarn Workload tab.

  2. From the Cluster drop-down on the right, select a cluster or select All Clusters.

  3. From the Date Range drop-down, select a period from the date picker. You can also provide a custom period.

    A short period is recommended. The longer the period, the longer the processing takes.

  4. From the View By drop-down, select one of the following options:

    • Month

    • Day

    • Hour

    • Hour/Day

    You can toggle within these View by options.

  5. Select any one option from Job, vCore Hours, or Memory Hour to change the display metric. The metric you select is used for all subsequent views until changed.

Month view
monthwise.pdf

Displays the monthly view that is run for a specified period range. The color indicates how the day's load compares with the other days within the date range. The day with the least jobs/hours is Wrkld-1.png, while the days with the highest load are Wrkld-5.png. Therefore, any particular day's color varies in context to the other days being displayed; for example, when only one day is shown, it is colored Wrkld-5.png.
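
The relative coloring can be approximated with the sketch below. The five levels correspond to Wrkld-1.png through Wrkld-5.png; the linear scale between the lightest and heaviest day, and the function name `load_level`, are assumptions, since the exact thresholds are not documented here.

```python
def load_level(value, all_values, levels=5):
    """Map a day's load to a level from 1 (lightest) to `levels` (heaviest).

    Assumes a linear scale between the minimum and maximum displayed loads.
    """
    low, high = min(all_values), max(all_values)
    if high == low:
        return levels  # a single day (or identical loads) gets the top level
    fraction = (value - low) / (high - low)
    return min(levels, 1 + int(fraction * levels))

# Made-up app counts for three displayed days.
loads = {"Oct 10": 7, "Oct 11": 921, "Oct 12": 300}
day_levels = {day: load_level(n, list(loads.values())) for day, n in loads.items()}
single_day = load_level(7, [7])
```

Because the scale depends on the other days in the range, the same day can land on a different level when the date range changes.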

Use Previous and Next in the month's title bar to navigate between months.

Only the Month view visually links the app count to a specific date; above, we see February 11 has an app count of 7.

Hour, day, and hour/day views

These graphs do not link jobs to any specific date at the graph level. For instance, the Hour graph shows that 856 jobs ran at 2 AM (between 2 AM and 3 AM), the Day graph shows that 2,492 jobs ran on Wednesday, and the Hour/Day graph shows that 68 jobs ran at 2 AM on a Wednesday. But none of these graphs directly indicates the date these jobs ran on. Only the Month view visually links job counts to a specific date.

Each view opens using the metric selected for the prior view. For instance, if vCores Hour is used to display Month and you switch to Day, it is filtered using vCores Hour.

When the DATE RANGE spans multiple days, you have the choice to display the data as either the:

  • Sum - aggregated sum of app count, vCore, or memory hour during the time range (default view).

  • Average - Sum / (# of Days in Date Range).

Hour view
hour.png

Plots the information by hour. The interval label indicates the start, i.e., 2 AM is 2 AM - 3 AM. Hover over an interval for its details. Click the interval to drill down into it.

Day view
day.png

Displays the jobs run on a specific weekday. Hover over an interval for its details. Click the interval to drill down into it.

Hour/Day view
hour-day.png

This view shows the intersection of the Hour and Day graphs. The Hour graph showed that 856 jobs ran between 2 AM and 3 AM, while the Day graph (immediately above) showed that 2,492 jobs ran on Wednesday. Below we see that 68 of Wednesday's jobs (2.7%) were running between 2 AM and 3 AM.
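
The 2.7% figure is the slot count divided by the weekday count. A one-function sketch, with the hypothetical name `share_of_day`, reproduces it:

```python
def share_of_day(slot_count, day_count):
    """Percentage of a weekday's jobs that fall into one hour slot."""
    return round(100 * slot_count / day_count, 1)

# 68 jobs in the Wednesday 2 AM slot out of 2,492 Wednesday jobs.
wednesday_2am_share = share_of_day(68, 2492)
```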

Drilling down in a workload view

Click an interval to bring up its information. In our example, we selected October 11 in the Month view which was filtered on App Count (921 apps). A list breaking out the jobs by app type is displayed. Below we see all 921 were MR jobs.

DetailsBar.png

Click closedBlock.png to display User and Queue details. User is shown by default; click Queue to see all the queues. In this case, there are two users, HDFS (910 jobs) and ROOT (11 jobs). Click Details12.png (job details) to see the running apps for that row. When there are multiple choices shown, Unravel notes which detail is being displayed by highlighting the row. Below there are three options:

  • App Type: MR

    • User: HDFS

    • User: ROOT

We selected the user ROOT, so its row is highlighted. The line immediately above the table notes what is being displayed. See Applications > Applications for more information on the table. Click an app to bring it up in its APM. When you change the metric (App Count, vCores Hour, or Memory Hour), the window reverts to displaying the graph.

Report Clst Workload Month WithApps
Cluster Impala Workload

Displays your cluster's Impala apps' workload across a date range using the following views:

  • Month - by date, for example, October 10.

  • Hour - by hour regardless of date, for example, 10.00 - 11.00.

  • Day - by weekday regardless of date, for example, Tuesday.

  • Hour/Day - by hour for a given weekday, for example, 10.00 - 11.00 on Tuesday.

main-impala-workload.png

You can filter each view by Job, CPU Hour, and Memory Hour.

See Drilling Down in a Workload view for information on how to retrieve the detailed information within each view.

Note

Measuring CPU Hour or Memory Hour usage is straightforward; at any given point, the memory or CPU is either being used or not.

The App Count is not a count of unique app instances because apps can span boundaries, i.e., begin and end in different hours/days.

Jobs reflect the apps that were running within that interval up to and including the boundary, i.e., date, hour, day. Therefore, an app can be counted in more than one time slice:

  • On multiple dates, for example, October 11 and 12.

  • In multiple hours, for example, 10 PM, 11 PM, and 12 AM.

  • On multiple days, for example, Thursday and Friday.

  • In multiple hour/day slots.

This results in anomalies where the Sum(24 hours in Hour/Day App Count) > Sum(App Counts for dates representing the day). For instance, in the below example:

  • App Count for Wednesdays (October 10, 17, and 24) = 2492, and

  • App Count across Hour/Day intervals for Wednesday = 2526.

This is pointed out not because it necessarily has a significant impact on how you can use the data but to inform you such variations exist.

The tab opens in the Month view filtered on App Count for the past 24 hours.

  1. Go to the Reports > Operational Insights > Cluster Impala Workload tab.

  2. From the Cluster drop-down on the right, select a cluster or select All Clusters.

  3. From the Date Range drop-down, select a period from the date picker. You can also provide a custom period.

    A short period is recommended. The longer the period, the longer the processing takes.

  4. From the View By drop-down, select one of the following options:

    • Month

    • Day

    • Hour

    • Hour/Day

    You can toggle within these View by options.

  5. Select any one option from Job, CPU Hour, or Memory Hour to change the display metric. The metric you select is used for all subsequent views until changed.

Month view
imp-month-view.png

Displays the monthly view that is run for a specified period range. The color indicates how the day's load compares with the other days within the date range. The day with the least jobs/hours is Wrkld-1.png, while the days with the highest load are Wrkld-5.png. Therefore, any particular day's color varies in context to the other days being displayed; for example, when only one day is shown, it is colored Wrkld-5.png.

Use Previous and Next in the month's title bar to navigate between months.

Only the Month view visually links the app count to a specific date; above, we see February 11 has an app count of 7.

Hour, Day, and Hour/Day views

These graphs do not link jobs to any specific date at the graph level. For instance, the Hour graph shows that 856 jobs ran at 2 AM (between 2 AM and 3 AM), the Day graph shows that 2,492 jobs ran on Wednesday, and the Hour/Day graph shows that 68 jobs ran at 2 AM on a Wednesday. But none of these graphs directly indicates the date these jobs ran on. Only the Month view visually links job counts to a specific date.

Each view opens using the metric selected for the prior view. For instance, if CPU Hour is used to display Month and you switch to Day, it is filtered using CPU Hour.

When the DATE RANGE spans multiple days, you have the choice to display the data as either the:

  • Sum - aggregated sum of app count, CPU hour, or memory hour during the time range (default view).

  • Average - Sum / (# of Days in Date Range).
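
The two display modes differ only by a division. The sketch below assumes an inclusive date range and uses the hypothetical helper `sum_and_average`; the daily CPU-hour values are made up.

```python
from datetime import date

def sum_and_average(daily_values, start, end):
    """Return (sum, average), where average = sum / number of days in the range."""
    days = (end - start).days + 1  # inclusive range (an assumption)
    total = sum(daily_values)
    return total, total / days

# Made-up CPU hours for one hour slot on three consecutive days.
total, average = sum_and_average(
    [12.0, 30.0, 18.0], date(2018, 10, 10), date(2018, 10, 12)
)
```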

Day view
imp-day-view.png

Displays the jobs run on a specific weekday. Hover over an interval for its details. Click the interval to drill down into it.

Hour view
imp-hour-view.png

Plots the information by hour. The interval label indicates the start, i.e., 2 AM is 2 AM - 3 AM. Hover over an interval for its details. Click the interval to drill down into it.

Hour/Day view
imp-hour-day.png

This view shows the intersection of the Hour and Day graphs. The Hour graph showed that 856 jobs ran between 2 AM and 3 AM, while the Day graph showed that 2,492 jobs ran on Wednesday. Below we see that 68 of Wednesday's jobs (2.7%) were running between 2 AM and 3 AM.

Drilling down in a workload view

Click an interval to bring up its information. In our example, we selected October 11 in the Month view which was filtered on App Count (921 apps). A list breaking out the jobs by app type is displayed. Below we see all 921 were MR jobs.

DetailsBar.png

Click closedBlock.png to display User and Queue details. User is shown by default; click Queue to see all the queues. In this case, there are two users, HDFS (910 jobs) and ROOT (11 jobs). Click Details12.png (job details) to see the running apps for that row. When there are multiple choices shown, Unravel notes which detail is being displayed by highlighting the row. Below there are three options:

  • App Type: MR

    • User: HDFS

    • User: ROOT

We selected the user ROOT, so its row is highlighted. The line immediately above the table notes what is being displayed. See Applications > Applications for more information on the table. Click an app to bring it up in its APM. When you change the metric (App Count, CPU Hour, or Memory Hour), the window reverts to displaying the graph.

Report Clst Workload Month WithApps
Schedule Jobs

You can predict the best-suited time slot to schedule a job from this tab. The recommendation for the best time slot is determined based on the usage history of the available vCores and memory in the time period that you select. Thus, you can efficiently run the job with maximum resource utilization and reduced chances of failure.

schedulejob.png

You can view the details of every workflow such as the start time, end time, memory and vCores consumed, etc. in a color-coded heatmap.

Identifying the best time-slot to schedule a job
  1. On the Unravel UI, go to Reports > Operational Insights > Schedule Jobs.

  2. Click schedule.png to select a time period. The historical usage of vCores and memory in this time period is used to determine the best time slot to schedule a job.

  3. Optionally, select one of the following to view the usage of vCores and memory in the selected period:

    • Available vCores

    • Available memory

    A color-coded heatmap is displayed, which is based on the historical usage of the selected option. The dark-colored bars indicate the availability of more vCores and memory, whereas the light-colored bars indicate usage and thereby lesser availability of vCores and memory. You can hover over a color-coded bar to view the details of the day, time, available vCores, and available memory.

  4. Enter the following details about the job that you want to schedule:

    • vCores: Estimated number of vCores required to run the job.

    • Memory: Estimated memory to run the job.

    • Duration: Estimated time duration to run the job.

  5. Click Recommend. The best-suited time-slot to schedule your job is displayed on the right panel in a heatmap, with a color-coded bar. The day and time of this slot are displayed along with the available vCores and memory for that slot.

    If you cannot run the job in the recommended time slot, click All Feasible on the right to display all the feasible time slots. You can choose a time slot from this list.
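
The idea behind the recommendation can be sketched as a search over historical availability. This is not Unravel's algorithm: the function names `feasible_slots` and `recommend`, the hourly lists, and the choice of "largest vCore headroom" as the ranking rule are assumptions for illustration.

```python
def feasible_slots(available_vcores, available_memory, vcores, memory, duration):
    """Return (start_hour, headroom) for every window that fits the job.

    available_*: per-hour historical availability lists of equal length.
    headroom: the smallest vCore surplus within the window.
    """
    slots = []
    for start in range(len(available_vcores) - duration + 1):
        window = range(start, start + duration)
        fits = all(
            available_vcores[h] >= vcores and available_memory[h] >= memory
            for h in window
        )
        if fits:
            headroom = min(available_vcores[h] - vcores for h in window)
            slots.append((start, headroom))
    return slots

def recommend(slots):
    """Pick the feasible slot with the most headroom (earliest wins ties)."""
    return max(slots, key=lambda s: s[1], default=None)

# Made-up availability for six consecutive hours.
vcores_free = [4, 16, 20, 18, 6, 2]
memory_free = [8, 64, 64, 32, 16, 8]
slots = feasible_slots(vcores_free, memory_free, vcores=8, memory=32, duration=2)
best = recommend(slots)
```

In this sketch `slots` plays the role of the All Feasible list and `best` the single recommended slot.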

Top X

Note

The OnDemand package must be installed to use this report.

This tab generates two reports:

  • The top X Hive, Impala, and Spark apps for the following categories:

    • Longest Duration: Time to completion.

    • Highest Disk I/O: Summary of total dfs bytes read and written.

    • Highest Cluster Usage: Summary of map/reduce slot duration.

    • Highest CPU Usage: vCore seconds (Hive on Tez not supported)

    • Highest Memory Usage: Memory seconds (Hive on Tez not supported)

  • User Report: a Top X report mailed to specific users. (Available in Unravel 4.5.3.0 and later.)
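
Conceptually, each category is a "largest N by one metric" selection. The sketch below uses Python's standard `heapq.nlargest`; the record fields (`id`, `duration_s`, `memory_seconds`) and the function name `top_x` are hypothetical and do not reflect Unravel's data model.

```python
import heapq

def top_x(apps, metric, x):
    """Return the x apps with the largest value for `metric`."""
    return heapq.nlargest(x, apps, key=lambda app: app[metric])

# Made-up app records.
apps = [
    {"id": "app-1", "duration_s": 340, "memory_seconds": 9000},
    {"id": "app-2", "duration_s": 1250, "memory_seconds": 4000},
    {"id": "app-3", "duration_s": 90, "memory_seconds": 15000},
]
longest = [a["id"] for a in top_x(apps, "duration_s", 2)]
hungriest = [a["id"] for a in top_x(apps, "memory_seconds", 2)]
```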

Executive KPIS

Note

The OnDemand package must be installed to use this report.

This report lets you quickly see the overall health of your cluster. You can also schedule it to be emailed to users on a regular basis. The report has six tiles.

  • Overall Health of Platform

  • Resources

  • Nodes

  • Usage (HDFS) across the cluster

  • YARN Consumers

  • Impala Consumers

By default, this page displays the data for the past hour. You can change the time range by clicking Schedule Report, then clicking History (Date Range) and selecting an available time period. Click Apply. The report is updated for the selected time period. If you wish to schedule a report, select your date range and then click Schedule. Once the report is generated, it is sent to all listed recipients. See Scheduling for an explanation.

4530-Rep-OpIn-ClstKPIs-SelTimeRange.png
Overall health of the cluster

KPIs across the entire cluster.

4530-Rep-OpIn-ClstKPI-OverallHealth.png
Resources

Graphs the available and allocated vCores and memory for the entire cluster.

4530-Rep-OpIn-ClstKPI-Res.png
Nodes

Graphs the total number of nodes and the breakdown by node status: active, lost, unhealthy, decommissioned, and rebooted.

Total = Active + Unhealthy

Where:

Active: currently running and healthy nodes.
Unhealthy: currently running and unhealthy nodes.
4530-Rep-OpIn-ClstKPI-Node.png
Usage (HDFS) across the cluster (forecasting report)

This is the last Reports > Data Insights > Forecasting report, for HDFS disk capacity.

4530-Rep-OpIn-ClstKPI-Fore.png
YARN and Impala consumers

These tables show the databases and all the tags associated with the YARN or Impala jobs.

The following examples have eight sections.

  • By DBS: Databases the YARN or Impala apps use.

  • By Dept: Tag key.

  • By Inputtables: Input tables.

  • By Outputtables: Output tables.

  • By Project: Tag key.

  • By Realuser: User who submits the app.

  • By unravel.app.name: App name.

  • By User: User who runs the app.

Each section of the table lists all its members. For instance, DBS contains a row for each database while Project contains all the values for the tag.

When no data is found only the By User tables are shown for YARN and Impala since the user is the basis of the YARN and Impala tables.

YARN consumers

Table columns

  • Section Name: Each row lists a section member, for example, DBS will list all the databases.

  • App Count: Number of apps which accessed the section member.

  • CPU Hours: Aggregated CPU hours across all the apps accessing the section member.

  • % Total CPU Hours: Percentage of the (Total CPU Hours)/(Aggregated CPU Hours across all apps).

  • Memory Hours: Aggregated memory hours across all the apps accessing the section member.

  • % Memory Hours: Percentage of the (Total Memory Hours)/(Aggregated Memory Hours across all apps).
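
One way to read the percentage columns is as each section member's share of the total across all members. The sketch below follows that reading; it is an assumption rather than a documented formula, and the function name `consumer_rows` and the input shape are hypothetical.

```python
def consumer_rows(usage):
    """Build table rows from per-member usage.

    usage: dict mapping a section member to (app_count, cpu_hours, memory_hours).
    Percentages are the member's share of the total over all members (assumed).
    """
    total_cpu = sum(cpu for _, cpu, _ in usage.values())
    total_mem = sum(mem for _, _, mem in usage.values())
    rows = []
    for member, (count, cpu, mem) in usage.items():
        rows.append({
            "member": member,
            "app_count": count,
            "cpu_hours": cpu,
            "pct_cpu": round(100 * cpu / total_cpu, 1) if total_cpu else 0.0,
            "memory_hours": mem,
            "pct_memory": round(100 * mem / total_mem, 1) if total_mem else 0.0,
        })
    return rows

# Made-up usage for two users in the By User section.
rows = consumer_rows({"hdfs": (910, 30.0, 120.0), "root": (11, 10.0, 40.0)})
```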

Reports OpInsights Cluster KPIs-YARN
Impala consumers

Table Columns

  • Section Name: Each row lists a section member, for example, DBS has a row for each database.

  • App Count: Number of apps which accessed the section member.

  • Total Processing Time Hours: Aggregated processing hours across all the apps accessing the section member.

  • % Total Processing Hours: Percentage of the (Total Processing Time Hours)/(Aggregated Processing Time Hours across all apps).

  • Memory Hours: Aggregated memory hours across all the apps accessing the section member.

  • % Memory Hours: Percentage of the (Total Memory Hours)/(Aggregated Memory Hours across all apps).

Reports OpInsights Cluster KPIs-Impala