TopK report

The TopK report ranks applications and lists the top K by metrics such as I/O, memory, CPU usage, duration, memory seconds, and vcore seconds. You can generate the TopK report from the UI or from the command line. The report can analyze data from MR, Hive, Spark, and Impala applications.
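Conceptually, a TopK report combines the runs of each application and keeps the K applications with the highest value for a chosen metric. The following Python sketch illustrates only that idea. The AppRun record, its field names, and the choice to sum a metric across runs of the same app signature are assumptions for illustration, not Unravel's implementation.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AppRun:
    # Hypothetical record for one application run; field names are illustrative.
    signature: str          # identifies the same app across runs
    io_bytes: int
    cpu_time_ms: int
    memory_seconds: float
    duration_ms: int


def top_k_apps(runs, metric, k):
    """Return [(signature, total), ...] for the k apps with the largest total `metric`.

    Assumption: the runs of one app are combined by summing the metric.
    """
    totals = defaultdict(float)
    for run in runs:
        totals[run.signature] += getattr(run, metric)
    # nlargest returns the k highest totals in descending order.
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


runs = [
    AppRun("etl_daily", io_bytes=500, cpu_time_ms=1200, memory_seconds=30.0, duration_ms=60000),
    AppRun("etl_daily", io_bytes=700, cpu_time_ms=800, memory_seconds=20.0, duration_ms=55000),
    AppRun("adhoc_q1", io_bytes=100, cpu_time_ms=5000, memory_seconds=5.0, duration_ms=9000),
]
print(top_k_apps(runs, "cpu_time_ms", 1))  # [('adhoc_q1', 5000.0)]
```

Ranking by a different metric only changes the attribute name passed in, which mirrors how one report can rank the same apps by I/O, CPU time, memory seconds, or duration.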

To generate the TopK report from the UI:

  1. Click Generate Reports > New.

  2. In the New Report dialog box, enter the following details:

    General

    • Name: Name of the report.

    • Environment: Select your platform. Only the reports for the selected platform are listed in Report type. If you select All, all reports are listed.

    • Report type: Type of report. Select TopK.

    • Schedule: Select the checkbox to schedule the report to run hourly, daily, weekly, or monthly. You can also set the schedule with a cron expression: expand the Example drop-down and select the corresponding options. The next four sample run times are displayed for reference.

    • Retention: Number of days to retain the report files. All reports are stored in the unity-one/src/assets/reports/jobs directory. After the retention period ends, the report files are automatically purged.

    Parameters

    • Application Kind: Type of application to run the report on. The options are Impala, Hive, Spark, and MR.

    • Look Back: Number of days to look back when selecting applications for the report. A notification above this option indicates the period for which data is available for the report.

    • Use Exact Date-Time: Select this option and choose the from and to dates to generate the report for that exact time range. Note that if the report is scheduled, every run covers the same fixed range, so the same report is generated repeatedly.

    • Users: Users to include in the report. You can select multiple users. If none are selected, all detected users are included.

    • Queues: Queues to include in the report. You can select multiple queues. If none are selected, all detected queues are included.

    • Clusters: Clusters to include in the report. You can select multiple clusters. If none are selected, all detected clusters are included.

    • Tag Names: Tag names to include in the report. You can select multiple tags. If none are selected, all detected tags are included.

    • Reports: Metrics to include in the report. The available metrics are Duration, I/O, Memory seconds, CPU time, and Cost (Databricks only).

    • Tag Values: Tag values to include for the selected tag names. Tag value options appear only after you select a tag. You can select multiple tag values. If none are selected, all detected values for the tag are included.

    • TopK: Number of top-ranked applications to show for each selected metric.

    • Features Filter: Filter on the key-value features fetched from the feature indices. You can add custom features or fields to the feature indices by modifying customizable_feature.py; those features then become available here.

    Notifications

    • Email to: Email ID to notify when the report is generated. Select the Attach Files to Email checkbox to receive the reports as attachments.

  3. Click OK. The report is listed under Reports in the App UI.

  4. Select the report and click Run. After the report runs successfully, the details of each run are listed in the Run box on the right.

  5. Click any of the following links:

    • HTML files: view the report details.

    • Input parameters: view the parameters you chose for the report run.

    • Log file: view the logs of the report run.
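The Look Back, Use Exact Date-Time, Users, Queues, and Clusters parameters together decide which application runs feed the report. The following Python sketch shows one plausible reading of those rules: an exact range overrides the look-back window, and an empty multi-select means "include everything detected". The ReportParams class, the run dictionary keys, and the helper names are hypothetical and do not describe Unravel's internal code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReportParams:
    # Hypothetical mirror of the UI parameters; names are illustrative only.
    look_back_days: int = 7
    start: Optional[datetime] = None   # set start and end to mimic "Use Exact Date-Time"
    end: Optional[datetime] = None
    users: set = field(default_factory=set)     # empty means all detected users
    queues: set = field(default_factory=set)    # empty means all detected queues
    clusters: set = field(default_factory=set)  # empty means all detected clusters


def time_window(params, now):
    """Use the exact range when both ends are set, else look back from `now`."""
    if params.start is not None and params.end is not None:
        return params.start, params.end
    return now - timedelta(days=params.look_back_days), now


def matches(selected, value):
    # An empty selection includes every value.
    return not selected or value in selected


def select_runs(runs, params, now):
    start, end = time_window(params, now)
    return [
        r for r in runs
        if start <= r["start_time"] <= end
        and matches(params.users, r["user"])
        and matches(params.queues, r["queue"])
        and matches(params.clusters, r["cluster"])
    ]


now = datetime(2024, 1, 31)
runs = [
    {"start_time": datetime(2024, 1, 30), "user": "alice", "queue": "etl", "cluster": "c1"},
    {"start_time": datetime(2024, 1, 1), "user": "bob", "queue": "adhoc", "cluster": "c1"},
]
print(len(select_runs(runs, ReportParams(look_back_days=7), now)))  # 1
```

Because time_window ignores now once an exact range is set, a scheduled report that uses Use Exact Date-Time selects the same runs every time, which matches the repeated-report behavior described for that option.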

To generate the TopK report from the command line:

  1. From the terminal, change to the unity-one/src directory:

    cd /opt/unravel/data/apps/unity-one/src
  2. Use a text editor, such as vi, to edit the job.json file, and paste the following text into the file:

    {
        "enabled": false,
        "retention_days": 10,
        "report_type": "topkapps",
        "params": {
            "kind": "hive",
            "start_date": null,
            "end_date": null,
            "days": 300,
            "reports": [
                "io",
                "cpuTime",
                "memorySeconds",
                "duration"
            ],
            "topk": 10
        },
        "notifications": {}
    }

    The parameters in job.json are as follows:

    • kind: Type of application, for example, Hive, Impala, or Spark. The sample above uses hive.

    • start_date: Either null or a valid start date.

    • end_date: Either null or a valid end date.

    • days: Number of days to look back when downloading data.

    • reports: Array of the metrics to include in the report.

    • topk: Number of applications to show in the report.

  3. Save the file and exit the editor.

  4. Run the following command to generate the report:

    ./unity-cli.sh report generate job.json topkreport true
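If you generate many job files, it can help to build them from a script instead of editing JSON by hand. The following Python sketch produces a dictionary with the same fields and default values as the sample job.json. The function name make_topk_job and its two sanity checks are hypothetical; they are assumptions for illustration, not the CLI's documented validation rules.

```python
import json


def make_topk_job(kind="hive", days=300, reports=None, topk=10,
                  start_date=None, end_date=None, retention_days=10):
    """Build a dict with the same fields and defaults as the sample job.json.

    The two checks below are illustrative assumptions, not the CLI's
    documented validation rules.
    """
    if reports is None:
        reports = ["io", "cpuTime", "memorySeconds", "duration"]
    if topk <= 0:
        raise ValueError("topk must be a positive integer")
    if not reports:
        raise ValueError("reports must name at least one metric")
    return {
        "enabled": False,
        "retention_days": retention_days,
        "report_type": "topkapps",
        "params": {
            "kind": kind,
            "start_date": start_date,
            "end_date": end_date,
            "days": days,
            "reports": list(reports),
            "topk": topk,
        },
        "notifications": {},
    }


if __name__ == "__main__":
    # Print the job definition; redirect the output into job.json.
    print(json.dumps(make_topk_job(days=30, topk=20), indent=4))
```

For example, if Python 3 is available on the host, saving this as a file in unity-one/src and redirecting its output into job.json yields the sample's structure with a 30-day look-back and topk set to 20, ready for the unity-cli.sh command above.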

The following report is generated:

[Figure: Sample TopK report]
  • In this TopK report sample, the Top 20 Impala apps based on the CPU Time parameter are listed. The report is identified by a unique Query Hash based on all the chosen parameters of the report schedule.

  • The columns list the total number of runs, I/O, CPU time, memory seconds, and duration for each app.

  • A sparkline depicts the I/O trend.

  • The Links to some App Runs column lists a few example runs of each app. Clicking a link opens the Application details page in Unravel, which provides comprehensive information about the app run.

  • The Users and Queues parameters selected when creating the report are also listed.
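The sample report is identified by a Query Hash derived from all the chosen parameters. One common way to derive such an identifier is to hash a canonical serialization of the parameters, so that the same selections always produce the same hash. The following Python sketch uses sorted-key JSON and SHA-256. Both choices are assumptions, because this page does not document how Unravel computes its Query Hash.

```python
import hashlib
import json


def query_hash(params):
    """Derive a stable identifier from report parameters.

    Assumption: parameters are serialized as canonical JSON (sorted keys,
    compact separators) and hashed with SHA-256.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"kind": "impala", "reports": ["cpuTime"], "topk": 20, "users": ["alice"]}
b = {"users": ["alice"], "topk": 20, "reports": ["cpuTime"], "kind": "impala"}
print(query_hash(a) == query_hash(b))                  # True: key order does not matter
print(query_hash(a) == query_hash({**a, "topk": 10}))  # False: a parameter changed
```

Lists are hashed in the order given, so an implementation that treats multi-select values as unordered would sort them before hashing.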

Viewing metrics and insights
  1. Open the report and click any app link from the Signature column.

    The Metrics tab displays trend charts for the Duration, IO, MemorySeconds, and CPU metrics. Each dot on a trend chart represents one app run. The app info is shown in the right panel.

  2. You can select any of the following features to view the metric trends:

    • status

    • cluster id

    • cluster UID

  3. Click any dot on a metric trend chart. That run becomes the app target run, and its details appear in the App Run info panel on the right.

  4. In the App Run info panel, click the ID link of the app target run. The Metrics tab highlights the target run you chose in a different color, and its details are displayed in the App Target Run panel on the right. Click the View in Unravel link to view the app's details in Unravel.

    [Figure: App target run highlighted on the metric trend charts]
  5. Select any other dot (not the app target run) on a metric trend chart. That run becomes the app baseline run and is highlighted in a different color. Its details are displayed in the App Baseline Run panel on the right. Click the View in Unravel link to view the app's details in Unravel.

    [Figure: App baseline run highlighted on the metric trend charts]
  6. In the App Baseline Run panel, click Compare.

    1. Select the app type from the App Type options:

      • Hive on Tez

      • Spark

      • Impala

    2. From the Cluster options, select a cluster.

    3. Specify the app runs that you want to compare.

      • Baseline App Run: Specify the ID of the baseline app run.

      • Target App Run: Specify the ID of the target app run to compare with the baseline app run.

    4. Click Compare. The following details are displayed for the baseline and target app runs:

      • App details

      • Cluster resource usage when these apps were running

      • App statistics comparison details

      • App config comparison details

      • Query string comparison details

      • Query plan comparison details

      [Figure: App comparator results]
  7. In the App Baseline Run panel, click Refresh Insights. The insights are displayed in the following sections:

    • Baseline comparison

      The results of the comparison are displayed.

    • Concurrent heavy hitter apps

      This section appears only when the information is available.

      Select this option to view the top five applications in a specific category.

      1. Click the Concurrent heavy hitter apps option.

      2. Select a category from the Top-5 Applications options.

        [Figure: Top-5 Applications options for concurrent heavy hitter apps]

        The corresponding details are displayed.

        [Figure: Details of the selected concurrent heavy hitter apps]
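The comparison steps above contrast a baseline run with a target run, both in resource metrics and in configuration. The following Python sketch shows one simple way to express such a comparison: a per-metric percentage change plus a configuration diff. The metric names follow the job.json sample, and the configuration keys are ordinary Hive, MapReduce, and Tez property names used as sample data. The dictionaries, function names, and percentage formula are assumptions, not how Unravel computes its insights.

```python
def compare_runs(baseline, target):
    """Percentage change of each shared metric from the baseline run to the target run.

    Positive values mean the target run used more. A zero baseline yields None
    rather than dividing by zero.
    """
    deltas = {}
    for metric in sorted(baseline.keys() & target.keys()):
        base = baseline[metric]
        deltas[metric] = None if base == 0 else round(100.0 * (target[metric] - base) / base, 1)
    return deltas


def diff_config(baseline_conf, target_conf):
    """Map each differing config key to (baseline_value, target_value); None means unset."""
    keys = sorted(baseline_conf.keys() | target_conf.keys())
    return {
        k: (baseline_conf.get(k), target_conf.get(k))
        for k in keys
        if baseline_conf.get(k) != target_conf.get(k)
    }


baseline = {"duration": 120.0, "io": 2048.0, "cpuTime": 300.0, "memorySeconds": 0.0}
target = {"duration": 180.0, "io": 2048.0, "cpuTime": 240.0, "memorySeconds": 50.0}
print(compare_runs(baseline, target))
# {'cpuTime': -20.0, 'duration': 50.0, 'io': 0.0, 'memorySeconds': None}

base_conf = {"hive.exec.parallel": "false", "mapreduce.map.memory.mb": "2048"}
target_conf = {"hive.exec.parallel": "true", "mapreduce.map.memory.mb": "2048",
               "tez.am.resource.memory.mb": "4096"}
print(diff_config(base_conf, target_conf))
```

Reporting only the keys that differ keeps a configuration comparison short, since most settings are usually identical between two runs of the same app.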