Table Access Report

Use the Table Access Report to extract and review Hive, Impala, and Spark application data. The report generates CSV files and an HTML report. It extracts Hive and Impala applications into hive_impala_apps.csv and Spark applications into spark_apps.csv based on a specified time range.

Key Features

Query Performance Analysis: Identify slow queries, track execution trends, and analyze resource allocation.
Data Access Patterns: Determine frequently accessed tables for indexing and partitioning strategies.
Resource Utilization & Optimization: Analyze data read/write volumes to optimize storage and processing.
Query Failure & Debugging: Identify and troubleshoot failed or recurring problematic queries.

Generating a New Report

The following fields are required to generate a Table Access Report, except where marked optional:

Report Name: Name the report.
Retention: Define the number of days to retain the report files.
Look Back: Set the time range for data extraction.
Email ID: Provide an email address for mail notifications (optional).

For details on the other fields, refer to the following section:

Generating reports

Click Generate Reports > New.

In the New Report dialog box, enter the following details.

Items	Description
General
Name	Name of the report.
Environment	Select your platform. Only the reports that correspond to the selected platform are listed in Report type. If you select the All option, all the reports are listed.
Report type	Type of report. Select the report that you want to generate.
Schedule	Select the checkbox to schedule the report to run daily, hourly, weekly, or monthly. You can also set the schedule using a cron expression. Expand the Example drop-down to select from the corresponding options. The next four sample run times are displayed for reference.
Retention	The number of days to retain the report files. All the reports are stored in the `unity-one/src/assets/reports/jobs` directory. After the retention period ends, the report files are automatically purged.
Parameters
Application kind	Select the type of application.
Baseline / Target	Specify the following details for comparing the base pipeline and the target pipeline.
Look Back	The number of days to look back when selecting applications for the report. A notification above this option shows the period for which data is available.
Use Exact Date-Time	Select this option and choose the from and to dates to run the report for a specific time range. Note that if the report is scheduled with this option, the same report for the same time range is generated on every run.
Users	Select the users who must be included in the report. You can select multiple users. To select all users, you can leave the field blank.
Queues	Select the queues that you want to be included in the report. You can select multiple queues. To select all queues, you can leave the field blank.
Clusters	Select the clusters that you want to be included in the report. You can select multiple clusters. To select all clusters, you can leave the field blank.
Features Filter	Specify the Key-Value pair. You can add or remove a pair.
Notifications
Email to	Email address that receives the notification when the report is generated. Separate multiple email addresses with commas. You can also select the Attach Files to Email checkbox to receive the reports as an attachment.
Advance Options
Profile Memory	Select this option to generate logs that help troubleshoot cases where the report takes excessive time to generate or fails to generate. Caution: This option significantly increases the report run time.

Click OK. The generated reports will be listed under Reports on the App UI.
Select the generated report and then click Run. After the report is successfully run, the details of the report runs are listed in the Run box on the right.
Click the following:
- HTML files link to view the report details.
- Input parameters link to view the parameters you chose to run the report.
- Log file link to view the logs of the report.

Using the Report

After the report runs with the specified inputs, it creates two CSV files:

hive_impala_apps.csv
- Contains combined details of Hive and Impala applications.
spark_apps.csv
- Contains details of Spark applications.

The following fields are available and provide an overview of query execution details in a Hive/Impala environment:

id : A unique identifier for each query.
queryStringFull : The full SQL query executed.
kind : The type of query engine used (e.g., Hive or Impala).
queue : The queue used for executing the query.
clusterId : The cluster identifier where the query was executed.
userName : The user who executed the query.
user : A more detailed user identifier.
startTime : The timestamp when the query execution started.
finishedTime : The timestamp when the query execution finished.
Duration(Seconds) : The total time taken for query execution in seconds.
inputTables : The list of tables read by the query.
outputTables : The list of tables written to by the query.
totalDfsBytesRead : The total amount of data read from the distributed file system (DFS) in bytes.
totalDfsBytesWritten : The total amount of data written to the DFS in bytes.
status : The final status of the query execution.
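
As an illustration of how these fields can be used, the following Python sketch counts the most frequently read tables, lists the slowest queries, and tallies query statuses. It is a minimal sketch under assumptions, not part of the product: it assumes the CSV header uses the field names listed above, and that inputTables holds a comma-separated list of table names, optionally wrapped in brackets or quotes. The helper names (split_tables, summarize) and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def split_tables(cell):
    """Split a table-list cell into table names.

    Assumption: names are comma-separated, optionally wrapped in
    brackets and quotes. Adjust if the real export differs.
    """
    cleaned = cell.strip().strip("[]")
    names = (part.strip().strip("'\"") for part in cleaned.split(","))
    return [name for name in names if name]


def summarize(csv_file, top_n=5):
    """Return (most-read tables, slowest queries, status counts)."""
    table_reads = Counter()
    status_counts = Counter()
    durations = []
    for row in csv.DictReader(csv_file):
        table_reads.update(split_tables(row.get("inputTables") or ""))
        status_counts[row.get("status") or "UNKNOWN"] += 1
        try:
            durations.append((float(row["Duration(Seconds)"]), row["id"]))
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric duration
    slowest = sorted(durations, reverse=True)[:top_n]
    return table_reads.most_common(top_n), slowest, dict(status_counts)


# Made-up sample rows standing in for hive_impala_apps.csv.
SAMPLE = """id,kind,Duration(Seconds),inputTables,status
q1,hive,120,"db.sales,db.customers",SUCCESS
q2,impala,15,db.sales,SUCCESS
q3,hive,300,"db.orders,db.sales",FAILED
"""

top_tables, slowest, statuses = summarize(io.StringIO(SAMPLE))
print(top_tables)
print(slowest)
print(statuses)
```

To run the same summary against a generated report, pass an open file instead of the sample, for example `summarize(open("hive_impala_apps.csv", newline=""))`, after confirming that the column names and the table-list format match the assumptions above.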
