Add or Edit the Unity app reports

Use this API to create the configuration for a new report or to edit the configuration of an existing report. You can also edit the configuration of a report while triggering it.

PUT http://<unity_one_url>/api/reports/<report_name>

For example: PUT http://xyz.unraveldata.com:8111/api/reports/app_catalog_comparison
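As a minimal sketch, the endpoint can be assembled from the Unity base URL and the report name. The helper name `report_url` is hypothetical and not part of the product; it only illustrates the path layout shown above.

```python
from urllib.parse import quote


def report_url(base_url: str, report_name: str) -> str:
    """Return the PUT endpoint for a report configuration."""
    # Percent-encode the name so it is safe as a single path segment.
    return f"{base_url.rstrip('/')}/api/reports/{quote(report_name, safe='')}"


print(report_url("http://xyz.unraveldata.com:8111", "app_catalog_comparison"))
```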

Parameters

Each report in the Unity app has common parameters as well as report-specific parameters.

The common parameters, which are used in all the reports, are described in the following table:

Name

Type

Description

enabled

boolean

Whether to schedule report generation at a fixed interval. Value can be true or false.

retention_days

integer

The number of days to keep the report.

profile_memory

boolean

Whether to collect detailed information on memory usage. Value can be true or false.

report_type

string

Type of the report. You can specify any of the following as report_type:

  • Specify app_catalog_comparison for App Catalog Comparison report
  • Specify aws_emr_cost for AWS EMR Cost report
  • Specify catalog for Catalog report
  • Specify cloud_sql_migration for Cloud SQL Migration report
  • Specify databricks_cost for Databricks Cost Breakdown report
  • Specify databricks_event_analysis for Databricks Event Analysis report
  • Specify dataflow_migration for Dataflow Migration report
  • Specify databricks_savings for Databricks Node Downsizing Savings report
  • Specify databricks_user_and_usage for Databricks User and Usage report
  • Specify emr_instance_hours for EMR Instance Hours report
  • Specify hdfs_utilisation for HDFS and Small Files report
  • Specify impala_event for Impala Events report
  • Specify impala_resource_pool_analysis for Impala Resource Pool Analysis report
  • Specify impala_slow_hosts for Impala Slow Hosts report
  • Specify inefficient_apps for Interesting App report
  • Specify migration_wave_plan for Migration Wave Plan report
  • Specify pipeline_analytics for Pipeline Analytics report
  • Specify pipelines_comparison for Pipelines Comparison report
  • Specify queue_analysis for Queue Analysis report
  • Specify recommended_workflow for Recommended Workflow report
  • Specify topkapps for Top-K Apps report
  • Specify user_usages for User And Usage report

notifications

string

To get email notifications. The value should be comma-separated email IDs.
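The common parameters can be assembled into a request body along the following lines. This is a sketch, not the product's client code: `common_config` is a hypothetical helper, the type checks simply mirror the table above, the default of 50 retention days is borrowed from the samples in this document, and joining a list of addresses into one comma-separated string is an assumption based on the description of notifications.

```python
def common_config(report_type, params, enabled=False, retention_days=50,
                  profile_memory=False, emails=()):
    """Build a report configuration from the common parameters."""
    for name, value in (("enabled", enabled), ("profile_memory", profile_memory)):
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(retention_days, bool) or not isinstance(retention_days, int):
        raise TypeError("retention_days must be an integer")
    return {
        "enabled": enabled,
        "retention_days": retention_days,
        "profile_memory": profile_memory,
        "report_type": report_type,
        "params": params,
        # Assumption: email IDs are sent as one comma-separated string.
        "notifications": ",".join(emails),
    }


config = common_config("topkapps", {"days": 7}, emails=["a@example.com"])
```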

The following sections provide details about report-specific parameters.

 {
    "enabled": false,
    "retention_days": 50,
    "profile_memory": false,
    "report_type": "app_catalog_comparison",
    "params": {
        "kind": "spark",
        "baseline_start_date": "2022-11-01T06:47:57.140Z",
        "baseline_end_date": "2022-11-08T06:47:57.140Z",
        "baseline_days": null,
        "target_start_date": "2022-11-08T06:48:12.199Z",
        "target_end_date": "2022-11-15T06:48:12.199Z",
        "target_days": null,
        "feature_filters": {
            "null": null
        },
        "baseline_feature_filters": {
            "null": null
        }
    }
}
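The date fields in these samples are UTC timestamps ending in Z. The sketch below shows one way to produce such strings and to derive a start and end pair from a window length. The function names are hypothetical, and the six-digit fractional seconds follow the Format lines in the parameter tables; whether the API also requires that precision is not stated here.

```python
from datetime import datetime, timedelta, timezone


def to_timestamp(dt: datetime) -> str:
    """Format an aware datetime as UTC, for example 2022-12-07T07:32:09.629946Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def window(end: datetime, days: int) -> tuple[str, str]:
    """Return (start, end) timestamps for a window of the given length."""
    return to_timestamp(end - timedelta(days=days)), to_timestamp(end)


baseline_start, baseline_end = window(datetime.now(timezone.utc), 7)
```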

Name

Type

Description

kind

string

Application kind.

Value can be hive, impala, spark, mr

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

reports *

list

Reports to be generated.

Reports can be io, memorySeconds, cpuTime, duration, cost

topk *

integer

Number of topk apps to be included in the report.

feature_filters

dict

Key-value pairs to filter the data.

{ "clusterUid": "clusterUid From Unravel", "clusterId": "clusterId" }

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters
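The samples in this document set either days or the start_date and end_date pair, and leave the other side null. The sketch below encodes that pattern. Treating the two as mutually exclusive is an assumption drawn from the samples, not a documented rule, and `date_range` is a hypothetical helper.

```python
def date_range(days=None, start_date=None, end_date=None):
    """Return the three date keys, using either a look-back or exact dates."""
    exact = start_date is not None or end_date is not None
    if days is not None and exact:
        raise ValueError("give either days or start_date/end_date, not both")
    if exact and (start_date is None or end_date is None):
        raise ValueError("start_date and end_date must be given together")
    if days is None and not exact:
        raise ValueError("give days or start_date/end_date")
    # Unused keys stay None, which json.dumps serializes as null.
    return {"start_date": start_date, "end_date": end_date, "days": days}


params = {"kind": "spark", "topk": 20, **date_range(days=12)}
```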

{
        "start_date": null,
        "end_date": null,
        "days": 12,
        "topk": 20,
        "all_filters": {
            "null": null
        }
    } 

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

topk *

integer

The number of topk apps to be included in the report.

all_filters

dict

Features to filter the result.

{
    "start_date": "2023-03-01T11:57:25.429Z",
    "end_date": "2023-03-27T11:57:25.429747Z",
    "days": null,
    "topk": 10,
    "job_cost_threshold": 10,
    "user_cost_threshold": 10,
    "cluster_cost_threshold": 10,
    "tag_filters": {
        "Creator": null,
        "RunName": null
    },
    "tag_cost_threshold": 10,
    "billing_file": null,
    "azure_account_type": "PAYG"
}

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

billing_file*

string

Filter the results by given users.

topk

integer

Number to analyze the topk entities.

job_cost_threshold

integer

The number, which denotes the job cost threshold.

user_cost_threshold

integer

The number, which denotes the user cost threshold.

cluster_cost_threshold

integer

The number, which denotes cluster cost threshold.

azure_account_type

string

The options are PAYG or EA.
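A minimal sketch of building these cost parameters, rejecting an account type other than the two listed above. `cost_params` is a hypothetical helper, and using a single threshold value for jobs, users, and clusters is a simplification for the example only.

```python
AZURE_ACCOUNT_TYPES = ("PAYG", "EA")


def cost_params(days, azure_account_type="PAYG", topk=10, threshold=10):
    """Build cost-report params with one shared cost threshold."""
    if azure_account_type not in AZURE_ACCOUNT_TYPES:
        raise ValueError(f"azure_account_type must be one of {AZURE_ACCOUNT_TYPES}")
    return {
        "start_date": None,
        "end_date": None,
        "days": days,
        "topk": topk,
        # Simplification: the same value is used for all three thresholds.
        "job_cost_threshold": threshold,
        "user_cost_threshold": threshold,
        "cluster_cost_threshold": threshold,
        "azure_account_type": azure_account_type,
    }


params = cost_params(days=30, azure_account_type="EA")
```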

{
        "kind": "spark",
        "start_date": null,
        "end_date": null,
        "days": 100,
        "reports": [
            "app_dependencies",
            "app_catalog"
        ],
        "topk": 20,
        "topk_options": [
            "io",
            "memorySeconds",
            "cpuTime",
            "duration",
            "cost"
        ],
        "feature_filters": {
            "null": null
        }
    }

Name

Type

Description

kind

string

Application kind.

Value can be hive, impala, spark, mr

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

reports *

list

Specifies which reports to run. The list can include app_dependencies and app_catalog.

topk

integer

The number of topk apps.

topk_options

list

Any or all of the following options:

[ "io","memorySeconds", "cpuTime","duration", "cost"]

feature_filters

dict

Key-value pairs to filter the data.
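The following sketch builds a params object like the sample above and serializes it, which also shows that Python's None becomes JSON null. `app_catalog_params` is a hypothetical helper; the allowed topk_options are the five values listed in the table, and feature_filters is left out for brevity.

```python
import json

TOPK_OPTIONS = ("io", "memorySeconds", "cpuTime", "duration", "cost")


def app_catalog_params(kind, days, topk, topk_options=TOPK_OPTIONS):
    """Build params with a checked topk_options list."""
    unknown = [opt for opt in topk_options if opt not in TOPK_OPTIONS]
    if unknown:
        raise ValueError(f"unsupported topk_options: {unknown}")
    return {
        "kind": kind,
        "start_date": None,
        "end_date": None,
        "days": days,
        "reports": ["app_dependencies", "app_catalog"],
        "topk": topk,
        "topk_options": list(topk_options),
    }


body = json.dumps(app_catalog_params("spark", 100, 20), indent=4)
print(body)
```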

{
        "kind": "impala",
        "target_system": "teradata",
        "start_date": null,
        "end_date": null,
        "days": 12,
        "users": [
            "hive"
        ],
        "queues": [
            "default"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "RealUser"
        ],
        "tag_values": [
            "unravel"
        ],
        "feature_filters": {
            "clusterId": [
                "PG-CDP"
            ]
        }
    }  

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

kind

string

Application kind.

Value can be hive, impala, spark, mr

target_system*

string

Value can be any of the following target systems:

teradata, snowflake, bigquery, athena, redshift

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters.

tag_names

list

Filter the results by given tag names.

tag_values

list

Filter the results by given tag values.

feature_filters

dict

Key-value pairs to filter the data.

    {
        "start_date": null,
        "end_date": null,
        "days": 10,
        "interval": "10sec",
        "queue_config_filepath": "",
         "adhoc_cutoff": 20,
        "users": [
            "hive"
        ],
        "workspace": [
            "default"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "dept"
        ],
        "tag_values": [
            "Operations"
        ]
    }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

users

list

Filter the results by given users.

workspace

list

Filter the results by the given workspace.

clusters

list

Filter the results by given clusters.

tag_names

list

Filter the results by given tag names.

tag_values

list

Filter the results by given tag values.

cost

list

Cost in dollars.

    {
        "start_date": null,
        "end_date": null,
        "days": 14,
        "reports": [
            "NDSE",
            "SQL_INEFF"
        ],
        "feature_filters": {
            "null": null
        }
    }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

reports *

list

Reports to be generated. Reports can be any of the following:

  • NDSE: Apps with NodeDownsizingEvent
  • CDE_NDSE: Apps with NodeDownsizingEvent and ContendedDriverEvent
  • SQL_INEFF: SQL Apps with High Impact
  • PNPE_NDSE: Apps with NodeDownsizingEvent and PartitionsNotPrunedEvent
  • DSE_NDSE: Apps with NodeDownsizingEvent and DataSkewEvent
  • IJE_NDSE: Apps with NodeDownsizingEvent and InefficientJoinEvent
  • IJCE_NDSE: Apps with NodeDownsizingEvent and InefficientJoinConditionEvent
  • SSFE_NDSE: Apps with NodeDownsizingEvent and ScanSmallFilesEvent
  • SSOE_NDSE: Apps with NodeDownsizingEvent and SlowSQLOperatorEvent

feature_filters

dict

Key-value pairs to filter the data.
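As an illustration, the report codes and their descriptions from the table above can be kept in a lookup so that a reports list is checked before it is sent. The names `REPORT_CODES` and `describe_reports` are hypothetical; only the codes and descriptions come from this document.

```python
REPORT_CODES = {
    "NDSE": "Apps with NodeDownsizingEvent",
    "CDE_NDSE": "Apps with NodeDownsizingEvent and ContendedDriverEvent",
    "SQL_INEFF": "SQL Apps with High Impact",
    "PNPE_NDSE": "Apps with NodeDownsizingEvent and PartitionsNotPrunedEvent",
    "DSE_NDSE": "Apps with NodeDownsizingEvent and DataSkewEvent",
    "IJE_NDSE": "Apps with NodeDownsizingEvent and InefficientJoinEvent",
    "IJCE_NDSE": "Apps with NodeDownsizingEvent and InefficientJoinConditionEvent",
    "SSFE_NDSE": "Apps with NodeDownsizingEvent and ScanSmallFilesEvent",
    "SSOE_NDSE": "Apps with NodeDownsizingEvent and SlowSQLOperatorEvent",
}


def describe_reports(codes):
    """Return the description of each code, failing on unknown codes."""
    unknown = [code for code in codes if code not in REPORT_CODES]
    if unknown:
        raise ValueError(f"unknown report codes: {unknown}")
    return [REPORT_CODES[code] for code in codes]


print(describe_reports(["NDSE", "SQL_INEFF"]))
```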

 "params": {
       "start_date": null,
       "end_date": null,
       "days": 300
   }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

users

list

Filter the results by given users.

workspaces

list

Filter the results by given workspaces.

clusters

list

Filter the results by given clusters

{
    "target_days": 7,
    "threshold": 0.001,
    "featherbolt_path": "/opt/unravel/tmp/ondemand_fsimage/featherbolt_files/",
    "report_values": "tables_small",
    "project_name": "project",
    "tenant_name": "playbook",
    "database_name": "sys",
    "table_name": "dag_meta"
}

Name

Type

Description

target_days *

integer

Number of days to look back.

threshold*

integer

Threshold limit value, in MB, based on the report generated.

featherbolt_path*

string

Path to the Featherbolt files that are available once the fsimage is processed.

report_values

string

"tables_small" for the Data Tables small files report.

"hdfs_space" for the HDFS space utilization threshold.

project_name

string

Project name if hdfs_space is given as input; otherwise it is empty.

tenant_name

string

Tenant name if hdfs_space is given as input; otherwise it is empty.

database_name

string

Database name if tables_small is given as input; otherwise it is empty.

table_name

string

Table name if tables_small is given as input; otherwise it is empty.
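The dependency between report_values and the four name fields can be sketched as follows. `hdfs_params` is a hypothetical helper; it blanks the fields that the table describes as empty for the chosen report_values and makes no claim about how the service itself validates them.

```python
def hdfs_params(report_values, target_days, threshold, featherbolt_path,
                project_name="", tenant_name="", database_name="", table_name=""):
    """Build params, keeping only the names that apply to report_values."""
    if report_values == "tables_small":
        # Project and tenant apply to hdfs_space only.
        project_name = tenant_name = ""
    elif report_values == "hdfs_space":
        # Database and table apply to tables_small only.
        database_name = table_name = ""
    else:
        raise ValueError("report_values must be tables_small or hdfs_space")
    return {
        "target_days": target_days,
        "threshold": threshold,
        "featherbolt_path": featherbolt_path,
        "report_values": report_values,
        "project_name": project_name,
        "tenant_name": tenant_name,
        "database_name": database_name,
        "table_name": table_name,
    }


params = hdfs_params("tables_small", 7, 0.001,
                     "/opt/unravel/tmp/ondemand_fsimage/featherbolt_files/",
                     database_name="sys", table_name="dag_meta")
```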

    {
        "start_date": null,
        "end_date": null,
        "days": 10,
        "interval": "10sec",
        "queue_config_filepath": "",
        "users": [
            "hive"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "dept"
        ],
        "tag_values": [
            "Operations"
        ],
        "pools": [
            "root.default"
        ]
    }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

users

list

Filter the results by given users.

clusters

list

Filter the results by given clusters

tag_names

list

Filter the results by given tag names.

tag_values

list

Filter the results by given tag values.

pools

list

Filter the results by given pools.

  {
        "start_date": null,
        "end_date": null,
        "days": 10,
        "interval": "10sec",
        "queue_config_filepath": "",
        "users": [
            "hive"
        ],
        "queues": [
            "default"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "dept"
        ],
        "tag_values": [
            "Operations"
        ]
    }  

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters.

tag_names

list

Filter the results by given tag names.

tag_values

list

Filter the results by given tag values.

 "params": {                        
       "kind": "hive",                        
       "start_date": null,           
       "end_date": null,            
       "days": 300,                   
        "event": "All"                 
   }

Name

Type

Description

kind

string

Application kind.

Value can be hive, impala, spark, mr

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

event

list

Specific event.

Events can be All, Application Failure, Resource Utilization, Speedup, Informational, Cost Savings

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters

{
        "start_date": null,
        "end_date": null,
        "days": 100,
        "users": null,
        "pools": null,
        "memory_spilled": null,
        "Rows_produced": null,
        "duration": null,
        "est_per_node_peak_memory": null,
        "per_node_peak_memory": null,
        "aggregate_peak_memory": null,
        "admission_wait_time": null,
        "hdfs_remote_bytes_read": null,
        "statistics_corrupt_or_missing": "False"
    }  

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

events

list

Specific event.

Events can be All, Application Failure, Resource Utilization, Speedup, Informational, Cost Savings

pools

list

Filter the results by given pools.

memory_spilled

integer

Memory spilled in bytes threshold values.

Rows_produced

integer

Rows produced threshold value.

duration

integer

Query duration in seconds threshold value

statistics_corrupt_or_missing

boolean

Boolean value.  By default, False is given.

est_per_node_peak_memory

integer

Estimated Per Node Peak memory threshold value.

aggregate_peak_memory

integer

Aggregate Peak Memory threshold value.

admission_wait_time

integer

Admission wait time threshold value.

hdfs_remote_bytes_read

integer

HDFS Remote Bytes threshold value.
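Because most of these thresholds are optional and null in the sample, one approach is to start from an all-null template and set only the thresholds of interest. The sketch below assumes the key names exactly as they appear in the sample above, including the capitalized Rows_produced and the string "False". `impala_params` is a hypothetical helper.

```python
THRESHOLD_KEYS = (
    "memory_spilled", "Rows_produced", "duration",
    "est_per_node_peak_memory", "per_node_peak_memory",
    "aggregate_peak_memory", "admission_wait_time", "hdfs_remote_bytes_read",
)


def impala_params(days, **thresholds):
    """Return params with every threshold null unless given as a keyword."""
    unknown = set(thresholds) - set(THRESHOLD_KEYS)
    if unknown:
        raise ValueError(f"unknown thresholds: {sorted(unknown)}")
    params = {"start_date": None, "end_date": None, "days": days,
              "users": None, "pools": None}
    params.update({key: thresholds.get(key) for key in THRESHOLD_KEYS})
    # Kept as the string "False" to match the sample in this document.
    params["statistics_corrupt_or_missing"] = "False"
    return params


params = impala_params(100, duration=60, memory_spilled=1024)
```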

    {
        "start_date": null,
        "end_date": null,
        "days": 10,
        "interval": "10sec",
        "queue_config_filepath": "",
         "adhoc_cutoff": 20,
        "users": [
            "hive"
        ],
        "queues": [
            "default"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "dept"
        ],
        "tag_values": [
            "Operations"
        ]
    }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters

adhoc_cutoff

integer

Adhoc cut-off value.

tag_names

list

Filter the results by given tag names.

tag_values

list

Filter the results by given tag values.

    {
        "start_date": null,
        "end_date": null,
        "days": 100,
        "topk": 5
    }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

topk *

integer

The number of topk apps to be included in the report.

{
        "baseline_start_date": null,
        "baseline_end_date": null,
        "baseline_days": 12,
        "baseline_pipelines": [
            "Recommendation_Cluster",
            "Covid_Data_Processing"
        ],
        "target_start_date": null,
        "target_end_date": null,
        "target_days": 12,
        "target_pipelines": [
            "Recommendation_Cluster",
            "Covid_Data_Processing"
        ],
        "improved_vcore_seconds_threshold": 5,
        "improved_duration_threshold": 25,
        "improved_io_threshold": 5,
        "improved_memory_seconds_threshold": 5,
        "degraded_vcore_seconds_threshold": 5,
        "degraded_duration_threshold": 25,
        "degraded_io_threshold": 5,
        "degraded_memory_seconds_threshold": 5
    }

   

Name

Type

Description

baseline_start_date

string

Baseline start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

baseline_end_date

string

Baseline end date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

target_start_date

string

Target start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

target_end_date

string

Target end date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

target_days *

integer

Number of target days to look back.

target_pipelines *

list

The list should contain target pipeline names that need to be compared.

improved_vcore_seconds_threshold *

integer

Improved vcores threshold value should be specified here.

improved_duration_threshold*

integer

The improved duration threshold value should be specified here.

improved_io_threshold*

integer

Improved IO threshold value should be specified here.

improved_memory_seconds_threshold*

integer

Improved memory threshold value should be specified here.

degraded_vcore_seconds_threshold*

integer

The degraded vcores threshold value should be specified here.

degraded_duration_threshold*

integer

The degraded duration threshold value should be specified here.

degraded_io_threshold*

integer

The degraded IO threshold value should be specified here.

degraded_memory_seconds_threshold*

integer

The degraded memory threshold value should be specified here.
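The eight threshold keys follow a regular naming pattern: improved or degraded, then the metric, then threshold. The sketch below generates them from two small mappings. `comparison_thresholds` is a hypothetical helper, and the values used are the ones from the sample above.

```python
METRICS = ("vcore_seconds", "duration", "io", "memory_seconds")


def comparison_thresholds(improved, degraded):
    """Expand per-metric values into the improved_*/degraded_* keys."""
    thresholds = {}
    for metric in METRICS:
        thresholds[f"improved_{metric}_threshold"] = improved[metric]
        thresholds[f"degraded_{metric}_threshold"] = degraded[metric]
    return thresholds


values = {"vcore_seconds": 5, "duration": 25, "io": 5, "memory_seconds": 5}
params = {
    "baseline_days": 12,
    "target_days": 12,
    **comparison_thresholds(improved=values, degraded=values),
}
```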

"params": {
        "start_date": null,
        "end_date": null,
        "days": 12,
        "clusters": "default",
        "resource_scheduler_port": "http://sd11.unraveldata.com:8088",
        "resource_scheduler_config_path": null,
        "queues": "root.*"
    }

Name

Type

Description

kind

string

Application kind.

Value can be hive, impala, spark, mr

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

event*

integer

Specific event.

Events can be All, Application Failure, Resource Utilization, Speedup, Informational, Cost Savings

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters

{                        
       "kind": "hive",                 
       "start_date": null,           
       "end_date": null, 
       "retention_days": 5,          
       "days": 300,                   
       "reports": [
           "io"                             
       ],
       "topk": 10,                     
       "feature_filters": {
           "null": null                   
       }
   }   

Name

Type

Description

kind

string

Application kind.

Value can be hive, impala, spark, mr

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

reports *

list

Reports to be generated.

Reports can be io, memorySeconds, cpuTime, duration, cost

topk *

integer

The number of topk apps to be included in the report.

feature_filters

dict

Key-value pairs to filter the data.

{ "clusterUid": "clusterUid From Unravel", "clusterId": "clusterId" }

users

list

Filter the results by given users.

queues

list

Filter the results by given queues.

clusters

list

Filter the results by given clusters

{
        "kinds": [
            "hive",
            "impala",
            "spark",
            "mr"
        ],
        "start_date": null,
        "end_date": null,
        "days": 12,
        "resource_metric": "memorySeconds",
        "user_db_filepath": "/path/to/csv.csv",
        "group_by_columns": "column1,column2,column3",
        "country_column": "country",
        "join_column": "id"
    }

Name

Type

Description

start_date

string

Start date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

end_date

string

End date, to set exact date time.

Format: 2022-12-07T07:32:09.629946Z

days *

integer

Number of days to look back.

kinds *

list

Kinds should be any/all of these values.

[ "hive","impala","spark","mr"]

topk *

integer

The number of topk apps to be included in the report.

resource_metric*

string

Should be any of these values: memorySeconds, cpu

user_db_filepath

string

Path to user db.

group_by_columns

string

Group by columns.

country_column

string

Country column name.

join_column

string

Join the column by ID.

Sample request
PUT http://xyz.unraveldata.com:8111/api/reports/topk-apps-spark
{
    "enabled": false,
    "retention_days": 50,
    "profile_memory": false,
    "report_type": "topkapps",
    "params": {
        "kind": "spark",
        "start_date": null,
        "end_date": null,
        "days": 100,
        "ldap_conf": false,
        "reports": [
            "io",
            "cost",
            "cpuTime",
            "duration",
            "memorySeconds"
        ],
        "topk": 10,
        "feature_filters": {
            "null": null
        }
    },
    "notifications": {}
}
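The sample request could be prepared from Python roughly as follows, using only the standard library. This is a sketch under assumptions: `build_put_request` is a hypothetical helper, the JSON Content-Type header is assumed rather than documented here, and authentication is not covered because this page does not describe it. The request is only constructed; passing it to `urllib.request.urlopen` would send it.

```python
import json
from urllib.request import Request


def build_put_request(base_url, report_name, config):
    """Return a PUT request carrying the report configuration as JSON."""
    body = json.dumps(config).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/reports/{report_name}",
        data=body,
        method="PUT",
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
    )


config = {
    "enabled": False,
    "retention_days": 50,
    "profile_memory": False,
    "report_type": "topkapps",
    "params": {"kind": "spark", "days": 100, "reports": ["io", "cost"], "topk": 10},
    "notifications": {},
}
request = build_put_request("http://xyz.unraveldata.com:8111", "topk-apps-spark", config)
print(request.get_method(), request.full_url)
```

A 200 response, as listed under Status codes, indicates that the configuration was accepted.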
Status codes

Code

Description

200

Successful operation