Add or Edit the Unity app reports

Use this API to create the configuration for a new report or to edit the configuration of an existing report. You can also edit the configuration of a report while triggering it.

PUT http://<unity_one_url>/api/reports/<report_name>

For example: PUT http://xyz.unraveldata.com:8111/api/reports/app_catalog_comparison
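The request line above can be assembled programmatically. The following is a minimal sketch that uses only the Python standard library; the helper name `build_report_request` and the `Content-Type` header are assumptions for illustration, and authentication is not covered because this page does not describe it.

```python
import json
import urllib.request


def build_report_request(base_url, report_name, config):
    """Build (but do not send) a PUT request for /api/reports/<report_name>."""
    url = f"{base_url.rstrip('/')}/api/reports/{report_name}"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: the API accepts a JSON body with this content type.
        headers={"Content-Type": "application/json"},
    )


req = build_report_request(
    "http://xyz.unraveldata.com:8111",
    "app_catalog_comparison",
    {"enabled": False, "report_type": "app_catalog_comparison"},
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without network access.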

Parameters

Each report in the Unity app has common parameters as well as report-specific parameters.

Common parameters for all reports

The common parameters, which are used in all the reports, are described in the following table:

Name	Type	Description
`enabled`	boolean	Schedules report generation at a fixed interval. Value can be true or false.
`retention_days`	integer	The number of days to keep the report.
`profile_memory`	boolean	Collects detailed information on memory usage. Value can be true or false.
`report_type`	string	Type of the report. Specify one of the following values: `app_catalog_comparison` (App Catalog Comparison report), `aws_emr_cost` (AWS EMR Cost report), `catalog` (Catalog report), `cloud_sql_migration` (Cloud SQL Migration report), `databricks_cost` (Databricks Cost Breakdown report), `databricks_event_analysis` (Databricks Event Analysis report), `dataflow_migration` (Dataflow Migration report), `databricks_savings` (Databricks Node Downsizing Savings report), `databricks_user_and_usage` (Databricks User and Usage report), `emr_instance_hours` (EMR Instance Hours report), `hdfs_utilisation` (HDFS and Small Files report), `impala_event` (Impala Events report), `impala_resource_pool_analysis` (Impala Resource Pool Analysis report), `impala_slow_hosts` (Impala Slow Hosts report), `inefficient_apps` (Interesting App report), `migration_wave_plan` (Migration Wave Plan report), `pipeline_analytics` (Pipeline Analytics report), `pipelines_comparison` (Pipelines Comparison report), `queue_analysis` (Queue Analysis report), `recommended_workflow` (Recommended Workflow report), `topkapps` (Top-K Apps report), `user_usages` (User And Usage report).
`notifications`	string	To get email notifications. The value should be comma-separated email IDs.
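As an illustration of the table above, the sketch below assembles the common fields and checks their types before they are sent. The function name `common_parameters` and the default of 30 retention days are hypothetical choices for this example, not values defined by the API.

```python
def common_parameters(report_type, enabled=False, retention_days=30,
                      profile_memory=False, emails=()):
    """Return the fields shared by every report configuration."""
    if not isinstance(enabled, bool) or not isinstance(profile_memory, bool):
        raise TypeError("enabled and profile_memory must be booleans")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(retention_days, bool) or not isinstance(retention_days, int):
        raise TypeError("retention_days must be an integer")
    return {
        "enabled": enabled,
        "retention_days": retention_days,
        "profile_memory": profile_memory,
        "report_type": report_type,
        # The table describes notifications as comma-separated email IDs.
        "notifications": ",".join(emails),
    }


config = common_parameters(
    "topkapps",
    retention_days=50,
    emails=["a@example.com", "b@example.com"],
)
print(config)
```

Report-specific values would then be added under a `params` key, as in the samples in the following sections.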

Report-specific parameters

The following sections provide details about report-specific parameters.

App catalog comparison

{
    "enabled": false,
    "retention_days": 50,
    "profile_memory": false,
    "report_type": "app_catalog_comparison",
    "params": {
        "kind": "spark",
        "baseline_start_date": "2022-11-01T06:47:57.140Z",
        "baseline_end_date": "2022-11-08T06:47:57.140Z",
        "baseline_days": null,
        "target_start_date": "2022-11-08T06:48:12.199Z",
        "target_end_date": "2022-11-15T06:48:12.199Z",
        "target_days": null,
        "feature_filters": {
            "null": null
        },
        "baseline_feature_filters": {
            "null": null
        }
    }
}

Name	Type	Description
`kind`	string	Application kind. Value can be hive, impala, spark, mr
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`reports *`	list	Reports to be generated. Values can be io, memorySeconds, cpuTime, duration, cost.
`topk *`	integer	Number of topk apps to be included in the report.
`feature_filters`	dict	Key-value pairs to filter the data, for example: { "clusterUid": "clusterUid From Unravel", "clusterId": "clusterId" }
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.
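The date fields expect timestamps in the format shown in the table (for example 2022-12-07T07:32:09.629946Z). The sketch below shows one way to produce such strings and to derive the baseline and target windows used in the App catalog comparison sample. The helper names are hypothetical, UTC is assumed from the trailing `Z`, and placing the baseline window directly before the target window is only an illustrative choice.

```python
from datetime import datetime, timedelta, timezone


def to_report_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS.ffffffZ."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


def comparison_windows(target_end, days):
    """Return back-to-back baseline and target windows of equal length."""
    target_start = target_end - timedelta(days=days)
    baseline_start = target_start - timedelta(days=days)
    return {
        "baseline_start_date": to_report_timestamp(baseline_start),
        "baseline_end_date": to_report_timestamp(target_start),
        "baseline_days": None,  # exact dates are given, so days stays null
        "target_start_date": to_report_timestamp(target_start),
        "target_end_date": to_report_timestamp(target_end),
        "target_days": None,
    }


windows = comparison_windows(
    datetime(2022, 11, 15, 6, 48, 12, 199000, tzinfo=timezone.utc), days=7
)
print(windows["baseline_start_date"], "to", windows["target_end_date"])
```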

AWS EMR cost

{
    "start_date": null,
    "end_date": null,
    "days": 12,
    "topk": 20,
    "all_filters": {
        "null": null
    }
}

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`topk *`	integer	The number of topk apps to be included in the report.
`all_filters`	dict	Features to filter the result.

Azure billing analysis

{
    "start_date": "2023-03-01T11:57:25.429Z",
    "end_date": "2023-03-27T11:57:25.429747Z",
    "days": null,
    "topk": 10,
    "job_cost_threshold": 10,
    "user_cost_threshold": 10,
    "cluster_cost_threshold": 10,
    "tag_filters": {
        "Creator": null,
        "RunName": null
    },
    "tag_cost_threshold": 10,
    "billing_file": null,
    "azure_account_type": "PAYG"
}

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`billing_file*`	string	Billing file to use for the analysis.
`topk`	integer	Number of topk entities to analyze.
`job_cost_threshold`	integer	Job cost threshold.
`user_cost_threshold`	integer	User cost threshold.
`cluster_cost_threshold`	integer	Cluster cost threshold.
`azure_account_type`	string	The options are PAYG or EA.

Catalog

{
    "kind": "spark",
    "start_date": null,
    "end_date": null,
    "days": 100,
    "reports": [
        "app_dependencies",
        "app_catalog"
    ],
    "topk": 20,
    "topk_options": [
        "io",
        "memorySeconds",
        "cpuTime",
        "duration",
        "cost"
    ],
    "feature_filters": {
        "null": null
    }
}

Name	Type	Description
`kind`	string	Application kind. Value can be hive, impala, spark, mr
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`reports *`	list	Reports to run. Values can be app_dependencies and app_catalog.
`topk`	integer	The number of topk apps.
`topk_options`	list	Any or all of the following options: "io", "memorySeconds", "cpuTime", "duration", "cost".
`feature_filters`	dict	Key-value pairs to filter the data.

Cloud SQL migration

{
        "kind": "impala",
        "target_system": "teradata",
        "start_date": null,
        "end_date": null,
        "days": 12,
        "users": [
            "hive"
        ],
        "queues": [
            "default"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "RealUser"
        ],
        "tag_values": [
            "unravel"
        ],
        "feature_filters": {
            "clusterId": [
                "PG-CDP"
            ]
        }
    }

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`kind`	string	Application kind. Value can be hive, impala, spark, mr
`target_system*`	string	Value can be any of the following target systems: teradata, snowflake, bigquery, athena, redshift
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.
`tag_names`	list	Filter the results by given tag names.
`tag_values`	list	Filter the results by given tag values.
`feature_filters`	dict	Key-value pairs to filter the data.
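Because `kind` and `target_system` accept only the values listed in the table, it can help to check them before sending the request. The following is a minimal sketch; the function name `cloud_sql_migration_params` and the client-side validation are assumptions for illustration, and the API itself may validate differently.

```python
import json

APP_KINDS = {"hive", "impala", "spark", "mr"}
TARGET_SYSTEMS = {"teradata", "snowflake", "bigquery", "athena", "redshift"}
FILTER_KEYS = {
    "users", "queues", "clusters", "tag_names", "tag_values", "feature_filters",
}


def cloud_sql_migration_params(kind, target_system, days, **filters):
    """Build the params object for a Cloud SQL migration report."""
    if kind not in APP_KINDS:
        raise ValueError(f"unsupported kind: {kind!r}")
    if target_system not in TARGET_SYSTEMS:
        raise ValueError(f"unsupported target_system: {target_system!r}")
    unknown = set(filters) - FILTER_KEYS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    params = {
        "kind": kind,
        "target_system": target_system,
        "start_date": None,  # serialized as null; the look-back uses days
        "end_date": None,
        "days": days,
    }
    params.update(filters)
    return params


params = cloud_sql_migration_params(
    "impala", "teradata", 12, users=["hive"], clusters=["PG-CDP"]
)
print(json.dumps(params, indent=4))
```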

Databricks cost breakdown

{
    "start_date": null,
    "end_date": null,
    "days": 10,
    "interval": "10sec",
    "queue_config_filepath": "",
    "adhoc_cutoff": 20,
    "users": [
        "hive"
    ],
    "workspace": [
        "default"
    ],
    "clusters": [
        "PG-CDP"
    ],
    "tag_names": [
        "dept"
    ],
    "tag_values": [
        "Operations"
    ]
}

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`users`	list	Filter the results by given users.
`workspace`	list	Filter the results by the given workspace.
`clusters`	list	Filter the results by given clusters.
`tag_names`	list	Filter the results by given tag names.
`tag_values`	list	Filter the results by given tag values.
`cost`	list	Cost in dollars.

Databricks event analysis

    {
        "start_date": null,
        "end_date": null,
        "days": 14,
        "reports": [
            "NDSE",
            "SQL_INEFF"
        ],
        "feature_filters": {
            "null": null
        }
    }

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`reports *`	list	Reports to be generated. Reports can be "NDSE": "Apps with NodeDownsizingEvent", "CDE_NDSE": "Apps with NodeDownsizingEvent and ContendedDriverEvent", "SQL_INEFF": "SQL Apps with High Impact", "PNPE_NDSE": "Apps with NodeDownsizingEvent and PartitionsNotPrunedEvent", "DSE_NDSE": "Apps with NodeDownsizingEvent and DataSkewEvent", "IJE_NDSE": "Apps with NodeDownsizingEvent and InefficientJoinEvent", "IJCE_NDSE": "Apps with NodeDownsizingEvent and InefficientJoinConditionEvent", "SSFE_NDSE": "Apps with NodeDownsizingEvent and ScanSmallFilesEvent", "SSOE_NDSE": "Apps with NodeDownsizingEvent and SlowSQLOperatorEvent"
`feature_filters`	dict	Key-value pairs to filter the data.

Databricks node downsizing savings

"params": {
    "start_date": null,
    "end_date": null,
    "days": 300
}

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`users`	list	Filter the results by given users.
`workspaces`	list	Filter the results by given workspaces.
`clusters`	list	Filter the results by given clusters.

HDFS and small files

{
    "target_days": 7,
    "threshold": 0.001,
    "featherbolt_path": "/opt/unravel/tmp/ondemand_fsimage/featherbolt_files/",
    "report_values": "tables_small",
    "project_name": "project",
    "tenant_name": "playbook",
    "database_name": "sys",
    "table_name": "dag_meta"
}

Name	Type	Description
`target_days *`	integer	Number of days to look back.
`threshold*`	integer	Threshold limit (MB) value based on the report generated.
`featherbolt_path*`	string	Path to the Featherbolt files generated after the fsimage is processed.
`report_values`	string	`tables_small` for the Data Tables small files report, or `hdfs_space` for the HDFS space utilization threshold.
`project_name`	string	Project name if `report_values` is `hdfs_space`; otherwise empty.
`tenant_name`	string	Tenant name if `report_values` is `hdfs_space`; otherwise empty.
`database_name`	string	Database name if `report_values` is `tables_small`; otherwise empty.
`table_name`	string	Table name if `report_values` is `tables_small`; otherwise empty.

Impala event / Impala slow hosts

    {
        "start_date": null,
        "end_date": null,
        "days": 10,
        "interval": "10sec",
        "queue_config_filepath": "",
        "users": [
            "hive"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "dept"
        ],
        "tag_values": [
            "Operations"
        ],
        "pools": [
            "root.default"
        ]
    }

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`users`	list	Filter the results by given users.
`clusters`	list	Filter the results by given clusters.
`tag_names`	list	Filter the results by given tag names.
`tag_values`	list	Filter the results by given tag values.
`pools`	list	Filter the results by given pools.

Impala resource pool analysis / Recommended workflow

  {
        "start_date": null,
        "end_date": null,
        "days": 10,
        "interval": "10sec",
        "queue_config_filepath": "",
        "users": [
            "hive"
        ],
        "queues": [
            "default"
        ],
        "clusters": [
            "PG-CDP"
        ],
        "tag_names": [
            "dept"
        ],
        "tag_values": [
            "Operations"
        ]
    }

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.
`tag_names`	list	Filter the results by given tag names.
`tag_values`	list	Filter the results by given tag values.

Interesting apps report

"params": {
    "kind": "hive",
    "start_date": null,
    "end_date": null,
    "days": 300,
    "event": "All"
}

Name	Type	Description
`kind`	string	Application kind. Value can be hive, impala, spark, mr
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`event`	list	Specific event. Events can be All, Application Failure, Resource Utilization, Speedup, Informational, Cost Savings
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.

Interesting Impala query

{
    "start_date": null,
    "end_date": null,
    "days": 100,
    "users": null,
    "pools": null,
    "memory_spilled": null,
    "Rows_produced": null,
    "duration": null,
    "est_per_node_peak_memory": null,
    "per_node_peak_memory": null,
    "aggregate_peak_memory": null,
    "admission_wait_time": null,
    "hdfs_remote_bytes_read": null,
    "statistics_corrupt_or_missing": "False"
}

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`events`	list	Specific event. Events can be All, Application Failure, Resource Utilization, Speedup, Informational, Cost Savings
`pools`	list	Filter the results by given pools.
`memory_spilled`	integer	Threshold value for memory spilled, in bytes.
`Rows_produced`	integer	Rows produced threshold value.
`duration`	integer	Threshold value for query duration, in seconds.
`statistics_corrupt_or_missing`	boolean	Boolean value. By default, False is given.
`est_per_node_peak_memory`	integer	Estimated Per Node Peak memory threshold value.
`aggregate_peak_memory`	integer	Aggregate Peak Memory threshold value.
`admission_wait_time`	integer	Admission wait time threshold value.
`hdfs_remote_bytes_read`	integer	HDFS Remote Bytes threshold value.

Migration wave plan / Databricks user and usage

{
    "start_date": null,
    "end_date": null,
    "days": 10,
    "interval": "10sec",
    "queue_config_filepath": "",
    "adhoc_cutoff": 20,
    "users": [
        "hive"
    ],
    "queues": [
        "default"
    ],
    "clusters": [
        "PG-CDP"
    ],
    "tag_names": [
        "dept"
    ],
    "tag_values": [
        "Operations"
    ]
}

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.
`adhoc_cutoff`	integer	Adhoc cut-off value.
`tag_names`	list	Filter the results by given tag names.
`tag_values`	list	Filter the results by given tag values.

Pipeline analytics

    {
        "start_date": null,
        "end_date": null,
        "days": 100,
        "topk": 5
    }

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`topk *`	integer	The number of topk apps to be included in the report.

Pipelines comparison

{
        "baseline_start_date": null,
        "baseline_end_date": null,
        "baseline_days": 12,
        "baseline_pipelines": [
            "Recommendation_Cluster",
            "Covid_Data_Processing"
        ],
        "target_start_date": null,
        "target_end_date": null,
        "target_days": 12,
        "target_pipelines": [
            "Recommendation_Cluster",
            "Covid_Data_Processing"
        ],
        "improved_vcore_seconds_threshold": 5,
        "improved_duration_threshold": 25,
        "improved_io_threshold": 5,
        "improved_memory_seconds_threshold": 5,
        "degraded_vcore_seconds_threshold": 5,
        "degraded_duration_threshold": 25,
        "degraded_io_threshold": 5,
        "degraded_memory_seconds_threshold": 5
    }

Name	Type	Description
`baseline_start_date`	string	Baseline start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`baseline_end_date`	string	Baseline end date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`target_start_date`	string	Target start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`target_end_date`	string	Target end date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`target_days *`	integer	Number of target days to look back.
`target_pipelines *`	list	Names of the target pipelines to compare.
`improved_vcore_seconds_threshold *`	integer	Improved vcore seconds threshold value.
`improved_duration_threshold*`	integer	Improved duration threshold value.
`improved_io_threshold*`	integer	Improved IO threshold value.
`improved_memory_seconds_threshold*`	integer	Improved memory seconds threshold value.
`degraded_vcore_seconds_threshold*`	integer	Degraded vcore seconds threshold value.
`degraded_duration_threshold*`	integer	Degraded duration threshold value.
`degraded_io_threshold*`	integer	Degraded IO threshold value.
`degraded_memory_seconds_threshold*`	integer	Degraded memory seconds threshold value.
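The eight threshold parameters follow a regular `<direction>_<metric>_threshold` naming pattern, so they can be generated rather than typed by hand. The sketch below is illustrative only; the function name `threshold_params` and the input dictionaries are hypothetical, while the resulting keys match the ones in the sample above.

```python
METRICS = ("vcore_seconds", "duration", "io", "memory_seconds")


def threshold_params(improved, degraded):
    """Expand per-metric values into the improved_*/degraded_* keys."""
    params = {}
    for direction, values in (("improved", improved), ("degraded", degraded)):
        for metric in METRICS:
            # A missing metric raises KeyError, so no key is silently dropped.
            params[f"{direction}_{metric}_threshold"] = values[metric]
    return params


values = {"vcore_seconds": 5, "duration": 25, "io": 5, "memory_seconds": 5}
thresholds = threshold_params(improved=values, degraded=values)
for name, value in thresholds.items():
    print(name, value)
```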

Queue analysis

"params": {
        "start_date": null,
        "end_date": null,
        "days": 12,
        "clusters": "default",
        "resource_scheduler_port": "http://sd11.unraveldata.com:8088",
        "resource_scheduler_config_path": null,
        "queues": "root.*"
    }

Name	Type	Description
`kind`	string	Application kind. Value can be hive, impala, spark, mr
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`event*`	integer	Specific event. Events can be All, Application Failure, Resource Utilization, Speedup, Informational, Cost Savings
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.

TopK apps

{
    "kind": "hive",
    "start_date": null,
    "end_date": null,
    "retention_days": 5,
    "days": 300,
    "reports": [
        "io"
    ],
    "topk": 10,
    "feature_filters": {
        "null": null
    }
}

Name	Type	Description
`kind`	string	Application kind. Value can be hive, impala, spark, mr
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`reports *`	list	Reports to be generated. Values can be io, memorySeconds, cpuTime, duration, cost.
`topk *`	integer	The number of topk apps to be included in the report.
`feature_filters`	dict	Key-value pairs to filter the data, for example: { "clusterUid": "clusterUid From Unravel", "clusterId": "clusterId" }
`users`	list	Filter the results by given users.
`queues`	list	Filter the results by given queues.
`clusters`	list	Filter the results by given clusters.

User and usage

{
        "kinds": [
            "hive",
            "impala",
            "spark",
            "mr"
        ],
        "start_date": null,
        "end_date": null,
        "days": 12,
        "resource_metric": "memorySeconds",
        "user_db_filepath": "/path/to/csv.csv",
        "group_by_columns": "column1,column2,column3",
        "country_column": "country",
        "join_column": "id"
    }

Name	Type	Description
`start_date`	string	Start date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`end_date`	string	End date, to set exact date time. Format: 2022-12-07T07:32:09.629946Z
`days *`	integer	Number of days to look back.
`kinds *`	list	Application kinds. Values can be any or all of: "hive", "impala", "spark", "mr".
`topk *`	integer	The number of topk apps to be included in the report.
`resource_metric*`	string	Should be any of these values: memorySeconds, cpu
`user_db_filepath`	string	Path to user db.
`group_by_columns`	string	Group by columns.
`country_column`	string	Country column name.
`join_column`	string	Join the column by ID.

Sample request

PUT http://xyz.unraveldata.com:8111/api/reports/topk-apps-spark
{
    "enabled": false,
    "retention_days": 50,
    "profile_memory": false,
    "report_type": "topkapps",
    "params": {
        "kind": "spark",
        "start_date": null,
        "end_date": null,
        "days": 100,
        "ldap_conf": false,
        "reports": [
            "io",
            "cost",
            "cpuTime",
            "duration",
            "memorySeconds"
        ],
        "topk": 10,
        "feature_filters": {
            "null": null
        }
    },
    "notifications": {}
}

Status codes

Code	Description
200	Successful operation
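A client can treat 200 as the only success code listed here. The sketch below sends a prepared request and checks the status with the Python standard library; the function name `send_report`, the injectable `opener` argument, and the `Content-Type` header are assumptions for illustration. Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for error responses rather than returning them.

```python
import json
import urllib.request


def send_report(request, opener=urllib.request.urlopen):
    """Send a prepared request and report whether the server answered 200."""
    with opener(request) as response:
        return response.status == 200


sample_request = urllib.request.Request(
    "http://xyz.unraveldata.com:8111/api/reports/topk-apps-spark",
    data=json.dumps({"enabled": False, "report_type": "topkapps"}).encode("utf-8"),
    method="PUT",
    # Assumption: the API accepts a JSON body with this content type.
    headers={"Content-Type": "application/json"},
)
# send_report(sample_request) would perform a real network call,
# so it is not invoked in this sketch.
```

The `opener` argument exists only so the function can be exercised with a stand-in during testing without contacting a server.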
