
Detect anomalies in the cost of Databricks job runs

Use this API to detect whether the cost of a Databricks job run is anomalously high or low compared to the job's other runs. Pass the job ID, which you can find on the Jobs page of the Databricks platform, to identify cost anomalies for that job. You can specify only one job ID per request.

Request

The request produces an application/json payload.

GET http://<unravel-host>/api/v1/app_store/api/v1/anomalies/jobs/<databricks_jobid>/cost

Example: http://<unravel-host>/api/v1/anomalies/jobs/155/cost

Path parameters

None

Query parameters

All the parameters are optional.

Name

Type

Description

run-mode

string

Use one of the following modes:

  • batch: Detects anomalies from history

  • real-time: Detects anomalies when the application runs; used for real-time data monitoring

Default: batch

platform

string

The platform on which the job runs. Supported values are the Databricks and EMR platforms, or on-prem jobs.

Default: Databricks

from

to

string

The date-time filter. Specify the from and to date and time values to find anomalies for a particular date and time range when the jobs are run. Specify the values in the ISO date-time format as follows:

Format: %Y-%m-%dT%H:%M:%S.%fZ

Example: 2022-03-07T14:10:26.054Z

Default value: None

Note

An exception occurs if the from and to values are not specified in the ISO date-time format.

tolerance

float

Controls the anomaly boundaries. The lower the tolerance, the narrower the interval between the boundaries.

Default value: 0.95

Valid values: 0 to 1

lowerBoundary

upperBoundary

float

Detects cost anomalies based on a threshold of lower and upper limits in the data. These boundaries define the plot on the graph.

You can use the threshold values to visualize the range of expected values and anomalies in the data.

Examples:

Requirement

Values

Result

If you want to view anomalies greater than five dollars

Specify lowerBoundary=5 and upperBoundary=None

All values greater than five are returned as anomalies.

If you want to view anomalies greater than 100 dollars

Specify upperBoundary=100 and lowerBoundary=None

All values greater than 100 are returned as anomalies.

If you want to view anomalies between 10 and 100 dollars

Specify lowerBoundary=10 and upperBoundary=100

All values between 10 and 100 are returned as anomalies.

If you want to view all anomalies

Specify lowerBoundary=None and upperBoundary=None

Returns all anomalies reported by the algorithm.

Default: None

topNAnomalies

integer

Returns the N most significant anomalies, where N is the value you specify. For example, a value of 10 returns the ten most significant anomalies. If the algorithm returns fewer than N anomalies, this parameter is ignored.

Default: None

returnAnomaliesOnly

boolean

  • True: Returns only anomalies, not all the data points.

  • False: Returns all data points.

Default: True

suppressNegativeAnomalies

boolean

  • True: Suppresses negative anomalies. If the values are lower than the value specified for the lowerBoundary parameter, those values are considered negative anomalies. These negative anomalies are not returned in the output.

  • False: Returns all anomalies.

Default: False
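The query parameters above can be combined into a request URL along the following lines. This is a minimal Python sketch, not part of the Unravel product: the host name, the helper names to_api_timestamp and build_cost_anomaly_url, and the choice of the /api/v1/anomalies/... path (taken from the example URL on this page) are assumptions for illustration. It also assumes that timestamps are sent in UTC with millisecond precision, as in the documented example.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Placeholder host (assumption); replace with your Unravel host.
BASE_URL = "http://unravel-host.example.com"


def to_api_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS.mmmZ in UTC."""
    dt_utc = dt.astimezone(timezone.utc)
    # strftime's %f yields six digits (microseconds); trim to three digits
    # to match the millisecond precision of the documented example.
    return dt_utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def build_cost_anomaly_url(job_id: int, **params) -> str:
    """Return the request URL, adding only the parameters that were supplied."""
    query = {name: value for name, value in params.items() if value is not None}
    url = f"{BASE_URL}/api/v1/anomalies/jobs/{job_id}/cost"
    # urlencode percent-encodes reserved characters such as ':' in timestamps.
    return f"{url}?{urlencode(query)}" if query else url


# Parameter names such as run-mode and from are not valid Python keyword
# arguments, so they are passed by unpacking a dictionary.
url = build_cost_anomaly_url(
    155,
    **{
        "run-mode": "batch",
        "from": to_api_timestamp(
            datetime(2022, 3, 7, 14, 10, 26, 54000, tzinfo=timezone.utc)
        ),
        "upperBoundary": 100,
        "returnAnomaliesOnly": True,
    },
)
print(url)
```

Because all parameters are optional, the sketch drops any parameter whose value is None instead of sending it, so the server-side defaults apply.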

Examples

Request

curl -X GET -H "Authorization: JWT <token>" -H "Content-Type:application/json" -H "Accept: application/json" http://<unravel-host>/api/v1/anomalies/jobs/<databricks_jobid>/cost
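For scripted use, the same request might be issued from Python with the standard library, as sketched below. The function names build_request and fetch_cost_anomalies are hypothetical, and the host and token are placeholders; only the headers and the URL path are taken from the curl command above.

```python
import json
import urllib.request


def build_request(host: str, job_id: int, token: str) -> urllib.request.Request:
    """Build a GET request with the same headers as the curl example."""
    url = f"http://{host}/api/v1/anomalies/jobs/{job_id}/cost"
    headers = {
        "Authorization": f"JWT {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_cost_anomalies(host: str, job_id: int, token: str, timeout: float = 30.0):
    """Send the request and decode the JSON array of data points."""
    request = build_request(host, job_id, token)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a reachable Unravel host and a valid token):
# points = fetch_cost_anomalies("unravel-host.example.com", 155, "<token>")
```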

Response

[
   {
       "value": 290,
       "anomaly": 1,
       "datetime": "2022-06-01 14:06:08.184",
       "lowerBoundary": 0.22807244028318743,
       "upperBoundary": 0.26294456690047585,
       "score": 0.09297399690558121
   },
   {
       "value": 10,
       "anomaly": 0,
       "datetime": "2022-06-12 12:05:10.024",
       "lowerBoundary": 0.23436206895797662,
       "upperBoundary": 0.26771495290707303,
       "score": 0.00843190809198149
   },
   {
       "value": 0.23,
       "anomaly": -1,
       "datetime": "2022-06-14 12:05:20.369",
       "lowerBoundary": 0.232278522270265,
       "upperBoundary": 0.26694414579635767,
       "score": 0.009863732771709878
   }
]

Other than the input parameter values, the response returns the following additional parameters:

  • anomaly: Indicates the type of anomaly. The value is an integer.

    • Returns 1 if it is a positive anomaly.

    • Returns -1 if it is a negative anomaly.

    • Returns 0 if the data point is not an anomaly. Such data points are included when the value for the returnAnomaliesOnly parameter is False in the request.

  • score: Returns the score for each anomaly. The score indicates how far the data point is from the boundary. The farther the value is from its boundary, the higher the score.
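A client can use the anomaly and score fields to separate and rank the returned data points. The following Python sketch does this for the sample response shown above. The function name summarize and the shape of its return value are illustrative assumptions, not part of the API.

```python
import json

# Sample response copied from the example above.
SAMPLE_RESPONSE = """
[
  {"value": 290, "anomaly": 1, "datetime": "2022-06-01 14:06:08.184",
   "lowerBoundary": 0.22807244028318743, "upperBoundary": 0.26294456690047585,
   "score": 0.09297399690558121},
  {"value": 10, "anomaly": 0, "datetime": "2022-06-12 12:05:10.024",
   "lowerBoundary": 0.23436206895797662, "upperBoundary": 0.26771495290707303,
   "score": 0.00843190809198149},
  {"value": 0.23, "anomaly": -1, "datetime": "2022-06-14 12:05:20.369",
   "lowerBoundary": 0.232278522270265, "upperBoundary": 0.26694414579635767,
   "score": 0.009863732771709878}
]
"""


def summarize(points):
    """Group data points by anomaly type and rank anomalies by score."""
    positive = [p for p in points if p["anomaly"] == 1]
    negative = [p for p in points if p["anomaly"] == -1]
    normal = [p for p in points if p["anomaly"] == 0]
    # A higher score means the value lies farther from its boundary.
    ranked = sorted(positive + negative, key=lambda p: p["score"], reverse=True)
    return {
        "positive": positive,
        "negative": negative,
        "normal": normal,
        "ranked": ranked,
    }


summary = summarize(json.loads(SAMPLE_RESPONSE))
for point in summary["ranked"]:
    kind = "positive" if point["anomaly"] == 1 else "negative"
    print(point["datetime"], kind, point["value"], point["score"])
```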