Expert rule

Overview

Expert Rule (Alerts > AutoActions > New from template > Custom:Expert Rule) is a very powerful mode that provides you with greater flexibility than is available through the templates. Using the Expert Rule, you can create complex rulesets to accommodate almost any cluster monitoring requirements. Before using this mode, you should have a clear understanding of AutoActions concepts and capabilities, along with JSON, which is used to define AutoAction.

Danger

This mode's flexibility and power make it dangerous and capable of wreaking havoc. Consult with the Unravel team before attempting to use the Expert Rule.

Tip

Before using the Expert Rule, look at Alerts > AutoActions > New AutoAction to see if you can use that template. It provides structure and allows you to define Trigger Conditions and Actions , and provides a template for the rules.Templates

See Limitations for the AutoAction limitations.

For each AutoAction you define, you must specify one or more of the following:

Trigger Conditions: boolean conditions that must be met for the Unravel Server to execute the corresponding action. For example, the app cannot run more than 50 mapper tasks.
Refine Scope: boolean conditions that must be met for the Unravel Server to evaluate AutoAction's defining conditions. For example, AutoAction should run from 8.00 and 14.00.

You have the option to specify:

Actions: steps to be taken when the prerequisite and defining conditions evaluate to true, for example, email admin. If you do not specify any action, Unravel logs the violation and saves the state, that is., time slice, ±5 minutes from when the AutoAction was triggered, and all the apps running during the period.

When using Expert Rule you must define the Header, Trigger, Scope, and Actions using the JSON data format.

{
  // header is required
  'HEADER'  

   // Triggers - at least one must be defined. Two or more must be joined using an operator.
   "rules":[
      { scope }  |  "operator" [ { scope } { scope } ... 
   ]

   // Refine Scope- at least one
   'OPTIONS - POLICY/SCOPE'
 
   // Actions - at least one
   "actions":[
      { action } 
   ]
}

Header: basic AutoAction information, including status (in/active).
Triggers: the rules for the scope. You must define at least one rule.
Scope: who, what, where causes a violation, and when. You must specify at least one.
Actions: actions to perform when a violation triggers the AutoAction. If none are defined, the UI still implements and tracks AutoActions.

Defining your AutoAction

Header

You must define a header. The only item not required is the description.

Attributes Name/ Definition	Possible Value	Default Value
enabled Whether the AutoAction is active or not. True: active/enabled. False: inactive/disabled.	True \| False	-
policy_name Value defined by Unravel.	AutoActions2	AutoActions2
policy_id Value defined by Unravel.	10	10
instance_id Any unique value.		-
name_by_user Any unique string. The name is used when AutoAction is displayed in the UI.		-
description_by_user Description of the AutoAction.		-
created_by Value defined by Unravel.	admin	admin
last_edited_by Value defined by Unravel.	admin	admin
created_at Time created. The date and time are in the form of an Epoch/Unix timestamp.		-
updated_at Time updated. The date and time are in the form of an Epoch/Unix timestamp.		-

"enabled": true,
"policy_name": "AutoActions2",
"policy_id": 10,
"instance_id": 273132543512,
"name_by_user": "aa_Sample_Test",
"description_by_user": "long running workflow",
"created_by": "admin",
"last_edited_by": "admin",
"created_at": 1524220191137,
"updated_at": 1524220265920,

Trigger:

The violating conditions of the AutoAction, e.g., apps using memory > 2 GB.

Field Name/ Definition	Possible Values	Required/ Required by	Default Value
scope The rule scope.	app, apps, multi_app, by_name, cluster, clusters, multi_cluster, container, containers, multi_ containers queue, queues, multi_queue, user, users, multi_user Note: apps==multi_app, users==multi-user, etc	√	-
target Application name	any valid app name	when the scope is by_name	-
metric Metric used for comparison.	see supported metrics per type		-
comparison Comparison operator	>, >=, ==, <=, <	metric	-
value Value for comparison. The value form varies by metric.	number	metric	-
state Scope state	new, new_saving, submitted, accepted, scheduled, allocated, allocated_saving, launched, running, finishing, finished, killed, failed, and *		-
type Job type	MapReduce, Yarn, Tez, Spark, Workflow, Hive		-

Logical operators for evaluating multiple rules

Operator	Condition for a Violation
OR	At least one rule evaluates to true.
AND	All rules evaluate to true.
SAME	All the rules evaluate to true and occur within the same scope. See Same Logical Operator for more details.

Operator

Condition for a Violation

At least one rule evaluates to true.

AND

All rules evaluate to true.

SAME

All the rules evaluate to true and occur within the same scope.

See Same Logical Operator for more details.

You must define at least one rule.

A Single Rule

"rules": [
    // rule
   {
    "scope":"",
        // at least one of the following
    //metric
    "metric":"",
    "compare":"",
    "value":,
    "state":"",
    "type":""
   }
]

A violation occurs when the app is a pending workflow with a duration > 10.

"rules":[
  {
    "scope":"apps",
    "metric":"duration",
    "compare":">",
    "value":10,
    "state":"pending",
    "type":"workflow"
  }
]

A violation occurs when the app is a workflow with a duration > 10. Removing state doesn't affect the rule.

"rules":[
  {
    "scope":"apps",
    "metric":"duration",
    "compare":">",
    "value":10,
    "state":"",
    "type":"workflow"
  }
]

A violation occurs when the app has a duration > 10. Removing state and type doesn' affect the rule.

"rules":[
  {
    "scope":"apps",
    "metric":"duration",
    "compare":">",
    "value":10,
    "state":"",
    "type":""
  }
]

A Rule Array

Two or more rules combined with an operator.

"rules": [
 {
   "operator": [
      // rule 1
      {

      }
      // rule 2
      {
      }
      // rule n
      {
      }
   ]
 }
]

Note

Multi_X is equivalent to the plural of X. In this case, we could use multi_apps instead of apps.

Take the following two rules:

// apps   (allocatedMB >=1024)
{
  "scope":"apps",
  "metric":"allocatedMB",
  "compare":">=",
  "value":1024
}

// apps   (allocatedVCores > 100)
{
  "scope":"apps",
  "metric":"allocatedVCores",
  "compare":">",
  "value":100
}

OR example

When they are ORed a violation occurs if at least one rule evaluates to true.

"rules":[
  {
    "OR":[
      {
        "scope":"apps",
        "metric":"allocatedMB",
        "compare":">=",
        "value":1024
      }      
      {
        "scope":"apps",
        "metric":"allocatedVCores",
        "compare":">",
        "value":100
      }
    ]
  }
]

AND example

When ANDed, a violation occurs if both rules evaluate to true.

"rules":[
  {
    "AND":[
      {
        "scope":"apps",
        "metric":"allocatedMB",
        "compare":">=",
        "value":1024
      }      
      {
        "scope":"apps",
        "metric":"allocatedVCores",
        "compare":">",
        "value":100
      }
    ]
  }
]

SAME example

When SAMEd a violation occurs if both rules evaluate to true and the violations are within the same scope.

"rules":[
  {
    "SAME":[
      {
        "scope":"apps",
        "metric":"allocatedMB",
        "compare":">=",
        "value":1024
      }      
      {
        "scope":"apps",
        "metric":"allocatedVCores",
        "compare":">",
        "value":100
      }
    ]
  }
]

Using the above example, if

My_App only violates rule 1 (allocatedMB), and
Your_App only violates rule 2 (allocatedVcores)

the AutoAction isn't triggered because the violations occurred in different scopes, i.e., My_App and Your_App.

However, if

My_App violates both rules (allocatedMB and allocatedVcores), and
Your_App only violates rule 2 (allocatedVcores)

the AutoAction is triggered for My_App but not Your_App.

Given the same ruleset, evaluation becomes more restrictive.

OR: the AutoAction is triggered if one or more of conditions is true.
AND: the AutoAction is triggered if all the conditions are true.
SAME: the AutoAction is triggered if all the conditions are true within a specific scope.

Scope

Who/what can cause the violation, and when it can occur. For example, the violation can only occur between 12:00 and 20:00.

Field Name/Definition	Required/ Required by	Possible Values	Default Value
`X`_mode where `X` is user, queue, cluster, or app. The mode defines how the rules are applied to type `X`: 0 - the rules aren't evaluated. 1 - the rules are evaluated for all type `X`. 2 - the rules are evaluated for only those in `X`_list or matching `X`_transform. 3 - the rules are evaluated for everything except those in `X`_list or matching `X`_transform.	You must define at least one option/policy.	0, 1, 2, 3 Default: 0	0
`X`_list A list of `X` type used when the mode value is 2 or 3. Applicable Only if mode is set to 2 (only) or 3 (except).	if_mode is 2 or 3 and `X`_transform isn't defined	empty, single-item, or comma-separated list.	-
`X`_transform A list of regex used to generate a list of `X` when the mode value is 2 or 3. Applicable Only if mode is set to 2 (only) or 3 (except).	if `X`_mode is 2 or 3 and `X`_list isn't defined	empty, single regex, or comma-separated regex list	-
Time The daily time the AutoAction is trigger.		any period spanning less than 24 hours.	-
Sustained Violation Set a minimum or maximum period for the AutoAction to be triggered. See here for more information		any period less than 24 hours.	-

Scope

where X is a user, queue, cluster, or app.

"X_mode": "",

// at least one of the following if X_mode = 2|3
"X_list": "" ,
"X_mode": "" ,

Cluster - doesn't apply to any clusters.

"cluster_mode": 0,
"cluster_list":"",
"cluster_transform":"",

Queue - applies all queues.

"queue_mode": 1,
"queue_list":"",
"queue_transform":"",

User - applies only the users specified in the list.

"user_mode": 2,
"user_list": [userA, userB],
"user_transform":"",

Application Name - applies to all apps except those matching the list.

"app_mode": 3,
"app_list": [userA, userB],
"app_transform":"",

User - applies only to the users specified in the list and regex.

"user_mode": 2,
"user_list": [userA, userB],
"user_transform":"regex",

Actions

Action to implement upon the violation. e.g., send an email to the app owner upon violation.

You do not have to define any actions, but it defeats the purpose not to. If no actions are defined, the UI keeps track of when the AutoAction was triggered and what triggered it. The trigger and scope conditions must be met before the AutoAction is triggered.

Field Name/Definition	Required/ Required by	Possible Value	Default Value
action The action to be taken.	at least one	send_email, http_post, post_in_slack, move_to_queue, kill_app	-
to Email recipients.	send_mail if to_owner not true	One or more recipients in a comma-separated list.	-
to_owner Send email to owner.	send_mail if to is empty	false: do not send an email true: send email	false
urls URLs for Http post	http_post	One or more URLs in a comma-separated list.	-
token Token generated by slack.	post_in_slack	Slack token	-
channels Slack channel.	post_in_slack	One or more channels in a comma-separated list	-
queue Queue name.	move_to_queue	The name of a valid queue to move the app to.	-

Single action

"actions": [
     {
      "action": ""
          // if required action options
         }
]

Multiple actions

"actions": [
   // action 1 
   {
   }

  // action n
   {
   }
]

Actions can be Ignored when in conflict

Below we specified two actions, move_to_queue , and kill_app, correctly. These two actions, in conjunction, do not make sense. If we kill the app, how can we then move it? Why bother moving the app if we are going to kill it? Effectively only one action can be executed. In this case, Unravel precedes kill_app , and move_to_queue is ignored.

actions":[
  {
    {
      "action":"move_to_queue",
      "queue":"sample"
    },    
    "action":"kill_app"
  }
    }
]

Actions fail if the required information is invalid or not specified.

Below are two actions with invalid information. In send_mail, the addresses are invalid, and the owner isn't to be notified. The http_post has an invalid URL. Unravel AutoAction's engine sends the email and tries to perform the HTTP post; however, both actions will fail. Since only these two actions are specified, no action is effectively taken. However, the UI records the trigger and retains the information to populate for Clusters > Overview Alerts tile, history of runs, and the cluster view for each time the action was triggered.

"actions":[
  {
    "action":"send_email",
    "to": [aBadEmailAddress.mycompany.com,anotherBadAddress.mycompany.com
     ],
    "to_owner":false
  },
  {
    "action":"http_post",
    "urls":[https://nonexistentURL
     ]
  }
]

Example actions

There are five main actions: send_email, http_post, post_in_slack, move_to_queue, and kill_app.

send_email, http_post, and post_in_slack allow you to use comma-separated lists to specify multiple recipients,

Note

You must take care when entering information. A specified action fails if you enter incorrect information, for example, a bad email address/URL/channel, wrong or non-existent queue.

Below are two actions with invalid information.

In send_mail the addresses are invalid, and the owner isn't to be notified.
The http_post has an invalid URL.

Unravel AutoAction's engine sends the email and tries to perform the HTTP post; however, both actions will fail. Since only these two actions are specified, no action is effectively taken. However, the UI records the trigger and retains the information to populate for Clusters > Dashboard Alerts tile, history of runs, and the cluster view for each time the AutoAction's triggered.

URLs and channels, respectively. See AutoActions Templates for example of send_email, http_post , and post_in_slack notifications.Templates

Send_email

You can enter multiple email addresses using a comma separated. To automatically send an email to the owner by set to_owner to true; Unravel handles the rest. list. Unlike when using New from Templates or New AutoAction, you aren't notified when entering an invalid address on the face of it, for example, myMailaddress&com.

"actions":[
  {
    "action":"send_email",
    "to": ["myMail@mycompany.com,ThisPerson@theircompany.com,TheBoss@mycompany.com"
     ],
    "to_owner":false
  }
]

http_post

You can enter multiple URLs using a comma-separated list. As with send_email, you aren't notified when the HTTPS is invalid on the face of it. For example, the following example has an invalid domain name.

"actions": [
     {
      "action": "http_post,
       "to": ["https://test24:3000/post/"
        ]
     }
]

post_in_slack

You can enter multiple channels using a comma-separated list. Verify that your token is correct and the channels are entered correctly.

"actions": [
    {
      "action": "post_in_slack",
      "token": "xyz",
      "channels": [ "auto-action-2"
      ]
    }
]

move_to_queue

Be sure to enter an existing queue. This is non-destructive but may affect the cluster performance and its availability to the users.

"actions": [
    {
      "action": "move_to_queue",
      "queue": "sample"
    }
]

kill_app

This is straightforward, but kill_app is a destructive action and may affect a cluster's performance and availability to the users.

"actions": [
    {
      "action": "kill_app"
    }
]

An expert rule example

This AutoAction triggers apps

Trigger Conditions: Using (memoryMB >= 1024) and (allocatedVcores >100), which occur within the same scope
Refine Scope: Trigger for all apps except the following apps: myApp, yourApp, and theirApp.
Actions: When triggered, post a notification to Slack and move the app to the slow_queue.

{
  // Header
  "enabled":true,
  "policy_name":"AutoActions2",
  "policy_id":10,
  "instance_id":273132543512,
  "name_by_user":"aa_Sample_Test",
  "description_by_user":"long running workflow",
  "created_by":"admin",
  "last_edited_by":"admin",
  "created_at":1524220191137,
  "updated_at":1524220265920,

  // Defining Conditions 
  "rules":[
    {
      "SAME":[
        {
          "scope":"apps",
          "metric":"allocatedMB",
          "compare":">=",
          "value":1024
        }        
        {
          "scope":"apps",
          "metric":"allocatedVCores",
          "compare":">",
          "value":100
        }
      ]
    }
  ] 
   // Prerequisite Conditions
  "app_mode":3,
  "app_list":"myApp, yourApp, theirApp",

  // Actions
  "actions":[
    {
      "action":"post_in_slack",
      "token":"xyz",
      "channels":[
        "auto-action-2"
      ]
    },
    {
      "action":"move_to_queue",
      "queue":"slow_queue"
    }
  ]
}

AutoAction examples

See Sample AutoActions for more Expert Rule examples. Examples include defining an expert rule or action within a New AutoAction template to provide some predefined structure.Templates

In this section:

Home

Expert rule

Overview

Danger

Tip

Defining your AutoAction

Header

Trigger:

Note

Scope

Scope

Actions

Single action

Multiple actions

Actions can be Ignored when in conflict

Actions fail if the required information is invalid or not specified.

Example actions

Note

Send_email

http_post

post_in_slack

move_to_queue

kill_app

An expert rule example

AutoAction examples

Search results