Expert rule

Overview

Expert Rule is a very powerful mode that provides you with greater flexibility than is available through the templates. Using the Expert Rule you can create complex rulesets to accommodate almost any cluster monitoring requirements. Before using this mode, you should have a clear understanding of AutoActions concepts and capabilities, along with JSON, which is used to define the AutoAction.

Danger

This mode's flexibility and power make it dangerous and capable of wreaking havoc. Consult with the Unravel team before attempting to use the Expert Rule.

Tip

Before using the Expert Rule, check whether the Build Rule template meets your needs. Build Rule lets you define an expert rule and action while supplying the header and providing basic rulesets, options, and actions.

See here for the current limitations on AutoActions.

You must specify:

  • Prerequisite conditions: boolean conditions that must be met for the Unravel Server to evaluate the AutoAction's defining conditions, for example, the AutoAction should run only between 8:00 and 14:00.

  • Defining conditions: boolean conditions that must be met for the Unravel Server to execute the corresponding action, for example, the app can't run more than 50 mapper tasks.

  • Actions: steps to be taken when the prerequisite and defining conditions evaluate to true, for example, send an email to admin.
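
The order in which these three parts interact can be sketched in a few lines of Python. This is only an illustration of the description above, not Unravel's implementation; the function name actions_to_run, the helper in_window, and the sample values are hypothetical.

```python
from datetime import time
from typing import Callable, List


def actions_to_run(
    prerequisites: List[Callable[[], bool]],
    ruleset: Callable[[], bool],
    actions: List[str],
) -> List[str]:
    """Return the actions to execute, or an empty list if nothing triggers."""
    # Prerequisite conditions gate everything else.
    if not all(check() for check in prerequisites):
        return []
    # Defining conditions are only evaluated once the prerequisites hold.
    if not ruleset():
        return []
    return list(actions)


def in_window(now: time) -> bool:
    """Prerequisite from the text: only between 8:00 and 14:00."""
    return time(8, 0) <= now <= time(14, 0)


now = time(9, 30)      # hypothetical current time
mapper_tasks = 72      # hypothetical metric value for one app

result = actions_to_run(
    [lambda: in_window(now)],
    lambda: mapper_tasks > 50,   # defining condition from the text
    ["send_email"],
)
print(result)  # ['send_email']
```

If either the time window or the mapper-task condition is false, the returned list is empty and no action runs.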

When using Expert Rule you must define the Header, Ruleset (rules), Options, and Actions (actions) using the JSON data format.

{
   // Header - required
   'HEADER'

   // Rules - at least one must be defined. Two or more must be joined using an operator.
   "rules":[
      { scope }  |  { "operator": [ { scope }, { scope }, ... ] }
   ],

   // Prerequisite Conditions - at least one
   'OPTIONS - POLICY/SCOPE'

   // Actions - at least one
   "actions":[
      { action }
   ]
}
  • Header: Basic AutoAction information, including status (active/inactive).

  • Rules: The rules for the scope. You must define at least one rule.

  • Options - Policy/Scope : Who, what, where causes a violation, and when. You must specify at least one.

  • Actions: Actions executed when a violation triggers the AutoAction. If none are defined the UI still implements and tracks AutoActions.
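
As a rough pre-flight check, the four parts listed above can be verified before a rule is submitted. The sketch below only encodes what this section states; the function name missing_parts and the choice of "enabled" as the marker for a present header are assumptions for illustration, and Unravel may apply further checks.

```python
from typing import Any, Dict, List

# Option keys follow the X_mode naming used in this document.
MODE_KEYS = ("user_mode", "queue_mode", "cluster_mode", "app_mode")


def missing_parts(autoaction: Dict[str, Any]) -> List[str]:
    """List the top-level parts that appear to be missing."""
    problems = []
    if "enabled" not in autoaction:
        problems.append("header: enabled")
    if not autoaction.get("rules"):
        problems.append("rules: at least one rule")
    if not any(key in autoaction for key in MODE_KEYS):
        problems.append("options: at least one X_mode")
    if not autoaction.get("actions"):
        problems.append("actions: none defined (triggers are only recorded)")
    return problems


candidate = {
    "enabled": True,
    "rules": [{"scope": "apps", "metric": "duration", "compare": ">", "value": 10}],
    "app_mode": 1,
    "actions": [{"action": "kill_app"}],
}
print(missing_parts(candidate))  # []
```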

Defining your AutoAction
Header

You must define a header. The only optional attribute is the description (description_by_user).

| Attribute Name | Definition | Possible Value | Default Value |
|---|---|---|---|
| enabled | Whether the AutoAction is active or not. True: active/enabled. False: inactive/disabled. | True or False | - |
| policy_name | Value defined by Unravel. | AutoActions2 | AutoActions2 |
| policy_id | Value defined by Unravel. | 10 | 10 |
| instance_id | Any unique value. | | - |
| name_by_user | Any unique string. The name is used when the AutoAction is displayed in the UI. | | - |
| description_by_user | Description of the AutoAction. | | - |
| created_by | Value defined by Unravel. | admin | admin |
| last_edited_by | Value defined by Unravel. | admin | admin |
| created_at | Time created. Date and time are in the form of an Epoch/Unix timestamp. | | - |
| updated_at | Time updated. Date and time are in the form of an Epoch/Unix timestamp. | | - |
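
A header with these attributes can also be generated programmatically. The following is a minimal sketch, not part of Unravel: build_header is a hypothetical helper, and it assumes the timestamps are Epoch values in milliseconds, which is inferred from the 13-digit sample values used in this document rather than stated by it.

```python
import json
import time
from typing import Any, Dict


def build_header(
    instance_id: int,
    name: str,
    description: str = "",
    enabled: bool = True,
) -> Dict[str, Any]:
    """Build the header attributes for an AutoAction."""
    # Assumption: Epoch time in milliseconds (13 digits).
    now_ms = int(time.time() * 1000)
    return {
        "enabled": enabled,
        "policy_name": "AutoActions2",   # value defined by Unravel
        "policy_id": 10,                 # value defined by Unravel
        "instance_id": instance_id,      # any unique value
        "name_by_user": name,
        "description_by_user": description,
        "created_by": "admin",
        "last_edited_by": "admin",
        "created_at": now_ms,
        "updated_at": now_ms,
    }


header = build_header(273132543512, "aa_Sample_Test", "long running workflow")
print(json.dumps(header, indent=2))
```

Serializing with json.dumps turns Python's True into the JSON literal true.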

"enabled": true,
"policy_name": "AutoActions2",
"policy_id": 10,
"instance_id": 273132543512,
"name_by_user": "aa_Sample_Test",
"description_by_user": "long running workflow",
"created_by": "admin",
"last_edited_by": "admin",
"created_at": 1524220191137,
"updated_at": 1524220265920,
Rules: defining conditions

| Field Name | Definition | Possible Values | Required/Required by | Default Value |
|---|---|---|---|---|
| scope | The rule scope. | app, apps, multi_app, by_name, cluster, clusters, multi_cluster, container, containers, multi_containers, queue, queues, multi_queue, user, users, multi_user. Note: apps==multi_app, users==multi_user, etc. | | - |
| target | Application name | any valid app name | when scope is by_name | - |
| metric | Metric used for comparison. | see supported metrics per type | | - |
| comparison (written as "compare" in the examples in this section) | Comparison operator | >, >=, ==, <=, < | metric | - |
| value | Value for comparison. The value form varies by metric. | number | metric | - |
| state | Scope state | new, new_saving, submitted, accepted, scheduled, allocated, allocated_saving, launched, running, finishing, finished, killed, failed, and * | | - |
| type | Job type | mapreduce, yarn, tez, spark, workflow, hive | | - |

Logical operators for evaluating multiple rules

| Operator | Condition for a Violation |
|---|---|
| OR | At least one rule evaluates to true. |
| AND | All rules evaluate to true. |
| SAME | All the rules evaluate to true and occur within the same scope. |

See Same Logical Operator for more details.

You must define at least one rule.

A Single Rule
"rules": [
   // rule
   {
      "scope":"",
      // at least one of the following
      // compare and value are required when metric is set
      "metric":"",
      "compare":"",
      "value": ,
      "state":"",
      "type":""
   }
]
A violation occurs when the app is a pending workflow with a duration > 10.
"rules":[
  {
    "scope":"apps",
    "metric":"duration",
    "compare":">",
    "value":10,
    "state":"pending",
    "type":"workflow"
  }
]
A violation occurs when the app is a workflow with a duration > 10. Leaving state empty removes that restriction without otherwise affecting the rule.
"rules":[
  {
    "scope":"apps",
    "metric":"duration",
    "compare":">",
    "value":10,
    "state":"",
    "type":"workflow"
  }
]
A violation occurs when the app has a duration > 10. Leaving state and type empty removes both restrictions without otherwise affecting the rule.
"rules":[
  {
    "scope":"apps",
    "metric":"duration",
    "compare":">",
    "value":10,
    "state":"",
    "type":""
  }
]
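
To make the effect of empty fields concrete, here is a small Python sketch of how a single rule could be checked against one app. It is an illustration under assumptions, not Unravel's evaluator: rule_matches and the app dictionaries are hypothetical, and treating "*" as "any state" is an assumption based on its presence in the list of possible state values.

```python
import operator
from typing import Any, Dict

COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}


def rule_matches(rule: Dict[str, Any], app: Dict[str, Any]) -> bool:
    """Return True if the app violates the rule."""
    # An empty or missing state places no restriction on the app.
    state = rule.get("state", "")
    if state not in ("", "*") and app.get("state") != state:
        return False
    # An empty or missing type places no restriction either.
    job_type = rule.get("type", "")
    if job_type and app.get("type") != job_type:
        return False
    # The metric comparison applies only when a metric is named.
    metric = rule.get("metric", "")
    if metric:
        if metric not in app:
            return False
        return COMPARATORS[rule["compare"]](app[metric], rule["value"])
    return True


rule = {"scope": "apps", "metric": "duration", "compare": ">",
        "value": 10, "state": "", "type": "workflow"}

workflow_app = {"type": "workflow", "state": "running", "duration": 42}
spark_app = {"type": "spark", "state": "running", "duration": 42}

print(rule_matches(rule, workflow_app))  # True
print(rule_matches(rule, spark_app))     # False
```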
A Rule Array

Two or more rules combined with an operator.

"rules": [
 {
   "operator": [
      // rule 1
      {

      }
      // rule 2
      {
      }
      // rule n
      {
      }
   ]
 }
]

Note

multi_X is equivalent to the plural of X. In this case, we could use multi_app instead of apps.

Take the following two rules:
// apps   (allocatedMB >=1024)
{
  "scope":"apps",
  "metric":"allocatedMB",
  "compare":">=",
  "value":1024
}

// apps   (allocatedVCores > 100)
{
  "scope":"apps",
  "metric":"allocatedVCores",
  "compare":">",
  "value":100
}
OR example
When they are ORed a violation occurs if at least one rule evaluates to true.
"rules":[
  {
    "OR":[
      {
        "scope":"apps",
        "metric":"allocatedMB",
        "compare":">=",
        "value":1024
      },
      {
        "scope":"apps",
        "metric":"allocatedVCores",
        "compare":">",
        "value":100
      }
    ]
  }
]
AND example

When ANDed a violation occurs if both rules evaluate to true.

"rules":[
  {
    "AND":[
      {
        "scope":"apps",
        "metric":"allocatedMB",
        "compare":">=",
        "value":1024
      },
      {
        "scope":"apps",
        "metric":"allocatedVCores",
        "compare":">",
        "value":100
      }
    ]
  }
]
SAME example

When SAMEd a violation occurs if both rules evaluate to true and the violations are within the same scope.

"rules":[
  {
    "SAME":[
      {
        "scope":"apps",
        "metric":"allocatedMB",
        "compare":">=",
        "value":1024
      },
      {
        "scope":"apps",
        "metric":"allocatedVCores",
        "compare":">",
        "value":100
      }
    ]
  }
]
Using the above example, if

  • My_App only violates rule 1 (allocatedMB), and

  • Your_App only violates rule 2 (allocatedVCores)

the AutoAction isn't triggered because the violations occurred in different scopes, i.e., My_App and Your_App.

However, if

  • My_App violates both rules (allocatedMB and allocatedVCores), and

  • Your_App only violates rule 2 (allocatedVCores)

the AutoAction is triggered for My_App but not Your_App.

Given the same ruleset, evaluation becomes more restrictive from OR to AND to SAME.

  • OR: the AutoAction is triggered if one or more of the conditions are true.

  • AND: the AutoAction is triggered if all the conditions are true.

  • SAME: the AutoAction is triggered if all the conditions are true within a specific scope.
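
The difference between the three operators can be sketched in Python using the My_App and Your_App scenario. This is a simplified model of the semantics described above, not Unravel's engine; the functions violators and is_triggered and the metric values are hypothetical.

```python
import operator
from typing import Dict, List, Set

COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "<=": operator.le,
    "<": operator.lt,
}

Apps = Dict[str, Dict[str, float]]


def violators(rule: dict, apps: Apps) -> Set[str]:
    """Names of the apps that violate a single rule."""
    compare = COMPARATORS[rule["compare"]]
    return {name for name, metrics in apps.items()
            if compare(metrics[rule["metric"]], rule["value"])}


def is_triggered(op: str, rules: List[dict], apps: Apps) -> bool:
    per_rule = [violators(rule, apps) for rule in rules]
    if op == "OR":     # at least one rule is violated by some app
        return any(per_rule)
    if op == "AND":    # every rule is violated, possibly by different apps
        return all(per_rule)
    if op == "SAME":   # one and the same app violates every rule
        return bool(set.intersection(*per_rule))
    raise ValueError(f"unknown operator: {op}")


rules = [
    {"scope": "apps", "metric": "allocatedMB", "compare": ">=", "value": 1024},
    {"scope": "apps", "metric": "allocatedVCores", "compare": ">", "value": 100},
]

# My_App violates only rule 1, Your_App violates only rule 2.
apps = {
    "My_App": {"allocatedMB": 2048, "allocatedVCores": 8},
    "Your_App": {"allocatedMB": 512, "allocatedVCores": 200},
}
outcome = {op: is_triggered(op, rules, apps) for op in ("OR", "AND", "SAME")}
print(outcome)  # {'OR': True, 'AND': True, 'SAME': False}

# My_App now violates both rules, so SAME triggers as well.
apps["My_App"]["allocatedVCores"] = 150
print(is_triggered("SAME", rules, apps))  # True
```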

Options - policy/scope: prerequisite conditions

Who/what can cause the violation and when. You must define at least one option - policy/scope.

| Field Name | Definition | Required/Required by | Possible Values | Default Value |
|---|---|---|---|---|
| X_mode | Where X is user, queue, cluster, or app. Defines how the rules are applied to type X. 0: the rules aren't evaluated. 1: the rules are evaluated for all X. 2: the rules are evaluated only for those in X_list or matching X_transform. 3: the rules are evaluated for everything except those in X_list or matching X_transform. | You must define at least one option/policy. | 0, 1, 2, 3 | 0 |
| X_list | A list of X used when the mode value is 2 or 3. Applicable only if the mode is set to 2 (only) or 3 (except). | if X_mode is 2 or 3 and X_transform isn't defined | empty, single item, or comma separated list | - |
| X_transform | A list of regexes used to generate a list of X when the mode value is 2 or 3. Applicable only if the mode is set to 2 (only) or 3 (except). | if X_mode is 2 or 3 and X_list isn't defined | empty, single regex, or comma separated regex list | - |
| Time | The daily time period during which the AutoAction can be triggered. | | any time period spanning less than 24 hours | - |
| Sustained Violation | Set a minimum or maximum time period for the AutoAction to be triggered. See here for more information. | | any time period less than 24 hours | - |
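
The four mode values can be sketched as a small Python function. This is an illustration only: in_scope and split_list are hypothetical helpers, and matching each X_transform regex against the whole name (re.fullmatch) is an assumption, since this section doesn't say how Unravel applies the patterns.

```python
import re
from typing import Iterable, List


def split_list(value: str) -> List[str]:
    """Split a comma separated list such as 'myApp, yourApp' into items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def in_scope(name: str, mode: int,
             names: Iterable[str] = (),
             patterns: Iterable[str] = ()) -> bool:
    """Decide whether the rules are evaluated for `name` under X_mode."""
    if mode == 0:          # rules aren't evaluated
        return False
    if mode == 1:          # rules are evaluated for all X
        return True
    listed = name in set(names) or any(
        re.fullmatch(pattern, name) for pattern in patterns
    )
    if mode == 2:          # only those listed or matching
        return listed
    if mode == 3:          # everything except those listed or matching
        return not listed
    raise ValueError(f"unsupported mode: {mode}")


excluded = split_list("myApp, yourApp, theirApp")
print(in_scope("myApp", 3, excluded))      # False
print(in_scope("otherApp", 3, excluded))   # True
print(in_scope("etl_daily", 2, patterns=[r"etl_.*"]))  # True
```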

Options - policy/scope rule

where X is user, queue, cluster, or app.

"X_mode": "",

// at least one of the following if X_mode = 2|3
"X_list": "",
"X_transform": "",

Cluster - doesn't apply to any clusters.

"cluster_mode": 0,
"cluster_list":"",
"cluster_transform":"",

Queue - applies to all queues.

"queue_mode": 1,
"queue_list":"",
"queue_transform":"",

User - applies only to the users specified in the list.

"user_mode": 2,
"user_list": ["userA", "userB"],
"user_transform":"",

Application Name - applies to all apps except those matching the list.

"app_mode": 3,
"app_list": ["appA", "appB"],
"app_transform":"",

User - applies only to the users specified in the list or matching the regex.

"user_mode": 2,
"user_list": ["userA", "userB"],
"user_transform":"regex",
Actions: action to implement upon violation

You do not have to define any actions, but it defeats the purpose not to. If no actions are defined, the UI keeps track of when the AutoAction was triggered and what triggered it. Both the prerequisite and defining conditions must be met before the AutoAction is triggered.

| Field Name | Definition | Required/Required by | Possible Value | Default Value |
|---|---|---|---|---|
| action | The action to be taken. | at least one | send_email, http_post, post_in_slack, move_to_queue, kill_app | - |
| to | Email recipients. | send_email if to_owner is not true | One or more recipients in a comma separated list. | - |
| to_owner | Send email to owner. | send_email if to is empty | false: do not send email. true: send email. | false |
| urls | URLs for the HTTP post. | http_post | One or more URLs in a comma separated list. | - |
| token | Token generated by Slack. | post_in_slack | Slack token | - |
| channels | Slack channel. | post_in_slack | One or more channels in a comma separated list. | - |
| queue | Queue name. | move_to_queue | The name of a valid queue to move the app to. | - |

Single action
"actions": [
   {
      "action": ""
      // plus any options the action requires
   }
]
Multiple actions
"actions": [
   // action 1
   {
   },

   // action n
   {
   }
]
Actions can be ignored when in conflict

Below we specified two actions, move_to_queue and kill_app, correctly; but in conjunction they don't make sense. If we kill the app, how can we then move it? Why bother moving the app if we are going to kill it? Effectively only one action can be executed. In this case Unravel gives precedence to kill_app and move_to_queue is ignored.

"actions":[
  {
    "action":"move_to_queue",
    "queue":"sample"
  },
  {
    "action":"kill_app"
  }
]
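
The precedence rule just described can be expressed as a short filter. This sketch only models the one conflict named in this section; effective_actions is a hypothetical helper, not an Unravel function.

```python
from typing import Any, Dict, List


def effective_actions(actions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop move_to_queue when kill_app is also requested."""
    kills = any(action.get("action") == "kill_app" for action in actions)
    if not kills:
        return list(actions)
    # kill_app takes precedence, so moving the app is pointless.
    return [action for action in actions
            if action.get("action") != "move_to_queue"]


requested = [
    {"action": "move_to_queue", "queue": "sample"},
    {"action": "kill_app"},
]
print(effective_actions(requested))  # [{'action': 'kill_app'}]
```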
Actions fail if the required information is invalid or not specified.

Below are two actions with invalid information. In send_email the addresses are invalid and the owner isn't to be notified. The http_post has an invalid URL. Unravel's AutoAction engine attempts to send the email and perform the HTTP post, but both actions fail. Since only these two actions are specified, effectively no action is taken. However, the UI still records the trigger and retains the information used to populate the Operations > Dashboard AutoActions tile, the history of runs, and the cluster view for each time the action was triggered. (See AutoActions Overview.)

"actions":[
  {
    "action":"send_email",
    "to":["aBadEmailAddress.mycompany.com","anotherBadAddress.mycompany.com"],
    "to_owner":false
  },
  {
    "action":"http_post",
    "urls":["https://nonexistentURL"]
  }
]
Example actions

There are five main actions: send_email, http_post, post_in_slack, move_to_queue, and kill_app.

send_email, http_post, and post_in_slack allow you to use comma separated lists to specify multiple recipients, URLs, and channels respectively. See AutoActions Templates for examples of send_email, http_post, and post_in_slack notifications.

Warning

You must take care when entering information. A specified action fails if you enter incorrect information, for example a bad email address, URL, or channel, or a wrong or non-existent queue.
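
Because the Expert Rule editor doesn't flag such mistakes, a pre-flight check outside Unravel can catch the obvious ones. The sketch below is an assumption-laden illustration: action_problems and EMAIL_SHAPE are hypothetical, the required fields follow the actions table in this document, recipients and URLs are assumed to be given as one item per list element, and the checks are purely syntactic, so a well-formed address or URL that doesn't exist still passes.

```python
import re
from typing import Any, Dict, List
from urllib.parse import urlparse

# Rough shape check only: something@something.something
EMAIL_SHAPE = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def action_problems(action: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in one action definition."""
    kind = action.get("action")
    problems: List[str] = []
    if kind == "send_email":
        recipients = action.get("to", [])
        if not recipients and not action.get("to_owner", False):
            problems.append("send_email needs 'to' or to_owner set to true")
        for address in recipients:
            if not EMAIL_SHAPE.match(address):
                problems.append(f"suspicious email address: {address}")
    elif kind == "http_post":
        urls = action.get("urls", [])
        if not urls:
            problems.append("http_post needs 'urls'")
        for url in urls:
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"suspicious URL: {url}")
    elif kind == "post_in_slack":
        for field in ("token", "channels"):
            if not action.get(field):
                problems.append(f"post_in_slack needs '{field}'")
    elif kind == "move_to_queue":
        if not action.get("queue"):
            problems.append("move_to_queue needs 'queue'")
    elif kind != "kill_app":
        problems.append(f"unknown action: {kind}")
    return problems


bad_mail = {"action": "send_email",
            "to": ["aBadEmailAddress.mycompany.com"],
            "to_owner": False}
print(action_problems(bad_mail))
# ['suspicious email address: aBadEmailAddress.mycompany.com']
```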

send_email

Unlike Create from Template or Build Rule, the Expert Rule UI doesn't warn you when you enter an address that is plainly invalid, for example myMailaddress&com. You can automatically send an email to the owner by setting to_owner to true; Unravel handles the rest. You can enter multiple email addresses using a comma separated list.

"actions":[
  {
    "action":"send_email",
    "to": ["myMail@mycompany.com,ThisPerson@theircompany.com,TheBoss@mycompany.com"
     ],
    "to_owner":false
  }
]
http_post

Just as with send_email, you aren't warned if a URL is plainly invalid, for example userAtTheCompany. You can enter multiple URLs using a comma separated list.

"actions": [
    {
      "action": "http_post",
      "urls": ["https://test24:3000/post/"
      ]
    }
]
post_in_slack

Verify that your token is correct, and the channels are entered correctly. You can enter multiple channels using a comma separated list.

"actions": [
    {
      "action": "post_in_slack",
      "token": "xyz",
      "channels": [ "auto-action-2"
      ]
    }
]
move_to_queue

Be sure to enter an existing and correct queue. This action is non-destructive but may nonetheless affect cluster performance and the cluster's availability to users.

"actions": [
    {
      "action": "move_to_queue",
      "queue": "sample"
    }
]
kill_app

This is straightforward, but kill_app is a destructive action and may affect cluster performance and the cluster's availability to users.

"actions": [
    {
      "action": "kill_app"
    }
]
An expert rule example

This AutoAction triggers on apps

  • that violate both rules (allocatedMB >= 1024 and allocatedVCores > 100) within the same scope,

  • except for the apps myApp, yourApp, and theirApp.

Upon triggering, a notification is posted to a Slack channel and the app is moved to the slow_queue.

{
  // Header
  "enabled":true,
  "policy_name":"AutoActions2",
  "policy_id":10,
  "instance_id":273132543512,
  "name_by_user":"aa_Sample_Test",
  "description_by_user":"long running workflow",
  "created_by":"admin",
  "last_edited_by":"admin",
  "created_at":1524220191137,
  "updated_at":1524220265920,

  // Defining Conditions 
  "rules":[
    {
      "SAME":[
        {
          "scope":"apps",
          "metric":"allocatedMB",
          "compare":">=",
          "value":1024
        },
        {
          "scope":"apps",
          "metric":"allocatedVCores",
          "compare":">",
          "value":100
        }
      ]
    }
  ],
   // Prerequisite Conditions
  "app_mode":3,
  "app_list":"myApp, yourApp, theirApp",

  // Actions
  "actions":[
    {
      "action":"post_in_slack",
      "token":"xyz",
      "channels":[
        "auto-action-2"
      ]
    },
    {
      "action":"move_to_queue",
      "queue":"slow_queue"
    }
  ]
}
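
The examples in this section use // comment lines for annotation, which standard JSON parsers reject. If you want to check such an annotated rule locally before pasting it into Unravel, one option is to strip whole-line comments first. This is a minimal sketch with a hypothetical helper, load_annotated, and an abbreviated sample rule; it removes only lines that start with //, so values such as https:// URLs are untouched. Whether the Expert Rule editor itself accepts comments isn't stated here.

```python
import json


def load_annotated(text: str) -> dict:
    """Parse JSON after dropping lines that consist only of a // comment."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


annotated = """
{
  // Header (abbreviated)
  "enabled": true,
  "name_by_user": "aa_Sample_Test",

  // Prerequisite Conditions
  "app_mode": 3,
  "app_list": "myApp, yourApp, theirApp",

  // Actions
  "actions": [
    {"action": "post_in_slack", "token": "xyz", "channels": ["auto-action-2"]},
    {"action": "move_to_queue", "queue": "slow_queue"}
  ]
}
"""

rule = load_annotated(annotated)
print([action["action"] for action in rule["actions"]])
# ['post_in_slack', 'move_to_queue']
```

A json.JSONDecodeError from this function points at structural mistakes such as a missing comma between two rule objects.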
AutoAction examples

See Sample AutoActions for more Expert Rule examples. Examples include defining an expert rule or action within a Build Rule template to provide some predefined structure.