Same logical operator

SAME: logically ANDs the rules and adds the further constraint that the rules must be violated within the same scope in order to trigger an AutoAction.

Example - a rule designed to alert on rogue users.
Human-readable form

If any user is running more than 10 apps on a cluster and the same user has more than 5 apps pending, then report that user as rogue.

More formally

(any user has > 10 running apps) SAME (any user has > 5 pending apps)

JSON definition
"rules":[
   "SAME":[
      {
         "scope":"users",
         "metric":"appCount",
         "operator":">",
         "value":10,
         "state":"running"
      },
      {
         "scope":"users",
         "metric":"appCount",
         "operator":">",
         "value":5,
         "state":"pending"
      }
   ]
]
Implementation

Internally, the back-end uses a clustering technique to implement the SAME operator. AutoActions runs all metric aggregations simultaneously. Once the metrics have been received and aggregated, it evaluates all rules and expressions, starting at the leaf expressions of the evaluation tree and working its way up to the root expression.

Assume the above rule, three users (A, B, and C), and the following conditions:

  • User A has 12 running apps and 3 pending apps

  • User B has 7 running apps and 1 pending app

  • User C has 21 running apps and 11 pending apps

First, the two (2) simple rules are evaluated:

  • Does the user have more than 10 apps running?

    • User A has 12 → TRUE

    • User B has 7 → FALSE

    • User C has 21 → TRUE

  • Does the user have more than 5 apps pending?

    • User A has 3 → FALSE

    • User B has 1 → FALSE

    • User C has 11 → TRUE
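The first step can be sketched in Python as follows. This is an illustration only, not the back-end code: the `CONDITIONS` list mirrors the two objects in the JSON definition, while `APP_COUNTS`, `OPERATORS`, and `evaluate_condition` are hypothetical names introduced here, and only the `>` operator is confirmed by the example.

```python
import operator

# Hypothetical in-memory form of the two conditions from the JSON definition.
CONDITIONS = [
    {"scope": "users", "metric": "appCount", "operator": ">", "value": 10, "state": "running"},
    {"scope": "users", "metric": "appCount", "operator": ">", "value": 5, "state": "pending"},
]

# App counts per user and state, taken from the example conditions.
APP_COUNTS = {
    "A": {"running": 12, "pending": 3},
    "B": {"running": 7, "pending": 1},
    "C": {"running": 21, "pending": 11},
}

# Only ">" appears in the example; "<" and "==" are assumptions.
OPERATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def evaluate_condition(condition, counts):
    """Return {user: bool} telling whether each user violates one simple rule."""
    compare = OPERATORS[condition["operator"]]
    return {
        user: compare(by_state[condition["state"]], condition["value"])
        for user, by_state in counts.items()
    }


results = [evaluate_condition(c, APP_COUNTS) for c in CONDITIONS]
print(results[0])  # {'A': True, 'B': False, 'C': True}
print(results[1])  # {'A': False, 'B': False, 'C': True}
```

Each simple rule is evaluated independently of the other at this point; nothing yet ties the two results to the same user.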

Second, it applies clustering by scope and, for each cluster, counts the number of rules triggered. In the back-end code this procedure is called “linking” of rules (see Ruleset.java).

  • Cluster “User A”, link count = 1.

    • User A > 10 running apps? → TRUE

    • User A > 5 pending apps? → FALSE

  • Cluster “User B”, link count = 0.

    • User B > 10 running apps? → FALSE

    • User B > 5 pending apps? → FALSE

  • Cluster “User C”, link count = 2.

    • User C > 10 running apps? → TRUE

    • User C > 5 pending apps? → TRUE
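The link counts above can be reproduced with a short Python sketch. It is not taken from Ruleset.java; `RULE_RESULTS` and `link_counts` are hypothetical names, and the rule labels are only dictionary keys chosen for readability.

```python
# Outcome of the two simple rules per user, copied from the first step.
RULE_RESULTS = {
    "more than 10 running apps": {"A": True, "B": False, "C": True},
    "more than 5 pending apps": {"A": False, "B": False, "C": True},
}


def link_counts(rule_results):
    """Cluster the results by user and count the rules each user triggered."""
    counts = {}
    for per_user in rule_results.values():
        for user, triggered in per_user.items():
            counts[user] = counts.get(user, 0) + int(triggered)
    return counts


print(link_counts(RULE_RESULTS))  # {'A': 1, 'B': 0, 'C': 2}
```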

Third, all clusters with fewer than the needed number of links (2 in this case) are discarded. Any rule that was triggered within a discarded cluster is reset to FALSE for that cluster.

  • Cluster “User A” has a link count = 1 so it's reset and discarded.

    • User A > 10 running apps? → TRUE reset to FALSE

    • User A > 5 pending apps? → FALSE

  • Cluster “User B”, link count = 0 so it's discarded.

    • User B > 10 running apps? → FALSE

    • User B > 5 pending apps? → FALSE

Finally, only the users that have triggered all rules remain.

  • Cluster “User C”, link count = 2:

    • User C > 10 running apps? → TRUE

    • User C > 5 pending apps? → TRUE

User C meets the criteria for the Rogue User AutoAction, therefore User C triggers the AutoAction and the alert is sent and/or the actions performed.
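The discard-and-reset step can be sketched in Python as follows. This is a minimal illustration under the assumption that the needed number of links equals the number of rules joined by SAME; `apply_same` and `RULE_RESULTS` are hypothetical names, not part of the product.

```python
# Outcome of the two simple rules per user, copied from the first step.
RULE_RESULTS = {
    "running > 10": {"A": True, "B": False, "C": True},
    "pending > 5": {"A": False, "B": False, "C": True},
}


def apply_same(rule_results):
    """Keep only users that triggered every rule; reset the rest to False."""
    needed = len(rule_results)  # assumption: every rule must be linked
    users = {user for per_user in rule_results.values() for user in per_user}
    kept = {
        user
        for user in users
        if sum(per_user.get(user, False) for per_user in rule_results.values()) == needed
    }
    # A triggered rule survives only if its user's cluster was kept.
    reset = {
        rule: {user: triggered and user in kept for user, triggered in per_user.items()}
        for rule, per_user in rule_results.items()
    }
    return sorted(kept), reset


kept, reset = apply_same(RULE_RESULTS)
print(kept)                   # ['C']
print(reset["running > 10"])  # {'A': False, 'B': False, 'C': True}
```

Note that User A's result for the running-apps rule changes from TRUE to FALSE, matching the reset described in the third step.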

Comparison to AND

Had AND been used instead of SAME, that is, (any user has > 10 running apps) AND (any user has > 5 pending apps), both User A and User C would have triggered the above rule, because AND does not require the two conditions to be violated by the same user.

To achieve the same result as the above example using AND instead of SAME, you would need to create the following AutoAction rule for each and every user on the cluster:

  • (Username has > 10 running apps) AND (Username has > 5 pending apps)
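The difference can be illustrated with Python sets. This sketch models AND as "both conditions are violated by at least one user, and every violating user is reported", which is an assumption consistent with the statement above rather than a description of the back-end; all variable names are hypothetical.

```python
RUNNING = {"A": 12, "B": 7, "C": 21}
PENDING = {"A": 3, "B": 1, "C": 11}

over_running = {user for user, n in RUNNING.items() if n > 10}
over_pending = {user for user, n in PENDING.items() if n > 5}

# AND (assumed model): each condition must hold for some user,
# and every user violating either condition is reported.
and_users = (over_running | over_pending) if (over_running and over_pending) else set()

# SAME: both conditions must hold for the same user.
same_users = over_running & over_pending

# One AND rule per user name, as in the workaround above.
per_user_and = {user for user in RUNNING if RUNNING[user] > 10 and PENDING[user] > 5}

print(sorted(and_users))     # ['A', 'C']
print(sorted(same_users))    # ['C']
print(sorted(per_user_and))  # ['C']
```

The per-user AND rules give the same answer as SAME, but they have to be created and maintained for every user, whereas a single SAME rule covers all users in the scope.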