Same logical operator
SAME: logically ANDs the rules and adds the further constraint that the rules must be violated within the same scope in order to trigger an AutoAction.
Example - a rule designed to alert on rogue users.
Human-readable form
If any user is running more than ten jobs on a cluster and the same user has more than five jobs pending then report the user as rogue.
More formally
(any user has > 10 running apps) SAME (any user has > 5 pending apps)
JSON definition
"rules":[
  {
    "SAME":[
      { "scope":"users", "metric":"appCount", "operator":">", "value":10, "state":"running" },
      { "scope":"users", "metric":"appCount", "operator":">", "value":5, "state":"pending" }
    ]
  }
]
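To make the mapping between the JSON definition and the formal form concrete, the following Python sketch parses a definition and prints each logical expression on one line. It assumes the "rules" fragment is wrapped in an enclosing object, and that each logical operator is the single key of its own object, so that the text is valid JSON. The helper names describe and describe_ruleset are hypothetical and are not part of AutoActions.

```python
import json

# Assumption: the "rules" fragment wrapped in an enclosing object so it parses as JSON.
RULE_JSON = """
{
  "rules": [
    {
      "SAME": [
        {"scope": "users", "metric": "appCount", "operator": ">", "value": 10, "state": "running"},
        {"scope": "users", "metric": "appCount", "operator": ">", "value": 5, "state": "pending"}
      ]
    }
  ]
}
"""


def describe(simple_rule):
    """Render one simple rule as a short parenthesized expression."""
    return "({scope}: {metric} {operator} {value}, state={state})".format(**simple_rule)


def describe_ruleset(doc):
    """Join the simple rules of each expression with their logical operator."""
    lines = []
    for expression in doc["rules"]:
        for logical_op, operands in expression.items():
            lines.append(f" {logical_op} ".join(describe(rule) for rule in operands))
    return lines


formal = describe_ruleset(json.loads(RULE_JSON))
print(formal[0])
```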
Implementation
Internally, the back-end uses a clustering technique to implement the SAME operator. AutoActions runs all metric aggregations simultaneously. Once the metrics have been received and aggregated, it evaluates all rules and expressions, starting at the evaluation tree's leaf expressions and working its way up to the root expression.
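The leaf-to-root order can be illustrated with a small post-order traversal. This is a minimal sketch and not the back-end code: it models only the greater-than comparison and a plain AND node, and the tree layout, the evaluate function, and the metric names are all hypothetical.

```python
def evaluate(node, metrics, order):
    """Evaluate children before their parent and record the evaluation order."""
    if "metric" in node:
        # Leaf expression: a simple rule comparing one aggregated metric to a threshold.
        result = metrics[node["metric"]] > node["value"]
        order.append(f"{node['metric']} > {node['value']}")
        return result

    # Inner expression: a single logical operator with a list of child expressions.
    op, children = next(iter(node.items()))
    results = [evaluate(child, metrics, order) for child in children]
    if op != "AND":
        raise ValueError(f"operator not modeled in this sketch: {op}")
    order.append(op)
    return all(results)


# Hypothetical aggregated metrics for one scope.
metrics = {"running": 12, "pending": 3}
tree = {"AND": [{"metric": "running", "value": 10},
                {"metric": "pending", "value": 5}]}

order = []
outcome = evaluate(tree, metrics, order)
print(order)    # leaves first, root last
print(outcome)
```

Every child is evaluated before the operator is applied, so the root is always the last entry in order.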
Assume the above rule, three users (A, B, and C), and the following conditions:
User A has 12 running apps and 3 pending apps.
User B has 7 running apps and 1 pending app.
User C has 21 running apps and 11 pending apps.
First, the two simple rules are evaluated:
Does the user have more than 10 apps running?
User A has 12 → TRUE
User B has 7 → FALSE
User C has 21 → TRUE
Does the user have more than 5 apps pending?
User A has 3 → FALSE
User B has 1 → FALSE
User C has 11 → TRUE
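This first step can be sketched in Python as follows. The data layout (app_counts), the reduced rule dictionaries, and the function name are assumptions made for illustration; only the thresholds and the user counts come from the example.

```python
import operator

# Only the comparison used in the example is modeled here.
OPERATORS = {">": operator.gt}

# Aggregated app counts per user, taken from the example conditions.
app_counts = {
    "A": {"running": 12, "pending": 3},
    "B": {"running": 7, "pending": 1},
    "C": {"running": 21, "pending": 11},
}

# The two simple rules, reduced to the fields needed for evaluation.
simple_rules = [
    {"state": "running", "operator": ">", "value": 10},
    {"state": "pending", "operator": ">", "value": 5},
]


def evaluate_simple_rules(app_counts, simple_rules):
    """Return one {user: triggered} mapping per simple rule."""
    results = []
    for rule in simple_rules:
        compare = OPERATORS[rule["operator"]]
        results.append({user: compare(counts[rule["state"]], rule["value"])
                        for user, counts in app_counts.items()})
    return results


results = evaluate_simple_rules(app_counts, simple_rules)
for rule, per_user in zip(simple_rules, results):
    print(rule["state"], per_user)
```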
Second, it applies clustering by scope, and for each cluster it counts the number of rules triggered. In the back-end code this procedure is called “linking” of rules (see Ruleset.java).
Cluster “User A”, link count = 1.
User A > 10 running apps? → TRUE
User A > 5 pending apps? → FALSE
Cluster “User B”, link count = 0.
User B > 10 running apps? → FALSE
User B > 5 pending apps? → FALSE
Cluster “User C”, link count = 2.
User C > 10 running apps? → TRUE
User C > 5 pending apps? → TRUE
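The link counting can be sketched as a simple tally per cluster. This is an illustration under assumed data structures, not the code in Ruleset.java; rule_results and link_counts are hypothetical names.

```python
# Outcome of the two simple rules per user, as evaluated in the first step.
rule_results = [
    {"A": True, "B": False, "C": True},    # more than 10 running apps?
    {"A": False, "B": False, "C": True},   # more than 5 pending apps?
]


def link_counts(rule_results):
    """Count, for each cluster (here: each user), how many rules were triggered."""
    counts = {}
    for per_user in rule_results:
        for user, triggered in per_user.items():
            counts[user] = counts.get(user, 0) + int(triggered)
    return counts


links = link_counts(rule_results)
print(links)
```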
Third, all clusters with fewer than the required number of links (2 in this case) are discarded. Any rule that was triggered in a discarded cluster is reset for that cluster.
Cluster “User A”, link count = 1, so it's reset and discarded.
User A > 10 running apps? → TRUE, reset to FALSE
User A > 5 pending apps? → FALSE
Cluster “User B”, link count = 0, so it's discarded.
User B > 10 running apps? → FALSE
User B > 5 pending apps? → FALSE
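The discard-and-reset step can be sketched as below. It assumes the required number of links equals the number of simple rules under SAME, as in this example; the function name and data layout are hypothetical.

```python
# Per-rule outcomes and link counts from the previous steps.
rule_results = [
    {"A": True, "B": False, "C": True},    # more than 10 running apps?
    {"A": False, "B": False, "C": True},   # more than 5 pending apps?
]
links = {"A": 1, "B": 0, "C": 2}


def discard_incomplete(rule_results, links, required):
    """Keep clusters with enough links; reset triggered rules in the others."""
    kept = [user for user, count in links.items() if count >= required]
    reset = [{user: (triggered and user in kept)
              for user, triggered in per_user.items()}
             for per_user in rule_results]
    return kept, reset


# Assumption: every rule under SAME must be triggered within the same cluster.
kept, reset = discard_incomplete(rule_results, links, required=len(rule_results))
print(kept)
print(reset)
```

After this step, User A's triggered rule is reset to FALSE, and only cluster “User C” is kept.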
Finally, only the users that have triggered all rules remain.
Cluster “User C”, link count = 2:
User C > 10 running apps? → TRUE
User C > 5 pending apps? → TRUE
User C meets the criteria for the Rogue User AutoAction, so User C triggers the AutoAction: the alert is sent and/or the actions are performed.
Comparison to AND
Had AND been used instead of SAME, that is, (any user has > 10 running apps) AND (any user has > 5 pending apps), both User A and User C would have triggered the above rule, because AND does not require the rules to be violated within the same scope.
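The difference can be sketched with set operations. The AND function below encodes one reading of the behavior described here, and is an assumption: the expression is satisfied when each rule has at least one violator, and every user who violated any of the rules is reported. SAME is modeled as the intersection of the violators. All function names are hypothetical.

```python
app_counts = {
    "A": {"running": 12, "pending": 3},
    "B": {"running": 7, "pending": 1},
    "C": {"running": 21, "pending": 11},
}


def violators(app_counts, state, threshold):
    """Users whose app count in the given state exceeds the threshold."""
    return {user for user, counts in app_counts.items() if counts[state] > threshold}


def users_reported_with_and(app_counts):
    # Assumed AND semantics: each rule needs a violator, but not the same one.
    running = violators(app_counts, "running", 10)
    pending = violators(app_counts, "pending", 5)
    if running and pending:
        return running | pending
    return set()


def users_reported_with_same(app_counts):
    # SAME: both rules must be violated by the same user.
    return violators(app_counts, "running", 10) & violators(app_counts, "pending", 5)


and_users = sorted(users_reported_with_and(app_counts))
same_users = sorted(users_reported_with_same(app_counts))
print("AND :", and_users)
print("SAME:", same_users)
```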
To achieve the same result as the above example using AND instead of SAME, you would need to create the following AutoAction rule for each and every user on the cluster:
(Username has > 10 running apps) AND (Username has > 5 pending apps)
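To show how that approach scales with the number of users, the sketch below generates the formal form of one AND rule per user name. It only builds the text of the rules; it does not create AutoActions, and the function name and its parameters are hypothetical.

```python
def per_user_and_rules(usernames, running_limit=10, pending_limit=5):
    """Build one AND rule, in the formal notation used above, for each user."""
    return [
        f"({name} has > {running_limit} running apps) AND "
        f"({name} has > {pending_limit} pending apps)"
        for name in usernames
    ]


rules = per_user_and_rules(["A", "B", "C"])
for rule in rules:
    print(rule)
```

The number of rules grows with the number of users, and the list has to be maintained as users are added or removed, whereas a single SAME rule covers every user.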