Templates
Note
See AutoAction's limitations.
Whether you use New from Template or New AutoAction, the template has four sections.
Note
New from Template > Custom > Expert Rule is an exception to this; it is simply a text box.
New from Template is partially completed for the selected triggering condition, for example, long-running Impala query. The New AutoAction is a blank template.
Template Sections
Policy name, description, and status
The Policy Name is mandatory, and we suggest using a name that reflects the AutoAction's purpose.
The Description is optional, but we suggest adding a succinct description of the AutoAction's purpose.
Is Enabled is automatically set to true. Uncheck the box to disable the AutoAction. You do not have to enable the AutoAction upon creation; you can always enable it later.
The Policy Name is displayed in the template's title bar.
Trigger Conditions
You must define at least one trigger.
Metric defines a rule of the form "metric" "comparison operator" "value".
See Supported Cluster Metrics for a list and definition of available metrics.
The comparison operators are >, >=, ==, <, and <=.
Value: any valid numeric value. The default value is 0; if you leave it unchanged, the AutoAction triggers constantly.
The Type options are:
MapReduce, YARN, Tez, Spark, Impala, Workflow and Hive
The State options are:
new, new_saving, submitted, accepted, scheduled, allocated, allocatedSaving, launched, running, finishing, finished, killed, failed, undefined, newAny, allocatedAny, pending, and * (all).
Multiple rule types are evaluated in conjunction with each other using:
Or, And, or Same
Or and And work as you would expect. See Same Logical Operator for the definition of Same and its implementation.
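As a rough illustration of how a metric rule and the Or/And operators combine, here is a minimal Python sketch. It is not Unravel's implementation: the class, the function, and the metric names (allocated_mb, app_count) are hypothetical, and the Same operator is deliberately not modeled because it is defined elsewhere.

```python
import operator
from dataclasses import dataclass

# The five comparison operators listed above.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class MetricRule:
    metric: str      # hypothetical metric name, not taken from the product
    comparison: str  # one of the keys in COMPARATORS
    value: float     # threshold; the UI default is 0

    def violated(self, observed: dict) -> bool:
        """Return True when the observed metric satisfies the comparison."""
        return COMPARATORS[self.comparison](observed[self.metric], self.value)


def evaluate(rules, combinator, observed):
    """Combine rule results with "And" or "Or" (Same is not modeled here)."""
    results = [rule.violated(observed) for rule in rules]
    if combinator == "And":
        return all(results)
    if combinator == "Or":
        return any(results)
    raise ValueError(f"unsupported combinator: {combinator}")


observed = {"allocated_mb": 2048, "app_count": 3}
memory_rule = MetricRule("allocated_mb", ">", 4096)
apps_rule = MetricRule("app_count", ">=", 3)
print(evaluate([memory_rule, apps_rule], "And", observed))
print(evaluate([memory_rule, apps_rule], "Or", observed))
```

The sketch also shows why the default value matters: a rule such as MetricRule("allocated_mb", ">", 0) is violated by any app that has allocated memory at all.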
A template created with New from Template has the Ruleset defined as needed to fulfill the template type, for example, Rogue Contention In Queues (allocated memory). The template highlights the fields you can or should change. You can't add or delete rule types or rules. Where available, you can change the metric, comparison, type, and state using the pull-down menus. If you change the Metric, Type, or State, the template no longer performs the task you selected, for example, Rogue Contention In Queues (allocated memory). The default value for the metric comparison is 0. You must change the value; otherwise the AutoAction triggers constantly. Multiple rule types are combined using the Same operator. See Same Logical Operator for its definition and implementation.
The Ruleset initially lists the types of rules available: User, Queue, Cluster, App, or EXPERT RULE. Click the rule type you want to define. Below, Add Queue is selected, with the options to add rules for metric, type, and state. These options, and only these options, are available for every rule type except the Expert Rule template, which is a text box. You must define at least one rule for each rule type selected.
In the example below, Metric and Type are selected for the Queue rule type. Use the pull-down menus to select the metric, the comparison operator, type, or state. See above for further information. A second rule type, Apps, has been added. When multiple rules are selected, you must choose how they are evaluated in conjunction with each other. The default is the Same operator, but you may select Or or And. See Same Logical Operator for the definition of Same and its implementation. Or and And work as you would expect. You can choose up to two rules, e.g., user & user or expert rule & queue.
Click Close to delete a rule type and click the trash icon to delete a specific rule. If you close the rule type before saving the AutoAction, your settings are lost.
Options
Define the scope (User, Queue, Cluster, and Application Name), the period in which the AutoAction acts on a violation (Time), and how long/short the violation must occur before the AutoAction takes action (Sustained).
When you select an option, its default is All, except for Time, which defaults to always, and Sustained Violation, which defaults to none.
When using Create from Template the required option is already checked and uses the default. Any changes you make may cause the AutoAction to not perform as expected.
Check the box next to the option's name to select it.
You can narrow the scope of User, Queue, Cluster, and Application Name by using Only or Except. Only applies the rule solely to the apps specified, while Except applies it to all but those specified. Use the Transform to specify the names using a regular expression. The example below uses the Application Name in the Except mode with the app MyApp. You can add more apps by clicking Add Application. Since no regular expression is specified, this option applies to all apps except MyApp. Create from Template defaults to All.
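The Only and Except modes can be pictured with a small Python sketch. This is an assumption-laden illustration, not the product's logic: the function name and parameters are hypothetical, and it assumes a name counts as "specified" when it is listed explicitly or fully matches the optional regular expression.

```python
import re
from typing import Iterable, Optional


def in_scope(name: str, mode: str, names: Iterable[str] = (),
             pattern: Optional[str] = None) -> bool:
    """Decide whether an app (or user, queue, cluster) name falls in scope.

    mode is "All", "Only", or "Except". A name is treated as specified if it
    is in `names` or fully matches `pattern` (assumed semantics).
    """
    if mode == "All":
        return True
    specified = name in set(names) or (
        pattern is not None and re.fullmatch(pattern, name) is not None
    )
    if mode == "Only":
        return specified
    if mode == "Except":
        return not specified
    raise ValueError(f"unknown mode: {mode}")


# Mirrors the example above: Except mode with the single app MyApp.
print(in_scope("MyApp", "Except", names=["MyApp"]))
print(in_scope("OtherApp", "Except", names=["MyApp"]))
```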
Time sets the time range and time zone during which the AutoAction can be triggered. The AutoAction remains active but doesn't trigger outside of the specified time range. By default, the start and end times are both set to the time at which you defined the AutoAction, with the time zone set to America/Los Angeles. If you don't change the default times, the AutoAction can be triggered for only one minute a day. Enter the time directly or click the clock icon in the time box. Time is entered in 24-hour format. The end time must be later than the start time.
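To see why unchanged defaults leave only a one-minute window, consider this minimal Python sketch. It assumes an inclusive comparison at minute granularity and that all three times are already expressed in the AutoAction's time zone; the function name is hypothetical.

```python
from datetime import time


def in_time_window(now: time, start: time, end: time) -> bool:
    """Inclusive, minute-granularity window check (assumed semantics)."""
    def to_minutes(t: time) -> int:
        return t.hour * 60 + t.minute

    return to_minutes(start) <= to_minutes(now) <= to_minutes(end)


# Defaults left unchanged: start and end are the same minute.
created_at = time(14, 30)
print(in_time_window(time(14, 30, 45), created_at, created_at))
print(in_time_window(time(14, 31), created_at, created_at))
```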
Sustained Violation specifies the length of time a violation must occur before the AutoAction is triggered. This allows time for the violator to self-correct and decreases false positives. The default is zero; in other words, the AutoAction is triggered immediately upon violation and the specified action is carried out. You can select minimum or maximum mode. In both cases the AutoAction must be continually violated.
Minimum sustained mode triggers the action only if this violation was continuously detected for at least the specified period. This suppresses the triggering of violation actions for “on-offs” and metric spikes. These are normal in multi-tenant cluster environments and can return to normal operation on their own. If a violation stops before the minimum time period, the clock is reset for that app. For instance, if the minimum time is one hour and the app violates the AutoAction for 58 minutes and then returns to normal – no action is taken and the time period for that app resets to 0.
Maximum sustained mode triggers the action only if the violation is continuously detected for less than the specified time period. This suppresses the triggering of violations for long-running apps and triggers the AutoAction only on ad hoc, short-lived user apps within the rule's scope.
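The two sustained modes and the per-app clock reset can be sketched as follows. This is a simplified model under stated assumptions, not the product's code: the class and method names are hypothetical, time is measured in minutes, and each call to observe() stands for one check of an app.

```python
class SustainedViolation:
    """Track how long each app has been continuously in violation."""

    def __init__(self, mode=None, period_min=0.0):
        if mode not in (None, "minimum", "maximum"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode              # None models the default (no sustained check)
        self.period_min = period_min
        self._started = {}            # app id -> minute the current violation began

    def observe(self, app_id, violating, now_min):
        """Record one check; return True if the action should fire."""
        if not violating:
            self._started.pop(app_id, None)   # the clock resets for that app
            return False
        start = self._started.setdefault(app_id, now_min)
        elapsed = now_min - start
        if self.mode is None:
            return True                        # immediate trigger
        if self.mode == "minimum":
            return elapsed >= self.period_min
        return elapsed < self.period_min       # maximum mode


# The one-hour example above: 58 minutes of violation, then back to normal.
tracker = SustainedViolation("minimum", 60)
fired = [tracker.observe("app1", True, t) for t in range(0, 59)]
print(any(fired))
print(tracker.observe("app1", False, 59))
```

After the reset at minute 59, a new violation by the same app starts counting again from zero.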
Actions
Defines the actions to take when the AutoAction is triggered.
With Build Rule and Create from Template (except for the Impala query template), you can specify the following actions: Send an Email, HTTP Post, Post to Slack, Move App to Queue, and Kill App. Use Build Rule to enter an Expert Action. See Expert Mode for information on defining an action in JSON. You can't kill a Hive, Impala, or Workflow app.
You can choose one or more actions. Check the box to choose that action. If you choose no actions, the UI simply records the violation and saves the data for the cluster view. Shown below are all the possible actions; in Create from Template only actions valid for the template are available.
For Send Email you must enter at least one recipient. Add more recipients by clicking Add Recipient. You can also include the owner of the app by selecting the Include Owner radio button.
Note
If you need to send an email notification to the owner of the application who is an LDAP user, configure the additional LDAP properties.
For HTTP Post you must enter at least one URL. You can add more URLs by clicking Add URL.
Post to Slack: Unravel provides integration with SlackApp, allowing you to post information to one or more Slack channels and users. To use this feature you need a Slack:
Webhook URL: The incoming webhook URL configured in Slack for the channel or user. You can post to multiple webhooks.
Token: The OAuth access token for the SlackApp.
See Slack's Incoming Webhooks for further information on creating/obtaining the above.
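Before entering a webhook URL in the AutoAction, it can help to confirm that the webhook accepts posts. The Python sketch below builds a request in the JSON format Slack's incoming webhooks accept (a body with a text field). The helper name, the placeholder URL, and the message text are all hypothetical, and this does not show how Unravel itself formats its Slack messages.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; substitute the webhook URL configured in Slack.
request = build_slack_request(
    "https://hooks.slack.com/services/T000/B000/XXXX",
    "Test message before configuring the AutoAction",
)
print(request.get_method(), request.full_url)
```

To actually send the test message, pass the request to urllib.request.urlopen(request).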
Move App to Queue or Kill App: for Move App to Queue, you must enter the queue to move the app to.
Warning
Move App and Kill App are mutually exclusive. If you select both, Kill App takes precedence and Move App is ignored. For either action to be executed, the scope must:
Have directly caused the rule violation, and
Have allocated resources, that is, the app is in the allocated or running state.
Move App is a non-destructive action that shouldn't affect the cluster's performance or its availability to users; however, we suggest using it with caution.
Kill App is a destructive action. It can affect the cluster performance and its availability to the users. This option is primarily to kill rogue apps that are causing contention of cluster resources.
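The precedence and eligibility rules for Move App and Kill App can be summarized in a short Python sketch. The function, its parameters, and the lowercase action labels are hypothetical. It also assumes that when both actions are selected for an app type that can't be killed, nothing is done, since Move App is ignored in that combination; the documentation does not spell out that case.

```python
from typing import Optional, Set

NON_KILLABLE_TYPES = {"Hive", "Impala", "Workflow"}   # these app types can't be killed
STATES_WITH_RESOURCES = {"allocated", "running"}      # app holds allocated resources


def resolve_app_action(selected: Set[str], app_type: str, state: str,
                       caused_violation: bool) -> Optional[str]:
    """Return "kill", "move", or None for one app."""
    # Both actions require that the app caused the violation and holds resources.
    if not caused_violation or state not in STATES_WITH_RESOURCES:
        return None
    if "kill" in selected:
        # Kill takes precedence; Move is ignored when both are selected.
        return None if app_type in NON_KILLABLE_TYPES else "kill"
    if "move" in selected:
        return "move"
    return None


print(resolve_app_action({"move", "kill"}, "Spark", "running", True))
print(resolve_app_action({"move"}, "Spark", "accepted", True))
```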
Use Build Rule to enter an action using JSON. See Expert Rules for examples.