Tagging applications

You can define tags for groups of applications using a Python script. Unravel retrieves the script from the path set in the property com.unraveldata.app.tagging.script.path, so you must define all your application tags in that file. You can also use this script to set workflow tags.

You can think of the script as creating a database made up of a list of keys, their associated values, and the applications associated with each <key, value> pair.

For example,

  • You have three departments: finance, hr, and marketing.

  • You would create

    • the key dept and

    • give it three values: finance, hr, and marketing.

  • You would then associate applications with one or more <key, value> pairs.

    • One Hive query might be associated with dept:marketing while another is associated with dept:finance.

Note

You cannot associate an application with more than one value per key. Given the example above, an application cannot be associated with both dept:marketing and dept:finance.
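
The one-value-per-key rule matches how a Python dictionary behaves, and a dictionary is what the tagging script returns. A minimal sketch using the department example above (the team key and its value are illustrative, not taken from a real deployment):

```python
# Tags for a single application, modeled as a plain dictionary.
tags = {}
tags["dept"] = "marketing"

# Assigning the same key again replaces the value within this run of the
# script; it does not add a second value for "dept".
tags["dept"] = "finance"

# A different key is independent of "dept" (hypothetical key and value).
tags["team"] = "analytics"

print(tags)
```

Only one value survives per key, so decide which value applies before the script returns.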

See What is tagging? for more information on tagging, its purpose and a more comprehensive description.

Your Python script must be idempotent; that is, it must produce the same tags for a given application every time it is invoked, even when the input (metadata) differs between invocations.

Application tags are immutable: once created, they cannot be changed.
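
One way to check the idempotency requirement before deploying a script is to call the tagging method repeatedly against a stand-in object and compare the results. The sketch below is only an illustration: FakeApp is a hypothetical stub, not an Unravel class, and it assumes that the name, queue, and user do not change during an application's run.

```python
class FakeApp(object):
    """Hypothetical stand-in for the app_obj that is passed to the script."""

    def __init__(self, name, queue, user):
        self._name = name
        self._queue = queue
        self._user = user

    def getAppName(self):
        return self._name

    def getQueue(self):
        return self._queue

    def getUsername(self):
        return self._user


def get_tags(app_obj):
    # Values depend only on metadata assumed to be stable for the run,
    # never on the clock, random numbers, or counters.
    tags = {}
    tags["dept"] = app_obj.getAppName() + "_" + app_obj.getQueue()
    tags["team"] = app_obj.getUsername()
    return tags


app = FakeApp("etl_daily", "engr", "alice")
first = get_tags(app)
second = get_tags(app)
print(first == second)
```

If the comparison ever fails, the script is deriving a tag from something that varies between invocations.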

Using a Python script

See Writing a Python script and the example script for tips on how to write a script.

  1. Set the following properties in /usr/local/unravel/etc/unravel.properties.

    com.unraveldata.tagging.script.enabled=true
    com.unraveldata.app.tagging.script.path=python_script
    com.unraveldata.app.tagging.script.method.name=method_name

  2. Restart the following daemons. You must restart them whenever you change the property values above or edit the referenced script.

    /etc/init.d/unravel_all.sh stop-etl
    /etc/init.d/unravel_all.sh start
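
Before restarting, it can help to confirm that all three tagging properties are present. The following is a rough sketch, not an Unravel tool: the helper names are hypothetical, the parser handles only simple key=value lines rather than the full properties file format, and it runs against a sample string instead of the real file.

```python
REQUIRED = (
    "com.unraveldata.tagging.script.enabled",
    "com.unraveldata.app.tagging.script.path",
    "com.unraveldata.app.tagging.script.method.name",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_tagging_properties(text):
    """Return the required tagging properties that are absent or empty."""
    props = parse_properties(text)
    return [key for key in REQUIRED if not props.get(key)]


# Sample content; the method name property is deliberately left out.
sample = """
com.unraveldata.tagging.script.enabled=true
com.unraveldata.app.tagging.script.path=/usr/scripts/Tagging.py
"""
print(missing_tagging_properties(sample))
```

To check a real installation, read the contents of unravel.properties into a string and pass it to missing_tagging_properties.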

Writing a Python script

You can add print or debugging statements to the script, but they are logged every time the script runs. Because the script is invoked multiple times during an application's run, the log contains many duplicate entries. You can also specify workflow tags in your script.

Format

In the Python script, you set a tag_key to a tag_value.

Your tag_value can be a string, the return value of a method, or a concatenation of both.

  • tags["auth"]="admin"

  • tags["scope"]=app_obj.getQueue()

  • tags["dept"]=app_obj.getAppName() + "_" + app_obj.getQueue()
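
The three forms above can be combined in one method. This is a minimal sketch: FakeApp is a hypothetical stub with fixed return values, standing in for the app_obj that the script receives, and it provides only the two getters used here.

```python
class FakeApp(object):
    """Hypothetical stub with fixed metadata, for illustration only."""

    def getAppName(self):
        return "nightly_report"

    def getQueue(self):
        return "qa"


def get_tags(app_obj):
    tags = {}
    # A literal string
    tags["auth"] = "admin"
    # The return value of a method
    tags["scope"] = app_obj.getQueue()
    # A concatenation of method return values and a string
    tags["dept"] = app_obj.getAppName() + "_" + app_obj.getQueue()
    return tags


result = get_tags(FakeApp())
print(result)
```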

Example Python script

The following script creates seven tag_keys for applications and then populates them, generating the tagging dictionary.

  • hive_query_id

  • dept

  • team

  • auth

  • scope

  • unravel.workflow.name and unravel.workflow.utctimestamp (see tagged workflows)

The tagging properties are set to the script file and method name.

com.unraveldata.app.tagging.script.path=/usr/scripts/Tagging.py
com.unraveldata.app.tagging.script.method.name=get_tags

# filename: /usr/scripts/Tagging.py

from datetime import datetime

# get_tags is the method, so com.unraveldata.app.tagging.script.method.name=get_tags
def get_tags(app_obj):

    tags = {}

    # MR apps get the hive_query_id tag
    if app_obj.getAppType() == "mr":
        tags["hive_query_id"] = app_obj.getAppConf("hive.query.id")

    # Every app gets a dept and team tag
    tags["dept"] = app_obj.getAppName() + "_" + app_obj.getQueue()
    tags["team"] = app_obj.getUsername()

    # Only apps with username=admin get this tag
    if app_obj.getUsername() == "admin":
        tags["auth"] = "admin"

    # Every app gets a scope tag based on the queue it is in
    if app_obj.getQueue() == "engr":
        # All apps in the "engr" queue get this tag
        tags["scope"] = "engineering-application"
    elif app_obj.getQueue() == "qa":
        # All apps in the "qa" queue get this tag
        tags["scope"] = "qa-application"
    else:
        # All apps not in the "engr" or "qa" queues get this tag
        tags["scope"] = "daily-application"

    # Creates the workflow tags. These are Unravel tags; contact
    # support@unraveldata.com before using them.
    tags["unravel.workflow.name"] = "Workflow-" + tags["team"]
    tags["unravel.workflow.utctimestamp"] = app_obj.getAppType() + "-" + str(datetime.utcnow())

    return tags

Running scripts

The tags computed by the Python script feed into Unravel's core ETL pipeline. The script is invoked in the ingestion pipeline and has access to application metadata, so it can create tags on the fly. While an application is running for the first time, it is not listed when you filter applications by tag. Debug and print statements are logged multiple times because the script is invoked multiple times over a run.
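
Because the script is invoked repeatedly, it can be worth exercising it locally before pointing Unravel at it. The sketch below loads a script by file path and method name, mirroring the two tagging properties, and calls it several times with a stub. It is not how Unravel itself loads or runs the script; it assumes Python 3 on your workstation, and FakeApp, load_tag_method, and the small stand-in script are all hypothetical.

```python
import importlib.util
import os
import tempfile


class FakeApp(object):
    """Hypothetical stub of app_obj with fixed metadata."""

    def getUsername(self):
        return "admin"

    def getQueue(self):
        return "engr"


def load_tag_method(script_path, method_name):
    """Import the script from a file path and return the named function."""
    spec = importlib.util.spec_from_file_location("tagging_script", script_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, method_name)


# A tiny stand-in script, written to a temporary file so the sketch is
# self-contained. In practice you would pass the path of your own script.
SCRIPT = (
    "def get_tags(app_obj):\n"
    "    tags = {}\n"
    "    tags['team'] = app_obj.getUsername()\n"
    "    if app_obj.getQueue() == 'engr':\n"
    "        tags['scope'] = 'engineering-application'\n"
    "    return tags\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(SCRIPT)
    script_path = handle.name

try:
    get_tags = load_tag_method(script_path, "get_tags")
    # Call the method several times and keep every result for comparison.
    results = [get_tags(FakeApp()) for _ in range(3)]
finally:
    os.remove(script_path)

print(results[0])
print(all(r == results[0] for r in results))
```

A stub only returns what you hard-code, so this kind of dry run catches syntax errors and unstable tag values, not mismatches with the metadata Unravel actually supplies.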

References

You can download example tagging scripts. (This is currently private; please contact Unravel Support.)