Tagging workflows
About Unravel workflow tags
You can add two Unravel tags (key-value pairs) to mark queries and jobs that belong to a particular workflow:
- unravel.workflow.name: a string that represents the name of the workflow. The recommended format is TenantName-ProjectName-WorkflowName.
- unravel.workflow.utctimestamp: a timestamp in yyyyMMddThhmmssZ format representing the logical time of a run of the workflow in UTC/ISO format. In a UNIX/Linux bash shell, you can get a timestamp in UTC format by running the command $(date -u '+%Y%m%dT%H%M%SZ').
Note
Do not put quotes ("") or blank spaces in or around the tag keys or values. For example:
SET unravel.workflow.name="ETL-Workflow"; [Incorrect syntax]
SET unravel.workflow.name=ETL-Workflow; [Correct syntax]
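If your tag values come from variables, a small guard can catch quoting mistakes before a job is submitted. This is a minimal sketch; the check_tag helper is hypothetical and not part of Unravel:

```shell
# Hypothetical guard: reject tag values that contain quotes or blank
# spaces, per the syntax rule above.
check_tag() {
  case "$1" in
    *['" ']*) echo invalid; return 1 ;;
    *)        echo ok ;;
  esac
}

check_tag 'ETL-Workflow'      # prints: ok
check_tag '"ETL-Workflow"'    # prints: invalid
check_tag 'ETL Workflow'      # prints: invalid
```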
 
Different runs of the same workflow have:
- the same value for unravel.workflow.name, but
- different values for unravel.workflow.utctimestamp.
Different workflows have different values for unravel.workflow.name.
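To illustrate, here is a minimal bash sketch of a wrapper that launches each run; the run_once helper is hypothetical and only shows how consecutive runs reuse the name while generating a fresh timestamp:

```shell
# Hypothetical wrapper: every run of this workflow reuses the same
# WORKFLOW_NAME but computes a fresh UTC timestamp for the run.
WORKFLOW_NAME=Financial-Tenant-ETL-Workflow

run_once() {
  UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')
  echo "name=$WORKFLOW_NAME ts=$UTC_TIME_STAMP"
  # ...submit the tagged job here, e.g. hive -f ... (omitted)
}

run_once   # first run of the workflow
sleep 1
run_once   # second run: same name, different timestamp
```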
Hive query example
This is a Hive query that was marked as part of the Financial-Tenant-ETL-Workflow workflow that ran on February 1, 2016:
SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE …  -- your Hive query text goes here
Easy recipes for tagging workflows
- Export the workflow name and UTC timestamp from your top-level script that schedules each run of the workflow. Here, we use bash's date command to generate the timestamp.
  export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
  export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')
- Follow the instructions for your job type. 
Examples by job type
Hive
hive -f hive/simple_wf.hql
In hive/simple_wf.hql:
SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE …  -- your Hive query text goes here
Sqoop
sqoop export \
    -D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
    --connect jdbc:mysql://127.0.0.1:3316/unravel_mysql_prod --table settings -m 1 \
    --export-dir /tmp/sqoop_test --username unravel --verbose --password foobar
Note
Sqoop has known bugs in its handling of quotes; keep the -D arguments quoted exactly as shown.
MapReduce
Substitute your file names for /tmp/data/small and /tmp/outsmoke.
hadoop jar libs/ooziemr-1.0.jar com.unraveldata.mr.apps.Driver \
    -D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
    -p /wordcount.properties -input /tmp/data/small -output /tmp/outsmoke
Spark
Note
For Spark jobs, you must prefix the Unravel tags with "spark.". For example, unravel.workflow.name becomes spark.unravel.workflow.name.
spark-submit \
    --conf "spark.unravel.workflow.name=$WORKFLOW_NAME" \
    --conf "spark.unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
    --conf "spark.eventLog.enabled=true" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --deploy-mode cluster
Pig
pig \
    -param WORKFLOW_NAME=$WORKFLOW_NAME -param UTC_TIME_STAMP=$UTC_TIME_STAMP \
    -x mapreduce -f pig/simple.pig
In pig/simple.pig:
SET unravel.workflow.name $WORKFLOW_NAME; 
SET unravel.workflow.utctimestamp $UTC_TIME_STAMP; 
lines = LOAD '/tmp/data/small' using PigStorage('|') AS (line:chararray); 
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word; 
grouped = GROUP words BY word; 
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
Impala
impala-shell -i <impala_host:port> \
    -f simpleImpala.sql \
    --var=workflowname='ourImpalaWorkflow' \
    --var=utctimestamp=$(date -u '+%Y%m%dT%H%M%SZ')
In ../simpleImpala.sql:
SET DEBUG_ACTION="::::unravel.workflow.name::${var:workflowname}::::unravel.workflow.utctimestamp::${var:utctimestamp}::::";
select * from usstates;
Finding pipelines in Unravel web UI
Once your tagged workflows have run, log in to the Unravel Web UI and select Jobs > Pipeline to start exploring Unravel's Workflow Management features.