Tagging workflows
About Unravel workflow tags

You can add two Unravel tags (<key, value> pairs) to mark queries and jobs that belong to a particular workflow:

  • unravel.workflow.name: a string that represents the name of the workflow. The recommended format is TenantName-ProjectName-WorkflowName.

  • unravel.workflow.utctimestamp: a timestamp in yyyyMMddThhmmssZ format that represents the logical time of a run of the workflow in UTC (ISO 8601 basic format). In a UNIX/Linux bash shell, you can get the current time in this format with the command substitution $(date -u '+%Y%m%dT%H%M%SZ').


    Do not put quotes ("") or blank spaces in or around the tag keys or values. For example:

    • SET unravel.workflow.name="ETL-Workflow"; [Incorrect syntax]

    • SET unravel.workflow.name=ETL-Workflow; [Correct syntax]
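
The rules above (no quotes, no blank spaces, and a fixed timestamp shape) can be checked before a run is launched. The following is a minimal sketch, not part of Unravel; the function names is_clean_tag_value and is_utc_timestamp are hypothetical, and the timestamp check only verifies the yyyyMMddThhmmssZ shape, not that the value is a real calendar date.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not provided by Unravel) for sanity-checking tag values.

# Succeeds only if the value is non-empty and has no whitespace or quote characters.
is_clean_tag_value() {
  case $1 in
    '' | *[[:space:]]* | *\"* | *\'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Succeeds only if the value has the shape yyyyMMddThhmmssZ (digits are not range-checked).
is_utc_timestamp() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{8}T[0-9]{6}Z$'
}

# Report which candidate workflow names would be acceptable tag values.
for value in 'ETL-Workflow' '"ETL-Workflow"' 'ETL Workflow'; do
  if is_clean_tag_value "$value"; then
    echo "ok:  $value"
  else
    echo "bad: $value"
  fi
done
```

A scheduling script could call both checks on its workflow name and timestamp and stop before submitting any job if either check fails.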

Different runs of the same workflow have

  • the same value for unravel.workflow.name, but

  • different values for unravel.workflow.utctimestamp.

Different workflows have different values for unravel.workflow.name.
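
Because the timestamp is the logical time of a run, a scheduler may prefer to derive it from the scheduled time rather than from the wall clock. The sketch below illustrates that idea for two daily runs of one workflow. It assumes a date implementation that accepts -d with a 'YYYY-MM-DD hh:mm:ss' string (GNU date does; BSD/macOS date uses different flags), and logical_utc_timestamp is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: format a scheduled (logical) run time as yyyyMMddThhmmssZ.
# Assumes a date command that supports -d, such as GNU date.
logical_utc_timestamp() {
  # With -u, the input string is interpreted as UTC.
  date -u -d "$1" '+%Y%m%dT%H%M%SZ'
}

# Two runs of the same workflow: one name, two different timestamps.
WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
run_1=$(logical_utc_timestamp '2016-02-01 00:00:00')
run_2=$(logical_utc_timestamp '2016-02-02 00:00:00')

echo "$WORKFLOW_NAME $run_1"
echo "$WORKFLOW_NAME $run_2"
```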

Hive query example

This is a Hive query that was marked as part of the Financial-Tenant-ETL-Workflow workflow that ran on February 1, 2016:

SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE … Your Hive Query text goes here
Easy recipes for tagging workflows
  1. Export the workflow name and UTC timestamp from your top-level script that schedules each run of the workflow.

    Here, we use the date command in bash to generate the timestamp.

    export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
    export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')
  2. Follow the instructions for your job type.

Examples by job type
Hive on MR query using SET commands in Hive
hive -f hive/simple_wf.hql 

In hive/simple_wf.hql:

SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE … Your Hive Query text goes here
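
The script above hardcodes the tag values. As a sketch of one alternative, the SET statements can be generated from the variables exported by the top-level script. This assumes the workflow-name key is unravel.workflow.name; hive_tag_prelude is a hypothetical helper name, not a Hive or Unravel command.

```shell
#!/usr/bin/env bash
# Sketch: print the two Unravel SET statements from the exported variables.
# Assumes the tag keys unravel.workflow.name and unravel.workflow.utctimestamp.
hive_tag_prelude() {
  printf 'SET unravel.workflow.name=%s;\n' "$WORKFLOW_NAME"
  printf 'SET unravel.workflow.utctimestamp=%s;\n' "$UTC_TIME_STAMP"
}

# Values normally exported by the scheduling script; fixed here for illustration.
WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
UTC_TIME_STAMP=20160201T000000Z

hive_tag_prelude
```

One way to use the output, assuming your Hive CLI version supports the -i initialization-file option, is to write it to a file and run hive -i with that file ahead of -f hive/simple_wf.hql, so the query file itself needs no per-run edits.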
Sqoop job using -D command line parameters
sqoop export \
 -D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
 --connect jdbc:mysql:// --table settings -m 1 \
 --export-dir /tmp/sqoop_test --username unravel --verbose --password foobar


Sqoop has bugs related to quotes.

Direct MapReduce job using -D command line parameters

Substitute your own input and output paths for /tmp/data/small and /tmp/outsmoke.

hadoop jar libs/ooziemr-1.0.jar \
-D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
-p / -input /tmp/data/small -output /tmp/outsmoke
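
The Sqoop and MapReduce commands pass the same pair of -D options. A minimal bash sketch of building them once and reusing them follows. It assumes the workflow-name key is unravel.workflow.name; build_unravel_d_args and UNRAVEL_D_ARGS are hypothetical names chosen for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: build the -D tag options once as a bash array.
# Assumes the tag keys unravel.workflow.name and unravel.workflow.utctimestamp.
build_unravel_d_args() {
  UNRAVEL_D_ARGS=(
    "-Dunravel.workflow.name=$WORKFLOW_NAME"
    "-Dunravel.workflow.utctimestamp=$UTC_TIME_STAMP"
  )
}

# Values normally exported by the scheduling script; fixed here for illustration.
WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
UTC_TIME_STAMP=20160201T000000Z
build_unravel_d_args

# Show the options that would be passed, one per line.
printf '%s\n' "${UNRAVEL_D_ARGS[@]}"

# Example use (not executed here):
#   hadoop jar libs/ooziemr-1.0.jar "${UNRAVEL_D_ARGS[@]}" \
#     -p / -input /tmp/data/small -output /tmp/outsmoke
```

Expanding the array as "${UNRAVEL_D_ARGS[@]}" keeps each option as a single argument, and the shell removes the quotes before the job sees the values.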
Spark job using --conf command line parameters


For Spark jobs, you must prefix the Unravel tag keys with "spark.". For example, unravel.workflow.utctimestamp becomes spark.unravel.workflow.utctimestamp.

spark-submit \
    --conf "spark.unravel.workflow.name=$WORKFLOW_NAME" \
    --conf "spark.unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
    --conf "spark.eventLog.enabled=true" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --deploy-mode cluster
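
The "spark." prefix rule can be applied mechanically to any Unravel tag key. The sketch below shows one way to do that. It assumes the workflow-name key is unravel.workflow.name, and spark_tag_conf is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: turn an Unravel tag key and value into a Spark --conf setting.
# $1 = Unravel tag key, $2 = tag value
spark_tag_conf() {
  printf 'spark.%s=%s\n' "$1" "$2"
}

# Values normally exported by the scheduling script; fixed here for illustration.
WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
UTC_TIME_STAMP=20160201T000000Z

spark_tag_conf unravel.workflow.name "$WORKFLOW_NAME"
spark_tag_conf unravel.workflow.utctimestamp "$UTC_TIME_STAMP"
```

On a spark-submit command line, each setting could then be passed as --conf "$(spark_tag_conf unravel.workflow.utctimestamp "$UTC_TIME_STAMP")".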
Pig job using -param and SET commands
pig \
-param WORKFLOW_NAME=$WORKFLOW_NAME -param UTC_TIME_STAMP=$UTC_TIME_STAMP \
-x mapreduce -f pig/simple.pig

In pig/simple.pig:

SET unravel.workflow.name $WORKFLOW_NAME;
SET unravel.workflow.utctimestamp $UTC_TIME_STAMP;
lines = LOAD '/tmp/data/small' USING PigStorage('|') AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
Impala job using SET commands
impala-shell -i <impalad_host:port> \
    -f simpleImpala.sql \
    --var=workflowname='ourImpalaWorkflow' \
    --var=utctimestamp=$(date -u '+%Y%m%dT%H%M%SZ')

In simpleImpala.sql:

select * from usstates;
Finding pipelines in the Unravel web UI

After your tagged workflows have run, log in to the Unravel web UI and select Jobs > Pipeline to explore Unravel's workflow management features.