Tagging workflows
About Unravel workflow tags

You can add two Unravel tags (<key, value> pairs) to mark queries and jobs that belong to a particular workflow:

  • unravel.workflow.name: a string that represents the name of the workflow. The recommended format is TenantName-ProjectName-WorkflowName.

  • unravel.workflow.utctimestamp: a timestamp in yyyyMMddThhmmssZ format that represents the logical time of a run of the workflow, expressed in UTC. In a UNIX/Linux bash shell, you can generate a timestamp in this format with the command substitution $(date -u '+%Y%m%dT%H%M%SZ').

    Note

    Do not put quotes ("") or blank spaces in or around the tag keys or values. For example:

    • SET unravel.workflow.name="ETL-Workflow"; [Incorrect syntax]

    • SET unravel.workflow.name=ETL-Workflow; [Correct syntax]
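
The two rules above (no quotes or blank spaces, and the yyyyMMddThhmmssZ shape) can be checked before a job is submitted. The following is a minimal sketch, not part of Unravel: the function names is_valid_tag_value and is_valid_utc_timestamp are hypothetical, and treating an empty value as invalid is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the value contains no quotes or whitespace.
# Assumption: an empty value is also rejected.
is_valid_tag_value() {
  case "$1" in
    '' | *[[:space:]]* | *\"* | *\'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: checks the yyyyMMddThhmmssZ shape only
# (8 digits, 'T', 6 digits, 'Z'); it does not validate the calendar date.
is_valid_utc_timestamp() {
  [[ "$1" =~ ^[0-9]{8}T[0-9]{6}Z$ ]]
}

is_valid_tag_value 'ETL-Workflow' && echo "accepted: ETL-Workflow"
is_valid_tag_value '"ETL-Workflow"' || echo "rejected: quoted value"
is_valid_utc_timestamp "$(date -u '+%Y%m%dT%H%M%SZ')" && echo "accepted: current UTC timestamp"
```

A scheduler script could call both checks and stop early rather than submit a job whose tags would be malformed.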

Different runs of the same workflow have:

  • the same value for unravel.workflow.name, but

  • different values for unravel.workflow.utctimestamp.

Different workflows have different values for unravel.workflow.name.
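
Because the timestamp is the logical time of a run, it is natural to compute it once per run and reuse that single value for every query and job in the run. A minimal sketch, assuming a daily workflow whose logical time is midnight UTC of the current day; that scheduling convention and the variable name RUN_TS are illustrative and not something Unravel prescribes.

```shell
#!/usr/bin/env bash
# Assumption: a daily workflow whose logical run time is 00:00:00 UTC today.
# The time-of-day part is fixed in the format string, so a rerun later the
# same UTC day produces the same value.
RUN_TS=$(date -u '+%Y%m%dT000000Z')

# The tag pair that every job in this run would carry.
echo "unravel.workflow.name=Financial-Tenant-ETL-Workflow"
echo "unravel.workflow.utctimestamp=$RUN_TS"
```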

Hive query example

This is a Hive query that was marked as part of the Financial-Tenant-ETL-Workflow workflow that ran on February 1, 2016:

SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE … Your Hive Query text goes here

Easy recipes for tagging workflows
  1. Export the workflow name and UTC timestamp from your top-level script that schedules each run of the workflow.

    Here, the date command generates the timestamp in a bash shell.

    export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
    export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')
  2. Follow the instructions for your job type.
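
Exporting the two variables matters because exported variables are inherited by every child process the top-level script starts, so each step of the workflow sees the same name and timestamp. A small sketch of that behavior; the inline bash -c call stands in for a real step script and is purely illustrative.

```shell
#!/usr/bin/env bash
# Top-level scheduler script (sketch): set the values once per run.
export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')

# Any child process started from here inherits both values.
# The inline 'bash -c' is a stand-in for a real step script.
child_view=$(bash -c 'echo "$WORKFLOW_NAME $UTC_TIME_STAMP"')
echo "$child_view"
```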

Examples by job type
Hive on MR query using SET commands in Hive
hive -f hive/simple_wf.hql 

In hive/simple_wf.hql:

SET unravel.workflow.name=Financial-Tenant-ETL-Workflow; 
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE … Your Hive Query text goes here
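
The file above hard-codes the tag values. One way to reuse the exported WORKFLOW_NAME and UTC_TIME_STAMP instead is to generate the two SET lines from the shell and prepend them to the query file. This is a sketch under assumptions: the helper name hive_tag_header and the file hive/query_body.hql are hypothetical, and the defaults exist only so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Normally exported by the top-level script; defaults are for standalone runs only.
: "${WORKFLOW_NAME:=Financial-Tenant-ETL-Workflow}"
: "${UTC_TIME_STAMP:=$(date -u '+%Y%m%dT%H%M%SZ')}"

# Hypothetical helper: print the two SET lines with unquoted values.
hive_tag_header() {
  printf 'SET unravel.workflow.name=%s;\n' "$WORKFLOW_NAME"
  printf 'SET unravel.workflow.utctimestamp=%s;\n' "$UTC_TIME_STAMP"
}

# To build the tagged file (paths are placeholders):
# { hive_tag_header; cat hive/query_body.hql; } > hive/simple_wf.hql
hive_tag_header
```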

Sqoop job using -D command line parameters
sqoop export \
 -D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP"  \
 --connect jdbc:mysql://127.0.0.1:3316/unravel_mysql_prod --table settings -m 1 \
 --export-dir /tmp/sqoop_test --username unravel --verbose --password foobar

Note

Sqoop has bugs related to quotes.

Direct MapReduce job using -D command line parameters

Substitute your own input and output paths for /tmp/data/small and /tmp/outsmoke.

hadoop jar libs/ooziemr-1.0.jar com.unraveldata.mr.apps.Driver \
-D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP"  \
-p /wordcount.properties -input /tmp/data/small -output /tmp/outsmoke

Spark job using --conf command line parameters

Note

For Spark jobs, you must prefix the Unravel tags with "spark.". For example, unravel.workflow.name becomes spark.unravel.workflow.name.

spark-submit \
    --conf "spark.unravel.workflow.name=$WORKFLOW_NAME" \
    --conf "spark.unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
    --conf "spark.eventLog.enabled=true" \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    <application-jar>
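
The "spark." prefix rule can be applied mechanically when the command line is assembled in a script. A minimal sketch, assuming bash; the helper name spark_key and the array name spark_tag_args are hypothetical, and the defaults are there only so the snippet runs standalone.

```shell
#!/usr/bin/env bash
# Normally exported by the top-level script; defaults are for standalone runs only.
: "${WORKFLOW_NAME:=Financial-Tenant-ETL-Workflow}"
: "${UTC_TIME_STAMP:=$(date -u '+%Y%m%dT%H%M%SZ')}"

# Hypothetical helper: turn an Unravel tag key into its Spark form.
spark_key() {
  printf 'spark.%s' "$1"
}

# Keep the --conf arguments in an array so each one stays a single word.
spark_tag_args=(
  --conf "$(spark_key unravel.workflow.name)=$WORKFLOW_NAME"
  --conf "$(spark_key unravel.workflow.utctimestamp)=$UTC_TIME_STAMP"
)

# Print what would be passed; a real call would expand the array
# on the spark-submit command line: spark-submit "${spark_tag_args[@]}" ...
printf '%s\n' "${spark_tag_args[@]}"
```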

Pig job using -param and SET commands
pig \
-param WORKFLOW_NAME=$WORKFLOW_NAME -param UTC_TIME_STAMP=$UTC_TIME_STAMP  \
-x mapreduce -f pig/simple.pig

In pig/simple.pig:

SET unravel.workflow.name $WORKFLOW_NAME; 
SET unravel.workflow.utctimestamp $UTC_TIME_STAMP; 
lines = LOAD '/tmp/data/small' using PigStorage('|') AS (line:chararray); 
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word; 
grouped = GROUP words BY word; 
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;

Impala job using SET commands
impala-shell -i <impalad_host:port> \
    -f simpleImpala.sql \
    --var=workflowname='ourImpalaWorkflow' \
    --var=utctimestamp=$(date -u '+%Y%m%dT%H%M%SZ')

In simpleImpala.sql:

SET DEBUG_ACTION="::::unravel.workflow.name::${var:workflowname}::::unravel.workflow.utctimestamp::${var:utctimestamp}::::";
SELECT * FROM usstates;
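
The DEBUG_ACTION value packs both tags into one string, with four colons around each key/value pair and two colons between a key and its value. To make that layout easier to read, here is a sketch that assembles the same string in the shell; the function name impala_debug_action is hypothetical and the example values are only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the DEBUG_ACTION value in the layout used in
# simpleImpala.sql, given a workflow name and a UTC timestamp.
impala_debug_action() {
  local name=$1 ts=$2
  printf '::::unravel.workflow.name::%s::::unravel.workflow.utctimestamp::%s::::' "$name" "$ts"
}

impala_debug_action ourImpalaWorkflow 20160201T000000Z
echo
# prints ::::unravel.workflow.name::ourImpalaWorkflow::::unravel.workflow.utctimestamp::20160201T000000Z::::
```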

Finding pipelines in Unravel web UI

After your tagged workflows have run, log in to the Unravel web UI and select Jobs > Pipeline to start exploring Unravel's Workflow Management features.