Tagging workflows
About Unravel workflow tags
You can add two Unravel tags (<key, value> pairs) to mark queries and jobs that belong to a particular workflow:
unravel.workflow.name: a string that represents the name of the workflow. The recommended format is TenantName-ProjectName-WorkflowName.
unravel.workflow.utctimestamp: a timestamp in yyyyMMddThhmmssZ format representing the logical time of a run of the workflow in UTC/ISO format. In a UNIX/Linux bash shell, you can get a timestamp in UTC format by running the command $(date -u '+%Y%m%dT%H%M%SZ').
Note
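To sanity-check a generated timestamp before tagging a run, here is a minimal bash sketch; the regex check is our own addition, not something Unravel requires:

```shell
# Generate a run timestamp in UTC, e.g. 20160201T000000Z.
UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')

# Verify it matches the yyyyMMddThhmmssZ shape Unravel expects.
if echo "$UTC_TIME_STAMP" | grep -Eq '^[0-9]{8}T[0-9]{6}Z$'; then
    echo "timestamp OK: $UTC_TIME_STAMP"
else
    echo "unexpected format: $UTC_TIME_STAMP" >&2
    exit 1
fi
```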
Do not put quotes ("") or blank spaces in or around the tag keys or values. For example:
SET unravel.workflow.name="ETL-Workflow";   [Incorrect syntax]
SET unravel.workflow.name=ETL-Workflow;     [Correct syntax]
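A small bash guard that fails fast when a tag value would violate the quoting rule; this check is our own addition, not part of Unravel:

```shell
# Guard (illustrative, not Unravel's): reject tag values containing quotes or spaces.
WORKFLOW_NAME="ETL Workflow"   # deliberately invalid: contains a blank space
if printf '%s' "$WORKFLOW_NAME" | grep -q '[" ]'; then
    echo "invalid tag value: no quotes or blank spaces allowed" >&2
else
    echo "tag value OK"
fi
```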
Different runs of the same workflow have the same value for unravel.workflow.name but different values for unravel.workflow.utctimestamp. Different workflows have different values for unravel.workflow.name.
Hive query example
This is a Hive query that was marked as part of the Financial-Tenant-ETL-Workflow workflow that ran on February 1, 2016:
SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE …   -- Your Hive query text goes here
Easy recipes for tagging workflows
Export the workflow name and UTC timestamp from your top-level script that schedules each run of the workflow.
Here, we use bash's date command to generate the timestamp.

export WORKFLOW_NAME=Financial-Tenant-ETL-Workflow
export UTC_TIME_STAMP=$(date -u '+%Y%m%dT%H%M%SZ')
Follow the instructions for your job type.
Examples by job type
Hive

hive -f hive/simple_wf.hql
In hive/simple_wf.hql:
SET unravel.workflow.name=Financial-Tenant-ETL-Workflow;
SET unravel.workflow.utctimestamp=20160201T000000Z;
SELECT foo FROM table WHERE …   -- Your Hive query text goes here
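Rather than hardcoding the tag values in the script, you can pass in the exported shell variables with Hive's standard --hivevar substitution; a sketch, assuming the same file name as above:

```shell
# Sketch: pass the exported tag values into the Hive script via --hivevar.
# Inside hive/simple_wf.hql, reference them as:
#   SET unravel.workflow.name=${hivevar:WORKFLOW_NAME};
#   SET unravel.workflow.utctimestamp=${hivevar:UTC_TIME_STAMP};
hive --hivevar WORKFLOW_NAME="$WORKFLOW_NAME" \
     --hivevar UTC_TIME_STAMP="$UTC_TIME_STAMP" \
     -f hive/simple_wf.hql
```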
Sqoop

sqoop export \
  -D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
  --connect jdbc:mysql://127.0.0.1:3316/unravel_mysql_prod --table settings -m 1 \
  --export-dir /tmp/sqoop_test --username unravel --verbose --password foobar
Note
Sqoop has bugs related to quotes.
MapReduce

Substitute your file names for /tmp/data/small and /tmp/outsmoke.

hadoop jar libs/ooziemr-1.0.jar com.unraveldata.mr.apps.Driver \
  -D"unravel.workflow.name=$WORKFLOW_NAME" -D"unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
  -p /wordcount.properties -input /tmp/data/small -output /tmp/outsmoke
Spark

Note
For Spark jobs, you must prefix the Unravel tags with "spark.". For example, unravel.workflow.name becomes spark.unravel.workflow.name.
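As a quick illustration of the prefixing rule, this sketch builds the two Spark conf flags from the variables exported earlier (values shown inline for clarity):

```shell
# Build the Spark-prefixed conf flags from the plain Unravel tag names.
WORKFLOW_NAME=Financial-Tenant-ETL-Workflow   # from the export recipe above
UTC_TIME_STAMP=20160201T000000Z               # example value
SPARK_TAG_CONFS="--conf spark.unravel.workflow.name=$WORKFLOW_NAME --conf spark.unravel.workflow.utctimestamp=$UTC_TIME_STAMP"
echo "$SPARK_TAG_CONFS"
```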
spark-submit \
--conf "spark.unravel.workflow.name=$WORKFLOW_NAME" \
--conf "spark.unravel.workflow.utctimestamp=$UTC_TIME_STAMP" \
--conf "spark.eventLog.enabled=true" \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--deploy-mode cluster

Pig

pig \
  -param WORKFLOW_NAME=$WORKFLOW_NAME -param UTC_TIME_STAMP=$UTC_TIME_STAMP \
  -x mapreduce -f pig/simple.pig
In pig/simple.pig:
SET unravel.workflow.name $WORKFLOW_NAME;
SET unravel.workflow.utctimestamp $UTC_TIME_STAMP;
lines = LOAD '/tmp/data/small' using PigStorage('|') AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;

Impala

impala-shell -i <impalad_host:port> \
-f simpleImpala.sql \
--var=workflowname='ourImpalaWorkflow' \
--var=utctimestamp=$(date -u '+%Y%m%dT%H%M%SZ')

In simpleImpala.sql:
SET DEBUG_ACTION="::::unravel.workflow.name::${var:workflowname}::::unravel.workflow.utctimestamp::${var:utctimestamp}::::";
select * from usstates;

Finding pipelines in Unravel web UI
Once your tagged workflows have run, log in to the Unravel web UI and select Jobs > Pipeline to start exploring Unravel's Workflow Management features.