Part 2: Connecting Unravel to an HDInsight cluster
This topic explains how to connect Unravel Server to an HDInsight cluster and deploy Unravel sensors on the cluster's nodes through an Azure "script action". Unravel provides different "script actions" for different cluster types.
Cluster type | Download path | Apply to cluster node types |
---|---|---|
Hadoop, HBase, or Spark | | Head node, Worker node, Edge node |
Kafka | | Head node |
Note
If your cluster doesn't have access to the Internet, download the scripts, store them in an Azure blob storage account, and enter the blob storage URI in the script action's Bash script URI field.
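The upload described in this note can be scripted with the Azure CLI. The following is a minimal sketch, not an Unravel-provided tool: it assumes the Azure CLI is installed and already authenticated, that the container exists, and that the cluster is allowed to read from it. The function name upload_script_to_blob and all argument values are hypothetical.

```shell
# Upload a downloaded script to blob storage and print its URI.
# Arguments: storage account name, container name, local script path.
# All three values are placeholders that you supply yourself.
upload_script_to_blob() {
  local account="$1" container="$2" script_file="$3"
  local blob_name
  blob_name="$(basename "$script_file")"

  # Upload the file; stop if the upload fails.
  az storage blob upload \
    --account-name "$account" \
    --container-name "$container" \
    --name "$blob_name" \
    --file "$script_file" >/dev/null || return 1

  # Print the blob URI to paste into the Bash script URI field.
  az storage blob url \
    --account-name "$account" \
    --container-name "$container" \
    --name "$blob_name" \
    --output tsv
}
```

Calling the function with your storage account, container, and the path of the downloaded script prints the URI to use in the script action. Depending on how your storage account is secured, you may need to add authentication options or a SAS token, which this sketch does not cover.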
Prerequisites
Unravel Server must already be running, with the Unravel UI accessible on port 3000.
If you plan to create a cluster, you must have the following information ready:
Virtual Network and subnet of the Unravel VM
Your Azure Storage details. For storage setup, see Create Azure Storage.
Read the latest documentation on the ports required by HDInsight: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services
Ensure that the Unravel service is running on the Unravel VM and that ports 3000 and 4043 are reachable from the Azure HDInsight cluster's master node before you run the Unravel "script action" script.
Run the following checks:
ssh -i ssh_key ssh_user@unravel-host
sudo su -
netstat -anp | grep 3000
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 65072/node
hostname
# On one of the cluster's head nodes:
ping unravel-host
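The ping check above confirms that the host responds, but not that ports 3000 and 4043 are open. The following is a minimal sketch of a per-port check, assuming bash and the coreutils timeout command are available on the head node. The helper names check_port and check_unravel_ports are hypothetical and are not part of Unravel.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Check every listed port on the host and report each result.
# Returns non-zero if at least one port is unreachable.
check_unravel_ports() {
  local host="$1"; shift
  local port status=0
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "OK: ${host}:${port} is reachable"
    else
      echo "FAIL: ${host}:${port} is not reachable"
      status=1
    fi
  done
  return "$status"
}
```

For example, running check_unravel_ports unravel-host 3000 4043 on a head node reports the state of both ports before you submit the script action.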
Connect to a new cluster
Log into the Azure portal.
Select HDInsight cluster.
In the dialog box, enter the details for your desired cluster type, topology, OS, and so on.
In the Security + networking tab, make sure to select the same virtual network and subnet that is used by the Unravel VM.
In the Storage tab, select whether to use Azure Blob Storage or Azure Data Lake Storage, plus any secondary accounts.
In the Cluster size tab, select your desired topology for number of workers and VM types.
Optional: In the Script action tab, specify settings based on your cluster type.
You can also do this step after the cluster has been created.
In the Summary - Confirm configurations tab, review your cluster and click Create. It takes 5-15 minutes to create your cluster, depending on its size and parameters.
Connect to an existing Hadoop, HBase, or Spark cluster
Log into the Azure portal.
Select HDInsight cluster.
Select the Hadoop, HBase, or Spark cluster that you want to apply the Unravel "script action" script to.
Click Script actions on the vertical menu, and click Submit new.
Specify the following settings for the bootstrap script:
Click Create.
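The portal steps above can also be approximated from the command line. The following is a minimal sketch that uses the Azure CLI's az hdinsight script-action execute command. It assumes the Azure CLI is installed and authenticated. The wrapper name submit_script_action is hypothetical, and the script URI and any script parameters must come from your own Unravel settings, because they are not listed here.

```shell
# Submit a script action to an existing HDInsight cluster.
# Arguments: resource group, cluster name, script action name, script URI,
# followed by one or more node roles (for example: headnode workernode edgenode).
# Every value is a placeholder that you supply yourself.
submit_script_action() {
  local resource_group="$1" cluster="$2" action_name="$3" script_uri="$4"
  shift 4
  az hdinsight script-action execute \
    --resource-group "$resource_group" \
    --cluster-name "$cluster" \
    --name "$action_name" \
    --script-uri "$script_uri" \
    --roles "$@"
}
```

For a Hadoop, HBase, or Spark cluster, the roles would correspond to the head, worker, and edge nodes listed in the table at the top of this topic.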
Connect to an existing Kafka cluster
Log into the Azure portal.
Select HDInsight cluster.
Select the Kafka cluster that you want to apply the Unravel "script action" script to.
If the Kafka cluster has no Internet access, download the HDInsightUtilities-v01.sh script and copy it to the Kafka head node's /tmp folder. For example,
wget -O /tmp/HDInsightUtilities-v01.sh -q https://hdiconfigactions.blob.core.windows.net/linuxconfigactionmodulev01/HDInsightUtilities-v01.sh
Click Script actions | Submit new.
Specify the following settings for the bootstrap script:
Click Create.
After the Kafka script action script has completed successfully, open an SSH session to the Kafka cluster's head node and append the contents of /tmp/unravel/unravel.ext.properties to /usr/local/unravel/etc/unravel.properties on your Unravel VM.
In a multi-cluster deployment, com.unraveldata.ext.kafka.clusters is a comma-separated list of clusters, and the set of properties prefixed with com.unraveldata.ext.kafka.cluster_name is repeated for each cluster.
Note
This step also applies to new clusters.
For example, for two Kafka clusters, /tmp/unravel/unravel.ext.properties looks like this:
com.unraveldata.ext.kafka.clusters=cluster_name1,cluster_name2
com.unraveldata.ext.kafka.cluster_name1.bootstrap_servers=wn0-cluster_name1:9092,wn1-cluster_name1:9092
com.unraveldata.ext.kafka.cluster_name1.jmx_servers=broker1,broker2
com.unraveldata.ext.kafka.cluster_name1.jmx.broker1.host=wn0-cluster_name1
com.unraveldata.ext.kafka.cluster_name1.jmx.broker1.port=9999
com.unraveldata.ext.kafka.cluster_name1.jmx.broker2.host=wn1-cluster_name1
com.unraveldata.ext.kafka.cluster_name1.jmx.broker2.port=9999
com.unraveldata.ext.kafka.cluster_name2.bootstrap_servers=wn0-cluster_name2:9092,wn1-cluster_name2:9092
com.unraveldata.ext.kafka.cluster_name2.jmx_servers=broker1,broker2
com.unraveldata.ext.kafka.cluster_name2.jmx.broker1.host=wn0-cluster_name2
com.unraveldata.ext.kafka.cluster_name2.jmx.broker1.port=9999
com.unraveldata.ext.kafka.cluster_name2.jmx.broker2.host=wn1-cluster_name2
com.unraveldata.ext.kafka.cluster_name2.jmx.broker2.port=9999
Note
The Unravel VM must have access to the Kafka worker nodes' broker port 9092 and Kafka JMX port 9999.
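The per-cluster property layout in the example above follows a regular pattern, so it can be generated rather than typed by hand. The following is a minimal sketch, assuming every broker uses port 9092 and JMX port 9999 as in the example. The function name kafka_cluster_props is hypothetical. The script action normally writes this file for you, so treat this only as a way to review or reproduce its structure.

```shell
# Print the property lines for one Kafka cluster.
# Arguments: cluster name, then one or more broker hostnames.
kafka_cluster_props() {
  local cluster="$1"; shift
  local prefix="com.unraveldata.ext.kafka.${cluster}"
  local bootstrap="" jmx="" host i=1

  # Build the comma-separated bootstrap and JMX server lists.
  for host in "$@"; do
    bootstrap="${bootstrap:+${bootstrap},}${host}:9092"
    jmx="${jmx:+${jmx},}broker${i}"
    i=$((i + 1))
  done
  echo "${prefix}.bootstrap_servers=${bootstrap}"
  echo "${prefix}.jmx_servers=${jmx}"

  # Emit one host/port pair per broker.
  i=1
  for host in "$@"; do
    echo "${prefix}.jmx.broker${i}.host=${host}"
    echo "${prefix}.jmx.broker${i}.port=9999"
    i=$((i + 1))
  done
}
```

Calling kafka_cluster_props cluster_name1 wn0-cluster_name1 wn1-cluster_name1 prints the six cluster_name1 lines from the example. The com.unraveldata.ext.kafka.clusters line still has to list every cluster name.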
After updating the Kafka properties, restart Unravel Server.
sudo /etc/init.d/unravel_all.sh restart
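The append step that precedes the restart can be made safer by keeping a backup of the properties file. The following is a minimal sketch, assuming that unravel.ext.properties has already been copied from the Kafka head node to the Unravel VM and that you have write access to the destination file. The function name append_kafka_props is hypothetical.

```shell
# Append the Kafka properties to the Unravel properties file,
# saving a backup copy of the destination first.
# Arguments: source properties file, destination properties file.
append_kafka_props() {
  local src="$1" dst="$2"
  [ -r "$src" ] || { echo "Cannot read ${src}" >&2; return 1; }

  # Keep a backup so the change can be reverted.
  cp -p "$dst" "${dst}.bak" || return 1

  # Start on a fresh line in case the destination lacks a trailing newline.
  printf '\n' >> "$dst"
  cat "$src" >> "$dst"
}
```

After the function returns successfully, restart Unravel Server with the command shown above. If the restart surfaces a problem, the .bak file contains the previous configuration.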
Next steps
For additional configuration and instrumentation options, see Next Steps.
Troubleshooting tips
From the Azure portal, you can check whether a script action finished successfully by checking the Script Action History.
If the script action fails, check the error messages in the HDInsight cluster's Ambari dashboard, which has a balloon next to the cluster name on the top menu bar showing the recent operations.
Click Ops, search for the most recent run_customscriptaction command, and inspect the log messages. You may see multiple run_customscriptaction entries that were created by previous runs.
The Unravel script action cannot be rerun. If you need to redeploy the Unravel script action, you must submit a new "script action" script with a different name.
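Because each resubmission needs a different name, one simple approach is to append a timestamp to a base name. The following is a minimal sketch. The function name script_action_name and the base name are hypothetical, and it assumes that hyphens and digits are acceptable in your script action names.

```shell
# Print a script action name made unique by a timestamp suffix.
# Argument: base name of your choice.
script_action_name() {
  local base="$1"
  echo "${base}-$(date +%Y%m%d-%H%M%S)"
}
```

Two submissions made at least one second apart get different names, which also makes it easier to match entries in the Script Action History to specific attempts.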