
Part 2: Connecting Unravel to an HDInsight cluster

This topic explains how to connect Unravel Server to an HDInsight cluster and deploy Unravel sensors on the cluster's nodes through an Azure "script action". Unravel provides a different script action for each cluster type:

Cluster type              Download path                         Apply to cluster node types
Hadoop, HBase, or Spark   unravel_hdi_spark_bootstrap_4.5.sh    Head node, Worker node, Edge node
Kafka                     unravel_hdi_kafka_bootstrap.sh        Head node
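If you script your deployments, the table can be encoded as a simple lookup. The sketch below is a hypothetical helper (unravel_script_for is not part of Unravel); it assumes the Azure CLI role names headnode, workernode, and edgenode correspond to the node types listed in the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of Unravel): map an HDInsight cluster type
# to the Unravel script-action file and node roles from the table above.
# Roles use the Azure CLI names headnode, workernode, and edgenode.
unravel_script_for() {
  local kind
  kind=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $kind in
    hadoop|hbase|spark)
      echo "unravel_hdi_spark_bootstrap_4.5.sh headnode workernode edgenode" ;;
    kafka)
      echo "unravel_hdi_kafka_bootstrap.sh headnode" ;;
    *)
      echo "unsupported cluster type: $1" >&2
      return 1 ;;
  esac
}
```

For example, `set -- $(unravel_script_for Spark)` leaves the script name in `$1` and the roles in the remaining positional parameters.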

Note

If your cluster doesn't have access to the Internet, download the scripts, store them in an Azure Blob Storage account, and enter the blob URI in the script action's Bash script URI field.
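Staging a script in Blob Storage can be done from the Azure CLI. This is a minimal sketch, assuming you have run `az login` and hold a data-plane role (such as Storage Blob Data Contributor) on the account; the account, container, and helper names are placeholders. It also assumes the public Azure cloud host name `blob.core.windows.net`. The cluster must be able to read the blob: with a private container that typically means linking the storage account to the cluster or appending a SAS token to the URI, which this sketch does not do.

```shell
#!/bin/sh
# Sketch: stage an Unravel script-action script in Azure Blob Storage and
# print the URI to paste into the script action's "Bash script URI" field.
# Storage account and container names are placeholders.

# Build the blob endpoint URI (public Azure cloud host name assumed).
blob_uri() {
  printf 'https://%s.blob.core.windows.net/%s/%s\n' "$1" "$2" "$3"
}

# Upload a local file with the Azure CLI, then print its URI.
# Assumes `az login` has been run and you hold a data role on the account.
stage_script() {
  local account="$1" container="$2" file="$3" blob
  blob=${file##*/}   # blob name = file name without directories
  az storage blob upload \
    --account-name "$account" \
    --container-name "$container" \
    --name "$blob" \
    --file "$file" \
    --auth-mode login \
    --overwrite >/dev/null || return 1
  blob_uri "$account" "$container" "$blob"
}
```

For example, `stage_script mystorage scripts ./unravel_hdi_kafka_bootstrap.sh` uploads the Kafka script and prints the URI to use.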

Prerequisites
Connect to a new cluster
  1. Log in to the Azure portal.

  2. Select HDInsight cluster.

  3. In the dialog box, enter the details for your desired cluster type, topology, OS, and so on.

  4. In the Security + networking tab, select the same virtual network and subnet that the Unravel VM uses.

  5. In the Storage tab, choose Azure Blob Storage or Azure Data Lake Storage as the primary storage type, and add any secondary storage accounts.

  6. In the Cluster size tab, select the number of worker nodes and the VM size for each node type.

  7. Optional: In the Script action tab, add the Unravel script action for your cluster type, as listed in the table at the top of this topic.

    You can also do this step after the cluster has been created.

  8. In the Summary - Confirm configurations tab, review your cluster settings and click Create. Creating the cluster takes 5 to 15 minutes, depending on its size and configuration.
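To find the virtual network and subnet to select in step 4, you can look them up from the Azure CLI. This is a sketch under stated assumptions: the CLI is signed in, the Unravel VM's first network interface is the one on the shared subnet, and the resource group and VM names you pass are placeholders for your own.

```shell
#!/bin/sh
# Sketch: print the virtual network and subnet names used by the Unravel VM.
# Assumes the Azure CLI is signed in and the VM's first NIC is on that subnet.

# Split a subnet resource ID of the form
#   /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Network/
#   virtualNetworks/<vnet>/subnets/<subnet>
# into "<vnet> <subnet>".
subnet_parts() {
  local vnet subnet
  vnet=${1#*/virtualNetworks/}
  vnet=${vnet%%/*}
  subnet=${1##*/subnets/}
  printf '%s %s\n' "$vnet" "$subnet"
}

# Look up the VM's first network interface, then that NIC's subnet ID.
unravel_vm_subnet() {
  local nic_id subnet_id
  nic_id=$(az vm show --resource-group "$1" --name "$2" \
    --query 'networkProfile.networkInterfaces[0].id' --output tsv) || return 1
  subnet_id=$(az network nic show --ids "$nic_id" \
    --query 'ipConfigurations[0].subnet.id' --output tsv) || return 1
  subnet_parts "$subnet_id"
}
```

Running `unravel_vm_subnet my-resource-group unravel-vm` prints the virtual network name followed by the subnet name. The full subnet ID also contains the network's resource group, which matters if the network lives in a different group than the VM.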

Connect to an existing Hadoop, HBase, or Spark cluster
  1. Log in to the Azure portal.

  2. Select HDInsight cluster.

  3. Select the Hadoop, HBase, or Spark cluster that you want to apply the Unravel "script action" script to.

  4. In the left-hand menu, click Script actions, then click Submit new.

  5. Specify the following settings for the bootstrap script:

    hdi-script-action.png
  6. Click Create.
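The same submission can be scripted with the Azure CLI's `az hdinsight script-action execute` command. This is a sketch, not Unravel's documented procedure: the resource group, cluster, action name, and script URI are placeholders, and `SCRIPT_PARAMS` is a hypothetical variable for whatever parameters the settings above call for. The action is persisted so that, per HDInsight's persisted script action behavior, worker nodes added later by scaling also run it.

```shell
#!/bin/sh
# Sketch: submit the Unravel bootstrap script as a script action from the
# Azure CLI. Arguments: resource group, cluster name, script action name,
# script URI, then one or more node roles (headnode workernode edgenode).
# SCRIPT_PARAMS is a hypothetical variable holding any script parameters.
submit_unravel_action() {
  local rg="$1" cluster="$2" action="$3" uri="$4"
  shift 4
  # Remaining positional parameters are the roles; append the optional
  # parameters so they follow the role list on the command line.
  if [ -n "${SCRIPT_PARAMS:-}" ]; then
    set -- "$@" --script-parameters "$SCRIPT_PARAMS"
  fi
  az hdinsight script-action execute \
    --resource-group "$rg" \
    --cluster-name "$cluster" \
    --name "$action" \
    --script-uri "$uri" \
    --persist-on-success \
    --roles "$@"
}
```

For a Hadoop, HBase, or Spark cluster pass `headnode workernode edgenode` (include `edgenode` only if the cluster has an edge node); for Kafka pass only `headnode`, following the table at the top of this topic.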

Connect to an existing Kafka cluster
  1. Log in to the Azure portal.

  2. Select HDInsight cluster.

  3. Select the Kafka cluster that you want to apply the Unravel "script action" script to.

  4. If the Kafka cluster has no Internet access, download the HDInsightUtilities-v01.sh script and copy it to the Kafka head node's /tmp folder.

    For example, download it on a machine with Internet access:

    wget -O /tmp/HDInsightUtilities-v01.sh -q https://hdiconfigactions.blob.core.windows.net/linuxconfigactionmodulev01/HDInsightUtilities-v01.sh
  5. Click Script actions, then click Submit new.

  6. Specify the following settings for the bootstrap script:

  7. Click Create.

  8. After the Kafka script action completes successfully, open an SSH session to the Kafka cluster's head node and append the contents of /tmp/unravel/unravel.ext.properties on the head node to /usr/local/unravel/etc/unravel.properties on your Unravel VM.

    In a multi-cluster deployment, com.unraveldata.ext.kafka.clusters is a comma-separated list of cluster names, and the set of properties prefixed with com.unraveldata.ext.kafka.cluster_name is repeated for each cluster.

    Note

    This step also applies to new clusters.

    For example, for two Kafka clusters, /tmp/unravel/unravel.ext.properties looks like this:

    com.unraveldata.ext.kafka.clusters=cluster_name1, cluster_name2
    com.unraveldata.ext.kafka.cluster_name1.bootstrap_servers=wn0-cluster_name1:9092,wn1-cluster_name1:9092
    com.unraveldata.ext.kafka.cluster_name1.jmx_servers=broker1,broker2
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker1.host=wn0-cluster_name1
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker1.port=9999
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker2.host=wn1-cluster_name1
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker2.port=9999
    
    com.unraveldata.ext.kafka.cluster_name2.bootstrap_servers=wn0-cluster_name2:9092,wn1-cluster_name2:9092
    com.unraveldata.ext.kafka.cluster_name2.jmx_servers=broker1,broker2
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker1.host=wn0-cluster_name2
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker1.port=9999
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker2.host=wn1-cluster_name2
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker2.port=9999

    Note

    The Unravel VM must be able to reach the Kafka worker nodes on broker port 9092 and JMX port 9999.

  9. After updating the Kafka properties, restart Unravel Server.

    sudo /etc/init.d/unravel_all.sh restart
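Steps 8 and 9 can be partly automated on the Unravel VM. This is a minimal sketch under these assumptions: unravel.ext.properties has already been copied from the Kafka head node, you run it from a shell that can write unravel.properties (for example a root shell), and bash is installed for the /dev/tcp port check. All function names are hypothetical. It takes a backup before appending, then lists every broker and JMX endpoint found in the file and tests whether each port accepts a TCP connection.

```shell
#!/bin/sh
# Sketch of steps 8 and 9 for one Kafka cluster, run on the Unravel VM.
# Assumes unravel.ext.properties has already been copied from the Kafka
# head node and that you can write to unravel.properties.

UNRAVEL_PROPS=${UNRAVEL_PROPS:-/usr/local/unravel/etc/unravel.properties}

# Append the generated Kafka properties after taking a timestamped backup.
append_kafka_props() {
  local ext_file="$1" target="${2:-$UNRAVEL_PROPS}"
  cp -p -- "$target" "$target.bak.$(date +%Y%m%d%H%M%S)" || return 1
  printf '\n' >> "$target"
  cat -- "$ext_file" >> "$target"
}

# Print the host:port endpoints the Unravel VM must reach: broker addresses
# from *.bootstrap_servers and JMX addresses from *.jmx.<broker>.host/.port.
kafka_endpoints() {
  sed -n 's/^.*\.bootstrap_servers=//p' "$1" | tr ',' '\n' | tr -d ' \t\r' | awk 'NF'
  awk -F= '
    /\.jmx\.[^.=]+\.host=/ { k = $1; sub(/\.host$/, "", k); v = $2; gsub(/[ \t\r]/, "", v); host[k] = v }
    /\.jmx\.[^.=]+\.port=/ { k = $1; sub(/\.port$/, "", k); v = $2; gsub(/[ \t\r]/, "", v); port[k] = v }
    END { for (k in host) if (k in port) print host[k] ":" port[k] }
  ' "$1"
}

# Succeed if a TCP connection to host:port opens within 3 seconds
# (uses bash's /dev/tcp redirection, so bash must be installed).
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" </dev/null 2>/dev/null
}

# Report each endpoint; exit status is non-zero if any is unreachable.
check_kafka_endpoints() {
  kafka_endpoints "$1" | (
    rc=0
    while IFS= read -r ep; do
      if port_open "${ep%:*}" "${ep##*:}"; then
        echo "OK   $ep"
      else
        echo "FAIL $ep"
        rc=1
      fi
    done
    exit "$rc"
  )
}
```

For example, copy the file with `scp sshuser@CLUSTERNAME-ssh.azurehdinsight.net:/tmp/unravel/unravel.ext.properties /tmp/` (sshuser and CLUSTERNAME stand for your cluster's SSH user and name), run `append_kafka_props /tmp/unravel.ext.properties` and `check_kafka_endpoints /tmp/unravel.ext.properties`, and restart Unravel Server only once every endpoint reports OK.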
Next steps

For additional configuration and instrumentation options, see Next Steps.

Troubleshooting tips
  • In the Azure portal, you can check whether a script action finished successfully in the Script Action History:

    hdi-submit-new.jpg
  • If a script action fails, check the error messages in the HDInsight cluster's Ambari dashboard. On the top menu bar, a balloon next to the cluster name lists the recent operations.

    Click Ops, find the most recent run_customscriptaction command, and inspect its log messages. You may see several run_customscriptaction entries created by previous runs.

  • The Unravel script action cannot be rerun. To redeploy it, submit a new script action with a different name.
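The Script Action History can also be listed from the Azure CLI, and since the Unravel action cannot be rerun, a timestamp suffix is one simple way to give each resubmission a new name. Both functions are hypothetical helpers; the resource group, cluster, and base action names are placeholders.

```shell
#!/bin/sh
# Sketch: list script-action runs for a cluster, and generate a fresh
# script-action name for resubmitting the Unravel action.

# Show the execution history as a table (arguments: resource group, cluster).
show_script_action_history() {
  az hdinsight script-action list-execution-history \
    --resource-group "$1" \
    --cluster-name "$2" \
    --output table
}

# Append a UTC timestamp so every resubmission gets a distinct name.
next_action_name() {
  printf '%s-%s\n' "${1:-unravel-bootstrap}" "$(date -u +%Y%m%d%H%M%S)"
}
```

The generated name can be passed as the script action name when you submit the action again, whether in the portal's Submit new dialog or from the CLI.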