Installing Unravel on a Separate Azure VM

This option involves the following steps:

Create Azure Storage

This topic explains how to create Azure storage appropriate for your HDInsight cluster, whether it runs Hadoop, Spark, or Kafka.

First, you need to determine which storage type is appropriate for your cluster and supported by Unravel. You have the following options:

  • Windows Azure Storage Blob ("Azure Storage")

    By default HDInsight 3.6 uses Blob storage, which is a general-purpose storage type for Big Data. Blob storage is a key-value store with a flat namespace. It has full support for:

    • Analytics workloads: batch, interactive, and streaming analytics

    • Machine learning data such as log files, IoT data, click streams, large datasets

    • Low-cost, tiered storage

    • High availability/disaster recovery

    Note

    Unravel doesn't support encryption (SSL) with Blob storage (WASB).

  • Azure Data Lake Storage generation 1 (ADLS v1)

    The other major option for Hadoop clusters is ADLS v1. ADLS is a hierarchical file system. It has full support for:

    • Analytics workloads: batch, interactive, and streaming analytics

    • Machine learning data such as log files, IoT data, click streams, large datasets

    • File system semantics

    • File-level security

    • Scalability

  • Azure Data Lake Storage generation 2 (ADLS v2) (Preview mode)

    ADLS generation 2 combines the features of Blob storage and ADLS generation 1.

    Note

    Unravel has not been tested with ADLS v2 since it is still in preview mode.

For an in-depth comparison, see https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage

Note

The rest of this document refers to these storage types as Blob and ADLS.

Prerequisites
  • You must already have an Azure account and be able to log in to https://portal.azure.com

  • You must already have a resource group assigned to a region in order to group your policies, VMs, and storage blobs/lakes/drives.

    A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity.

  • You must already have a virtual network for your resource group. This virtual network will be shared by your Hadoop cluster and the Unravel VM.

Steps
  1. Log into https://portal.azure.com.

  2. Click Storage accounts | + Add.

  3. On the Basics tab, enter values for the following fields:

    1. Subscription: Select the subscription type.

    2. Resource Group: Select the resource group to associate with this storage instance.

    3. Storage Account Name: Enter a name, using lowercase letters and numbers.

    4. Location: Select a data center region.

    5. Performance: Select Standard or Premium:

      Standard storage uses magnetic disks and is cheaper. Premium storage uses SSDs, so it has higher performance and is recommended for Spark and Kafka clusters.

    6. Account kind: Select your storage type: Blob or Storage (ADLS v1)

      OR StorageV2 (ADLS v2)

      Note

      StorageV2 (ADLS v2) is still in preview mode and is not currently supported by Unravel.

    7. Replication: Select your desired replication: local only, redundant within the same zone or region, or replicated geographically. See more choices in the Advanced section.

      • Locally redundant storage (LRS): Only handles failures within the data center. Durability guarantee is 11 9's.

      • Zone-redundant storage (ZRS): Handles failures in the data center and zone, but not the region. Durability guarantee is 12 9's. Only supported on ADLS v2.

      • Geo-redundant storage (GRS): Handles failures in the data center, zone, and region, but does not allow read access in another region in a failure scenario. Durability guarantee is 16 9's.

      • Read-access geo-redundant storage (RA-GRS): Handles failures in the data center, zone, and region, and allows read access in another region. Durability guarantee is 16 9's.

    8. Access Tier: Only available for Blob storage and ADLS v2 (which is not currently supported by Unravel). If you pick this option, select hot storage.

      hdi-create-storage.png
  4. Click the Advanced tab.

    1. Set Secure transfer required to Disabled or Enabled.

      Note

      Unravel doesn't support encryption (SSL) with Blob storage (WASB).

    2. For Virtual networks, select whether to allow traffic from all networks or only from within the virtual network and subnet(s) you specify.

      hdi-create-storage-review.png
  5. Click Review + create.

  6. If your settings are correct, click Create. To edit your settings, click Previous.

Related Resources
Finding Unravel Properties' Values in Microsoft Azure
Azure - creating a storage account
Difference between Replication types
Creating and Configuring the Azure VM

This topic explains how to create a separate Azure VM, install the Unravel RPM, and configure it.

Prerequisites
  • You must already have an Azure account and be able to log in to https://portal.azure.com

  • You must already have a resource group assigned to a region in order to group your policies, VMs, and storage blobs/lakes/drives.

    A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity.

  • You must already have a virtual network and network security group set up for your resource group. Your virtual network and subnet(s) must be big enough to be shared by the Unravel VM and the target HDInsight cluster(s).

  • You must have root privilege in order to perform some commands on the VM.

  • You must already have created a storage system. For instructions, see Create Azure Storage.

  • You must have an SSH key pair.

  • Your VM host must meet the requirements below.

Support Chart and VM Requirements

Azure HDI cluster compatibility

HDInsight 3.6

Storage type: Blob (WASB) or ADLS v1

Limitations: Unravel currently only works with Blob (WASB) or ADLS v1. It does not support multiple Azure Data Lake Storage accounts or ADLS v2 (preview).

HDP 2.6.5

Spark 1.6.3, 2.1.0, 2.2.0, 2.3.0

Limitations: Spark relies on the yarn-site configuration property yarn.log-aggregation.file-formats, whose only supported value is TFile. In other words:

<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>TFile</value>
</property>
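On a cluster node, you can confirm the configured value directly from yarn-site.xml. The sketch below extracts it from a sample file; on a real node, point CONF at your actual yarn-site.xml (typically /etc/hadoop/conf/yarn-site.xml, which is an assumption about your layout):

```shell
# Write a minimal sample yarn-site fragment (stand-in for the real file).
CONF=/tmp/yarn-site-sample.xml
cat > "$CONF" <<'EOF'
<property>
  <name>yarn.log-aggregation.file-formats</name>
  <value>TFile</value>
</property>
EOF

# Extract the configured value; a supported cluster should print "TFile".
grep -A2 '<name>yarn.log-aggregation.file-formats</name>' "$CONF" \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```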

Hive 2.1

Kafka 0.10.0, Kafka 1.0, Kafka 1.1 (preview)

Image (underlying operating system for the VM)

RHEL 7 or CentOS 7.2 - 7.6

Note that the actual HDInsight Kafka/Spark cluster can run another OS.

CPU and RAM minimum requirements

Minimum VM type suggested: Medium memory optimized such as Standard_E8s_v3

Cores: 8 min

RAM: 64 GB min

Disk requirement

Minimum 100 GB for /srv on a second disk (at least a 500 GB SSD for the /srv volume is recommended)

Network requirement

  • Unravel VM should be located in the same VNET and VSNET as the HDInsight cluster

  • Port 443 open on Hadoop cluster for Azure HDInsight to monitor applications

  • Port 3000 (or 4020) for Unravel Web UI access

  • UDP and TCP ports 4041-4043 open from Hadoop cluster to Unravel Server

  • HDFS ports open from Hadoop cluster to Unravel Server

  • Hive MetaStore DB port open to Unravel Server for partition reporting

  • For Oozie, port 11000 open to Unravel Server
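The port requirements above can be spot-checked with bash's built-in /dev/tcp before you debug Unravel itself. This is a sketch; the host and port below are illustrative, and on a real deployment you would run it from a cluster node against the Unravel VM's address and ports such as 3000 and 4041-4043:

```shell
# Report whether a TCP port accepts connections (illustrative helper).
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null \
    && echo "$1:$2 open" || echo "$1:$2 closed"
}

# Example invocation against a local port that is almost certainly closed.
check_port 127.0.0.1 1
```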

Security requirement

Allow inbound SSH to the Unravel VM.

Allow outbound Internet access and all traffic within the subnet (VSNET).

Allow TCP ports 3000 and 4043 to the Unravel VM from the HDInsight cluster.

Note

unravel-host must be a fully qualified domain name or IP address.

Provision an Azure VM for Unravel Server
  1. Log into https://portal.azure.com.

  2. Select Virtual machines, and click + Add.

    hdi-home-vm.png
  3. On the Basics tab, enter values for the following fields:

    • Subscription: Select the subscription type.

    • Resource Group: Select the resource group to associate with this VM. The VM inherits configurations for lifecycle, permissions, and policies from this group.

    • Virtual machine name: Enter a name, using only alphanumeric characters, "-", and "_". This value becomes the VM's hostname.

    • Region: Select a data center region for this VM. Note that not all VM types are available in all regions.

    • Availability options: Select your redundancy (durability) settings.

      hdi-create-vm.jpg
    • Image: Select the underlying operating system for the VM. As noted above, Unravel supports RHEL 7 or CentOS 7.2 - 7.6 only.

    • Size (required): Select a standard, memory optimized VM with at least 8 vCores and 64 GB RAM, such as E8s_v3.

      hdi-select-vm-size.png
    • Select your VM's Authentication type.

      Tip

      Best practice is to authenticate using an SSH public key, which you can generate using ssh-keygen. Avoid any reserved names like "admin" for the username.

    • Set Inbound Port Rules:

      If you plan to allow external access to the Unravel UI, select Allow selected ports, and then select HTTPS and SSH.
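If you don't yet have the key pair mentioned in the tip above, ssh-keygen can create one. The file name and comment below are illustrative choices, not required values:

```shell
# Generate a 4096-bit RSA key pair with no passphrase (illustrative path).
rm -f /tmp/unravel_azure_key /tmp/unravel_azure_key.pub
ssh-keygen -t rsa -b 4096 -N "" -C "unravel-vm" -f /tmp/unravel_azure_key

# The .pub file is what you paste into the Azure portal's SSH public key field.
ls -l /tmp/unravel_azure_key /tmp/unravel_azure_key.pub
```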

  4. Click Next: Disks.

  5. On the Disks tab, enter values for the following fields:

    • OS disk type: For better performance in production, we recommend a Premium SSD because it tolerates higher IOPS. For a dev/test cluster, we suggest a Standard SSD.

      hdi-create-new-disk.png
    • Advanced: We recommend using managed disks, which have better performance and reliability.

    • Data disks:

      If you don't have a disk ready, click Create and attach new disk. In the dialog box, provide the desired Disk Name, Size (GiB), and Source Type of "empty disk". We recommend at least 500 GB of space, and ideally 1024 GB for production clusters.

      Otherwise, click Attach an existing disk.

  6. Click Next: Networking.

  7. On the Networking tab, enter values for the following fields:

    Warning

    It is imperative that the VM, the Azure storage, and the cluster(s) you plan to monitor are all on the same virtual network and subnet(s).

    • Virtual network (required): Select the appropriate virtual network for your cluster(s).

    • Subnet (required): Select a subnet with the appropriate address range based on the number of IPs you plan to have in your network. For more information, see https://www.aelius.com/njh/subnet_sheet.html.

      hdi-create-vm-network.png
    • NIC network security group: Set this to Basic.

      Unravel Server works with multiple HDInsight clusters, including existing clusters and new clusters.

      • A TCP and UDP connection is needed from the "head node" of each HDInsight cluster to Unravel Server.

      • Add an inbound security policy to allow SSH access and 443 access to the Unravel node.

      • The default security policy should allow all access within the VNET. Default rules start with a priority of 65000.

        hdi-inbound-security.png
  8. Click Review + create.

  9. Click Create.

    It takes about 2 minutes to create your VM.

    hdi-create-vm-deploy.png

    When Azure completes the creation of your VM, click Go to resource.

  10. Copy the VM's public IP address.

  11. Open an SSH session to your VM's public IP address and verify that your IP address is as expected:

    ssh -i ssh-key user@ip-address
  12. Verify that eth0 on the new VM is bound to the private IP address shown in the Azure portal.

    ifconfig
    eth0      Link encap:Ethernet  HWaddr 00:0d:3a:1b:c2:48
              inet addr:10.10.1.96
Configure the VM at First Login
  1. Install ntpd, enable it to start at boot, and check that the time is accurate. This is necessary in order to synchronize your VM's clock. For more information about ntpd, see https://wiki.archlinux.org/index.php/Network_Time_Protocol_daemon.

    sudo su -
    yum install -y ntp
    # Start ntpd now and enable it at boot
    systemctl start ntpd
    systemctl enable ntpd
  2. Disable Security Enhanced Linux (SELinux) permanently. This is important because HDFS maintains replication in different nodes/racks, so setting firewall rules in SELinux will lead to performance degradation.

    sudo setenforce Permissive
  3. In /etc/selinux/config, set SELINUX=permissive to make sure the settings persist after reboot:

    SELINUX=permissive
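One way to make that edit is with sed. The sketch below works on a copy so it can be tried safely; on the VM itself, run the same sed command against /etc/selinux/config as root:

```shell
# Work on a copy of the SELinux config (falls back to a stub if absent).
cp /etc/selinux/config /tmp/selinux.conf 2>/dev/null \
  || echo 'SELINUX=enforcing' > /tmp/selinux.conf

# Force SELINUX to permissive, then show the resulting line.
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /tmp/selinux.conf
grep '^SELINUX=' /tmp/selinux.conf
```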
  4. Install libaio.x86_64.

    Libaio has a huge performance benefit over the standard POSIX asynchronous I/O facility because the operations are performed in the Linux kernel instead of as a separate user process.

    sudo yum -y install libaio.x86_64
  5. Install lzop.x86_64.

    Hadoop requires LZO compression libraries.

    sudo yum install lzop.x86_64
  6. Disable the firewall and check your iptable rules.

    sudo systemctl disable firewalld
    sudo systemctl stop firewalld
    sudo iptables -F
    sudo iptables -L
  7. Prepare the second disk (for example, /dev/sdc) of at least 500 GB that you configured previously in the Azure portal. Use fdisk -l to find any 500 GB disk without a partition. This step requires root privilege.

    sudo su -
     
     
    # List all disks and partitions
    # You should see one called "sdc" if you attached a 500-1000 GB disk.
    fdisk -l
    fdisk /dev/sdc
    # p (list current partitions)
    # n (new partition)
    # p (primary)
    # Keep accepting rest of default configs.
    # w (save)
     
    # Format the new partition
    /usr/sbin/mkfs -t ext4 /dev/sdc1
     
     
    mkdir -p /srv
     
    DISKUUID=`/usr/sbin/blkid |grep ext4 |grep sdc | awk '{ print $2}' |sed -e 's/"//g'`
    echo $DISKUUID
     
    # Mount the disk on /srv
    echo "${DISKUUID}    /srv   ext4 defaults  0 0" >> /etc/fstab
    mount /dev/sdc1 /srv
     
    # Verify the disk space
    df -hT /srv
     
    Filesystem     Type  Size  Used Avail Use% Mounted on
    /dev/sdc1      ext4  197G   61M  187G   1% /srv
     
     
    # Set permissions for Unravel and symlink Unravel's directories to the /srv mount
    mkdir -p /srv/local/unravel
    chmod -R 755 /srv/local
    ln -s /srv/local/unravel /usr/local/unravel
    chmod 755 /usr/local/unravel      
  8. Create the hdfs user and the hadoop group.

    sudo useradd hdfs
    sudo groupadd hadoop
    sudo usermod -a -G hadoop hdfs
(Optional) Install MySQL

Complete the [Before Installing Unravel RPM] steps in Install and Configure MySQL for Unravel.

Install the Unravel Server RPM on the VM
  1. Get the Unravel Server RPM.

    Download the RPM from the Unravel distribution server to the Unravel VM. For instructions, see Download Unravel Software.

    cd /tmp
    # Note that the same RPM is used for both EMR and HDInsight.
    curl -u username:password -v https://preview.unraveldata.com/unravel/RPM/version/unravel-version-EMR-latest.rpm -o unravel-version-EMR-latest.rpm 
  2. Install the Unravel Server RPM.

    Tip

    The precise filename can vary, depending on how it was fetched or copied. The rpm command does not require a .rpm suffix. The flag -U works for both initial installations and upgrades.

    sudo rpm -U unravel-4.4.3.0-EMR-latest.rpm
  3. Run the await_fixups.sh script to make sure background processing is finished before you proceed to other steps.

    Note

    If you're doing a routine upgrade, you can start all Unravel daemons, but don't stop or restart them until await_fixups.sh prints DONE (it takes a few minutes).

    /usr/local/unravel/install_bin/await_fixups.sh
    DONE

    This installation creates the following directories, databases, and users:

    Directories: The installation creates /usr/local/unravel/ which contains the executables, scripts, and settings (/usr/local/unravel/etc/unravel.properties).

    /etc/init.d/unravel_* contains scripts for controlling the Unravel services

    /etc/init.d/unravel_all.sh can be used to manually stop, start, restart, and get the status of all daemons in the proper order.

    Subsequent RPM upgrades don't change /usr/local/unravel/etc/unravel.properties because your site-specific properties are put into this file.

    Users: User unravel is created if it does not already exist.

    DB: The initial bundled Postgres database and other durable state are put in /srv/unravel/. This can later be switched to an external RDS. We recommend an externally managed MySQL DB for production, such as Azure SQL DB.

    Config: The master configuration file is in /usr/local/unravel/etc/unravel.properties

    Logs: All logs are in /usr/local/unravel/logs/

  4. Grant access to Unravel Server

    Security Reminder

    Do not make Unravel Server UI TCP port 3000 accessible on the public Internet because doing so would violate your licensing terms.

    • By default, a Public IP should be assigned to the Unravel VM.

    • Create a security policy that allows SSH access to the Unravel VM from your trusted network. For the Azure HDInsight cluster(s), you must allow port 443 (HTTPS) from Azure networks (or simply allow TCP port 443 from the outside).

    • It is recommended that you use an SSH key to access the Unravel node.

If You Installed MySQL, Configure It

Complete the [After Installing Unravel RPM] steps in Install and Configure MySQL for Unravel.

Modify Properties and Start Unravel Daemons
  1. Open an SSH session to the Unravel VM.

    ssh -i ssh-private-key ssh-user@unravel-host
  2. Set correct permissions on the Unravel configuration directory.

    cd /usr/local/unravel/etc
    sudo chown unravel:unravel *.properties
    sudo chmod 644 *.properties
  3. Update unravel.ext.sh based on how you plan to configure your cluster.

    Tip

    To find your cluster's HDInsight version, see https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#supported-hdinsight-versions. You need this information for the commands below.

    # Find the version of HDP that is installed by checking the HDP symlink. Take the first 2 digits, such as 2.6
    # You can also check https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#supported-hdinsight-versions 
    hdp-select status | grep hadoop
    hadoop-client - 2.6.5.3005-27
    
    # Append this classpath based on the version you found
    echo "export CDH_CPATH=/usr/local/unravel/dlib/hdp2.6.x/*" >> /usr/local/unravel/etc/unravel.ext.sh
  4. Run the "switch user" script.

    /usr/local/unravel/install_bin/switch_to_user.sh hdfs hadoop
  5. In /usr/local/unravel/etc/unravel.properties, add/modify the following properties:

    • com.unraveldata.onprem

      This is optional at this time but is required later.

      echo "com.unraveldata.onprem=false" >> /usr/local/unravel/etc/unravel.properties
    • Modify other properties using the guidelines in the table below:

    • Update the following properties for an HDInsight cluster, depending on whether you're using Blob storage or ADLS.

      Set these properties with values you obtain from Azure. For help in locating the right values, see Finding Unravel Properties' Values in Microsoft Azure.

      • For Blob (WASB) storage, update:

      • For ADLS, update:

Restart Unravel Server

Whenever you modify properties such as com.unraveldata.login.admins in /usr/local/unravel/etc/unravel.properties, you must restart Unravel Server for the changes to take effect.


sudo /etc/init.d/unravel_all.sh restart
sleep 60
Log into Unravel UI
  1. Run the echo command to find the URL for Unravel UI.

    If you are using an SSH tunnel or HTTP proxy, you might need to make adjustments to the host/IP of the URL:

    echo "http://(hostname -f):3000/"
  2. Create an SSH tunnel to access the Azure VM for Unravel's TCP port 3000.

    ssh -i ssh-private-key ssh-user@unravel-host -L 3000:127.0.0.1:3000
  3. Using a supported web browser, navigate to http://127.0.0.1:3000 and log in as user admin with password unraveldata.

    signin.png

Congratulations! Unravel Server is up and running. Proceed to Connecting Unravel to the HDInsight Cluster.

Connecting Unravel to the HDInsight Cluster

This topic explains how to spin up a Hadoop, HBase, Spark, or Kafka cluster, configure the cluster with a script action, and connect it to Unravel Server.

Before Unravel can analyze any job running on your HDInsight cluster, the Unravel server and sensors must be deployed on the cluster nodes through an Azure "script action".

Note

For HDInsight clusters without Internet access, you can download these scripts, store them in your Azure Blob storage, and use the Blob storage URI in the script action's Bash script URI field.

Unravel provides two types of "script actions" depending on the type of cluster.

Cluster Type: Hadoop, HBase, or Spark

Download path: unravel_hdi_spark_bootstrap_4.5.sh

Supported HDI cluster(s): Hadoop 2.7.3, HBase 1.1.2, Spark 2.1.0, 2.2.0, 2.3.0

Apply to cluster node type(s): Head node, Worker node, Edge node

Cluster Type: Kafka

Download path: unravel_hdi_kafka_bootstrap.sh

Supported HDI cluster(s): Kafka 0.10.0, Kafka 1.0.0, Kafka 1.1.0

Apply to cluster node type(s): Head node

Checks before running script action

Read the latest documentation on the ports required by HDInsight: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services

Ensure the Unravel service is running on the Unravel VM and that ports 3000 and 4043 are reachable from the Azure HDInsight cluster master node before running the Unravel "script action" script.

For example,

ssh -i ssh_key ssh_user@unravel-host
sudo su -
netstat -anp | grep 3000
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 65072/node
hostname

# On one of the cluster's head nodes:
ping unravel-host

Depending on your cluster type, choose one of these options:

Prerequisites
  • You must already have the Unravel VM running on Azure and the Unravel UI available on port 3000. For instructions, see Creating and Configuring the Azure VM.

  • If you plan to create a cluster, you must have the following information ready:

    • Virtual Network and subnet of the Unravel VM

    • Your Azure Storage details. For storage setup, see Create Azure Storage.

Option A: Connecting to a New Cluster
  1. Log into the Azure portal (https://portal.azure.com).

  2. Select HDInsight cluster.

  3. In the dialog box, enter the details for your desired cluster type, topology, OS, and so on.

    hdi-clusters.png
  4. In the Security + networking tab, make sure to select the same virtual network and subnet that is used by the Unravel VM.

    hdi-clusters-security.jpg
  5. In the Storage tab, select whether to use Azure Blob Storage or Azure Data Lake Storage, plus any secondary accounts.

  6. In the Cluster size tab, select your desired topology for number of workers and VM types.

    hdi-clusters-size.png
  7. Optional: In the Script action tab, see Option B and Option C if you wish to set up a "script action" script for your desired cluster type at this time. You can always do this step after the cluster has been deployed.

  8. In the Summary - Confirm configurations tab, review your cluster and click Create. It should take anywhere from 5 to 15 minutes to create your cluster, depending on its size and parameters.

Option B: Connecting to an Existing Hadoop, HBase, or Spark Cluster
  1. Log into the Azure portal (https://portal.azure.com).

  2. Select HDInsight cluster.

  3. Select the Hadoop, HBase, or Spark cluster that you want to apply the Unravel "script action" script to.

  4. Click Script actions on the vertical menu, and click Submit new.

  5. Enter the following information for each field:

    Script type: Custom

    Name: unravel-script-01 (or any name to identify this script action run)

    Bash script URI: https://raw.githubusercontent.com/unravel-data/public/master/hdi/hdinsight-unravel-spark-script-action/unravel_hdi_spark_bootstrap_4.5.sh

    Node type(s): Select Head, Worker, and Edge. Note: The Edge option is only available for an existing cluster.

    Parameters: --unravel-server unravel-host-private-ip:3000 --spark-version spark-version [--metrics-factor N]

    The optional flag --metrics-factor specifies how often JVM metrics are obtained, in multiples of 5 seconds. The default value is 1, which means every 5 seconds. For workloads dominated by long-running jobs, use a larger factor. For example, if a cluster only has one Spark job that takes hours, use a factor of 12, or 60 seconds.

    For example,

    --unravel-server 10.10.1.10:3000 --spark-version 2.3.0

    To undo the changes, use the --uninstall parameter.

    For example,

    --unravel-server 10.10.1.10:3000 --spark-version 2.3.0 --uninstall

    Persist this script action to rerun when new nodes are added to the cluster: Select this check box. Note that persistence only applies to new Head and Worker nodes.

    hdi-script-action.png
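As a quick sanity check of the --metrics-factor arithmetic described above, the effective polling interval is simply the factor multiplied by 5 seconds:

```shell
# A factor of 12 means JVM metrics are collected every 12 * 5 = 60 seconds.
factor=12
echo "polling interval: $((factor * 5)) seconds"
```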
  6. Click Create.

Option C: Connecting to Existing Kafka Cluster(s)
  1. Log into the Azure portal (https://portal.azure.com).

  2. Select HDInsight cluster.

  3. Select the Kafka cluster that you want to apply the Unravel "script action" script to.

  4. If the Kafka cluster has no Internet access, download the HDInsightUtilities-v01.sh script and copy it to the Kafka head node's /tmp folder.

    For example,

    wget -O /tmp/HDInsightUtilities-v01.sh -q https://hdiconfigactions.blob.core.windows.net/linuxconfigactionmodulev01/HDInsightUtilities-v01.sh
  5. Click Script actions | Submit new and enter the following information:

    Script type: Custom

    Name: unravel-script-01 (or any name to identify this script action run)

    Bash script URI: https://raw.githubusercontent.com/unravel-data/public/master/hdi/hdinsight-unravel-kafka-script-action/unravel_hdi_kafka_bootstrap.sh

    Node type(s): Head

    Parameters: --unravel-server unravel-server-private-ip:3000

    For example,

    --unravel-server 10.10.1.10:3000

    Persist this script action: Checked. Note that persistence only applies to new Head nodes.

  6. Click Create.

  7. After the Kafka script action script has completed successfully, open an SSH session to the Kafka cluster's "head node" and append the contents of /tmp/unravel/unravel.ext.properties to /usr/local/unravel/etc/unravel.properties on your Unravel VM.

    In a multi-cluster deployment, com.unraveldata.ext.kafka.clusters is a comma-separated list of clusters and the set of properties prefixed with com.unraveldata.ext.kafka.cluster_name are repeated for each cluster.

    For example, for two Kafka clusters, /tmp/unravel/unravel.ext.properties looks like this:

    # Adding Kafka properties
    com.unraveldata.ext.kafka.clusters=cluster_name1, cluster_name2
    com.unraveldata.ext.kafka.cluster_name1.bootstrap_servers=wn0-cluster_name1:9092,wn1-cluster_name1:9092
    com.unraveldata.ext.kafka.cluster_name1.jmx_servers=broker1,broker2
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker1.host=wn0-cluster_name1
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker1.port=9999
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker2.host=wn1-cluster_name1
    com.unraveldata.ext.kafka.cluster_name1.jmx.broker2.port=9999
    
    com.unraveldata.ext.kafka.cluster_name2.bootstrap_servers=wn0-cluster_name2:9092,wn1-cluster_name2:9092
    com.unraveldata.ext.kafka.cluster_name2.jmx_servers=broker1,broker2
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker1.host=wn0-cluster_name2
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker1.port=9999
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker2.host=wn1-cluster_name2
    com.unraveldata.ext.kafka.cluster_name2.jmx.broker2.port=9999
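When merging the appended properties, it can help to double-check which clusters are declared. This sketch parses the comma-separated list from a minimal sample file (the path and cluster names are illustrative):

```shell
# Minimal sample of the appended properties (illustrative).
cat > /tmp/unravel.ext.properties <<'EOF'
com.unraveldata.ext.kafka.clusters=cluster_name1, cluster_name2
EOF

# Print one declared cluster per line.
grep '^com.unraveldata.ext.kafka.clusters=' /tmp/unravel.ext.properties \
  | cut -d= -f2 | tr -d ' ' | tr ',' '\n'
```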

    Note

    The Unravel VM must have access to the Kafka worker nodes' broker port 9092 and Kafka JMX port 9999.

  8. After updating the Kafka properties, restart Unravel Server.

    sudo /etc/init.d/unravel_all.sh restart
Next Steps

For additional configuration and instrumentation options, see Next Steps.

Troubleshooting Tips
  • From the Azure portal, you can check if a script action finished successfully by checking the SCRIPT ACTION HISTORY:

    hdi-submit-new.jpg
  • If a script action process fails, you can check the error messages from the HDInsight cluster's Ambari dashboard, which has a balloon next to the cluster name on the top menu bar showing the recent operations.

    hdi-title-ops.png

    Click Ops and search for the most recent run_customscriptaction command and inspect the log messages. You may see multiple entries of run_customscriptaction which were created by previous runs.

    run_customscriptaction.jpg
  • The Unravel script action cannot be rerun. If you need to redeploy the Unravel script action, you must submit a new "script action" script with a different name.