Home

[INTERNAL] Update the Azure Resource Manager if needed

For the dev team: you may need to update the ARM (Azure Resource Manager) template for Kafka or Spark clusters:

[INTERNAL] Step 3: Create Unravel VM using the ARM template

The Unravel VM can be deployed using an Azure Resource Manager (ARM) template, which automates the Unravel VM node setup; however, the ARM template must be updated to reflect your specific environment.

The following ARM templates and parameter JSON file are for reference only. You need to update or create your own template files.

This example template creates an Azure "Standard E8s v3" VM in the existing VNET and subnet, and adds a data disk to the VM for the /srv mount point.

Note

You must update VNET, subnetRef, vmName, and adminPassword before running the ARM template to create the VM.

Optionally, change the size of the data disk; it is currently set to 500 GB. If you change the disk size, also update the unravel-setup.sh script (search for 500).
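The values above are supplied through the parameter JSON file. A minimal sketch of what such a file might look like follows; the exact parameter names depend on your template, so treat the names and values below as illustrative placeholders only:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vmName": { "value": "unravel-vm" },
    "adminPassword": { "value": "CHANGE_ME" },
    "virtualNetworkName": { "value": "existing-vnet" },
    "subnetName": { "value": "existing-subnet" }
  }
}
```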

For CentOS 7.3 Unravel VM

For RedHat 7.4 Unravel VM

Note

Both the ARM template and the parameter files must be modified to fit your Azure environment.

This ARM template embeds an Azure Extension script that downloads and installs the Unravel RPM. The Azure Extension script for Unravel RPM installation is shown below.

Note

The custom extension script handles most of the basic Unravel configuration; however, you must still manually edit the /usr/local/unravel/etc/unravel.properties file to add your blob storage account or Data Lake Store access information.

For instructions on editing unravel.properties, see Step 1.

Below is the content of this extension script:

# Download unravel rpm
/usr/bin/wget http://preview.unraveldata.com/unravel/RPM/4.2.7/Azure/unravel-4.2.7-Azure-latest.rpm

# Positional arguments supplied by the ARM template's custom extension.
# Pass the literal string NONE for any storage type you are not configuring.
BLOBSTORACCT=${1}    # blob storage account name
BLOBPRIACKEY=${2}    # blob primary access key
BLOBSECACKEY=${3}    # blob secondary access key

DLKSTOREACCT=${4}    # Data Lake Store account name
DLKCLIENTAID=${5}    # Data Lake client ID
DLKCLIENTKEY=${6}    # Data Lake client key
DLKCLITOKEPT=${7}    # Data Lake access token endpoint
DLKCLIROPATH=${8}    # Data Lake client root path


# Prepare the VM for unravel rpm install
/usr/bin/yum install -y ntp
/usr/bin/yum install -y libaio
/usr/bin/yum install -y lzop
/usr/bin/systemctl enable ntpd
/usr/bin/systemctl start ntpd
/usr/bin/systemctl disable firewalld
/usr/bin/systemctl stop firewalld

/usr/sbin/iptables -F

/usr/sbin/setenforce 0
/usr/bin/sed -i 's/enforcing/disabled/g' /etc/selinux/config

sleep 30


# Prepare disk for unravel
mkdir -p /srv

DATADISK=`/usr/bin/lsblk |grep 500G | awk '{print $1}'`
echo $DATADISK > /tmp/datadisk
echo "/dev/${DATADISK}1" > /tmp/dataprap

echo "Partitioning Disk ${DATADISK}"
echo -e "o\nn\np\n1\n\n\nw" | fdisk /dev/${DATADISK}

DATAPRAP=`cat /tmp/dataprap`
DDISK=`cat /tmp/datadisk`
/usr/sbin/mkfs -t ext4 ${DATAPRAP}

DISKUUID=`/usr/sbin/blkid |grep ext4 |grep $DDISK  | awk '{ print $2}' |sed -e 's/"//g'`
echo "${DISKUUID}    /srv   ext4 defaults  0 0" >> /etc/fstab

/usr/bin/mount -a

# install unravel rpm
/usr/bin/rpm  -U unravel-4.2.7-Azure-latest.rpm

/usr/bin/sleep 5


# Update Unravel license key in the unravel.properties file
# Obtain a valid Unravel license key; the following is a non-working placeholder
echo "com.unraveldata.lic=1p6ed4s492012j5rb242rq3x3w702z1l455g501z2z4o2o4lo675555u3h" >> /usr/local/unravel/etc/unravel.properties
echo "export CDH_CPATH=\"/usr/local/unravel/dlib/hdp2.6.x/*\"" >> /usr/local/unravel/etc/unravel.ext.sh

# Update Azure blob storage account credential in unravel.properties file
# Update and uncomment the following lines to reflect your Azure blob storage account name and keys

if [ "$BLOBSTORACCT" != "NONE" ] && [ "$BLOBPRIACKEY" != "NONE" ] && [ "$BLOBSECACKEY" != "NONE" ]; then

   echo "blob storage account name is ${BLOBSTORACCT}"
   echo "blob primary access key is ${BLOBPRIACKEY}"
   echo "blob secondary access key is ${BLOBSECACKEY}"
   echo "# Adding Blob Storage Account information, Update and uncomment following lines" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.hdinsight.storage-account-name-1=fs.azure.account.key.${BLOBSTORACCT}.blob.core.windows.net" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.hdinsight.primary-access-key=${BLOBPRIACKEY}" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.hdinsight.storage-account-name-2=fs.azure.account.key.${BLOBSTORACCT}.blob.core.windows.net" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.hdinsight.secondary-access-key=${BLOBSECACKEY}" >> /usr/local/unravel/etc/unravel.properties

else
   echo "One or more of your blob storage account parameters is invalid, please check your parameter file"
fi

sleep 3

if [ "$DLKSTOREACCT" != "NONE" ] && [ "$DLKCLIENTAID" != "NONE" ] && [ "$DLKCLIENTKEY" != "NONE" ] && [ "$DLKCLITOKEPT" != "NONE" ] && [ "$DLKCLIROPATH" != "NONE" ]; then

   echo "Data Lake store name is ${DLKSTOREACCT}"
   echo "Data Lake Client ID is ${DLKCLIENTAID}"
   echo "Data Lake Client Key is ${DLKCLIENTKEY}"
   echo "Data Lake Access Token is ${DLKCLITOKEPT}"
   echo "Data Lake Client Root Path is ${DLKCLIROPATH}"
   echo "# Adding Data Lake Account information, Update and uncomment following lines" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.adl.accountFQDN=${DLKSTOREACCT}.azuredatalakestore.net" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.adl.clientId=${DLKCLIENTAID}" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.adl.clientKey=${DLKCLIENTKEY}" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.adl.accessTokenEndpoint=${DLKCLITOKEPT}" >> /usr/local/unravel/etc/unravel.properties
   echo "com.unraveldata.adl.clientRootPath=${DLKCLIROPATH}" >> /usr/local/unravel/etc/unravel.properties

else
  echo "One or more of your data lake storage parameters is invalid, please check your parameter file"
fi

# Adding unravel properties for Azure Cloud

echo "com.unraveldata.onprem=false" >> /usr/local/unravel/etc/unravel.properties
echo "com.unraveldata.spark.live.pipeline.enabled=true" >> /usr/local/unravel/etc/unravel.properties
echo "com.unraveldata.spark.appLoading.maxAttempts=10" >> /usr/local/unravel/etc/unravel.properties
echo "com.unraveldata.spark.appLoading.delayForRetry=4000" >> /usr/local/unravel/etc/unravel.properties

# Starting Unravel daemons
# Uncommenting the line below starts the Unravel daemons automatically, but
# "unravel_all.sh start" will then exit with status=1 inside the extension script.
# We therefore recommend logging in to the Unravel VM and running unravel_all.sh manually.
# /etc/init.d/unravel_all.sh start
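If you want to exercise the script's argument handling by hand (outside of the ARM deployment), note that all eight arguments are positional and the literal string NONE skips a storage type. The guard logic can be tried in isolation; this is a minimal standalone sketch, and the account name and keys are placeholders, not real credentials:

```shell
# Simulate the eight positional arguments the ARM extension would pass in
set -- "mystorageacct" "PRIMARYKEY" "SECONDARYKEY" "NONE" "NONE" "NONE" "NONE" "NONE"
BLOBSTORACCT=${1}
BLOBPRIACKEY=${2}
BLOBSECACKEY=${3}
# Blob properties are appended only when all three blob arguments are real values
if [ "$BLOBSTORACCT" != "NONE" ] && [ "$BLOBPRIACKEY" != "NONE" ] && [ "$BLOBSECACKEY" != "NONE" ]; then
  msg="configuring blob storage account ${BLOBSTORACCT}"
else
  msg="skipping blob storage configuration"
fi
echo "$msg"
```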

1. Download the ARM template and parameter JSON files onto your configured Azure CLI workstation.

2. Validate the template, then use the Azure CLI to deploy the Unravel VM with the template and parameters JSON file.

az group deployment validate --resource-group RESOURCEGROUPNAME --template-file azuredeploy.json --parameters azuredeploy.parameters.json
az group deployment create --name deploymentname --resource-group RESOURCEGROUPNAME --template-file azuredeploy.json --parameters azuredeploy.parameters.json

3. When the Unravel VM creation completes, open an SSH session to the VM using your defined SSH user, and start the Unravel services.

/etc/init.d/unravel_all.sh start
[INTERNAL] Step 4: ARM template for Spark2 HDinsight cluster with Unravel script actions
Install a Spark 2.1 cluster with Unravel script action

This ARM template allows you to create a Linux-based Spark2 HDInsight cluster and a Spark edge node. The template also runs Unravel's Script Action script to set up the Unravel sensors and configuration on the head, worker, and edge nodes.

This ARM template uses the existing VNET, Subnet, and Storage Account in the same resource group. Update those values in the parameter variables to reflect your Azure environment.

A Spark edge node is a Linux virtual machine with the same client tools installed and configured as on the head nodes. You can use the Spark edge node to access the cluster and to test or host your client applications.

Note

You must deploy an Unravel VM first and update the script action parameters: --unravel-server unravel-ip:3000 --spark-version 2.1.0

Optionally, you can change the VM size of the head, worker, and edge nodes; currently they all use the "Standard_D3_v2" VM size.

Tip

After modifying this template, validate it before applying. HDInsight cluster creation takes about 15-25 minutes.

Install using CLI

1. Download the ARM template and parameter JSON files into your configured Azure CLI workstation.

2. Validate template before deployment.

az group deployment validate --resource-group RESOURCEGROUPNAME --template-file azuredeploy.json --parameters azuredeploy.parameters.json

3. Use the Azure CLI to deploy Spark 2.1 cluster using this template and parameters JSON file.

az group deployment create --name deploymentname --resource-group RESOURCEGROUPNAME --template-file azuredeploy.json --parameters azuredeploy.parameters.json
(Optional) Install manually on an existing Spark2 cluster

1. From the Azure portal, click the target Spark2 cluster resource under your resource group, and then click "Script actions".

hdi-script-actions.png

In the Script Actions dialog box:

SubmitScript.png

Warning

The checkbox Persist this script action to rerun when ... only affects new worker nodes added when scaling up. Checking this box does not automatically apply the Unravel Script Action script to an edge node newly added from an ARM template.

Tip

You can upload the unravel_hdi_bootstrap.sh script to Azure blob storage and use the resulting blob URI as the script path.

2. Script Action validates the script and then processes it. Monitor the Azure portal until the script action has completed.

3. After completion, log in to Ambari and check the Ambari task status.

Install an additional edge node on an existing HDInsight cluster

The ARM template for installing an edge node with the Unravel script action only

Use the ARM template to install the edge node with your custom install script action script and the Unravel script action (two scripts are run in this example).

In this example, the edge node is created first. It then runs emptynode-setup.sh, followed by Unravel's script action script, unravel_hdi_bootstrap.sh.

Note

Using the above ARM template for the edge node requires changing the scriptActionUri path and the application name in the variables, as well as the cluster name in the parameters. Adjust the ARM template for your setup and validate it before use.

[INTERNAL] Step 5: ARM template for Kafka cluster with Unravel script actions

This ARM template allows you to create a Linux-based Kafka cluster with Unravel set up on a predefined Unravel VM. The Unravel Script Action script for Kafka is specifically designed for Kafka clusters and needs to run on the head nodes only.

This template runs Unravel's Script Action script to detect the Kafka cluster's bootstrap servers and JMX broker nodes, and then updates unravel.properties on the Unravel VM with the detected values.

This ARM template uses an existing VNET, Subnet and Storage Account on the same resource group. The worker nodes in this Kafka cluster use two data disks per node.

Note

You must deploy the Unravel VM first and update the Script Actions parameter UNRAVEL__IP:3000.

Tip

Validate the modified ARM template before applying it. HDInsight cluster creation takes about 15-25 minutes.

Apply script actions via CLI

1. Download the ARM template and parameter JSON files into your configured Azure CLI workstation.

2. Modify parameter file.

Change the parameter values to reflect your Azure environment, e.g., VNET, Subnet, StorageAccount, cluster name.

You can change the VM size of the head and worker nodes. By default, they use "Standard_D3_v2" as the VM size.

3. Validate the template before deployment.

az group deployment validate --resource-group RESOURCEGROUPNAME --template-file azuredeploy.json --parameters azuredeploy.parameters.json

4. Use Azure CLI to deploy the Kafka cluster using the template and parameters JSON file.

az group deployment create --name deploymentname --resource-group RESOURCEGROUPNAME --template-file azuredeploy.json --parameters azuredeploy.parameters.json

5. After the Kafka cluster is successfully created, the Unravel Script Action script updates /usr/local/unravel/etc/unravel.properties on the Unravel VM with the Kafka configuration values.

The following is an example of the lines appended to the unravel.properties file; the actual values will reflect your particular environment.

com.unraveldata.ext.kafka.clusters=seuguiko98003
com.unraveldata.ext.kafka.seuguiko98003.bootstrap_servers=wn0-seugui:9092,wn1-seugui:9092
com.unraveldata.ext.kafka.seuguiko98003.jmx_servers=broker1,broker2
com.unraveldata.ext.kafka.seuguiko98003.jmx.broker1.host=wn0-seugui
com.unraveldata.ext.kafka.seuguiko98003.jmx.broker1.port=9999
com.unraveldata.ext.kafka.seuguiko98003.jmx.broker2.host=wn1-seugui
com.unraveldata.ext.kafka.seuguiko98003.jmx.broker2.port=9999
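As a quick sanity check, the detected JMX broker hosts can be pulled back out of lines like these. A minimal sketch follows, with the sample lines above inlined for illustration; on the Unravel VM you would read /usr/local/unravel/etc/unravel.properties instead:

```shell
# Sample properties inlined for illustration; on the VM, read the real file
props='com.unraveldata.ext.kafka.seuguiko98003.jmx.broker1.host=wn0-seugui
com.unraveldata.ext.kafka.seuguiko98003.jmx.broker2.host=wn1-seugui'
# Extract the host half of every jmx.brokerN.host line
brokers=$(echo "$props" | awk -F= '/jmx\.broker[0-9]+\.host/ {print $2}')
echo "$brokers"
```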

6. Once the unravel.properties file has been updated, manually restart unravel_km on the Unravel VM.

/etc/init.d/unravel_km restart
Alternatively, apply the Unravel Kafka script action script manually on an existing Kafka cluster

If you already have an existing Kafka cluster, you can apply Unravel's Kafka script via Azure portal.

1. From Azure portal, click the resource of the target Kafka cluster under your resource group and click Script Actions.

azure-cluster-overview-script-actions

2. In the Script Actions dialog box:

azure-cluster-overview-script-actions-submit

Tip

You can upload the unravel_hdi_kafka_bootstrap.sh script to Azure blob storage and use the resulting blob URI as the script path.

3. Script Action validates the script and then processes it. Monitor the Azure portal until the script action has completed.

4. After completion, log in to the cluster's Ambari and check the Ambari task status.

5. Log in to the Unravel VM and restart the Unravel Kafka monitor daemon, unravel_km.

/etc/init.d/unravel_km restart