Part 1: Installing Unravel on a Separate Azure VM
Installing Unravel on a separate Azure VM allows you to connect to ephemeral Hadoop clusters on the same virtual network.
This topic explains how to create a separate Azure VM, install the Unravel RPM, and configure it.
Important
If you have not already done so, confirm your cluster meets Unravel's hosting requirements.
1. Provision an Azure VM for Unravel Server
Log into the Azure portal.
Select Virtual machines, and click + Add.
On the Basics tab, enter values for the following fields:
Subscription: Select the subscription type.
Resource Group: Select the resource group to associate with this VM. The VM inherits configurations for lifecycle, permissions, and policies from this group.
Virtual machine name: Enter a name, using only alphanumeric characters, hyphens ("-"), and underscores ("_"). The VM name you specify here also becomes the VM's hostname.
Region: Select a data center region for this VM. Some VM types are unavailable in some regions.
Availability options: Select your redundancy (durability) settings.
Image: Select a compatible underlying operating system for the VM. See Unravel's Azure HDI compatibility matrix.
Size (required): Select a VM type that meets Unravel's requirements.
Select your VM's Authentication type.
Tip
Best practice is to authenticate using an SSH public key, which you can generate using ssh-keygen. Avoid any reserved names like "admin" for the username.
Set Inbound Port Rules:
If you plan to allow external access to Unravel UI, select Allow selected ports and then select HTTPS and SSH.
Click Next: Disks.
On the Disks tab, enter values for the following fields:
OS disk type: For better performance in production, we recommend a Premium SSD because it supports higher IOPS. For a dev/test cluster, we suggest a Standard SSD.
Advanced: We recommend using managed disks, which offer better performance and reliability.
Data disks:
If you don't have a disk ready, click Create and attach new disk. In the dialog box, specify the disk name, size in GiB (must meet Unravel's minimum requirements), and source type of "empty disk".
Otherwise, click Attach an existing disk.
Click Next: Networking.
On the Networking tab, enter values for the following fields:
Warning
For HDInsight, it's essential that the VM, the Azure storage, and the cluster(s) you plan to monitor are all on the same virtual network and subnet(s).
Virtual network (required): Select the appropriate virtual network for your cluster(s).
Subnet (required): Select a subnet with the appropriate address range based on the number of IPs you plan to have in your network.
NIC network security group: Set this to Basic.
For HDInsight, TCP and UDP connectivity is required from the head node of each HDInsight cluster to Unravel Server.
Add an inbound security rule that allows SSH (port 22) and HTTPS (port 443) access to the Unravel node.
The default security policy should allow all access within the VNET. Default rules start with a priority of 65000.
Click Review + create.
Click Create.
It takes about 2 minutes to create your VM.
When Azure completes the creation of your VM, click Go to resource.
Copy the VM's public IP address.
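Open an SSH session to your VM's public IP address to verify that the VM is reachable, replacing ssh-key, user, and ip-address with your private key file, the VM's username, and its public IP address:
ssh -i ssh-key user@ip-address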
Verify that eth0 on the new VM is bound to the private IP address shown in the Azure portal.
ifconfig eth0
# Example output (excerpt):
# eth0      Link encap:Ethernet  HWaddr 00:0d:3a:1b:c2:48
#           inet addr:10.10.1.96
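To compare eth0's address with the private IP shown in the portal without reading the output by eye, you could use a small POSIX shell filter like the one below. It is a sketch: the function name extract_ipv4 is hypothetical, and it assumes the interface output uses either the older "inet addr:10.10.1.96" form or the newer "inet 10.10.1.96/24" form.

```shell
# Hypothetical helper: print the first IPv4 address that follows an "inet"
# token in ifconfig or "ip -4 addr show" output read from stdin.
extract_ipv4() (
  # First expression: older net-tools format ("inet addr:10.10.1.96").
  # Second expression: newer format ("inet 10.10.1.96/24"). IPv6 lines do
  # not match because they contain "inet6", not the "inet " token.
  sed -n \
    -e 's/.*inet addr:\([0-9][0-9.]*\).*/\1/p' \
    -e 's/.*inet \([0-9][0-9.]*\).*/\1/p' |
    head -n 1
)
```

For example, ifconfig eth0 | extract_ipv4 prints only the address, which you can then compare with the private IP in the Azure portal.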
2. Configure the VM
Install ntpd, start it at boot time, and confirm that the time on the VM is accurate. This keeps the VM's clock synchronized.
sudo yum install -y ntp
sudo systemctl enable ntpd
sudo systemctl start ntpd
# Confirm that ntpd is reaching its time servers
ntpq -p
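ntpq -p marks the peer that ntpd has selected for synchronization with an asterisk in the first column, so checking for that marker is one quick way to script the confirmation step. The sketch below assumes that convention; the function name ntp_synced is hypothetical, and a freshly started ntpd can take several minutes before it selects a peer.

```shell
# Hypothetical helper: succeed if "ntpq -p" output on stdin shows a selected
# system peer, which ntpq flags with '*' in the first column.
ntp_synced() (
  grep -q '^\*'
)
```

Usage: ntpq -p | ntp_synced && echo "clock is synchronized"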
Switch Security-Enhanced Linux (SELinux) to permissive mode permanently, so that it no longer enforces its rules. This is important because HDFS maintains replication across different nodes and racks, so enforcing SELinux rules on this traffic leads to performance degradation.
sudo setenforce Permissive
To make sure the setting persists after a reboot, set SELINUX=permissive in /etc/selinux/config:
SELINUX=permissive
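If you script this step, an idempotent edit avoids duplicate lines on reruns. The following POSIX shell sketch assumes GNU sed (for sed -i, as shipped with CentOS and RHEL); the function name set_selinux_permissive is hypothetical, and the optional file argument exists only so you can try it on a copy first.

```shell
# Hypothetical helper: make SELINUX=permissive the persistent setting in an
# SELinux config file (default /etc/selinux/config). An existing SELINUX=
# line is rewritten in place (SELINUXTYPE= is left alone); if none exists,
# one is appended. Uses GNU sed -i.
set_selinux_permissive() (
  cfg="${1:-/etc/selinux/config}"
  if grep -q '^SELINUX=' "$cfg"; then
    sed -i 's/^SELINUX=.*/SELINUX=permissive/' "$cfg"
  else
    echo 'SELINUX=permissive' >> "$cfg"
  fi
)
```

As root, run set_selinux_permissive, then run getenforce, which should report Permissive once setenforce Permissive has been applied.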
Install libaio.x86_64. Libaio offers a significant performance benefit over the standard POSIX asynchronous I/O facility because its operations are performed in the Linux kernel instead of in a separate user process.
sudo yum -y install libaio.x86_64
Install lzop.x86_64. Hadoop requires LZO compression libraries.
sudo yum -y install lzop.x86_64
Disable the firewall and check your iptables rules.
sudo systemctl disable firewalld
sudo systemctl stop firewalld
# Flush all rules in the filter table, then list what remains
sudo iptables -F
sudo iptables -L
Prepare the second disk (for example, /dev/sdc) of at least 500 GB that you attached in the Azure portal. Run fdisk -l to find the 500 GB disk that has no partition. This step requires root privileges.
sudo su -
# List all disks and partitions.
# You should see one called "sdc" if you attached a 500-1000 GB disk.
fdisk -l
# Create a single primary partition (/dev/sdc1) interactively:
fdisk /dev/sdc
# p (list current partitions)
# n (new partition)
# p (primary)
# Keep accepting the rest of the default settings.
# w (write the partition table and exit)
# Format the new partition (not the whole disk) so that /dev/sdc1 can be mounted
/usr/sbin/mkfs -t ext4 /dev/sdc1
mkdir -p /srv
# Look up the partition's UUID in the form UUID=...
DISKUUID="UUID=$(/usr/sbin/blkid -s UUID -o value /dev/sdc1)"
echo $DISKUUID
# Mount the disk on /srv and make the mount persistent
echo "${DISKUUID} /srv ext4 defaults 0 0" >> /etc/fstab
mount /dev/sdc1 /srv
# Verify the disk space
df -hT /srv
# Example output:
# Filesystem     Type  Size  Used Avail Use% Mounted on
# /dev/sdc1      ext4  197G   61M  187G   1% /srv
# Set permissions for Unravel and symlink Unravel's directory to the /srv mount
mkdir -p /srv/local/unravel
chmod -R 755 /srv/local
ln -s /srv/local/unravel /usr/local/unravel
chmod 755 /usr/local/unravel
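Appending to /etc/fstab with >> adds a duplicate line every time the disk-preparation commands are rerun. A guard such as the following POSIX shell sketch appends the entry only when no uncommented line already uses the mount point. The function name add_fstab_entry is hypothetical; the ext4 and "defaults 0 0" fields mirror the fstab line used in this step.

```shell
# Hypothetical helper: append "SPEC MOUNTPOINT ext4 defaults 0 0" to an fstab
# file (default /etc/fstab) unless an uncommented line already mounts
# something on MOUNTPOINT.
add_fstab_entry() (
  spec="$1"
  mnt="$2"
  fstab="${3:-/etc/fstab}"
  # awk exits 0 when it finds a non-comment line whose second field is $mnt
  if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
    echo "fstab already has an entry for $mnt; leaving it unchanged" >&2
  else
    echo "$spec $mnt ext4 defaults 0 0" >> "$fstab"
  fi
)
```

For example, as root: add_fstab_entry "$DISKUUID" /srv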
If you have HDInsight clusters, create the hdfs user and the hadoop group.
Important
If you have Databricks workspaces, skip this step.
sudo useradd hdfs
sudo groupadd hadoop
sudo usermod -a -G hadoop hdfs
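To confirm that the usermod command took effect, you can check the hdfs user's group list. This POSIX shell sketch wraps id -nG; the function name in_group is hypothetical.

```shell
# Hypothetical helper: succeed if USER ($1) belongs to GROUP ($2), counting
# both the primary group and supplementary groups reported by id -nG.
in_group() (
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -Fqx "$2"
)
```

Usage: in_group hdfs hadoop && echo "hdfs is in the hadoop group"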
3. Install Unravel Server on the VM
Download the Unravel Server RPM.
Install the Unravel Server RPM, replacing version with the version in the file name you downloaded:
sudo rpm -ivh unravel-version.rpm
This installation creates the following directories, databases, and users:
Directories: The installation creates /usr/local/unravel/, which contains the executables, scripts, and settings (/usr/local/unravel/etc/unravel.properties). /etc/init.d/unravel_* contains scripts for controlling the Unravel services. /etc/init.d/unravel_all.sh can be used to manually stop, start, restart, and get the status of all daemons in the proper order. Subsequent RPM upgrades don't change /usr/local/unravel/etc/unravel.properties, because your site-specific properties are stored in this file.
Users: The user unravel is created if it does not already exist.
Config: The master configuration file is /usr/local/unravel/etc/unravel.properties.
Logs: All logs are in /usr/local/unravel/logs/.
Grant access to Unravel Server:
By default, a public IP address should be assigned to the Unravel VM.
Create a security policy that allows SSH (port 22) and HTTPS (port 443) access to the Unravel VM.
We recommend that you use an SSH key to access the Unravel node.
4. Set up your database using one of the following methods
a. Set up an Azure MySQL instance for Unravel.
or
5. Configure Unravel Server with basic options
Open an SSH session to the Unravel VM:
ssh -i ssh-private-key ssh-user@unravel-host
Set correct permissions on the Unravel configuration directory.
cd /usr/local/unravel/etc
sudo chown unravel:unravel *.properties
sudo chmod 644 *.properties
Update unravel.ext.sh based on your cluster's HDInsight version.
hdp-select status | grep hadoop
# Example output:
# hadoop-client - 2.6.5.3005-27
# Append the classpath that matches the version you found
echo "export CDH_CPATH=/usr/local/unravel/dlib/hdp2.6.x/*" | sudo tee -a /usr/local/unravel/etc/unravel.ext.sh
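If you want to derive the dlib directory from the hdp-select output instead of typing it, the POSIX shell sketch below shows one way. Both function names are hypothetical, and the hdpMAJOR.MINOR.x naming rule is an assumption extrapolated from the single hdp2.6.x example in this step, so check that the resulting directory actually exists under /usr/local/unravel/dlib before using it.

```shell
# Hypothetical helpers for choosing the dlib directory from hdp-select output.

# Print the version reported for hadoop-client, for example "2.6.5.3005-27",
# from "hdp-select status" output read on stdin.
hadoop_client_version() (
  awk '$1 == "hadoop-client" { print $3; exit }'
)

# Map a version such as 2.6.5.3005-27 to a directory name of the form
# hdpMAJOR.MINOR.x (for example hdp2.6.x). This naming pattern is an
# assumption, not a documented Unravel rule.
hdp_dlib_dir() (
  printf '%s\n' "$1" | awk -F. '{ printf "hdp%s.%s.x\n", $1, $2 }'
)
```

For example, pass the output of hdp-select status | hadoop_client_version to hdp_dlib_dir, confirm that /usr/local/unravel/dlib contains a directory with that name, and only then append the matching CDH_CPATH export.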
Run the switch_to_user.sh script:
/usr/local/unravel/install_bin/switch_to_user.sh hdfs hadoop
Set up permissions for ABFS (ADLS Gen 2) using Managed Identity. Refer to Setting up user assigned managed identity.
In unravel.properties, add or modify the following properties:
Set com.unraveldata.onprem to false.
com.unraveldata.onprem (unit: boolean; default: true)
Specifies whether the deployment is on premises or in the cloud.
Important
For Azure Databricks, EMR, and HDInsight, set this property to false.
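Rather than editing unravel.properties by hand for every property in this step, you could use a small helper that replaces an existing key=value line or appends a new one. This POSIX shell and awk sketch is hypothetical (set_property is not part of Unravel), and it assumes properties are written as key=value with no spaces around the equals sign, as in this guide. Writing the result back with cat instead of mv keeps the file's existing owner and mode, which matters after the chown and chmod commands on the configuration directory.

```shell
# Hypothetical helper: set KEY=VALUE in a .properties file (default:
# /usr/local/unravel/etc/unravel.properties). The first existing line that
# starts with "KEY=" is replaced, later duplicates are dropped, and the pair
# is appended if the key is absent.
set_property() (
  key="$1"
  value="$2"
  file="${3:-/usr/local/unravel/etc/unravel.properties}"
  tmp="$(mktemp)" || exit 1
  # Pass key and value through the environment so awk does not interpret
  # backslash escapes in them.
  K="$key" V="$value" awk '
    BEGIN { k = ENVIRON["K"]; v = ENVIRON["V"]; done = 0 }
    index($0, k "=") == 1 { if (!done) print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

For example, in a root shell: set_property com.unraveldata.onprem false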
Set general properties:
com.unraveldata.customer.organization (set by user: Optional; unit: string; default: not set)
Customer name. Used to identify your installation for reporting and notification purposes in Unravel UI.
com.unraveldata.advertised.url (unit: string; default: http://{host}:3000)
Defines the Unravel Server URL for HTTP traffic. Example: http://unravelserver.company.com:3000
com.unraveldata.hdfs.timezone (unit: string; default: none)
Time zone of HDFS, for example, US/Eastern, Etc/GMT-4, or America/New_York. If the time zone is not set, an error message is logged and UTC is used. You can list the valid time zone IDs by calling Java's TimeZone.getAvailableIDs().
com.unraveldata.tmpdir (unit: string (path); default: /srv/unravel/tmp)
The base location for Unravel process control files and Unravel's temporary files.
com.unraveldata.history.maxSize.weeks (unit: integer; default: 5)
Number of weeks of search results retained in Elasticsearch.
com.unraveldata.retention.max.days (unit: integer; default: 30)
Number of days to keep the heaviest data (such as error logs and drill-down details) in the SQL database.
Point Unravel to your Azure storage account(s) and their storage formats:
com.unraveldata.hdinsight.storage-account.X (set by user: Optional; unit: string)
Storage account name that an HDInsight cluster uses. You must define this property for each storage account. X starts with 1 and is incremented by 1 for each additional account; the account numbers must be consecutive.
Value: the Azure storage account name (see finding the storage name).
com.unraveldata.hdinsight.access-key.X (set by user: Optional; unit: string)
Storage account key. For each storage-account.X, you must define access-key.X. If you have two access keys, pick one to use here.
Value: the Azure storage account key (see finding the access key).
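Because the .X suffixes must start at 1 and stay consecutive, generating the pairs from a list can avoid numbering mistakes. The POSIX shell sketch below is hypothetical: storage_account_props takes account:key arguments (a colon is used as the separator because Azure storage account names and base64 access keys do not contain one) and prints lines you could review and then add to unravel.properties.

```shell
# Hypothetical helper: print consecutively numbered storage-account.N and
# access-key.N properties, starting at 1, for each "account:key" argument.
storage_account_props() (
  n=1
  for pair in "$@"; do
    account="${pair%%:*}"   # text before the first colon
    key="${pair#*:}"        # text after the first colon
    printf 'com.unraveldata.hdinsight.storage-account.%d=%s\n' "$n" "$account"
    printf 'com.unraveldata.hdinsight.access-key.%d=%s\n' "$n" "$key"
    n=$((n + 1))
  done
)
```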
com.unraveldata.azure.storage.wasb.account-name.X (set by user: Optional; unit: string)
Name of the WASB storage account that the HDInsight cluster uses. You must define this property for each WASB storage account. X is 1 for the first storage account and is incremented by 1 for each additional account; the account numbers must be consecutive.
Value: the Azure storage account name (see finding the storage name).
com.unraveldata.azure.storage.wasb.access-key.X (set by user: Optional; unit: string)
WASB storage account key. For each storage account you define, you must define the storage access key. If you have two keys, pick one to use here.
Value: the Azure storage account access key (see finding the access key).
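After editing, a quick check can catch a gap in the numbering, such as account-name.1 followed by account-name.3. The following POSIX shell and awk sketch is hypothetical; check_consecutive assumes key=value lines and reports the first missing index for a given property prefix.

```shell
# Hypothetical helper: check that the numeric suffixes for PREFIX in FILE run
# 1, 2, 3, ... without gaps. Prints the first missing index and fails on a gap.
check_consecutive() (
  prefix="$1"
  file="$2"
  P="$prefix" awk '
    BEGIN { p = ENVIRON["P"]; max = 0 }
    index($0, p) == 1 {
      n = substr($0, length(p) + 1)
      sub(/=.*/, "", n)                 # keep only the suffix before "="
      if (n ~ /^[0-9]+$/) {
        seen[n + 0] = 1
        if (n + 0 > max) max = n + 0
      }
    }
    END {
      for (i = 1; i <= max; i++)
        if (!(i in seen)) { print "missing index " i; exit 1 }
    }
  ' "$file"
)
```

For example: check_consecutive com.unraveldata.azure.storage.wasb.account-name. /usr/local/unravel/etc/unravel.properties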
Note
In Unravel 4.5.0.5, you can only specify a single ADLS Gen 1 account.
com.unraveldata.adl.accountFQDN (set by user: Optional; unit: string)
The data lake's fully qualified domain name, for example, mydatalake.azuredatalakestore.net.
Value: the Azure storage account name (see finding the storage name).
com.unraveldata.adl.clientId (set by user: Optional; unit: string)
An application ID. You must create an application registration in Azure Active Directory.
Value: the Azure application ID (see finding the application ID).
com.unraveldata.adl.clientKey (set by user: Optional; unit: string)
An application access key, which you can create after registering an application.
Value: the Azure storage access key (see finding the storage access key).
com.unraveldata.adl.accessTokenEndpoint (set by user: Optional; unit: string)
The OAuth 2.0 access token endpoint, obtained from the application registration tab in the Azure portal.
Value: the Azure OAuth 2.0 token endpoint (see finding the OAuth endpoint).
com.unraveldata.adl.clientRootPath (set by user: Optional; unit: string (URL))
The path in the Data Lake Store to which the target cluster has been given access.
Value: the Azure CONTAINER/DIRECTORY path for the storage account (see finding the container path).
com.unraveldata.azure.storage.adl.account-name.X (set by user: Optional; unit: string)
The Azure Data Lake Gen1 storage account. The name does not need to be fully qualified; for instance, you can use mydatalake or mydatalake.azuredatalakestore.net. You must define this property for each storage account. X starts with 1 and is incremented by 1 for each additional account; the account numbers must be consecutive.
Value: the Azure storage account name (see finding the storage name).
com.unraveldata.azure.storage.adl.client-id.X (set by user: Optional; unit: string)
An application ID. You must create an application registration in Azure Active Directory.
Value: the Azure application ID (see finding the application ID).
com.unraveldata.azure.storage.adl.client-key.X (set by user: Optional; unit: string)
The secret (key) of the application whose ID you set in client-id.X.
Value: the Azure storage secret (see finding the secret (access key)).
com.unraveldata.azure.storage.adl.access-token-endpoint.X (set by user: Optional; unit: string)
The OAuth 2.0 access token endpoint, obtained from the application registration tab in the Azure portal.
Value: the Azure OAuth 2.0 token endpoint (see finding the OAuth endpoint).
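For the access-token-endpoint.X value, the Azure AD v1 token endpoint follows the pattern https://login.microsoftonline.com/TENANT_ID/oauth2/token, where TENANT_ID is your directory (tenant) ID. The helper below only formats that string; its name, adl_token_endpoint, is hypothetical, and you should still compare the result with the OAuth 2.0 token endpoint listed for your app registration in the Azure portal.

```shell
# Hypothetical helper: format the Azure AD v1 OAuth 2.0 token endpoint for a
# directory (tenant) ID. It does not contact Azure or validate the ID.
adl_token_endpoint() (
  if [ -z "$1" ]; then
    echo "usage: adl_token_endpoint TENANT_ID" >&2
    exit 1
  fi
  printf 'https://login.microsoftonline.com/%s/oauth2/token\n' "$1"
)
```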
You can either use the combination of account name and access key properties or you can use the combination of account name, tenant ID, and client ID properties to point Unravel to your Azure storage accounts.
com.unraveldata.azure.storage.abfs.account-name.X (set by user: Optional; unit: string)
Name of the ABFS storage account that the HDInsight cluster uses. You must define this property for each ABFS storage account. X is 1 for the first storage account and is incremented by 1 for each additional account; the account numbers must be consecutive.
Value: the Azure storage account name (see finding the storage name).
com.unraveldata.azure.storage.abfs.access-key.X (set by user: Optional; unit: string)
The access key for the corresponding ABFS storage account.
Value: the Azure storage account access key (see finding the secret (access key)).
com.unraveldata.azure.storage.abfs.account-name.X (set by user: Optional; unit: string)
Name of the ABFS storage account that the HDInsight cluster uses. You must define this property for each ABFS storage account. X is 1 for the first storage account and is incremented by 1 for each additional account; the account numbers must be consecutive.
Value: the Azure storage account name (see finding the storage name).
com.unraveldata.azure.storage.abfs.tenant.X (set by user: Required; unit: string)
Tenant ID of the managed identity. This corresponds to fs.azure.account.oauth2.msi.tenant for account X. Use the directory ID as the tenant ID (see finding the Directory ID and finding the Tenant ID).
com.unraveldata.azure.storage.abfs.client-id.X (set by user: Required; unit: string)
Client ID of the managed identity. This corresponds to fs.azure.account.oauth2.client.id for account X (see finding the Client ID).
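The managed-identity variant needs three numbered properties per account, and both IDs are GUIDs. This POSIX shell sketch (abfs_msi_props is a hypothetical name) prints the three lines for account number N and refuses values that do not have the 8-4-4-4-12 hexadecimal GUID shape, which catches common copy-paste mistakes but does not prove that the IDs are correct.

```shell
# Hypothetical helper: print the ABFS managed-identity properties for account
# number N after checking that the tenant and client IDs look like GUIDs.
# Arguments: N ACCOUNT_NAME TENANT_ID CLIENT_ID
abfs_msi_props() (
  n="$1"; account="$2"; tenant="$3"; client="$4"
  guid='^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
  for id in "$tenant" "$client"; do
    printf '%s\n' "$id" | grep -Eq "$guid" || { echo "not a GUID: $id" >&2; exit 1; }
  done
  p=com.unraveldata.azure.storage.abfs
  printf '%s.account-name.%s=%s\n' "$p" "$n" "$account"
  printf '%s.tenant.%s=%s\n' "$p" "$n" "$tenant"
  printf '%s.client-id.%s=%s\n' "$p" "$n" "$client"
)
```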
7. Start Unravel services
sudo /etc/init.d/unravel_all.sh restart
8. Log into Unravel UI
Run the following echo command to find the URL for Unravel UI. If you're using an SSH tunnel or HTTP proxy, you might need to adjust the host or IP address in the URL:
echo "http://$(hostname -f):3000/"
Create an SSH tunnel to the Azure VM that forwards Unravel's TCP port 3000:
ssh -i ssh-private-key ssh-user@unravel-host -L 3000:127.0.0.1:3000
Using a supported web browser (see Unravel's Azure HDI compatibility matrix), navigate to http://127.0.0.1:3000 and log in as user admin with password unraveldata.