Microsoft Azure Databricks (Manual)

This topic explains how to deploy Unravel on Microsoft Azure Databricks. It walks you through the following procedures:

Verify you meet the prerequisites for installation of Azure Databricks

Create Azure components

Install Unravel

Configure and restart Unravel

Configure Databricks with Unravel

For Reference:

Uninstalling Unravel server and sensors on Databricks

Installing Databricks sensors using Setup script

Prerequisites
Platform

Each version of Unravel has specific platform requirements. See Unravel's Azure Databricks compatibility matrix to confirm that your Azure Databricks platform meets the requirements for the version of Unravel that you're installing.

Sizing

Minimum VM type suggested: Medium memory optimized such as Standard_E8s_v3

Permissions
  • You must already have an Azure account.

  • You must already have a resource group assigned to a region in order to group your policies, VMs, and storage blobs/lakes/drives.

    A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity.

  • You must have root privilege in order to perform some commands on the VM.

Network
  • Your virtual network and subnet(s) must be big enough to be shared by the Unravel VM and the target Databricks cluster(s).

  • You can use an existing virtual network or create a new one, but the virtual network must be in the same region and same subscription as the Azure Databricks workspace that you plan to create.

  • A CIDR range between /16 - /24 is required for the virtual network.

  • You must assign a public IP address to the Unravel Azure VM and open port 4043 for non-SSL and 4443 for unsecured SSL.

  • You must allow inbound SSH connections to the Unravel VM.

  • You must allow outbound Internet access and all traffic within the subnet (VNet).
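
The /16 to /24 requirement above can be checked before you create or reuse a virtual network. The following is a minimal sketch, not part of Unravel: the function name vnet_prefix_ok and the sample address ranges are hypothetical, and the check looks only at the prefix length, not at the address itself.

```shell
# Hypothetical helper: succeeds only when the prefix length of a CIDR block
# is between 16 and 24 inclusive. It does not validate the address part.
vnet_prefix_ok() {
    case $1 in
        */*) _prefix=${1##*/} ;;
        *)   return 1 ;;            # no "/" at all
    esac
    case $_prefix in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric prefix
    esac
    [ "$_prefix" -ge 16 ] && [ "$_prefix" -le 24 ]
}

# Example address ranges; substitute the range planned for your virtual network.
for cidr in 10.0.0.0/16 10.0.0.0/24 10.0.0.0/8 10.0.0.0/28; do
    if vnet_prefix_ok "$cidr"; then
        echo "$cidr: within /16 - /24"
    else
        echo "$cidr: outside /16 - /24"
    fi
done
```

In this sketch the first two sample ranges pass and the last two are rejected, because /8 is larger than /16 and /28 is smaller than /24.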

Architecture

To manage, monitor, and optimize the modern data applications running on your Databricks clusters, Unravel Server needs data about both the clusters and the applications that run on them. This data includes metrics, configuration information, and logs. Some of it is pushed to Unravel, and some is pulled by the daemons in Unravel Server.

azure_databricks.png
Create Azure VM
  1. Sign in to the Azure portal.

  2. Select Virtual Machines > Add.

  3. In the Basics tab (default), enter the following information:

    Project Details

    • Subscription: Choose the applicable subscription.

    • Resource group: Create a new group or choose an existing one.

    Instance Details

    • Virtual Machine Name: Enter the Unravel server name.

    • Region: Select the Azure region.

    • Availability Options: Select No infrastructure redundancy required.

    • Image: Select the appropriate image. Both CentOS-based 7.x+ and Red Hat Enterprise Linux 7.x+ are supported.

    • Size: Click Change Size. In the dialog, select a memory-optimized size with at least 128 GB of memory and Premium Disk support (for example, E16s_v3 in East US 2).

    Administrator account

    • Authentication type: Select password or SSH Key.

    • Username and Password: Enter your VM login information.

    Inbound Port Rules

    • Public inbound ports: Select Allow selected ports.

    • Selected Inbound ports: Select both HTTPS and SSH.

  4. Click Next: Disks >.

  5. In the Disks tab enter the following information:

    Disk Options

    • OS disk type: Select Premium SSD.

    Data Disk

    • Click Create and attach a new disk.

      Note: This disk will be formatted, so don't choose Attach an existing disk.

      • Enter a Name.

      • Set Source type to None (empty disk).

      • Set Size to at least 512 GiB.

      • Account type: Select Premium SSD.

  6. Click Next: Networking >.

  7. In the Networking tab, enter the following information:

    • Virtual network: Create new or choose an existing one.

    • Subnet: Create new or choose an existing one.

    • Public IP: Create new or choose an existing one.

    • Select Inbound ports: Select HTTPS and SSH.

  8. Click Review + create.

  9. Click Create. Your deployment is now created.

  10. Select Go to Resource > Networking > Inbound port rules > Add inbound port rule and include the following ports.

    Rule Name      Destination     Destination IP Address    Destination Port Ranges
    Unravel_3000   IP Addresses    NIC Private IP            3000
    Unravel_4043   IP Addresses    NIC Private IP            4043

  11. Click OK.

Create Azure Database for MySQL
  1. Select Create a Resource > Azure Database for MySQL. Click Create.

  2. In the Basics tab (default) enter the following.

    Project Details

    • Subscription: Choose the applicable subscription.

    • Resource group: Create a new group or choose an existing one.

    Server Details

    • Server name: Enter the MySQL server name.

    • Data Source: Select None.

    • Admin Username: Enter the MySQL admin name.

    • Password/Confirm Password: Enter Admin password.

    • Location: Select the Azure region; it should be the same region as the VM. (See step 3 in Create Azure VM, Instance Details.)

    • Version: Select 5.7.

    • Compute + storage: Click Configure Server. Select Memory Optimized, Compute Generation Gen 5, 4 vCores, and General Purpose Storage of 100 GB with Auto-growth enabled. Click OK.

  3. Click Review + Create.

  4. Select Go to Resource > Connection Security > Add existing virtual network, enter the following information, and then select Enable:

    • Subscription: Must be the same subscription as the VM. (See step 3 in Create Azure VM.)

    • Virtual Network: Must be the same virtual network as the VM. (See step 7 in Create Azure VM.)

    • Subnet: Create a new one if a default subnet doesn’t exist.

  5. Select Go to Resource > Connection Security > SSL settings, and change the following:

    • Enforce SSL connection: Select Disabled.

  6. Click Save.

  7. Select Server Parameters and change the following settings:

    Name                        From      To
    sort_buffer_size            524288    16777216 (32000000 and beyond or maximum allowed value)
    query_cache_size            0         67108864 (64000000 and beyond or maximum allowed value)
    max_connect_errors          100       2000000000 (2000000000 and beyond)
    character_set_server        LATIN1    UTF8
    innodb_file_per_table       OFF       ON
    innodb_thread_concurrency   0         20
    innodb_read_io_threads      4         16
    innodb_io_capacity          200       4000
    innodb_io_capacity_max      2000      4000

  8. Click Save.
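
After saving, the server parameters from step 7 can be compared against their targets from the command line. The sketch below is an illustration only: the function name check_mysql_params is hypothetical, it expects tab-separated "name value" lines such as those printed by mysql -N -B -e "SHOW GLOBAL VARIABLES", and it assumes that the numeric targets are minimums while the two non-numeric targets must match exactly (ignoring case).

```shell
# Hypothetical check: reads "name<TAB>value" lines on stdin and reports every
# parameter from step 7 that does not meet its target. Exits non-zero on a miss.
check_mysql_params() {
    awk -F '\t' '
        BEGIN {
            # Assumption: numeric targets are treated as minimum values.
            min["sort_buffer_size"]          = 16777216
            min["query_cache_size"]          = 67108864
            min["max_connect_errors"]        = 2000000000
            min["innodb_thread_concurrency"] = 20
            min["innodb_read_io_threads"]    = 16
            min["innodb_io_capacity"]        = 4000
            min["innodb_io_capacity_max"]    = 4000
            # Non-numeric targets must match exactly, case-insensitively.
            eq["character_set_server"]  = "UTF8"
            eq["innodb_file_per_table"] = "ON"
        }
        ($1 in min) && ($2 + 0 < min[$1]) {
            printf "%s: found %s, expected at least %d\n", $1, $2, min[$1]
            bad = 1
        }
        ($1 in eq) && (toupper($2) != eq[$1]) {
            printf "%s: found %s, expected %s\n", $1, $2, eq[$1]
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Sample input in which innodb_io_capacity is still at its default value.
if printf 'sort_buffer_size\t16777216\ninnodb_io_capacity\t200\n' | check_mysql_params; then
    echo "all checked parameters meet the targets"
else
    echo "adjust the parameters listed above"
fi
```

Parameters that are missing from the input are not reported, so feed the function the full variable listing rather than a filtered subset.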

Create Azure Databricks
  1. Select Create a resource > Azure Databricks > Create. If you already have a workspace, go directly to step 3.

  2. Select Workspace name, Subscription, Resource group, Location, and Pricing tier.

  3. Review the VNet peering options to connect Databricks with the Unravel VM.

Install Unravel on Azure VM
Configure and restart Unravel
Configure Unravel with Azure MySQL
  1. Using MySQL, create a database and user for Unravel. Enter MySQL admin login password when prompted:

    mysql> CREATE DATABASE unravel_mysql_prod;
    mysql> CREATE USER 'Unravel database user'@'MySQL server name' IDENTIFIED BY 'Unravel database password';
    mysql> GRANT ALL PRIVILEGES ON unravel_mysql_prod.* TO 'Unravel database user'@'MySQL server name';
  2. Configure MySQL in /usr/local/unravel/etc/unravel.properties. Check the Azure MySQL resource page for the property values.

    unravel.jdbc.username=Unravel database user
    unravel.jdbc.password=Unravel database password
    unravel.jdbc.url=jdbc:mysql://MySQL Server name:3306/unravel_mysql_prod
    unravel.jdbc.url.params=useSSL=true&requireSSL=false
  3. Install MySQL JDBC connector driver in Unravel classpath.

    wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.47.tar.gz -O /tmp/mysql-connector-java-5.1.47.tar.gz
    
    cd /tmp
    tar xvzf /tmp/mysql-connector-java-5.1.47.tar.gz
    sudo mkdir -p /usr/local/unravel/share/java
    
    sudo cp /tmp/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar /usr/local/unravel/share/java
    
    sudo cp /tmp/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar /usr/local/unravel/dlib/unravel
  4. Create database and tables for Unravel.

    /usr/local/unravel/dbin/db_schema_upgrade.sh
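
The four properties in step 2 differ between installations only in the user, password, host, and database name. As a minimal sketch, they could be generated from those values as shown below. The helper name render_jdbc_properties and the sample user, password, and host are hypothetical; the database name defaults to unravel_mysql_prod as created in step 1.

```shell
# Hypothetical helper: prints the JDBC properties from step 2.
# Arguments: database user, database password, MySQL host, optional database name.
render_jdbc_properties() {
    db_user=$1
    db_password=$2
    db_host=$3
    db_name=${4:-unravel_mysql_prod}
    printf 'unravel.jdbc.username=%s\n' "$db_user"
    printf 'unravel.jdbc.password=%s\n' "$db_password"
    printf 'unravel.jdbc.url=jdbc:mysql://%s:3306/%s\n' "$db_host" "$db_name"
    printf '%s\n' 'unravel.jdbc.url.params=useSSL=true&requireSSL=false'
}

# Made-up example values; take the real ones from the Azure MySQL resource page.
render_jdbc_properties unravel_user 'example-password' mysql.example.internal
```

Review the printed lines before copying them into /usr/local/unravel/etc/unravel.properties, since the sketch does not check for existing entries.
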
Configure Unravel with Workspace
  1. Go to Workspace > Admin Console > Access Control and enable Personal Access Tokens. See Enable token-based authentication.

  2. Go to Workspace > User Settings > Access Tokens and click Generate New Token. Set the token lifetime to indefinite. See Authenticate using Databricks personal access tokens.

  3. Install the Unravel agents on the workspace and update the Unravel configuration with the workspace details. See Running the databricks_setup.sh script.

    Note

    Run the following commands only if the Databricks command-line tool is installed using Python virtual environment:

    sudo bash
    virtualenv -p /usr/bin/python3 mypy3
    source mypy3/bin/activate
    /usr/local/unravel/install_bin/databricks_setup.sh --add-workspace -i <Workspace id> -n <Workspace name> -t <Workspace token> -r https://<Workspace location>.azuredatabricks.net -p <Workspace_tier> -u <Unravel DNS or IP address>:4043
Restart Unravel
  1. Restart all Unravel services

    service unravel_all.sh restart
  2. Using a supported web browser (see Unravel's Azure Databricks compatibility matrix), navigate to http://unravel-host:3000 and log in with username admin and password unraveldata.
Configure Databricks with Unravel

On every Cluster that you want to monitor (Automated/Interactive), update the following sections under Advanced Options:

  • Spark/SparkConfig: Copy the following snippet to Spark > Spark Conf. Replace <Unravel DNS or IP Address> as required. This snippet is also generated by the databricks_setup.sh script on the Unravel server.

    spark.eventLog.enabled true
    spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
    spark.unravel.server.hostport <Unravel DNS or IP Address>:4043
    spark.unravel.shutdown.delay.ms 300
    spark.executor.extraJavaOptions  -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3
    spark.driver.extraJavaOptions  -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3
    

    Note

    For spark-submit jobs, you must use the spark-submit parameters as shown in the following snippet:

    "--conf", "spark.eventLog.enabled=true",
    "--conf", "spark.eventLog.dir=dbfs:/databricks/unravel/eventLogs/",
    "--conf", "spark.unravel.shutdown.delay.ms=300",
    "--conf", "spark.unravel.server.hostport=<Unravel DNS or IP Address>:4043",
    "--conf", "spark.executor.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3",
    "--conf", "spark.driver.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3"
    
  • Logging: Select DBFS as Destination, and copy the following as Cluster Log Path.

    dbfs:/cluster-logs/
  • Init Scripts: Select DBFS as Destination, and copy the following as Init Script Path and then click Add.

    dbfs:/databricks/unravel/unravel-db-sensor-archive/dbin/install-unravel.sh
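
The spark-submit parameters shown in the note above carry the same keys and values as the Spark Conf snippet: each "key value" line becomes a "--conf", "key=value" pair. The sketch below illustrates that mapping under the assumption that every input line has the form key, one space, then the value. The function name conf_to_submit_params is hypothetical and is not part of Unravel.

```shell
# Hypothetical converter: reads Spark Conf lines ("key value") on stdin and
# prints them as comma-separated spark-submit parameters.
conf_to_submit_params() {
    first=1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *' '*) ;;             # has a key and a value
            *)     continue ;;    # skip blank lines and lines without a value
        esac
        key=${line%% *}           # text before the first space
        value=${line#* }          # text after the first space, kept verbatim
        if [ "$first" -eq 1 ]; then first=0; else printf ',\n'; fi
        printf '"--conf", "%s=%s"' "$key" "$value"
    done
    if [ "$first" -eq 0 ]; then printf '\n'; fi
}

# Three of the lines from the Spark Conf snippet, used as sample input.
conf_to_submit_params <<'EOF'
spark.eventLog.enabled true
spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
spark.unravel.shutdown.delay.ms 300
EOF
```

Because only the first space is treated as the separator, values that themselves contain spaces, such as the extraJavaOptions lines, stay intact.
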
Installing Databricks sensors using Setup script
Description

This topic explains how to configure Unravel for Databricks using the /usr/local/unravel/install_bin/databricks_setup.sh script on the Unravel server. This script deploys the Unravel agent binaries and prints the Databricks cluster configuration that is required for Unravel monitoring. Run this script once for each workspace you want to initialize.

Important

Run this script as the same username that you used to install the Unravel server.

Syntax
usage:
databricks_setup.sh --add-workspace -i <workspace-id> -n <workspace-name> -r <workspace-instance> -t <workspace-token> -u <unravel_server:port> [options]
databricks_setup.sh --print-spark-conf -u <unravel_server:port> [options]
databricks_setup.sh --help

Options:

Option                      Description
--add-workspace | -a        Sets up or updates a Databricks workspace for monitoring by Unravel. Accepts the following options:
    -i                      ID of the workspace to be configured.
    -n                      Workspace name.
    -r                      Workspace instance. Must start with https://. For example, https://eastus.azuredatabricks.net
    -t                      Workspace access token.
    -u                      Unravel LR endpoint. For example, 10.0.0.4:4043
    -p                      (Optional) Workspace tier. Accepted values: premium, standard. Default: premium.
    -e                      Enables/disables SSL for the Databricks sensor and agent. Valid values: true, false. Default: false.
    -c                      (Optional) Enables/disables SSL connections to Unravel endpoints without certificates. Only in effect if -e is set to true. Valid values: true, false. Default: false.
    -v                      (Optional) Spark version to be used. Default: 2.3.
    -d                      (Optional) Enables debug logs for Unravel Databricks sensor installation. Default: false.
    -m                      (Optional) Specifies how often, in seconds, to poll cluster metrics. Default: 30.
--print-spark-conf | -p     Prints the minimal Spark configuration required to monitor a cluster using Unravel. Accepts the following options:
    -u                      Unravel server URL. For example, 0.0.0.1:4043
    -e                      Enables/disables SSL for the Databricks sensor and agent. Valid values: true, false. Default: false.
    -c                      (Optional) Enables/disables SSL connections to Unravel endpoints without certificates. Only in effect if -e is set to true. Valid values: true, false. Default: false.
    -v                      (Optional) Spark version to be used. Default: 2.3.
--help | -h                 Prints the usage of this script.

Note

If you generate new tokens, re-run this script to update Unravel Server.
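
The option descriptions above put a few constraints on the argument values: the workspace instance must start with https://, the Unravel endpoint is given as host:port, and the tier is either premium or standard. A pre-flight check along the following lines could catch mistakes before the script is run. This is a sketch only: check_workspace_args is a hypothetical name, and the script itself may validate its arguments differently.

```shell
# Hypothetical pre-flight check for --add-workspace argument values.
# Arguments: value for -r, value for -u, optional value for -p.
check_workspace_args() {
    instance=$1
    endpoint=$2
    tier=${3:-premium}

    case $instance in
        https://?*) ;;
        *) echo "workspace instance must start with https://" >&2; return 1 ;;
    esac

    case $endpoint in
        ?*:?*) port=${endpoint##*:} ;;
        *) echo "Unravel endpoint must look like host:port" >&2; return 1 ;;
    esac
    case $port in
        *[!0-9]*) echo "port must be numeric: $port" >&2; return 1 ;;
    esac

    case $tier in
        premium|standard) ;;
        *) echo "tier must be premium or standard" >&2; return 1 ;;
    esac
}

# Values taken from the example invocation in this topic.
if check_workspace_args https://eastus.azuredatabricks.net 10.1.2.3:4043 premium; then
    echo "arguments look valid"
fi
```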

Examples

To add/edit a workspace:

/usr/local/unravel/install_bin/databricks_setup.sh --add-workspace -i 1234567890 -n DemoWorkspace -t ***** -r https://eastus.azuredatabricks.net -u 10.1.2.3:4043 -p premium
Deleting directory - dbfs:/databricks/unravel/unravel-agent-pack-bin
Deleted directory successfully
Deleting directory - dbfs:/databricks/unravel/unravel-db-sensor-archive
Deleted directory successfully
Creating directory - dbfs:/databricks/unravel/logs
Created directory successfully
Creating directory - dbfs:/databricks/unravel/eventLogs
Created directory successfully
Copying /tmp/unravel_db.properties to dbfs:/databricks/unravel/unravel-db-sensor-archive/etc/unravel_db.properties
Copied file successfully
Copying /tmp/agent-pack to dbfs:/databricks/unravel/unravel-agent-pack-bin
Copied file successfully
Copying /tmp/sensor_pack to dbfs:/databricks/unravel/unravel-db-sensor-archive
Copied file successfully

-----------------------------------
Cluster Spark Configuration
-----------------------------------
spark.executor.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3
spark.eventLog.enabled true
spark.unravel.server.hostport 10.1.2.3:4043
spark.driver.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3
spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
spark.unravel.shutdown.delay.ms 300

-----------------------------------
Spark Submit Parameters
-----------------------------------
"--conf", "spark.executor.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3",
"--conf", "spark.eventLog.enabled=true",
"--conf", "spark.unravel.server.hostport=10.1.2.3:4043",
"--conf", "spark.driver.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3",
"--conf", "spark.eventLog.dir=dbfs:/databricks/unravel/event

To print the configuration:

/usr/local/unravel/install_bin/databricks_setup.sh -p -u 10.1.2.3:4043

-----------------------------------
Cluster Spark Configuration
-----------------------------------
spark.executor.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3
spark.eventLog.enabled true
spark.unravel.server.hostport 10.1.2.3:4043
spark.driver.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3
spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
spark.unravel.shutdown.delay.ms 300

-----------------------------------
Spark Submit Parameters
-----------------------------------
"--conf", "spark.executor.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3",
"--conf", "spark.eventLog.enabled=true",
"--conf", "spark.unravel.server.hostport=10.1.2.3:4043",
"--conf", "spark.driver.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3",
"--conf", "spark.eventLog.dir=dbfs:/databricks/unravel/eventLogs/",
"--conf", "spark.unravel.shutdown.delay.ms=300"

-----------------------------------
Databricks Cluster Init Script
-----------------------------------
dbfs:/databricks/unravel/unravel-db-sensor-archive/dbin/install-unravel.sh
Uninstalling Unravel Server and Sensors on Azure Databricks

For each workspace where Unravel is deployed, delete the Unravel installation location on DBFS using the DBFS CLI:

dbfs rm -r dbfs:/databricks/unravel

For a list of Databricks workspaces configured, see /usr/local/unravel/etc/unravel.properties.
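
When several workspaces are configured, the cleanup command has to be repeated once per workspace. The sketch below only prints the commands (a dry run) so that they can be reviewed first. It assumes that each workspace has its own connection profile in the Databricks CLI and that the CLI in use accepts a --profile option; the function name and the profile names workspace-a and workspace-b are hypothetical.

```shell
# Hypothetical dry run: prints one cleanup command per Databricks CLI profile.
# Assumption: one CLI connection profile exists per configured workspace.
print_uninstall_commands() {
    for profile in "$@"; do
        printf 'dbfs rm -r dbfs:/databricks/unravel --profile %s\n' "$profile"
    done
}

# Made-up profile names; replace them with the profiles for your workspaces.
print_uninstall_commands workspace-a workspace-b
```

Nothing is deleted by this sketch; run the printed commands yourself once you have confirmed the list of workspaces.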