Microsoft Azure Databricks

This topic explains how to deploy Unravel on Microsoft Azure Databricks. It walks you through the following procedures:

Verify that you meet the prerequisites for installing Unravel on Azure Databricks

Create Azure components

Install Unravel

Configure and restart Unravel

Complete the installation

Uninstall the Unravel server and sensors on Azure Databricks

  1. Select Create a Resource > Azure Database for MySQL. Click Create.

  2. In the Basics tab (default) enter the following.

    Project Details

    • Subscription: Choose the applicable subscription.

    • Resource group: Create a new group or choose an existing one.

    Server Details

    • Server name: Enter the MySQL server name.

    • Data Source: Select None.

    • Admin Username: Enter the MySQL admin name.

    • Password/Confirm Password: Enter the admin password.

    • Location: Select the Azure region. It should be the same region as the VM. (See Step 3 Create Azure VM, Instance Details.)

    • Version: Select 5.7.

    • Compute + storage: Click Configure Server. Select Memory Optimized, Compute Generation - Gen 5, 4 vCores, General Purpose Storage of 100GB with Auto-growth enabled. Click OK.

  3. Click Review + Create.

  4. Select Go to Resource > Connection Security > Add existing virtual network, enter the following information, and then select Enable:

    • Subscription: Must be the same subscription as the VM. (See step 1 in Create VM.)

    • Virtual Network: Must be the same virtual network as the VM. (See step 7 in Create VM.)

    • Subnet: Create a new one if a default subnet doesn’t exist.

  5. Select Go to Resource > Connection Security > SSL settings, and change the following:

    • Enforce SSL connection: Select Disabled.

  6. Click Save.

  7. Select Server Parameters and change the following settings:

    Name                        From      To
    sort_buffer_size            524288    16777216 (32000000 and beyond or maximum allowed value)
    query_cache_size            0         67108864 (64000000 and beyond or maximum allowed value)
    max_connect_errors          100       2000000000 (2000000000 and beyond)
    character_set_server        LATIN1    UTF8
    innodb_file_per_table       OFF       ON
    innodb_thread_concurrency   0         20
    innodb_read_io_threads      4         16
    innodb_io_capacity          200       4000
    innodb_io_capacity_max      2000      4000

  8. Click Save.
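
After saving, it can help to confirm that every server parameter really holds its new value. The following is a minimal sketch, not part of the Unravel tooling: the function name find_mismatches and the idea of passing in a dict of current values (for example, copied from the portal or from SHOW VARIABLES output) are assumptions made for illustration. It checks for the exact target value, so a deliberately larger value ("and beyond") is also reported and needs a manual look.

```python
# Target values taken from the "To" column of the Server Parameters table.
TARGET_PARAMETERS = {
    "sort_buffer_size": "16777216",
    "query_cache_size": "67108864",
    "max_connect_errors": "2000000000",
    "character_set_server": "UTF8",
    "innodb_file_per_table": "ON",
    "innodb_thread_concurrency": "20",
    "innodb_read_io_threads": "16",
    "innodb_io_capacity": "4000",
    "innodb_io_capacity_max": "4000",
}


def find_mismatches(current):
    """Return {name: (current_value, target_value)} for parameters that differ.

    `current` is a hypothetical dict of parameter name -> value that you
    collect yourself; this sketch does not connect to MySQL.
    """
    mismatches = {}
    for name, target in TARGET_PARAMETERS.items():
        value = current.get(name)
        # Compare case-insensitively so "utf8" matches "UTF8" and "on" matches "ON".
        if value is None or str(value).strip().upper() != target.upper():
            mismatches[name] = (value, target)
    return mismatches


if __name__ == "__main__":
    # Example: everything is set except character_set_server.
    current = dict(TARGET_PARAMETERS, character_set_server="LATIN1")
    for name, (value, target) in find_mismatches(current).items():
        print(f"{name}: is {value}, expected {target}")
```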

Description

This topic explains how to configure Unravel for Databricks using the /usr/local/unravel/install_bin/databricks_setup.sh script on the Unravel server. This script deploys the Unravel agent binaries and prints the Databricks cluster configuration that is required for Unravel monitoring. Run this script once for each workspace that you want to initialize.

Important

Run this script as the same username that you used to install the Unravel server.

Syntax
usage:
databricks_setup.sh --add-workspace -i <workspace-id> -n <workspace-name> -r <workspace-instance> -t <workspace-token> -u <unravel_server:port> [options]
databricks_setup.sh --print-spark-conf -u <unravel_server:port> [options]
databricks_setup.sh --help

Options:

Option                     Description

--add-workspace | -a       Sets up or updates a Databricks workspace for monitoring by Unravel. Takes the following options:

  -i                       ID of the workspace to be configured.
  -n                       Workspace name.
  -r                       Workspace instance. Must start with https://. For example, https://eastus.azuredatabricks.net
  -t                       Workspace access token.
  -u                       Unravel LR endpoint. For example, 10.0.0.4:4043
  -p                       (Optional) Workspace tier. Valid values: premium, standard. Default: premium.
  -e                       Enables/disables SSL for the Databricks sensor and agent. Valid values: true, false. Default: false.
  -c                       (Optional) Enables/disables SSL connections to Unravel endpoints without certificates. This option is only in effect if -e is set to true. Valid values: true, false. Default: false.
  -v                       (Optional) Spark version to be used. Default: 2.3.
  -d                       (Optional) Enables debug logs for Unravel Databricks sensor installation. Default: false.
  -m                       (Optional) Specifies how often, in seconds, to poll cluster metrics. Default: 30.

--print-spark-conf | -p    Prints the minimal Spark configuration required to monitor a cluster using Unravel. Takes the following options:

  -u                       Unravel server URL. For example, 0.0.0.1:4043
  -e                       Enables/disables SSL for the Databricks sensor and agent. Valid values: true, false. Default: false.
  -c                       (Optional) Enables/disables SSL connections to Unravel endpoints without certificates. This option is only in effect if -e is set to true. Valid values: true, false. Default: false.
  -v                       (Optional) Spark version to be used. Default: 2.3.

--help | -h                Prints the usage of this script.
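
When several workspaces need to be registered, assembling the --add-workspace command line programmatically can catch mistakes before the script runs. The following is only a sketch under stated assumptions: the helper name build_add_workspace_command is hypothetical, and the two checks simply restate the rules from the options table (the instance must start with https://, and the tier must be premium or standard). It builds the argument list but does not execute anything.

```python
import shlex

SCRIPT = "/usr/local/unravel/install_bin/databricks_setup.sh"
VALID_TIERS = ("premium", "standard")


def build_add_workspace_command(workspace_id, name, instance, token,
                                unravel_endpoint, tier="premium"):
    """Build the argument list for `databricks_setup.sh --add-workspace`."""
    if not instance.startswith("https://"):
        raise ValueError("workspace instance must start with https://")
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}")
    return [
        SCRIPT, "--add-workspace",
        "-i", str(workspace_id),
        "-n", name,
        "-r", instance,
        "-t", token,
        "-u", unravel_endpoint,
        "-p", tier,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own workspace details and token.
    cmd = build_add_workspace_command(
        "1234567890", "DemoWorkspace",
        "https://eastus.azuredatabricks.net", "<workspace-token>",
        "10.1.2.3:4043",
    )
    # Print a shell-quoted form for review instead of running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run the command, the list could be passed to subprocess.run under the same user that installed the Unravel server.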

Note

If you generate new tokens, re-run this script to update Unravel Server.

Examples

To add/edit a workspace:

/usr/local/unravel/install_bin/databricks_setup.sh --add-workspace -i 1234567890 -n DemoWorkspace -t ***** -r https://eastus.azuredatabricks.net -u 10.1.2.3:4043 -p premium
Deleting directory - dbfs:/databricks/unravel/unravel-agent-pack-bin
Deleted directory successfully
Deleting directory - dbfs:/databricks/unravel/unravel-db-sensor-archive
Deleted directory successfully
Creating directory - dbfs:/databricks/unravel/logs
Created directory successfully
Creating directory - dbfs:/databricks/unravel/eventLogs
Created directory successfully
Copying /tmp/unravel_db.properties to dbfs:/databricks/unravel/unravel-db-sensor-archive/etc/unravel_db.properties
Copied file successfully
Copying /tmp/agent-pack to dbfs:/databricks/unravel/unravel-agent-pack-bin
Copied file successfully
Copying /tmp/sensor_pack to dbfs:/databricks/unravel/unravel-db-sensor-archive
Copied file successfully

-----------------------------------
Cluster Spark Configuration
-----------------------------------
spark.executor.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3
spark.eventLog.enabled true
spark.unravel.server.hostport 10.1.2.3:4043
spark.driver.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3
spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
spark.unravel.shutdown.delay.ms 300

-----------------------------------
Spark Submit Parameters
-----------------------------------
"--conf", "spark.executor.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3",
"--conf", "spark.eventLog.enabled=true",
"--conf", "spark.unravel.server.hostport=10.1.2.3:4043",
"--conf", "spark.driver.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3",
"--conf", "spark.eventLog.dir=dbfs:/databricks/unravel/event

To print the configuration:

/usr/local/unravel/install_bin/databricks_setup.sh -p -u 10.1.2.3:4043

-----------------------------------
Cluster Spark Configuration
-----------------------------------
spark.executor.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3
spark.eventLog.enabled true
spark.unravel.server.hostport 10.1.2.3:4043
spark.driver.extraJavaOptions -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3
spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
spark.unravel.shutdown.delay.ms 300

-----------------------------------
Spark Submit Parameters
-----------------------------------
"--conf", "spark.executor.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=executor,libs=spark-2.3",
"--conf", "spark.eventLog.enabled=true",
"--conf", "spark.unravel.server.hostport=10.1.2.3:4043",
"--conf", "spark.driver.extraJavaOptions= -Dcom.unraveldata.client.rest.request.timeout.ms=1000 -Dcom.unraveldata.client.rest.conn.timeout.ms=1000 -javaagent:/dbfs/databricks/unravel/unravel-agent-pack-bin/btrace-agent.jar=config=driver,script=StreamingProbe.btclass,libs=spark-2.3",
"--conf", "spark.eventLog.dir=dbfs:/databricks/unravel/eventLogs/",
"--conf", "spark.unravel.shutdown.delay.ms=300"

-----------------------------------
Databricks Cluster Init Script
-----------------------------------
dbfs:/databricks/unravel/unravel-db-sensor-archive/dbin/install-unravel.sh
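
The "Cluster Spark Configuration" lines printed above are key/value pairs separated by the first space. If you want to apply them through automation rather than pasting them by hand, a small parser is enough. The sketch below is illustrative only: both function names are hypothetical, and it assumes every configuration line starts with "spark." as in the sample output. The resulting dict has the shape expected by the spark_conf field of a Databricks cluster specification.

```python
def parse_cluster_spark_conf(text):
    """Parse `key value` lines from the script output into a dict.

    Only lines starting with "spark." are kept, so banner lines and the
    quoted "--conf" lines of the Spark Submit Parameters section are skipped.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("spark."):
            continue
        # The key ends at the first space; the value may itself contain spaces.
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


def to_spark_submit_parameters(conf):
    """Turn a conf dict into a flat ["--conf", "key=value", ...] list."""
    params = []
    for key, value in conf.items():
        params.extend(["--conf", f"{key}={value}"])
    return params


if __name__ == "__main__":
    sample = """\
-----------------------------------
Cluster Spark Configuration
-----------------------------------
spark.eventLog.enabled true
spark.unravel.server.hostport 10.1.2.3:4043
spark.eventLog.dir dbfs:/databricks/unravel/eventLogs/
spark.unravel.shutdown.delay.ms 300
"""
    conf = parse_cluster_spark_conf(sample)
    print(conf)
    print(to_spark_submit_parameters(conf))
```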

To uninstall, use the DBFS CLI to delete the Unravel installation location on DBFS for each workspace where Unravel is deployed:

dbfs rm -r dbfs:/databricks/unravel

For a list of Databricks workspaces configured, see /usr/local/unravel/etc/unravel.properties.
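
Because the cleanup has to be repeated for every workspace, it may be convenient to generate one command per workspace. The following sketch rests on an assumption that is not in the original text: that each workspace has its own connection profile configured for the Databricks CLI and that the dbfs command accepts a --profile option to select it. The profile names are placeholders, and the helper build_cleanup_commands is hypothetical. The sketch only prints the commands so that they can be reviewed before anything is deleted.

```python
UNRAVEL_DBFS_PATH = "dbfs:/databricks/unravel"


def build_cleanup_commands(profiles):
    """Return one `dbfs rm -r` argument list per CLI profile.

    Assumes one Databricks CLI profile per workspace (an assumption of this
    sketch, not something the Unravel script sets up).
    """
    return [
        ["dbfs", "rm", "-r", UNRAVEL_DBFS_PATH, "--profile", profile]
        for profile in profiles
    ]


if __name__ == "__main__":
    # Placeholder profile names; replace with the profiles for your workspaces.
    for command in build_cleanup_commands(["workspace-a", "workspace-b"]):
        print(" ".join(command))
```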