Microsoft Azure Databricks
Before installing Unravel in Azure Databricks, make sure that the Unravel installation requirements are met, and then follow these instructions to install and configure Unravel:
1. Create Unravel VM and Azure Databricks resource
2. Download Unravel
3. Deploy Unravel
4. Install Unravel either with Interactive Precheck or manually
5. Configure Unravel Log Receiver
6. Connect Databricks cluster to Unravel
1. Create Unravel VM and Azure Databricks resource
Sign in to the Azure portal.
Select Virtual Machines > Add and enter the following information in the Basics tab:
Project Details
Subscription
Choose the applicable subscription.
Resource group
Create a new resource group or choose an existing one.
Instance Details
Virtual Machine Name:
The Unravel server name.
Region:
Select the Azure region.
Availability Options
Select No infrastructure redundancy required.
Image
Select the appropriate image. Both CentOS-based 7.x+ and Red Hat Enterprise Linux 7.x+ are supported.
Size
Click Change Size. In the modal, select a memory-optimized size with at least 128 GB memory and Premium Disk support (for example, E16s_v3 in East US 2).
Administrator account
Authentication type
Select Password or SSH Key.
Username and Password
Enter your VM login information.
Inbound Port Rules
Public inbound ports
Select Allow selected ports.
Selected Inbound ports
Select both HTTPS and SSH.
Click Next: Disks > and enter the following information in the Disks tab.
Disk Options
OS disk type: Select Premium SSD.
Data Disk
Click Create and attach a new disk.
Caution
This disk is formatted, so do not choose the Attach an existing disk option.
Enter a Name.
Select Source type None (empty disk).
Set Size to at least 512 GiB.
Account type: Select Premium SSD.
Click Next: Networking > and enter the following information:
Virtual network: Create a new one or choose an existing one.
Subnet: Create a new one or choose an existing one.
Public IP: Create a new one or choose an existing one.
Select Inbound ports: Select HTTPS and SSH.
Click Review + create. Your deployment is now created.
Select Go to Resource > Networking > Inbound port rules > Add inbound port rule and include the following ports.
Name           Destination    Destination IP Address   Destination Port Ranges
Unravel_3000   IP Addresses   NIC Private IP           3000
Unravel_443    IP Addresses   NIC Private IP           443
Unravel_4043   IP Addresses   NIC Private IP           4043
Unravel_4443   IP Addresses   NIC Private IP           4443
Click OK.
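The same four inbound rules could also be created from the Azure CLI instead of the portal. The following is a minimal dry-run sketch, not part of the Unravel procedure: it only prints one az network nsg rule create command per port so that you can review the commands before running them. The resource group, network security group name, private IP address, and rule priorities are placeholder assumptions that you must replace with your own values.

```shell
#!/usr/bin/env bash
# Dry run: prints one `az network nsg rule create` command per Unravel port.
# RESOURCE_GROUP, NSG_NAME, NIC_PRIVATE_IP, and the priorities are placeholders (assumptions).
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
NSG_NAME="${NSG_NAME:-unravel-vm-nsg}"
NIC_PRIVATE_IP="${NIC_PRIVATE_IP:-10.0.0.4}"

print_unravel_nsg_rules() {
  local priority=1000 port
  # Ports and rule names come from the inbound port rule table above.
  for port in 3000 443 4043 4443; do
    printf 'az network nsg rule create --resource-group %s --nsg-name %s --name Unravel_%s --priority %s --direction Inbound --access Allow --protocol Tcp --destination-address-prefixes %s --destination-port-ranges %s\n' \
      "$RESOURCE_GROUP" "$NSG_NAME" "$port" "$priority" "$NIC_PRIVATE_IP" "$port"
    priority=$((priority + 10))
  done
}

print_unravel_nsg_rules
```

Printing the commands first keeps the sketch free of side effects; pipe the output to a shell only after checking the names and priorities against your existing rules.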
Select Create a resource > Azure Databricks > Create. Go directly to step#3 if you already have workspaces.
Select Workspace name, Subscription, Resource group, Location, and Pricing tier.
Review VNET Peering options to connect Databricks with Unravel VM.
Databricks Workspace
Unravel VM
Region
VNET peering option
Deployed in the same VNET as Azure Workspace.
-
-
Deployed in different VNET from Azure Workspace.
Any Azure region
Create VNET Peering between the two VNETs
Any Azure region
Create VNET Peering between the two VNETs
2. Download Unravel
Important
Before you download Unravel for your platform, get the download username and password from Unravel Support.
Go to the Downloads section for the complete list of Unravel product downloads.
Click the Unravel version that you want to download.
Run the commands provided to download the Unravel version of your choice. You can download either the Unravel TAR file or the RPM package.
3. Deploy Unravel
Unravel binaries are available as a TAR file or RPM package. You can deploy the Unravel binaries in any directory on the server. However, the user who installs Unravel must have write permission to the directory where the Unravel binaries are deployed.
After you extract the contents of the TAR file or RPM package, an unravel directory is created within the installation directory (<unravel_installation_directory>), and Unravel is available in <unravel_installation_directory>/unravel. The directory layout is unravel/versions/<Directories and files>.
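Because the installing user needs write permission on the installation directory, it can help to verify this before extracting anything. The following is a minimal sketch under stated assumptions: can_deploy_to is a hypothetical helper, not an Unravel command, and /opt is only a placeholder path.

```shell
#!/usr/bin/env bash
# can_deploy_to DIR: succeeds when DIR exists and the current user can write to it.
# Hypothetical helper for illustration; it is not part of Unravel.
can_deploy_to() {
  [ -d "$1" ] && [ -w "$1" ]
}

INSTALL_DIR="${INSTALL_DIR:-/opt}"   # placeholder installation directory
if can_deploy_to "$INSTALL_DIR"; then
  echo "OK: $(id -un) can write to $INSTALL_DIR"
else
  echo "NO: $(id -un) cannot write to $INSTALL_DIR (or it does not exist)"
fi
```

Run the check as the user who will perform the deployment, since the result depends on who executes it.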
The following steps to deploy Unravel from a TAR file should be performed by the user who will run Unravel.
Create an installation directory.
mkdir </path/to/installation/directory>
For example: mkdir /opt/
Extract the Unravel TAR file to the installation directory that you created in the first step. After you extract the contents of the TAR file, an unravel directory is created within the installation directory.
tar zxf unravel-<version>.tar.gz -C </path/to/installation/directory>
For example: tar zxf unravel-4.7.x.x.tar.gz -C /opt
The unravel directory will be available within /opt.
Grant ownership of the directory to the user who will run Unravel.
chown -R username:groupname </path/to/installation/directory>
For example: chown -R unravel:unravelgroup /opt/unravel
Important
The following steps to deploy Unravel from an RPM package should be performed by a root user. After the RPM package is deployed, the remaining installation procedures should be performed by the Unravel user.
Create an installation directory.
mkdir </path/to/installation/directory>
For example: mkdir /usr/local/unravel
Run the following command:
rpm -i unravel-<version>.rpm
For example: rpm -i unravel-4.7.x.x.rpm
The unravel directory will be available in /usr/local.
If you want to provide a different location, use the --prefix option.
For example:
mkdir /opt/unravel
rpm -i unravel-4.7.x.x.rpm --prefix /opt
The unravel directory will be available in /opt.
Grant ownership of the directory to the user who will run Unravel. This user executes all the processes involved in the Unravel installation.
chown -R username:groupname </path/to/installation/directory>
For example: chown -R unravel:unravelgroup /usr/local/unravel
Continue with the installation procedures as the unravel user.
4. Install Unravel
You can install Unravel either with Interactive Precheck or manually without Interactive Precheck.
Note
Unravel recommends installation with Interactive Precheck.
To install Unravel with Interactive precheck, you must run the Interactive Precheck utility to generate a bootstrap configuration file for installation, configure the Unravel log receiver, and then connect the Databricks cluster to Unravel.
The Interactive Precheck utility validates the required configurations before installing Unravel. When you run the Interactive Precheck utility, you are prompted through a series of checks that gather configuration information. The responses you provide for these checks generate a bootstrap configuration file. This file, which contains the configuration information, is then used to install Unravel.
Do the following to install and configure Unravel with Interactive Precheck.
After you download and deploy Unravel, run the precheck.sh script from unravel/versions/X.Y.Z/healthcheck/.
For example:
/opt/unravel/versions/X.Y.Z/healthcheck/precheck.sh
Enter the necessary details when you are prompted for the following configuration information:
This section covers general information about your Unravel install. You are prompted for the following:
Select the Data platform you want to monitor.
You must answer the following prompts:
-- General information
Which data platform are you installing for?
1- Hadoop
2- EMR
3- HDI
4- Databricks
5- Dataproc
6- BigQuery
Select one of the above [Hadoop]: ## You can choose a number corresponding to the platform.
This check allows you to configure database-related information and an external database for Unravel.
-- Database configuration
Configure an external database? (y/n) [No]:
If you answer No, an Unravel-managed database is used for the installation.
If you answer Yes, you are further prompted for the type of external database that you want to configure.
-- Database configuration
Configure an external database? (y/n) [No]: y
Type
1- PostgresQL
2- MySQL
3- MariaDB
After you choose a type of external database, you are prompted for the following database information and can test connectivity to that database. Refer to Integrate database (Cloud) for more details.
-- Database configuration
Configure an external database? (y/n) [No]: y
Type
1- PostgresQL
2- MySQL
3- MariaDB
Select one of the above []: 1
Selected: PostgresQL
Database hostname [None]:
Database port (integer) [None]:
Database schema [None]:
Does the database use TLS (y/n) [No]:
Database username [None]:
Database password [None] (no echo):
Do you wish to test connecting to the external database? (y/n) [Yes]:
If you choose a MySQL or MariaDB database, you are further prompted for extra packages. If you answer Yes, the Extra packages section searches for the required JDBC drivers.
-- Database configuration
Will Unravel connect to a MySQL or MariaDB database (ex: hive metastore) ? (y/n) [No]:
The license option prompts you to set the Unravel license. Specify the license file location.
--License
Do you want to set the unravel license now? (y/n) [Yes]:
License file location (ex: /path/to/unravel.license) [/home/unravel/valid.lic]:
License: Ok
If the license file is valid, the OK message is displayed.
If the license file is expired, the license expired error is displayed with the license expiry date and timestamp.
If the license file content is incorrect, the invalid license error is displayed with the appropriate error message. You can fix the error in the license file and then set the license.
To resolve the errors, see Licensing error messages and troubleshooting.
Note
If the filename is not provided, the command prompts for the license information. You can provide either a path to the license file or the content of the license file.
Sample content of the license file:
##### BEGIN UNRAVEL LICENSE
Licensee : ACME Disintegrating Pistol Manufacturing
Valid from : 2022-12-16 00:00:00 UTC
Expire after : 2023-10-16 23:59:00 UTC
License type : Enterprise
Licensed number of nodes : 1000000
Signature : c2Uvb2JqLnRhcmdldC92OF9pbml0aWFsaXplcnMvZ2VuL3RvcnF1ZS
Revision : 1
##### END UNRAVEL LICENSE #####
The Extra packages check is shown only if you use Unravel-managed MySQL/MariaDB or need JDBC drivers. Otherwise, this check is automatically skipped.
-- Extra package location
*** JDBC drivers are required for Unravel managed MySQL or MariaDB.
*** Database software package is required for Unravel managed MySQL or MariaDB.
External package location [None]: /<my-extra-packages> ##This is the path to the directory where the required packages are located.
If the required packages are found, the following message is shown:
The following packages will be installed:
Database server: /my-extra-packages/mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
JDBC driver:
- /my-extra-packages/mysql-connector-java-5.1.48.tar.gz
External package: Ok
If the required packages are not found, the following error message is shown:
External package: ERROR
- ERROR: Couldn't find jdbc drivers in /my-extra-packages
- ERROR: Looked for: mysql-connector-java-*.tar.gz mysql-connector-java-*.jar mariadb-java-client-*.jar
- ERROR: Couldn't find database server package in /my-extra-packages
- ERROR: Looked for: mysql-*-linux-glibc2.12-x86_64.tar.gz mariadb-*-linux-x86_64.tar.gz
This check allows you to add root and intermediate certificates to validate the trust chain when a connection is established using TLS. Currently, only pem files are supported.
-- TLS certificate trustchain
Add trusted certificates? (y/n) [No]:
##If you answer “No”, certificates will not be added.
##If you answer “Yes”, all the certificates found at the specified location are imported. Wildcards can be used.
This check allows you to configure and test HTTPS for the Unravel UI. This check prompts you for the certificate, key, password, and hostname details used to access Unravel.
Use HTTPS to access unravel? (y/n) [Yes]:
##If you answer “Yes”, you are prompted for the path to the certificate and key. Unravel uses this information to configure TLS during installation.
##If you answer “No”, you are shown a warning message for confirmation.
The information provided is verified for the following:
If the Key and Certificate match
If the certificate is valid
If the certificate applies for the provided hostname
This check allows you to set the Unravel UI port and verify the connectivity.
-- Unravel default port
Port number (integer) [3000]:
Do you want to test if the port is accessible? (y/n) [Yes]:
This will open port 3000 and listen for connection for 120 seconds.
Use your browser to test if the Unravel UI will be accessible on that port.
We have detected the following hostnames:
- some.host.example
Browse to: http://some.host.example:3000
ATTENTION: This address is an example. You should test with the URL that will be used to access Unravel.
A connection on port 3000 is attempted. If the connection is successful, Unravel Port Test: OK is shown in the browser, and Unravel port: Reached is shown on the server.
This check allows you to set a custom data directory and verifies access if the directories exist. The software location is always the directory where you deployed the Unravel binaries. This check tests only the available space and access. The data location that you configure here is the one used by the installation.
-- Unravel directories
Software [/opt/unravel]:
Data [/opt/unravel/data]:
Directories: ERROR
- OK: 33 GB of free disk space for software.
- ERROR: SYSH0026: Space for data 33 GB is low, recommended minimum is 100 GB.
This check allows you to configure and test email. You are prompted for host and credentials, and the following items are tested:
Connectivity
Authentication, only if provided.
Optional: Send test mail.
Following is a sample:
-- Mail server (SMTP) configuration
Unravel can send notification and alert emails.
This will allow you to configure and test connection to a SMTP server.
Optionally, it can also send a test email.
You will have to provide:
- Protocol, hostname and port
- Credentials if required
Configure a SMTP server? (y/n) [No]: y
SMTP hostname [None]: smtphostname.gmail.com
SMTP port (usually 25 for clear text, 465 for SSL, 587 for STARTLS) (integer) [None]: 587
Security protocol
1- None
2- SSL
3- StartTLS
Select one of the above [None]: 3
Selected: StartTLS
Authentication required? (y/n) [Yes]: y
Username [None]: daemon@unraveldata.com
Password [None] (no echo):
From [None]: daemon@unraveldata.com
To [None]: user@unraveldata.com
Send test email (y/n) [No]: y
This option allows you to run the full precheck using some of the provided information. Additional tests like user limits, CPU, and memory are run this way.
-- Full precheck
Note
For more information, refer to Using the Interactive Precheck utility.
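The directory check shown earlier reports an error when the data location has less than the recommended 100 GB of free space. If you want to check a candidate data directory before running the utility, a rough equivalent can be sketched with df. This is an illustration only and is not how the precheck itself is implemented; free_gb and check_data_dir are hypothetical helper names, and /tmp is a placeholder default.

```shell
#!/usr/bin/env bash
# free_gb DIR: prints the free space of the filesystem holding DIR, in whole GB.
free_gb() {
  # -P gives one POSIX-format line per filesystem; column 4 is available 1K blocks.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# check_data_dir DIR [MIN_GB]: compares free space with a minimum (100 GB by default,
# the recommended minimum quoted in the sample precheck output).
check_data_dir() {
  local dir="$1" min_gb="${2:-100}" avail
  avail="$(free_gb "$dir")" || return 2
  if [ "$avail" -ge "$min_gb" ]; then
    echo "OK: $avail GB free in $dir"
  else
    echo "LOW: $avail GB free in $dir, recommended minimum is $min_gb GB"
    return 1
  fi
}

# UNRAVEL_DATA_DIR is a placeholder variable, not an Unravel setting.
check_data_dir "${UNRAVEL_DATA_DIR:-/tmp}" || true
```

For example, check_data_dir /opt/unravel/data would test the default data location used in the sample output, provided that directory already exists.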
The responses that you have provided for the configuration information are used to generate a configuration file. You can use this configuration file when you run the setup command to install and configure Unravel.
After you have completed the responses, you are prompted to confirm if you want to generate the bootstrap configuration file. Press ENTER if you want to generate the bootstrap configuration file.
-- Unravel bootstrap configuration
Generate a unravel bootstrap configuration file? (y/n) [Yes]:
The bootstrap configuration file is generated and located at $HOME/unravel-interactive-precheck/unravel-bootstrap.yaml.
Install Unravel with the bootstrap configuration file.
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --bootstrap $HOME/unravel-interactive-precheck/unravel-bootstrap.yaml
Apply the changes.
<unravel_installation_directory>/unravel/manager config apply
Start all the services.
<unravel_installation_directory>/unravel/manager start
Check the status of services.
<unravel_installation_directory>/unravel/manager report
The following service statuses are reported:
OK: Service is up and running.
Not Monitored: Service is not running. (Has stopped or has failed to start)
Initializing: Services are starting up.
Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
<unravel_installation_directory>/unravel/manager report <service>
For example: /opt/unravel/manager report auto_action
You can run the setup command to install Unravel manually.
The setup command does the following:
Runs Precheck automatically to detect possible issues that prevent a successful installation. Suggestions are provided to resolve issues. Refer to Precheck filters for the expected value for each filter.
Lets you pass extra parameters to integrate the database of your choice.
The setup command allows you to use a managed database shipped with Unravel or an external database. When you run the setup command without additional parameters, the Unravel managed PostgreSQL database is used. Otherwise, you can specify any of the following databases, which are supported by Unravel, with the setup command:
MySQL (Unravel managed as well as external MySQL database)
MariaDB (Unravel managed as well as external MariaDB database)
PostgreSQL (External PostgreSQL)
Refer to Integrate database for details.
Lets you specify a path for the data directory other than the default path.
The Unravel data and configurations are located in the data directory. By default, the installer maintains the data directory under <Unravel installation directory>/data. You can change the data directory's default location by passing additional parameters to the setup command.
Provides more options for setup.
To install Unravel with the setup command, do the following:
After deploying the binaries, if you are the root user, switch to the Unravel user.
su - <unravel user>
Notice
Only the Unravel user who owns the installation directory should run the setup command to install Unravel.
Run the setup command with any of the following databases (PostgreSQL, MySQL, MariaDB). Refer to setup options for all the additional parameters that you can run with the setup command.
Tip
Run the setup command with --help, alone or combined with other setup options, for complete usage details.
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --help
Refer to the Integrate database (Cloud) topic and complete the prerequisites before running the setup command with any database other than the Unravel managed PostgreSQL, which is shipped with the product. Extra parameters must be passed with the setup command when you use another database.
Optionally, if you want to provide a different data directory, you can pass an extra parameter (--data-directory) with the setup command as shown below:
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --data-directory /the/data/directory
Similarly, you can configure separate locations for other Unravel directories. Contact support for assistance.
Unravel managed PostgreSQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks
Notice
If you are using the Unravel managed PostgreSQL database and the Hive metastore is using MySQL, refer to Set up Unravel Managed PostgreSQL for Hive metastore with MySQL in Integrate database (Cloud).
External PostgreSQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --external-database postgresql
For example: /opt/unravel/versions/abcd.992/setup --enable-databricks --external-database postgresql
You are prompted on the screen for the following details:
Host:
Port:
Schema:
Username:
Password:
For example:
Host: xyz.unraveldata.com
Port: 5432
Schema: unravel_db_prod
Username: unravel
Password: unraveldata
Note
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD are optional fields and are prompted if missing.
Unravel managed MySQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/mysql
External MySQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/<MySQL-directory> --external-database mysql
You are prompted on the screen for the following details:
Host:
Port:
Schema:
Username:
Password:
For example (3306 is the default MySQL port):
Host: xyz.unraveldata.com
Port: 3306
Schema: unravel_db_prod
Username: unravel
Password: unraveldata
Note
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD are optional fields and are prompted if missing.
Unravel managed MariaDB
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/mariadb
External MariaDB
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/<MariaDB-directory> --external-database mariadb
You are prompted on the screen for the following details:
Host:
Port:
Schema:
Username:
Password:
For example (3306 is the default MariaDB port):
Host: xyz.unraveldata.com
Port: 3306
Schema: unravel_db_prod
Username: unravel
Password: unraveldata
Note
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD are optional fields and are prompted if missing.
When you run the setup command, the Precheck utility, which identifies the issues that prevent a successful installation, is automatically run. Refer to Precheck filters list to view details of each item in the precheck run output.
The Precheck output displays the issues that prevent a successful installation and provides suggestions to resolve them. You must resolve each of the issues before proceeding. After the prechecks are resolved, you must re-login or reload the shell to execute the setup command again.
Note
In certain situations, you can skip the precheck by running setup with the --skip-precheck option.
For example:
/opt/unravel/versions/<Unravel version>/setup --cluster-access abc1011.p2g.net.eu.xyz --skip-precheck
You can also skip individual checks that you know can fail. For example, if you want to skip the Check limits option and check_network_ports, run the setup command as follows:
setup --filter-precheck ~check_limits,~check_network_ports
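Judging from the example above, the filter value is a comma-separated list in which each check to skip is prefixed with a tilde. When the list of skipped checks comes from a variable or a script, a small helper can assemble that value. This is a sketch under that assumption; precheck_filter is a hypothetical helper, and only the two check names from the example above are used.

```shell
#!/usr/bin/env bash
# precheck_filter NAME...: joins check names into "~name1,~name2".
# Hypothetical helper; the "~" prefix and comma separator follow the example above.
precheck_filter() {
  local out="" name
  for name in "$@"; do
    # Append a comma only when out already holds an earlier entry.
    out="${out:+$out,}~$name"
  done
  printf '%s\n' "$out"
}

echo "setup --filter-precheck $(precheck_filter check_limits check_network_ports)"
```

The last line prints the same command as the example above, which makes it easy to confirm the helper before adding other checks.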
Precheck filters
Filters
Description
Expected Value
System
Check uptime
Verifies the period since the last server reboot.
>24h
Clock sync
Verifies if the clock synchronization service is running on the server.
The clock synchronization service is up and running.
CPU requirement
Verifies if the server has enough CPUs to run Unravel efficiently.
Check requirements.
Memory requirement
Verifies that the server has enough memory to run Unravel efficiently.
Check requirements.
Disk access
Verifies that the user who runs unravel has access to the configured disk locations.
Unravel users can access the configured disk locations.
Disk Freespace
Verifies if the disk locations have enough free space.
Check requirements
Kerberos tools
Verifies that the Kerberos tools are available on the server to support kerberized environments.
Kerberos tools are installed.
Network ports
Verifies that the network ports used by Unravel are available.
Check requirements
OS libraries
Verifies the required libraries if run with Unravel managed MySQL.
The following packages must be installed to fulfill the OS-level requirements for MySQL:
numactl-libs (for libnuma.so)
libaio (for libaio.so)
OS release
Verifies that the OS distribution is supported.
Check compatibility matrix
OS settings
Verifies that vm.max_map_count is set to the recommended value.
Check requirements
SELinux
Verifies whether SELinux is enabled and reports its mode (Permissive, Disabled, or Enforcing).
Check product documentation.
Check limits
Verifies that the following user limits are set to the required values:
nofile
nproc
vm.max_map_count
memlock
The following values have to be set:
soft nofile should be set to 1048576
hard nofile should be set to 1048576
soft nproc should be set to unlimited
hard nproc should be set to unlimited
vm.max_map_count should be set to 262144
memlock should be set to unlimited
Healthcheck report bundle
Healthcheck report tarball. This report provides the summary and information gathered by the healthcheck with the location.
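To see in advance whether the Check limits filter is likely to pass, you can compare the current shell's limits with the values listed in the table. The following is a minimal sketch, not the precheck's own logic: limit_ok and report are hypothetical helpers, the script assumes bash on Linux, and it assumes that a value higher than the required one is acceptable.

```shell
#!/usr/bin/env bash
# limit_ok ACTUAL REQUIRED: succeeds when ACTUAL satisfies REQUIRED.
# Assumption: "unlimited" satisfies any requirement, and a higher number is acceptable.
limit_ok() {
  local actual="$1" required="$2"
  [ "$actual" = "unlimited" ] && return 0
  [ "$required" = "unlimited" ] && return 1
  [ "$actual" -ge "$required" ] 2>/dev/null
}

# report NAME ACTUAL REQUIRED: prints one line per limit.
report() {
  if limit_ok "$2" "$3"; then
    echo "OK   $1 = $2"
  else
    echo "LOW  $1 = $2 (required: $3)"
  fi
}

# Required values are taken from the Check limits row of the table above.
report "soft nofile" "$(ulimit -Sn 2>/dev/null || echo unknown)" 1048576
report "hard nofile" "$(ulimit -Hn 2>/dev/null || echo unknown)" 1048576
report "soft nproc" "$(ulimit -Su 2>/dev/null || echo unknown)" unlimited
report "hard nproc" "$(ulimit -Hu 2>/dev/null || echo unknown)" unlimited
report "memlock" "$(ulimit -l 2>/dev/null || echo unknown)" unlimited
report "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo unknown)" 262144
```

Run it as the user who will run Unravel, in a fresh login shell, because user limits are applied per session.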
setup Options
Description
-h, --help
Shows help for setup.
--config CONFIG
Specify a different path to the configuration file.
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --config path/to/config/directory
--enable-core
Enables core node support for non-Hadoop clusters.
--cluster-access
(Edge node parameter)
Enables cluster access to the core node in a multi-cluster environment.
--data-forwarder host:port cluster-type cluster-id
Data forwarder, main unravel node.
--data-directory
Specify a different path to the data directory.
--external-database [param [param ...]]
Enable external database.
--external-database-ssl
Enable external database with SSL.
--log-file
Setup log file location. Default is /tmp/unravel-setup-YYYYMMDD-HHMMSS.log.
--extra DIR, -e DIR
Specify extra packages location.
--precheck
Run the preinstallation check.
Following is a sample of the setup command run result:
/opt/unravel/versions/abcd.1004/setup
2021-04-05 15:51:30 Sending logs to: /tmp/unravel-setup-20210405-155130.log
2021-04-05 15:51:30 Running preinstallation check...
2021-04-05 15:51:31 Gathering information ................. Ok
2021-04-05 15:51:51 Running checks .................. Ok
--------------------------------------------------------------------------------
system
Check limits : PASSED
Clock sync : PASSED
CPU requirement : PASSED, Available cores: 8 cores
Disk access : PASSED, /opt/unravel/versions/abcd.1004/healthcheck/healthcheck/plugins/system is writable
Disk freespace : PASSED, 229 GB of free disk space is available for precheck dir.
Kerberos tools : PASSED
Memory requirement : PASSED, Available memory: 79 GB
Network ports : PASSED
OS libraries : PASSED
OS release : PASSED, OS release version: centos 7.6
OS settings : PASSED
SELinux : PASSED
--------------------------------------------------------------------------------
hadoop
Clients : PASSED
- Found hadoop
- Found hdfs
- Found yarn
- Found hive
- Found beeline
Distribution : PASSED, found CDH 6.3.3
RM HA Enabled/Disabled : PASSED, Disabled
Healthcheck report bundle: /tmp/healthcheck-20210405155130-xyz.unraveldata.com.tar.gz
2021-04-05 15:51:53 Prepare to install with: /opt/unravel/versions/abcd.1004/installer/installer/../installer/conf/presets/default.yaml
2023-08-29 07:00:38 Sending logs to: /tmp/unravel-setup-20230829-070038.log
2023-08-29 07:00:38 Found package: /tmp/mysql/mysql-connector-java-5.1.47.jar
2023-08-29 07:00:39 Prepare to install with: /opt/unravel/versions/4.7.9.2.6165/installer/installer/../installer/conf/presets/databricks.yaml
2023-08-29 07:00:43 Sending logs to: /opt/unravel/logs/setup.log
2023-08-29 07:00:47 Instantiating templates ................. Ok
2023-08-29 07:00:47 Installing sensors file ................. Ok
2023-08-29 07:00:47 Installing pgsql connector ... Ok
2023-08-29 07:00:47 Installing mysql connector ... Ok
2023-08-29 07:00:48 Starting service monitor ... Ok
2023-08-29 07:00:52 Request start for elasticsearch_1 .... Ok
2023-08-29 07:00:52 Waiting for elasticsearch_1 for 120 sec ........ Ok
2023-08-29 07:00:58 Request start for zookeeper .... Ok
2023-08-29 07:00:58 Waiting for zookeeper_1 for 120 sec ....... Ok
2023-08-29 07:01:02 Request start for kafka .... Ok
2023-08-29 07:01:02 Waiting for kafka for 120 sec ...... Ok
2023-08-29 07:01:04 Waiting for kafka to be alive for 120 sec .... Ok
2023-08-29 07:01:10 Initializing pgsql ... Ok
2023-08-29 07:01:12 Request start for pgsql .... Ok
2023-08-29 07:01:12 Waiting for pgsql for 120 sec ...... Ok
2023-08-29 07:01:14 Getting database schema 'unravel' ... Not found
2023-08-29 07:01:14 Creating database schema 'unravel' ............ Ok
2023-08-29 07:01:15 Generating hashes .... Ok
2023-08-29 07:01:16 Loading elasticsearch templates ............ Ok
2023-08-29 07:01:18 Creating kafka topics ......................... Ok
2023-08-29 07:02:06 Verifying schema objects ... Ok
2023-08-29 07:02:06 Creating schema objects ......... Ok
2023-08-29 07:02:25 Request stop ............................................. Ok
2023-08-29 07:02:35 Preparing databricks workspace ... No workspaces defined
2023-08-29 07:02:36 Done
2023-08-29 07:02:36 *** WARNING: TLS isn't enabled, this represents a security concern. Refer to the documentation on how to enable TLS.
2023-08-29 07:02:36 *** WARNING: Unravel is not licensed, please configure a valid license.
Set the path of a license file.
<unravel_installation_directory>/unravel/manager config license set <license filename>
This command takes a filename as input and performs the following actions:
Reads the license file path and the license file.
The license YAML file contains product licensing information, license validity and expiration date, and the licensed number of clusters and nodes.
Verifies whether it is a valid license.
Adds the com.unraveldata.license.file property to the unravel.properties file. For information, see License property.
Note
If you do not provide the license filename, the manager config license set command prompts for the license information. You can copy the content of the license file.
Apply the changes.
<unravel_installation_directory>/unravel/manager config apply
<unravel_installation_directory>/unravel/manager refresh databricks
Start all the services.
<unravel_installation_directory>/unravel/manager start
Check the status of services.
<unravel_installation_directory>/unravel/manager report
The following service statuses are reported:
OK: Service is up and running.
Not Monitored: Service is not running. (Has stopped or has failed to start)
Initializing: Services are starting up.
Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
<unravel_installation_directory>/unravel/manager report <service>
For example: /opt/unravel/manager report auto_action
5. Configure Unravel Log Receiver
Stop Unravel.
<unravel_installation_directory>/unravel/manager stop
Review and update the Unravel Log Receiver (LR) endpoint. By default, it is set to the local FQDN, which is visible only to workspaces within the same network. If your workspaces are in a different network, run the following command to set the LR endpoint and press ENTER:
<unravel_installation_directory>/unravel/manager config databricks set-lr-endpoint <hostname> <port>
For example: /opt/unravel/manager config databricks set-lr-endpoint <hostname> <port>
Note
If you do not enter a value for <port>, the default port 4043 is used when SSL is not enabled, and port 4443 is used when SSL is enabled.
Note
The LR endpoint configuration is set only in the unravel.yaml file. It is not reflected in unravel.properties.
Apply the changes.
<unravel_installation_directory>/unravel/manager refresh databricks
<unravel_installation_directory>/unravel/manager config apply
Start all the services.
<unravel_installation_directory>/unravel/manager start
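The default-port rule for the LR endpoint (4043 without SSL, 4443 with SSL) can be made explicit when you script this step. The sketch below only prints the set-lr-endpoint command so that it can be reviewed first. default_lr_port and lr_endpoint_command are hypothetical helpers, UNRAVEL_HOME is a placeholder variable, and unravel.example.com is a made-up hostname.

```shell
#!/usr/bin/env bash
# default_lr_port SSL_ENABLED: prints 4443 when SSL is enabled, otherwise 4043.
default_lr_port() {
  case "$1" in
    true|yes|y) echo 4443 ;;
    *)          echo 4043 ;;
  esac
}

# lr_endpoint_command HOSTNAME SSL_ENABLED [PORT]: prints the command to run.
# An explicit PORT wins; otherwise the documented default for the SSL setting is used.
lr_endpoint_command() {
  local host="$1" ssl="$2" port="${3:-}"
  [ -n "$port" ] || port="$(default_lr_port "$ssl")"
  echo "${UNRAVEL_HOME:-/opt/unravel}/manager config databricks set-lr-endpoint $host $port"
}

lr_endpoint_command unravel.example.com false
```

Passing the port explicitly, even when it matches the default, keeps the configured endpoint unambiguous if the SSL setting changes later.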
6. Connect Databricks cluster to Unravel
Run the following steps to connect the Databricks cluster to Unravel.
Register workspace in Unravel.
Sign in to Unravel UI, and from the upper right, click > Workspaces. The Workspaces Manager page is displayed.
Click Add Workspace and enter the following details.
Field
Description
Workspace Id
Databricks workspace ID, which can be found in the Databricks URL.
The random numbers shown after o= in the Databricks URL become the workspace ID.
For example, in this URL: https://<databricks-instance>/?o=987654321123456, the Databricks workspace ID is the random number after o=, which is 987654321123456.
Workspace Name
Databricks workspace name. A human-readable name for the workspace. For example,
ACME-Workspace
Instance (Region) URL
Regional URL where the Databricks workspace is deployed. Specify the complete URL. Expected format is protocol://dns or ip(:port). Ensure that the URL does not end with a slash. For example, a valid input is: https://eastus.azuredatabricks.net. An invalid input is: https://eastus.azuredatabricks.net/.
Tier
Select a subscription option: Standard or Premium. For Databricks Azure, you can get the pricing information from the Azure portal. For Databricks AWS you can get detailed information about pricing tiers from Databricks AWS pricing.
Token
Use the personal access token to secure authentication to the Databricks REST APIs instead of passwords. You can generate the token from the workspace URL (Go to User Settings > Developer > Access tokens > Manage > Generate New Token)
See Authentication using Databricks personal access tokens to create personal access tokens.
Note
Users with admin or non-admin roles can create personal access tokens. A non-admin token must meet the permission requirements listed later in this document.
Note
After you click Add, it takes around 2-3 minutes to register the Databricks Workspace with Unravel.
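Two of the fields above follow simple rules that can be checked before submitting the form. The following is a minimal Python sketch, assuming the workspace ID is the numeric o= query parameter and that the Instance URL must match protocol://dns or ip(:port) with no trailing slash. The function names are hypothetical and are not part of Unravel.

```python
import re
from urllib.parse import parse_qs, urlparse


def workspace_id_from_url(url: str) -> str:
    """Return the numeric value of the o= query parameter in a Databricks URL."""
    values = parse_qs(urlparse(url).query).get("o", [])
    if not values or not values[0].isdigit():
        raise ValueError(f"no numeric o= parameter in {url!r}")
    return values[0]


def is_valid_instance_url(url: str) -> bool:
    """Check the documented format: protocol://dns-or-ip(:port), no trailing slash."""
    pattern = r"[a-z][a-z0-9+.-]*://[^/\s:]+(:\d+)?"
    return re.fullmatch(pattern, url, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(workspace_id_from_url("https://eastus.azuredatabricks.net/?o=987654321123456"))
    print(is_valid_instance_url("https://eastus.azuredatabricks.net"))
    print(is_valid_instance_url("https://eastus.azuredatabricks.net/"))
```

The pattern is deliberately strict about the trailing slash, because that is the one invalid input the field description calls out.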
Add Unravel configuration to Databricks clusters using any of the following options:
Global init script
The global init script applies the Unravel configurations to all clusters in a workspace.
The global init script is deployed automatically on the workspace and must be enabled manually as follows:
Go to your workspace, and from the dropdown located in the upper right corner, select Admin Settings.
From Settings, click Compute and then click Manage next to Global init scripts. The Global init scripts page is shown.
Use the toggle key under the Enabled column to enable the Global init scripts.
You can also find the Global initialization script in your workspace at this path: /Workspace/Unravel/install-unravel.sh
If it is not deployed automatically, you can do one of the following:
Use this script as a Cluster init script.
Add Unravel configuration to Databricks clusters using the Global init script by referring to these instructions.
Note
Cluster logging should be enabled at the cluster level. See Logging in Cluster init script for instructions.
Important
When you upgrade from an Unravel version below v4.7.5.0, you must disable or remove all the previously set up global init scripts (unravel_cluster_init, unravel_spark_init).
Cluster init script
The cluster init script applies the Unravel configurations at the cluster level. To set up cluster init scripts from the cluster UI, do the following:
Go to the Unravel UI, click Manage > Workspaces, and select the Workspace URL.
Log in to the Workspace.
Open the cluster that you want to monitor through Unravel.
Set the init script path to /Unravel/install-unravel.sh.
Note
Before configuring the new cluster-level init script, remove any existing cluster-level init script configurations that point to the DBFS location. Configure the cluster-level init script using the workspace file path: /Unravel/install-unravel.sh.
Note
To add Unravel configurations to job clusters via API, refer to How to set up cluster init scripts via cluster API.
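For orientation, the part of a cluster specification that carries the init script and cluster logging might look like the following Python sketch. It assumes the Databricks Clusters API field names init_scripts (with a workspace destination) and cluster_log_conf. The log destination dbfs:/cluster-logs is a hypothetical value, and any further Unravel-specific settings described in the linked instructions are not shown here.

```python
import json


def unravel_cluster_fragment(
    init_script_path: str = "/Unravel/install-unravel.sh",
    log_destination: str = "dbfs:/cluster-logs",  # hypothetical location
) -> dict:
    """Build the init-script and logging part of a cluster spec.

    Illustrative only: merge this fragment into a complete cluster or
    job-cluster definition before sending it to the Databricks API.
    """
    return {
        # Workspace-file init script, as used in the cluster UI steps above.
        "init_scripts": [{"workspace": {"destination": init_script_path}}],
        # Cluster logging must be enabled at the cluster level.
        "cluster_log_conf": {"dbfs": {"destination": log_destination}},
    }


if __name__ == "__main__":
    print(json.dumps(unravel_cluster_fragment(), indent=2))
```

Keeping the fragment separate from the rest of the cluster definition makes it easy to apply the same two settings to several job clusters.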
Managing tokens for workspace access
If you have a premium workspace and the workspace access control is enabled, you must provide the appropriate tokens when adding a workspace. The type of token required depends on whether the workspace is an Admin workspace or a read-only (RO) workspace.
Admin workspaces
Admin access token
Token of a user who has Can Manage permission on Workspace
Standard workspaces
Navigate to Workspace Settings > Access Tokens > Manage Tokens and generate a Read-Only Access Token.
For workspaces added with a token that has only the Can Read permission, the Global init and Cluster init scripts are not added to their locations automatically. You must add them manually from Unravel/install-unravel.sh.
Setting the backward compatibility for init scripts
To maintain the backward compatibility of your init scripts, run the following command:
<Installation_directory>/unravel/manager config databricks copy-to-dbfs --id <workspace-id> --unsafe-copy-init-to-dbfs
This command copies the init scripts to the DBFS (Databricks File System) location, specifically the dbin folder. By default, during workspace addition, updates, or Unravel upgrades, the dbin folder can be removed. However, running this command prevents its removal and guarantees that it is also transferred to the DBFS.
For example:
/opt/unravel/manager config databricks copy-to-dbfs --id 6679977360960347 --unsafe-copy-init-to-dbfs
Alternatively, use the --all option instead of --id to apply the same step to all workspaces currently registered in Unravel.
For example:
/opt/unravel/manager config databricks copy-to-dbfs --all --unsafe-copy-init-to-dbfs
Set the public hostname/IP for your Unravel server
By default, Unravel uses the server name, the Unravel UI ports, and the TLS configuration when generating URLs. This setting lets you specify a different name or port, or generate https URLs without enabling TLS on the Unravel side. To set the public hostname or IP for your Unravel server, run the following manager command:
Note
This is an optional step and is only needed if the default setup doesn't work.
manager config set public_hostname {host} {port} {--tls} {--no-tls}
Here:
{host}: The public hostname or IP address.
{port}: The port number on which the server will accept connections.
{--tls}: Use this if you want TLS to be terminated in front of the Unravel server without enabling TLS on the Unravel side.
{--no-tls}: Use this if you do not want TLS termination.
For example:
manager config set public_hostname unravel.example.com 1234 --tls
With this setting, Unravel generates URLs such as https://unravel.example.com:1234.
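The effect of these arguments on the generated URL can be sketched as follows. This is an illustration of the example above, not Unravel's code. It assumes that --tls yields an https scheme and --no-tls yields http; the function name public_url is hypothetical.

```python
def public_url(host: str, port: int, tls: bool) -> str:
    """Compose the base URL implied by the public_hostname settings.

    Assumption: tls=True corresponds to --tls (https) and tls=False
    corresponds to --no-tls (http).
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    scheme = "https" if tls else "http"
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    print(public_url("unravel.example.com", 1234, tls=True))
```

Comparing this expected URL with the links that Unravel actually generates is a quick way to confirm that the setting took effect.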
Set additional configurations if required. See Configurations.
Configure the Workspace for Data page.
Ensure that at least one of the workspaces is populated before you configure a workspace for the Data page.
To configure the Databricks workspace for the Data page, do the following:
Stop Unravel
<Unravel installation directory>/unravel/manager stop
Set the following properties. Replace the example values with your specific configuration details. Refer here for more details about these properties.
com.unraveldata.hive.metastore.list=ktkwspace
hive.metastore.cluster.ids=default
hive.metastore.ktkwspace.workspace.ids=1632882178162910
javax.jdo.option.ktkwspace.ConnectionURL=jdbc:mariadb://consolidated-eastusc3-prod-metastore-2.mysql.database.azure.com:3306/organization1632882178162910?useSSL=true&enabledSslProtocolSuites=TLSv1,TLSv1.1,TLSv1.2&serverSslCert=/databricks/common/mysql-ssl-ca-cert.crt
javax.jdo.option.ktkwspace.ConnectionDriverName=org.mariadb.jdbc.Driver
javax.jdo.option.ktkwspace.ConnectionUserName=qmJdQLMx7w8aYRhv@consolidated-eastusc3-prod-metastore-2
javax.jdo.option.ktkwspace.ConnectionPassword=imf1IV9GntqiclN5kahcJJSHI_5aM9Y4lV4eQipb
Apply the changes and restart Unravel services.
<Unravel installation directory>/unravel/manager config apply --restart
Start Unravel
<Unravel installation directory>/unravel/manager start
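The property names in the example above embed the metastore name (ktkwspace in the example), so it can help to generate them from one set of inputs. The sketch below infers the naming pattern from that single example and is an assumption rather than a documented rule. The function metastore_properties and the placeholder values are hypothetical.

```python
def metastore_properties(name, workspace_id, jdbc_url, driver, user, password,
                         cluster_id="default"):
    """Render the property lines for one metastore entry.

    The key pattern is inferred from the example in this section; verify it
    against the property reference before use.
    """
    prefix = f"javax.jdo.option.{name}"
    return [
        f"com.unraveldata.hive.metastore.list={name}",
        f"hive.metastore.cluster.ids={cluster_id}",
        f"hive.metastore.{name}.workspace.ids={workspace_id}",
        f"{prefix}.ConnectionURL={jdbc_url}",
        f"{prefix}.ConnectionDriverName={driver}",
        f"{prefix}.ConnectionUserName={user}",
        f"{prefix}.ConnectionPassword={password}",
    ]


if __name__ == "__main__":
    lines = metastore_properties(
        name="myworkspace",                       # placeholder metastore name
        workspace_id="1234567890",                # placeholder workspace ID
        jdbc_url="jdbc:mariadb://<host>:3306/<database>",
        driver="org.mariadb.jdbc.Driver",
        user="<user>",
        password="<password>",
    )
    print("\n".join(lines))
```

Generating the lines this way keeps the metastore name consistent across all seven keys, which is the easiest detail to get wrong when editing them by hand.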
Optionally, run a healthcheck at this point to verify that all the configurations and services are running successfully.
<unravel_installation_directory>/unravel/manager healthcheck
The healthcheck runs automatically every hour in the backend. You can set the healthcheck intervals and email alerts to receive the healthcheck reports.
Tip
The workspace setup can be done anytime and does not impact the running clusters or jobs.
Refer to Databricks FAQ.
Set the following permissions to use a non-admin token with Unravel:
In Admin Settings, Token Usage must grant the CAN USE permission to the non-admin user, either directly as a user, through a group or All Users, or as a service principal (SP); otherwise, an error is shown.
If Workspace Access Control is enabled in Admin Settings, create the Unravel folder at the root of the workspace file system, and grant the CAN MANAGE permission on the Unravel folder to the non-admin user, group, or SP used by Unravel.
If Workspace Visibility Control is enabled in Admin Settings, the Unravel user, group, or SP must have the Workspace access and Databricks SQL access permissions.
Create init script manually; see Connect Databricks cluster to Unravel for setting up Unravel sensor via Global init script or Cluster init script.
If Cluster Visibility Control is enabled in Admin Settings, you must manually grant the Unravel token the CAN ATTACH permission on every cluster (per cluster).
If Job Visibility Control is enabled in Admin Settings, you must manually grant the Unravel token the CAN ATTACH permission on every job (per job).
The following API endpoint permissions should also be granted:
Endpoint: Permission
/api/2.0/workspace/mkdirs: CAN MANAGE permission in the parent folder.
/api/2.0/clusters/get?cluster_id: No extra permission.
/api/2.0/clusters/events: No extra permission.
/api/2.0/clusters/list-node-types: No extra permission.
/api/2.0/clusters/list: CAN ATTACH permission must be granted per cluster to see the clusters.
/api/2.0/clusters/events: CAN ATTACH permission must be granted per cluster to see the clusters.
/api/2.0/jobs/runs/get?run_id: CAN VIEW permission must be granted per job run.
/api/2.0/jobs/runs/list: CAN VIEW permission must be granted per job run.
/api/2.0/sql/history/queries: Two permission levels only: an admin can see all queries; a user can see only their own queries.
/api/2.0/sql/warehouses: CAN USE permission is required per warehouse.
/api/2.1/unity-catalog/catalogs: USE CATALOG permission to see the catalog.
/api/2.1/unity-catalog/metastores: Admin only.
/api/2.1/unity-catalog/schemas: USE_SCHEMA permission per schema or catalog.
/api/2.1/unity-catalog/tables: USE_SCHEMA permission per schema or catalog.
/api/2.1/unity-catalog/storage-credentials: Returns the storage credentials that the caller has permission to access. If the caller is a metastore admin, all storage credentials are retrieved.
/api/2.0/workspace/import: Admin only.
/api/2.0/global-init-scripts: Admin only.
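To check whether a token has the access that one of these endpoints needs, you can call the endpoint directly with the token. The following Python sketch only builds the request, using the Bearer-token header that the Databricks REST API accepts for personal access tokens. The helper name databricks_request and the placeholder token are hypothetical, and the call itself is left commented out.

```python
import urllib.request


def databricks_request(instance_url: str, endpoint: str, token: str) -> urllib.request.Request:
    """Build a GET request for a Databricks REST endpoint listed above."""
    if instance_url.endswith("/"):
        raise ValueError("instance URL must not end with a slash")
    return urllib.request.Request(
        instance_url + endpoint,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    req = databricks_request(
        "https://eastus.azuredatabricks.net",
        "/api/2.0/clusters/list",
        "<personal-access-token>",  # placeholder, replace with a real token
    )
    print(req.get_method(), req.full_url)
    # To send it, uncomment the following lines. An HTTP 403 response
    # suggests the token lacks the permission listed for this endpoint.
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Running the same request once with an admin token and once with the non-admin token is a simple way to see which of the permissions above are still missing.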