Unravel 4.7.6.x Documentation
Amazon Web Services (AWS) Databricks
Before installing Unravel for AWS Databricks, ensure that the installation requirements are met, and then follow the instructions below to install and configure Unravel:
1. Create an EC2 instance and connect Databricks to the Unravel VM
On your AWS Console, go to the EC2 dashboard and click Launch Instance.
Select the following options based on Unravel's instance requirements:
Base OS
Instance type and size
Ports
Networking
The EC2 instance must be in the same region as the target clusters that the Unravel EC2 node will monitor.
Security groups or policies
Create a security group that allows inbound traffic on ports 3000 and 4043 from the cluster nodes' IP addresses, and add the security group used on the cluster as a source in this rule.
Sample inbound rules:
Type | Protocol | Port range | Source |
---|---|---|---|
All traffic | All | All | Security group ID of this group or subnet IP block. For example, 10.10.0.0/16 |
SSH | TCP | 22 | 0.0.0.0/0 or trusted public IP for SSH access |
Custom TCP Rule | TCP | 443 | Security group ID used on the cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
Custom TCP Rule | TCP | 3000 | Security group ID used on the cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
Custom TCP Rule | TCP | 4043 | Security group ID used on the cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
Custom TCP Rule | TCP | 4443 | Security group ID used on the cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
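The four Custom TCP rows can also be created with the AWS CLI instead of the console. The following is a minimal sketch, assuming AWS CLI v2 is installed and configured with credentials that may modify security groups; `add_unravel_ingress_rules` and the security group IDs are hypothetical placeholders. It does not create the All traffic and SSH rows. If the cluster runs in a peered VPC in a different region, a security group reference will not work; use `--cidr <subnet block>` instead of `--source-group` in that case.

```shell
#!/usr/bin/env bash
# Sketch: add the Unravel inbound rules (Custom TCP rows) with the AWS CLI.
# add_unravel_ingress_rules is a hypothetical helper; pass your own group IDs.
# Set AWS="echo aws" to print the commands instead of running them.

# Ports that Databricks cluster nodes must reach on the Unravel EC2 instance.
UNRAVEL_PORTS=(443 3000 4043 4443)

add_unravel_ingress_rules() {
  local unravel_sg="$1"   # security group attached to the Unravel EC2 instance
  local cluster_sg="$2"   # security group used by the Databricks cluster nodes
  local aws_cmd="${AWS:-aws}"
  local port
  if [ -z "$unravel_sg" ] || [ -z "$cluster_sg" ]; then
    echo "usage: add_unravel_ingress_rules <unravel-sg-id> <cluster-sg-id>" >&2
    return 2
  fi
  for port in "${UNRAVEL_PORTS[@]}"; do
    # Allow TCP traffic on $port from members of the cluster security group.
    # $aws_cmd is intentionally unquoted so AWS="echo aws" splits into two words.
    $aws_cmd ec2 authorize-security-group-ingress \
      --group-id "$unravel_sg" \
      --protocol tcp \
      --port "$port" \
      --source-group "$cluster_sg" || return 1
  done
}
```

For example, `add_unravel_ingress_rules sg-0123unravel sg-0456cluster` (hypothetical IDs) adds one rule per port and stops at the first rule the CLI rejects, such as a duplicate.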
Review the Virtual Private Cloud (VPC) Peering options to connect Databricks with the Unravel VM.
Workspace | VPC Peering Options |
---|---|
Workspace and Unravel VM are in the same VPC | - |
Workspace VPC is in a different Region | Use VPC Peering: |
Workspace VPC is in a different AWS account | Use VPC Peering: |
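Once Unravel is installed and running, you can confirm from a cluster node (for example, a notebook `%sh` cell) that the security group and peering setup actually let traffic through. This is a minimal sketch that relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command; the function names are hypothetical. Port 4443 is only expected to answer when SSL is enabled for the Log Receiver.

```shell
#!/usr/bin/env bash
# Sketch: check TCP reachability of Unravel ports from a Databricks cluster node.
# Uses bash's /dev/tcp pseudo-device, so no extra tools (nc, telnet) are needed.

# port_reachable HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT succeeds within the timeout.
port_reachable() {
  local host="$1" port="$2" secs="${3:-5}"
  # The inner bash exits non-zero if the connection is refused or unroutable;
  # timeout kills it if the connection attempt hangs.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_unravel_ports HOST: report the UI port and the Log Receiver ports.
check_unravel_ports() {
  local host="$1" port status=0
  for port in 3000 4043 4443; do
    if port_reachable "$host" "$port"; then
      echo "OK          ${host}:${port}"
    else
      echo "UNREACHABLE ${host}:${port}"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_unravel_ports <unravel-host>` prints one line per port and returns non-zero if any port could not be reached.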
2. Download Unravel
Important
Before you download Unravel for your platform, obtain the username and password from Unravel Support.
Go to the Download section for the complete list of Unravel product downloads.
Click the Unravel version that you want to download.
Run the commands provided to download the Unravel version of your choice. You can download the Unravel TAR file or RPM package.
3. Deploy Unravel
Unravel binaries are available as a TAR file or RPM package. You can deploy the Unravel binaries in any directory on the server. However, the user who installs Unravel must have write permission on the directory where the Unravel binaries are deployed.
After you extract the contents of the TAR file or RPM package, an unravel directory is created within the installation directory (<unravel_installation_directory>), and Unravel is available in <unravel_installation_directory>/unravel. The directory layout is unravel/versions/<Directories and files>.
The following steps to deploy Unravel from a TAR file should be performed by the user who will run Unravel.
Create an installation directory.
mkdir /path/to/installation/directory
For example: mkdir /opt/
Extract the Unravel TAR file to the installation directory that you created in the first step. After you extract the contents of the TAR file, the unravel directory is created within the installation directory.
tar zxf unravel-<version>.tar.gz -C </path/to/installation/directory>
For example: tar zxf unravel-4.7.0.0.tar.gz -C /opt
The unravel directory will be available within /opt.
Grant ownership of the directory to the user who will run Unravel.
chown -R username:groupname </path/to/installation/directory>
For example: chown -R hadoop:hadoop /opt/unravel/
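The three TAR steps above can be combined into one function. This is a minimal sketch, assuming it runs as the user who will own Unravel (or as root, if `chown` must hand the directory to another user); `deploy_unravel_tar` is a hypothetical name. It checks that the expected unravel directory exists before changing ownership.

```shell
#!/usr/bin/env bash
# Sketch: deploy Unravel from a TAR file (create dir, extract, set ownership).

# deploy_unravel_tar TARBALL INSTALL_DIR OWNER[:GROUP]
deploy_unravel_tar() {
  local tarball="$1" install_dir="$2" owner="$3"
  if [ ! -f "$tarball" ]; then
    echo "TAR file not found: $tarball" >&2
    return 1
  fi
  # Step 1: create the installation directory (no error if it already exists).
  mkdir -p "$install_dir" || return 1
  # Step 2: extract; this creates the unravel directory inside install_dir.
  tar zxf "$tarball" -C "$install_dir" || return 1
  if [ ! -d "$install_dir/unravel" ]; then
    echo "Expected $install_dir/unravel after extraction" >&2
    return 1
  fi
  # Step 3: give the unravel directory to the user who will run Unravel.
  chown -R "$owner" "$install_dir/unravel"
}
```

For example: `deploy_unravel_tar unravel-4.7.0.0.tar.gz /opt hadoop:hadoop`.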
Important
The following steps to deploy Unravel from an RPM package should be performed by the root user. After the RPM package is deployed, the remaining installation procedures should be performed by the Unravel user.
Create an installation directory.
mkdir /usr/local/unravel
Run the following command:
rpm -i unravel-<version>.rpm
For example: rpm -i unravel-4.7.0.0.rpm
The unravel directory will be available in /usr/local.
If you want to provide a different location, use the --prefix option.
For example:
mkdir /opt/unravel
rpm -i unravel-4.7.0.0.rpm --prefix /opt
The unravel directory will be available in /opt.
Grant ownership of the directory to the user who will run Unravel. This user executes all the processes involved in the Unravel installation.
chown -R username:groupname /usr/local/unravel
For example: chown -R hadoop:hadoop /usr/local/unravel
Continue with the installation procedures as the Unravel user.
4. Install Unravel
You can install Unravel either with Interactive Precheck or manually without Interactive Precheck.
Note
Unravel recommends installation with Interactive Precheck.
To install Unravel with Interactive Precheck, you must first run the Interactive Precheck utility to generate a bootstrap configuration file for installation.
The Interactive Precheck utility validates the required configurations before you install Unravel. When you run it, it prompts you through a series of checks to gather configuration information. Your responses are used to generate a bootstrap configuration file, which is then used to install Unravel.
Do the following to install and configure Unravel with Interactive Precheck.
After you download and deploy Unravel, run the precheck.sh script from unravel/versions/X.Y.Z/healthcheck/. For example:
/opt/unravel/versions/X.Y.Z/healthcheck/precheck.sh
Enter the necessary details when you are prompted for the following configuration information:
This section covers general information about your Unravel install. You are prompted for the following:
Data platform you want to monitor.
For Hadoop: type of Unravel node you want to configure.
For edge nodes: core node location and test connectivity.
You must answer the following prompts:
-- General information
Which data platform are you installing for?
1- Hadoop
2- EMR
3- HDI
4- Databricks
5- Dataproc
6- BigQuery
Select one of the above [Hadoop]: ## Enter the number that corresponds to your platform (4 for Databricks).
This check allows you to configure database-related information and an external database for Unravel.
-- Database configuration Configure an external database? (y/n) [No]:
If you answer No, an Unravel-managed database is used for the installation.
If you answer Yes, you are further prompted for the type of external database that you want to configure.
-- Database configuration
Configure an external database? (y/n) [No]: y
Type
1- PostgresQL
2- MySQL
3- MariaDB
If you choose a specific type of external database, you are prompted for the following database information, and connectivity to that database is tested. Refer to Integrating Database for more details. For example:
-- Database configuration
Configure an external database? (y/n) [No]: y
Type
1- PostgresQL
2- MySQL
3- MariaDB
Select one of the above []: 1
Selected: PostgresQL
Database hostname [None]:
Database port (integer) [None]:
Database schema [None]:
Does the database use TLS (y/n) [No]:
Database username [None]:
Database password [None] (no echo):
Do you wish to test connecting to the external database? (y/n) [Yes]:
If you choose a MySQL or MariaDB database, you are further prompted for extra packages. If you answer Yes, the Extra packages check searches for the required JDBC drivers.
-- Database configuration
Will Unravel connect to a MySQL or MariaDB database (ex: hive metastore) ? (y/n) [No]:
The Extra packages check is shown if you use Unravel-managed MySQL/MariaDB or need JDBC drivers. Otherwise, this check is skipped automatically.
-- Extra package location
*** JDBC drivers are required for Unravel managed MySQL or MariaDB.
*** Database software package is required for Unravel managed MySQL or MariaDB.
External package location [None]: /<my-extra-packages> ## This is the path to the directory where the required packages are located.
If the required packages are found, the following message is shown:
The following packages will be installed:
Database server: /my-extra-packages/mysql-5.7.27-linux-glibc2.12-x86_64.tar.gz
JDBC driver:
 - /my-extra-packages/mysql-connector-java-5.1.48.tar.gz
External package: Ok
If the required packages are not found, the following error message is shown:
External package: ERROR
 - ERROR: Couldn't find jdbc drivers in /my-extra-packages
 - ERROR: Looked for: mysql-connector-java-*.tar.gz mysql-connector-java-*.jar mariadb-java-client-*.jar
 - ERROR: Couldn't find database server package in /my-extra-packages
 - ERROR: Looked for: mysql-*-linux-glibc2.12-x86_64.tar.gz mariadb-*-linux-x86_64.tar.gz
This check allows you to enable and configure Kerberos to access your Hadoop cluster.
-- Kerberos configuration
Configure unravel to use kerberos? (y/n) [No]:
## If you answer "No", Kerberos will not be configured for Unravel.
## If you answer "Yes", you are further prompted for the keytab location and principal. The information is then validated for accuracy.
This check allows you to add root and intermediate certificates to validate the trust chain when establishing TLS connections. Currently, only pem files are supported.
-- TLS certificate trustchain
Add trusted certificates? (y/n) [No]:
## If you answer "No", certificates are not added.
## If you answer "Yes", all the certificates found at the specified location are imported. Wildcards can be used.
This check allows you to configure and test HTTPS for the Unravel UI. It prompts you for the certificate, key, password, and hostname used to access Unravel.
Use HTTPS to access unravel? (y/n) [Yes]:
## If you answer "Yes", you are prompted for the path to the certificate and key. Unravel uses this information to configure TLS during installation.
## If you answer "No", a warning message is shown for confirmation.
The information provided is verified for the following:
If the Key and Certificate match
If the certificate is valid
If the certificate applies for the provided hostname
This check allows you to set the Unravel UI port and verify the connectivity.
-- Unravel default port
Port number (integer) [3000]:
Do you want to test if the port is accessible? (y/n) [Yes]:
This will open port 3000 and listen for connection for 120 seconds.
Use your browser to test if the Unravel UI will be accessible on that port.
We have detected the following hostnames:
 - some.host.example
Browse to: http://some.host.example:3000
ATTENTION: This address is an example. You should test with the URL that will be used to access Unravel.
A connection on port 3000 is attempted. If the connection is successful, Unravel Port Test: OK is shown in the browser, and Unravel port: Reached is shown on the server.
This check allows you to set a custom data directory and verify access if the directories exist. The software location is always the directory where you deployed the Unravel binaries. This check tests only free space and access; the data location that you configure is the one that is used.
-- Unravel directories
Software [/opt/unravel]:
Data [/opt/unravel/data]:
Directories: ERROR
 - OK: 33 GB of free disk space for software.
 - ERROR: SYSH0026: Space for data 33 GB is low, recommended minimum is 100 GB.
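The sample output above compares data space against a recommended minimum of 100 GB. To check a candidate data directory before you run the precheck, a sketch like the following can help. It assumes POSIX `df -P` output and counts a GB as 1024^3 bytes; the function names are hypothetical, and the precheck's own calculation may differ slightly.

```shell
#!/usr/bin/env bash
# Sketch: compare free space at a directory against the 100 GB minimum
# reported by the Unravel directories check.

# free_gb PATH: print whole GB (1024^3 bytes) available on the filesystem holding PATH.
free_gb() {
  # -P keeps each filesystem on one line; -k reports 1024-byte blocks.
  # Column 4 of the data line (line 2) is "Available".
  df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# check_data_dir PATH [MIN_GB]: print OK or LOW; return 1 if below MIN_GB.
check_data_dir() {
  local dir="$1" min="${2:-100}" avail
  avail=$(free_gb "$dir")
  if [ -z "$avail" ]; then
    echo "Could not read free space for $dir" >&2
    return 1
  fi
  if [ "$avail" -lt "$min" ]; then
    echo "LOW: ${avail} GB free in ${dir}, recommended minimum is ${min} GB"
    return 1
  fi
  echo "OK: ${avail} GB free in ${dir}"
}
```

For example, `check_data_dir /opt/unravel/data` checks the default data location against 100 GB.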
This check allows you to configure and test email. You are prompted for host and credentials, and the following items are tested:
Connectivity
Authentication, only if provided.
Optional: Send test mail.
Following is a sample:
-- Mail server (SMTP) configuration
Unravel can send notification and alert emails.
This will allow you to configure and test connection to a SMTP server.
Optionally, it can also send a test email.
You will have to provide:
 - Protocol, hostname and port
 - Credentials if required
Configure a SMTP server? (y/n) [No]: y
SMTP hostname [None]: smtphostname.gmail.com
SMTP port (usually 25 for clear text, 465 for SSL, 587 for STARTLS) (integer) [None]: 587
Security protocol
1- None
2- SSL
3- StartTLS
Select one of the above [None]: 3
Selected: StartTLS
Authentication required? (y/n) [Yes]: y
Username [None]: daemon@unraveldata.com
Password [None] (no echo):
From [None]: daemon@unraveldata.com
To [None]: user@unraveldata.com
Send test email (y/n) [No]: y
This option allows you to run the full precheck using some of the provided information. Additional tests like user limits, CPU, and memory are run this way.
-- Full precheck
Note
For more information, refer to Using the Interactive Precheck utility.
The responses that you have provided for the configuration information are used to generate a configuration file. You can use this configuration file when you run the setup command to install and configure Unravel.
After you have completed the responses, you are prompted to confirm if you want to generate the bootstrap configuration file. Press ENTER if you want to generate the bootstrap configuration file.
-- Unravel bootstrap configuration Generate a unravel bootstrap configuration file? (y/n) [Yes]:
The bootstrap configuration file is generated and located at $HOME/unravel-interactive-precheck/unravel-bootstrap.yaml.
Install Unravel with the bootstrap configuration file.
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --bootstrap $HOME/unravel-interactive-precheck/unravel-bootstrap.yaml
Apply the changes.
<unravel_installation_directory>/unravel/manager config apply
Start all the services.
<unravel_installation_directory>/unravel/manager start
Check the status of services.
<unravel_installation_directory>/unravel/manager report
The following service statuses are reported:
OK: Service is up and running.
Not Monitored: Service is not running. (Has stopped or has failed to start)
Initializing: Services are starting up.
Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
<unravel_installation_directory>/unravel/manager report <service>
For example: /opt/unravel/manager report auto_action
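The Interactive Precheck installation steps above run in a fixed order, and each one should succeed before the next starts. This minimal sketch chains them, assuming the first argument is <unravel_installation_directory>/unravel (for example, /opt/unravel) and that it runs as the Unravel user. `install_from_bootstrap` is a hypothetical helper; the setup and manager invocations are exactly those listed above.

```shell
#!/usr/bin/env bash
# Sketch: install Unravel from the Interactive Precheck bootstrap file,
# then apply the config, start services, and print the status report.

# install_from_bootstrap UNRAVEL_HOME VERSION [BOOTSTRAP_FILE]
install_from_bootstrap() {
  local home="$1" version="$2"
  local bootstrap="${3:-$HOME/unravel-interactive-precheck/unravel-bootstrap.yaml}"
  local setup="$home/versions/$version/setup"
  local manager="$home/manager"

  if [ ! -f "$bootstrap" ]; then
    echo "Bootstrap file not found: $bootstrap (run precheck.sh first)" >&2
    return 1
  fi
  # Stop at the first failing step so later steps never run on a broken install.
  "$setup" --bootstrap "$bootstrap" &&
    "$manager" config apply &&
    "$manager" start &&
    "$manager" report
}
```

For example: `install_from_bootstrap /opt/unravel <Unravel version>`.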
You can run the setup command to install Unravel manually.
The setup command allows you to do the following:
Run Precheck automatically to detect possible issues that prevent a successful installation. Suggestions are provided to resolve the issues. Refer to Precheck filters for the expected value of each filter.
Pass extra parameters to integrate the database of your choice.
The setup command allows you to use a managed database shipped with Unravel or an external database. When you run the setup command without additional parameters, the Unravel managed PostgreSQL database is used. Otherwise, you can specify any of the following Unravel-supported databases with the setup command:
MySQL (Unravel managed as well as external MySQL database)
MariaDB (Unravel managed as well as external MariaDB database)
PostgreSQL (External PostgreSQL)
Refer to Integrate database for details.
Specify a path for the data directory other than the default path.
The Unravel data and configurations are located in the data directory. By default, the installer maintains the data directory under <unravel_installation_directory>/unravel/data. You can change the data directory's default location by passing additional parameters with the setup command.
Use more options for setup.
To install Unravel with the setup command, do the following:
After deploying the binaries, if you are the root user, switch to the Unravel user.
su - <unravel user>
Notice
Only the Unravel user who owns the installation directory should run the setup command to install Unravel.
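Because only the user who owns the installation directory should run setup, a wrapper script can check ownership before it calls setup. This is a minimal sketch, assuming `find` supports `-maxdepth` and `-user` (GNU and BSD find both do); `require_owner` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: refuse to continue unless the current user owns the Unravel directory.

# require_owner DIR: return 0 if the current user owns DIR, 1 otherwise.
require_owner() {
  local dir="$1" me
  me=$(id -un)
  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 1
  fi
  # find prints DIR only if it is owned by $me; -maxdepth 0 keeps it from descending.
  if [ -z "$(find "$dir" -maxdepth 0 -user "$me" 2>/dev/null)" ]; then
    echo "$dir is not owned by $me; switch with: su - <unravel user>" >&2
    return 1
  fi
}
```

For example: `require_owner /opt/unravel && /opt/unravel/versions/<Unravel version>/setup --enable-databricks`.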
Run the setup command with one of the following databases (PostgreSQL, MySQL, MariaDB). Refer to setup options for all the additional parameters that you can pass to the setup command.
Tip
Run the setup command with --help, alone or together with other options, for complete usage details.
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --help
Before running the setup command with any database other than the Unravel managed PostgreSQL shipped with the product, refer to the Integrate database topic and complete the prerequisites. Extra parameters must be passed with the setup command when you use another database.
Optionally, to use a different data directory, pass the --data-directory parameter with the setup command as shown below:
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --data-directory /the/data/directory
Similarly, you can configure separate locations for other Unravel directories. Contact Unravel Support for assistance.
Unravel managed PostgreSQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks
Notice
If you are using the Unravel managed PostgreSQL database and the Hive metastore uses MySQL, refer to Set up Unravel Managed PostgreSQL for Hive metastore with MySQL and Integrate database (Cloud).
External PostgreSQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --external-database postgresql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
For example: /opt/unravel/versions/abcd.992/setup --enable-databricks --external-database postgresql xyz.unraveldata.com 5432 unravel_db_prod unravel unraveldata
Note
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD fields are optional; you are prompted for any that are missing.
Unravel managed MySQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/mysql
External MySQL
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/<MySQL-directory> --external-database mysql <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
Note
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD fields are optional; you are prompted for any that are missing.
Unravel managed MariaDB
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/mariadb
External MariaDB
<unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --extra /tmp/<MariaDB-directory> --external-database mariadb <HOST> <PORT> <SCHEMA> <USERNAME> <PASSWORD>
Note
The HOST, PORT, SCHEMA, USERNAME, and PASSWORD fields are optional; you are prompted for any that are missing.
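The external database variants above differ only in the database type and, for MySQL and MariaDB, the --extra package directory. This minimal sketch assembles the command from environment variables so that credentials are not typed on the command line by hand. `DB_HOST`, `DB_PORT`, `DB_SCHEMA`, `DB_USER`, `DB_PASSWORD`, and `run_setup_external_db` are hypothetical names chosen for this example. Because the connection values are positional, the sketch stops at the first unset value and relies on setup prompting for the rest, as the notes above describe. Any value that is passed (including the password) still appears in the process list while setup runs.

```shell
#!/usr/bin/env bash
# Sketch: run Unravel setup for Databricks with an external database.

# run_setup_external_db SETUP_PATH TYPE [EXTRA_DIR]
#   TYPE is postgresql, mysql, or mariadb; EXTRA_DIR is required for mysql/mariadb.
# Connection values come from the hypothetical variables
# DB_HOST, DB_PORT, DB_SCHEMA, DB_USER, DB_PASSWORD.
run_setup_external_db() {
  local setup="$1" type="$2" extra="$3"
  local -a args=(--enable-databricks)
  local value

  case "$type" in
    postgresql) ;;
    mysql|mariadb)
      if [ -z "$extra" ]; then
        echo "$type needs the extra packages directory (JDBC driver)" >&2
        return 2
      fi
      args+=(--extra "$extra")
      ;;
    *)
      echo "unsupported database type: $type" >&2
      return 2
      ;;
  esac

  args+=(--external-database "$type")
  # Positional values: stop at the first unset one and let setup prompt for the rest.
  for value in "${DB_HOST:-}" "${DB_PORT:-}" "${DB_SCHEMA:-}" "${DB_USER:-}" "${DB_PASSWORD:-}"; do
    [ -n "$value" ] || break
    args+=("$value")
  done

  "$setup" "${args[@]}"
}
```

For example: `DB_HOST=<HOST> DB_PORT=5432 run_setup_external_db /opt/unravel/versions/<Unravel version>/setup postgresql` passes the host and port and leaves setup to prompt for the schema, username, and password.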
When you run the setup command, the Precheck utility, which identifies the issues that prevent a successful installation, is automatically run. Refer to Precheck filters list to view details of each item in the precheck run output.
The Precheck output displays the issues that prevent a successful installation and provides suggestions to resolve them. You must resolve each of the issues before proceeding. After you resolve them, log in again or reload the shell, and then run the setup command again.
Note
In certain situations, you can skip the precheck by running setup with --skip-precheck.
For example:
/opt/unravel/versions/<Unravel version>/setup --cluster-access abc1011.p2g.net.eu.xyz --skip-precheck
You can also skip only the checks that you expect to fail. For example, to skip the Check limits and Network ports checks, run the setup command as follows:
setup --filter-precheck ~check_limits,~check_network_ports
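In the example above, the --filter-precheck value is a comma-separated list in which a leading ~ excludes a check. If you skip checks from a script, a small helper keeps that syntax consistent. This sketch infers the syntax from that single example only, so treat other forms as unverified; `skip_filter` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: build a --filter-precheck value that skips the named checks.

# skip_filter CHECK...: print "~check1,~check2,..." for the given check names.
skip_filter() {
  local out="" name
  for name in "$@"; do
    # ${out:+,} adds a comma separator only after the first entry.
    out+="${out:+,}~${name}"
  done
  printf '%s\n' "$out"
}
```

For example: `setup --filter-precheck "$(skip_filter check_limits check_network_ports)"` expands to the command shown above.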
Precheck filters
System
Filters | Description | Expected value |
---|---|---|
Check uptime | Verifies the period since the last server reboot. | >24h |
Clock sync | Verifies that the clock synchronization service is running on the server. | The clock synchronization service is up and running. |
CPU requirement | Verifies that the server has enough CPUs to run Unravel efficiently. | Check requirements. |
Memory requirement | Verifies that the server has enough memory to run Unravel efficiently. | Check requirements. |
Disk access | Verifies that the user who runs Unravel has access to the configured disk locations. | The Unravel user can access the configured disk locations. |
Disk freespace | Verifies that the disk locations have enough free space. | Check requirements. |
Kerberos tools | Verifies that the Kerberos tools are available on the server to support kerberized environments. | Kerberos tools are installed. |
Network ports | Verifies that the network ports used by Unravel are available. | Check requirements. |
OS libraries | Verifies the required libraries if run with Unravel managed MySQL. | The following packages must be installed to fulfill the OS-level requirements for MySQL: numactl-libs (for libnuma.so) and libaio (for libaio.so). |
OS release | Verifies that the OS distribution is supported. | Check compatibility matrix. |
OS settings | Verifies that vm.max_map_count is set to the recommended value. | Check requirements. |
SELinux | Verifies whether SELinux is enabled and reports its mode (Permissive, Disabled, Enforcing). | Check product documentation. |
Check limits | Verifies that user limits are set to the required values. | Check requirements. |
Healthcheck report bundle | Healthcheck report tarball. This report provides the summary and information gathered by the healthcheck, along with its location. | |
Hadoop
Filters | Description | Expected value |
---|---|---|
Clients | Ensure that the following Hadoop clients are installed and configured on the server: Apache Hadoop, Hadoop Distributed File System (HDFS), Apache Hadoop YARN, Apache Hive, and Apache Beeline. You can ignore the following Precheck limitations on MapR: the Hadoop client check reports missing clients (HDFS and Beeline), and any check that depends on the HDFS client (for example, the HDFS Access check) reports the message "HADH0070: hdfs client is not available". Ensure that Unravel has access to files in the MapR file system; otherwise, provide access manually. | Check compatibility matrix. |
Distribution | Verifies that the Hadoop distribution is a supported version. | Check compatibility matrix. |
RM HA Enabled/Disabled | Verifies whether the ResourceManager (RM) is running in HA mode. | |
setup options | Description |
---|---|
-h, --help | Shows help for setup. |
--config CONFIG | Specify a different path to the configuration file. For example: <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --config path/to/config/directory |
--enable-core | Enables core node support for non-Hadoop clusters. |
--cluster-access | (Edge node parameter) Enables cluster access to the core node in a multi-cluster environment. |
--data-forwarder host:port cluster-type cluster-id | Data forwarder, main Unravel node. |
--data-directory | Specify a different path to the data directory. |
--external-database [param [param ...]] | Enable an external database. |
--external-database-ssl | Enable an external database with SSL. |
--log-file | Setup log file location. Default is /tmp/unravel-setup-YYYYMMDD-HHMMSS.log. |
--extra DIR, -e DIR | Specify the extra packages location. |
--precheck | Run the preinstallation check. |
Following is a sample of the setup command run result:
/opt/unravel/versions/abcd.1004/setup
2021-04-05 15:51:30 Sending logs to: /tmp/unravel-setup-20210405-155130.log
2021-04-05 15:51:30 Running preinstallation check...
2021-04-05 15:51:31 Gathering information ................. Ok
2021-04-05 15:51:51 Running checks .................. Ok
--------------------------------------------------------------------------------
system
Check limits : PASSED
Clock sync : PASSED
CPU requirement : PASSED, Available cores: 8 cores
Disk access : PASSED, /opt/unravel/versions/abcd.1004/healthcheck/healthcheck/plugins/system is writable
Disk freespace : PASSED, 229 GB of free disk space is available for precheck dir.
Kerberos tools : PASSED
Memory requirement : PASSED, Available memory: 79 GB
Network ports : PASSED
OS libraries : PASSED
OS release : PASSED, OS release version: centos 7.6
OS settings : PASSED
SELinux : PASSED
--------------------------------------------------------------------------------
hadoop
Clients : PASSED
 - Found hadoop
 - Found hdfs
 - Found yarn
 - Found hive
 - Found beeline
Distribution : PASSED, found CDH 6.3.3
RM HA Enabled/Disabled : PASSED, Disabled
Healthcheck report bundle: /tmp/healthcheck-20210405155130-xyz.unraveldata.com.tar.gz
2021-04-05 15:51:53 Prepare to install with: /opt/unravel/versions/abcd.1004/installer/installer/../installer/conf/presets/default.yaml
2021-04-05 15:51:57 Sending logs to: /opt/unravel/logs/setup.log
2021-04-05 15:51:57 Instantiating templates .................... Ok
2021-04-05 15:52:05 Creating parcels .................................... Ok
2021-04-05 15:52:20 Installing sensors file ............................ Ok
2021-04-05 15:52:20 Installing pgsql connector ... Ok
2021-04-05 15:52:22 Starting service monitor ... Ok
2021-04-05 15:52:27 Request start for elasticsearch_1 .... Ok
2021-04-05 15:52:27 Waiting for elasticsearch_1 for 120 sec ......... Ok
2021-04-05 15:52:35 Request start for zookeeper .... Ok
2021-04-05 15:52:35 Request start for kafka .... Ok
2021-04-05 15:52:35 Waiting for kafka for 120 sec ...... Ok
2021-04-05 15:52:37 Waiting for kafka to be alive for 120 sec ..... Ok
2021-04-05 15:52:42 Initializing pgsql ... Ok
2021-04-05 15:52:46 Request start for pgsql .... Ok
2021-04-05 15:52:46 Waiting for pgsql for 120 sec ..... Ok
2021-04-05 15:52:47 Creating database schema ................. Ok
2021-04-05 15:52:50 Generating hashes .... Ok
2021-04-05 15:52:52 Loading elasticsearch templates ............ Ok
2021-04-05 15:52:55 Creating kafka topics .................... Ok
2021-04-05 15:53:36 Creating schema objects .................... Ok
2021-04-05 15:54:03 Request stop ....................................................... Ok
2021-04-05 15:54:16 Done
[unravel@xyz ~]$
Apply the changes.
<unravel_installation_directory>/unravel/manager config apply
<unravel_installation_directory>/unravel/manager refresh databricks
Start all the services.
<unravel_installation_directory>/unravel/manager start
Check the status of services.
<unravel_installation_directory>/unravel/manager report
The following service statuses are reported:
OK: Service is up and running.
Not Monitored: Service is not running. (Has stopped or has failed to start)
Initializing: Services are starting up.
Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.
You can also get the status and information for a specific service. Run the manager report command as follows:
<unravel_installation_directory>/unravel/manager report <service>
For example: /opt/unravel/manager report auto_action
5. Configure Unravel Log Receiver
Stop Unravel.
<unravel_installation_directory>/unravel/manager stop
Review and update the Unravel Log Receiver (LR) endpoint. By default, it is set to the local FQDN, which is only visible to workspaces within the same network. If your workspaces are not in the same network, run the following command to set the LR endpoint, and press ENTER:
<unravel_installation_directory>/unravel/manager config databricks set-lr-endpoint <hostname> <port>
For example: /opt/unravel/manager config databricks set-lr-endpoint <hostname> <port>
Note
If you do not enter a value for <port>, the default port 4043 is used when SSL is not enabled, and port 4443 is used when SSL is enabled.
Apply the changes.
<unravel_installation_directory>/unravel/manager config apply
<unravel_installation_directory>/unravel/manager refresh databricks
Start all the services.
<unravel_installation_directory>/unravel/manager start
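The default port in the note above depends only on whether SSL is enabled. This minimal sketch picks the port explicitly and then runs the documented stop, set-lr-endpoint, apply, refresh, and start sequence. It assumes the first argument is <unravel_installation_directory>/unravel. `set_lr_endpoint` and `lr_default_port` are hypothetical helpers, and passing the port explicitly is a design choice, not a requirement. If set-lr-endpoint asks for confirmation, answer it interactively.

```shell
#!/usr/bin/env bash
# Sketch: set the Unravel Log Receiver endpoint for Databricks.

# lr_default_port SSL_ENABLED: 4443 when SSL is enabled (true/yes/1), else 4043.
lr_default_port() {
  case "$1" in
    true|yes|1) echo 4443 ;;
    *) echo 4043 ;;
  esac
}

# set_lr_endpoint UNRAVEL_HOME HOSTNAME SSL_ENABLED [PORT]
set_lr_endpoint() {
  local home="$1" host="$2" ssl="$3" port="${4:-}"
  [ -n "$port" ] || port=$(lr_default_port "$ssl")
  # Same order as the steps above; stop at the first failure.
  "$home/manager" stop &&
    "$home/manager" config databricks set-lr-endpoint "$host" "$port" &&
    "$home/manager" config apply &&
    "$home/manager" refresh databricks &&
    "$home/manager" start
}
```

For example: `set_lr_endpoint /opt/unravel <hostname> false` sets the endpoint to port 4043.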
6. Connect Databricks cluster to Unravel
Run the following steps to connect the Databricks cluster to Unravel.
Register workspace in Unravel.
Sign in to Unravel UI, and from the upper right, click > Workspaces. The Workspaces Manager page is displayed.
Click Add Workspace and enter the following details.
Field | Description |
---|---|
Workspace Id | Databricks workspace ID, which can be found in the Databricks URL. The number shown after o= in the Databricks URL is the workspace ID. For example, in the URL https://<databricks-instance>/?o=987654321123456, the workspace ID is 987654321123456. |
Workspace Name | Databricks workspace name: a human-readable name for the workspace. For example, ACME-Workspace. |
Instance (Region) URL | Regional URL where the Databricks workspace is deployed. Specify the complete URL. For example: https://dbc-1dbx661f-a33e.cloud.databricks.com |
Tier | Select a subscription option: Standard or Premium. For Databricks Azure, you can get the pricing information from the Azure portal. For Databricks AWS, you can get detailed information about pricing tiers from Databricks AWS pricing. |
Token | Use a personal access token, instead of a password, to authenticate securely to the Databricks REST APIs. You can generate the token from the workspace (go to Settings > User Settings > Access Token > Generate New Token). See Authentication using Databricks personal access tokens to create personal access tokens. |
Note
Users with admin or non-admin roles can create personal access tokens.
Note
After you click Add, it takes around 2-3 minutes to register the Databricks Workspace with Unravel.
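If you have the workspace URL at hand, you can extract the workspace ID from it instead of copying the digits by hand. A minimal sketch, assuming the URL contains an o= query parameter as in the Workspace Id example above; extract_workspace_id is a hypothetical helper name.

```shell
# Sketch: print the workspace ID (the digits after "o=") from a Databricks URL.
# Prints nothing if the URL has no o= parameter.
extract_workspace_id() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]o=\([0-9][0-9]*\).*/\1/p'
}

# Example: extract_workspace_id 'https://<databricks-instance>/?o=987654321123456'
# prints 987654321123456
```

The sed pattern accepts o= either as the first query parameter (after ?) or a later one (after &), and ignores anything that follows the digits, such as a #fragment.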
Add Unravel configuration to Databricks clusters using any of the following options:
Global init script
Global init script applies the Unravel configurations to all clusters in a workspace. To set up the Unravel configuration as a global init script, do the following:
If you are upgrading from a previous version of Unravel, you must remove all the existing scripts, such as unravel_cluster_init.sh and unravel_spark_init.sh.
On Databricks, go to Workspace > Settings > Admin Console > Global init scripts.
Click +Add and set the following:
Name: Enter unravel_init as the name.
Script: Copy and paste the following content in the Script box:
#!/bin/bash
#
# Runs Unravel Init scripts
# Wait for the /dbfs mount: up to 20 retries, 0.1 s apart.
COUNTER=1
while [ ! -d "/dbfs" ] && [ $COUNTER -le 20 ]; do
    echo "$(date) Waiting for dbfs mount: RetryCount = ${COUNTER} ....."
    ((COUNTER++))
    sleep 0.1
done
UD_ROOT=/dbfs/databricks/unravel
CLUSTER_INIT=${UD_ROOT}/unravel-db-sensor-archive/dbin/unravel_cluster_init.sh
SPARK_INIT=${UD_ROOT}/unravel-db-sensor-archive/dbin/unravel_spark_init.sh
# Copy each Unravel init script to /tmp and run it; exit quietly if it is missing.
if [ ! -f "${CLUSTER_INIT}" ]; then
    echo "Unravel Cluster Init ${CLUSTER_INIT} doesn't exist!"
    exit 0
else
    cp "${CLUSTER_INIT}" /tmp/
    chmod a+x /tmp/unravel_cluster_init.sh
    /tmp/unravel_cluster_init.sh
fi
if [ ! -f "${SPARK_INIT}" ]; then
    echo "Unravel Spark Init ${SPARK_INIT} doesn't exist!"
    exit 0
else
    cp "${SPARK_INIT}" /tmp/
    chmod a+x /tmp/unravel_spark_init.sh
    /tmp/unravel_spark_init.sh
fi
Note
Unravel supports Databricks version 11.3 and earlier. To include newer versions, set the environment variable DATABRICKS_RUNTIME_VERSION at the top of this script.
Enabled: Turn on the Enable toggle.
Click Add to save the settings.
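If you prefer to automate this step instead of pasting the script in the UI, the Databricks Global Init Scripts API (POST /api/2.0/global-init-scripts) accepts the script as Base64-encoded content. The sketch below only builds the request body. It assumes the script above is saved locally in a file (unravel_init.sh is a hypothetical file name) and that the name contains no characters requiring JSON escaping; build_global_init_payload and DATABRICKS_TOKEN are illustrative names.

```shell
# Sketch: build the JSON body for the Databricks Global Init Scripts API
# (POST /api/2.0/global-init-scripts). Assumes the name needs no JSON escaping.
build_global_init_payload() {
  name="$1"         # for example, unravel_init
  script_file="$2"  # local copy of the init script shown above
  # The API expects the script content Base64-encoded; drop line breaks
  # so the encoded value stays on a single line.
  script_b64="$(base64 < "$script_file" | tr -d '\n')"
  printf '{"name": "%s", "script": "%s", "enabled": true}\n' "$name" "$script_b64"
}

# Example request (<databricks-instance> and DATABRICKS_TOKEN are placeholders):
# curl -X POST "https://<databricks-instance>/api/2.0/global-init-scripts" \
#   -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "$(build_global_init_payload unravel_init unravel_init.sh)"
```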
Note
Cluster logging should be enabled at the cluster level. See Logging in Cluster init script for instructions.
Important
When you upgrade from an Unravel version below v4.7.5.0, you must disable or remove all the previously set up global init scripts (unravel_cluster_init, unravel_spark_init).
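To find the legacy scripts mentioned above, you can filter the list of global init script names. A minimal sketch, assuming the names arrive one per line on standard input. The example pipeline in the comments further assumes that curl and jq are installed and that the response from GET /api/2.0/global-init-scripts contains a scripts array whose entries have a name field; find_legacy_unravel_scripts is a hypothetical helper name.

```shell
# Sketch: read global init script names (one per line) on stdin and print the
# legacy Unravel scripts that should be disabled or removed before upgrading.
find_legacy_unravel_scripts() {
  grep -E '^(unravel_cluster_init|unravel_spark_init)(\.sh)?$'
}

# Example (assumes curl and jq; <databricks-instance> and DATABRICKS_TOKEN are placeholders):
# curl -s -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
#   "https://<databricks-instance>/api/2.0/global-init-scripts" \
#   | jq -r '.scripts[]?.name' | find_legacy_unravel_scripts
```

The match is anchored at both ends, so the new unravel_init script is not reported.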
Cluster init script
Cluster init script applies the Unravel configurations at the cluster level. You can set it up using one of the following options:
On Databricks, open a cluster and go to Advanced Options.
Edit the following settings:
Item | Settings |
---|---|
Logging | Set Destination to DBFS, and copy and paste the following path in Cluster Log Path: dbfs:/cluster-logs |
Init Scripts | Set Destination to DBFS, copy and paste the following path in Init Script Path, and then click Add: dbfs:/databricks/unravel/unravel-db-sensor-archive/dbin/install-unravel.sh |
Note
Cluster logging should be enabled at the cluster level. See Logging in Cluster init script for instructions.
Add Unravel configurations to job clusters using the API. Apply the following JSON format:
{
  "settings": {
    "new_cluster": {
      "init_scripts": [
        {
          "dbfs": {
            "destination": "dbfs:/databricks/unravel/unravel-db-sensor-archive/dbin/install-unravel.sh"
          }
        },
        ...
      ],
      "cluster_log_conf": {
        "dbfs": {
          "destination": "dbfs:/cluster-logs"
        }
      },
      ...
    },
    ...
  }
}
Note
Cluster logging should be enabled at the cluster level. See Logging in Cluster init script for instructions.
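Before you submit a job definition, a quick text check can confirm that it includes both the Unravel init script and a cluster log configuration. This is a minimal sketch based on a plain grep rather than a JSON parser, so it only reports whether the strings are present, not whether they sit in the right place; check_unravel_job_settings is a hypothetical name.

```shell
# Sketch: check that a job settings JSON file attaches the Unravel init script
# and enables cluster logging. Plain text search, not a JSON parser.
check_unravel_job_settings() {
  settings_file="$1"
  status=0
  if ! grep -q 'unravel-db-sensor-archive/dbin/install-unravel.sh' "$settings_file"; then
    echo "missing: Unravel init script (install-unravel.sh)"
    status=1
  fi
  if ! grep -q '"cluster_log_conf"' "$settings_file"; then
    echo "missing: cluster_log_conf (cluster logging is not enabled)"
    status=1
  fi
  return "$status"
}

# Example: check_unravel_job_settings job.json && echo "job settings look ready"
```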
Set additional configurations if required.
Configure the workspace for the Data page.
Ensure that at least one of the workspaces is populated before you configure a workspace for the Data page.
To configure the Databricks for Data page, do the following:
Stop Unravel.
<unravel_installation_directory>/unravel/manager stop
Set the following property.
<unravel_installation_directory>/unravel/manager config properties set hive.metastore.<X>.workspace.ids <Comma-separated list of Databricks workspace IDs>
Replace <X> with the metastore variables listed in the com.unraveldata.hive.metastore.list property. Refer here for more details about this property.
Apply the changes.
<unravel_installation_directory>/unravel/manager config apply
Start Unravel.
<unravel_installation_directory>/unravel/manager start
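The Data page steps above can be combined into one sequence. A hedged sketch, assuming the installation directory is /opt and using the property name exactly as written above; the metastore variable ms1 and the workspace IDs in the example are placeholders, and configure_data_page, join_ids, run, UNRAVEL_INSTALL_DIR, and DRY_RUN are hypothetical names.

```shell
# Sketch: set the Data page workspace IDs for one metastore variable.
# Assumptions: Unravel is installed under /opt; the property name follows
# hive.metastore.<X>.workspace.ids as documented above.
UNRAVEL_INSTALL_DIR="${UNRAVEL_INSTALL_DIR:-/opt}"
MANAGER="${UNRAVEL_INSTALL_DIR}/unravel/manager"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Join the arguments with commas, for example: 111 222 -> 111,222
join_ids() {
  (IFS=,; echo "$*")
}

configure_data_page() {
  metastore="$1"; shift      # one of the variables in com.unraveldata.hive.metastore.list
  ids="$(join_ids "$@")"     # remaining arguments are workspace IDs
  run "$MANAGER" stop || return 1
  run "$MANAGER" config properties set "hive.metastore.${metastore}.workspace.ids" "$ids" || return 1
  run "$MANAGER" config apply || return 1
  run "$MANAGER" start
}

# Example (placeholders): configure_data_page ms1 111 222
```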
Optionally, at this point, you can run healthcheck to verify that all the configurations and services are running successfully.
<unravel_installation_directory>/unravel/manager healthcheck
Healthcheck runs automatically every hour in the backend. You can set the healthcheck interval and configure email alerts to receive the healthcheck reports.
Tip
The workspace setup can be done anytime and does not impact the running clusters or jobs.
For more information, refer to the Databricks FAQ.