Amazon Web Services (AWS) Databricks

Before installing Unravel for AWS Databricks, ensure that the installation requirements are met. Then follow these instructions to install and configure Unravel:

1. Create an EC2 instance and connect Databricks to the Unravel VM
  1. On your AWS Console, go to the EC2 dashboard and click Launch Instance.

  2. Select the following options based on Unravel's instance requirements:

    • Base OS

    • Instance type and size

    • Ports

    • Networking

      The EC2 instance must be in the same region as the target EMR clusters that the Unravel EC2 node will monitor.

    • Security groups or policies

      • Create a security group that allows inbound traffic on ports 3000 and 4043 from the IP addresses of the EMR cluster nodes, and add the security group used on the EMR cluster to this rule.

      Sample inbound rule

      | Type | Protocol | Port range | Source |
      |---|---|---|---|
      | All traffic | All | All | Security group ID of this group or subnet IP block. For example, 10.10.0.0/16 |
      | SSH | TCP | 22 | 0.0.0.0/0 or trusted public IP for SSH access |
      | Custom TCP Rule | TCP | 443 | Security group ID used on the EMR cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
      | Custom TCP Rule | TCP | 3000 | Security group ID used on the EMR cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
      | Custom TCP Rule | TCP | 4043 | Security group ID used on the EMR cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
      | Custom TCP Rule | TCP | 4443 | Security group ID used on the EMR cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
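The custom TCP rows of the sample inbound rule can be sketched as AWS CLI calls. This is a minimal sketch, not part of the Unravel tooling: the helper name print_ingress_commands and both security group IDs are hypothetical placeholders, and the helper only prints the commands so that they can be reviewed before anything is changed. The "All traffic" and SSH rows are left out on purpose.

```shell
# Hypothetical helper: prints (does not run) one AWS CLI command per port
# listed in the custom TCP rows of the sample inbound rule.
print_ingress_commands() {
  local unravel_sg
  local source_sg
  local port
  unravel_sg="$1"   # security group attached to the Unravel EC2 instance (placeholder)
  source_sg="$2"    # security group used on the cluster side (placeholder)
  for port in 443 3000 4043 4443; do
    echo "aws ec2 authorize-security-group-ingress --group-id ${unravel_sg} --protocol tcp --port ${port} --source-group ${source_sg}"
  done
}

# Example with placeholder IDs; replace them with your own before running the output.
print_ingress_commands "sg-0123456789abcdef0" "sg-0fedcba9876543210"
```

If the source is a subnet IP block in a different VPC rather than a security group, the same AWS CLI command takes --cidr in place of --source-group.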

Review the Virtual Private Cloud (VPC) Peering options to connect Databricks with the Unravel VM.

| Workspace | VPC Peering Options |
|---|---|
| Workspace and Unravel VM are in the same VPC | - |
| Workspace VPC is in a different Region | Use VPC Peering: |
| Workspace VPC is in a different AWS account | Use VPC Peering: |
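For the two peering cases, a peering request could be sketched with the AWS CLI as follows. This is an illustration under assumptions: the helper name print_peering_command and all VPC IDs, the region, and the account ID are hypothetical, and the helper only prints the command. Accepting the request and updating route tables are separate steps that are not shown.

```shell
# Hypothetical helper: prints the AWS CLI call that requests a VPC peering connection.
# Arguments: requester VPC ID, peer VPC ID, optional peer region, optional peer account ID.
print_peering_command() {
  local vpc_id
  local peer_vpc_id
  local peer_region
  local peer_owner_id
  local cmd
  vpc_id="$1"
  peer_vpc_id="$2"
  peer_region="${3:-}"
  peer_owner_id="${4:-}"
  cmd="aws ec2 create-vpc-peering-connection --vpc-id ${vpc_id} --peer-vpc-id ${peer_vpc_id}"
  # Add the region flag only when the workspace VPC is in a different Region.
  if [ -n "${peer_region}" ]; then cmd="${cmd} --peer-region ${peer_region}"; fi
  # Add the owner flag only when the workspace VPC is in a different AWS account.
  if [ -n "${peer_owner_id}" ]; then cmd="${cmd} --peer-owner-id ${peer_owner_id}"; fi
  echo "${cmd}"
}

# Workspace VPC in a different Region (placeholder values).
print_peering_command "vpc-unravel" "vpc-workspace" "us-west-2"
# Workspace VPC in a different AWS account (placeholder values).
print_peering_command "vpc-unravel" "vpc-workspace" "" "123456789012"
```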

2. Download Unravel
3. Deploy Unravel
4. Install Unravel

You can install Unravel either with Interactive Precheck or manually without Interactive Precheck.

Note

Unravel recommends installation with Interactive Precheck.

To install Unravel manually, run the setup command.

The setup command does the following:

  • Runs Precheck automatically to detect possible issues that prevent a successful installation. Suggestions are provided to resolve issues. Refer to Precheck filters for the expected value for each filter.

  • Lets you pass extra parameters to integrate the database of your choice.

    The setup command allows you to use a managed database shipped with Unravel or an external database. When you run the setup command without any additional parameters, the Unravel-managed PostgreSQL database is used. Otherwise, you can specify any of the following Unravel-supported databases with the setup command:

    • MySQL (Unravel managed as well as external MySQL database)

    • MariaDB (Unravel managed as well as external MariaDB database)

    • PostgreSQL (External PostgreSQL)

    Refer to Integrate database for details.

  • Lets you specify a path for the data directory other than the default path.

    The Unravel data and configurations are located in the data directory. By default, the installer maintains the data directory under <Unravel installation directory>/data. You can change the data directory's default location by passing additional parameters to the setup command.

  • Provides more options for setup.

To install Unravel with the setup command, do the following:

  1. After deploying the binaries, if you are the root user, switch to the Unravel user.

      su - <unravel user>

    Notice

    Only the Unravel user who owns the installation directory should run the setup command to install Unravel.

  2. Run the setup command with any of the supported databases (PostgreSQL, MySQL, or MariaDB). Refer to setup options for all the additional parameters that you can pass to the setup command.

    Tip

    Run the setup command with --help, alone or together with any combination of setup parameters, for complete usage details.

    <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --help
    

    Before running the setup command with any database other than the Unravel-managed PostgreSQL that is shipped with the product, refer to the Integrate database topic and complete the prerequisites. Extra parameters must be passed to the setup command when you use another database.

    Optionally, if you want to provide a different data directory, you can pass an extra parameter (--data-directory) with the setup command as shown below:

    <unravel_installation_directory>/unravel/versions/<Unravel version>/setup --enable-databricks --data-directory /the/data/directory

    Similarly, you can configure separate locations for other Unravel directories. Contact support for assistance.

    When you run the setup command, the Precheck utility, which identifies the issues that prevent a successful installation, is automatically run. Refer to Precheck filters list to view details of each item in the precheck run output.
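The way the setup command line is assembled from the installation directory, the version, and the optional data directory can be sketched as follows. The helper name build_setup_command is hypothetical, x.y.z stands in for your Unravel version, and the sketch assumes that --enable-databricks is used on its own when no data directory is given. The helper only prints the command; it does not run setup.

```shell
# Hypothetical helper: assembles the setup command line from its parts.
# Arguments: installation directory, Unravel version, optional data directory.
build_setup_command() {
  local install_dir
  local version
  local data_dir
  local cmd
  install_dir="$1"
  version="$2"
  data_dir="${3:-}"
  cmd="${install_dir}/unravel/versions/${version}/setup --enable-databricks"
  # Append --data-directory only when a non-default location is requested.
  if [ -n "${data_dir}" ]; then cmd="${cmd} --data-directory ${data_dir}"; fi
  echo "${cmd}"
}

# x.y.z is a placeholder for the installed Unravel version.
build_setup_command "/opt" "x.y.z" "/the/data/directory"
```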

  3. Review and update the Unravel Log Receiver (LR) endpoint. By default, it is set to the local FQDN, which is visible only to workspaces within the same network. If your workspaces are outside that network, run the following to set the LR endpoint:

    <unravel_installation_directory>/unravel/manager config databricks set-lr-endpoint <hostname> ''
    
    ## For example: 
    /opt/unravel/manager config databricks set-lr-endpoint <hostname> ''
    

    After you run this command, you are prompted for the port number. Press ENTER to leave it empty.

  4. Apply the changes.

    <unravel_installation_directory>/unravel/manager config apply
    <unravel_installation_directory>/unravel/manager refresh databricks
    
  5. Start all the services.

    <unravel_installation_directory>/unravel/manager start 
    
  6. Check the status of services.

    <unravel_installation_directory>/unravel/manager report 
    

    The following service statuses are reported:

    • OK: Service is up and running.

    • Not Monitored: Service is not running. (It has stopped or has failed to start.)

    • Initializing: Services are starting up.

    • Does not exist: The process unexpectedly disappeared. A restart will be attempted ten times.

    You can also get the status and information for a specific service. Run the manager report command as follows:

    <unravel_installation_directory>/unravel/manager report <service> 
    ## For example: /opt/unravel/manager report auto_action
    
  1. Register workspace in Unravel.

    1. Sign in to the Unravel UI and, from the upper right, click Manage > Workspaces. The Workspaces Manager page is displayed.

    2. Click Add Workspace and enter the following details.

      | Field | Description |
      |---|---|
      | Workspace Id | Databricks workspace ID, which can be found in the Databricks URL. |
      | Workspace Name | Databricks workspace name, which can be found in the Databricks URL. |
      | Instance (Region) URL | Regional URL where the Databricks workspace is deployed. |
      | Tier | Select a subscription option: Standard or Premium. |
      | Token | Personal access token to access Databricks REST APIs. Refer to Authentication using Databricks personal access tokens to create personal access tokens. |

      Note

      Personal access tokens can be created with admin or non-admin roles.

      Note

      After you click the Add button, it will take around 2-3 minutes to register the Databricks Workspace with Unravel.
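As an aid for the Workspace Id field, the ID can be pulled out of a workspace URL with a small shell function. This is a sketch under the assumption that the URL carries the workspace ID in an o= query parameter; the function name workspace_id_from_url and the example URL are hypothetical.

```shell
# Hypothetical helper: prints the numeric value of the o= query parameter
# from a Databricks URL, or nothing if the parameter is absent.
workspace_id_from_url() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]o=\([0-9][0-9]*\).*/\1/p'
}

# Example with a made-up workspace URL.
workspace_id_from_url "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=1234567890123456#job/list"
```

If the function prints nothing, the URL does not contain an o= parameter and the ID has to be looked up another way.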

  2. Add Unravel configuration to Databricks clusters using any of the following options:

    • Global init script

      Global init script applies the Unravel configurations at Workspace level. To set up global init scripts, open this file and follow the instructions.

    • Cluster init script

      Cluster init script applies the Unravel configurations at cluster level. To set up cluster init scripts from the cluster UI, do the following:

      1. Go to Unravel UI, click Manage > Workspaces > Cluster configuration to get the configuration details.

      2. Follow the instructions and update each cluster (Automated/Interactive) that you want to monitor with Unravel.

        Tip

        By default, the Ganglia metrics are enabled with Dcom.unraveldata.agent.metrics.ganglia_enabled property set to true.

        Note

        To add Unravel configurations to job clusters via API, refer to How to setup cluster init scripts via cluster API.

  3. Set additional configurations if required.

  4. Configure the Workspace for Data page.

  5. Optionally, run healthcheck at this point to verify that the configurations are correct and the services are running successfully.

    <unravel_installation_directory>/unravel/manager healthcheck
    

    Healthcheck is run automatically on an hourly basis in the backend. You can set the healthcheck intervals and email alerts to receive the healthcheck reports.
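The non-interactive manager commands from the steps above can be chained into one function that stops at the first failure. This is a minimal sketch: the function name run_post_setup is hypothetical, the manager path is passed in as an argument, and the LR endpoint command is left out because it prompts for input. Services may still be initializing right after start, so the report and healthcheck results may need a second look a few minutes later.

```shell
# Hypothetical wrapper: runs the manager commands in the documented order
# and stops at the first command that exits with a non-zero status.
# Argument: path to the Unravel manager executable.
run_post_setup() {
  local manager
  manager="$1"
  "${manager}" config apply &&
  "${manager}" refresh databricks &&
  "${manager}" start &&
  "${manager}" report &&
  "${manager}" healthcheck
}

# Example call, assuming Unravel is installed under /opt:
# run_post_setup /opt/unravel/manager
```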

Tip

The workspace setup can be done at any time and does not impact the running clusters or jobs.

Validate Databricks workspace setup

Refer to Databricks FAQ.