Unravel on Databricks SaaS Setup Guide

This page provides step-by-step instructions for integrating Unravel with Databricks SaaS.

Databricks Components

Service Principal Creation

A service principal (SPN) is required to authenticate Unravel to the Databricks API. This non-human identity gives Unravel secure, automated, programmatic access for monitoring and data collection, so Unravel can monitor and optimize your Databricks environment without relying on individual user credentials.

To create and configure a service principal:

  1. Create the service principal (SPN):

  2. Configure access via Personal Access Token (PAT):

    1. Go to your workspace, and from the dropdown located in the upper right corner, select Settings.

      DBX-PAT.png
    2. Click the Advanced tab and click the Personal Access Tokens toggle.

    3. Create a Databricks personal access token for your Databricks workspace user.

      1. Go to your workspace, and from the dropdown located in the upper right corner, select Settings.

      2. Click Developer > Access Tokens > Manage. The Access Tokens page is displayed.

        manage.png
      3. Click the Generate New Token button. The new token is generated. Save this token; you need it to register a new Databricks workspace in Unravel. For more details, refer to Authentication using Databricks personal access tokens.

        Generate-new-token.png
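
Once the token is saved, one way to confirm that it authenticates is to call a read-only Databricks REST endpoint with it. The following is a minimal Python sketch, not part of Unravel's tooling. The instance URL and token are placeholders you supply, the function names are hypothetical, and `GET /api/2.0/clusters/list` is used only as a convenient read-only call.

```python
import urllib.error
import urllib.request


def build_token_check_request(instance_url: str, token: str) -> urllib.request.Request:
    """Build a read-only request that authenticates with a personal access token."""
    # Databricks REST calls pass the personal access token as a bearer token.
    return urllib.request.Request(
        url=f"{instance_url.rstrip('/')}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_is_accepted(instance_url: str, token: str) -> bool:
    """Return True if the workspace answers the request with HTTP 200."""
    request = build_token_check_request(instance_url, token)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A 401 or 403 response means the token was rejected or lacks access.
        return False
```

Calling `token_is_accepted("<your-instance-url>", "<your-token>")` performs one network request; `build_token_check_request` can be inspected offline.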
Databricks - API Permissions

After creating the service principal, configure essential permissions to ensure Unravel operates with the minimum necessary privileges.

To configure API permissions:

  1. Go to Settings > Workspace Admin in your Databricks workspace.

  2. Under Identity and Access, locate the service principal and click Manage.

    SPN-Manage.png
  3. Under the Configuration tab for the SPN, ensure the following entitlements are enabled:

    Note

    These entitlements are typically granted by default to all users and service principals via the users group. If your organization manages entitlements individually, ensure these are checked for the SPN.

    entitlements.png
    • Workspace access: Grants access to the Data Science & Engineering and persona-based environments.

    • Databricks SQL access: Grants access to Databricks SQL.

  4. Enable Personal Access Tokens in Advanced Settings.

    1. Navigate to Settings > Workspace Admin > Advanced.

    2. Find the Personal Access Tokens section and ensure it is enabled.

      PAT.png
    3. Assign CAN USE Permission for Token Usage to the Service Principal.

      Token_Usage.png
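
The CAN USE assignment in the last step can in principle also be scripted. The sketch below only builds the request description. It assumes the Databricks Permissions API endpoint `PATCH /api/2.0/permissions/authorization/tokens`, and that the service principal is identified by its application ID. Verify both against the Databricks REST API reference before relying on them. The function name and the sample ID are hypothetical.

```python
import json


def build_token_usage_grant(spn_application_id: str) -> dict:
    """Describe the REST call that grants CAN_USE on tokens to a service principal."""
    return {
        # PATCH updates existing permissions rather than replacing them.
        "method": "PATCH",
        "path": "/api/2.0/permissions/authorization/tokens",
        "body": {
            "access_control_list": [
                {
                    "service_principal_name": spn_application_id,
                    "permission_level": "CAN_USE",
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical application ID, for illustration only.
    grant = build_token_usage_grant("00000000-0000-0000-0000-000000000000")
    print(json.dumps(grant, indent=2))
```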
Serverless Function

By default, non-admin tokens, including those used by service principals, do not have access to all users’ clusters and jobs in a Databricks workspace. Additionally, Databricks does not currently provide a built-in method to grant permissions to clusters and jobs that have not yet been created.

To address this, you must deploy a serverless function (AWS Lambda or Azure Function) in your environment. This function should run every minute with an admin token and automatically grant the following permissions to the Unravel service principal for any newly created resources:

  • CAN_ATTACH_TO permission for clusters

  • CAN_VIEW permission for jobs

For assistance with setting up this serverless function, contact Unravel support.
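
To make the behavior of such a function concrete, the following Python sketch shows what one scheduled run might do. It is an illustration under stated assumptions, not the function Unravel support provides. It assumes the Databricks REST endpoints `GET /api/2.0/clusters/list`, `GET /api/2.1/jobs/list`, and `PATCH /api/2.0/permissions/{clusters|jobs}/{id}`, and it identifies the service principal by its application ID. All function names are hypothetical. Pagination, retries, and the Lambda or Azure Function handler wrapper are omitted.

```python
import json
import urllib.request

# Permission levels taken from the list above.
PERMISSION_BY_RESOURCE = {"clusters": "CAN_ATTACH_TO", "jobs": "CAN_VIEW"}


def build_grant(resource_type: str, resource_id: str, spn_application_id: str) -> tuple[str, dict]:
    """Return the API path and JSON body that grant the SPN access to one resource."""
    level = PERMISSION_BY_RESOURCE[resource_type]
    path = f"/api/2.0/permissions/{resource_type}/{resource_id}"
    body = {
        "access_control_list": [
            {"service_principal_name": spn_application_id, "permission_level": level}
        ]
    }
    return path, body


def call_api(instance_url: str, admin_token: str, method: str, path: str, body=None) -> dict:
    """Send one authenticated request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url=instance_url + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read() or b"{}")


def grant_all(instance_url: str, admin_token: str, spn_application_id: str) -> int:
    """One scheduled run: grant access on every cluster and job currently listed."""
    # Only the first page of each listing is read in this sketch.
    clusters = call_api(instance_url, admin_token, "GET", "/api/2.0/clusters/list").get("clusters", [])
    jobs = call_api(instance_url, admin_token, "GET", "/api/2.1/jobs/list").get("jobs", [])
    targets = [("clusters", c["cluster_id"]) for c in clusters]
    targets += [("jobs", str(j["job_id"])) for j in jobs]
    for resource_type, resource_id in targets:
        path, body = build_grant(resource_type, resource_id, spn_application_id)
        call_api(instance_url, admin_token, "PATCH", path, body)
    return len(targets)
```

Re-granting on every run keeps the sketch stateless: PATCH adds the permission where it is missing and leaves existing grants unchanged, so the function does not need to track which resources are new.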

Sensor Deployment

An initialization script deploys a JAR-based Unravel sensor onto all compute instances within the workspace. This deployment is managed automatically via a cluster-scoped policy, so developers do not need to modify their code.

Init Script
  1. Create a folder named Unravel in the workspace directory.

    workspace-directory.png
  2. Assign CAN_MANAGE permissions to the Unravel service principal.

    unravel-permission.png
Cluster-Scoped Policy

Unravel recommends loading the Unravel init script using cluster-scoped init policies. This ensures the sensor is deployed to every compute instance automatically.

Apply the following property to a new or existing cluster-scoped policy:

{
  "clusters": [
    {
      "init_scripts": [
        {
          "workspace": {
            "destination": "$INIT_SCRIPT_PATH"
          }
        }
      ]
    }
  ]
}

Replace $INIT_SCRIPT_PATH with the actual path to your Unravel init script (for example, /Workspace/Unravel/install-unravel.sh).
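
If you generate this property from a script rather than editing it by hand, a small helper can do the substitution. This is a minimal sketch: the function name is hypothetical, and the check that the path is absolute is an added assumption, not a documented Unravel requirement.

```python
import json


def render_init_script_property(init_script_path: str) -> str:
    """Return the property from this section with $INIT_SCRIPT_PATH filled in."""
    # Assumption: workspace paths are absolute, so reject anything else early.
    if not init_script_path.startswith("/"):
        raise ValueError("init script path must be absolute")
    policy_property = {
        "clusters": [
            {"init_scripts": [{"workspace": {"destination": init_script_path}}]}
        ]
    }
    return json.dumps(policy_property, indent=2)


if __name__ == "__main__":
    print(render_init_script_property("/Workspace/Unravel/install-unravel.sh"))
```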

Network Components

By default, Unravel sensors send data to the Unravel data plane using TLS v1.2 encryption. This ensures that all traffic is encrypted while in transit.

If your organization requires that no data traverses the public internet, you can use a PrivateLink endpoint to securely transmit Unravel sensor data. This approach keeps all traffic within your cloud provider’s private network.
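
To check from a host in your network that an endpoint negotiates TLS 1.2 or newer, a sketch along the following lines may help. It uses only the Python standard library. The hostname is something you must supply (this page does not list the Unravel data plane address), the function names are hypothetical, and the context sets TLS 1.2 as a minimum, so a server may negotiate a newer version.

```python
import socket
import ssl


def make_tls12_context() -> ssl.SSLContext:
    """Create a client context that refuses anything older than TLS 1.2."""
    # The default context verifies certificates and hostnames.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443):
    """Connect to host:port and return the negotiated protocol name."""
    with socket.create_connection((host, port), timeout=10) as sock:
        with make_tls12_context().wrap_socket(sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

A failed handshake raises `ssl.SSLError`, which indicates that the endpoint could not agree on TLS 1.2 or newer, or that certificate verification failed.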

Use a PrivateLink Endpoint

A PrivateLink endpoint allows Unravel sensor data to reach the Unravel data plane without any internet exposure.

To set up a PrivateLink endpoint:

  1. Contact Unravel support to request a PrivateLink connection to your endpoint.

  2. Verify the endpoint request in your cloud provider’s console.

  3. Select your Databricks workspace VPC as the destination for the PrivateLink endpoint.

This configuration ensures that all Unravel sensor data is transmitted privately and securely, meeting strict network and compliance requirements.

Unravel Configuration

Register a new Databricks workspace or edit details of an existing Databricks workspace.

  1. Sign in to the Unravel UI and, from the upper right, click the manage icon (manage-icon.png) > Workspaces. The Workspaces Manager page is displayed.

  2. Click Add Workspace. The Add Workspace dialog box is displayed. Enter the following details:

    saas-new-workspace.png

    Field

    Description

    Workspace Id

    The Databricks workspace ID is the number that follows o= in the Databricks URL.

    For example, in the URL https://<databricks-instance>/?o=3205148689792956, the workspace ID is 3205148689792956.

    Workspace-id.png

    Workspace Name

    A name for the Databricks workspace, for example, ACME-Workspace. You can get the workspace name from the Azure portal.

    Instance (Region) URL

    Regional URL where the Databricks workspace is deployed. Specify the complete URL in the format protocol://dns or ip(:port), with no trailing slash. For example, https://eastus.azuredatabricks.net is valid, and https://eastus.azuredatabricks.net/ is invalid.

    You can get the URL from the Azure portal.

    Tier

    Select a subscription option: Standard, Premium, Enterprises, or Dedicated. For Databricks on Azure, you can get the pricing information from the Azure portal. For Databricks on AWS, you can get detailed information about pricing tiers from Databricks AWS pricing.

    Token

    Use the personal access token to secure authentication to the Databricks APIs. You can generate the token from your Databricks workspace.

    Contact Unravel support for help with generating a personal access token through an SPN on Databricks for AWS and Azure.

    See Authentication using Databricks personal access tokens for more details.

    Note

    Users with admin or non-admin roles can create personal access tokens. Non-admin users must meet certain requirements before creating personal access tokens.
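
The Workspace Id and Instance (Region) URL rules in the table above can be checked before you submit the dialog. The following Python sketch is one way to do that. The function names and the regular expression are assumptions that model the stated format (protocol://dns or ip(:port), no trailing slash), not Unravel's own validation logic.

```python
import re
from urllib.parse import parse_qs, urlparse


def workspace_id_from_url(databricks_url: str) -> str:
    """Return the digits that follow o= in a Databricks URL."""
    values = parse_qs(urlparse(databricks_url).query).get("o", [])
    if not values or not values[0].isdigit():
        raise ValueError("no numeric o= parameter found in URL")
    return values[0]


# protocol://dns-or-ip with an optional :port and nothing after it.
_INSTANCE_URL = re.compile(r"[a-z][a-z0-9+.-]*://[A-Za-z0-9.-]+(:\d+)?")


def is_valid_instance_url(url: str) -> bool:
    """Return True if the URL matches the expected format and has no trailing slash."""
    return _INSTANCE_URL.fullmatch(url) is not None


if __name__ == "__main__":
    print(workspace_id_from_url("https://eastus.azuredatabricks.net/?o=3205148689792956"))
    print(is_valid_instance_url("https://eastus.azuredatabricks.net"))
    print(is_valid_instance_url("https://eastus.azuredatabricks.net/"))
```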

Architecture Diagram

The reference architecture for Unravel SaaS with Databricks is illustrated in the diagram below. It highlights the main components and their interactions:

Architecture.png
  • Unravel fetches cluster, job, and other required information through the Databricks API.

  • The Unravel sensor is deployed on each monitored cluster to collect cluster metrics.

  • The Unravel UI displays aggregated results, recommendations, insights, and more.

Connectivity Descriptions

Reference connections from the architecture diagram:

| Connection | Method | Authentication | Encryption-In-Transit |
| --- | --- | --- | --- |
| 1 | API | Choice of either Databricks Personal Access Token or Service Principal Name | TLS over HTTPS, port 443. Unravel connects to the Databricks API endpoint. |
| 2 | API | Choice of Unravel basic auth or Azure Active Directory (AAD) auth (via SAML 2) | TLS over HTTPS, port 443. Databricks connects to the Unravel API endpoint. |
| 3 | UI | Choice of Unravel basic auth or Active Directory (via SAML 2) for access authentication. Session-based JWT during usage. | HTTPS, port 443. Client connects to the Unravel UI. |

Verify Configuration

  • Check that the Unravel sensor is deployed on all relevant clusters.

  • Monitor logs in the Unravel UI.

  • Verify that cluster and job data is being collected and displayed correctly.
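
The first check can be partly automated by inspecting cluster definitions. The sketch below takes a response shaped like the output of the Databricks `GET /api/2.0/clusters/list` call and reports clusters whose `init_scripts` do not reference the Unravel script. The response shape and the function name are assumptions, and the sample data is made up. The check confirms configuration only, not that the sensor is actually running.

```python
def clusters_missing_init_script(clusters_response: dict, init_script_path: str) -> list[str]:
    """Return names (or IDs) of clusters that do not list the given workspace init script."""
    missing = []
    for cluster in clusters_response.get("clusters", []):
        # Collect every workspace-backed init script path configured on the cluster.
        destinations = {
            script.get("workspace", {}).get("destination")
            for script in cluster.get("init_scripts", [])
        }
        if init_script_path not in destinations:
            missing.append(cluster.get("cluster_name") or cluster["cluster_id"])
    return missing


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = {
        "clusters": [
            {
                "cluster_id": "c1",
                "cluster_name": "etl",
                "init_scripts": [
                    {"workspace": {"destination": "/Workspace/Unravel/install-unravel.sh"}}
                ],
            },
            {"cluster_id": "c2", "cluster_name": "adhoc"},
        ]
    }
    print(clusters_missing_init_script(sample, "/Workspace/Unravel/install-unravel.sh"))
    # prints ['adhoc']
```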