Databricks workspace setup guide
This section provides instructions to connect a Databricks cluster to Unravel SaaS.
Create a Workspace token in Databricks.
Enable personal access token authentication for the workspace.
Go to your workspace, and from the dropdown located in the upper right corner, select Admin Settings.
Click the Advanced tab and turn on the Personal Access Tokens toggle. For more details, refer to Manage personal access tokens.
Create a Databricks personal access token for your Databricks workspace user.
Go to your workspace, and from the dropdown located in the upper right corner, select User settings.
Click Developer. Next to Access tokens, click Manage. The Access Tokens page is displayed.
Click Generate New Token. Copy the new token and store it securely: you need it to register the workspace in Unravel, and Databricks displays the token only once. For more details, refer to Authentication using Databricks personal access tokens.
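The two token steps above can also be scripted against the Databricks REST API. The Python sketch below only builds the requests (it does not send them). It assumes the workspace-conf endpoint with the enableTokensConfig setting and the Token API's token/create endpoint; both calls need an existing credential (the first one a workspace admin's), so this is mainly useful for rotating the token Unravel uses rather than for first-time setup. The function names, the instance URL, the 90-day lifetime, and the "unravel" comment are illustrative assumptions.

```python
import json
import urllib.request


def _api_request(instance_url: str, token: str, path: str,
                 method: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON request to a Databricks REST endpoint (not sent)."""
    return urllib.request.Request(
        url=instance_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def build_enable_tokens_request(instance_url: str, admin_token: str) -> urllib.request.Request:
    # Turns on personal access token authentication; needs workspace-admin rights.
    return _api_request(instance_url, admin_token, "/api/2.0/workspace-conf",
                        "PATCH", {"enableTokensConfig": "true"})


def build_create_token_request(instance_url: str, token: str,
                               comment: str = "unravel",
                               lifetime_days: int = 90) -> urllib.request.Request:
    # Creates a new token for the user who owns `token`.
    payload = {"comment": comment, "lifetime_seconds": lifetime_days * 24 * 60 * 60}
    return _api_request(instance_url, token, "/api/2.0/token/create", "POST", payload)


def token_value_from_response(body: bytes) -> str:
    """Extract the new token from a token/create response body."""
    return json.loads(body)["token_value"]


if __name__ == "__main__":
    req = build_create_token_request("https://eastus.azuredatabricks.net", "<existing-token>")
    print(req.get_method(), req.full_url)
    # Sending it would look like: urllib.request.urlopen(req).read()
```

As with the UI flow, save the returned token value immediately; the response is the only place it appears.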
Register a new Databricks workspace or edit details of an existing Databricks workspace.
Sign in to Unravel UI, and from the upper right, click > Workspaces. The Workspaces Manager page is displayed.
Click Add Workspace. The Add Workspace dialog box is displayed. Enter the following details:
Workspace Id: The Databricks workspace ID, which appears in the Databricks URL as the digits after o=. For example, in the URL https://<databricks-instance>/?o=987654321123456, the workspace ID is 987654321123456.
Workspace Name: A human-readable name for the Databricks workspace, for example, ACME-Workspace.
Instance (Region) URL: The regional URL where the Databricks workspace is deployed. Specify the complete URL in the format protocol://host[:port], where host is a DNS name or IP address, and make sure the URL does not end with a slash. For example, https://eastus.azuredatabricks.net is valid, but https://eastus.azuredatabricks.net/ is not.
Tier: Select the subscription tier: Standard or Premium. For Azure Databricks, pricing information is available in the Azure portal. For Databricks on AWS, see Databricks AWS pricing for details about pricing tiers.
Token: The personal access token that you generated in your Databricks workspace. Unravel uses it, instead of a password, to authenticate to the Databricks REST APIs. For more details, see Authentication using Databricks personal access tokens.
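The two formatting rules in this table (the workspace ID is the value after o=, and the instance URL has no trailing slash) are easy to check before you fill in the dialog. A minimal Python sketch using only the standard library; the function names are hypothetical, and it assumes the instance URL uses http or https.

```python
from urllib.parse import parse_qs, urlsplit


def workspace_id_from_url(url: str) -> str:
    """Return the digits that follow o= in a Databricks workspace URL."""
    values = parse_qs(urlsplit(url).query).get("o", [])
    if not values or not values[0].isdigit():
        raise ValueError(f"no numeric o= parameter in {url!r}")
    return values[0]


def check_instance_url(url: str) -> str:
    """Accept protocol://host[:port] only: no path (not even '/'), query, or fragment."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected protocol://host[:port], got {url!r}")
    if parts.path or parts.query or parts.fragment:
        # A trailing slash shows up here as the path "/".
        raise ValueError(f"remove everything after the host/port: {url!r}")
    parts.port  # reading .port raises ValueError if the port is not a valid number
    return url


if __name__ == "__main__":
    print(workspace_id_from_url("https://example.cloud.databricks.com/?o=987654321123456"))
    print(check_instance_url("https://eastus.azuredatabricks.net"))
```

The host example.cloud.databricks.com is a placeholder; only the o= query parameter matters for extracting the ID.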
Note
Both admin and non-admin users can create personal access tokens. Non-admin users must meet certain requirements before they can create personal access tokens.
Note
After you click Add, it takes about 2-3 minutes to register the Databricks workspace with Unravel.
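If registration does not complete, one quick sanity check is whether the token and the Instance (Region) URL work together against the Databricks REST API. The Python sketch below uses the Clusters API list endpoint (/api/2.0/clusters/list) as an example of a read-only call; treating it as representative of what Unravel needs is an assumption, since this guide does not say which endpoints Unravel calls. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def build_token_check_request(instance_url: str, token: str) -> urllib.request.Request:
    """Build a read-only GET against the Clusters API, authenticated with the token."""
    return urllib.request.Request(
        url=f"{instance_url.rstrip('/')}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def token_works(instance_url: str, token: str, timeout: float = 10.0) -> bool:
    """Send the request: True on HTTP 200, False on 401/403, re-raise anything else."""
    request = build_token_check_request(instance_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):  # rejected token, or token lacks permission
            return False
        raise
```

For example, token_works("https://eastus.azuredatabricks.net", "<token>") returning False points at the token itself (or its permissions), while a connection error usually points at the instance URL.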
Configure the Databricks cluster with Unravel:
A global init script applies the Unravel configuration to all clusters in a workspace. The following steps take you through the cluster configuration with a global init script. If you are configuring the clusters without a global init script, refer to these steps.
Open your workspace in Databricks.
In the Compute section, select a cluster (Automated/Interactive) and open the Configuration tab.
Locate Advanced Options and apply the following settings:
Under Init Script, set Destination to Workspace, copy the following snippet as the Init Script Path, and click Add.
Under Logging, set Destination to DBFS and copy the following snippet as the Cluster Log Path.
Copy the following snippet to Spark > Spark Conf.
For Spark 3.0.x and above, copy the following snippet:
For Spark 2.4.x and below, copy the following snippet:
Note
For spark-submit jobs, you can't configure Spark this way. Instead, supply the snippet as spark-submit parameters.
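For clusters managed through automation, the three Advanced Options settings correspond to fields of a Databricks Clusters API cluster spec: spark_conf, cluster_log_conf (with a dbfs destination), and init_scripts (with a workspace destination). The Python sketch below merges them into a spec you have already fetched (for example with clusters/get) so it can be sent back with clusters/edit, which expects the full spec. It also turns a Spark conf into --conf arguments for spark-submit jobs. The Unravel snippet contents are not reproduced in this guide, so the function names and every key, value, and path in the example are hypothetical placeholders.

```python
import copy


def with_unravel_settings(cluster_spec: dict, spark_conf: dict,
                          log_destination: str, init_script_path: str) -> dict:
    """Return a copy of a cluster spec with Spark conf, DBFS log path, and workspace init script set."""
    spec = copy.deepcopy(cluster_spec)  # leave the caller's spec untouched
    # Spark > Spark Conf: merge, keeping existing keys that spark_conf does not override
    spec.setdefault("spark_conf", {}).update(spark_conf)
    # Logging: Destination = DBFS, Cluster Log Path = log_destination
    spec["cluster_log_conf"] = {"dbfs": {"destination": log_destination}}
    # Init Script: Destination = Workspace; add it once and keep any other scripts
    entry = {"workspace": {"destination": init_script_path}}
    scripts = spec.setdefault("init_scripts", [])
    if entry not in scripts:
        scripts.append(entry)
    return spec


def spark_submit_conf_args(spark_conf: dict) -> list:
    """Turn a Spark conf mapping into spark-submit '--conf key=value' arguments."""
    args = []
    for key, value in spark_conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not Unravel's actual settings.
    unravel_conf = {"spark.example.unravel.setting": "<value-from-snippet>"}
    spec = with_unravel_settings(
        {"cluster_id": "<cluster-id>", "spark_version": "<runtime-version>"},
        unravel_conf,
        log_destination="dbfs:/<cluster-log-path>",
        init_script_path="/Workspace/<init-script-path>",
    )
    print(spec)
    print(spark_submit_conf_args(unravel_conf))
```

Merging rather than replacing spark_conf and init_scripts is a deliberate choice here, so that settings already on the cluster survive the edit.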