Databricks
Databricks Worker
The following properties can be used to configure Databricks Worker.
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.databricks.consumers Sets the quantity of Kafka consumer threads responsible for processing data from the Databricks Kafka topic. Ensure this value does not exceed the number of Kafka partitions allocated for the Databricks topic. | Optional | Integer | 8 |
com.unraveldata.databricks.reader.pool.size Sets the number of concurrent reader threads for polling the Databricks API. | Optional | Integer | 40 |
com.unraveldata.databricks.api.polling.interval Sets the interval, in seconds, between consecutive REST API requests for polling a Databricks workspace. | Optional | Seconds | 30 |
com.unraveldata.databricks.api.lookback.seconds Sets the lookback period for polling the Databricks API to retrieve historical data on every worker restart. | Optional | Seconds | 60 |
com.unraveldata.databricks.cache.size.max Sets the maximum number of items to be cached in memory. | Optional | Integer | 1000 |
com.unraveldata.databricks.cache.entry.ttl.secs Sets the timeout, in seconds, for cached items in the Databricks worker cache. | Optional | Seconds | 3600 |
com.unraveldata.databricks.allpurpose.photon.dbu.factor Sets the multiplication factor for the cost of non-Photon node types when Databricks Units (DBUs) are unavailable under all-purpose Photon. Enables dynamic cost management in such instances. | Optional | Double | 2 |
com.unraveldata.databricks.jobcompute.photon.dbu.factor Sets the multiplication factor for the cost of non-Photon node types when Databricks Units (DBUs) are unavailable under job compute Photon. Enables dynamic cost management in such instances. Note: Set the value to 2.9 for accurate cost computation on AWS. | Optional for Azure; mandatory for AWS | Double | 2.5 (Azure), 2.9 (AWS) |
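As a rough illustration of two rules in the table above, the following minimal sketch checks that the consumer count does not exceed the topic's partition count and applies a Photon DBU factor to a non-Photon DBU rate. The function names are hypothetical and not part of Unravel, and the plain multiplication is an assumption based on the property descriptions, not a confirmed formula.

```python
# Hypothetical helpers for illustration only; not part of Unravel.

def check_consumer_count(consumers: int, topic_partitions: int) -> None:
    """Raise ValueError if more consumer threads are configured than
    the Databricks Kafka topic has partitions."""
    if consumers < 1:
        raise ValueError("consumers must be at least 1")
    if consumers > topic_partitions:
        raise ValueError(
            f"consumers={consumers} exceeds topic partitions={topic_partitions}"
        )


def estimate_photon_dbu(non_photon_dbu: float, factor: float) -> float:
    """Estimate a Photon DBU rate from the non-Photon rate.

    Assumption: the factor is applied as a simple multiplier.
    """
    return non_photon_dbu * factor


# Example: default of 8 consumers against a topic with 8 partitions is valid.
check_consumer_count(8, 8)
# Example: a non-Photon rate of 0.75 DBU with the all-purpose default factor of 2.
estimated = estimate_photon_dbu(0.75, 2.0)
```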
Insight
The following properties can be used to configure the insight generation for Databricks.
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.databricks.insights.supported.clusters Defines a comma-separated list of cluster types for which insight generation is supported for Databricks. | Optional | String | INTERACTIVE,AUTOMATED,AUTOMATEDLIGHT |
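The comma-separated value can be read as a set of cluster types. The sketch below shows one way to parse such a value and test membership. The helper names are hypothetical, and the case-insensitive comparison is an assumption that the table does not confirm.

```python
# Hypothetical helpers for illustration only; not part of Unravel.
DEFAULT_SUPPORTED = "INTERACTIVE,AUTOMATED,AUTOMATEDLIGHT"


def parse_supported_clusters(value: str) -> frozenset:
    """Split a comma-separated property value into a set of cluster types."""
    return frozenset(
        part.strip().upper() for part in value.split(",") if part.strip()
    )


def insights_supported(cluster_type: str, value: str = DEFAULT_SUPPORTED) -> bool:
    """Return True if the cluster type appears in the configured list.

    Assumption: matching ignores case and surrounding whitespace.
    """
    return cluster_type.strip().upper() in parse_supported_clusters(value)
```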
Discount pricing
The following properties can be used to configure the discount prices for Databricks.
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.databricks.vm.discount.percentage Discount percentage for virtual machines (VMs). For example: com.unraveldata.databricks.vm.discount.percentage = 20 | - | Percent | 0 |
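To make the effect of the percentage concrete, here is a minimal sketch of the arithmetic. It assumes the discount is applied as a straight percentage reduction of the VM list price. The function name is hypothetical, and the actual computation inside Unravel may differ.

```python
# Hypothetical helper for illustration only; not part of Unravel.

def apply_vm_discount(list_price: float, discount_percentage: float) -> float:
    """Return the VM price after a percentage discount.

    Assumption: discounted price = list price * (100 - percentage) / 100.
    """
    if not 0 <= discount_percentage <= 100:
        raise ValueError("discount_percentage must be between 0 and 100")
    return list_price * (100 - discount_percentage) / 100


# With the example value of 20, a VM list price of 100.0 becomes 80.0.
discounted = apply_vm_discount(100.0, 20)
```

With the default of 0, the price is returned unchanged.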
Redact property values
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.databricks.keys.toRedact Redacts Spark property values if the property key is in the list. | Required | String | password |
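The following sketch illustrates the general idea of key-based redaction. The matching rule is not specified here, so the example assumes a comma-separated list and a case-insensitive substring match on the property key. The function name and the mask string are also hypothetical.

```python
# Hypothetical helper for illustration only; not part of Unravel.

def redact_properties(props: dict, keys_to_redact: str, mask: str = "********") -> dict:
    """Return a copy of props with values masked for matching keys.

    Assumptions: keys_to_redact is comma-separated, and a key matches
    when it contains any listed entry, ignoring case.
    """
    needles = [k.strip().lower() for k in keys_to_redact.split(",") if k.strip()]
    return {
        key: (mask if any(n in key.lower() for n in needles) else value)
        for key, value in props.items()
    }


spark_props = {
    "spark.ssl.keyPassword": "s3cret",
    "spark.executor.memory": "4g",
}
redacted = redact_properties(spark_props, "password")
```

The original dictionary is left unmodified, so the unredacted values stay available to the caller if needed.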
Azure AD (Databricks) REST API authentication
Property/Description | Set by user | Unit | Default |
---|---|---|---|
com.unraveldata.databricks.client_id Azure AD Databricks client ID. | set via manager command | string | - |
com.unraveldata.databricks.client_secret Azure AD Databricks client secret. | set via manager command | string | - |
com.unraveldata.databricks.tenant_id Azure AD Databricks tenant ID. | set via manager command | string | - |
See also: Using Azure AD for Databricks REST API authentication.
Using Azure AD for Databricks REST API authentication
You can use Azure Active Directory (Azure AD) for Databricks REST API authentication instead of the usual Personal Access Token (PAT) authentication. Do the following:
Create a service principal.
From the Azure portal, log on to your Azure Account.
Select Azure Active Directory > App Registrations > New Registrations and register your app. You must also add the registered app to the admins group as shown in Step 3.
Go to Certificates & secrets and click New client secret.
Enter a description for the secret and select its duration.
Click Add. The client secret is displayed. Copy it and keep it handy.
Add the service principal as a contributor (or Reader) to each workspace.
Collect the following information before you assign the service principal to each workspace.
Item | Description |
---|---|
personal-access-token | Personal Access Token (PAT) that is used to manage the workspace. |
databricks-instance | URL of the workspace to which you assign the service principal. |
application-id | Application (client) ID of the application that was registered in the previous step. |
display-name | Name of the application that was created in the previous step. |
Use the SCIM API to assign the service principal that you created to each workspace.
curl --netrc -X POST -H "Authorization: Bearer <personal-access-token>" <databricks-instance>/api/2.0/preview/scim/v2/ServicePrincipals \
  --header 'Content-type: application/scim+json' \
  --data '{ "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ], "applicationId": "<application-id>", "displayName": "<display-name>" }'
Add the registered app to the admins group.
Go to Workspace > Settings > Admin console > Groups > admins > Add users or Service principals.
Add the registered app that was created in Step 1.
Set properties of the app in Unravel:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Run the following command to set the Azure AD properties using the manager tool.
<Unravel installation directory>/unravel/manager config databricks set-azure-ad --client <databricks-client-id> --tenant <databricks-tenant-id>
You are prompted to enter the client secret. Type the secret, which is masked, and press ENTER.
Property | Description |
---|---|
com.unraveldata.databricks.client_id | Specify the client ID. |
com.unraveldata.databricks.client_secret | Specify the client secret. |
com.unraveldata.databricks.tenant_id | Specify the tenant ID. |
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
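The procedure above involves two kinds of HTTP request, which the following sketch builds without sending. The first mirrors the SCIM call that assigns the service principal to a workspace. The second is the standard Microsoft identity platform client-credentials token request for the Azure Databricks resource, which the client ID, client secret, and tenant ID make possible. Whether Unravel issues exactly this token request is an assumption, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SCIM_PATH = "/api/2.0/preview/scim/v2/ServicePrincipals"
# Well-known application ID of the Azure Databricks resource in Azure AD.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_scim_request(instance_url, token, application_id, display_name):
    """Build (but do not send) the SCIM request that adds a service principal."""
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "applicationId": application_id,
        "displayName": display_name,
    }
    return urllib.request.Request(
        url=instance_url.rstrip("/") + SCIM_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/scim+json",
        },
        method="POST",
    )


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an Azure AD client-credentials token request.

    Assumption: the token is requested for the Azure Databricks resource
    through the v2.0 token endpoint.
    """
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    }).encode("ascii")
    return urllib.request.Request(
        url=f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing either request to urllib.request.urlopen would send it. The sketch stops short of that so the payloads can be inspected first.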