Unravel for Databricks on Google Cloud Platform
Overview
This document is for Databricks and Google Cloud Platform (GCP) administrators who plan to deploy and configure Unravel for Databricks on GCP. Unravel for Databricks on GCP provides full-stack observability, cost optimization, and performance management for Databricks workloads running on GCP. Unravel connects to your Databricks workspaces, collects metadata from system tables, Spark event logs, and the Databricks control plane, and then surfaces actionable insights through a unified web interface.
What Unravel does for Databricks on GCP
Unravel provides the following capabilities for Databricks on GCP.
Cost observability and FinOps: Track cloud spend across workspaces, clusters, jobs, and users, identify top cost drivers, set budgets, and allocate costs by using chargeback.
Performance optimization: Analyze Spark application performance, identify bottlenecks, and get AI-driven recommendations for cluster rightsizing and query tuning.
Operational monitoring: Monitor job and workflow health, detect failures and SLA violations, and gain visibility into pipeline execution.
Data governance visibility: Understand data access patterns, including which tables are accessed by which users, applications, and workloads.
Deployment model for Unravel on GCP
Unravel is deployed as a SaaS solution that connects to your Databricks workspaces on GCP. The integration uses a dedicated Databricks service principal with least-privilege permissions to access operational metadata. All data that Unravel collects is non-business data: system table metadata, Spark execution metrics, and control plane configuration.
Supported Databricks environments on GCP
Unravel supports the following Databricks environments on Google Cloud Platform.
Databricks on Google Cloud Platform (GCP)
Unity Catalog-enabled workspaces
Classic and serverless compute (GKE-based Databricks runtime)
Google Cloud Storage (GCS) for event log and catalog storage
Key integration points between Unravel and Databricks
Unravel interacts with Databricks on GCP through three primary data channels.
System table metadata: Scheduled Spark jobs curate Databricks system tables into a dedicated Unravel catalog (Delta tables). Unravel accesses these curated tables through Databricks Delta Sharing, which enables secure, read-only data sharing without requiring a SQL warehouse.
Spark event logs: Spark execution metadata (event logs, metrics, SQL plans) is stored in a customer-managed GCS bucket that backs a Unity Catalog volume. Unravel reads these logs through Unity Catalog for performance analysis.
Control plane metadata: Unravel authenticates to the Databricks REST API and the Unity Catalog file and SQL APIs as a service principal by using OAuth 2.0, and retrieves configuration and telemetry data.
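As a sketch of the control plane channel, the OAuth 2.0 client-credentials token exchange can be expressed with only the Python standard library. The workspace host, client ID, and secret below are placeholders; the request targets the workspace's documented /oidc/v1/token endpoint:

```python
import base64
import urllib.parse
import urllib.request

def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth 2.0 client-credentials token request for a
    Databricks service principal (machine-to-machine flow)."""
    url = f"https://{host}/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

# Hypothetical workspace host and credentials for illustration only.
req = build_token_request("1234567890.1.gcp.databricks.com",
                          "unravel-sp-client-id", "unravel-sp-secret")
# urllib.request.urlopen(req) would return a JSON body containing an
# "access_token", used as a Bearer token on subsequent REST calls.
```

The returned token is short-lived, so a real collector refreshes it rather than caching it indefinitely.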
Prerequisites for deploying Unravel on Databricks in GCP
Before you deploy Unravel for Databricks on GCP, make sure the following requirements are met.
GCP requirements
A Google Cloud account with an active project.
GCS buckets provisioned for Spark event logs and the Unravel curated catalog.
Network connectivity between Unravel and Databricks endpoints (Private Service Connect is recommended for production environments).
Appropriate IAM roles for GCS bucket access from Databricks compute.
Databricks requirements
Databricks workspaces on GCP with Unity Catalog enabled.
Account admin or workspace admin access to create and configure a service principal.
Delta Sharing enabled on the workspace for sharing curated tables with Unravel.
Databricks system tables enabled for the workspace.
OAuth authentication enabled for service principals.
Databricks service principal for Unravel
Create a dedicated Databricks service principal for Unravel, and use it for all API access and data queries. The following credentials are required.
Service principal client ID
Service principal OAuth secret
Databricks workspace URL (host)
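A minimal sketch of loading these three credentials in a collector-side script, assuming they are exposed as environment variables (the variable names and values here are illustrative; use your own secret store in production):

```python
import os

# Hypothetical environment variable names; substitute your secret store.
REQUIRED = {
    "host": "DATABRICKS_HOST",                    # workspace URL
    "client_id": "DATABRICKS_CLIENT_ID",          # service principal client ID
    "client_secret": "DATABRICKS_CLIENT_SECRET",  # service principal OAuth secret
}

def load_credentials(env=os.environ) -> dict:
    """Collect the three required credentials, failing fast on any gap."""
    missing = [v for v in REQUIRED.values() if v not in env]
    if missing:
        raise RuntimeError(f"missing credentials: {', '.join(missing)}")
    return {key: env[var] for key, var in REQUIRED.items()}

# Illustrative values only.
creds = load_credentials({
    "DATABRICKS_HOST": "https://1234567890.1.gcp.databricks.com",
    "DATABRICKS_CLIENT_ID": "unravel-sp",
    "DATABRICKS_CLIENT_SECRET": "example-secret",
})
```

Failing fast here keeps misconfiguration visible at startup instead of surfacing later as authentication errors.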
Storage requirements on Google Cloud Storage
Configure two separate customer-managed GCS buckets.
| Bucket | Purpose | Backing |
|---|---|---|
| Spark event logs | Stores Spark execution metadata (event logs, metrics, SQL plans). | Unity Catalog volume |
| Curated Unravel catalog | Stores curated system table data for Unravel queries. | Unity Catalog external location |
These buckets must be isolated from other workloads, and neither bucket contains business data.
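Provisioning might look like the following sketch, which composes one `gcloud storage buckets create` command per bucket. The project, location, and bucket names are hypothetical:

```python
# Hypothetical project, location, and bucket names for illustration.
PROJECT = "my-gcp-project"
LOCATION = "us-central1"
BUCKETS = {
    "unravel-spark-event-logs": "Spark event logs (backs a Unity Catalog volume)",
    "unravel-curated-catalog": "Curated catalog (backs a UC external location)",
}

def create_commands(project: str, location: str, buckets: dict) -> list:
    """Compose one bucket-creation command per bucket, with uniform
    bucket-level access so permissions are governed by IAM only."""
    return [
        f"gcloud storage buckets create gs://{name} "
        f"--project={project} --location={location} --uniform-bucket-level-access"
        for name in buckets
    ]

for cmd in create_commands(PROJECT, LOCATION, BUCKETS):
    print(cmd)
```

Keeping the two buckets physically separate, as above, is what enforces the isolation requirement.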
Network connectivity between Unravel, Databricks, and GCS
For production deployments, Unravel recommends private connectivity to Databricks.
Use Private Service Connect (PSC) for all Unravel access to Databricks REST, Unity Catalog files, and SQL APIs on the frontend connection.
Handle backend access from Databricks compute to GCS buckets through the existing Databricks backend connectivity for each workspace.
Do not configure separate VPC endpoints, because the workspace PSC and firewall rule configuration govern all access.
For maximum security, keep all traffic on private endpoints with no public egress.
Compliance and security posture
Unravel Data is SOC 2 compliant.
All software is scanned for vulnerabilities with Black Duck before general availability certification.
Architecture of Unravel for Databricks on GCP
This section describes the architecture of Unravel for Databricks on GCP, including data flows, storage design, authentication, and the required permissions model.
Data flow for Databricks observability
Unravel collects operational metadata through three primary channels and does not access or store business data.
System table metadata
A scheduled Spark job (for example, every 15 minutes) reads raw Databricks system tables and curates them into an Unravel catalog as Delta tables. The curated data contains no business data.
Unravel services access the curated tables through Databricks Delta Sharing. A Delta share configured on the Unravel catalog provides secure, read-only access without requiring a dedicated SQL warehouse.
Unravel reads the shared tables from the curated GCS buckets through the Delta Sharing protocol.
Authentication and authorization: Grant the Unravel service principal USE CATALOG on the unravel_curated catalog. This provides least-privilege access with explicit grant and revoke operations and an auditable trail.
Security posture: In production environments that use PSC, all Unravel data access stays on private endpoints with no public internet egress.
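For illustration, the read-only access path can be sketched with the coordinate format used by the open-source delta-sharing Python client (a profile file plus `share.schema.table`). The share, schema, and table names below are hypothetical:

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Compose the coordinate the open delta-sharing client expects:
    <profile-file>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"

# Hypothetical share, schema, and table names for illustration.
url = table_url("/etc/unravel/share-profile.json",
                "unravel_share", "monitoring", "job_run_timeline")
# With the open-source client installed, a read-only load would look like:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
print(url)
```

The profile file holds the recipient's share endpoint and bearer token, so the consumer never needs direct credentials on the underlying GCS bucket.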
Spark event logs
Spark execution metadata (event logs, metrics, SQL plans) is stored in a customer-managed GCS bucket that backs a Unity Catalog volume. This bucket is strictly separated from other data.
Access to the Spark event logs bucket is governed by Unity Catalog privileges such as READ VOLUME on the dedicated log catalog and schema to enforce least privilege.
Security is based on physical log separation, auditable Unity Catalog access controls, and private network traffic.
Unravel reads these logs through Unity Catalog, parses the data, and ingests it into its own store for performance analysis.
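Spark event logs are newline-delimited JSON records, each tagged with an `Event` field naming the listener event. A minimal sketch of the kind of parsing involved, using two illustrative records:

```python
import json

# Two illustrative records in the Spark event log format.
SAMPLE_LOG = """\
{"Event":"SparkListenerApplicationStart","App Name":"nightly-etl","Timestamp":1700000000000}
{"Event":"SparkListenerApplicationEnd","Timestamp":1700000360000}
"""

def app_duration_ms(log_text: str):
    """Derive application wall-clock time from the start and end events,
    the kind of metric extracted during event-log parsing."""
    start = end = None
    for line in log_text.splitlines():
        event = json.loads(line)
        if event["Event"] == "SparkListenerApplicationStart":
            start = event["Timestamp"]
        elif event["Event"] == "SparkListenerApplicationEnd":
            end = event["Timestamp"]
    return end - start if start is not None and end is not None else None

print(app_duration_ms(SAMPLE_LOG))  # 360000 ms, i.e. 6 minutes
```

Real logs carry many more event types (jobs, stages, tasks, SQL executions), which is where stage-level bottleneck analysis comes from.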
Control plane metadata
Unravel uses OAuth 2.0 with the service principal to authenticate to Databricks and retrieve configuration and telemetry through the REST API and Unity Catalog file and SQL APIs.
The service principal is granted only the minimum required roles and permissions to follow the principle of least privilege and limit potential blast radius if credentials are compromised.
Storage architecture on Google Cloud Storage
GCS bucket isolation
These buckets are isolated and do not share storage.
Spark event logs: Customer-managed GCS bucket that backs a Unity Catalog volume.
Curated Unravel catalog: Separate customer-managed GCS bucket that backs a Unity Catalog external location.
Private connectivity for Databricks and GCS
All Unravel access to Databricks REST, Unity Catalog files, and SQL APIs uses PSC for the frontend connection.
Backend access from Databricks compute to all GCS buckets (Spark event logs, curated catalog, workspace buckets) uses the existing Databricks backend connectivity for each workspace.
No separate VPC endpoints are required because workspace PSC and firewall rules govern all access.
When PSC is configured, all traffic stays on private endpoints with no public egress.
User and authentication model
The following table describes the authentication model that Unravel uses for Databricks on GCP.
| ID | Type | User | Authentication | Encryption |
|---|---|---|---|---|
| 1 | API | Databricks service principal | OAuth token for the Unravel service principal. | Private Service Connect |
| 2 | D2D | Databricks service principal | Unity Catalog-managed authentication. | Private Service Connect |
| 3 | UI | Customer user | SAML or SSO through an identity provider. | TLS over HTTPS |
Create a separate Unravel service principal for each environment, such as staging and production, and do not reuse development or proof-of-value service principals.
Permissions model for the Unravel service principal
Grant the following Databricks ACLs to the Unravel service principal according to the principle of least privilege.
| Permission | Type | Object | API | Purpose |
|---|---|---|---|---|
| CAN ATTACH TO | Cluster permission | Compute clusters | REST | Read cluster configurations and metadata for cost analysis and optimization. |
| CAN VIEW | Cluster permission | Compute clusters | REST | Read cluster ACLs to understand which users and groups can use a cluster. |
| CAN VIEW | Warehouse permission | SQL warehouses | REST | Access SQL warehouse configurations and performance metrics for cost analysis and monitoring. |
| CAN VIEW | Jobs permission | Jobs and workflows | REST | Read job configurations, execution history, and performance metrics. |
| CAN VIEW | Pipeline permission | Pipelines | REST | Get read-only access to Delta Live Tables pipeline configurations and status. |
| Groups (SCIM) | Identity or SCIM | Groups (principals) | REST | Retrieve workspace groups to map group names to group IDs. |
| CREATE RECIPIENT | UC privilege | Delta Sharing | SQL | Create a Delta Sharing recipient for Unravel access to curated tables. |
| CREATE SHARE | UC privilege | Delta Sharing | SQL | Create a Delta share on the Unravel curated catalog for sharing with Unravel. |
| USE CATALOG | UC privilege | unravel catalog | SQL | Access the Unravel catalog that contains curated system tables. |
| USE SCHEMA | UC privilege | unravel.monitoring | SQL | Access the monitoring schema that contains curated system tables. |
| SELECT | UC privilege | unravel.monitoring.* | SQL | Read all monitoring tables in the curated catalog. |
| READ VOLUME | UC privilege | unravel.monitoring | File | Access volumes that store Spark event logs shared with Unravel. |
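The Unity Catalog rows in this table translate into GRANT statements run in a notebook or SQL warehouse. A sketch that composes them, assuming a hypothetical service principal name and granting SELECT and READ VOLUME at the schema level so they apply to everything the schema contains:

```python
# Hypothetical principal name; substitute your service principal's
# application ID. The grants mirror the UC privilege rows above.
PRINCIPAL = "`unravel-sp`"

UC_GRANTS = [
    ("USE CATALOG", "CATALOG unravel"),
    ("USE SCHEMA", "SCHEMA unravel.monitoring"),
    ("SELECT", "SCHEMA unravel.monitoring"),      # inherited by all tables
    ("READ VOLUME", "SCHEMA unravel.monitoring"),  # inherited by all volumes
]

def grant_statements(principal: str) -> list:
    """Compose the GRANT statements for the Unravel service principal."""
    return [f"GRANT {priv} ON {obj} TO {principal};" for priv, obj in UC_GRANTS]

for stmt in grant_statements(PRINCIPAL):
    print(stmt)
```

Granting at the schema rather than table level keeps the grant list short while staying scoped to the dedicated monitoring schema.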