Prerequisites (Databricks)

Platform

Each version of Unravel has specific platform requirements. See Unravel's Databricks compatibility matrix to confirm that your Databricks platform meets the requirements for the version of Unravel that you are installing.

Hardware
  • Azure instance type: Minimum: Standard_E8s_v3

  • EC2 instance type:

    • Recommended: r4.4xlarge (122 GiB RAM)

    • Virtualization type: HVM

Ports
GNU Compiler Collection (GCC)

GNU Compiler Collection (GCC) version 4.9.3, which consists of compilers and libraries for C, C++, and other languages, should be installed on the Unravel node for Cost > Budget estimation to function. Refer to Installing GNU Compiler Collection (GCC).
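
As a quick way to confirm the GCC prerequisite on the Unravel node, the following is a minimal sketch that reads the output of gcc --version and compares it with 4.9.3. The function names and the REQUIRED_GCC constant are illustrative, not part of Unravel. This page does not say whether other GCC versions are acceptable, so the sketch only reports an exact match.

```python
import re
import subprocess

# Version named in this section; whether other versions also work is not stated here.
REQUIRED_GCC = (4, 9, 3)


def parse_gcc_version(output):
    """Return the first x.y.z version found in `gcc --version` output, or None."""
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)\b", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_gcc_version():
    """Run `gcc --version` and return the parsed version, or None if gcc is unavailable."""
    try:
        result = subprocess.run(
            ["gcc", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc was not found on this node")
    elif version == REQUIRED_GCC:
        print("gcc 4.9.3 is installed")
    else:
        print("gcc is installed, but the version is", ".".join(map(str, version)))
```

Run the script on the Unravel node itself, since the check only reflects the gcc binary on the PATH of the user who runs it.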

The following prerequisites apply only to Azure Databricks:

Permissions
  • You must already have an Azure account.

  • You must already have a resource group assigned to a region to group your policies, VMs, and storage blobs/lakes/drives.

    A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity.

  • You must have root privilege to run commands on the VM.

Network

Unravel recommends deploying Azure Databricks workspaces in secure cluster mode (NPIP) with virtual network (VNET) injection. Such a deployment provides better controls on network and security, especially if you want to lock the workspace egress IP addresses.

The expected traffic between Azure Databricks and the Unravel server is as follows:

  • Inbound - Azure Databricks workspace egress IP addresses.

  • Outbound - Azure Databricks Access IP addresses.

If you are concerned about locking down the inbound traffic on the Unravel server (same as the egress traffic on the Azure Databricks workspaces), consider the following:

  • Your virtual network and subnet(s) must be big enough to be shared by the Unravel VM and the target Databricks cluster(s).

  • You can use an existing virtual network or create a new one, but the virtual network must be in the same region and same subscription as the Azure Databricks workspace that you plan to create.

  • A CIDR range between /16 and /24 is required for the virtual network.

  • There are two options to enable the communication between the Unravel server and the Databricks Data Plane:

    • Assign a public IP address to the Unravel Azure VM and open port 4043 for non-SSL and port 4443 for unsecured SSL.

    • Deploy the Unravel server with No Public IP (NPIP), so that Unravel sensors installed on the Databricks Data Plane can communicate (one-way) with the Unravel server via VNET peering or Virtual WAN.

      You must ensure that there is no overlap in VNET IP ranges and that the traffic is private.

  • Allow inbound SSH connections to the Unravel VM.

  • You must allow outbound Internet access and all traffic within the subnet (VNET).

  • Azure IP ranges and service tags for Public Cloud can be found here.
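
The CIDR and overlap requirements above can be checked before deployment. The following is a minimal sketch using Python's standard ipaddress module. It assumes that "between /16 and /24" means a prefix length from 16 through 24 inclusive. The function names and the sample address ranges are hypothetical.

```python
import ipaddress

# Assumption: the /16 to /24 requirement is inclusive at both ends.
MIN_PREFIX = 16
MAX_PREFIX = 24


def vnet_prefix_ok(cidr):
    """Return True if the CIDR block's prefix length is within /16 to /24."""
    network = ipaddress.ip_network(cidr, strict=True)
    return MIN_PREFIX <= network.prefixlen <= MAX_PREFIX


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks in `cidrs` that share addresses."""
    networks = [ipaddress.ip_network(cidr, strict=True) for cidr in cidrs]
    overlaps = []
    for index, first in enumerate(networks):
        for second in networks[index + 1:]:
            if first.overlaps(second):
                overlaps.append((str(first), str(second)))
    return overlaps


if __name__ == "__main__":
    # Hypothetical ranges for an Unravel VNET and a peered Databricks VNET.
    unravel_vnet = "10.10.0.0/16"
    databricks_vnet = "10.20.0.0/16"
    print("Unravel VNET prefix OK:", vnet_prefix_ok(unravel_vnet))
    print("Overlapping ranges:", find_overlaps([unravel_vnet, databricks_vnet]))
```

An empty list from find_overlaps means the ranges can be peered without address conflicts. Note that strict=True makes ip_network raise ValueError for a block with host bits set, such as 10.10.0.5/16.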