Creating Azure storage
In Azure, storage is separate from compute. An HDI or Databricks cluster can use one or more storage accounts. You create a storage account and specify its storage format. Unravel supports the following Azure storage formats:
Property | WASB | ADLS Gen 1 | ADLS Gen 2 or ABFS |
---|---|---|---|
Description | Windows Azure Storage Blob (WASB or Blob) is a general-purpose storage format that uses a key-value store with a flat namespace. It has full support for: | Azure Data Lake Storage Generation 1 (ADLS Gen 1) is a hierarchical file system. It has full support for: | Azure Data Lake Storage Generation 2 (ADLS Gen 2 or ABFS) combines the features of WASB and ADLS Gen 1. |
Does Unravel support this format? | Yes | Yes | v4.5.2.0 onwards: Yes |
Does Unravel support encrypted access (SSL)? | Yes | Yes | v4.5.2.0 onwards: Yes |
Does Unravel support multiple storage accounts on a single Unravel VM? | Yes | v4.5.2.0 onwards: Yes | v4.5.2.0 onwards: Yes |
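Each of these storage formats is addressed through its own URI scheme, which is worth keeping in mind for when you later point Unravel at the account. The following is a minimal Python sketch of how such URIs are typically formed. The scheme names and host suffixes are the ones used by the standard Hadoop drivers for Azure (wasb/wasbs, adl, abfs/abfss); the function name storage_uri, the format labels "wasb", "adls1", and "adls2", and the sample account and container names are hypothetical and are not part of any Unravel API.

```python
def storage_uri(fmt, account, container=None, path="", secure=True):
    """Build a Hadoop-style URI for an Azure storage account.

    fmt is one of "wasb", "adls1", or "adls2" (labels chosen for this sketch).
    container is the blob container (WASB) or file system (ADLS Gen 2).
    """
    path = path.lstrip("/")
    if fmt == "wasb":
        if not container:
            raise ValueError("WASB needs a container name")
        # wasbs is the SSL variant of wasb
        scheme = "wasbs" if secure else "wasb"
        return f"{scheme}://{container}@{account}.blob.core.windows.net/{path}"
    if fmt == "adls1":
        # ADLS Gen 1 exposes a single adl:// scheme, so `secure` is ignored here
        return f"adl://{account}.azuredatalakestore.net/{path}"
    if fmt == "adls2":
        if not container:
            raise ValueError("ADLS Gen 2 needs a file system name")
        # abfss is the SSL variant of abfs
        scheme = "abfss" if secure else "abfs"
        return f"{scheme}://{container}@{account}.dfs.core.windows.net/{path}"
    raise ValueError(f"unknown storage format: {fmt}")


if __name__ == "__main__":
    # Hypothetical account and container names, for illustration only
    print(storage_uri("wasb", "myacct", "logs", "/spark/events"))
    print(storage_uri("adls2", "myacct", "fs", "spark/events", secure=False))
```

Whether you use the secure or non-secure scheme should match the Secure transfer required setting you choose when creating the account.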
This topic explains how to create Azure storage for your HDI or Databricks cluster. Later, you'll tell Unravel about your storage account(s) and their storage format so that Unravel Server knows where and how to pull event logs and executor logs from the storage account. This is necessary for Spark on HDI; for other app types, Unravel's sensors push logs to Unravel Server.
The steps below assume that:
You have an Azure account.
You already have a resource group assigned to a region in order to group your policies, VMs, and storage blobs/lakes/drives.
A resource group is a container that holds related resources for an Azure solution. In Azure, you logically group related resources such as storage accounts, virtual networks, and virtual machines (VMs) to deploy, manage, and maintain them as a single entity.
You already have a virtual network for your resource group. This virtual network will be shared by your cluster and the Unravel VM.
Log in to the Azure portal.
Click Storage accounts | + Add.
On the Basics tab, enter values for the following fields:
Subscription: Select the subscription type.
Resource Group: Select the resource group to associate with this storage instance.
Storage Account Name: Enter a name, using lowercase letters and numbers.
Location: Select a data center region.
Performance: Select Standard or Premium:
Standard storage uses magnetic disks and is cheaper. Premium storage uses SSDs, so it has higher performance and is recommended for Spark and Kafka clusters.
Account kind: Select your storage format.
Replication: Select how widely your data is replicated: locally, within the same zone or region, or geographically across regions. See more choices in the Advanced section.
Locally redundant storage (LRS): Only handles failures within the data center. Durability guarantee is 11 nines.
Zone-redundant storage (ZRS): Handles failures in the data center and zone, but not region. Durability guarantee is 12 nines. Only supported on ADLS Gen 2.
Geo-redundant storage (GRS): Handles failures in the data center, zone, and region, but does not allow read access in another region in a failure scenario. Durability guarantee is 16 nines.
Read-access geo-redundant storage (RA-GRS): Handles failures in the data center, zone, and region, and allows read access in another region. Durability guarantee is 16 nines.
Access Tier: Only available for WASB storage and ADLS Gen 2. If you pick this option, select hot storage.
Click the Advanced tab.
Set Secure transfer required to Disabled or Enabled.
Note
Unravel doesn't support encryption (SSL) with WASB.
For Virtual Networks, select whether to allow traffic from all networks or only from within the virtual network and subnet(s) you specify.
Click Review + create.
If your settings are correct, click Create. To edit your settings, click Previous.
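Before you click Create, it can help to check the Basics tab values against the rules described in the steps above. The following is a minimal Python sketch of such a check, not an Azure or Unravel API. The class name BasicsTab and its field names are hypothetical. The lowercase-letters-and-numbers rule, the Performance options, and the Replication options come from the steps above; the 3 to 24 character length limit is Azure's naming rule for storage accounts.

```python
import re
from dataclasses import dataclass

# Lowercase letters and digits only (from the steps above);
# the 3-24 character length is Azure's storage account naming rule.
_NAME_RE = re.compile(r"[a-z0-9]{3,24}")

PERFORMANCE = {"Standard", "Premium"}
REPLICATION = {"LRS", "ZRS", "GRS", "RA-GRS"}


@dataclass(frozen=True)
class BasicsTab:
    """Hypothetical record of the values entered on the Basics tab."""
    subscription: str
    resource_group: str
    name: str
    location: str
    performance: str = "Standard"
    replication: str = "LRS"

    def problems(self):
        """Return a list of human-readable problems; an empty list means OK."""
        found = []
        if not _NAME_RE.fullmatch(self.name):
            found.append("name must be 3-24 lowercase letters or digits")
        if self.performance not in PERFORMANCE:
            found.append(f"unknown performance tier: {self.performance}")
        if self.replication not in REPLICATION:
            found.append(f"unknown replication type: {self.replication}")
        return found


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    settings = BasicsTab("my-subscription", "my-rg", "unravellogs01", "eastus",
                         performance="Premium", replication="GRS")
    print(settings.problems())
```

The check only covers values that can be validated offline; whether the name is globally unique, for example, is something only the portal can confirm.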
Resources
Comparison of WASB and ADLS Gen 1
Azure - creating a storage account
Difference between replication types