Prerequisites

Installation prerequisites

Complete the following prerequisites before installing Unravel.

Platform

Each version of Unravel has specific platform requirements. Check the Compatibility Matrix to confirm your Google Cloud platform meets the requirements for the Unravel version you are installing.

Hardware

Compute Engine (GCE) machine type: General-purpose
- Minimum: n2-standard-16 (64 GiB RAM) / n1-standard-16 (60 GiB RAM)
- Maximum: n2-standard-64 (256 GiB RAM) / n1-standard-64 (240 GiB RAM)
- Recommended: n2-standard-32 (128 GiB RAM) / n1-standard-32 (120 GiB RAM)
Virtualization type: HVM
Root device type: Standard persistent disk or SSD persistent disk
Volume specifications:
- Minimum: 200 GiB.
  For a PoC or evaluation, the minimum root disk space should be sufficient.
- When monitoring many BigQuery clusters or a large number of jobs, we recommend a 300-500 GiB SSD persistent disk that can handle high IOPS rates.
- For production use, we recommend a 500 GiB SSD persistent disk.
  The baseline IOPS (3 IOPS per GiB with a minimum of 100 IOPS, burstable to 3000 IOPS) is sufficient for Unravel.
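
As a quick check of the baseline figure quoted above, the arithmetic can be sketched in shell. The function name `baseline_iops` is hypothetical, and the formula only restates the rule from this section (3 IOPS per GiB with a floor of 100); it does not model bursting.

```shell
#!/bin/sh
# Hypothetical helper: baseline IOPS for a disk of SIZE GiB,
# using the rule stated above (3 IOPS per GiB, minimum 100).
baseline_iops() {
  size_gib="$1"
  iops=$(( size_gib * 3 ))
  if [ "$iops" -lt 100 ]; then
    iops=100
  fi
  echo "$iops"
}

# Print the baseline for the disk sizes mentioned in this section.
for size in 200 300 500; do
  echo "${size} GiB -> $(baseline_iops "$size") baseline IOPS"
done
```

Under this rule, the 500 GiB production disk works out to 1500 baseline IOPS, which stays below the 3000 IOPS burst figure.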

Note

Unravel Server does not require heavy resources, but it's best to check your BigQuery Quotas as you proceed.

Sizing

Important

You must have separate nodes for the Unravel server and the external database.

The Unravel server node must not have any third-party applications installed.
Minimum requirements to install Unravel:
- Cores: 8
- RAM: 64 GB
The following table lists the minimum requirements for cores, RAM, and disks for a typical environment with default data retention and lookback settings.
Data includes Elasticsearch (ES) and the bundled database.
Note
In production environments, you can keep the Unravel software and the Data directory on separate disks. Putting the Data directory on a separate high-RPM HDD with its own SATA III (or equivalent) bus significantly increases I/O bandwidth.
Architecture: x86_64
vm.max_map_count is set to 262144
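
The two host settings above can be verified before installation with a small script. This is a minimal sketch, not part of the Unravel installer: the function name `check_host` is hypothetical, and treating 262144 as a minimum rather than an exact value is an assumption.

```shell
#!/bin/sh
# Hypothetical pre-install check for the two host settings listed above.
# check_host ARCH MAP_COUNT returns 0 when ARCH is x86_64 and MAP_COUNT
# is at least 262144 (treating the value as a floor is an assumption).
check_host() {
  arch="$1"
  map_count="$2"
  if [ "$arch" != "x86_64" ]; then
    echo "unsupported architecture: $arch" >&2
    return 1
  fi
  if [ "$map_count" -lt 262144 ]; then
    echo "vm.max_map_count is $map_count, expected 262144" >&2
    return 1
  fi
  echo "host settings OK"
}

# On the Unravel host, pass in the live values:
#   check_host "$(uname -m)" "$(sysctl -n vm.max_map_count)"
# To set the value at runtime (root required):
#   sysctl -w vm.max_map_count=262144
```

Passing the values in as arguments keeps the check easy to test on a workstation before running it on the target host.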

MySQL Server

The following table lists the minimum requirements for cores, RAM, and disk.

Jobs per day	Data retention length	Cores	RAM	Disk
Less than 50,000	30 days	4	32 GB	1 TB
Less than 50,000	60 days	4	32 GB	2 TB
50,000 to 100,000	30 days	8	64 GB	2 TB
50,000 to 100,000	60 days	8	64 GB	4 TB
Over 100,000	Contact Unravel Support.

Software

Operating system: RedHat/CentOS 6.4 - 7.4

Network

The following ports must be open on the Unravel GCE. In addition, the Unravel GCE must be able to access all ports on the BigQuery cluster.

Ports	Direction	Description
3000	Both	HTTPS traffic to and from Unravel UI.
4043	In	UDP and TCP ingest traffic from the entire cluster to Unravel Servers.

Settings related to IAM roles and firewall rules

To manage, monitor, and optimize the modern data applications running on your BigQuery cluster, Unravel needs data from the cluster as well as from apps running on the cluster. This data includes metrics, configuration information, and logs. Part of this data is pushed to Unravel, and part is pulled by the daemons running on the Unravel server. For all the data to be accessible, there must be both inbound and outbound access between the Unravel server (on the GCE instance) and the BigQuery cluster.

The Unravel server must be in the same region as the BigQuery clusters it monitors. There are two possible scenarios:
- The BigQuery cluster and the Unravel server are on the same VPC and the same subnet, and the firewall rules allow all traffic within that subnet.
- The BigQuery cluster is on a different VPC than the Unravel server. In this case, you must configure VPC peering, create the route tables, and update the firewall policy.
The Unravel server needs TCP and UDP connectivity to the BigQuery master node. To implement this, do either of the following:
- Create a firewall rule that allows ports 3000 and 4043 from the IP addresses of the BigQuery cluster nodes. Configure the firewall rule on the Unravel server to allow TCP traffic on port 3000 from the BigQuery cluster nodes.
- Add the members of the firewall rule used on the BigQuery cluster to this rule.
The Unravel server and BigQuery clusters must allow all outbound traffic.
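
The ingress rule described above could be created with `gcloud compute firewall-rules create`. The sketch below only prints the command for review rather than running it. The helper name `unravel_firewall_cmd` and the rule name, network, source range, and target tag in the example call are placeholders, not values from this guide.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a gcloud command that opens the
# Unravel ports (3000/TCP, 4043/TCP and UDP) to a given source range.
# Arguments: RULE_NAME NETWORK SOURCE_RANGE TARGET_TAG (all placeholders).
unravel_firewall_cmd() {
  rule_name="$1"
  network="$2"
  source_range="$3"
  target_tag="$4"
  printf 'gcloud compute firewall-rules create %s' "$rule_name"
  printf ' --network=%s --direction=INGRESS' "$network"
  printf ' --allow=tcp:3000,tcp:4043,udp:4043'
  printf ' --source-ranges=%s --target-tags=%s\n' "$source_range" "$target_tag"
}

# Example call with made-up values; review the output before running it.
unravel_firewall_cmd unravel-ingest my-vpc 10.128.0.0/20 unravel-server
```

Using a target tag limits the rule to the Unravel instance. Replace the source range with the address range that your cluster nodes actually use.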

Skill set

These instructions are self-contained and require only basic knowledge of GCP. You don't need to create any scripts or be familiar with any specific programming or scripting language.

These instructions assume you're proficient in:

- Provisioning GCE instances.
- Creating and configuring the required IAM roles, firewall rules, and so on.
- Understanding GCP networking concepts such as virtual private clouds (VPCs) and subnets.
- Running Ansible scripts, basic Unix commands, and gcloud CLI commands.

BigQuery project configuration prerequisites

Have the following ready before you add the projects:

Google account with required IAM role permissions for gcloud CLI authentication
IAM role permissions for gcloud CLI authentication
The Google account used to authenticate with the gcloud CLI must have the following permissions to create and delete the GCP resources used for tracking BigQuery. These permissions follow the security principle of least privilege: they are the only permissions required to get the job done. Without them, the Terraform script will not run properly.
Important
The IAM role permissions must be granted for every project that the Google account user wants to integrate with Unravel.
iam.roles.create
iam.roles.delete
iam.roles.get
iam.roles.undelete
iam.roles.update
iam.serviceAccountKeys.create
iam.serviceAccountKeys.delete
iam.serviceAccountKeys.get
iam.serviceAccounts.create
iam.serviceAccounts.delete
iam.serviceAccounts.get
iam.serviceAccounts.getIamPolicy
iam.serviceAccounts.setIamPolicy
logging.sinks.create
logging.sinks.delete
logging.sinks.get
pubsub.subscriptions.create
pubsub.subscriptions.delete
pubsub.subscriptions.get
pubsub.topics.attachSubscription
pubsub.topics.create
pubsub.topics.delete
pubsub.topics.get
pubsub.topics.getIamPolicy
pubsub.topics.setIamPolicy
resourcemanager.projects.get
resourcemanager.projects.getIamPolicy
resourcemanager.projects.setIamPolicy
Run the following commands from Google Cloud Shell to create a role and bind it to the user.
Run the following to create the role.
gcloud iam roles create UnravelGoogleAuthRole --organization=<your-org-id-in-gcp> --title=UnravelGoogleAuthRole --description="Creation of role in GCP on an organization level for automated integrating bigquery projects with Unravel" --permissions=iam.roles.create,iam.roles.delete,iam.roles.get,iam.roles.undelete,iam.roles.update,iam.serviceAccountKeys.create,iam.serviceAccountKeys.delete,iam.serviceAccountKeys.get,iam.serviceAccounts.create,iam.serviceAccounts.delete,iam.serviceAccounts.get,iam.serviceAccounts.getIamPolicy,iam.serviceAccounts.setIamPolicy,logging.sinks.create,logging.sinks.delete,logging.sinks.get,pubsub.subscriptions.create,pubsub.subscriptions.delete,pubsub.subscriptions.get,pubsub.topics.attachSubscription,pubsub.topics.create,pubsub.topics.delete,pubsub.topics.get,pubsub.topics.getIamPolicy,pubsub.topics.setIamPolicy,resourcemanager.projects.get,resourcemanager.projects.getIamPolicy,resourcemanager.projects.setIamPolicy
For example:
gcloud iam roles create UnravelGoogleAuthRole --organization=592556919173 --title=UnravelGoogleAuthRole --description="Creation of role in GCP on an organization level for automated integrating bigquery projects with Unravel" --permissions=iam.roles.create,iam.roles.delete,iam.roles.get,iam.roles.undelete,iam.roles.update,iam.serviceAccountKeys.create,iam.serviceAccountKeys.delete,iam.serviceAccountKeys.get,iam.serviceAccounts.create,iam.serviceAccounts.delete,iam.serviceAccounts.get,iam.serviceAccounts.getIamPolicy,iam.serviceAccounts.setIamPolicy,logging.sinks.create,logging.sinks.delete,logging.sinks.get,pubsub.subscriptions.create,pubsub.subscriptions.delete,pubsub.subscriptions.get,pubsub.topics.attachSubscription,pubsub.topics.create,pubsub.topics.delete,pubsub.topics.get,pubsub.topics.getIamPolicy,pubsub.topics.setIamPolicy,resourcemanager.projects.get,resourcemanager.projects.getIamPolicy,resourcemanager.projects.setIamPolicy
Bind the role to a user.
gcloud organizations add-iam-policy-binding <your-org-id-in-gcp> --member='user:<user account>' --role='organizations/<your-org-id-in-gcp>/roles/UnravelGoogleAuthRole'
For example:
gcloud organizations add-iam-policy-binding 592556919173 --member='user:user@unraveldata.com' --role='organizations/592556919173/roles/UnravelGoogleAuthRole'
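Because the `--permissions` value is long, it is easy to drop an entry when copying it by hand. One way to reduce that risk is to keep the permissions one per line and join them with commas. This is a minimal sketch: the function names are hypothetical, `ORG_ID` is a placeholder, and the script prints the command for review instead of running it.

```shell
#!/bin/sh
# Sketch: build the --permissions value from the permission list in this
# section, kept one per line. Function names are hypothetical.
unravel_permissions() {
  cat <<'EOF'
iam.roles.create
iam.roles.delete
iam.roles.get
iam.roles.undelete
iam.roles.update
iam.serviceAccountKeys.create
iam.serviceAccountKeys.delete
iam.serviceAccountKeys.get
iam.serviceAccounts.create
iam.serviceAccounts.delete
iam.serviceAccounts.get
iam.serviceAccounts.getIamPolicy
iam.serviceAccounts.setIamPolicy
logging.sinks.create
logging.sinks.delete
logging.sinks.get
pubsub.subscriptions.create
pubsub.subscriptions.delete
pubsub.subscriptions.get
pubsub.topics.attachSubscription
pubsub.topics.create
pubsub.topics.delete
pubsub.topics.get
pubsub.topics.getIamPolicy
pubsub.topics.setIamPolicy
resourcemanager.projects.get
resourcemanager.projects.getIamPolicy
resourcemanager.projects.setIamPolicy
EOF
}

# Join the lines with commas, the format gcloud expects for --permissions.
permissions_csv() {
  unravel_permissions | paste -s -d ',' -
}

# Print the command for review; ORG_ID is a placeholder to be set by you.
ORG_ID="${ORG_ID:-<your-org-id-in-gcp>}"
echo "gcloud iam roles create UnravelGoogleAuthRole --organization=${ORG_ID} --title=UnravelGoogleAuthRole --permissions=$(permissions_csv)"
```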
Project ID file in case you are integrating multiple projects at a time with Unravel.
Project ID text file
You can configure multiple projects with customer-supplied credentials. To implement this, you must create a text file containing the list of project IDs and provide the path of this file during integration. Each line in the text file must carry a single Project ID.
For example:
```
project-id-011
project-id-012
project-id-013
project-id-014
project-id-015
project-id-016
project-id-017
project-id-018
project-id-019
project-id-020
```
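Before supplying the file during integration, it may help to confirm that every line holds exactly one project ID. The following is a sketch under stated assumptions: the function name `validate_project_file` is hypothetical, and the pattern encodes the general GCP project ID format (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen), not a rule taken from Unravel.

```shell
#!/bin/sh
# Hypothetical validator for the project ID file described above.
# Flags any line that is not a single token shaped like a GCP project ID.
# Returns 0 if all lines pass, 1 if any line fails, 2 if the file is unreadable.
validate_project_file() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # -v selects lines that do NOT match; -n prefixes them with line numbers.
  bad="$(grep -n -v -E '^[a-z][a-z0-9-]{4,28}[a-z0-9]$' "$file" || true)"
  if [ -n "$bad" ]; then
    echo "invalid lines in $file:" >&2
    echo "$bad" >&2
    return 1
  fi
  echo "all project IDs in $file look valid"
}

# Usage (the path is a placeholder):
#   validate_project_file /path/to/project_ids.txt
```

Blank lines and lines with more than one ID are reported as invalid, which matches the one-ID-per-line requirement.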
Log Receiver (LR) endpoint (only for Push method)
This is required only when the push method is configured to fetch data from BigQuery.
Subscription ID (Optional)
Tip
The Subscription ID is not required for INFORMATION_SCHEMA-based data polling.
The subscription ID that you want to configure with a Pub/Sub topic. This is optional. If it is not provided, the default subscription ID unravel-bigquery-sub is used.
Note
If you have created the resources on GCP, then the Subscription ID is mandatory.
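The fallback to the default subscription ID can be illustrated with a one-line helper. This is only a sketch of the behavior described above. The function name `resolve_subscription_id` and the custom ID in the example are hypothetical, and how Unravel applies the default internally is not shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper: resolve the subscription ID to use.
# Falls back to the documented default when the argument is empty or missing.
resolve_subscription_id() {
  echo "${1:-unravel-bigquery-sub}"
}

resolve_subscription_id ""               # empty input: the default is used
resolve_subscription_id "my-custom-sub"  # explicit ID: used as given
```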
