
Prerequisites
Platform
Hardware
  • Compute Engine (GCE) machine type (general-purpose):

    • Minimum: n2-standard-16 (64 GiB RAM) / n1-standard-16 (60 GiB RAM)

    • Maximum: n2-standard-64 (256 GiB RAM) / n1-standard-64 (240 GiB RAM)

    • Recommended: n2-standard-32 (128 GiB RAM) / n1-standard-32 (120 GiB RAM)

    • Virtualization type: HVM

  • Boot disk type: standard persistent disk or SSD persistent disk

  • Volume specifications:

    • Minimum: 200 GiB.

      For a proof of concept (PoC) or evaluation, the minimum disk size should be sufficient.

    • When monitoring many Dataproc clusters or a large number of jobs, we recommend a 300 to 500 GiB SSD persistent disk that can sustain high IOPS.

    • For production use, we recommend a 500 GiB SSD persistent disk.

      The baseline IOPS (3 IOPS per GiB, with a minimum of 100 IOPS, burstable to 3,000 IOPS) is sufficient for Unravel.
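The sizing guidance above can be expressed as a quick pre-flight check. The following is a minimal Python sketch, not an Unravel tool: the names `VmSpec`, `check_spec`, and `baseline_iops` are hypothetical. It checks only the vCPU range and the disk size (RAM follows from the machine type you pick), and it computes baseline IOPS using the 3-IOPS-per-GiB rule stated above.

```python
"""Hypothetical pre-flight sizing check for an Unravel Server VM (illustrative only)."""

from dataclasses import dataclass
from typing import List

# vCPU range from the minimum (n2/n1-standard-16) and maximum (n2/n1-standard-64) above.
MIN_VCPUS, MAX_VCPUS = 16, 64
MIN_DISK_GIB = 200         # minimum disk size
PRODUCTION_DISK_GIB = 500  # recommended disk size for production


@dataclass
class VmSpec:
    vcpus: int
    disk_gib: int


def baseline_iops(disk_gib: int) -> int:
    """Baseline IOPS per the rule stated above: 3 IOPS per GiB, at least 100."""
    return max(100, 3 * disk_gib)


def check_spec(spec: VmSpec, production: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the spec meets the guidance."""
    problems = []
    if not MIN_VCPUS <= spec.vcpus <= MAX_VCPUS:
        problems.append(f"{spec.vcpus} vCPUs is outside {MIN_VCPUS}-{MAX_VCPUS}")
    required_disk = PRODUCTION_DISK_GIB if production else MIN_DISK_GIB
    if spec.disk_gib < required_disk:
        problems.append(f"{spec.disk_gib} GiB disk is below {required_disk} GiB")
    return problems


if __name__ == "__main__":
    recommended = VmSpec(vcpus=32, disk_gib=500)  # e.g. n2-standard-32 with a 500 GiB SSD
    print(check_spec(recommended, production=True))  # prints []
    print(baseline_iops(200), baseline_iops(500))    # prints 600 1500
```

For example, under the stated rule a 200 GiB disk has a baseline of 600 IOPS, and a 500 GiB production disk has 1,500.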

Note

Unravel Server doesn't require heavy resources, but it's best to check your Dataproc quotas before you proceed.

Sizing

Important

You must have separate nodes for the Unravel server and for the external database.

MySQL Server

The minimum requirements for cores, RAM, and disk.

Software
  • Operating system: Red Hat/CentOS 6.4 through 7.4
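To confirm a host falls in the supported range before installing, you could parse its release string. This is a minimal sketch under the assumption that the host exposes `/etc/redhat-release` (as Red Hat and CentOS hosts normally do); the function names are hypothetical and the check treats the range as inclusive on both ends.

```python
"""Hypothetical pre-flight check: is this host's OS release in the supported range?"""

import re
from pathlib import Path
from typing import Optional, Tuple

# Supported range stated above: Red Hat/CentOS 6.4 through 7.4 (inclusive).
MIN_VERSION = (6, 4)
MAX_VERSION = (7, 4)


def parse_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from e.g. 'CentOS Linux release 7.4.1708 (Core)'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(text: str) -> bool:
    """True if the parsed (major, minor) version lies within the supported range."""
    version = parse_release(text)
    return version is not None and MIN_VERSION <= version <= MAX_VERSION


if __name__ == "__main__":
    release_file = Path("/etc/redhat-release")
    if release_file.exists():
        content = release_file.read_text().strip()
        print(content, "->", "supported" if is_supported(content) else "unsupported")
    else:
        print("/etc/redhat-release not found; this may not be a Red Hat/CentOS host")
```

Comparing `(major, minor)` tuples avoids the string-comparison pitfall where "7.10" would sort before "7.4".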

Network

The ports listed under Ports below must be open on the Unravel GCE instance. In addition, the Unravel GCE instance must be able to access all ports on the Dataproc cluster.

Settings related to IAM roles and firewall rules

To manage, monitor, and optimize the modern data applications running on your Dataproc cluster, Unravel needs data from the cluster and from the apps running on it. This data includes metrics, configuration information, and logs. Part of this data is pushed to Unravel, and part is pulled by the daemons running on Unravel Server. For all of this data to be accessible, both inbound and outbound access must be open between Unravel Server (on the GCE instance) and the Dataproc cluster.

  • The Unravel Server must be in the same region as the target Dataproc clusters it is monitoring. There are two possible scenarios:

    • The Dataproc cluster and the Unravel Server are in the same VPC and subnet, and the firewall rules allow all traffic from that subnet.

    • The Dataproc cluster is in a different VPC from the Unravel Server. In this case, you must configure VPC peering, create the required routes, and update the firewall rules.

  • The Unravel Server needs a TCP and UDP connection to the Dataproc master node. To implement this, do either of the following:

    • Create a firewall rule that allows traffic from the Dataproc cluster nodes' IP addresses to the Unravel Server on TCP port 3000 and on TCP and UDP port 4043.

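    • Instead of IP addresses, use the source that identifies the Dataproc cluster nodes in their own firewall rule (for example, a network tag or service account) as the source of this rule.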

  • The Unravel Server and Dataproc clusters must allow all outbound traffic.

  • Dataproc cluster nodes must allow all traffic from Unravel Server. If you can't allow all traffic, you must at minimum allow Unravel Server to access TCP ports 9870, 9866, and 9867 on the cluster nodes.
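As an illustration of the ingress rule described above, the following Python sketch builds a request body for the Compute Engine API `firewalls.insert` method that opens TCP port 3000 and TCP/UDP port 4043 on the Unravel Server to the Dataproc nodes. It only builds and prints the body; it does not call the API. The rule name, network path, source CIDR, and the `unravel-server` target tag are hypothetical placeholders to replace with your own values.

```python
"""Sketch: build a Compute Engine firewall rule body for Unravel ingress (hypothetical values)."""

import json
from typing import Any, Dict, List


def unravel_ingress_rule(network: str, source_ranges: List[str], target_tag: str) -> Dict[str, Any]:
    """Allow Dataproc nodes (source_ranges) to reach the Unravel VM on ports 3000 and 4043."""
    return {
        "name": "allow-dataproc-to-unravel",  # hypothetical rule name
        "network": network,
        "direction": "INGRESS",
        "sourceRanges": source_ranges,        # Dataproc node IP ranges
        "targetTags": [target_tag],           # assumes this network tag is set on the Unravel VM
        "allowed": [
            {"IPProtocol": "tcp", "ports": ["3000", "4043"]},  # UI traffic and TCP ingest
            {"IPProtocol": "udp", "ports": ["4043"]},          # UDP ingest
        ],
    }


if __name__ == "__main__":
    rule = unravel_ingress_rule(
        network="projects/my-project/global/networks/my-vpc",  # hypothetical
        source_ranges=["10.128.0.0/20"],                       # hypothetical Dataproc subnet
        target_tag="unravel-server",                           # hypothetical tag
    )
    print(json.dumps(rule, indent=2))
```

To use the second option instead (identifying the Dataproc nodes by tag rather than by IP address), you could replace `sourceRanges` with a `sourceTags` list naming a network tag carried by the Dataproc nodes.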

Ports

Port    Direction    Description
3000    Both         Non-HTTPS traffic to and from the Unravel UI.
4043    In           UDP and TCP ingest traffic from the entire cluster to Unravel Server.
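After the firewall rules are in place, one way to verify the TCP side of the table above is to attempt connections from a Dataproc node. This minimal Python sketch assumes a hypothetical host name for the Unravel GCE instance. It can only confirm TCP reachability: UDP on port 4043 is connectionless, so a simple connect attempt cannot verify it.

```python
"""Sketch: check TCP reachability of the Unravel ports (hypothetical host name)."""

import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, or DNS failure
        return False


if __name__ == "__main__":
    unravel_host = "unravel-server.internal"  # hypothetical; use your Unravel GCE address
    for port in (3000, 4043):
        status = "open" if tcp_port_open(unravel_host, port) else "closed or filtered"
        print(f"{unravel_host}:{port} {status}")
```

The same function can be run from the Unravel Server against a cluster node to check the minimum HDFS ports (9870, 9866, and 9867).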

Skill set

These instructions are self-contained and require only basic knowledge of GCP. You don't need to create any scripts or be familiar with any specific programming or scripting language.

These instructions assume you're comfortable with:

  • Provisioning GCEs.

  • Creating and configuring the required IAM roles, firewall rules, etc.

  • Understanding GCP networking concepts such as virtual private clouds (VPCs) and subnets.

  • Running Ansible scripts, basic Unix commands, and Google Cloud CLI (gcloud) commands.