Prerequisites
Platform
Hardware
  • EC2 instance type:

    • Minimum: r4.2xlarge (61 GiB RAM)

    • Maximum: r4.8xlarge (244 GiB RAM)

    • Recommended: r4.4xlarge (122 GiB RAM)

    • Virtualization type: HVM

  • Root device type: EBS

  • EBS volume specifications:

    • Minimum: 100 GiB.

      In a PoC or evaluation, the minimum root disk space should be sufficient.

    • When monitoring more EMR clusters or a large number of jobs, we recommend a 300-500 GB Provisioned IOPS SSD (io1) volume with 3000 IOPS.

    • For production use, we recommend a 200 GB Provisioned IOPS EBS and RDS volume.

      The baseline IOPS (3 IOPS per GiB with a minimum of 100 IOPS, burstable to 3000 IOPS) is sufficient for Unravel.

  • (Optional) RDS specifications:

    • DB instance class: db.r3.xlarge (4 vCPU, 30.5 GiB RAM)

    • Storage type: Provisioned IOPS (SSD)

    • Allocated storage: 200 GiB or above

    • Provisioned IOPS: 1000
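
As a rough check of the baseline IOPS figure quoted in the EBS volume specifications above, the arithmetic can be sketched as follows. The function name `baseline_iops` is hypothetical, and the formula is only the rule stated in the text (3 IOPS per GiB with a minimum of 100 IOPS). It is not an AWS API call, and it ignores bursting.

```python
def baseline_iops(size_gib: int, per_gib: int = 3, floor: int = 100) -> int:
    """Return the baseline IOPS for a volume of the given size.

    Assumption: uses only the rule quoted in the text
    (3 IOPS per GiB, never less than 100 IOPS).
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return max(floor, per_gib * size_gib)


# Compare a few of the volume sizes mentioned in this section.
for size in (100, 200, 500):
    print(f"{size} GiB -> {baseline_iops(size)} baseline IOPS")
```

Under these assumptions, a 200 GiB volume has a baseline of 600 IOPS, which is below the 3000 IOPS burst ceiling mentioned above.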

Note

Unravel Server doesn't require heavy resources, but it's best to check your AWS Service Limits as you proceed. For example, if you provision an Unravel EC2 instance from our CloudFormation template, check Virtual Private Cloud (Amazon VPC) Limits.

Sizing

Important

You must have separate nodes for the Unravel server and for the external database.

  • Unravel Server

    The minimum requirements for cores, RAM, and directories for a typical environment with default data retention and lookback settings.

    /usr/local/unravel is the storage location for Unravel binaries. /srv/unravel is used for Elasticsearch (ES) and the bundled database.

    Root device type recommended: EBS - Provisioned IOPS SSD (io1).

  • MySQL Server

    The minimum requirements for cores, RAM, and disk.

Access permissions

The Unravel EC2 instance must have read permission on the S3 bucket used by EMR clusters.

  • You need an AWS account. You must be able to connect to AWS for the deployment process.

  • Create a read-only S3 IAM role and assign it to Unravel Server so that it can read the archive logs in the S3 bucket configured for the EMR cluster. In other words, create an IAM role whose policy grants only read access to the specific S3 bucket used by the EMR cluster; then create an EC2 instance profile and add the IAM role to it.

  • AWS Permissions and Access:

    You must have permission to:

    • Create EC2 instances

    • Connect to EC2 instances

    • Install software on EC2 instances (you must have root access or "sudo root" permission in order to install the Unravel Server RPM)

    • Create security groups and IAM roles

    • Update IAM roles for the EMR cluster and the corresponding S3 storage

    • If you want to deploy Unravel for a new EMR cluster, you also need AWS permissions to create an EMR cluster and necessary S3 buckets, create and configure VPCs, etc.
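
The read-only IAM role described above could be backed by policy documents along the following lines. This is a minimal sketch, not Unravel's official policy: the bucket name `my-emr-log-bucket` and the function names are hypothetical, and the choice of `s3:ListBucket` plus `s3:GetObject` as the read-only actions is an assumption. The code only builds the JSON documents; it doesn't call AWS.

```python
import json


def s3_read_only_policy(bucket: str) -> dict:
    """Build an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def ec2_trust_policy() -> dict:
    """Build the trust policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# "my-emr-log-bucket" is a placeholder; use the bucket configured for your EMR cluster.
print(json.dumps(s3_read_only_policy("my-emr-log-bucket"), indent=2))
print(json.dumps(ec2_trust_policy(), indent=2))
```

The first document would be attached to the role as its permissions policy and the second used as its trust policy, after which the role is added to the EC2 instance profile.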

Network

The following ports must be open on the Unravel EC2 instance. In addition, the Unravel EC2 instance must be able to access all ports on the EMR cluster.

Settings related to IAM roles and security groups

To manage, monitor, and optimize the modern data applications running on your EMR cluster, Unravel needs data from the cluster as well as from the apps running on it. This data includes metrics, configuration information, and logs. Part of this data is pushed to Unravel, and part of it is pulled by the daemons running on Unravel Server. For all data to be accessible, there must be both inbound and outbound access between Unravel Server (on the EC2 instance) and the EMR cluster.

  • The Unravel Server must be in the same region as the target EMR cluster(s) it will be monitoring. There are two possible scenarios:

  • The Unravel Server needs a TCP and UDP connection to the EMR master node. To implement this, do either of the following:

    • Create a security group that allows traffic on ports 3000 and 4043 from the EMR cluster nodes' IP addresses. Configure the security group on Unravel Server to allow TCP traffic on port 3000 from the EMR cluster nodes.

    • Add the security group used by the EMR cluster as the source in this rule.

  • The Unravel Server and EMR cluster(s) must allow all outbound traffic.

  • EMR cluster nodes must allow all traffic from Unravel Server. If you can't allow all traffic from Unravel Server, you must at minimum allow Unravel Server to access the cluster nodes' TCP ports 8020, 50010, and 50020.
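
The security group rules described in this list can be sketched as plain data. The dictionaries below follow the shape of the `IpPermissions` entries accepted by the EC2 AuthorizeSecurityGroupIngress API, but the code doesn't call AWS. The function names and the security group IDs are hypothetical, and opening port 4043 for both TCP and UDP is an assumption based on the TCP and UDP requirement stated above.

```python
from typing import Dict, List


def _rule(protocol: str, port: int, source_group_id: str) -> Dict:
    """Build one single-port ingress rule sourced from a security group."""
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def unravel_ingress_rules(emr_group_id: str) -> List[Dict]:
    """Rules for the Unravel Server security group (traffic from EMR nodes)."""
    ports = [("tcp", 3000), ("tcp", 4043), ("udp", 4043)]
    return [_rule(proto, port, emr_group_id) for proto, port in ports]


def emr_minimum_ingress_rules(unravel_group_id: str) -> List[Dict]:
    """Minimum rules for EMR nodes when all traffic can't be allowed."""
    return [_rule("tcp", port, unravel_group_id) for port in (8020, 50010, 50020)]


# Placeholder security group IDs, for illustration only.
for rule in unravel_ingress_rules("sg-0123456789abcdef0"):
    print(rule)
for rule in emr_minimum_ingress_rules("sg-0fedcba9876543210"):
    print(rule)
```

Referencing the other side's security group as the source, rather than individual IP addresses, keeps the rules valid as cluster nodes are added or replaced.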

Port(s)    Direction    Description

3000       Both         Non-HTTPS traffic to and from Unravel UI

4043       In           UDP and TCP ingest traffic from the entire cluster to Unravel Server(s)
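
Once the ports above are opened, TCP reachability can be spot-checked from an EMR node with a few lines of Python. This is a minimal sketch using only the standard library: the function name `tcp_port_open` and the host name in the comment are hypothetical, and the check covers TCP only, so it says nothing about UDP traffic on port 4043.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (placeholder host name, replace with your Unravel Server address):
#   for port in (3000, 4043):
#       print(port, tcp_port_open("unravel-server.example.internal", port))
```

A result of `False` can mean a blocked security group rule, a routing problem, or simply that the service isn't listening yet, so treat it as a prompt to investigate rather than a diagnosis.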

Skill set

These instructions assume you're proficient in:

  • Provisioning EC2 instances and RDS instances

  • Creating and configuring the required IAM roles, security groups, and so on

  • Understanding AWS networking concepts such as virtual private clouds (VPCs), subnets, and so on

  • Running Ansible scripts, basic Unix commands, and AWS CLI commands

You don't need to create any scripts or be familiar with any specific programming/scripting language. These instructions are self-contained, and require only basic knowledge of AWS. Expert-level knowledge of AWS is not required.