
Prerequisites (Amazon EMR)
Platform
Hardware
  • EC2 instance type:

    • Minimum: r4.2xlarge (61 GiB RAM)

    • Maximum: r4.8xlarge (244 GiB RAM)

    • Recommended: r4.4xlarge (122 GiB RAM)

    • Virtualization type: HVM

  • Root device type: EBS

  • EBS volume specifications:

    • Minimum: 100GiB.

      In a PoC or evaluation, the minimum root disk space should be sufficient.

    • When monitoring many EMR clusters or a large number of jobs, we recommend a 300-500 GB Provisioned IOPS SSD (io1) volume with 3000 IOPS.

    • For production use, we recommend a 200GB Provisioned IOPS EBS and RDS volume.

      The baseline IOPS (3 IOPS per GiB with a minimum of 100 IOPS, burstable to 3000 IOPS) is sufficient for Unravel.

  • (Optional) RDS specifications:

    • DB instance class: db.r3.xlarge (4 vCPU, 30.5 GiB RAM)

    • Storage type: Provisioned IOPS (SSD)

    • Allocated storage: 200 GiB or above

    • Provisioned IOPS: 1000

    Also, refer to setting up Amazon RDS.
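The baseline IOPS rule quoted in the EBS specifications (3 IOPS per GiB with a minimum of 100 IOPS) can be worked through with a short calculation. The following is a minimal sketch in Python; the function name `baseline_iops` is hypothetical, and the sketch models only the rule as stated on this page, not any other AWS volume limits.

```python
def baseline_iops(volume_gib: int) -> int:
    """Baseline IOPS under the rule stated above: 3 IOPS per GiB, never below 100."""
    if volume_gib <= 0:
        raise ValueError("volume size must be a positive number of GiB")
    return max(100, 3 * volume_gib)


# Sizes mentioned on this page, plus a small volume to show the 100 IOPS floor.
for size_gib in (20, 100, 200, 500):
    print(f"{size_gib} GiB -> {baseline_iops(size_gib)} baseline IOPS")
```

Under this rule, the 200 GiB production volume has a baseline of 600 IOPS and can burst to 3000 IOPS.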

Note

Unravel Server does not require heavy resources, but it's best to check your AWS Service Limits as you proceed. For example, if you provision an Unravel EC2 instance from our CloudFormation template, check Virtual Private Cloud (Amazon VPC) Limits.

Sizing

Important

You must have separate nodes for the Unravel server and for the external database.

MySQL Server

The minimum requirements for cores, RAM, and disk.

Access permissions

The Unravel EC2 instance must have read permission on the S3 bucket used by EMR clusters.

  • You need an AWS account. You must be able to connect to AWS for the deployment process.

  • Create a read-only S3 IAM role and assign it to Unravel Server so that it can read the archive logs in the S3 bucket configured for the EMR cluster. In other words, create an IAM role whose policy grants read-only access to the specific S3 bucket used by the EMR cluster; then, create an EC2 instance profile and add the IAM role to it.

  • AWS Permissions and Access

    You must have permission to:

    • Create EC2 instances

    • Connect to EC2 instances

    • Install software on EC2 instances (you must have root access or "sudo root" permission in order to install the Unravel Server RPM)

    • Create security groups and IAM roles

    • Update IAM roles for the EMR cluster and the corresponding S3 storage

    • If you want to deploy Unravel for a new EMR cluster, you also need AWS permissions to create an EMR cluster and necessary S3 buckets, create and configure VPCs, etc.
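As an illustration of the read-only S3 role described above, the following Python sketch generates an IAM policy document restricted to a single bucket. The bucket name `my-emr-log-bucket` and the function name are hypothetical, and the choice of `s3:ListBucket` plus `s3:GetObject` is an assumption about what "read only" needs to cover; adjust the actions to match your own security requirements.

```python
import json


def s3_read_only_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows only reads on one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# Hypothetical bucket name; replace with the bucket configured for your EMR cluster.
print(json.dumps(s3_read_only_policy("my-emr-log-bucket"), indent=2))
```

The printed JSON can be attached to the IAM role that you add to the EC2 instance profile for Unravel Server.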

Network

The following ports must be open on the Unravel EC2 instance. In addition, the Unravel EC2 instance must be able to access all ports on the EMR cluster.

Settings related to IAM roles and security groups

To manage, monitor, and optimize the modern data applications running on your EMR cluster, Unravel needs data from the cluster as well as from apps running on the cluster. This data includes metrics, configuration information, and logs. Part of this data is pushed to Unravel, and part of it is pulled by the daemons running on Unravel Server. To make all of this data accessible, there must be both inbound and outbound access between Unravel Server (on the EC2 instance) and the EMR cluster.

  • The Unravel Server must be in the same region as the target EMR cluster(s) it will be monitoring. There are two possible scenarios:

  • The Unravel Server needs a TCP and UDP connection to the EMR master node. To implement this, do either of the following:

    • Create a security group that allows port 3000 and port 4043 from the EMR cluster nodes' IP addresses. Configure the security group on Unravel Server to allow TCP traffic on port 3000 from the EMR cluster nodes.

    • Specify the security group used by the EMR cluster as the source in this rule.

  • The Unravel Server and EMR cluster(s) must allow all outbound traffic.

  • EMR cluster nodes must allow all traffic from Unravel Server. If you can't allow Unravel Server to access all traffic, you must minimally allow Unravel Server to access the cluster nodes' TCP ports 8020, 50010, and 50020.
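To confirm the minimum access described above, you can check from the Unravel EC2 instance whether the required TCP ports on a cluster node accept connections. The following is a minimal sketch using only the Python standard library; the function name `unreachable_ports` and the default timeout are assumptions made for illustration.

```python
import socket

# Minimum TCP ports that Unravel Server must reach on the cluster nodes (from this page).
REQUIRED_TCP_PORTS = (8020, 50010, 50020)


def unreachable_ports(host, ports=REQUIRED_TCP_PORTS, timeout=3.0):
    """Return the ports on `host` that do not accept a TCP connection."""
    failed = []
    for port in ports:
        try:
            # A completed TCP handshake is enough; no data is sent.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers refused connections, timeouts, and unresolvable hosts.
            failed.append(port)
    return failed
```

For example, calling `unreachable_ports("10.0.0.12")` with the private IP address of a cluster node (a hypothetical address here) returns an empty list when all three ports are reachable. This check covers TCP only, so it does not verify the UDP ingest traffic on port 4043.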

Port(s)   Direction   Description

3000      Both        Non-HTTPS traffic to and from Unravel UI

4043      In          UDP and TCP ingest traffic from the entire cluster to Unravel Server(s)
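The inbound rules for ports 3000 and 4043 can be expressed as data before you apply them. The following Python sketch builds the rules in the `IpPermissions` shape accepted by the EC2 `AuthorizeSecurityGroupIngress` API, using the EMR cluster's security group as the source. The function name and the security group ID `sg-0123456789abcdef0` are hypothetical, and the sketch only constructs the rules; it does not call AWS.

```python
def unravel_ingress_rules(emr_security_group_id: str) -> list:
    """Inbound rules for the Unravel Server security group, per the port table above."""
    source = [{"GroupId": emr_security_group_id}]
    # (protocol, port): 3000 carries UI traffic over TCP; 4043 carries TCP and UDP ingest.
    wanted = [("tcp", 3000), ("tcp", 4043), ("udp", 4043)]
    return [
        {
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": source,
        }
        for protocol, port in wanted
    ]


# Hypothetical security group ID; replace with the group used by your EMR cluster.
for rule in unravel_ingress_rules("sg-0123456789abcdef0"):
    print(rule["IpProtocol"], rule["FromPort"])
```

If you use boto3, a list like this could be passed as the `IpPermissions` argument of `authorize_security_group_ingress` on an EC2 client, together with the `GroupId` of the Unravel Server security group.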

Skillset

These instructions assume that you are proficient in:

  • Provisioning EC2 instances and RDS instances

  • Creating and configuring the required IAM roles, security groups, and so on

  • Understanding AWS networking concepts such as virtual private clouds (VPCs), subnets, and so on

  • Running Ansible scripts, basic Unix commands, and AWS CLI commands

These instructions are self-contained and require only basic knowledge of AWS. Expert-level knowledge of AWS is not required.