Create an EC2 instance
On your AWS Console, go to the EC2 dashboard and click Launch Instance.
Select the following options based on Unravel's instance requirements:
Base OS
Instance type and size
EC2 instance's security group / IAM role
As a best practice, create an IAM role whose policy grants read-only access to the specific S3 bucket used by the EMR cluster, then create an instance profile and add the IAM role to it.
Ports
Networking
The EC2 instance must be in the same region as the target EMR clusters that the Unravel EC2 node will monitor.
Security groups or policies
Create an IAM role with read-only S3 access and assign it to the Unravel EC2 node so that it can read the archived logs in the S3 bucket configured for the EMR cluster.
Allow TCP and UDP connections from the EMR master node to the Unravel EC2 node.
Create a security group that allows inbound traffic on ports 3000 and 4043 from the EMR cluster nodes' IP addresses, and add the security group used on the EMR cluster as a source in this rule.
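The read-only S3 role described above could be backed by a policy document along the following lines. This is a minimal sketch, not Unravel's official policy: the bucket name `my-emr-log-bucket` and the helper `build_s3_read_only_policy` are hypothetical, and the choice of `s3:ListBucket` plus `s3:GetObject` is an assumption about what "read-only" needs to cover in your environment.

```python
import json

# Hypothetical bucket name; substitute the S3 bucket your EMR cluster logs to.
EMR_LOG_BUCKET = "my-emr-log-bucket"


def build_s3_read_only_policy(bucket: str) -> dict:
    """Return an IAM policy document granting read-only access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing keys is a bucket-level permission.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading objects is an object-level permission.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = build_s3_read_only_policy(EMR_LOG_BUCKET)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the IAM console when creating the policy that you attach to the role. Scoping the resources to a single bucket keeps the role from reading other buckets in the account.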
Sample inbound rule

| Type | Protocol | Port range | Source |
|---|---|---|---|
| All traffic | All | All | Security group ID of this group or subnet IP block. For example, 10.10.0.0/16 |
| SSH | TCP | 22 | 0.0.0.0/0 or trusted public IP for SSH access |
| Custom TCP Rule | TCP | 3000 | Security group ID used on the EMR cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
| Custom TCP Rule | TCP | 4043 | Security group ID used on the EMR cluster or subnet IP block (if the IP block belongs to a different VPC). Required for VPC peering connection. |
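If you script the security group instead of using the console, the sample inbound rules above can be expressed as data. The sketch below builds them in the shape that boto3's EC2 `authorize_security_group_ingress` call accepts for its `IpPermissions` parameter. It does not call AWS itself. The group IDs, the SSH CIDR, and the helper names are hypothetical placeholders.

```python
# Hypothetical identifiers; replace with your own values.
UNRAVEL_SG_ID = "sg-0123456789abcdef0"  # security group of the Unravel EC2 node
EMR_SG_ID = "sg-0fedcba9876543210"      # security group used on the EMR cluster
SSH_CIDR = "203.0.113.10/32"            # trusted public IP for SSH access


def tcp_rule_from_group(port: int, group_id: str) -> dict:
    """Allow a single TCP port from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def build_inbound_rules(unravel_sg: str, emr_sg: str, ssh_cidr: str) -> list:
    """Return the four sample inbound rules as IpPermissions-style dicts."""
    return [
        # All traffic from members of the Unravel group itself ("-1" = all protocols).
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": unravel_sg}]},
        # SSH from a trusted address.
        {
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": ssh_cidr}],
        },
        # Ports 3000 and 4043 from the EMR cluster's security group.
        tcp_rule_from_group(3000, emr_sg),
        tcp_rule_from_group(4043, emr_sg),
    ]


inbound_rules = build_inbound_rules(UNRAVEL_SG_ID, EMR_SG_ID, SSH_CIDR)
for rule in inbound_rules:
    print(rule)
```

When the EMR cluster sits in a different VPC, replace the `UserIdGroupPairs` entry with an `IpRanges` entry holding the subnet IP block, as the Source column describes.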
Sample outbound rule

| Type | Protocol | Port range | Destination |
|---|---|---|---|
| All traffic | All | All | 0.0.0.0/0 |
Note
The Unravel EC2 node should have access on all TCP ports to the EMR cluster nodes (server/parent and worker). To grant it, add an inbound rule to both the EMR server/parent and worker security groups that allows all TCP traffic on the full port range, with the security group ID of the Unravel VM as the source.
If you can't allow the Unravel EC2 node access to all traffic on the EMR cluster, you must at minimum allow it to reach TCP ports 8020, 50010, and 50020 on the cluster nodes.
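After the rules are in place, you may want to confirm from the Unravel EC2 node that the minimum ports are reachable. The following is a rough sketch using only the Python standard library. The helper names `is_port_open` and `check_node` are hypothetical, and a successful TCP connect only shows that the security groups let the connection through, not that the service behind the port is healthy.

```python
import socket

# Minimum TCP ports the Unravel node must reach on the EMR cluster nodes.
REQUIRED_PORTS = (8020, 50010, 50020)


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_node(host: str, ports=REQUIRED_PORTS, timeout: float = 3.0) -> dict:
    """Map each port to whether it is reachable on the given host."""
    return {port: is_port_open(host, port, timeout) for port in ports}
```

For example, `check_node("10.10.1.25")` (a made-up private IP of an EMR node) returns a dictionary keyed by port. Any `False` entry suggests that a security group or network ACL is still blocking that port.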