Prerequisites

To deploy Unravel, first ensure that your environment meets these requirements.

Important

You must use an independent host for the Unravel server.

This host must:

Be managed by Cloudera.
Have Hadoop clients pre-installed.
Have no other Hadoop service or third-party applications installed.
Be accessible only to Hadoop and Unravel admins.

Platform

Each version of Unravel has specific platform requirements. Check Unravel's CDH compatibility matrix to confirm that your cluster meets the requirements for the version of Unravel that you're installing. Your CDH environment must be running Cloudera Manager (CM).

Sizing

Important

You must have separate nodes for the Unravel server and the external MySQL database.

Unravel Server

Architecture: x86_64
vm.max_map_count is set to 262144
Minimum requirements for cores, RAM, and disks:
The table below lists the minimum requirements for cores, RAM, and disks for a typical environment with default data retention and lookback settings.
/usr/local/unravel is the storage location for Unravel binaries. /srv/unravel is used for Elasticsearch (ES) and the bundled database.
In production environments, put /usr/local/unravel and /srv/unravel on separate disks. Putting /srv/unravel on a separate high-RPM HDD with its own SATA III (or equivalent) bus significantly increases I/O bandwidth.
If /usr/local/unravel or /srv/unravel doesn't have the minimum free space shown in the table below, create symbolic links from them to another disk. To check the free space on a volume, use the df command. For example:
```
df -h /srv
```
All volumes are mounted.
/tmp is mounted with executable permissions. To remount /tmp with executable permissions, use the following command:
```
mount -o remount,exec /tmp
```

Jobs per day	Cores	RAM	/usr/local/unravel	/srv/unravel
Less than 50,000	8	96 GB	8 GB free	500 GB free
50,000 to 100,000	8	128 GB	8 GB free	500 GB free
Over 100,000	Contact Unravel Support
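
The kernel setting and free-space minimums above can be checked before installation. The following is a minimal sketch, not something shipped with Unravel: the function names and the `--check` flag are hypothetical, the thresholds are taken from the table's smallest tier, and it assumes that a vm.max_map_count value above 262144 is also acceptable.

```shell
#!/bin/sh
# Preflight sketch for the Unravel server host (hypothetical helper).

REQUIRED_MAP_COUNT=262144

# Succeeds when the given vm.max_map_count value is at least the required value.
# Assumption: values larger than 262144 are acceptable.
map_count_ok() {
  [ "$1" -ge "$REQUIRED_MAP_COUNT" ]
}

# Prints the free space, in whole GB, of the filesystem that holds the given path.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeeds when the path has at least the given number of GB free.
has_free_gb() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

# Run the checks only when asked, for example: sh preflight.sh --check
if [ "${1:-}" = "--check" ]; then
  map_count_ok "$(sysctl -n vm.max_map_count)" || echo "vm.max_map_count is below $REQUIRED_MAP_COUNT"
  has_free_gb /usr/local/unravel 8 || echo "/usr/local/unravel has less than 8 GB free"
  has_free_gb /srv/unravel 500 || echo "/srv/unravel has less than 500 GB free"
fi
```

The directories must already exist for the free-space checks to pass. Before installation, point `has_free_gb` at the parent mount points instead (for example, /usr/local and /srv).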

MySQL Server

Minimum requirements for cores, RAM, and disk.

Jobs per day	Data retention length	Cores	RAM	Disk
Less than 50,000	30 days	4	32 GB	1 TB
Less than 50,000	60 days	4	32 GB	2 TB
50,000 to 100,000	30 days	8	64 GB	2 TB
50,000 to 100,000	60 days	8	64 GB	4 TB
Over 100,000	Contact Unravel Support

Software

If the Unravel host is running Red Hat Enterprise Linux (RHEL) 6.x, set its bootstrap.system_call_filter to false in elasticsearch.yml:
```
bootstrap.system_call_filter: false
```
libaio.x86_64 is installed.
If you're installing Unravel version 4.5.0.0, set SELINUX to permissive or disabled in /etc/sysconfig/selinux.
If you're installing Unravel version 4.5.0.1+, SELINUX can be set to enabled.
PATH includes the path to the HDFS+Hive+YARN+Spark client/gateway, Hadoop commands, and Hive commands.
If Spark2 service is installed, the Unravel host should be a client/gateway.
Zookeeper is not installed on the same host as the Unravel host.
NTP is running and in-sync with the cluster.
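
Several of the software requirements above can be verified from a shell on the Unravel host. This is a rough sketch under stated assumptions: the `missing_commands` helper and the `--check` flag are hypothetical, and the list of client command names (`hdfs`, `hadoop`, `hive`, `yarn`, `spark-submit`) is a guess at what a typical gateway host exposes, so adjust it to your cluster.

```shell
#!/bin/sh
# Sketch: report client commands that are not on PATH (hypothetical helper).

# Prints each given command name that cannot be found on PATH, one per line.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Run the checks only when asked, for example: sh software_check.sh --check
if [ "${1:-}" = "--check" ]; then
  # Assumed client command names; adjust to the services on your cluster.
  missing_commands hdfs hadoop hive yarn spark-submit
  # Package check for RPM-based hosts.
  rpm -q libaio.x86_64 >/dev/null 2>&1 || echo "libaio.x86_64 is not installed"
  # Prints the current SELinux mode: Enforcing, Permissive, or Disabled.
  getenforce
fi
```

An empty result from `missing_commands` means every listed command resolved on PATH. It does not confirm that the client configuration itself is valid.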

Permissions

Tip

The installation creates a local user unravel:unravel, but you can change this later.

You must have root access or "sudo root" permission in order to install the Unravel Server RPM.
If you're using Kerberos, we'll explain how to create a principal and keytab for Unravel daemons to use to access these HDFS resources:
- MapReduce logs (hdfs://user/history)
- YARN's log aggregation directory (hdfs://tmp/logs)
- Spark and Spark2 event logs (hdfs://user/spark/applicationHistory and hdfs://user/spark/spark2ApplicationHistory)
- File and partition sizes in the Hive warehouse directory (typically hdfs://apps/hive/warehouse)
Unravel needs access to the YARN Resource Manager's REST API (so that the principal can determine which resource manager is active).
Unravel needs JDBC access to the Hive Metastore. Read-only access is sufficient.
If you're using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
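
Once a principal and keytab exist, a quick way to confirm that the HDFS resources listed above are visible is to test each path with the HDFS client. The sketch below is illustrative only: `missing_paths`, the `HDFS_TEST` variable, and the `--check` flag are hypothetical, and it assumes the locations listed above are absolute paths on the cluster's default filesystem. On a Kerberized cluster, run it after obtaining a ticket for the Unravel principal.

```shell
#!/bin/sh
# Sketch: list HDFS paths that the current user cannot see (hypothetical helper).

# Command used to test that a path exists; override HDFS_TEST to use another checker.
HDFS_TEST="${HDFS_TEST:-hdfs dfs -test -e}"

# Prints each given path for which the existence test fails, one per line.
missing_paths() {
  for p in "$@"; do
    $HDFS_TEST "$p" 2>/dev/null || echo "$p"
  done
}

# Run the checks only when asked, for example: sh hdfs_check.sh --check
if [ "${1:-}" = "--check" ]; then
  # Paths as listed above; the Hive warehouse location varies by cluster.
  missing_paths /user/history /tmp/logs \
    /user/spark/applicationHistory /user/spark/spark2ApplicationHistory \
    /apps/hive/warehouse
fi
```

A path can be reported for several reasons: it doesn't exist, the ticket is missing or expired, or the principal lacks permission on a parent directory. The `-e` test checks existence only, not read access.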

Network

On the new node, open the following ports:

Port(s)	Direction	Description
3000	Both	Traffic to and from Unravel UI
3316	Both	Database traffic
4020	Both	Unravel APIs
4021	Both	Host monitoring of JMX on `localhost`
4031	Both	Database traffic
4043	In	UDP and TCP ingest traffic from the entire cluster to Unravel Server(s)
4044-4049	In	UDP and TCP ingest spares for `unravel_lr*`
4091-4099	Both	Kafka brokers
4171-4174, 4176-4179	Both	Elasticsearch; localhost communication between Unravel daemons or Unravel Servers in a multi-host deployment
4181-4189	Both	Zookeeper daemons
4210	Both	Cluster access service
HDFS ports	Both	Traffic to/from the cluster to Unravel Server(s)
Hive metadata database port	Out	For YARN only. Traffic from Hive to Unravel Server(s) for partition reporting
8088	Out	Traffic from Unravel Server(s) to the Resource Manager API
8188	Out	Traffic from Unravel Server(s) to the ATS server(s)
11000	Out	For Oozie only. Traffic from Unravel Server(s) to the Oozie server

CDH-specific port requirements

Port(s)	Direction	Description
3000	Both	Traffic to and from Unravel UI. If you plan to use Cloudera Manager to install Unravel's sensors, the Cloudera Manager service must also be able to reach the Unravel host on port 3000.
7180 (or 7183 for HTTPS)	Out	Traffic from Unravel Server(s) to Cloudera Manager

HDFS Ports

Unravel needs access to the HDFS NameNode and DataNodes. The default NameNode port is 8020, and the default DataNode ports are 9866 and 9867. These defaults can be reconfigured, so confirm the ports your cluster actually uses.

Services	Default port	Direction	Description
NameNode	8020	Both	Traffic to/from the cluster to Unravel servers.
DataNode	9866,9867	Both	Traffic to/from the cluster to Unravel servers.
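
The outbound entries in the port tables above can be probed from the Unravel host before installation, and the port ranges can be expanded into a flat list for a firewall change request. The following is a sketch, not an Unravel tool: `expand_ports`, `can_connect`, the `--check` flag, and the `example.com` host names are all hypothetical, and `can_connect` assumes that bash and the coreutils `timeout` command are available.

```shell
#!/bin/sh
# Sketch: expand port specs and probe outbound TCP connectivity (hypothetical helper).

# Prints every port in the given specs, one per line; "4044-4049" expands to six ports.
expand_ports() {
  for spec in "$@"; do
    case "$spec" in
      *-*)
        port="${spec%-*}"
        last="${spec#*-}"
        while [ "$port" -le "$last" ]; do
          echo "$port"
          port=$((port + 1))
        done
        ;;
      *)
        echo "$spec"
        ;;
    esac
  done
}

# Succeeds when a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
can_connect() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Run the checks only when asked, for example: sh port_check.sh --check
if [ "${1:-}" = "--check" ]; then
  echo "Ports to open on the Unravel host:"
  expand_ports 3000 3316 4020 4021 4031 4043 4044-4049 4091-4099 \
    4171-4174 4176-4179 4181-4189 4210
  # Hypothetical host names; replace them with your own.
  can_connect cm.example.com 7180 || echo "cannot reach Cloudera Manager on 7180"
  can_connect rm.example.com 8088 || echo "cannot reach the Resource Manager on 8088"
  can_connect nn.example.com 8020 || echo "cannot reach the NameNode on 8020"
fi
```

Inbound ports on the Unravel host can't be probed this way until the Unravel daemons are listening, so the sketch only lists them.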
