Requirements
To deploy Unravel, first ensure that your environment meets these requirements:
Platform
Each version of Unravel has specific platform requirements. Check the compatibility matrix to confirm that your cluster meets the requirements for the version of Unravel that you are installing.
Host
Single cluster
In a single-cluster deployment of Unravel, the independent host must meet the following requirements:
- Is managed by Ambari or Cloudera Manager.
- Has the Hadoop clients pre-installed.
- Has no other Hadoop services or third-party applications installed.
- Is accessible only to Hadoop and Unravel Admins.
Multi-cluster
In a multi-cluster deployment of Unravel, the hosts on the core node and the edge node must meet the following requirements:
Core node
- Is accessible to Unravel Admins.
- Is dedicated solely to Unravel, with no other Hadoop services or third-party applications installed.
Edge node
- Is managed by Ambari or Cloudera Manager.
- Has the Hadoop clients pre-installed.
- Has no other Hadoop services or third-party applications installed.
- Is accessible only to Hadoop and Unravel Admins.
Software requirements
Single cluster
- All default clients, such as YARN and HDFS, are running.
- PATH includes the path to the HDFS, Hive, YARN, and Spark clients/gateways, Hadoop commands, and Hive commands.
- A clock synchronization service (such as NTP) is running and in sync with the cluster.
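A quick way to verify these prerequisites is to confirm that the client commands resolve on PATH and that the clock is synchronized. A minimal sketch, assuming a systemd host (for timedatectl) and the usual client command names:

```python
# Minimal sketch: verify the Hadoop/Hive/Spark clients resolve on PATH and
# that the system clock is synchronized. On systemd hosts, "timedatectl"
# reports "System clock synchronized: yes" when NTP is in sync.
import shutil
import subprocess

for cmd in ("hadoop", "hdfs", "yarn", "hive", "spark-submit"):
    print(f"{cmd}: {shutil.which(cmd) or 'NOT on PATH'}")

subprocess.run(["timedatectl", "status"], check=True)
```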
Multi-cluster
In a multi-cluster deployment, confirm the following requirements for the core node and the edge node, respectively:
Core node
- A clock synchronization service (such as NTP) is running and in sync with the cluster.
Edge node
- All default clients, such as YARN and HDFS, are running.
- PATH includes the path to the HDFS, Hive, YARN, and Spark clients/gateways, Hadoop commands, and Hive commands.
- A clock synchronization service (such as NTP) is running and in sync with the cluster.
Permissions
Create an installation directory and grant its ownership to the user who installs Unravel. This user runs all the processes involved in the Unravel installation.
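For example, a minimal sketch of creating the directory and handing it to the install user; the /usr/local/unravel path and the unravel user/group are assumptions, not fixed names:

```python
# Hedged sketch: create the installation directory and grant ownership to
# the user who installs (and runs) Unravel. Run as root. The path and the
# "unravel" user/group are hypothetical; substitute your own.
import os
import shutil

install_dir = "/usr/local/unravel"   # assumed location
os.makedirs(install_dir, exist_ok=True)
shutil.chown(install_dir, user="unravel", group="unravel")
```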
If you are using Kerberos, you must create a principal and keytab for Unravel daemons to use.
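A hedged sketch of creating the principal and keytab with MIT Kerberos' kadmin.local, run on the KDC host; the principal name, realm, and keytab path are assumptions for illustration:

```python
# Hypothetical sketch: create a Kerberos principal and keytab for the
# Unravel daemons, then smoke-test the keytab with kinit.
import subprocess

principal = "unravel/unravel-host.example.com@EXAMPLE.COM"  # assumed name/realm
keytab = "/etc/security/keytabs/unravel.keytab"             # assumed path

subprocess.run(["kadmin.local", "-q", f"addprinc -randkey {principal}"], check=True)
subprocess.run(["kadmin.local", "-q", f"ktadd -k {keytab} {principal}"], check=True)
subprocess.run(["kinit", "-kt", keytab, principal], check=True)  # verify it works
```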
Unravel must have read access to these HDFS resources (a spot-check sketch follows the list):
- MapReduce logs (hdfs://user/history)
- YARN's log aggregation directory (hdfs://tmp/logs)
- Spark and Spark2 event logs (hdfs://user/spark/applicationHistory and hdfs://user/spark/spark2ApplicationHistory)
- File and partition sizes in the Hive warehouse directory (typically hdfs://apps/hive/warehouse)
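To spot-check this access, list each directory as the Unravel user (after kinit with the Unravel keytab on Kerberized clusters). A minimal sketch using the hdfs CLI; the paths below drop the hdfs:// scheme and assume the default locations named above:

```python
# Minimal sketch: confirm the Unravel user can read each HDFS location.
# Run as the Unravel user; adjust the paths to your cluster's layout.
import subprocess

paths = [
    "/user/history",                         # MapReduce logs
    "/tmp/logs",                             # YARN log aggregation
    "/user/spark/applicationHistory",        # Spark event logs
    "/user/spark/spark2ApplicationHistory",  # Spark2 event logs
    "/apps/hive/warehouse",                  # Hive warehouse
]
for p in paths:
    r = subprocess.run(["hdfs", "dfs", "-ls", p], capture_output=True, text=True)
    print(f"{p}: {'OK' if r.returncode == 0 else 'FAILED: ' + r.stderr.strip()}")
```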
Unravel needs access to the YARN Resource Manager's REST API.
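A minimal reachability check against the ResourceManager's cluster-info endpoint; the hostname is a placeholder, and 8088 is the usual non-HTTPS default port:

```python
# Minimal sketch: confirm the YARN ResourceManager REST API is reachable.
import json
import urllib.request

rm = "http://resourcemanager.example.com:8088"   # assumed host; 8088 is the common default
with urllib.request.urlopen(f"{rm}/ws/v1/cluster/info", timeout=10) as resp:
    info = json.load(resp)
print(info["clusterInfo"]["state"])              # expect "STARTED"
```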
Unravel needs read-only access to the database used by the Hive metastore.
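If the metastore uses MySQL, a read-only grant might look like the following sketch; the database name (hive), account name, and password are assumptions, and a PostgreSQL-backed metastore needs the equivalent GRANTs instead:

```python
# Hedged sketch: create a read-only MySQL account for Unravel on the Hive
# metastore database by piping SQL to the mysql client. All names are
# placeholders; run as a MySQL admin (add credentials/-p as needed).
import subprocess

sql = """
CREATE USER IF NOT EXISTS 'unravel'@'%' IDENTIFIED BY 'change-me';
GRANT SELECT ON hive.* TO 'unravel'@'%';
FLUSH PRIVILEGES;
"""
subprocess.run(["mysql", "-u", "root"], input=sql, text=True, check=True)
```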
If you plan to use Unravel's move or kill AutoActions, the Unravel username needs to be added to YARN's yarn.admin.acl property.
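For reference, this property lives in yarn-site.xml (or the equivalent Ambari/Cloudera Manager configuration). A sketch, assuming the Unravel user is named unravel:

```xml
<!-- Hypothetical yarn-site.xml excerpt: "unravel" is an assumed username.
     Append it to the existing comma-separated user list rather than
     replacing the users already present. On managed clusters, make this
     change through Ambari/Cloudera Manager. -->
<property>
  <name>yarn.admin.acl</name>
  <value>yarn,unravel</value>
</property>
```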
Unravel needs read-only access to the Application Timeline Server (ATS).
If you're using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
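A minimal check that a read-only Cloudera Manager account can reach the API; the host, port, and credentials below are placeholders:

```python
# Minimal sketch: verify read-only access to the Cloudera Manager API via
# basic auth. "/api/version" returns the highest API version CM supports.
import base64
import urllib.request

cm = "http://cm-host.example.com:7180"            # use 7183 + https for TLS
req = urllib.request.Request(f"{cm}/api/version")
token = base64.b64encode(b"readonly-user:change-me").decode()
req.add_header("Authorization", f"Basic {token}")
with urllib.request.urlopen(req, timeout=10) as resp:
    print(resp.read().decode())                   # e.g. "v19"
```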
Network
Note
All Unravel ports can be customized. Refer to Configuring custom ports.
CDH-specific port requirements
| Port(s) | Direction | Description |
| --- | --- | --- |
| 3000 | Both | Traffic to and from the Unravel UI. If you plan to use Cloudera Manager to install Unravel's sensors, the Cloudera Manager service must also be able to reach the Unravel host on port 3000. |
| 7180 (or 7183 for HTTPS) | Out | Traffic from the Unravel server(s) to Cloudera Manager. |
HDFS Ports
For HDFS, Unravel needs access to the NameNode and the DataNodes. The default NameNode port is 8020, and the default DataNode ports are 9866 and 9867; however, these can be configured to other ports. A connectivity spot-check is sketched after the table.
| Service | Default port(s) | Direction | Description |
| --- | --- | --- | --- |
| NameNode | 8020 | Both | Traffic between the cluster and the Unravel servers. |
| DataNode | 9866, 9867 | Both | Traffic between the cluster and the Unravel servers. |
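To spot-check connectivity from the Unravel host, you can probe the ports directly; the hostnames below are placeholders:

```python
# Minimal sketch: check TCP reachability of the NameNode/DataNode ports
# from the Unravel host. Substitute your own hostnames and any custom ports.
import socket

checks = [("namenode.example.com", 8020),
          ("datanode1.example.com", 9866),
          ("datanode1.example.com", 9867)]
for host, port in checks:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"{host}:{port} reachable")
    except OSError as exc:
        print(f"{host}:{port} unreachable ({exc})")
```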