Requirements
To deploy Unravel, first ensure that your environment meets these requirements:
Platform
Each version of Unravel has specific platform requirements. Check the compatibility matrix to confirm that your cluster meets the requirements for the version of Unravel that you are installing.
Host
Single cluster
In a single-cluster deployment of Unravel, the independent host must meet the following requirements:
- Is managed by Ambari or Cloudera Manager.
- Has the Hadoop clients pre-installed.
- Has no other Hadoop services or third-party applications installed.
- Is accessible only to Hadoop and Unravel Admins.
Multi-cluster
In a multi-cluster deployment of Unravel, the hosts on the core node and the edge node must meet the following requirements:
Core node
- Is accessible to Unravel Admins.
- Is dedicated solely to Unravel, with no other Hadoop services or third-party applications installed.
Edge node
- Is managed by Ambari or Cloudera Manager.
- Has the Hadoop clients pre-installed.
- Has no other Hadoop services or third-party applications installed.
- Is accessible only to Hadoop and Unravel Admins.
Software requirements
Single cluster
- All default clients, such as YARN and HDFS, are running.
- PATH includes the path to the HDFS, Hive, YARN, and Spark clients/gateways, Hadoop commands, and Hive commands.
- A clock synchronization service (such as NTP) is running and in sync with the cluster.
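A quick way to verify these prerequisites is to confirm that the client commands resolve on PATH and that the clock is synchronized. A minimal sketch, assuming a systemd host (for timedatectl) and the usual client command names:

```python
# Minimal sketch: verify the Hadoop/Hive/Spark clients resolve on PATH and
# that the system clock is synchronized. On systemd hosts, "timedatectl"
# reports "System clock synchronized: yes" when NTP is in sync.
import shutil
import subprocess

for cmd in ("hadoop", "hdfs", "yarn", "hive", "spark-submit"):
    print(f"{cmd}: {shutil.which(cmd) or 'NOT on PATH'}")

subprocess.run(["timedatectl", "status"], check=True)
```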
Multi-cluster
In a multi-cluster deployment, confirm the following requirements for the core node and the edge node, respectively:
Core node
- A clock synchronization service (such as NTP) is running and in sync with the cluster.
Edge node
- All default clients, such as YARN and HDFS, are running.
- PATH includes the path to the HDFS, Hive, YARN, and Spark clients/gateways, Hadoop commands, and Hive commands.
- A clock synchronization service (such as NTP) is running and in sync with the cluster.
Permissions
Create an installation directory and grant its ownership to the user who installs Unravel. This user runs all the processes involved in the Unravel installation.
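For example, a minimal sketch of creating the directory and handing it to the install user; the /usr/local/unravel path and the unravel user/group are assumptions, not fixed names:

```python
# Hedged sketch: create the installation directory and grant ownership to
# the user who installs (and runs) Unravel. Run as root. The path and the
# "unravel" user/group are hypothetical; substitute your own.
import os
import shutil

install_dir = "/usr/local/unravel"   # assumed location
os.makedirs(install_dir, exist_ok=True)
shutil.chown(install_dir, user="unravel", group="unravel")
```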
If you are using Kerberos, you must create a principal and keytab for Unravel daemons to use.
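A hedged sketch of creating the principal and keytab with MIT Kerberos' kadmin.local, run on the KDC host; the principal name, realm, and keytab path are assumptions for illustration:

```python
# Hypothetical sketch: create a Kerberos principal and keytab for the
# Unravel daemons, then smoke-test the keytab with kinit.
import subprocess

principal = "unravel/unravel-host.example.com@EXAMPLE.COM"  # assumed name/realm
keytab = "/etc/security/keytabs/unravel.keytab"             # assumed path

subprocess.run(["kadmin.local", "-q", f"addprinc -randkey {principal}"], check=True)
subprocess.run(["kadmin.local", "-q", f"ktadd -k {keytab} {principal}"], check=True)
subprocess.run(["kinit", "-kt", keytab, principal], check=True)  # verify it works
```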
Unravel must have read access to these HDFS resources (a spot-check sketch follows the list):
- MapReduce logs (hdfs://user/history)
- YARN's log aggregation directory (hdfs://tmp/logs)
- Spark and Spark2 event logs (hdfs://user/spark/applicationHistory and hdfs://user/spark/spark2ApplicationHistory)
- File and partition sizes in the Hive warehouse directory (typically hdfs://apps/hive/warehouse)
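To spot-check this access, list each directory as the Unravel user (after kinit with the Unravel keytab on Kerberized clusters). A minimal sketch using the hdfs CLI; the paths below drop the hdfs:// scheme and assume the default locations named above:

```python
# Minimal sketch: confirm the Unravel user can read each HDFS location.
# Run as the Unravel user; adjust the paths to your cluster's layout.
import subprocess

paths = [
    "/user/history",                         # MapReduce logs
    "/tmp/logs",                             # YARN log aggregation
    "/user/spark/applicationHistory",        # Spark event logs
    "/user/spark/spark2ApplicationHistory",  # Spark2 event logs
    "/apps/hive/warehouse",                  # Hive warehouse
]
for p in paths:
    r = subprocess.run(["hdfs", "dfs", "-ls", p], capture_output=True, text=True)
    print(f"{p}: {'OK' if r.returncode == 0 else 'FAILED: ' + r.stderr.strip()}")
```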
Unravel needs access to the YARN Resource Manager's REST API.
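A minimal reachability check against the ResourceManager's cluster-info endpoint; the hostname is a placeholder, and 8088 is the usual non-HTTPS default port:

```python
# Minimal sketch: confirm the YARN ResourceManager REST API is reachable.
import json
import urllib.request

rm = "http://resourcemanager.example.com:8088"   # assumed host; 8088 is the common default
with urllib.request.urlopen(f"{rm}/ws/v1/cluster/info", timeout=10) as resp:
    info = json.load(resp)
print(info["clusterInfo"]["state"])              # expect "STARTED"
```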
Unravel needs read-only access to the database used by the Hive metastore.
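If the metastore uses MySQL, a read-only grant might look like the following sketch; the database name (hive), account name, and password are assumptions, and a PostgreSQL-backed metastore needs the equivalent GRANTs instead:

```python
# Hedged sketch: create a read-only MySQL account for Unravel on the Hive
# metastore database by piping SQL to the mysql client. All names are
# placeholders; run as a MySQL admin (add credentials/-p as needed).
import subprocess

sql = """
CREATE USER IF NOT EXISTS 'unravel'@'%' IDENTIFIED BY 'change-me';
GRANT SELECT ON hive.* TO 'unravel'@'%';
FLUSH PRIVILEGES;
"""
subprocess.run(["mysql", "-u", "root"], input=sql, text=True, check=True)
```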
If you plan to use Unravel's move or kill AutoActions, the Unravel username needs to be added to YARN's yarn.admin.acl property.
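For reference, this property lives in yarn-site.xml (or the equivalent Ambari/Cloudera Manager configuration). A sketch, assuming the Unravel user is named unravel:

```xml
<!-- Hypothetical yarn-site.xml excerpt: "unravel" is an assumed username.
     Append it to the existing comma-separated user list rather than
     replacing the users already present. On managed clusters, make this
     change through Ambari/Cloudera Manager. -->
<property>
  <name>yarn.admin.acl</name>
  <value>yarn,unravel</value>
</property>
```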
Unravel needs read-only access to the Application Timeline Server (ATS).
If you're using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
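A minimal check that a read-only Cloudera Manager account can reach the API; the host, port, and credentials below are placeholders:

```python
# Minimal sketch: verify read-only access to the Cloudera Manager API via
# basic auth. "/api/version" returns the highest API version CM supports.
import base64
import urllib.request

cm = "http://cm-host.example.com:7180"            # use 7183 + https for TLS
req = urllib.request.Request(f"{cm}/api/version")
token = base64.b64encode(b"readonly-user:change-me").decode()
req.add_header("Authorization", f"Basic {token}")
with urllib.request.urlopen(req, timeout=10) as resp:
    print(resp.read().decode())                   # e.g. "v19"
```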
Network
Note
All Unravel ports can be customized. Refer to Configuring custom ports.
CDH-specific port requirements
| Port(s) | Direction | Description |
| --- | --- | --- |
| 3000 | Both | Traffic to and from the Unravel UI. If you plan to use Cloudera Manager to install Unravel's sensors, the Cloudera Manager service must also be able to reach the Unravel host on port 3000. |
| 7180 (or 7183 for HTTPS) | Out | Traffic from the Unravel server(s) to Cloudera Manager. |
HDFS Ports
For HDFS, Unravel needs access to the NameNode and the DataNodes. The default NameNode port is 8020, and the default DataNode ports are 9866 and 9867; however, these can be configured to other ports. A connectivity spot-check is sketched after the table.
| Service | Default port(s) | Direction | Description |
| --- | --- | --- | --- |
| NameNode | 8020 | Both | Traffic between the cluster and the Unravel servers. |
| DataNode | 9866, 9867 | Both | Traffic between the cluster and the Unravel servers. |
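To spot-check connectivity from the Unravel host, you can probe the ports directly; the hostnames below are placeholders:

```python
# Minimal sketch: check TCP reachability of the NameNode/DataNode ports
# from the Unravel host. Substitute your own hostnames and any custom ports.
import socket

checks = [("namenode.example.com", 8020),
          ("datanode1.example.com", 9866),
          ("datanode1.example.com", 9867)]
for host, port in checks:
    try:
        with socket.create_connection((host, port), timeout=5):
            print(f"{host}:{port} reachable")
    except OSError as exc:
        print(f"{host}:{port} unreachable ({exc})")
```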