Prerequisites - Single cluster (On-prem)
To deploy Unravel, ensure that your environment meets these requirements:
Each version of Unravel has specific platform requirements. Check the compatibility matrix to confirm that your cluster meets the requirements for the version of Unravel that you are installing.
In a single cluster deployment of Unravel, the independent host must fulfill the following requirements:
- It must be managed by Ambari/Cloudera Manager.
- It must have the Hadoop clients (YARN, HDFS, etc.) pre-installed and running.
- It must have no other Hadoop services or third-party applications installed.
- It must be accessible only to Hadoop and Unravel admins.
Database connectivity
Ensure that the following prerequisites for database connectivity are fulfilled:
MySQL
1. Create a `mysql` directory in `/tmp`. Set permissions so that the directory is accessible to the user who installs Unravel.
2. Download `mysql-connector-java-<version>.tar.gz` to the `/tmp/mysql` directory.
3. For an external MySQL database, add the JDBC connector to the `/tmp/<MySQL-directory>/<jdbcconnector>` directory. This can be either a tar file or a jar file.
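As a sketch, the MySQL staging steps above might look like this on the Unravel host (assuming the current user is the one who will install Unravel; substitute your own service account):

```shell
# Stage the MySQL JDBC connector for the Unravel installer.
mkdir -p /tmp/mysql
# Make the directory accessible to the user who installs Unravel;
# here we assume that is the current user.
chown "$(whoami)" /tmp/mysql 2>/dev/null || true
chmod 755 /tmp/mysql
# Place the downloaded connector archive here, e.g.:
#   cp mysql-connector-java-<version>.tar.gz /tmp/mysql/
ls -ld /tmp/mysql
```

The MariaDB steps below are identical apart from the directory name and connector file.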
MariaDB
1. Create a `mariadb` directory in `/tmp`. Set permissions so that the directory is accessible to the user who installs Unravel.
2. Download `mariadb-java-client-2.6.0.jar` to the `/tmp/mariadb` directory.
3. For an external MariaDB database, add the JDBC connector to the `/tmp/<MariaDB-directory>/<jdbcconnector>` directory. This can be either a tar file or a jar file.
Unravel managed database service
Install the following packages to fulfill the OS-level requirements for the Unravel managed database service:
numactl-libs (for libnuma.so)
libaio (for libaio.so)
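A quick way to check whether these shared libraries are already present (a sketch; the package names in the suggested install command are for RHEL-family systems):

```shell
# Check for the shared libraries needed by the Unravel managed database
# service; suggest an install command if either is missing.
LDCONFIG=$(command -v ldconfig || echo /sbin/ldconfig)
for lib in libnuma libaio; do
  if "$LDCONFIG" -p 2>/dev/null | grep -q "${lib}\.so"; then
    echo "${lib}: present"
  else
    echo "${lib}: missing - install with: sudo yum install -y numactl-libs libaio"
  fi
done
```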
Minimum requirements to install Unravel:
Cores: 8
RAM: 96 GB
The following table lists the minimum requirements for cores, RAM, and disks for a typical environment with default data retention and lookback settings.
Jobs per day | Cores | RAM | /usr/local/unravel | /srv/unravel |
---|---|---|---|---|
Less than 50,000 | 8 | 96 GB | 8 GB free | 500 GB free |
50,000 to 100,000 | 8 | 128 GB | 8 GB free | 500 GB free |
Over 100,000 | Contact Unravel Support | | | |
Data includes Elasticsearch (ES) and the bundled database.
Note
In production environments, you can keep the Unravel software and data directories on separate disks. Putting the data directory on a separate high-RPM HDD with its own SATA III (or equivalent) bus significantly increases I/O bandwidth.
Architecture: x86_64
`vm.max_map_count` is set to `262144`
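To verify the kernel setting (values below 262144 can prevent the bundled Elasticsearch from starting):

```shell
# Check vm.max_map_count against Unravel's requirement and print the
# commands needed to raise it if it is too low.
required=262144
current=$(sysctl -n vm.max_map_count 2>/dev/null \
          || cat /proc/sys/vm/max_map_count 2>/dev/null \
          || echo 0)
if [ "$current" -lt "$required" ]; then
  echo "vm.max_map_count is $current; raise it with:"
  echo "  sudo sysctl -w vm.max_map_count=$required"
  echo "  echo 'vm.max_map_count=$required' | sudo tee -a /etc/sysctl.conf"
else
  echo "vm.max_map_count is $current (OK)"
fi
```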
Create an Installation directory and grant ownership of the directory to the user who installs Unravel. This user executes all the processes involved in running Unravel.
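A sketch of creating that directory, using the `/usr/local/unravel` path from the sizing table above (the install user shown is an assumption; substitute your own service account):

```shell
# Create the installation directory and hand ownership to the user who
# will install and run Unravel.
INSTALL_DIR=/usr/local/unravel
UNRAVEL_USER=$(whoami)          # e.g. a dedicated "unravel" user
mkdir -p "$INSTALL_DIR"         # may require root
chown "$UNRAVEL_USER" "$INSTALL_DIR" 2>/dev/null || true
chmod 755 "$INSTALL_DIR"
```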
If you are using Kerberos, you must create a principal and keytab for Unravel daemons to use.
Unravel must have read access to these HDFS resources:
- MapReduce logs (`hdfs://user/history`)
- YARN's log aggregation directory (`hdfs://tmp/logs`)
- Spark and Spark2 event logs (`hdfs://user/spark/applicationHistory` and `hdfs://user/spark/spark2ApplicationHistory`)
- File and partition sizes in the Hive warehouse directory (typically `hdfs://apps/hive/warehouse`)
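The read access above can be spot-checked with the HDFS client. This is a sketch: it is written with the default paths as absolute HDFS paths, which may differ in your cluster, and it should be run as the Unravel user on a host with a configured HDFS client.

```shell
# Spot-check read access to the HDFS resources Unravel needs.
status="no hdfs client in PATH"
if command -v hdfs >/dev/null 2>&1; then
  status="checked"
  for p in /user/history /tmp/logs /user/spark/applicationHistory \
           /user/spark/spark2ApplicationHistory /apps/hive/warehouse; do
    if hdfs dfs -test -r "$p" 2>/dev/null; then
      echo "readable: $p"
    else
      echo "NOT readable (or missing): $p"
    fi
  done
fi
echo "$status"
```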
Unravel needs access to the YARN Resource Manager's REST API.
URL and credentials of the Cluster Manager (Cloudera/Ambari).
Unravel needs read-only access to the database used by the Hive metastore.
Unravel users should have read-only access to HiveServer2.
If you plan to use Unravel's move or kill AutoActions, the Unravel username needs to be added to YARN's yarn.admin.acl property.
Unravel needs read-only access to the Application Timeline Server (ATS).
If you are using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
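Several of the requirements above involve reaching cluster REST endpoints from the Unravel host. A minimal reachability sketch for the Resource Manager API (the host name is a placeholder; 8088 is the default Resource Manager port):

```shell
# Connectivity check from the Unravel host to the YARN Resource Manager
# REST API. Replace RM_HOST with your Resource Manager's host name.
RM_HOST="${RM_HOST:-resourcemanager.example.com}"   # placeholder
if command -v curl >/dev/null 2>&1; then
  # 200 means reachable; 000 means no connection within the timeout.
  result=$(curl -s -o /dev/null -w "%{http_code}" --max-time 5 \
           "http://${RM_HOST}:8088/ws/v1/cluster/info") || true
else
  result="curl not installed"
fi
echo "Resource Manager API check: $result"
```

The same pattern works for the ATS server (default port 8188) and Cloudera Manager (default port 7180).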
Note
You can customize all the Unravel ports. Refer to Configuring custom ports.
On the new node, open the following ports.
Port(s) | Direction | Description |
---|---|---|
3000 | Both | Traffic to and from Unravel UI |
3316 | Both | Database traffic |
4020 | Both | Unravel APIs |
4021 | Both | Host monitoring of JMX on |
4031 | Both | Database traffic |
4043 | In | UDP and TCP ingest traffic from the entire cluster to Unravel Server(s) |
4044-4049 | In | UDP and TCP ingest spares for |
4091-4099 | Both | Kafka brokers |
4171-4174, 4176-4179 | Both | Elasticsearch; localhost communication between Unravel daemons or Unravel Servers in a multi-host deployment |
4181-4189 | Both | Zookeeper daemons |
4210 | Both | Cluster access service |
HDFS ports | Both | Traffic to/from the cluster to Unravel Server(s) |
Hive metadata database port | Out | For YARN only. Traffic from Hive to Unravel Server(s) for partition reporting. |
8088 | Out | Traffic from Unravel Server(s) to the Resource Manager API |
8188 | Out | Traffic from Unravel Server(s) to the ATS server(s) |
11000 | Out | For Oozie only. Traffic from Unravel Server(s) to the Oozie server |
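On RHEL-family hosts that use firewalld, opening a port might look like this (a sketch; repeat `--add-port` for the other ports in the table, and adapt if you use different firewall tooling):

```shell
# Open the Unravel UI port (3000) with firewalld, if present.
if command -v firewall-cmd >/dev/null 2>&1; then
  sudo firewall-cmd --permanent --add-port=3000/tcp
  sudo firewall-cmd --reload
  opened="3000/tcp"
else
  opened="firewall-cmd not available - open the ports with your platform's firewall tooling"
fi
echo "$opened"
```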
For HDFS, you must provide access to the NameNode and DataNode. The default port for the NameNode is 8020, and the default ports for the DataNode are 9866 and 9867. However, these can be configured to use other ports.
Services | Default port | Direction | Description |
---|---|---|---|
NameNode | 8020 | Both | Traffic to/from the cluster to Unravel servers. |
DataNode | 9866,9867 | Both | Traffic to/from the cluster to Unravel servers. |
CDH specific port requirements
Port(s) | Direction | Description |
---|---|---|
3000 | Both | Traffic to and from the Unravel UI. If you plan to use Cloudera Manager to install Unravel's sensors, the Cloudera Manager service must also be able to reach the Unravel host on port 3000. |
7180 (or 7183 for HTTPS) | Out | Traffic from Unravel Server(s) to Cloudera Manager |