
Prerequisites

To deploy Unravel, first ensure that your environment meets these requirements.

Important

You must use an independent host for the Unravel server.

This host must:

  • Be managed by Cloudera.

  • Have Hadoop clients pre-installed.

  • Have no other Hadoop service or third-party applications installed.

  • Be accessible only to Hadoop and Unravel admins.

Platform

Each version of Unravel has specific platform requirements. Confirm that your CDH meets the requirements for the version of Unravel that you're installing. Your CDH environment must be running Cloudera Manager (CM).

Sizing
Software
  • If you're running Red Hat Enterprise Linux (RHEL) 6.x, bootstrap.system_call_filter is set to false in elasticsearch.yml:

    bootstrap.system_call_filter: false
  • libaio.x86_64 is installed.

  • For Unravel version 4.5.0.0, SELINUX is set to permissive or disabled in /etc/sysconfig/selinux. For Unravel versions 4.5.0.1+, SELINUX can be set to enabled.

  • PATH includes the paths to the HDFS, Hive, YARN, and Spark client/gateway binaries, so that Hadoop and Hive commands can be run on the Unravel host.

  • If Spark2 service is installed, the Unravel host should be a client/gateway.

  • Zookeeper is not installed on the same host as the Unravel host.

  • NTP is running and in-sync with the cluster.
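
Several of the software requirements above can be checked with a short script before you install. The following is a minimal sketch, not part of Unravel: the function names are hypothetical, and the list of client commands (hadoop, hdfs, yarn, hive, spark-submit) is an assumption about what a client/gateway host exposes on PATH.

```python
import re
import shutil


def selinux_mode(config_text):
    """Return the SELINUX= value from the text of /etc/sysconfig/selinux, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out settings
        # Matches "SELINUX=..." but not "SELINUXTYPE=..."
        match = re.match(r"SELINUX\s*=\s*(\w+)", line)
        if match:
            return match.group(1).lower()
    return None


def system_call_filter_disabled(yaml_text):
    """Return True if bootstrap.system_call_filter is set to false in elasticsearch.yml text."""
    for line in yaml_text.splitlines():
        match = re.match(r"\s*bootstrap\.system_call_filter\s*:\s*(\S+)", line)
        if match:
            return match.group(1).lower() == "false"
    return False


def missing_commands(commands=("hadoop", "hdfs", "yarn", "hive", "spark-submit")):
    """Return the commands from the (assumed) client list that are not found on PATH."""
    return [name for name in commands if shutil.which(name) is None]
```

For example, selinux_mode(open("/etc/sysconfig/selinux").read()) returns the configured mode as a lowercase string, and an empty list from missing_commands() suggests that the client commands are on PATH for the current user.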

Permissions

Tip

The installation creates a local user unravel:unravel, but you can change this later.

  • You must have root access or "sudo root" permission in order to install the Unravel Server RPM.

  • If you're using Kerberos, we'll explain how to create a principal and keytab for Unravel daemons to use to access these HDFS resources:

    • MapReduce logs (hdfs://user/history)

    • YARN's log aggregation directory (hdfs://tmp/logs)

    • Spark and Spark2 event logs (hdfs://user/spark/applicationHistory and hdfs://user/spark/spark2ApplicationHistory)

    • File and partition sizes in the Hive warehouse directory (typically hdfs://apps/hive/warehouse)

  • Unravel needs access to the YARN Resource Manager's REST API (so that the principal can determine which resource manager is active).

  • Unravel needs JDBC access to the Hive Metastore. Read-only access is sufficient.

  • If you're using Impala, Unravel needs access to the Cloudera Manager API. Read-only access is sufficient.
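
To illustrate why access to the YARN Resource Manager's REST API matters, the sketch below queries each Resource Manager's cluster info endpoint (/ws/v1/cluster/info) and returns the one that reports an ACTIVE haState. It is an illustration under assumptions, not Unravel's implementation: the function names and example URLs are hypothetical, port 8088 is only the usual default, and the sketch does not perform the SPNEGO authentication that a Kerberos-secured cluster would require.

```python
import json
import urllib.request


def fetch_cluster_info(base_url, timeout=5):
    """Fetch and parse /ws/v1/cluster/info from one Resource Manager."""
    url = base_url.rstrip("/") + "/ws/v1/cluster/info"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def ha_state(info):
    """Extract the haState field (for example ACTIVE or STANDBY) from a parsed response."""
    return info.get("clusterInfo", {}).get("haState")


def find_active_rm(rm_urls, fetch=fetch_cluster_info):
    """Return the first URL whose Resource Manager reports ACTIVE, or None."""
    for url in rm_urls:
        try:
            info = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable host or a non-JSON reply; try the next one
        if ha_state(info) == "ACTIVE":
            return url
    return None
```

A call such as find_active_rm(["http://rm1.example.com:8088", "http://rm2.example.com:8088"]) (hypothetical hosts) returns the base URL of the active Resource Manager, or None if neither answers as active.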

Network
CDH-specific port requirements

Port(s)                     Direction   Description

3000                        Both        Traffic to and from Unravel UI. If you plan to use Cloudera Manager to install Unravel's sensors, the Cloudera Manager service must also be able to reach the Unravel host on port 3000.

7180 (or 7183 for HTTPS)    Out         Traffic from Unravel Server(s) to Cloudera Manager
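
The port requirements above can be spot-checked with a plain TCP connection test. This is a minimal sketch with a hypothetical helper name; it confirms only that a TCP connection can be opened, not that the service behind the port is healthy or that any firewall rule covers both directions.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, running port_reachable("cm.example.com", 7180) on the Unravel host (the hostname is a placeholder) checks the outbound path to Cloudera Manager, and running port_reachable with the Unravel host's name and port 3000 on the Cloudera Manager host checks the path used for sensor installation.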