Installing Unravel on CDH+CM

This topic explains how to deploy Unravel on Cloudera Distribution of Hadoop (CDH). Your CDH environment must be running Cloudera Manager (CM).

Important

If you have not already done so, confirm that your cluster meets the hosting requirements in Unravel's CDH compatibility matrix.

1. Configure the host

Use Cloudera Manager to allocate a cluster gateway/edge/client host with HDFS access, and create a gateway configuration for the host. The gateway configuration must have client roles for HDFS, YARN, Spark, Hive, and optionally, Spark2.
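
Once the gateway configuration is deployed, a quick sanity check is to confirm that the client commands are on the host's PATH. The following is a minimal sketch, not part of the Unravel installer; the function name check_clients is hypothetical, and the command names in the usage note (hdfs, yarn, hive, spark-submit) are assumptions that may differ in your CDH deployment.

```shell
#!/bin/sh
# Report which client commands are available on PATH on this host.
# Returns 0 if every command was found, 1 if at least one is missing.
check_clients() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, run check_clients hdfs yarn hive spark-submit on the gateway host, then confirm HDFS access with hdfs dfs -ls / as the user that will run Unravel.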

6. Configure Unravel with basic options
  1. (Optional) Enable additional daemons for high-volume workloads.

  2. Edit Unravel properties through automatic or manual configuration. Following is a reference list of the basic properties that you can configure for CDH installation.

  3. If Kerberos is enabled, create or identify a principal and keytab for Unravel daemons to use for access to HDFS and the REST API.

  4. If Sentry is enabled:

    1. Create your own alternate principal with narrow privileges and HDFS access permissions.

    2. Verify that the user running the Unravel daemon /etc/unravel_ctl has the permissions shown in the table below.

      Resource                                    Principal           Permission    Purpose
      ------------------------------------------  ------------------  ------------  ---------------------------------------------
      hdfs://user/spark/applicationHistory        Your alt principal  read+execute  Spark event log
      hdfs://user/spark/spark2ApplicationHistory  Your alt principal  read+execute  Spark2 event log (if Spark2 is installed)
      hdfs://user/history                         Your alt principal  read+execute  MapReduce logs
      hdfs://tmp/logs                             Your alt principal  read+execute  YARN aggregation folder
      hdfs://user/hive/warehouse                  Your alt principal  read+execute  Obtain table partition sizes with "stat" only

  5. If you are using a virus scanner, it is recommended that you exclude the Elasticsearch directories, which are located under <Unravel installation directory>/data, from scanning.
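
For the Kerberos and Sentry items above, one way to review the permissions is to authenticate as the alternate principal and inspect each HDFS path. The sketch below only prints the inspection commands so that you can review them before running anything. The function name print_acl_checks is hypothetical, and the absolute paths in the usage note assume that the resources in the permissions table map to those HDFS paths on your cluster.

```shell
#!/bin/sh
# Print (do not run) the HDFS commands that show the owner, mode, and ACLs
# of each path given as an argument.
print_acl_checks() {
    for path in "$@"; do
        echo "hdfs dfs -ls -d $path"
        echo "hdfs dfs -getfacl $path"
    done
}

# Before running the printed commands on a Kerberized cluster, obtain a
# ticket for the principal (both values are placeholders to substitute):
#   kinit -kt <keytab file> <principal>
#   klist
```

For example, print_acl_checks /user/spark/applicationHistory /user/history /tmp/logs /user/hive/warehouse prints two commands per path. The output of each should show read and execute access for the alternate principal.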

7. Change the run-as user and group for Unravel daemons

Unravel daemons run under the local user unravel by default. Run the switch_to_user.sh script to change the Unix owner and group of the Unravel daemons in any of the following cases: Kerberos or Sentry is enabled; the cluster is a non-Kerberos cluster with simple Unix security; the Unravel user has a different username; or the user is a non-local user, such as an LDAP user.
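
Before switching, it can help to confirm that the target user resolves on the Unravel host and belongs to the intended group, which matters most for non-local users such as LDAP users. This is a minimal sketch using the standard id utility. The function name user_in_group is hypothetical, and the sketch does not invoke switch_to_user.sh itself because its arguments are not described here.

```shell
#!/bin/sh
# Check that a user resolves on this host and is a member of a group.
# Returns 0 if the user is in the group, 1 if the user exists but is not
# in the group, and 2 if the user cannot be resolved at all.
user_in_group() {
    user=$1
    group=$2
    groups=$(id -Gn "$user" 2>/dev/null) || return 2
    for g in $groups; do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, user_in_group unravel unravel && echo "ok" prints ok only when both the user and the group membership resolve. Substitute the user and group that you plan to pass to switch_to_user.sh.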

9. Start Unravel

Run the following command to start Unravel:

<Unravel installation directory>/manager start

This completes the basic/core configuration.

10. Log into Unravel UI
  1. Find the hostname of Unravel Server.

    echo "http://$(hostname -f):3000/"

    If you're using an SSH tunnel or HTTP proxy, you might need to make adjustments.

  2. Using a supported web browser (see Unravel's Azure Databricks compatibility matrix), navigate to http://unravel-host:3000 and log in with the username admin and the password unraveldata.

    Unravel UI displays collected data.
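
If the login page does not load, a quick check from the command line can show whether the UI port is reachable. The following sketch assumes the default port 3000 shown above. The function names unravel_url and ui_status are hypothetical helpers, not part of Unravel.

```shell
#!/bin/sh
# Build the UI URL for a host. With no arguments, use this host's
# fully qualified name and port 3000.
unravel_url() {
    host=${1:-$(hostname -f)}
    port=${2:-3000}
    echo "http://${host}:${port}/"
}

# Print the HTTP status code returned by the UI for the given host and port.
ui_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$(unravel_url "$@")"
}

# If the port is reachable only through SSH, a local port forward is one
# possible adjustment (user and host are placeholders to substitute):
#   ssh -N -L 3000:localhost:3000 <user>@<unravel-host>
# The UI is then available at http://localhost:3000/ on your workstation.
```

For example, ui_status unravel-host prints the status code for http://unravel-host:3000/. A response code in the 200 or 300 range suggests that the UI is serving requests.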