Part 1: Installing Unravel Server on CDH+CM

Home

Part 1: Installing Unravel Server on CDH+CM

This topic explains how to deploy Unravel Server on Cloudera Distribution of Hadoop (CDH). Your CDH environment must be running Cloudera Manager (CM).

Important

If you have not already done so, confirm your cluster meets Unravel's CDH compatibility matrix hosting requirements.

1. Configure the host

Use Cloudera Manager to allocate a cluster gateway/edge/client host with HDFS access, and create a gateway configuration for the host. The gateway configuration must have client roles for HDFS, YARN, Spark, Hive, and optionally, Spark2.

2. Install MySQL

3. Install Unravel Server

Download the Unravel Server RPM.
Ensure that the host machine's local disks have the minimum space required.
Unravel Server uses two separate disks: one for binaries (/usr/local/unravel) and one for data (/srv/unravel). The separate disk /srv/unravel is beneficial for performance. If either of the disks do not have the minimum space required, create symbolic links for them to another disk drive.
Tip
To check the space on a volume use the df command. For example,
```
df -h /srv
```
Install the Unravel Server RPM.
```
sudo rpm -Uvh unravel-version.rpm
```

4. Configure MySQL

5. Configure Unravel Server with basic options

(Optional) Enable additional daemons for high-volume workloads.

In /usr/local/unravel/etc/unravel.properties, set general properties for Unravel Server.

Property/Description	Set by user	Unit	Default
com.unraveldata.customer.organization Customer name. Used to identify your installation for reporting and notification purposes in Unravel UI.	Optional	string	Not Set
com.unraveldata.advertised.url Defines the Unravel Server URL for HTTP traffic. Example: http://unravelserver.company.com:3000		string	http://{host}:3000
com.com.unraveldata.hdfs.timezone Timezone of HDFS, for example, US/Eastern, Etc/GMT-4, America/New_York. If the timezone is not set then an error message is logged and UTC timezone is used. Possible timezones can be obtained by calling `TimeZone.getAvailableIDs()`.		string	-
com.unraveldata.tmpdir The base location for Unravel process control files where Unravel's temp files reside.		string (path)	/srv/unravel/tmp
com.unraveldata.history.maxSize.weeks Number of weeks retained for search results in Elastic Search.		integer	5
com.unraveldata.retention.max.days Number of days to keep the heaviest data (such as error logs and drill-down details) in the SQL Database.		integer	30

Point Unravel Server to logs on HDFS.

Unravel collects HDFS logs for analysis. To point Unravel Server to these logs, set the following properties in /usr/local/unravel/etc/unravel.properties:

Property/Description	Unit	Default
com.unraveldata.job.collector.done.log.base HDFS path to `done` directory of MR logs as per cluster configuration. Don't include the hdfs:// prefix For example: com.unraveldata.job.collector.done.log.base=/mr-history/done.	string	/user/history/done
com.unraveldata.job.collector.log.aggregation.base HDFS path to the aggregated container logs (logs to process). Do not include the hdfs://prefix. The log format defaults to TFile. You can specify multiple logs and log formats (TFile or IndexedFormat.) Example: com.unraveldata.job.collector.log.aggregation.base=TFile:/tmp/logs//logs/,IndexedFormat:/tmp/logs//logs-ifile/.	CSL	/tmp/logs/*/logs/
com.unraveldata.spark.eventlog.location Comma-separated list of HDFS paths to the Spark event logs as per cluster configuration. Each path must include the hdfs:/// prefix. For example: com.unraveldata.spark.eventlog.location=hdfs:///spark1-history/,hdfs:///spark2-history/	CSL	hdfs:///user/spark/applicationHistory/

Property/Description

Set by user

Unit

Default

com.unraveldata.job.collector.done.log.base

HDFS path to done directory of MR logs as per cluster configuration. Don't include the hdfs:// prefix For example: com.unraveldata.job.collector.done.log.base=/mr-history/done.

string

/user/history/done

com.unraveldata.job.collector.log.aggregation.base

HDFS path to the aggregated container logs (logs to process). Do not include the hdfs://prefix. The log format defaults to TFile. You can specify multiple logs and log formats (TFile or IndexedFormat.)

Example: com.unraveldata.job.collector.log.aggregation.base=TFile:/tmp/logs/*/logs/,IndexedFormat:/tmp/logs/*/logs-ifile/.

CSL

/tmp/logs/*/logs/

com.unraveldata.spark.eventlog.location

Comma-separated list of HDFS paths to the Spark event logs as per cluster configuration. Each path must include the hdfs:/// prefix. For example: com.unraveldata.spark.eventlog.location=hdfs:///spark1-history/,hdfs:///spark2-history/

CSL

hdfs:///user/spark/applicationHistory/

For example,

com.unraveldata.job.collector.done.log.base=/user/history/done
com.unraveldata.job.collector.log.aggregation.base=/tmp/logs
com.unraveldata.spark.eventlog.location=hdfs://user/spark/applicationHistory,hdfs://user/spark/spark2

To confirm that you have the right path, use the hdfs dfs -ls command. For example,

hdfs dfs -ls /user/history/done
hdfs dfs -ls /tmp/logs

If Kerberos is enabled, create or identify a principal and keytab for Unravel daemons to use for access to HDFS and the REST API.

If Sentry is enabled:

Create your own alternate principal with narrow privileges and HDFS access permissions.

Verify that the user running the Unravel daemon /etc/unravel_ctl has the permissions shown in the table below.

Resource	Principal	Permission	Purpose
`hdfs://user/spark/applicationHistory`	Your alt principal	read+execute	Spark event log
`hdfs://user/spark/spark2ApplicationHistory`	Your alt principal	read+execute	Spark2 event log (if Spark2 is installed)
`hdfs://user/history`	Your alt principal	read+execute	MapReduce logs
`hdfs://tmp/logs`	Your alt principal	read+execute	YARN aggregation folder
`hdfs://user/hive/warehouse`	Your alt principal	read+execute	Obtain table partition sizes with "stat" only

If you are using a virus scanner
We recommend you disable your virus scanner from scanning the elasticsearch directories which are located under /srv/unravel.

6. Change the run-as user and group for Unravel daemons

Unravel daemons run under the local user unravel by default. However, if you have Kerberos or Sentry enabled, or a non-Kerberos cluster with simple Unix security, or a different username for the Unravel user, or a non-local user such as an LDAP user, run switch_to_user.sh script to change the Unix owner and group of the Unravel daemons.

7. Connect to the Hive metastore

8. Start Unravel services

Run the following command to start all Unravel services:

sudo /etc/init.d/unravel_all.sh start
sleep 60

This completes the basic/core configuration.

9. Log into Unravel UI

Find the hostname of Unravel Server.
```
echo "http://$(hostname -f):3000/"
```
If you're using an SSH tunnel or HTTP proxy, you might need to make adjustments.
Using a supported web browser, (see Unravel's Databricks compatibility matrix), navigate to http://unravel-host:3000 and log in with username admin with password unraveldata.
Unravel UI displays collected data.

10. Enable additional instrumentation

In this section:

Would you like to provide feedback? Just click here to suggest edits.

Home

Part 1: Installing Unravel Server on CDH+CM

Important

1. Configure the host

2. Install MySQL

3. Install Unravel Server

Tip

4. Configure MySQL

5. Configure Unravel Server with basic options

6. Change the run-as user and group for Unravel daemons

7. Connect to the Hive metastore

8. Start Unravel services

9. Log into Unravel UI

10. Enable additional instrumentation

Search results