Unravel server reference
This reference covers both on-premises and cloud deployments. Some daemons and properties only apply to on-premises deployments.
Daemons
The Unravel service is composed of many daemons, which are summarized in the following table. A single script, /etc/init.d/unravel_all.sh, takes a start, stop, or restart argument and controls all the daemons on a host in the correct order. The suffix _N on a daemon name means 1, 2, 3, or 4 separate daemons.
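As a rough illustration of driving the control script from automation, the sketch below builds the command line for /etc/init.d/unravel_all.sh and rejects any argument other than the three named here. The function names build_command and control_daemons and the dry_run flag are hypothetical helpers, not part of Unravel. The sketch also assumes the script must be run with sufficient privileges.

```python
import subprocess

UNRAVEL_ALL = "/etc/init.d/unravel_all.sh"  # script path from this reference
ACTIONS = ("start", "stop", "restart")      # arguments named in this reference


def build_command(action):
    """Return the argv list for an action, rejecting unknown arguments."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [UNRAVEL_ALL, action]


def control_daemons(action, dry_run=True):
    """Run the control script, or only return the command when dry_run is True.

    dry_run is a hypothetical safety switch for this sketch; it defaults to
    True so that importing or experimenting never touches running daemons.
    """
    cmd = build_command(action)
    if not dry_run:
        # Assumption: the caller has the privileges the init.d script needs.
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling control_daemons("restart") returns the command without running it; passing dry_run=False would actually invoke the script.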
Daemon logs are located in /usr/local/unravel/logs. There are two UI logs: unravel_ngui.log and forever_ngui.log. All other logs are named using the format unravel_<daemon_name>.log. For example, the log for unravel_jcw2_1 is unravel_jcw2_1.log.
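The naming rule can be expressed as a small helper, shown here as a minimal sketch. The function names daemon_log_path and ui_log_paths are hypothetical. As a convenience that is an assumption of this sketch, the helper accepts a daemon name with or without the unravel_ prefix.

```python
import posixpath

LOG_DIR = "/usr/local/unravel/logs"                 # log directory from this reference
UI_LOGS = ("unravel_ngui.log", "forever_ngui.log")  # the two UI logs


def daemon_log_path(daemon):
    """Return the log path for a non-UI daemon such as unravel_jcw2_1."""
    # Assumption: tolerate names given without the unravel_ prefix.
    name = daemon if daemon.startswith("unravel_") else "unravel_" + daemon
    return posixpath.join(LOG_DIR, name + ".log")


def ui_log_paths():
    """Return the paths of the two UI logs."""
    return [posixpath.join(LOG_DIR, name) for name in UI_LOGS]
```

For example, daemon_log_path("unravel_jcw2_1") yields /usr/local/unravel/logs/unravel_jcw2_1.log, matching the example in the text.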
Daemon Logical Name | Description |
---|---|
| bundled database (on a custom port) |
| Datastore REST API HTTP server |
| Event Worker |
| Hive Hook Worker EMR |
| Hitdoc Loader |
| Host monitor |
| "Job Analyzer" summarizes jobs |
| Job Collector Sensor YARN |
| Job Collector Sensor YARN for EMR |
| Job Collector Sensor Worker YARN |
| bundled Kafka (on a custom port) |
| Kafka Monitor |
| Log Receiver |
| Metrics Analyzer |
| aNGular Web UI |
| Oozie v4 Sensor |
| Partition Worker |
| Elasticsearch |
| Spark Worker |
| bundled Tomcat (port 4020), internal REST API |
| "Tidy Dir" cleans up and archives HDFS directories; also the database retention cleaner. |
| Table Worker |
| User Digest (report generator) |
| Universal sensor/Impala |
| bundled ZooKeeper (on a custom port) |
Adjustable properties
The file /usr/local/unravel/etc/unravel.properties contains settings that can be preserved during an RPM upgrade. These properties are described in the following table:
Property Type/Property Name/Description | Default Value |
---|---|
General Unravel | |
com.unraveldata.tmpdir The base location for Unravel process control files where Unravel's temp files reside. | /srv/unravel/tmp |
HDFS | |
com.unraveldata.hdfs.batch.monitoring.interval.sec Number of seconds between checks for the presence of Hive queries and MR logs to load into Unravel for batch visibility; must be between 300 and 1800 (inclusive). | 300 |
com.unraveldata.hdfs.interactive.monitoring.interval.sec Number of seconds between checks for the presence of Hive queries and MR logs to load into Unravel for interactive visibility; must be between 5 and 60 (inclusive). | 30 |
JDBC | |
unravel.jdbc.username Unravel database user. | Set by user |
unravel.jdbc.password Password for unravel.jdbc.username. | Set by user |
unravel.jdbc.url JDBC URL, determined by your database. Example: jdbc:mysql://127.0.0.1:3306/unravel_mysql_prod | Set by user |
Kafka | |
com.unraveldata.kafka.broker_list Embedded Kafka broker list. | 127.0.0.1:4091 |
Oozie | |
com.unraveldata.oozie.fetch.interval.sec Controls the rate at which the Oozie REST server is polled: the number of seconds between fetches of Oozie workflow status. | 120 |
com.unraveldata.oozie.fetch.num Number of workflows to pull in each API call. | 100 |
oozie.server.url The URL of the Oozie server to be monitored by Unravel. If multiple servers exist, the value can be a comma-separated string in which each part is the URL of one Oozie server, e.g., http://ip-10-0-0-110.ec2.internal:11000,http://ip-10-0-0-114.ec2.internal:11000 | - |
ZooKeeper | |
com.unraveldata.zk.quorum Embedded ZooKeeper ensemble in the form host1:port1,host2:port2. | 127.0.0.1:4181 |
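The ranges and list formats in the table lend themselves to a quick pre-flight check. The following is a minimal sketch, assuming unravel.properties uses plain key=value lines with # comments. The helper names parse_properties, check_intervals, and split_hosts are hypothetical, and the SAMPLE text is an invented excerpt for illustration only.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first "=" only
        if sep:
            props[key.strip()] = value.strip()
    return props


# (property name, default, minimum, maximum) as listed in the table above
INTERVAL_RULES = [
    ("com.unraveldata.hdfs.batch.monitoring.interval.sec", 300, 300, 1800),
    ("com.unraveldata.hdfs.interactive.monitoring.interval.sec", 30, 5, 60),
]


def check_intervals(props):
    """Return a list of problems; the list is empty when all values are in range."""
    problems = []
    for name, default, low, high in INTERVAL_RULES:
        raw = props.get(name, str(default))
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{name}: not an integer: {raw!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems


def split_hosts(value):
    """Split a host1:port1,host2:port2 list such as com.unraveldata.zk.quorum."""
    pairs = []
    for item in value.split(","):
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))  # int() raises on a missing or bad port
    return pairs


# Hypothetical excerpt, not taken from a real installation.
SAMPLE = """\
# site overrides
com.unraveldata.hdfs.batch.monitoring.interval.sec=120
com.unraveldata.zk.quorum=127.0.0.1:4181
"""
```

With SAMPLE, check_intervals reports the batch interval as out of range because 120 is below the documented minimum of 300, while the unset interactive interval falls back to its default and passes.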
Adjustable environment settings
Unravel runs the source command on /usr/local/unravel/etc/unravel.ext.sh to allow site-specific environment variables to be set. The following table shows possible choices:
Environment Variable | Description | Default |
---|---|---|
| The standard way to specify the home directory of Oracle Java. See the compatibility matrix for compatible Java versions. | Use update-alternatives to make the correct Java the first choice. |
| Last-chance arguments to the JVM that override other settings. | Unset |
| The directory containing the Hadoop configuration files. | As discovered by running hadoop fs -ls. |
| A base directory owned by user |
|
| The Web UI port on the primary or standalone Unravel installation ( |
|
| A destination directory owned by run-as user for log files. |
|
Adjustable root environment settings
Unravel's init.d script runs the source command on /etc/unravel_ctl to allow site-specific environment variables for the "run-as" user and group membership of the daemons that Unravel Server runs:
Environment Variable | Description | Default if not Set |
---|---|---|
| The |
|
| The primary group membership of the user that runs the daemons. |
|
Directories and files
The following is a cross-reference of notable directories and files used by Unravel Server:
Path/Purpose | Expected Size | Notes |
---|---|---|
Control file for run-as. | n/a | Must be owned by root for security reasons. |
Convenience script for Unravel start, restart, status, and stop; controls daemons in the proper order. | n/a | For multi-host installations, run this script on both the primary and secondary Unravel hosts. |
Unravel Server installation directory. | 1-2.5GB | This directory is created by installing the Unravel RPM; this is a fixed destination. |
An optional file for overriding | n/a | Optional; example syntax:
|
Site-specific settings for Unravel. | n/a | Keep a "golden" copy of this file; |
Version-specific values for Unravel, such as version number and build timestamp. | n/a | Updated during upgrades; do not modify this reference file, so that traceability is preserved. |
Logs written by Unravel daemons. | ~3.5GB max | Each daemon has a maximum of 100MB of logs, auto-rolled; use a symlink to put on another partition. |
Base directory for Unravel Server data kept separately from the installation directory; contains messaging data for process coordination, the bundled database, Elasticsearch indexes, and temporary files. | 2-900GB, depending on activity level and retention | This directory or its subdirectories can be symlinks to other volumes, to distribute disk I/O load over multiple volumes. If this is an EBS volume on AWS, it must be provisioned for the maximum available IOPS, and the Unravel Server instance must be EBS-optimized. |
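To compare actual disk usage against the expected sizes in the table, a directory can be totalled with a short script. This is a minimal sketch: the names dir_size_bytes and over_budget are hypothetical, and reading "~3.5GB" as 3.5 GiB is an assumption.

```python
import os

# "~3.5GB max" for the logs directory, interpreted as GiB (assumption).
LOG_BUDGET_BYTES = int(3.5 * 1024**3)


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinked files."""
    total = 0
    # os.walk does not descend into symlinked subdirectories by default,
    # but it does walk the top-level path even if that path is a symlink.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def over_budget(path, budget_bytes):
    """Return True when the directory holds more bytes than the budget."""
    return dir_size_bytes(path) > budget_bytes
```

For example, over_budget("/usr/local/unravel/logs", LOG_BUDGET_BYTES) would flag a logs directory that has grown past the expected maximum.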