Multi-cluster Properties

This document describes the properties that are handled in a multi-cluster deployment.

Layout of multi-cluster configurations for a component:

  • com.unraveldata.<component>.list = Comma-delimited list of variables representing the component instances.

  • com.unraveldata.<component>.<variable>.<property> = Value

    The variables are for configuration purposes only and do not need to correspond to any hostnames or other properties of the components. Acceptable characters: alphanumerics, hyphens, and underscores only.
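    As an illustration of the layout above, the following minimal Python sketch expands a set of instance variables into property keys and checks each variable name against the acceptable characters. The component name (mycomponent), the variable names, the url property, and the helper build_properties are all hypothetical and are not taken from the Unravel product.

```python
import re

# Allowed characters for an instance variable, per the text above:
# alphanumerics, hyphens, and underscores only.
VARIABLE_RE = re.compile(r"[A-Za-z0-9_-]+")


def build_properties(component, instances):
    """Return a dict of property keys to values for one component.

    `instances` maps each variable name to a dict of {property: value}.
    """
    for variable in instances:
        if not VARIABLE_RE.fullmatch(variable):
            raise ValueError(f"invalid variable name: {variable!r}")

    prefix = f"com.unraveldata.{component}"
    # The .list property is a comma-delimited list of the variables.
    props = {f"{prefix}.list": ",".join(instances)}
    for variable, settings in instances.items():
        for prop, value in settings.items():
            props[f"{prefix}.{variable}.{prop}"] = value
    return props


# Hypothetical component, variables, and property name, for illustration only.
example = build_properties(
    "mycomponent",
    {
        "cluster_1": {"url": "http://host-a:1234"},
        "cluster-2": {"url": "http://host-b:1234"},
    },
)
for key, value in example.items():
    print(f"{key}={value}")
```

    The variable names only tie the entries in the list to their per-instance properties, which is why they do not need to match any hostname.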

Core node configurations for components

The following components are accessed from the Core node, where Unravel is installed.

  • Cloudera Manager (CM)

  • Ambari

  • Hive Metastore

  • Kafka

  • Pipeline (Workflows)

The following properties must be configured on the Core node in a multi-cluster setup.

Cloudera Manager - Multi-cluster
Ambari
Hive Metastore
Kafka
Pipelines

Edge node configurations for components

In a multi-cluster deployment on on-prem platforms, the following properties must be added to the Edge node server so that jhist files and logs for MR jobs can be loaded. These include the HDFS paths for the jhist/conf files and the YARN logs:

  • com.unraveldata.min.job.duration.for.attempt.log

    Minimum duration of a successful application for which executor logs are processed (in milliseconds).

    Default: 600000 (10 mins)

  • com.unraveldata.job.collector.log.aggregation.base

    HDFS path to the aggregated container logs (logs to process). Don't include the hdfs:// prefix. The log format defaults to TFile. You can specify multiple logs and log formats (TFile or IndexedFormat).

    Example: TFile:/tmp/logs/*/logs/, IndexedFormat:/tmp/logs/*/logs-ifile/

    For HDP, set this to: IndexedFormat:/app-logs/*/logs/

    Default: /tmp/logs/*/logs/

  • com.unraveldata.job.collector.done.log.base

    HDFS path to the "done" directory of MR logs. Do not include the hdfs:// prefix.

    For HDP, set this to: /mr-history/done

    Default: hdfs:///user/spark/applicationHistory/
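
The value format described for com.unraveldata.job.collector.log.aggregation.base (comma-separated entries, each optionally prefixed with TFile: or IndexedFormat:) can be read as in the following minimal Python sketch. This is not Unravel's own parser; the function parse_log_aggregation_base is hypothetical and only illustrates the rules stated above, including the TFile default and the requirement to omit the hdfs:// prefix.

```python
KNOWN_FORMATS = ("TFile", "IndexedFormat")


def parse_log_aggregation_base(value):
    """Split the property value into (format, path) pairs.

    Entries without a known format prefix are treated as TFile,
    which the description above states is the default format.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        fmt, sep, path = entry.partition(":")
        if not sep or fmt not in KNOWN_FORMATS:
            # No recognized prefix: the whole entry is the path.
            fmt, path = "TFile", entry
        if path.startswith("hdfs://"):
            raise ValueError(f"omit the hdfs:// prefix: {entry!r}")
        pairs.append((fmt, path))
    return pairs


default_value = parse_log_aggregation_base("/tmp/logs/*/logs/")
multi_value = parse_log_aggregation_base(
    "TFile:/tmp/logs/*/logs/, IndexedFormat:/tmp/logs/*/logs-ifile/"
)
print(default_value)
print(multi_value)
```

A check like this can catch a stray hdfs:// prefix before the property is written to the Edge node configuration.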

For Tez, the following properties must be added to the Edge node server in a multi-cluster deployment on on-prem platforms.