Enabling multi-node deployment of Spark workers for high-volume data processing

You can deploy additional Spark workers on a server separate from the one where Unravel and its services are installed, to process high-volume data. This section provides instructions for the multi-node setup of Spark workers.

Setting multi-node deployment for Spark workers

Multi-node deployment must first be configured on the Unravel main node, where Unravel is installed with all of its services, and then on the worker node.

Multi-node setup on the main node
  1. Install Unravel on the main node using the setup command, set the license, and set the LR endpoint. Refer to Unravel Databricks installation.

  2. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
  3. Set the worker node details in the unravel.yaml file by running the following manager commands:

    • Make the Unravel daemons accessible to the Spark daemons over the network.

      <Unravel installation directory>/unravel/manager config multinode public-listen enable
    • Add the hostname of the Spark daemon to the unravel.yaml file.

      <Unravel installation directory>/unravel/manager config multinode add <host-key> <name> <hostname> --role spark_worker

      Enter the following values:

      • host-key - The node identifier.

      • name - A name to identify the node.

      • hostname - The hostname of the new node, used for generating the configuration.

      For example:

      /opt/unravel/manager config multinode add workers-1 "My Spark Workers" my.server.domain --role spark_worker

      Optionally, for advanced configurations, you can add --host-alias to supply an alternate name for looking up the configuration when the server name used in the configuration differs from the name used at lookup time; see the sketch after the tip below.

    Tip

    You can run the following commands for assistance:

    • <Unravel installation directory>/unravel/manager config multinode public-listen --help
    • <Unravel installation directory>/unravel/manager config multinode add --help
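
    For example, a hedged sketch of --host-alias usage, assuming the worker is registered as my.server.domain but the configuration is looked up under the hypothetical alias worker1.internal.example:

      /opt/unravel/manager config multinode add workers-1 "My Spark Workers" my.server.domain --role spark_worker --host-alias worker1.internal.example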
  4. Set the JAVA_HOME environment variable and increase the number of partitions of the spark topic. Setting JAVA_HOME ensures that you use the JVM provided by Unravel for the multi-node deployment.

    Note

    This is required only when you increase the number of Spark consumers. Refer to Advanced Spark configurations. Preferably, the number of consumers should be equal to, or half of, the number of partitions. By default, there are 8 partitions.

    export JAVA_HOME=/<unravel_installation_directory>/unravel/versions/<version>/java
    /<unravel_installation_directory>/unravel/versions/<version>/kafka/bin/kafka-topics.sh \
      --bootstrap-server localhost:4091 --alter --topic spark --partitions <count of partitions>

    For example:

    /opt/unravel/versions/<version>/kafka/bin/kafka-topics.sh --bootstrap-server localhost:4091 --alter --topic spark --partitions 16
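
    To verify the change, describe the topic and check its partition count (a hedged example; --describe is standard kafka-topics.sh usage):

    /opt/unravel/versions/<version>/kafka/bin/kafka-topics.sh --bootstrap-server localhost:4091 --describe --topic spark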

  5. Apply changes.

    <Unravel installation directory>/unravel/manager config apply
  6. Start Unravel.

    <Unravel installation directory>/unravel/manager start
Multi-node setup on the worker node

After you set up the multi-node configurations on the main node, do the following on the worker node:

  1. Copy the unravel.yaml file from the main node to the worker node.

    For example:

    scp /opt/unravel/data/conf/unravel.yaml example@myserver:/tmp

  2. Install Unravel on the worker node using the unravel.yaml file copied from the main node. Refer to the Manual installation section in Unravel Databricks installation. Run the setup command as follows:

    <Unravel installation directory>/unravel/versions/<Unravel version>/setup --config /tmp/unravel.yaml
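
    For example, using the default /opt/unravel installation path and the unravel.yaml copied to /tmp in step 1:

    /opt/unravel/versions/<Unravel version>/setup --config /tmp/unravel.yaml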
  3. Set the Unravel license. Refer to Setting Unravel license.

  4. Set the Spark worker instance count using the manager command, if this was not already done on the main node. Refer to Enabling multiple daemon workers for high-volume data.

  5. Set the Spark consumer count. Unravel supports processing multiple records in parallel within a single Spark daemon; the number of Spark consumers defines how many records are processed simultaneously.

    <Unravel installation directory>/unravel/manager config worker set spark_worker consumer_count <count>

    For example:

    /opt/unravel/manager config worker set spark_worker consumer_count 4
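
    Per the partition guidance in the main-node setup, keep the consumer count equal to, or half of, the spark topic's partition count. With the 16 partitions from the earlier example, a sketch:

    /opt/unravel/manager config worker set spark_worker consumer_count 16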

  6. Apply changes.

    <Unravel installation directory>/unravel/manager config apply
  7. Start Unravel.

    <Unravel installation directory>/unravel/manager start

Note

All the workspaces included on the main node must be registered on the worker node.

When you set up the multi-node configuration on the Spark worker node, all the workspaces on the main node are automatically registered with the worker node. However, if you add or delete a workspace on the main node later, you must register that workspace manually on the worker node. Refer to Importing workspaces to Spark worker node.

Upgrading a multi-node deployment
  1. Run the following command on both the main node and the worker node to stop Unravel.

    /<unravel_installation_directory>/unravel/manager stop
  2. Upgrade the main node.

    /<unravel_installation_directory>/unravel/manager activate <unravel-version>
  3. Start Unravel on the main node.

    /<unravel_installation_directory>/unravel/manager start
  4. Upgrade the Spark worker node.

    /<unravel_installation_directory>/unravel/manager activate <unravel-version>
  5. Start Unravel on the Spark worker node.

    /<unravel_installation_directory>/unravel/manager start
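
For example, a sketch of the full sequence with a hypothetical target version 4.7.9.2 and the default /opt/unravel path:

    # Run on both the main node and the worker node:
    /opt/unravel/manager stop
    # On the main node:
    /opt/unravel/manager activate 4.7.9.2
    /opt/unravel/manager start
    # Then on the Spark worker node:
    /opt/unravel/manager activate 4.7.9.2
    /opt/unravel/manager start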
Importing workspaces to Spark worker node

When you set up the multi-node configuration on the Spark worker node, all the workspaces are automatically registered with the worker node. However, if you later add or remove a workspace from the main node using the UI, the change is not registered on the worker node. In such a scenario, you must import the workspaces on the worker node.

  1. Copy the unravel.yaml file from the main node to the worker node. This registers all the workspaces on the Spark worker node.

    For example:

    scp /opt/unravel/data/conf/unravel.yaml example@myserver:/tmp

  2. If any of the workspaces are not registered on the Spark worker node, do the following:

    1. Add all the workspaces to the workspaces: block of the unravel.yaml file as shown:

      unravel:
        config:
          databricks:
            workspaces:
              ...

      Alternatively, you can create a custom file containing only the workspaces block and provide the path to that file.

    2. Specify the path to the unravel.yaml file or to the custom file, and run the following command:

      <Unravel installation directory>/unravel/manager config databricks import /path/to/file
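
      For example, assuming the file copied in step 1 landed at /tmp/unravel.yaml:

      /opt/unravel/manager config databricks import /tmp/unravel.yaml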