Troubleshooting

This section provides information for troubleshooting and recovery.

Edge node fails to communicate with the core node, and the daemon log shows: Caused by: java.io.IOException: User limit of inotify watches reached

Issue

Sometimes the edge node fails to communicate with the core node, and a java.net.ConnectException: Connection refused error is displayed. The daemon log shows that the cause is the per-user limit on inotify watches being reached: Caused by: java.io.IOException: User limit of inotify watches reached

Solution

To resolve this issue, check the inotify watch limit on the core node and raise it if necessary:

  1. On the core node, check whether the maximum number of inotify watches has been reached.

  2. If the limit has been reached, open the /etc/sysctl.conf file as the root user.

  3. Using an editor, set the kernel parameter fs.inotify.max_user_watches in /etc/sysctl.conf to a higher limit. For example: fs.inotify.max_user_watches=524288

  4. Apply the changes.

    sysctl -p
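The check in step 1 can be sketched as follows. This is a minimal sketch, not an Unravel tool: the function names and the optional proc-root argument are hypothetical, and it assumes a Linux /proc layout in which every inotify watch appears as a line beginning with "inotify" in /proc/<pid>/fdinfo/<fd>. Run it as root to see the watches of all processes. The kernel limit applies per user, so the total printed here is only a rough upper bound for any single user.

```shell
# Sketch: compare the inotify watches in use with the configured limit.
# The optional first argument (proc root) is a hypothetical override that
# exists only so the functions can be tried against a test directory.

# Print the configured per-user watch limit.
inotify_watch_limit() {
    cat "${1:-/proc}/sys/fs/inotify/max_user_watches"
}

# Count the watches held by all processes visible to the caller.
inotify_watches_in_use() (
    proc_root=${1:-/proc}
    find "$proc_root" -maxdepth 3 -type f -path "$proc_root/[0-9]*/fdinfo/*" \
        -exec grep -h '^inotify' {} + 2>/dev/null | awk 'END { print NR }'
)
```

For example:

    echo "$(inotify_watches_in_use) watches in use, limit $(inotify_watch_limit)"

After you run sysctl -p, the first function should print the new value.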
    

Precheck fails for Hadoop when you upgrade from 4.6.2x and activate the 4.7x version

Issue

When you upgrade a multi-cluster environment from Unravel version 4.6.2x and activate version 4.7x, the Precheck fails with the following Hadoop error:

[Image: hadooperror.png]

Solution

This failure is caused by the com.unraveldata.multicluster.default_cluster.enabled property, which indicates whether the core node directly monitors the Hadoop cluster. By default, this property is set to true in Unravel 4.6.2x.

However, if you are not using the core node for Hadoop monitoring, you must manually set this property to false before you upgrade. This eliminates the Hadoop error in Precheck when you upgrade a multi-cluster environment from Unravel version 4.6.2x to 4.7x.

Before you upgrade to v4.7x, do the following:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Set the com.unraveldata.multicluster.default_cluster.enabled property to false.

    <Unravel installation directory>/unravel/manager config properties set com.unraveldata.multicluster.default_cluster.enabled false
    
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager refresh files
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
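The four steps above can also be run as one sequence that stops at the first failing command, so that Unravel is not restarted with a half-applied change. This is a minimal sketch: the function name is hypothetical, and the manager path is passed in as an argument because it depends on your installation directory.

```shell
# Sketch: run the pre-upgrade steps in order and stop at the first failure.
# Argument: full path to the manager script of your installation.
disable_default_cluster() (
    manager=$1
    "$manager" stop &&
    "$manager" config properties set com.unraveldata.multicluster.default_cluster.enabled false &&
    "$manager" refresh files &&
    "$manager" start
)
```

For example, assuming that Unravel is installed in /opt:

    disable_default_cluster /opt/unravel/manager

The function returns a non-zero status if any step fails, and the remaining steps are skipped.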

Supplying a configuration for an instance group in a running cluster on EMR overwrites Unravel Sensor properties added by the bootstrap script

Issue

If you supply a configuration for an instance group in a running cluster on EMR, it overwrites the Unravel Sensor properties added by the bootstrap script.

Solution

Include the Unravel Sensor properties along with the new configuration that you supply, so that they are not lost.

Diagnosing issues from log files

Whenever you face any issues during installation, you should first check the following log files to diagnose the issues:

The installation process is broken

Issue:

The installation process breaks before it completes.

Solution:

If the installation process breaks, do the following:

  1. Stop Unravel.

    manager stop

    If the manager does not work, open the services directory, where each service has a stop.sh script. Stop the service monitor (monit) first, and then run the stop.sh script of each service.

    If you do not have the stop.sh scripts, send SIGTERM to all the services, starting with the service monitor (monit).

    Caution

    Avoid using SIGKILL because it can corrupt files.

  2. Reinstall Unravel using the content in the data directory.
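The SIGTERM fallback in step 1 can be sketched as a small helper that asks a process to exit and waits for it, without ever escalating to SIGKILL. This is only a sketch: the function name and the default timeout of 30 seconds are assumptions, and you must find the PIDs of monit and the Unravel services yourself and run the helper as a user that is allowed to signal them.

```shell
# Sketch: send SIGTERM to a process and wait for it to exit.
# Arguments: PID, timeout in seconds (default 30, an arbitrary choice).
# Returns 0 once the process is gone, 1 if it is still running at the timeout.
# SIGKILL is deliberately never sent.
term_and_wait() (
    pid=$1
    timeout=${2:-30}
    kill -TERM "$pid" 2>/dev/null
    while kill -0 "$pid" 2>/dev/null; do
        [ "$timeout" -le 0 ] && return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
)
```

Call it for the service monitor first and then for each service PID. If it returns 1, the process is still shutting down, so wait longer rather than sending SIGKILL.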

Files got deleted or corrupted

Issue:

Unravel scripts or configuration files were deleted or corrupted.

Solution:

  1. Stop Unravel.

  2. Assuming that you have installed Unravel in /opt, run the following command:

    /opt/unravel/manager refresh files

    This regenerates all the scripts and configuration files.

    If the refresh command does not regenerate the files, or the manager itself is broken, check <Unravel installation directory>/data/conf/current.yaml for the version that is currently installed, and then run the setup script of that version (X.Y.Z):

    <Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml

  3. Start Unravel.

    <Unravel installation directory>/unravel/manager start

Unravel software got deleted

Issue:

The Unravel software was deleted.

Solution:

  1. Stop Unravel.

  2. Check <Unravel installation directory>/data/conf/current.yaml for the current version that is installed.

  3. Unpack that same version in the exact location where it was deployed earlier.

    tar zxf unravel-SAME-VERSION.tar.gz -C /opt

  4. Run the following:

    <Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/unravel/data/conf/unravel.yaml

  5. Start the manager.

    <Unravel installation directory>/unravel/manager start

Restoring Unravel from a backup

Issue:

You need to restore Unravel from a backup.

Solution:

  1. Stop Unravel.

  2. Restore the backup of the data directory.

  3. Open data/conf/unravel.effective.yaml and check for the following key paths:

    • base: <Unravel installation directory>

    • data: <Unravel installation directory>/data

  4. Make sure that the data is restored to the right location.

  5. Make sure the unravel user has full access and ownership of the base location and everything in it.

  6. Check <Unravel installation directory>/data/conf/current.yaml for the current version that is installed.

  7. Unpack that same version in the exact location where it was deployed earlier.

    tar zxf unravel-SAME-VERSION.tar.gz -C /opt

  8. Run the following:

    <Unravel installation directory>/versions/X.Y.Z/setup --config=<Unravel installation directory>/data/conf/unravel.yaml

  9. Start Unravel.

    <Unravel installation directory>/unravel/manager start
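The ownership check in step 5 can be sketched with find. This is a minimal sketch, not an Unravel command: the function name is hypothetical, and the owner can be given as a user name or a numeric UID.

```shell
# Sketch: list everything under a base directory that is NOT owned by the
# expected user.
# Arguments: base directory, owner (user name or numeric UID).
# Prints the offending paths; returns 0 if there are none, 1 if there are
# some, and 2 if find itself failed (for example, unreadable directories).
not_owned_by() (
    base=$1
    owner=$2
    offenders=$(find "$base" ! -user "$owner" -print) || return 2
    [ -z "$offenders" ] && return 0
    printf '%s\n' "$offenders"
    return 1
)
```

For example, assuming the base location is /opt/unravel and the user is named unravel:

    not_owned_by /opt/unravel unravel

If paths are listed, correct their ownership as root (for example with chown -R) before you start Unravel.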

Troubleshooting Cloudera Distribution of Apache Hadoop (CDH) issues

Symptom:

hadoop fs -ls /user/unravel/HOOK_RESULT_DIR/ indicates that the directory does not exist.

Problem:

  • Unravel Server RPM is not yet installed, or

  • Unravel Server RPM is installed on a different HDFS cluster, or

  • HDFS home directory for Unravel does not exist, or

  • Kerberos/Sentry actions are needed

Remedy:

Install the Unravel RPM on the Unravel host, or verify that the unravel user exists and has a /user/unravel/ directory in HDFS with write access to it.

Symptom:

ClassNotFound error for com.unraveldata.dataflow.hive.hook.UnravelHiveHook during Hive query execution.

Problem:

The Unravel Hive Hook JAR was not found in $HIVE_HOME/lib/.

Remedy:

Confirm that the UNRAVEL_SENSOR parcel was distributed and activated in Cloudera Manager, or put the Unravel Hive Hook JAR corresponding to hive-version in jar-destination on each gateway as follows:

    cd /usr/local/unravel/hive-hook/;
    cp unravel-hive-hive-version*hook.jar jar-destination

Symptom:

Oozie shell action fails with ClassNotFoundException on an HCat call after Unravel Hive Hooks were added to the cluster.

Problem:

HCatalog is part of Apache Hive. In this case, the Hive Hook configuration is found, but the libraries that execute the Hive Hook are missing.

Remedy:

Because this is a shell action, the libraries must exist locally on every node so that the Sqoop command can locate them during execution. You can add the Unravel Hive Hook JAR to /var/lib/sqoop or to wherever the hive-hcatalog JARs are located in the cluster.

Unravel stop and start fails with an error

Issue:

When Unravel is stopped and restarted immediately, the following error is displayed:

    [Errno 1] Operation not permitted
    [Errno 1] Operation not permitted
    INS00160: Process '3366' is not owned by unravel
    INS00161: Process '3366' is not owned by unravel, this can come from a stale pid file '/opt/unravel/run/mysql.pid'

Solution:

After an ungraceful shutdown, the PID files remain in place. If a PID recorded in one of them is later reused by another process, the stale PID file causes this error. Make sure that Unravel is stopped (it will be if the server was just restarted), and then delete the PID files in /opt/unravel/run.
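Before you delete anything, you can list which PID files are actually stale. The following is a minimal sketch under stated assumptions: the function name is hypothetical, it relies on a Linux /proc filesystem and GNU stat, and it treats a PID file as stale when its content is not a number, when no such process exists, or when the process belongs to a different UID than the one you pass in. It only prints paths and deletes nothing.

```shell
# Sketch: list PID files that do not point at a live process owned by the
# expected user.
# Arguments: run directory, expected numeric UID.
stale_pid_files() (
    run_dir=$1
    expected_uid=$2
    for f in "$run_dir"/*.pid; do
        [ -f "$f" ] || continue
        pid=$(tr -d '[:space:]' < "$f")
        case $pid in
            ''|*[!0-9]*) printf '%s\n' "$f"; continue ;;  # not a PID at all
        esac
        # The owner of /proc/<pid> is the UID the process runs as;
        # the result is empty when the process does not exist.
        owner_uid=$(stat -c %u "/proc/$pid" 2>/dev/null)
        [ "$owner_uid" = "$expected_uid" ] || printf '%s\n' "$f"
    done
)
```

For example, assuming the user is named unravel:

    stale_pid_files /opt/unravel/run "$(id -u unravel)"

With Unravel stopped, every remaining PID file should be listed, which matches the advice above to remove them all.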