Triggering an import of FSimage

The etl_fsimage task imports the latest FSimage from the NameNode and loads it into the Hive table unravel_hdfs_fsimage_master_orc, which backs Unravel's Small Files/File Reports feature. It also generates four precomputed file reports.

We recommend running etl_fsimage daily, which is the default configuration: the task is triggered at 00:00 UTC every day. As of 4.5.3.0 you can configure the time and interval for downloading the FSimage; see FSimage properties.

However, there may be times when you want to import the FSimage immediately, such as after Unravel Server is installed or upgraded. In this case, start etl_fsimage manually by running the following command on the Unravel node:

curl -v http://localhost:5000/small-files-etl

This command triggers etl_fsimage so that the latest FSimage is incorporated into Unravel's Small Files/File Reports.
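If you want to trigger the import from a script rather than by hand, the request can be wrapped in a small function that checks the HTTP status. The following is a minimal sketch, not Unravel's own tooling: the function name trigger_fsimage_import and the UNRAVEL_ETL_URL variable are hypothetical, and treating any 2xx status as success is an assumption, since the response of the endpoint is not documented here.

```shell
#!/usr/bin/env bash
# Sketch: trigger etl_fsimage and report whether the request was accepted.
# Assumption: the endpoint answers with a 2xx status when the task is triggered.

UNRAVEL_ETL_URL="${UNRAVEL_ETL_URL:-http://localhost:5000/small-files-etl}"

trigger_fsimage_import() {
  local url="${1:-$UNRAVEL_ETL_URL}"
  local status
  # -s: silent, -o /dev/null: discard the body, -w: print only the HTTP status code
  status="$(curl -s -o /dev/null -w '%{http_code}' "$url")" || {
    echo "could not reach $url" >&2
    return 1
  }
  case "$status" in
    2??) echo "etl_fsimage triggered (HTTP $status)"; return 0 ;;
    *)   echo "unexpected HTTP status $status from $url" >&2; return 1 ;;
  esac
}

# Usage (on the Unravel node):
#   trigger_fsimage_import
```

A non-zero return value only tells you that the request itself failed; it says nothing about whether the import later completes.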

The run time of etl_fsimage is roughly proportional to the size of the FSimage. In testing on a single-node cluster, we observed the following run times. These times may not match your deployment; they only illustrate how the run time of etl_fsimage scales with the size of the FSimage.

FSimage size    etl_fsimage run time
19 GB           24 hours
9 GB            14 hours
4 GB            7 hours
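As a quick check on these measurements, you can compute the run time per GB for each row. The sketch below uses only the three data points listed here; hours_per_gb is a hypothetical helper, and the resulting rates describe this single-node test, not your deployment.

```shell
# Hours of etl_fsimage run time per GB of FSimage, for one measurement.
hours_per_gb() {
  # $1 = FSimage size in GB, $2 = observed run time in hours
  awk -v gb="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", hours / gb }'
}

hours_per_gb 19 24   # prints 1.26
hours_per_gb 9 14    # prints 1.56
hours_per_gb 4 7     # prints 1.75
```

The rate falls between about 1.26 and 1.75 hours per GB in this test, so the relationship is approximately, not exactly, proportional.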

Troubleshooting

Use the following workaround if etl_fsimage fails with the warning shown below, or if the Unravel user does not have dfsadmin privileges:

[2018-09-10 23:11:57,357: WARNING/ForkPoolWorker-1] stderr: sudo: hdfs: command not found

In this case, do the following:

  1. Fetch the FSimage as a user with dfsadmin privileges, using the following commands:

    rm -rf unravel_node_fsimage_dir/*
    hdfs dfsadmin -fetchImage unravel_node_fsimage_dir

    These commands delete all existing FSImage files and then copy the latest FSImage into the directory you specify (unravel_node_fsimage_dir).

    The directory unravel_node_fsimage_dir must be different than Unravel's default directory /srv/unravel/tmp/reports/fsimage, and it must be readable by the unravel user.

    Best practice is to run these commands in a cron job that completes before Unravel's etl_fsimage task is triggered every day at 00:00 UTC.

  2. Configure Unravel OnDemand to access FSImage from unravel_node_fsimage_dir by setting the following properties in unravel.properties:

    unravel.python.reporting.files.skip_fetch_fsimage=true
    unravel.python.reporting.files.external_fsimage_dir=unravel_node_fsimage_dir

    Note: For this to work, the OnDemand user must have read privileges for the directory specified by unravel_node_fsimage_dir.

  3. Restart the Unravel OnDemand daemon.


Note

Unravel OnDemand assumes that the FSimage filename starts with fsimage and does not end with the .txt extension.
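The fetch in step 1 and the filename rule in this note can be combined into one script suitable for a cron job. The following is a minimal sketch under stated assumptions: FSIMAGE_DIR stands in for unravel_node_fsimage_dir, the default path /data/unravel_fsimage is hypothetical, and the function names are illustrative. It must run as a user with dfsadmin privileges, and it does not change file permissions, so you still need to make the directory readable by the unravel user.

```shell
#!/usr/bin/env bash
# Sketch: refresh an external FSimage directory and confirm OnDemand can use the result.
# FSIMAGE_DIR is a placeholder for unravel_node_fsimage_dir (hypothetical default path).

FSIMAGE_DIR="${FSIMAGE_DIR:-/data/unravel_fsimage}"

# Succeeds if the directory holds a file whose name starts with "fsimage"
# and does not end in ".txt".
has_usable_fsimage() {
  local dir="$1" f name
  for f in "$dir"/fsimage*; do
    [ -f "$f" ] || continue
    name="${f##*/}"
    case "$name" in
      *.txt) continue ;;
      *)     return 0 ;;
    esac
  done
  return 1
}

refresh_fsimage() {
  local dir="${1:-$FSIMAGE_DIR}"
  # Refuse an empty or missing path so that rm can never expand to "/*".
  [ -n "$dir" ] && [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  rm -rf "${dir:?}"/*                          # delete the previous FSimage files
  hdfs dfsadmin -fetchImage "$dir" || return 1 # copy the latest FSimage from the NameNode
  has_usable_fsimage "$dir" || { echo "no usable FSimage in $dir" >&2; return 1; }
}

# Usage:
#   refresh_fsimage /path/to/unravel_node_fsimage_dir
```

If you save this as a script that calls refresh_fsimage, a crontab entry such as `0 22 * * * /path/to/refresh_fsimage.sh` would run it at 22:00 each day. Both the script path and the time are illustrative; pick a time that lets the fetch finish before etl_fsimage is triggered at 00:00 UTC, and note that cron uses the server's local time zone.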