Triggering an import of FSimage
The etl_fsimage task imports the latest FSImage from the NameNode and incorporates it into a Hive table (unravel_hdfs_fsimage_master_orc) used by Unravel's Small Files/File Reports feature. It also generates four precomputed file reports.
We recommend running etl_fsimage daily, which is the default configuration: by default, etl_fsimage is triggered at 00:00 UTC every day. As of Unravel 4.5.3.0, you can configure the time and interval for downloading the FSImage; see FSimage properties.
However, you may sometimes want to import the FSImage immediately, for example after Unravel Server is installed or upgraded. In this case, start etl_fsimage manually by running the following command on the Unravel node:
curl -v http://localhost:5000/small-files-etl
This command triggers etl_fsimage so that the latest FSImage is incorporated into Unravel's Small Files/File Reports.
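If you trigger the import from a script, you may prefer a pass/fail result over curl's verbose output. The following is a minimal Bash sketch, not an Unravel-supplied tool. It assumes that any 2xx HTTP status from http://localhost:5000/small-files-etl means the request was accepted, which is not documented here. The function names and the ETL_URL override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: trigger etl_fsimage through the local endpoint and check the
# HTTP status. ASSUMPTION: a 2xx status means the request was accepted.

# Hypothetical override; defaults to the endpoint documented above.
ETL_URL="${ETL_URL:-http://localhost:5000/small-files-etl}"

# Return 0 if the argument is a 2xx HTTP status code.
is_accepted_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Send the request, print the status code, and return 0 only for 2xx.
trigger_etl_fsimage() {
  local status
  # -s: no progress output; -o /dev/null: discard the body;
  # -w '%{http_code}': print only the status code (000 if no response).
  status=$(curl -s -o /dev/null -w '%{http_code}' "$ETL_URL") || status="000"
  echo "HTTP status: $status"
  is_accepted_status "$status"
}
```

For example, run trigger_etl_fsimage && echo "etl_fsimage triggered". A successful response should not be read as the import having finished, because etl_fsimage can run for hours on a large FSImage.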
The run time of the etl_fsimage task increases with the size of the FSImage. In testing on a single-node cluster, we observed the following run times. These times may not match your deployment; they illustrate that the run time of etl_fsimage grows roughly in proportion to the size of the FSImage.
FSImage size | etl_fsimage run time |
---|---|
19 GB | 24 hours |
9 GB | 14 hours |
4 GB | 7 hours |
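To get a rough idea of the run time for an FSImage size between the measured points, you can interpolate linearly between the rows of the table. The following Bash sketch does this with awk. The function name is hypothetical, the only data it uses are the three single-node measurements above, and it refuses to extrapolate outside the measured 4 to 19 GB range.

```shell
#!/usr/bin/env bash
# Sketch only: rough etl_fsimage run-time estimate by linear interpolation
# between the single-node measurements in the table above. Real run times
# depend on your deployment.

# Measured points as "size_gb:hours", sorted by size (from the table).
MEASURED_POINTS="4:7 9:14 19:24"

# estimate_etl_hours SIZE_GB: print estimated hours with one decimal place.
# Exits with status 1 (no output) if SIZE_GB is outside the measured range.
estimate_etl_hours() {
  awk -v size="$1" -v pts="$MEASURED_POINTS" '
    BEGIN {
      size = size + 0
      n = split(pts, p, " ")
      for (i = 1; i <= n; i++) {
        split(p[i], xy, ":")
        x[i] = xy[1] + 0
        y[i] = xy[2] + 0
      }
      if (size < x[1] || size > x[n]) exit 1
      for (i = 1; i < n; i++) {
        if (size >= x[i] && size <= x[i + 1]) {
          # Linear interpolation between the two neighbouring measurements
          h = y[i] + (size - x[i]) * (y[i + 1] - y[i]) / (x[i + 1] - x[i])
          printf "%.1f\n", h
          exit 0
        }
      }
    }'
}
```

For example, estimate_etl_hours 12 prints 17.0. Treat the result as an order-of-magnitude guide only, since the table comes from one small test cluster.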
Troubleshooting
If etl_fsimage fails with the following warning, or if the Unravel user does not have dfsadmin privileges:
[2018-09-10 23:11:57,357: WARNING/ForkPoolWorker-1] stderr: sudo: hdfs: command not found
In this case, do the following:
Fetch the FSImage as a user with dfsadmin privileges by running the following commands:
rm -rf unravel_node_fsimage_dir/*
hdfs dfsadmin -fetchImage unravel_node_fsimage_dir
These commands delete all existing files in the directory you specify (unravel_node_fsimage_dir) and then download the latest FSImage from the NameNode into it. The directory unravel_node_fsimage_dir must be different from Unravel's default directory, /srv/unravel/tmp/reports/fsimage, and must be readable by the Unravel user.
As a best practice, run these commands in a cron job that completes before Unravel's etl_fsimage task is triggered (by default, every day at 00:00 UTC).
Configure Unravel OnDemand to read the FSImage from unravel_node_fsimage_dir by setting the following properties in unravel.properties:
unravel.python.reporting.files.skip_fetch_fsimage=true
unravel.python.reporting.files.external_fsimage_dir=unravel_node_fsimage_dir
Note: For this to work, the OnDemand user must have read privileges on the directory specified by unravel_node_fsimage_dir.
Restart the Unravel OnDemand daemon.
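The fetch step can be wrapped in a script for the cron job. This is a minimal Bash sketch, not an Unravel-supplied tool: FSIMAGE_DIR, the function names, and the --run flag are hypothetical. Its only addition to the two documented commands is a guard so that rm -rf never runs against an empty path, /, or Unravel's default directory. It does not change file permissions; make sure the Unravel user can read the directory.

```shell
#!/usr/bin/env bash
# Sketch only: refresh an external FSImage directory for Unravel OnDemand.
# Run as a user with dfsadmin privileges. FSIMAGE_DIR is a hypothetical
# placeholder for your unravel_node_fsimage_dir.
set -euo pipefail

FSIMAGE_DIR="${FSIMAGE_DIR:-}"
DEFAULT_DIR="/srv/unravel/tmp/reports/fsimage"

# Reject paths that would make "rm -rf DIR/*" dangerous or that point at
# Unravel's default FSImage directory. Trailing slashes are ignored.
validate_fsimage_dir() {
  local dir="$1"
  while [ "${dir%/}" != "$dir" ]; do dir="${dir%/}"; done
  if [ -z "$dir" ]; then
    echo "refusing to use an empty path or /" >&2
    return 1
  fi
  if [ "$dir" = "$DEFAULT_DIR" ]; then
    echo "directory must differ from $DEFAULT_DIR" >&2
    return 1
  fi
  return 0
}

# Delete the previous copy, then download the latest FSImage from the NameNode.
refresh_fsimage() {
  local dir="$1"
  validate_fsimage_dir "$dir" || return 1
  mkdir -p "$dir"
  rm -rf "${dir:?}"/*   # ${dir:?} is a second guard against an empty path
  hdfs dfsadmin -fetchImage "$dir"
}

# Do the work only when explicitly asked, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
  refresh_fsimage "$FSIMAGE_DIR"
fi
```

For example, a crontab entry such as the following (both paths hypothetical) runs the script daily at 22:00:
0 22 * * * FSIMAGE_DIR=/data/unravel_fsimage /usr/local/bin/refresh_fsimage.sh --run
Cron uses the server's local time zone, so choose the hour so that the job finishes before 00:00 UTC, allowing for how long -fetchImage takes on your cluster.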
Note: Unravel OnDemand assumes that the FSImage filename starts with fsimage and does not end with the extension .txt.
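Before restarting the daemon, you can check that the directory contains a file matching the naming rule in the note above. The following Bash sketch (function names hypothetical) lists files whose names start with fsimage and do not end in .txt. It fails if the directory cannot be read by the user running it, so running it as the OnDemand user also checks the read permission.

```shell
#!/usr/bin/env bash
# Sketch only: check a directory against the FSImage naming rule described in
# the note above (name starts with "fsimage", does not end with ".txt").

# Return 0 if the file name (path prefix ignored) follows the rule.
is_fsimage_name() {
  local name="${1##*/}"
  case "$name" in
    *.txt) return 1 ;;
    fsimage*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every regular file in DIR that follows the rule.
# Returns 1 if DIR cannot be read or contains no matching file.
list_fsimage_files() {
  local dir="$1" f found=1
  if [ ! -d "$dir" ] || [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "cannot read directory: $dir" >&2
    return 1
  fi
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    if is_fsimage_name "$f"; then
      echo "$f"
      found=0
    fi
  done
  return "$found"
}
```

For example, list_fsimage_files /data/unravel_fsimage || echo "no usable FSImage found" (the path is hypothetical).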