Configuring FSimage
This topic explains how to configure FSimage, which is enabled by default. See Toggling FSimage status for how to disable it.
FSimage must be configured to support some Data page features and content, specifically to:
Automatically generate the File reports.
Populate the partition and table size information in the Data page details section
Create the Small Files report upon user request.
Important
This feature runs as the hdfs user.
You must restart the unravel_ondemand and unravel_ngui daemons for any configuration changes to take effect. Execute the following commands:
/etc/init.d/unravel_ngui restart
/etc/init.d/unravel_ondemand restart
Toggling FSimage status
Set the following property in /usr/local/unravel/etc/unravel.properties.
Property/Description | Default |
---|---|
unravel.python.reporting.files.disable Enables or disables Unravel's ability to generate the Small Files report and File reports. | false |
1. Determine how Unravel will access FSimage
Unravel can run as HDFS admin
This allows Unravel to access the FSimage on the Namenode using the hdfs dfsadmin command.
Unravel can't run as HDFS admin
FSimage must be downloaded for Unravel to use it. Create a cron job to download it to the Unravel server. The old image must be deleted before the cron job runs. The download can take up to 10 hours, depending on the size of the FSimage. Take the download time into account when deciding how often to run the cron job.
Edit /usr/local/unravel/etc/unravel.properties and set the following properties.
Property/Description | Default |
---|---|
unravel.python.reporting.files.skip_fetch_fsimage If HDFS admin privileges cannot be granted, set this to true to allow Unravel's OnDemand process to use an externally fetched FSimage. true: the OnDemand etl_fsimage process does not fetch the FSimage from the Namenode. Instead, the FSimage is expected to be available in the directory specified by unravel.python.reporting.files.external_fsimage_dir. | false |
unravel.python.reporting.files.external_fsimage_dir Directory for the FSimage when skip_fetch_fsimage=true. The externally fetched FSimage is expected to be in this directory. Unravel uses the latest file in this directory whose name starts with "fsimage_". This directory must be different from Unravel's internal directory, i.e., /srv/unravel/tmp/reports/fsimage. | - |
For example:
unravel.python.reporting.files.skip_fetch_fsimage=true
unravel.python.reporting.files.external_fsimage_dir=/srv/unravel/tmp/fsimages/reports
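The cron job described in this step could be built around a script like the following. This is a minimal sketch, not an Unravel-supplied script: the function names refresh_fsimage and latest_fsimage are hypothetical, the directory is the example value used in this topic, and it assumes the job runs as a user who is allowed to run hdfs dfsadmin -fetchImage on a host with a configured HDFS client.

```shell
#!/usr/bin/env bash
# Sketch of a cron-driven FSimage refresh (hypothetical helper, not part of Unravel).
# Assumption: the invoking user has the HDFS privileges needed for
# "hdfs dfsadmin -fetchImage".

# Example directory from this topic; must match
# unravel.python.reporting.files.external_fsimage_dir.
FSIMAGE_DIR="${FSIMAGE_DIR:-/srv/unravel/tmp/fsimages/reports}"

# Print the newest file in the given directory whose name starts with "fsimage_".
# Prints nothing if no such file exists.
latest_fsimage() {
    ls -t "$1"/fsimage_* 2>/dev/null | head -n 1
}

# Delete the old image(s), then download the current FSimage from the Namenode
# into the given directory and print the path of the downloaded file.
refresh_fsimage() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    rm -f "$dir"/fsimage_*                      # old image must go first
    hdfs dfsadmin -fetchImage "$dir" || return 1
    latest_fsimage "$dir"
}

# In the actual cron script, end with:
# refresh_fsimage "$FSIMAGE_DIR"
```

A crontab entry such as `0 1 * * * /path/to/refresh_fsimage.sh` (hypothetical path and schedule) would then run the download once a night. Choose the schedule with the download time in mind, so that one run finishes before the next one starts.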
2. Define the following
Unravel can use Spark or Hive to process the FSimage. Starting with v4.5.4.3, Spark is the default. The unravel.python.reporting.fsimage.run_mode property defines which one is used. See the FSimage properties for a complete list of properties.
Edit /usr/local/unravel/etc/unravel.properties and set the following properties as needed. If you don't find the properties, add them.
If Spark is used to process the image.
If Hive is used to process the image.
We recommend that you use Spark instead of Hive to process the FSimage.
Property/Description | Default |
---|---|
unravel.hive.server2.host FQDN or IP address of the HiveServer2 instance. | - |
unravel.hive.server2.port Port for the HiveServer2 instance. You need only define this if the HiveServer2 port is not 10000. | 10000 |
unravel.hive.server2.authentication Defines the authentication type. Possible values are KERBEROS, LDAP, NOSASL, NONE, or CUSTOM. When set to KERBEROS, you must also set unravel.hive.server2.kerberos.service.name=hive. | - |
unravel.hive.server2.kerberos.service.name Set only when unravel.hive.server2.authentication=KERBEROS. This must be set to hive to run the various reports in a Kerberos environment. | - |
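Because a property that is missing from unravel.properties has to be added rather than edited, a small helper can make the change repeatable. The following is a minimal sketch under stated assumptions: set_prop is a hypothetical function (not an Unravel tool), it expects plain key=value lines with no spaces around the equals sign, and it is meant for simple values such as host names, ports, and paths.

```shell
# Set key=value in a properties file: replace the line if the key is already
# present, otherwise append it. Hypothetical helper, not part of Unravel.
# Assumptions: lines look like "key=value" (no spaces around "="), and the
# value contains no backslashes (awk -v would interpret them as escapes).
set_prop() {
    local file="$1" key="$2" value="$3" tmp status
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        BEGIN { FS = "=" }
        # Rewrite the first matching line; drop any later duplicates.
        $1 == k { if (!found) print k "=" v; found = 1; next }
        { print }
        # Key was not present anywhere: append it.
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite in place, keeping owner and mode
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

For example, `set_prop /usr/local/unravel/etc/unravel.properties unravel.hive.server2.host hs2.example.com` would set the HiveServer2 host, where hs2.example.com is a placeholder. The unravel_ondemand and unravel_ngui daemons still need to be restarted afterwards for the change to take effect.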
If you are using a secured CDH or HDP cluster, verify that you have configured Unravel correctly.
For a Sentry-Secured CDH see step 5.
For a Ranger-Secured HDP see step 6.
Important
As part of configuring the Data page, Unravel connects to the Hive Metastore using the JDO properties. To load the specified JDBC driver, the jar file containing the JDBC driver class must be available in <unravel-installation-dir>/share/java.
3. Importing etl_fsimage
The etl_fsimage task imports the latest FSimage from the Namenode. The etl_fsimage run time is proportional to the image size.
Two properties determine how often the FSimage import is triggered (the interval) and at what time. The default is once a day at 00:00 (midnight).
Important
FSimage is a snapshot that becomes outdated with the passage of time; in other words, the older the image, the more it diverges from the real-time structure.
There may be times when you need to import FSimage immediately, such as after Unravel Server is installed or upgraded. Run the following command to trigger the import.
curl -v "http://localhost:5000/small-files-etl"
4. Unravel UI features
Once FSimage has been successfully fetched, you can go to the UI to verify that:
The Details section has the table and partition sizes.
The four data File reports are populated.
You can generate a Small files report.
Tips
The relevant log file is /usr/local/unravel/ondemand/logs/unravel_ondemand.out.
Run one of the following commands to display the progress of the etl_fsimage task.
egrep 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' unravel_ondemand.out
grep etl_fsimage\(\) unravel_ondemand.out
Run one of the following commands to display the progress of run_small_files, which is started whenever a Small Files report is triggered from the UI.
egrep 'SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' unravel_ondemand.out
grep run_small_files\(\) unravel_ondemand.out
The FSimage file is present on the Unravel node at /srv/unravel/tmp/reports/fsimage/fsimage.txt.
The FSimage file is present in HDFS at /tmp/fsimage/fsimage.txt.
In case of problems, it may be helpful to look at the HiveServer2 and YARN logs.