Configuring FSImage
This topic explains how to configure FSImage processing, which is enabled by default. See Toggling FSImage status to disable it.
Common configurations
FSImage must be configured for the following Data page features and content:
Automatically generating the Files reports.
Calculating and populating the partition and table size information on the Data page. Refer to the Table details section.
Creating the Small Files report upon user request.
Important
You must restart the unravel_ondemand and ngui daemons for any configuration changes to take effect. Run the following commands:
/etc/init.d/ngui restart
/etc/init.d/unravel_ondemand restart
Toggling FSImage status
Stop Unravel
<Unravel installation directory>/unravel/manager stop
Change the setting.
<Unravel installation directory>/unravel/manager config properties set unravel.python.reporting.files.disable true
This property enables or disables Unravel's ability to generate the Small Files and Files reports. Default is false.
Note
false: enables the Small Files and Files reports in both the backend and UI.
true: disables the Small Files and Files reports in both the backend and UI.
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel
<Unravel installation directory>/unravel/manager start
Refer to Unravel Properties for the complete list of properties that can be set using this command.
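The toggle procedure above can be wrapped in a bash function so that the stop, set, apply, and start steps always run in order and only the two documented values are accepted. This is a minimal sketch, not an Unravel-provided script: the function name set_fsimage_reports_disabled and the UNRAVEL_MANAGER variable (the full path to <Unravel installation directory>/unravel/manager) are hypothetical.

```shell
# Sketch only: set_fsimage_reports_disabled and UNRAVEL_MANAGER are hypothetical
# names. UNRAVEL_MANAGER must hold the path to
# <Unravel installation directory>/unravel/manager.
set_fsimage_reports_disabled() {
  local value="$1"
  local manager="${UNRAVEL_MANAGER:-}"

  if [ -z "$manager" ]; then
    echo "set UNRAVEL_MANAGER to the manager script path" >&2
    return 2
  fi

  # Accept only the two documented values: false enables, true disables.
  case "$value" in
    true|false) ;;
    *) echo "expected true or false, got '$value'" >&2; return 2 ;;
  esac

  # Same order as the documented procedure: stop, set, apply, start.
  "$manager" stop || return 1
  "$manager" config properties set unravel.python.reporting.files.disable "$value" || return 1
  "$manager" config apply || return 1
  "$manager" start
}
```

For example, running UNRAVEL_MANAGER="<Unravel installation directory>/unravel/manager" set_fsimage_reports_disabled false (with the placeholder filled in) re-enables both reports. Validating the value before calling manager stop avoids stopping Unravel for a typo.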
Define resources used to process FSImage
Stop Unravel
<Unravel installation directory>/unravel/manager stop
Set the following properties as shown:
<Unravel installation directory>/unravel/manager config properties set unravel.python.reporting.files.spark.cores 6
<Unravel installation directory>/unravel/manager config properties set unravel.python.reporting.files.spark.driver.memory 6
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel
<Unravel installation directory>/unravel/manager start
Refer to Unravel Properties for the complete list of properties that can be set using this command.
Unravel uses Spark to process the FSImage. The unravel.python.reporting.files.spark.cores and unravel.python.reporting.files.spark.driver.memory properties set in the steps above define the resources used to process FSImage. See the FSImage properties for the complete list of properties.
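The resource steps can also be scripted so that bad values are rejected before Unravel is stopped. This is a minimal sketch under assumptions: set_fsimage_spark_resources and UNRAVEL_MANAGER are hypothetical names, and the check that both values are positive whole numbers is an assumption based on the example value 6, not a documented constraint.

```shell
# Sketch only: set_fsimage_spark_resources and UNRAVEL_MANAGER are hypothetical.
# UNRAVEL_MANAGER must hold the path to <Unravel installation directory>/unravel/manager.
set_fsimage_spark_resources() {
  local cores="$1" driver_memory="$2"
  local manager="${UNRAVEL_MANAGER:-}"
  local v

  if [ -z "$manager" ]; then
    echo "set UNRAVEL_MANAGER to the manager script path" >&2
    return 2
  fi

  # Assumption: both properties take positive whole numbers (the example uses 6).
  for v in "$cores" "$driver_memory"; do
    case "$v" in
      ''|*[!0-9]*|0) echo "not a positive integer: '$v'" >&2; return 2 ;;
    esac
  done

  "$manager" stop || return 1
  "$manager" config properties set unravel.python.reporting.files.spark.cores "$cores" || return 1
  "$manager" config properties set unravel.python.reporting.files.spark.driver.memory "$driver_memory" || return 1
  "$manager" config apply || return 1
  "$manager" start
}
```

For example, set_fsimage_spark_resources 6 6 reproduces the values shown in the steps above.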
Important
As part of configuring the Data page, Unravel connects to the Hive Metastore using the JDO properties. To load the specified JDBC driver, the JAR file containing the JDBC driver class must be available in <unravel-installation-dir>/share/java.
Configuring FSImage in a single cluster deployment
Accessing FSImage
Unravel can run as an HDFS admin
This allows Unravel to access the FSImage on the NameNode using the hdfs dfsadmin command.
Unravel cannot run as an HDFS admin
The FSImage must be downloaded for Unravel to use it. Create a cron job that downloads it to the Unravel server, and delete the old image before each run of the cron job. The download can take up to 10 hours depending on the size of the FSImage, so use the download time to decide how often the cron job should run.
Stop Unravel services with the manager stop command, and then set the following properties:
<Unravel installation directory>/unravel/manager config properties set <property> <value>
Property | Description | Default |
---|---|---|
unravel.python.reporting.files.skip_fetch_fsimage | If HDFS admin privileges cannot be granted, set this to true so that Unravel's OnDemand process uses an externally fetched FSImage. When true, the OnDemand etl_fsimage process does not fetch the FSImage from the NameNode; instead, it expects the FSImage to be available in the directory specified by unravel.python.reporting.files.external_fsimage_dir. | false |
unravel.python.reporting.files.external_fsimage_dir | Directory for the externally fetched FSImage when skip_fetch_fsimage=true. Unravel uses the latest file in this directory whose name starts with fsimage_. This directory must be different from Unravel's internal directory, /srv/unravel/tmp/reports/fsimage. | - |
For example:
<Unravel installation directory>/unravel/manager config properties set unravel.python.reporting.files.skip_fetch_fsimage true
<Unravel installation directory>/unravel/manager config properties set unravel.python.reporting.files.external_fsimage_dir /srv/unravel/tmp/fsimages/reports
After setting the properties, apply the changes with the manager config apply command, and then start the Unravel services with the manager start command.
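When Unravel cannot run as an HDFS admin, the cron job described above has to delete the old image and then download a new one into the directory set in unravel.python.reporting.files.external_fsimage_dir. The bash functions below are a minimal sketch of that job, not an Unravel-provided script. Assumptions: the function names and the HDFS_BIN override are hypothetical, the cron user has HDFS admin privileges, and latest_fsimage treats "latest" as most recently modified, which the documentation does not specify.

```shell
# Sketch only: fetch_external_fsimage, latest_fsimage and HDFS_BIN are hypothetical.
# Assumes the cron user has HDFS admin privileges.

fetch_external_fsimage() {
  local dir="$1"
  local hdfs="${HDFS_BIN:-hdfs}"
  [ -n "$dir" ] || { echo "usage: fetch_external_fsimage DIR" >&2; return 2; }

  mkdir -p "$dir" || return 1
  # Delete the old image before downloading a new one.
  rm -f "$dir"/fsimage_* || return 1
  # Download the most recent FSImage from the NameNode into DIR.
  "$hdfs" dfsadmin -fetchImage "$dir"
}

# Print the most recently modified file in DIR whose name starts with fsimage_.
# Returns 1 if there is no such file.
latest_fsimage() {
  local dir="$1" newest="" f
  for f in "$dir"/fsimage_*; do
    [ -e "$f" ] || continue            # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then return 1; fi
  printf '%s\n' "$newest"
}
```

A small wrapper script that sources these functions and calls fetch_external_fsimage /srv/unravel/tmp/fsimages/reports can then be scheduled in the cron user's crontab, leaving room for downloads that can take up to 10 hours. Running latest_fsimage on the same directory afterwards shows which file is present for Unravel to pick up.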
Importing etl_fsimage
The etl_fsimage task imports the latest FSImage from the NameNode. The etl_fsimage run time is proportional to the image size.
The following two properties determine how often the FSImage is imported (the interval) and at what time. The default is once a day at 00:00 (midnight).
Important
The FSImage is a snapshot that becomes outdated over time: the older the image, the more it diverges from the current structure of the file system.
There may be times when you need to import the FSImage immediately, such as after Unravel Server is installed or upgraded. Run the following command to trigger the import:
curl -v http://localhost:5000/small-files-etl
Configuring FSImage in a multi-cluster deployment
Unravel OnDemand processes the HDFS FSImage as follows:
Fetches the raw FSImage from the HDFS NameNode:
hdfs dfsadmin -fetchImage
Parses the raw FSImage into a tab-separated text file:
hdfs oiv
For this to work, the FSImage must be fetched and parsed on a cluster gateway node.
In a single-cluster deployment, the Unravel core node itself is the cluster gateway node, so the FSImage is fetched and parsed by the Unravel OnDemand process itself.
In a multi-cluster deployment, the Unravel edge nodes are the cluster gateway nodes. These nodes have a minimal Unravel footprint, and no Unravel process on them can fetch or process the FSImage.
Therefore, the FSImage must be fetched, parsed, and uploaded to the Unravel core node by a non-Unravel process or script. A template script is provided for this purpose, and it must be run on each Unravel edge node.
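The fetch and parse steps that run on an edge node can be sketched as a bash function. This is a hypothetical illustration, not the Unravel template script: fetch_and_parse_fsimage, HDFS_BIN, KEYTAB, and PRINCIPAL are names introduced here, and the keytab-based kinit is only one way to authenticate on a Kerberized cluster. It uses the real hdfs oiv Delimited processor, whose default delimiter is a tab.

```shell
# Hypothetical sketch: fetch_and_parse_fsimage, HDFS_BIN, KEYTAB and PRINCIPAL are
# illustrative names, not parameters of the Unravel template script.
fetch_and_parse_fsimage() {
  local work_dir="$1"
  local hdfs="${HDFS_BIN:-hdfs}"
  local image="" f out

  if [ -z "$work_dir" ]; then
    echo "usage: fetch_and_parse_fsimage WORK_DIR" >&2
    return 2
  fi

  # Kerberized cluster: obtain a ticket from a keytab first.
  if [ -n "${KEYTAB:-}" ] && [ -n "${PRINCIPAL:-}" ]; then
    kinit -kt "$KEYTAB" "$PRINCIPAL" || return 1
  fi

  mkdir -p "$work_dir" || return 1
  rm -f "$work_dir"/fsimage_*            # drop images from earlier runs

  # Step 1: fetch the raw FSImage from the NameNode (needs dfsadmin privileges).
  # hdfs output goes to stderr so stdout carries only the result path.
  "$hdfs" dfsadmin -fetchImage "$work_dir" >&2 || return 1

  # Pick the downloaded image; skip a .md5 checksum file if one is present.
  for f in "$work_dir"/fsimage_*; do
    case "$f" in *.md5) continue ;; esac
    if [ -f "$f" ]; then image="$f"; fi
  done
  if [ -z "$image" ]; then
    echo "no FSImage found in $work_dir" >&2
    return 1
  fi

  # Step 2: parse it into a tab-separated text file (tab is the Delimited default).
  out="$work_dir/fsimage.tsv"
  "$hdfs" oiv -p Delimited -i "$image" -o "$out" >&2 || return 1
  printf '%s\n' "$out"
}
```

The function prints the path of the parsed file, so a caller can capture it with out=$(fetch_and_parse_fsimage /some/work/dir) and pass it on to the upload step.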
Configuring the template script
You must configure the following parameters in the template script.
Parameters | Description |
---|---|
| Set the Unravel-generated UID for the cluster attached to the edge node. |
| Set the Unravel core node's fully qualified hostname. |
| Set the Unravel user name. |
| If Unravel is installed in a directory other than |
Any user with HDFS dfsadmin privileges can run the template script. If the cluster is Kerberized, add an appropriate kinit statement to the script.
Running template script and setting up a cron job
The Unravel OnDemand process processes the FSImage every day at 00:00 UTC. To keep the data fresh, upload the latest FSImage to the Unravel core node shortly before 00:00 UTC. Before scheduling the upload, measure how long each of the following steps takes:
Fetch the FSImage using the hdfs dfsadmin -fetchImage command.
Parse the FSImage using the hdfs oiv command.
Upload the FSImage to a temporary location on the Unravel core node.
Set up the template script as a cron job that runs every day early enough for all three steps to finish before 00:00 UTC.
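The timing and scheduling described above reduce to simple arithmetic: add up the measured durations, round up to whole minutes, and subtract from midnight. The functions below are a minimal sketch under assumptions: time_step and cron_schedule_before_midnight are hypothetical names, the optional safety margin is an addition of this sketch, and the result assumes the cron daemon runs in UTC. If the server uses another time zone, convert the printed time.

```shell
# Sketch only: time_step and cron_schedule_before_midnight are hypothetical names.

# Run a command and report how many whole seconds it took (on stderr).
time_step() {
  local name="$1"; shift
  local start end rc
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "$name took $((end - start)) s" >&2
  return "$rc"
}

# cron_schedule_before_midnight FETCH_S PARSE_S UPLOAD_S [MARGIN_S]
# Prints the "MIN HOUR * * *" fields of a daily cron entry whose steps,
# plus the optional margin, finish by 00:00 (assumes cron runs in UTC).
cron_schedule_before_midnight() {
  local total=$(( $1 + $2 + $3 + ${4:-0} ))
  local minutes=$(( (total + 59) / 60 ))   # round up to whole minutes
  if [ "$minutes" -ge 1440 ]; then
    echo "steps take 24 hours or more; they cannot fit in one day" >&2
    return 1
  fi
  local start=$(( (1440 - minutes) % 1440 ))
  echo "$(( start % 60 )) $(( start / 60 )) * * *"
}
```

For example, if fetching takes 3600 s, parsing 1800 s, and uploading 600 s, cron_schedule_before_midnight 3600 1800 600 prints 20 22 * * *, that is, start at 22:20 UTC. Passing a margin such as 1200 moves the start to 22:00 and leaves room for slower days. The durations themselves can be measured with, for example, time_step fetch hdfs dfsadmin -fetchImage /some/dir.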
Note
The FSImage is uploaded from the Unravel edge node to the Unravel core node using rsync. Set up the SSH access that rsync needs, such as adding the Unravel edge node as a known SSH host and adding the public SSH key of the uploading user (the user that runs the cron job) to the authorized SSH keys on the Unravel core node.
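The upload step can be sketched as a bash function around rsync over SSH. This is a minimal illustration, not part of the Unravel template script: upload_fsimage and RSYNC_BIN are hypothetical names, and the one-time key setup in the comments uses standard OpenSSH tools with placeholder user and host names.

```shell
# Sketch only: upload_fsimage and RSYNC_BIN are hypothetical names.
# One-time SSH setup for the uploading (cron) user, with placeholder names:
#   ssh-keygen -t ed25519           # create a key pair on the edge node
#   ssh-copy-id USER@CORE_HOST      # add the public key to authorized_keys on the core node
#   ssh USER@CORE_HOST true         # first login records the core node's host key
upload_fsimage() {
  if [ "$#" -ne 4 ]; then
    echo "usage: upload_fsimage SRC USER HOST DEST_DIR" >&2
    return 2
  fi
  local src="$1" user="$2" host="$3" dest_dir="$4"
  local rsync_bin="${RSYNC_BIN:-rsync}"

  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  # BatchMode makes ssh fail instead of prompting for a password under cron;
  # --partial keeps partly transferred data if the copy is interrupted.
  "$rsync_bin" -a --partial -e "ssh -o BatchMode=yes" "$src" "$user@$host:$dest_dir/"
}
```

Using BatchMode is a deliberate choice for cron: a missing or rejected key makes the job fail visibly instead of hanging on a password prompt.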
Verifying the FSImage configuration
After the FSImage has been fetched successfully, verify the following in the UI:
The Tables show table and partition sizes.
The four Files reports are populated.
You can generate a Small Files report.
Tip
The relevant log file is <unravel-installation-directory>/logs/unravel_ondemand.out.
Run one of the following commands to display the progress of the etl_fsimage task:
grep -E 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' unravel_ondemand.out
grep etl_fsimage\(\) unravel_ondemand.out
Run one of the following commands to display the progress of the run_small_files task, which starts whenever a Small Files report is triggered from the UI:
grep -E 'SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' unravel_ondemand.out
grep run_small_files\(\) unravel_ondemand.out
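The log searches above can be combined into one helper that shows only the newest matching lines for either task. This is a minimal sketch: fsimage_progress and UNRAVEL_LOG are hypothetical names, and UNRAVEL_LOG must be set to the full path of unravel_ondemand.out under your installation directory.

```shell
# Sketch only: fsimage_progress and UNRAVEL_LOG are hypothetical names.
# UNRAVEL_LOG must point at <unravel-installation-directory>/logs/unravel_ondemand.out.
fsimage_progress() {
  local task="$1" lines="${2:-20}"
  local log="${UNRAVEL_LOG:-}"
  local pattern
  case "$task" in
    etl_fsimage)     pattern='ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' ;;
    run_small_files) pattern='SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' ;;
    *) echo "task must be etl_fsimage or run_small_files" >&2; return 2 ;;
  esac
  [ -r "$log" ] || { echo "cannot read log: '$log'" >&2; return 1; }
  # Same patterns as the commands above; keep only the newest LINES matches.
  grep -E "$pattern" "$log" | tail -n "$lines"
}
```

For example, fsimage_progress etl_fsimage 50 prints the last 50 progress lines for the import task.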