Configuring FSImage (4.7.0.1 onwards)
Important
The FSImage is applicable only for CDH, CDP, and HDP platforms.
In Hadoop, the FSImage is a file stored on the OS file system. This file contains the complete directory structure (namespace) of the HDFS, details about the data location, and the information about which blocks are stored on which node.
FSImage must be configured in Unravel for some of the Data page features and content, specifically to:
Automatically generate Files Report.
Calculate and populate the partition and table size information on the Data page. Refer to the Table details section.
Create the Small Files report upon user request.
The FSImage status is enabled by default. To disable the feature, see Disabling FSImage status.
The etl_fsimage task processes the FSImage for each of the connected clusters. It imports the latest FSImage from the NameNode, then generates the file reports and extracts the table sizes. The run time of the task is proportional to the size of the FSImage.
Caution
FSImage is a snapshot that becomes outdated with time. The older the image, the more it diverges from the real-time structure.
In Unravel, FSImage can be configured for a single cluster environment as well as a multi-cluster environment. The following sections are included here:
Setting cores and memory to process FSImage
You can set Unravel properties to define the resources that are used to process the FSImage. Run the following steps to define the resources. In a multi-cluster environment, you must perform the following steps on the core node.
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
For FSImage processing, a standalone Spark process is used. By default, this process runs with 4 cores and 16 GB of memory, which is suitable for a small FSImage file of less than 10 GB.
To support larger FSImage files, set the configuration as follows:
<Unravel installation directory>/unravel/manager config ondemand fsimage resource <cores> <memory>
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
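As a rough pre-check before changing the resources, the following sketch compares the size of an FSImage file with the 10 GB figure mentioned above. The function name and the optional threshold argument are illustrative assumptions and are not part of the Unravel CLI. The sketch also assumes that 10 GB means 10 * 1024^3 bytes.

```shell
# Hypothetical helper, not an Unravel command.
# Returns 0 when the file size is at or above the threshold, 1 when it is
# below, and 2 when the path is not a regular file.
# Default threshold: 10 GB (assumed here to be 10 * 1024^3 bytes), the size
# up to which the default 4 cores and 16 GB memory are described as suitable.
fsimage_exceeds_default() {
  local file="$1"
  local threshold_bytes="${2:-$((10 * 1024 * 1024 * 1024))}"
  local size_bytes
  [ -f "$file" ] || return 2
  # Arithmetic expansion strips any padding that wc may print.
  size_bytes=$(($(wc -c < "$file")))
  [ "$size_bytes" -ge "$threshold_bytes" ]
}
```

If the function returns 0 for your FSImage file, consider setting larger values for <cores> and <memory> with the config ondemand fsimage resource command shown above.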
Configuring FSImage in a single cluster environment
In a single cluster environment, FSImage can be configured differently based on the following conditions:
You can create a cron job to download the FSImage to the Unravel server.
Unravel User with dfsadmin privileges
In a single cluster environment, if you are an Unravel user with dfsadmin privileges, you can run the following steps to download and configure the FSImage:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Download the FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable --automatic-fetch
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following command to trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
Unravel user without dfsadmin permissions
In a single cluster environment, if you are an Unravel user without dfsadmin privileges, you can run the following steps to download and configure the FSImage:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Enable and fetch FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable
##For example: /opt/unravel/manager config ondemand fsimage enable
/opt/unravel/tmp/ondemand_fsimage is the default location to add the latest FSImage.
Note
Unravel recommends that you do not change the default location unless there are space constraints. In that case, you can change the default location as follows:
<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage>
##For example: /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
If you provide a different directory for the latest FSImage, ensure that the Unravel user has read permissions on that directory.
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following command to trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
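If you changed the default FSImage location in the steps above, a quick way to confirm the read permissions is a check along the following lines, run as the Unravel user. The function name is a hypothetical helper and is not an Unravel command.

```shell
# Hypothetical helper, not an Unravel command. Run it as the Unravel user.
# Succeeds only if the path is a directory that the current user can
# list (read permission) and enter (execute permission).
fsimage_dir_readable() {
  local dir="$1"
  [ -d "$dir" ] && [ -r "$dir" ] && [ -x "$dir" ]
}
```

For example, fsimage_dir_readable /tmp/ondemand returns a non-zero status if the directory from the example above is missing or cannot be read by the current user.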
Configuring FSImage in a multi-cluster deployment
This section provides instructions to configure FSImage in a multi-cluster deployment for Unravel version 4.7.0.1.
In the multi-cluster environment, the following are applicable:
Only a user with dfsadmin permissions can fetch, parse and upload the FSImage. This can be an Unravel user or any other user.
The FSImage is uploaded from the Unravel edge node to the Unravel core node using rsync. The SSH access that rsync requires must be set up on the Unravel core node: add the Unravel edge node as a known SSH host, and add the public RSA key of the uploading user (the user that runs the cron job) to the authorized SSH keys.
Follow these steps to authorize SSH keys on the Unravel core node. You must execute these steps from the Unravel edge node:
Add the public SSH key of the user to the $HOME/.ssh/authorized_keys file of the Unravel core node user.
Add the Unravel edge node hostname as a known host on the Unravel core node.
Run the following commands for SSH passwordless login for rsync command execution. You can skip the step to generate the keys, if you already have the public keys.
ssh-keygen -t rsa ##Skip this step if you already have the public keys.
ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> mkdir -p .ssh
cat ~/.ssh/id_rsa.pub | ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> 'cat >> ~/.ssh/authorized_keys'
ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> "chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys"
To configure FSImage in a multi-cluster environment, do the following:
Run the following on the core node:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Download the FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable
/opt/unravel/tmp/ondemand_fsimage is the default location to add the latest FSImage.
Note
Unravel recommends that you do not change the default location unless there are space constraints. In that case, you can change the default location as follows:
<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage>
##For example: /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
If you provide a different directory for the latest FSImage, ensure that the Unravel user has read permissions on that directory.
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following steps on each of the edge nodes:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Ensure that SSH passwordless login is set up for rsync command execution.
Note
If you changed the default location on the core node (see step 1b above), configure the edge node to upload the FSImage to the location set on the core node.
<Unravel installation directory>/unravel/manager config ondemand fsimage location --remote /opt/unravel/data/tmp/reports/fsimage
Download FSImage and upload to the location set in the core node.
<Unravel installation directory>/unravel/manager run ondemand fsimage fetch --upload-to-core
Important
FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. To guarantee data freshness, upload the latest FSImage to the Unravel core node a short time before 00:00 UTC.
Note
If the user is not an Unravel user but has dfsadmin privileges, run the following commands to download the FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage kerberos /path/to/keytab user@REALM
<Unravel installation directory>/unravel/manager run ondemand fsimage fetch --upload-to-core
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
On the core node, trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
Configuring for FSImage download by external users
You can configure FSImage download by external users in both single-cluster and multi-cluster environments as follows:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Enable the ondemand FSImage download.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable
Optionally, you can change the default location for downloading FSImage.
/opt/unravel/data/tmp/reports/fsimage is the default location where the FSImage is downloaded.
<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage>
##For example: /opt/unravel/manager config ondemand fsimage location /tmp/reports/fsimage
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following command to trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
Creating Cron job to upload FSImage
You can create a cron job to upload the FSImage to the Unravel server. The upload time depends on the size of the FSImage and the network bandwidth. You must assess this time to determine when and how often to run the cron job, and configure it accordingly.
FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. To guarantee data freshness, the latest FSImage should be uploaded a short time before 00:00 UTC. Before you schedule the upload, observe the total time taken to run the script, and then set the cron job so that Unravel has access to the fresh FSImage before 00:00 UTC.
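The timing arithmetic described above can be sketched as follows. This is a minimal illustration, not an Unravel utility: the function name and the default 15-minute safety margin are assumptions, and the result is only meaningful if the crontab is evaluated in UTC.

```shell
# Hypothetical helper: prints the "minute hour" crontab fields for a job that
# should finish by 00:00 UTC, given the observed run time of the upload and a
# safety margin (both in minutes). The combined lead time must be between
# 1 and 1440 minutes.
cron_start_fields() {
  local duration_min="$1"
  local margin_min="${2:-15}"
  local total=$((duration_min + margin_min))
  if [ "$total" -lt 1 ] || [ "$total" -gt 1440 ]; then
    echo "total lead time must be between 1 and 1440 minutes" >&2
    return 1
  fi
  # Minutes after midnight at which the job has to start.
  local start=$(((1440 - total) % 1440))
  printf '%d %d\n' $((start % 60)) $((start / 60))
}
```

With an observed upload time of 45 minutes and the default margin, cron_start_fields 45 prints 0 23, which corresponds to a crontab entry of the form 0 23 * * * followed by the command or script that you use to upload the FSImage.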
Verifying the FSImage configuration
After the FSImage has been successfully fetched, you can verify the following in the UI:
The four data File reports are populated.
You can generate a Small files report.
Important
The table_worker daemon checks for table sizes every 24 hours by default. Therefore, even after the FSImage is processed, it can take up to 24 hours for the table sizes to be reflected. To avoid the wait, you can restart the table_worker daemon.
Tip
The relevant log file is:
<unravel-installation-directory>/logs/ondemand_tasks.out
Run one of the following commands to display the progress of the etl_fsimage task.
egrep 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
grep etl_fsimage\(\) unravel_ondemand.out
Run one of the following commands to display the progress of run_small_files, which is started whenever the Small Files Report is triggered from the UI.
egrep 'SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
grep run_small_files\(\) ondemand_tasks.out
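When the log file is large, it can help to see only the most recent matching lines. The following sketch reuses the etl_fsimage search pattern from the commands above. The function name and the default of 20 lines are illustrative assumptions. It uses grep -E, which is equivalent to egrep.

```shell
# Hypothetical helper: prints the last N log lines that match the
# etl_fsimage patterns used in the commands above (N defaults to 20).
fsimage_progress() {
  local log_file="$1"
  local count="${2:-20}"
  grep -E 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' "$log_file" | tail -n "$count"
}
```

For example, fsimage_progress ondemand_tasks.out 5 prints at most the five most recent matching lines.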
Disabling FSImage status
The FSImage status is enabled by default. If you want to disable FSImage, perform the following steps.
Note
In a multi-cluster environment, you must perform the following steps on the core node.
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Change the setting.
<Unravel installation directory>/unravel/manager config ondemand fsimage disable
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
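The stop, configure, apply, and start sequence that recurs throughout this page can be wrapped in a small function. This is only a sketch: the function name is hypothetical, and the path to the manager script is passed in as the first argument rather than assumed.

```shell
# Hypothetical wrapper around the four steps used on this page.
# $1 is the path to the Unravel manager script; the remaining arguments are
# passed to "manager config". Each step runs only if the previous one
# succeeded, so a failed step stops the sequence.
run_with_restart() {
  local manager="$1"
  shift
  "$manager" stop &&
    "$manager" config "$@" &&
    "$manager" config apply &&
    "$manager" start
}
```

For example, run_with_restart /opt/unravel/manager ondemand fsimage disable runs the same four commands as the steps above. Because the chain stops at the first failure, Unravel can be left stopped if a configuration step fails, so check the output before you retry.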