Configuring FSImage (4.7.0.1 onwards)
Important
The FSImage is applicable only for CDH, CDP, and HDP platforms.
In Hadoop, the FSImage is stored on the OS file system. This file contains the complete directory structure (namespace) of the HDFS, details about the data location, and information about which blocks are stored on which node.
FSImage is configured in Unravel for some of the Data page features and content, specifically to:
Automatically generate Files Report.
Calculate and populate the partition and table size information on the Data page. Refer to the Table details section.
Create the Small Files report upon user request.
Note
The FSImage status is enabled by default. To disable the feature, see Disable FSImage status.
The etl_fsimage task processes the FSImage for each of the connected clusters. FSImage processing involves file report generation and table size extraction. The etl_fsimage task imports the latest FSImage from the Namenode, and its run time is proportional to the size of the image.
Caution
FSImage is a snapshot that becomes outdated with time. The older the image, the more it diverges from the real-time structure.
In Unravel, you can configure FSImage for a single cluster environment or a multi-cluster environment.
Important
FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. The latest FSImage should be uploaded to the Unravel core node a short time before 00:00 UTC to guarantee data freshness.
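Because the image is processed once a day, it can help to check how old the file on disk is before the 00:00 UTC run. The following is a minimal sketch, not part of the Unravel tooling: the function name fsimage_is_stale and the 1440-minute default are assumptions made for illustration, and find -mmin is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical helper (not an Unravel command): exit 0 if the file was last
# modified more than max_age_minutes ago, exit 1 if it is fresher,
# exit 2 if the file does not exist.
# Assumption: 1440 minutes (one daily processing cycle) is a sensible default.
fsimage_is_stale() {
  local fsimage_file="$1"
  local max_age_minutes="${2:-1440}"
  if [ ! -e "$fsimage_file" ]; then
    echo "no such file: $fsimage_file" >&2
    return 2
  fi
  # find prints the path only when the modification time is older than the limit.
  [ -n "$(find "$fsimage_file" -prune -mmin +"$max_age_minutes")" ]
}

# Example (FSIMAGE_FILE is a placeholder for your own path):
#   fsimage_is_stale "$FSIMAGE_FILE" && echo "FSImage is older than a day"
```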
Set cores and memory to process FSImage
You can set Unravel properties to define the resources to process the FSImage. Run the following steps to define the resources.
Note
In a multi-cluster environment, you must perform the following steps on the core node.
Stop Unravel
<Unravel installation directory>/unravel/manager stop
For FSImage processing, a standalone Spark process is used. This process runs with a default of 4 cores and 16 GB memory, suitable for a small-sized FSImage file of less than 10 GB.
To support larger FSImage files, set the configuration as follows:
<Unravel installation directory>/unravel/manager config ondemand fsimage resource <cores> <memory>
##For example: /opt/unravel/manager config ondemand fsimage resource 4 10g
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel
<Unravel installation directory>/unravel/manager start
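To decide whether the defaults are enough, you can compare the size of the fetched FSImage file with the 10 GB figure mentioned above. This is a rough sketch under stated assumptions: the function name fsimage_exceeds_default is hypothetical, and "10 GB" is interpreted here as 10 * 1024^3 bytes. It does not suggest specific core or memory values, since those depend on your environment.

```shell
# Assumption: "10 GB" is read as 10 * 1024^3 bytes.
DEFAULT_LIMIT_BYTES=$(( 10 * 1024 * 1024 * 1024 ))

# Hypothetical helper (not an Unravel command): exit 0 when the file is at or
# above the limit, meaning the default 4 cores / 16 GB may be too small.
# Exit 1 when the file is below the limit, exit 2 when it does not exist.
fsimage_exceeds_default() {
  local fsimage_file="$1"
  local limit_bytes="${2:-$DEFAULT_LIMIT_BYTES}"
  local size_bytes
  if [ ! -f "$fsimage_file" ]; then
    echo "no such file: $fsimage_file" >&2
    return 2
  fi
  # The arithmetic expansion strips any padding that wc adds on some systems.
  size_bytes=$(( $(wc -c < "$fsimage_file") ))
  [ "$size_bytes" -ge "$limit_bytes" ]
}

# Example (FSIMAGE_FILE is a placeholder for your own path):
#   fsimage_exceeds_default "$FSIMAGE_FILE" && echo "consider raising cores and memory"
```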
Configure FSImage in a single cluster environment
In a single cluster environment, FSImage is configured differently based on whether you can access the FSImage with hdfs dfsadmin permissions.
Configure FSImage when you have the hdfs dfsadmin permissions
Configure FSImage when you do not have the hdfs dfsadmin permissions
You can also create a Cron job to download the FSImage to the Unravel server.
In a single cluster environment, if you are an Unravel user with hdfs dfsadmin privileges, you can run the following steps to download and configure the FSImage:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Download the FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable --automatic-fetch
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following command to trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
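The five steps above can be collected into one small wrapper. This is only a sketch: the environment variables UNRAVEL_MANAGER and DRY_RUN and the function names are made up for this example, and the default manager path assumes that Unravel is installed under /opt. With DRY_RUN=1 the commands are printed instead of executed, which lets you review the sequence first.

```shell
# Assumption: Unravel is installed under /opt; override with UNRAVEL_MANAGER.
MANAGER="${UNRAVEL_MANAGER:-/opt/unravel/manager}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The documented steps in order; && stops the sequence at the first failure.
enable_fsimage_auto_fetch() {
  run "$MANAGER" stop &&
  run "$MANAGER" config ondemand fsimage enable --automatic-fetch &&
  run "$MANAGER" config apply &&
  run "$MANAGER" start &&
  run curl -v http://localhost:5000/small-files-etl
}

# Example: DRY_RUN=1 enable_fsimage_auto_fetch
```

Note that if a step fails after the stop, Unravel stays stopped until you start it again manually.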
In a single cluster environment, if you are an Unravel user without hdfs dfsadmin permissions, then any other user with the hdfs dfsadmin permissions can manually fetch, parse, and upload the FSImage. Later, you (Unravel user without hdfs dfsadmin privileges) can download and configure the FSImage.
As a user with hdfs dfsadmin permissions, run the following commands to fetch the raw FSImage from the HDFS Namenode and parse it into a tab-separated text file.
hdfs dfsadmin -fetchImage <path to fsimage file on local machine>
hdfs oiv <path to fsimage file on local machine>
Ensure that the FSImage for Unravel is downloaded to the /opt/unravel/tmp/ondemand_fsimage directory. This is the default directory.
Note
Unravel recommends not changing the default directory unless there are space constraints. In such a case, you can change the default location as follows:
<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage>
##For example: /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
If you provide a different directory to add the latest FSImage, ensure that the Unravel user has the read permissions to that directory.
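A quick way to confirm the directory requirement is to run a check as the Unravel user. The sketch below is illustrative only: check_fsimage_dir is a hypothetical helper, and it tests nothing beyond directory existence, read and search permission for the current user, and the presence of at least one regular file.

```shell
# Hypothetical helper (not an Unravel command). Run it as the Unravel user.
# With no argument it checks the default directory named in this topic.
check_fsimage_dir() {
  local fsimage_dir="${1:-/opt/unravel/tmp/ondemand_fsimage}"
  if [ ! -d "$fsimage_dir" ]; then
    echo "missing directory: $fsimage_dir" >&2
    return 1
  fi
  if [ ! -r "$fsimage_dir" ] || [ ! -x "$fsimage_dir" ]; then
    echo "not readable by $(id -un): $fsimage_dir" >&2
    return 1
  fi
  # Look for at least one regular file anywhere under the directory.
  if [ -z "$(find "$fsimage_dir" -type f -print | head -n 1)" ]; then
    echo "no files found in: $fsimage_dir" >&2
    return 1
  fi
  echo "ok: $fsimage_dir"
}

# Example: check_fsimage_dir /tmp/ondemand
```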
As an Unravel user, do the following to configure FSImage:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Enable and fetch FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable
##For example: /opt/unravel/manager config ondemand fsimage enable
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following command to trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
Configure FSImage in a multi-cluster deployment
This section provides instructions to configure FSImage in a multi-cluster deployment for Unravel version 4.7.0.1 onwards. In the multi-cluster environment, the following are applicable:
Only a user with hdfs dfsadmin permissions can fetch, parse and upload the FSImage. Such a user can be an Unravel user or any other user.
You should use rsync to upload the FSImage from the Unravel edge node to the Unravel core node.
On the Unravel core node, set up the SSH access that rsync requires. For example, add the Unravel edge node as a well-known SSH host, and add the public RSA key of the user who uploads the FSImage and runs the cron job to the authorized SSH keys.
Execute the following steps on the edge node to authorize SSH keys on Unravel core node:
Add the public SSH key of the user to the Unravel core node user's $HOME/.ssh/authorized_keys file.
Add the Unravel edge node hostname as a known_host to the Unravel core node.
Run the following commands for SSH passwordless login for rsync command execution. You can skip the step to generate the keys if you already have the public keys.
ssh-keygen -t rsa ##Skip this step if you already have the public keys.
ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> mkdir -p .ssh
cat ~/.ssh/id_rsa.pub | ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> 'cat >> ~/.ssh/authorized_keys'
ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> "chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys"
To configure FSImage in a multi-cluster environment, do the following:
Run the following on the core node:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Enable FSImage configuration.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable
/opt/unravel/tmp/ondemand_fsimage is the default location where the FSImage is added.
Note
Unravel recommends not changing the default location unless there are space constraints. In such a case, you can change the default location as follows:
<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage>
##For example: /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
If you provide a different directory to add the latest FSImage, ensure that the Unravel user has the read permissions to that directory.
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following steps on each of the edge nodes:
If you do not have the hdfs dfsadmin permissions, any other user with the hdfs dfsadmin permissions can manually fetch, parse, and upload the FSImage. Later, you (an Unravel user without hdfs dfsadmin privileges) can download and configure the FSImage.
As a user with hdfs dfsadmin permissions, do the following to fetch and parse the FSImage:
hdfs dfsadmin -fetchImage <path to fsimage file on local machine>
hdfs oiv <path to fsimage file on local machine>
Note
On a Kerberos-enabled cluster, run the following command to set the Kerberos authentication for the user with the hdfs dfsadmin permissions:
<Unravel installation directory>/unravel/manager config ondemand fsimage kerberos /path/to/keytab user@REALM
The FSImage that is fetched externally should be placed at /opt/unravel/tmp/ondemand_fsimage on the core node. This is the default location.
As an Unravel user, do the following to configure FSImage:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Ensure that SSH passwordless login is set up for rsync command execution.
Run the following command to upload the FSImage to the location set on the core node:
<Unravel installation directory>/unravel/manager run ondemand fsimage fetch --upload-to-core
Note
If you have changed the default location on the core node (see step 1b above), run the following command to connect to the changed location and upload the FSImage.
<Unravel installation directory>/unravel/manager config ondemand fsimage location --remote <FSImage/location/configured/on/core/node>
##For example: <Unravel installation directory>/unravel/manager config ondemand fsimage location --remote /opt/unravel/data/tmp/reports/fsimage
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
On the core node, trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
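If the upload from an edge node fails, one thing worth checking on the core node is whether the .ssh directory and the authorized_keys file still carry the modes set during the SSH setup (700 and 640). The following sketch is an assumption-laden convenience, not an Unravel command: check_ssh_modes is a hypothetical name, it reads the mode string from ls output, and it also accepts the stricter 600 on authorized_keys.

```shell
# Hypothetical helper (not an Unravel command): verify the modes of an .ssh
# directory (700) and its authorized_keys file (640, or the stricter 600).
check_ssh_modes() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local dir_mode key_mode
  # The first ten characters of ls -l output are the type and permission bits.
  dir_mode=$(ls -ld "$ssh_dir" 2>/dev/null | cut -c1-10)
  key_mode=$(ls -l "$ssh_dir/authorized_keys" 2>/dev/null | cut -c1-10)
  if [ "$dir_mode" != "drwx------" ]; then
    echo "unexpected mode on $ssh_dir: ${dir_mode:-missing}" >&2
    return 1
  fi
  case "$key_mode" in
    -rw-------|-rw-r-----) ;;
    *)
      echo "unexpected mode on authorized_keys: ${key_mode:-missing}" >&2
      return 1
      ;;
  esac
  echo "ok"
}

# Example, run as the core node user: check_ssh_modes
```

From the edge node, a non-interactive login test such as ssh -o BatchMode=yes <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> true is another way to confirm that no password prompt is involved.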
Configure for FSImage download by external users
You can configure FSImage download by external users for both single cluster and multi-cluster environments as follows:
Stop Unravel.
<Unravel installation directory>/unravel/manager stop
Enable the ondemand FSImage download.
<Unravel installation directory>/unravel/manager config ondemand fsimage enable
Optionally, you can change the default location for downloading FSImage. /opt/unravel/data/tmp/reports/fsimage is the default location where the FSImage is downloaded.
<Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage>
##For example: /opt/unravel/manager config ondemand fsimage location /tmp/reports/fsimage
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel.
<Unravel installation directory>/unravel/manager start
Run the following command to trigger the FSImage import.
curl -v http://localhost:5000/small-files-etl
Create Cron job to upload FSImage
You can create a cron job to upload the FSImage to the Unravel server. The time to upload depends on the size of the FSImage and the network bandwidth. You must assess this time to determine how often to run the cron job and configure it accordingly.
FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. The latest FSImage should be uploaded a short time before 00:00 UTC to guarantee data freshness. Before you schedule the upload, observe the total time taken to run the script and set the cron job accordingly so that Unravel has access to the fresh FSImage before 00:00 UTC.
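Working back from 00:00 UTC is simple arithmetic: subtract the observed run time plus a safety margin from midnight. The sketch below prints the minute and hour fields for a crontab entry. The function name, the 15-minute default margin, and the script path in the usage comment are all assumptions for illustration, and the result is only correct if cron on that host evaluates schedules in UTC.

```shell
# Hypothetical helper: print the "minute hour" crontab fields for a job that
# must finish before 00:00 UTC.
#   $1 = observed run time of the fetch-and-upload script, in minutes
#   $2 = safety margin in minutes (assumed default: 15)
cron_fields_before_midnight() {
  local run_minutes="$1"
  local margin_minutes="${2:-15}"
  local total=$(( run_minutes + margin_minutes ))
  local start
  if [ "$total" -le 0 ] || [ "$total" -ge 1440 ]; then
    echo "run time plus margin must be between 1 and 1439 minutes" >&2
    return 1
  fi
  # Minutes after 00:00 at which the job has to start.
  start=$(( 1440 - total ))
  printf '%d %d\n' $(( start % 60 )) $(( start / 60 ))
}

# Example with a 45-minute run and the default margin
# (the script path is a placeholder):
#   echo "$(cron_fields_before_midnight 45) * * * /path/to/upload_fsimage.sh"
#   prints: 0 23 * * * /path/to/upload_fsimage.sh
```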
Verify the FSImage configuration
After you have successfully fetched the FSImage, go to the UI and verify the FSImage configuration.
The four data File reports are populated.
You can generate a Small files report.
Important
The table worker daemon checks table sizes every 24 hours by default. Therefore, even after the FSImage is processed, it can take up to 24 hours for the sizes to be reflected. To see the sizes sooner, restart the table_worker daemon.
Tip
The relevant log file is <unravel-installation-directory>/logs/ondemand_tasks.out
Run one of the following commands to display the progress of the etl_fsimage task.
egrep 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
grep etl_fsimage\(\) unravel_ondemand.out
Run one of the following commands to display the progress of the run_small_files task, which is started whenever the Small Files report is triggered from the UI.
egrep 'SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
grep run_small_files\(\) ondemand_tasks.out
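On a busy system the log can be long, so it may be convenient to look only at the most recent matching lines. The sketch below wraps the etl_fsimage pattern from the commands above in a small function. The name fsimage_progress and the default of 20 lines are assumptions, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical helper (not an Unravel command): print the last N lines
# (assumed default: 20) of a log file that match the etl_fsimage markers.
fsimage_progress() {
  local log_file="$1"
  local line_count="${2:-20}"
  if [ ! -r "$log_file" ]; then
    echo "cannot read: $log_file" >&2
    return 1
  fi
  grep -E 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' "$log_file" | tail -n "$line_count"
}

# Example, run from the logs directory: fsimage_progress ondemand_tasks.out 5
```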
Disable FSImage status
By default, the FSImage status is enabled. If you want to disable FSImage, perform the following steps.
Note
In a multi-cluster environment, you must perform the following steps on the core node.
Stop Unravel
<Unravel installation directory>/unravel/manager stop
Change the setting.
<Unravel installation directory>/unravel/manager config ondemand fsimage disable
Apply the changes.
<Unravel installation directory>/unravel/manager config apply
Start Unravel
<Unravel installation directory>/unravel/manager start