Configuring FSImage (4.7.0.1 onwards)

Important

The FSImage is applicable only for CDH, CDP, and HDP platforms.

In Hadoop, the FSImage is a file stored on the NameNode's local OS file system. This file contains the complete directory structure (namespace) of HDFS, including file metadata and the mapping of files to blocks.

FSImage must be configured in Unravel for some of the Data page features and content, specifically to:

The FSImage status is enabled by default. To disable the feature, see Disable FSImage status.

The etl_fsimage task imports the latest FSImage from the NameNode and processes it for each of the connected clusters. FSImage processing involves file report generation and table size extraction. The etl_fsimage run time is proportional to the image size, for example:

Caution

FSImage is a snapshot that becomes outdated with time. The older the image, the more it diverges from the real-time structure.

In Unravel, FSImage can be configured for a single cluster environment as well as a multi-cluster environment. The following sections are included here:

Setting cores and memory to process FSImage

You can set Unravel properties to define the resources that are used to process the FSImage. Run the following steps to define the resources. In a multi-cluster environment, you must perform the following steps on the core node.

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. For FSImage processing, a standalone Spark process is used. By default, this process runs with 4 cores and 16 GB of memory, which is suitable for FSImage files smaller than 10 GB.

    To support larger FSImage files, set the configuration as follows:

    <Unravel installation directory>/unravel/manager config ondemand fsimage resource <cores> <memory>
    
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
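
The stop, configure, apply, start sequence above recurs throughout this page, so it can be wrapped in a small helper. The following is a minimal sketch, not part of Unravel: the function name unravel_reconfigure and the UNRAVEL_MANAGER variable are hypothetical, and the default path /opt/unravel/manager is taken from the examples on this page.

```shell
# Hypothetical helper: wraps one `manager config ...` command in stop/apply/start.
# UNRAVEL_MANAGER is an assumed variable; point it at your own manager script.
UNRAVEL_MANAGER="${UNRAVEL_MANAGER:-/opt/unravel/manager}"

# Usage: unravel_reconfigure config ondemand fsimage resource <cores> <memory>
unravel_reconfigure() {
    # Stop at the first failure so that a failed change is never applied.
    # Note: after a failure Unravel stays stopped and must be started manually.
    "$UNRAVEL_MANAGER" stop || return 1
    "$UNRAVEL_MANAGER" "$@" || return 1
    "$UNRAVEL_MANAGER" config apply || return 1
    "$UNRAVEL_MANAGER" start
}
```

Replace <cores> and <memory> with the values for your environment; the sketch passes them to the manager unchanged.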

Configuring FSImage in a single cluster environment

In a single cluster environment, FSImage can be configured differently based on the following conditions:

You can create a cron job to download the FSImage to the Unravel server.

Unravel User with dfsadmin privileges

In a single cluster environment, if you are an Unravel user with dfsadmin privileges, you can run the following steps to download and configure the FSImage:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Download the FSImage.

    <Unravel installation directory>/unravel/manager config ondemand fsimage enable --automatic-fetch
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
  5. Run the following command to trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl
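
When the import is triggered from a script, it can help to capture the HTTP status instead of reading the verbose curl output. The following is a minimal sketch: the function name trigger_fsimage_import and the ETL_URL variable are hypothetical, and treating any 2xx status as success is an assumption, because this page does not describe the response of the endpoint.

```shell
# ETL_URL defaults to the endpoint used in the step above.
ETL_URL="${ETL_URL:-http://localhost:5000/small-files-etl}"

trigger_fsimage_import() {
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
    status=$(curl -s -o /dev/null -w '%{http_code}' "$ETL_URL") || {
        echo "curl could not reach $ETL_URL" >&2
        return 1
    }
    echo "HTTP status: $status"
    # Assumption: any 2xx status means that the request was accepted.
    case "$status" in
        2??) return 0 ;;
        *)   return 1 ;;
    esac
}
```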

Unravel user without dfsadmin permissions

In a single cluster environment, if you are an Unravel user without dfsadmin privileges, you can run the following steps to download and configure the FSImage:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Enable and fetch FSImage.

    <Unravel installation directory>/unravel/manager config ondemand fsimage enable
    
    For example: 
    /opt/unravel/manager config ondemand fsimage enable
    

    /opt/unravel/tmp/ondemand_fsimage is the default location to add the latest FSImage.

    Note

    Unravel recommends that you do not change the default location unless there are space constraints. In such a case, you can change the default location as follows:

    <Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage> 
    
    ##For example: 
    /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
    

    If you provide a different directory for the latest FSImage, ensure that the Unravel user has read permission on that directory.

  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start
  5. Run the following command to trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl
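
If you changed the FSImage location in step 2, the read-permission requirement can be checked before you start Unravel. The following is a minimal sketch, to be run as the Unravel user; the function name check_fsimage_dir is hypothetical.

```shell
# Hypothetical helper: report whether the current user can list a directory.
check_fsimage_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Listing a directory needs both read (r) and search (x) permission.
    if [ -r "$dir" ] && [ -x "$dir" ]; then
        echo "readable: $dir"
        return 0
    fi
    echo "not readable: $dir"
    return 1
}

# Example, using the location from step 2:
#   check_fsimage_dir /tmp/ondemand
```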

Configuring FSImage in a multi-cluster deployment

This section provides instructions to configure FSImage in a multi-cluster deployment for Unravel version 4.7.0.1.

In the multi-cluster environment, the following are applicable:

  • Only a user with dfsadmin permissions can fetch, parse and upload the FSImage. This can be an Unravel user or any other user.

  • The FSImage is uploaded from the Unravel edge node to the Unravel core node using rsync. Set up the permissions that rsync requires on the Unravel core node: add the Unravel edge node as a well-known SSH host, and add the public RSA key of the uploading user (the user that runs the cron job) to the authorized SSH keys.

    Follow these steps to authorize SSH keys on Unravel core node. You must execute these steps from the Unravel edge node:

    1. Add the public SSH key of the user to the Unravel core node user's $HOME/.ssh/authorized_keys file.

    2. Add the Unravel edge node hostname as a known_host to Unravel core node.

    3. Run the following commands for SSH passwordless login for rsync command execution. You can skip the step to generate the keys, if you already have the public keys.

       ssh-keygen -t rsa   # Skip this step if you already have the public keys.
       ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> mkdir -p .ssh
       cat ~/.ssh/id_rsa.pub | ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> 'cat >> ~/.ssh/authorized_keys'
       ssh <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME> "chmod 700 ~/.ssh; chmod 640 ~/.ssh/authorized_keys"
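
After the keys are in place, the passwordless login can be verified from the Unravel edge node before rsync depends on it. The following is a minimal sketch; the function name check_passwordless_ssh is hypothetical, while BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# Hypothetical helper: succeed only if ssh can log in without any prompt.
check_passwordless_ssh() {
    target="$1"   # <UNRAVEL_CORE_NODE_USER>@<UNRAVEL_CORE_NODE_HOSTNAME>
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "passwordless SSH to $target works"
    else
        echo "passwordless SSH to $target failed" >&2
        return 1
    fi
}
```

Run it as the user that runs the cron job, because that user's key is the one that must be authorized on the core node.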

To configure FSImage in a multi-cluster environment, do the following:

  1. Run the following on the core node:

    1. Stop Unravel.

      <Unravel installation directory>/unravel/manager stop
      
    2. Download the FSImage.

      <Unravel installation directory>/unravel/manager config ondemand fsimage enable
      

      /opt/unravel/tmp/ondemand_fsimage is the default location to add the latest FSImage.

      Note

      Unravel recommends that you do not change the default location unless there are space constraints. In such a case, you can change the default location as follows:

      <Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage> 
      
      ##For example: 
      /opt/unravel/manager config ondemand fsimage location /tmp/ondemand
      

      If you provide a different directory for the latest FSImage, ensure that the Unravel user has read permission on that directory.

    3. Apply the changes.

      <Unravel installation directory>/unravel/manager config apply
      
    4. Start Unravel.

      <Unravel installation directory>/unravel/manager start
  2. Run the following steps on each of the edge nodes:

    1. Stop Unravel.

      <Unravel installation directory>/unravel/manager stop
      
    2. Ensure that SSH passwordless login is set up for rsync command execution.

      Note

      In case you have changed the default location on the core node (see step 1b above), connect to the location set on the core node to upload the FSImage.

      <Unravel installation directory>/unravel/manager config ondemand fsimage location --remote /opt/unravel/data/tmp/reports/fsimage
      
    3. Download FSImage and upload to the location set in the core node.

      <Unravel installation directory>/unravel/manager run ondemand fsimage fetch --upload-to-core

      Important

      FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. The latest FSImage should be uploaded to the Unravel core node a short time before 00:00 UTC to guarantee data freshness.

      Note

      If the user is not an Unravel user but has dfsadmin privileges, run the following commands to download the FSImage.

      <Unravel installation directory>/unravel/manager config ondemand fsimage kerberos /path/to/keytab user@REALM
      <Unravel installation directory>/unravel/manager run ondemand fsimage fetch --upload-to-core
    4. Apply the changes.

      <Unravel installation directory>/unravel/manager config apply
      
    5. Start Unravel.

      <Unravel installation directory>/unravel/manager start
  3. On the core node, trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl

Configuring for FSImage download by external users

You can configure for FSImage download by external users for both single cluster and multi-cluster as follows:

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Enable the ondemand FSImage download.

    <Unravel installation directory>/unravel/manager config ondemand fsimage enable
    
  3. Optionally, you can change the default location for downloading FSImage. /opt/unravel/data/tmp/reports/fsimage is the default location where the FSImage is downloaded.

    <Unravel installation directory>/unravel/manager config ondemand fsimage location <location to download FSImage> 
    ##For example:
    /opt/unravel/manager config ondemand fsimage location /tmp/reports/fsimage
  4. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  5. Start Unravel.

    <Unravel installation directory>/unravel/manager start
  6. Run the following command to trigger the FSImage import.

    curl -v http://localhost:5000/small-files-etl
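
On the external user's side, the download itself can be done with the standard HDFS admin command, hdfs dfsadmin -fetchImage, which saves the most recent FSImage from the NameNode into a local directory. The following is a minimal sketch: the function name fetch_fsimage and the FSIMAGE_DIR variable are hypothetical, the default directory is the one named in step 3, and whether Unravel expects a particular file name in that directory is not stated on this page.

```shell
# FSIMAGE_DIR defaults to the download location named in step 3.
FSIMAGE_DIR="${FSIMAGE_DIR:-/opt/unravel/data/tmp/reports/fsimage}"

fetch_fsimage() {
    mkdir -p "$FSIMAGE_DIR" || return 1
    # Requires dfsadmin privileges. On a Kerberized cluster, obtain a
    # ticket (kinit) before calling this function.
    hdfs dfsadmin -fetchImage "$FSIMAGE_DIR"
}
```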

Creating Cron job to upload FSImage

You can create a cron job to upload the FSImage to the Unravel server. The upload time depends on the size of the FSImage and the network bandwidth. You must assess this time to determine how often to run the cron job and configure it accordingly.

FSImage is processed by the Unravel ondemand process every day at 00:00 UTC. To guarantee data freshness, the latest FSImage should be uploaded a short time before 00:00 UTC. Before you schedule the upload, observe the total time taken to run the script, and set the cron job so that Unravel has access to the fresh FSImage before 00:00 UTC.
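
The timing rule above is simple arithmetic: start the job early enough that the observed run time plus a safety margin still ends before 00:00 UTC. The following is a minimal sketch; the function name cron_start_fields is hypothetical, and both inputs are values that you measure or choose yourself.

```shell
# Print the "minute hour" cron fields for a job that must finish before 00:00 UTC.
# $1 = observed run time in minutes, $2 = safety margin in minutes.
cron_start_fields() {
    lead=$(( $1 + $2 ))
    if [ "$lead" -le 0 ] || [ "$lead" -ge 1440 ]; then
        echo "lead time must be between 1 and 1439 minutes" >&2
        return 1
    fi
    start=$(( 1440 - lead ))   # start time in minutes after 00:00 UTC
    echo "$(( start % 60 )) $(( start / 60 ))"
}

# Example: a 45-minute run with a 15-minute margin starts at 23:00 UTC.
#   cron_start_fields 45 15    # prints: 0 23
# Resulting crontab entry, assuming that cron on this host runs in UTC:
#   0 23 * * * /opt/unravel/manager run ondemand fsimage fetch --upload-to-core
```

If the host clock is not set to UTC, convert the start time to the local time zone of the host before you add the entry.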

Verifying the FSImage configuration

After the FSImage has been successfully fetched, you can go to the UI to verify the configuration.

Important

The table worker daemon checks for table sizes every 24 hours by default. Therefore, even after the FSImage is processed, it can take up to 24 hours for the sizes to be reflected. To see the sizes sooner, restart the table_worker daemon.
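
In addition to the UI check, a file-level check can confirm that a recently written FSImage is present in the configured location. The following is a minimal sketch: the function name fsimage_is_fresh is hypothetical, the 24-hour default mirrors the daily processing schedule, and the -mmin test assumes GNU or BSD find.

```shell
# Hypothetical helper: succeed if the directory holds a file modified recently.
# $1 = FSImage directory, $2 = maximum age in minutes (default 1440 = 24 hours).
fsimage_is_fresh() {
    dir="$1"
    max_age_min="${2:-1440}"
    # -mmin -N matches files modified less than N minutes ago.
    recent=$(find "$dir" -type f -mmin "-$max_age_min" 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}

# Example, using the default location named on this page:
#   fsimage_is_fresh /opt/unravel/tmp/ondemand_fsimage || echo "FSImage is stale or missing"
```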

Tip

  • The relevant log file is <unravel-installation-directory>/logs/ondemand_tasks.out

  • Run one of the following commands to display the progress of the etl_fsimage task.

    egrep 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
    grep etl_fsimage\(\) unravel_ondemand.out
  • Run one of the following commands to display the progress of the run_small_files task, which is started whenever the Small Files Report is triggered from the UI.

    egrep 'SMALL_FILES_REPORT|FSIMAGE_REPORTS_UTILS' ondemand_tasks.out
    grep run_small_files\(\) ondemand_tasks.out
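
The log checks above can be wrapped so that only the most recent matching lines are shown. The following is a minimal sketch; the function name fsimage_log_progress is hypothetical, and the search pattern is the one used for the etl_fsimage task above.

```shell
# Hypothetical helper: show the last matching lines of an ondemand log.
# $1 = log file, $2 = number of lines to show (default 20).
fsimage_log_progress() {
    log_file="$1"
    max_lines="${2:-20}"
    # grep -E is the current spelling of egrep.
    grep -E 'ETL_FSIMAGE|FSIMAGE_REPORTS_UTILS' "$log_file" | tail -n "$max_lines"
}

# Example:
#   fsimage_log_progress <unravel-installation-directory>/logs/ondemand_tasks.out
```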

Disabling FSImage status

The FSImage status is enabled by default. If you want to disable FSImage, perform the following steps.

Note

In a multi-cluster environment, you must perform the following steps on the core node.

  1. Stop Unravel.

    <Unravel installation directory>/unravel/manager stop
    
  2. Change the setting.

    <Unravel installation directory>/unravel/manager config ondemand fsimage disable
    
  3. Apply the changes.

    <Unravel installation directory>/unravel/manager config apply
    
  4. Start Unravel.

    <Unravel installation directory>/unravel/manager start