Configuring Small Files Report and Files Report

This topic explains how to enable or disable the Small Files Report and the Files Report in the Unravel UI.

The Small Files Report and Files Report features are enabled by default.

To change this setting, edit /usr/local/unravel/etc/unravel.properties on the Unravel Server and set the following property.

| Property/Description | Set By User | Unit | Default |
|---|---|---|---|
| unravel.python.reporting.files.disable: Enables or disables Unravel's ability to generate the Small Files Report and Files Report. true: disables the functionality in the backend and UI. false: enables the functionality in the backend and UI. | | boolean | false |
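As an illustration of changing this property from the command line, the sketch below updates a key in a properties file in place. The helper name set_prop is hypothetical and not part of Unravel; it assumes the file already exists and that keys are written as key=value with no spaces around the equals sign.

```shell
#!/bin/sh
# Sketch only: set_prop is a hypothetical helper, not an Unravel tool.
# Usage: set_prop FILE KEY VALUE
set_prop() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  # Keep every line whose key (the text before the first '=') differs from KEY.
  awk -F= -v k="$key" '$1 != k' "$file" > "$tmp"
  # Append the new setting.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original file so its owner and mode are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (run on the Unravel Server) to disable both reports:
# set_prop /usr/local/unravel/etc/unravel.properties \
#   unravel.python.reporting.files.disable true
```

Rewriting the file rather than appending blindly avoids leaving two conflicting lines for the same key.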

Required Settings

On Unravel Server, edit /usr/local/unravel/etc/unravel.properties as follows:

| Property/Description | Set By User | Unit | Default |
|---|---|---|---|
| unravel.hive.server2.host: FQDN or IP address of the HiveServer2 instance. | Required | URL | - |
| unravel.hive.server2.port: Port for the HiveServer2 instance. | | number | 10000 |
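A quick way to confirm that the required host is set, and to see which port applies, is to read both keys back from the file. The following is a minimal sketch; get_prop and check_hs2_props are hypothetical helper names, and the hostname in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for inspecting unravel.properties.

# Print the value of KEY in FILE (the last occurrence wins); prints an
# empty line if the key is not set.
get_prop() {
  awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Fail if the required host property is missing; otherwise print the
# endpoint, falling back to the documented default port 10000.
check_hs2_props() {
  file=$1
  host=$(get_prop "$file" unravel.hive.server2.host)
  port=$(get_prop "$file" unravel.hive.server2.port)
  if [ -z "$host" ]; then
    echo "missing: unravel.hive.server2.host" >&2
    return 1
  fi
  echo "HiveServer2 endpoint: ${host}:${port:-10000}"
}

# Example:
# check_hs2_props /usr/local/unravel/etc/unravel.properties
```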

Security-Based Settings

On Unravel Server, edit /usr/local/unravel/etc/unravel.properties as follows:

| Property/Description | Set By User | Unit | Default |
|---|---|---|---|
| unravel.hive.server2.authentication: KERBEROS, LDAP, or CUSTOM. When set to KERBEROS, you must also set unravel.hive.server2.kerberos.service.name=hive. | | - | - |
| unravel.hive.server2.kerberos.service.name: Set only when unravel.hive.server2.authentication=KERBEROS. This must be set to hive to run the reports in a Kerberos environment. | | string | - |
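Before relying on these settings, it can help to confirm manually that the Unravel user can reach HiveServer2 with the same authentication mode. The sketch below builds a standard Hive JDBC URL for use with beeline. The function name hs2_jdbc_url, the example hostname, and the realm are assumptions for illustration, and how Unravel itself constructs its connection is not described here.

```shell
#!/bin/sh
# Sketch only: hs2_jdbc_url is a hypothetical helper for manual checks.
# Usage: hs2_jdbc_url HOST PORT AUTH [REALM]
hs2_jdbc_url() {
  host=$1
  port=$2
  auth=$3
  realm=$4
  url="jdbc:hive2://${host}:${port}/default"
  # With Kerberos, the URL carries the HiveServer2 service principal.
  # The service name "hive" matches kerberos.service.name=hive.
  if [ "$auth" = "KERBEROS" ]; then
    url="${url};principal=hive/${host}@${realm}"
  fi
  printf '%s\n' "$url"
}

# Example (Kerberos; obtain a ticket with kinit first):
# beeline -u "$(hs2_jdbc_url hs2.example.com 10000 KERBEROS EXAMPLE.COM)"
# Example (LDAP; user name and password are passed separately):
# beeline -u "$(hs2_jdbc_url hs2.example.com 10000 LDAP)" -n <user> -p <password>
```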

Settings for Sentry-Secured CDH Clusters

If your CDH cluster is secured with Sentry, Unravel's Small Files and Files reports (on the Data Insights tab) won't contain any data until you change the permissions on your cluster as follows:

  1. Grant the dfsadmin privilege to the Unravel user, or use the alternative described at the end of this step.

    To grant this privilege, set/update the HDFS configuration property dfs.cluster.administrators and restart all services affected by this change.

    [Figure: cloudera-manager-clusters-config-hdfs-admin.png]

    If you can't grant dfsadmin privilege to the Unravel user, follow the steps in the Troubleshooting section of Triggering an import of FSImage. These steps allow Unravel to use the FSImage you import manually.

  2. Remove the following line from /usr/local/unravel/ondemand/unravel-python-1.0.0/scripts/hive_queries_reporting/hive_properties.hive:

    ADD JAR {UDF_JAR_LOC}/unravel-udf-0.1.jar

    This line is no longer required because UDF JARs are stored locally on the HiveServer2 node in Sentry-secured CDH clusters.

  3. Allow the Unravel user to submit Hive queries to a YARN queue.

    If you can't allow this on your default YARN queue, you can grant this permission on a different YARN queue:

    1. Create a different YARN queue for the Unravel user.

    2. Give the Unravel user permission to submit Hive queries to the new YARN queue.

    3. In unravel.properties on Unravel Server, set unravel.python.reporting.files.hive_mr_queue to the new YARN queue. For details on this property, see Unravel Properties.

  4. Create a new Sentry role, unravel_role, using the beeline CLI as the Hive admin user (hive by default):

    create role unravel_role;
  5. Map the unravel group to unravel_role:

    grant role unravel_role to group unravel;
  6. Set HDFS access privileges for the Unravel user:

    The Unravel user needs to copy FSImage to /tmp/fsimage, so allow this access as follows:

    grant all on uri 'hdfs:///tmp/fsimage' to role unravel_role;
  7. Grant the Unravel user the following privileges on the Hive tables under the default database:

    1. Create/drop/truncate/alter Hive tables

    2. Run/select/insert queries on Hive tables

    Alternatively, you can use a different database for the Unravel user, such as unravel_db:

    1. Give the Unravel user the above permissions on that database:

      grant all on database unravel_db to role unravel_role;
    2. In unravel.properties on Unravel Server, set unravel.python.reporting.files.hive_database to the alternate database. For details on this property, see Unravel Properties.

      For example:

      unravel.python.reporting.files.hive_database=unravel_db
  8. Add a JAR and create temporary UDFs:

    1. As the Unravel user, copy the Unravel JAR, /usr/local/unravel/ondemand/unravel-python-1.0.0/jars/fsimage_reports/unravel-udf-0.1.jar, to the HiveServer2 aux JAR path.

      Note

      The HiveServer2 aux JAR path is specified either by the Hive Auxiliary JARs Directory (in the Hive service-wide configuration) or by hive.reloadable.aux.jars.path (in the HiveServer2 hive-site.xml configuration). Get your settings from Cloudera Manager. The goal is to get the JAR into the path that HiveServer2 recognizes for aux JARs. For more information, see https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_hive_udf.html#concept_t1x_srm_2r.

    2. Make sure the Hive user and group own this JAR.

      For example, if the HiveServer2 aux JAR path is /tmp/hive_jars, this directory and the copied JAR must be owned by the Hive admin user (hive by default):

      chown -R hive:hive /tmp/hive_jars
    3. Grant the Unravel user access to this JAR:

      grant all on uri 'file:///tmp/hive_jars/' to role unravel_role;
    4. In Cloudera Manager, restart HiveServer2.

  9. Use the show grant role unravel_role command to verify that permissions now look like this:

    +--------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+--+
    |                  database                  | table  | partition  | column  | principal_name  | principal_type  | privilege  | grant_option  |    grant_time     | grantor  |
    +--------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+--+
    | file:///tmp/hive_jars/unravel-udf-0.1.jar  |        |            |         | unravel_role    | ROLE            | *          | false         | 1550829915318000  | --       |
    | unravel_db                                 |        |            |         | unravel_role    | ROLE            | *          | false         | 1550829820331000  | --       |
    | hdfs:///tmp/fsimage                        |        |            |         | unravel_role    | ROLE            | *          | false         | 1550830532328000  | --       |
    +--------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+--+
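The Sentry statements from steps 4 through 9 can be collected into one script and run in a single beeline session. The sketch below only prints the statements. The function name sentry_grants_sql and the output file path are hypothetical, the defaults (unravel_db and /tmp/hive_jars) are the example values used in the steps above, and the database is assumed to exist already.

```shell
#!/bin/sh
# Sketch only: sentry_grants_sql is a hypothetical helper that prints the
# statements from the steps above; it does not contact the cluster.
# Usage: sentry_grants_sql [DATABASE] [AUX_JAR_DIR]
sentry_grants_sql() {
  db=${1:-unravel_db}
  jar_dir=${2:-/tmp/hive_jars}
  cat <<EOF
create role unravel_role;
grant role unravel_role to group unravel;
grant all on uri 'hdfs:///tmp/fsimage' to role unravel_role;
grant all on database ${db} to role unravel_role;
grant all on uri 'file://${jar_dir}/' to role unravel_role;
show grant role unravel_role;
EOF
}

# Example: write the statements to a file and run them as the Hive admin user.
# sentry_grants_sql unravel_db /tmp/hive_jars > /tmp/unravel_grants.sql
# beeline -u "<jdbc-url-of-your-HiveServer2>" -n hive -f /tmp/unravel_grants.sql
```

Generating the file first lets you review the statements before they change any privileges.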
    
    
Settings for Ranger-Secured HDP Clusters (With or Without Kerberos)

If your HDP cluster is secured with Ranger, Unravel's Small Files and Files reports (on the Data Insights tab) won't contain any data until you change the permissions on your cluster as follows:

  1. Grant dfsadmin access to the Unravel user, or use the alternative described in the Troubleshooting section of Triggering an import of FSImage.

    For example:

    [Figure: ambari-small-files-perm.png]
  2. Allow the Unravel user to connect to HiveServer2.

  3. Allow the Unravel user to CREATE, TRUNCATE, ALTER, DROP, INSERT, and SELECT Hive tables.

    For example:

    [Figure: Ranger-hive-001a.png]
  4. Allow the Unravel user to change or switch the Hive database.

    For example:

    [Figure: Ranger-hive-001.png]
  5. Allow the Unravel user to submit Hive queries to a particular YARN queue.

    For example:

    [Figure: Ranger-yarn-001.png]
  6. Allow the Unravel user to run concurrent Hive queries:

    Set hive.txn.manager=DbTxnManager (fully qualified class name org.apache.hadoop.hive.ql.lockmgr.DbTxnManager) and hive.support.concurrency=true in the Hive configuration.

  7. Allow the Unravel user to set Hive parameters dynamically and to create temporary UDFs:

    1. Add the following parameters to the configuration whitelist in the custom hive-site properties.

      hive.auto.convert.join
      hive.support.concurrency
      hive.support.sql11.reserved.keywords
      hive.txn.manager
      hive.variable.substitute
      mapred.job.queue.name
      mapreduce.map.java.opts
      mapreduce.map.memory.mb

      property name = hive.security.authorization.sqlstd.confwhitelist.append
      property value = hive\.auto\.convert\.join|hive\.support\.concurrency|hive\.support\.sql11\.reserved\.keywords|hive\.txn\.manager|hive\.variable\.substitute|mapred\.job\.queue\.name|mapreduce\.map\.java\.opts|mapreduce\.map\.memory\.mb
    2. Grant the Unravel user TempUDFAdmin privileges at the global level.
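The value of hive.security.authorization.sqlstd.confwhitelist.append is a regular expression, so each dot in a parameter name is escaped with a single backslash and the names are joined with a pipe. Building the value from the plain parameter names avoids escaping mistakes when typing it by hand. The following is a minimal sketch; the function name build_whitelist is hypothetical.

```shell
#!/bin/sh
# Sketch only: build_whitelist is a hypothetical helper that turns plain
# Hive parameter names into a whitelist regular expression.
build_whitelist() {
  out=""
  for name in "$@"; do
    # Escape each literal dot so it does not match arbitrary characters.
    escaped=$(printf '%s' "$name" | sed 's/\./\\./g')
    if [ -z "$out" ]; then
      out=$escaped
    else
      out="$out|$escaped"
    fi
  done
  printf '%s\n' "$out"
}

# Example: generate the value for the eight parameters listed above.
# build_whitelist \
#   hive.auto.convert.join hive.support.concurrency \
#   hive.support.sql11.reserved.keywords hive.txn.manager \
#   hive.variable.substitute mapred.job.queue.name \
#   mapreduce.map.java.opts mapreduce.map.memory.mb
```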

Settings for SSL-Enabled Systems

SSL-enabled systems are not currently supported.