Configuring Small Files Report and Files Report
This topic explains how to enable or disable the Small Files Report and Files Report in the Unravel UI.
The Small Files Report and Files Report features are enabled by default.
To toggle this setting, on the Unravel Server, edit /usr/local/unravel/etc/unravel.properties and set the following property.
| Property/Description | Set By User | Unit | Default |
|---|---|---|---|
| unravel.python.reporting.files.disable: Enables or disables Unravel's ability to generate the Small Files and Files Reports. true: disables the functionality in the backend and UI. false: enables the functionality in the backend and UI to generate the Small Files/Files Reports. |  | boolean | false |
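For example, to disable both reports, set:

unravel.python.reporting.files.disable=true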
Required Settings
On the Unravel Server, edit /usr/local/unravel/etc/unravel.properties as follows:
| Property/Description | Set By User | Unit | Default |
|---|---|---|---|
| unravel.hive.server2.host: FQDN or IP address of the HiveServer2 instance. | Required | URL | - |
| unravel.hive.server2.port: Port for the HiveServer2 instance. |  | number | 10000 |
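For example (hiveserver2.example.com is a placeholder for your HiveServer2 host):

unravel.hive.server2.host=hiveserver2.example.com
unravel.hive.server2.port=10000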
If HiveServer2 authentication is enabled, also set the following in /usr/local/unravel/etc/unravel.properties on the Unravel Server:
| Property/Description | Set By User | Unit | Default |
|---|---|---|---|
| unravel.hive.server2.authentication: KERBEROS, LDAP, or CUSTOM. When set to KERBEROS, you must also set unravel.hive.server2.kerberos.service.name=hive. | - | - |  |
| unravel.hive.server2.kerberos.service.name: Set only when unravel.hive.server2.authentication=KERBEROS. This must be set to hive. |  | string | - |
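For example, for a Kerberos-secured HiveServer2:

unravel.hive.server2.authentication=KERBEROS
unravel.hive.server2.kerberos.service.name=hive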
Settings for Sentry-Secured CDH Clusters
If your CDH cluster is secured with Sentry, Unravel's Small Files and Files reports (on the Data Insights tab) won't contain any data until you change the permissions on your cluster as follows:
1. Grant the dfsadmin privilege (or the alternative below) to the Unravel user.

   To grant this privilege, set or update the HDFS configuration property dfs.cluster.administrators and restart all services affected by this change.

   If you can't grant the dfsadmin privilege to the Unravel user, follow the steps in the Troubleshooting section of Triggering an import of FSImage. These steps allow Unravel to use the FSImage you import manually.
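   For example (a sketch; hdfs,unravel is an assumed value that presumes the Unravel daemon runs as the unravel user), the hdfs-site.xml entry could look like this:

   <property>
     <!-- Hypothetical example: append the unravel user to any existing administrator ACL -->
     <name>dfs.cluster.administrators</name>
     <value>hdfs,unravel</value>
   </property>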
2. Remove the following line from /usr/local/unravel/ondemand/unravel-python-1.0.0/scripts/hive_queries_reporting/hive_properties.hive:

   ADD JAR {UDF_JAR_LOC}/unravel-udf-0.1.jar

   This line is no longer required because the UDF JARs are stored locally on the HiveServer2 node in Sentry-secured CDH clusters.
3. Allow the Unravel user to submit Hive queries to a YARN queue.

   If you can't allow this on your default YARN queue, you can grant this permission on a different YARN queue:

   a. Create a different YARN queue for the Unravel user.
   b. Give the Unravel user permission to submit Hive queries to the new YARN queue.
   c. In unravel.properties on the Unravel Server, set unravel.python.reporting.files.hive_mr_queue to the new YARN queue. For details on this property, see Unravel Properties.
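      For example, assuming the new queue is named unravel_queue (a hypothetical name):

      unravel.python.reporting.files.hive_mr_queue=unravel_queue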
4. Create a new Sentry role, unravel_role, using the beeline CLI as the Hive admin user (hive by default):

   create role unravel_role;
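   For example, on a Kerberized cluster you might connect with beeline like this before running the statement (hiveserver2.example.com and EXAMPLE.COM are placeholders for your HiveServer2 host and Kerberos realm):

   beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"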
5. Map the unravel group to unravel_role:

   grant role unravel_role to group unravel;
6. Set HDFS access privileges for the Unravel user.

   The Unravel user needs to copy FSImage to /tmp/fsimage, so allow this access as follows:

   grant all on uri 'hdfs:///tmp/fsimage' to role unravel_role;
7. Grant the Unravel user the following privileges on the Hive tables under the default database:

   - Create/drop/truncate/alter Hive tables
   - Run/select/insert queries on Hive tables
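   For example (a sketch; the broad ALL grant below covers these privileges, but you can use narrower grants if your Sentry policies require it):

   grant all on database default to role unravel_role;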
   Alternatively, you can:

   a. Use a different database for the Unravel user, such as unravel_db.
   b. Give the Unravel user the above permissions on that database:

      grant all on database unravel_db to role unravel_role;

   c. In unravel.properties on the Unravel Server, set unravel.python.reporting.files.hive_database to the alternate database. For details on this property, see Unravel Properties. For example:

      unravel.python.reporting.files.hive_database=unravel_db
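      If the alternate database does not exist yet, create it first as the Hive admin user (an illustrative statement; use your own database name):

      create database if not exists unravel_db;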
8. Add a JAR and create temporary UDFs:

   a. As the Unravel user, copy the Unravel JAR, /usr/local/unravel/ondemand/unravel-python-1.0.0/jars/fsimage_reports/unravel-udf-0.1.jar, to the HiveServer2 aux JAR path.

      Note: The HiveServer2 aux JAR path is specified by the Hive Auxiliary JARs Directory (a Hive service-wide configuration) or by hive.reloadable.aux.jars.path (in the HiveServer2 hive-site.xml configuration). Get your settings from Cloudera Manager. The goal is to get the JAR into the path HiveServer2 recognizes for aux JARs. For more information, see https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_hive_udf.html#concept_t1x_srm_2r. The Hive user and group should own this JAR.
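      Assuming the aux JAR path is /tmp/hive_jars (the example path used in the next step), the copy could look like this:

      cp /usr/local/unravel/ondemand/unravel-python-1.0.0/jars/fsimage_reports/unravel-udf-0.1.jar /tmp/hive_jars/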
      For example, if the HiveServer2 aux JAR path is /tmp/hive_jars, this directory and the copied JAR must be owned by the Hive admin user (hive by default):

      chown -R hive:hive /tmp/hive_jars
   b. Grant the Unravel user access to this JAR:

      grant all on uri 'file:///tmp/hive_jars/' to role unravel_role;
9. In Cloudera Manager, restart HiveServer2.
10. Use the show grant role unravel_role command to verify that the permissions now look like this:

+---------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+
| database                                    | table  | partition  | column  | principal_name  | principal_type  | privilege  | grant_option  | grant_time        | grantor  |
+---------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+
| file:///tmp/hive_jars/unravel-udf-0.1.jar   |        |            |         | unravel_role    | ROLE            | *          | false         | 1550829915318000  | --       |
| unravel_db                                  |        |            |         | unravel_role    | ROLE            | *          | false         | 1550829820331000  | --       |
| hdfs:///tmp/fsimage                         |        |            |         | unravel_role    | ROLE            | *          | false         | 1550830532328000  | --       |
+---------------------------------------------+--------+------------+---------+-----------------+-----------------+------------+---------------+-------------------+----------+
Settings for Ranger-Secured HDP Clusters (With or Without Kerberos)
If your HDP cluster is secured with Ranger, Unravel's Small Files and Files reports (on the Data Insights tab) won't contain any data until you change the permissions on your cluster as follows:
- Grant dfsadmin access or its alternative to the Unravel user (for example, via the dfs.cluster.administrators property described in the Sentry section above).
- Allow the Unravel user to connect to HiveServer2.
- Allow the Unravel user to CREATE, TRUNCATE, ALTER, DROP, INSERT, and SELECT Hive tables (for example, through a Ranger Hive policy that grants these permissions to the Unravel user).
- Allow the Unravel user to change or switch the Hive database.
- Allow the Unravel user to submit Hive queries to a particular YARN queue.
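  For example (a sketch using the Capacity Scheduler; unravel_queue is a hypothetical queue name and your scheduler settings may differ), you could allow the unravel user to submit to a dedicated queue:

  yarn.scheduler.capacity.root.unravel_queue.acl_submit_applications=unravel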
- Allow the Unravel user to use concurrent Hive queries: set hive.txn.manager=DbTxnManager and hive.support.concurrency=true in the Hive configuration.
- Allow the Unravel user to set the following parameters dynamically. To do this, add them to the hive.security.authorization.sqlstd.confwhitelist.append property in the custom hive-site properties:

  hive.auto.convert.join
  hive.support.concurrency
  hive.support.sql11.reserved.keywords
  hive.txn.manager
  hive.variable.substitute
  mapred.job.queue.name
  mapreduce.map.java.opts
  mapreduce.map.memory.mb

  Property name: hive.security.authorization.sqlstd.confwhitelist.append
  Property value: hive\.auto\.convert\.join|hive\.support\.concurrency|hive\.support\.sql11\.reserved\.keywords|hive\.txn\.manager|hive\.variable\.substitute|mapred\.job\.queue\.name|mapreduce\.map\.java\.opts|mapreduce\.map\.memory\.mb
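  For example, the property name and value above map to the following hive-site.xml entry (a sketch; in Ambari you would typically add it under the custom hive-site section):

  <property>
    <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
    <value>hive\.auto\.convert\.join|hive\.support\.concurrency|hive\.support\.sql11\.reserved\.keywords|hive\.txn\.manager|hive\.variable\.substitute|mapred\.job\.queue\.name|mapreduce\.map\.java\.opts|mapreduce\.map\.memory\.mb</value>
  </property>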
- Allow the Unravel user TempUDFAdmin privileges at the global level.
Settings for SSL-Enabled Systems
This configuration is currently not supported.