Cloudera Distribution of Apache Hadoop (CDH)
CDH is Cloudera's distribution of Apache Hadoop and related process, management, storage, and security components (Spark, Hive, Pig, MapReduce, Impala, HBase, Kafka, Sentry, and so on).
This section explains how to deploy Unravel Server and Sensors on the Cloudera distribution of Apache Hadoop (CDH), with Cloudera Manager (CM).
For a quick initial installation, you can use the hdfs principal and its keytab. However, for production use you may want to create an alternate principal that has restricted access to specific areas and use its corresponding keytab. This topic explains how to do this.
You can name the alternate principal whatever you prefer; these steps name it unravel. Its name doesn't need to be the same as the local username.
The steps apply only to CDH and have been tested using Cloudera Manager with the recommended Sentry configuration.
Check the HDFS default umask.
For access via ACL, the group part of the HDFS default umask must leave read and execute access unmasked. This allows Unravel to see subdirectories and read files. The default umask setting on HDFS for both CDH and HDP is 022. The middle digit controls the group mask, and ACLs are masked using this default group mode.
You can check the HDFS umask setting from either Cloudera Manager or in hdfs-site.xml:
In Cloudera Manager, check the value of dfs.umaskmode and make sure the middle digit is 2 or 0.
In the hdfs-site.xml file, search for fs.permissions.umask-mode and make sure the middle digit is 2 or 0.
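The middle-digit check can also be scripted. The following is a minimal sketch, assuming the umask is expressed in octal (the property also accepts symbolic values, which this sketch does not handle); the function name umask_group_ok is hypothetical and not part of Unravel or Hadoop.

```shell
# Hypothetical helper: succeed when the group (middle) digit of an octal
# umask is 0 or 2, so that group read and execute are not masked out.
# Accepts three digits (022) or four digits with a leading zero (0022).
umask_group_ok() {
  case "$1" in
    [0-7][02][0-7] | 0[0-7][02][0-7]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use on a node with the hdfs client installed (assumption):
#   umask_group_ok "$(hdfs getconf -confKey fs.permissions.umask-mode)" \
#     && echo "group umask OK" || echo "group umask too restrictive"
```

The value that hdfs getconf reports comes from the client-side configuration on that node, so it is worth comparing against the value shown in Cloudera Manager.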
Enable ACL inheritance.
In Cloudera Manager's HDFS configuration, search for namenode advanced configuration snippet, and set the dfs.namenode.posix.acl.inheritance.enabled property to true in hdfs-site.xml. This is a workaround for an issue where HDFS was not compliant with the POSIX standard for ACL inheritance. For details, see Apache JIRA HDFS-6962. Cloudera backported the fix for this issue into CDH5.8.4, CDH5.9.1, and later, with dfs.namenode.posix.acl.inheritance.enabled set to false in Hadoop 2.x and to true in Hadoop 3.x.
Restart the cluster so that the change of dfs.namenode.posix.acl.inheritance.enabled to true takes effect.
Change the ACLs of the target HDFS directories.
Run the following commands as global hdfs to change the ACLs of the following HDFS directories. Run these in the order presented.
Set the ACL for future directories.
Note
Be sure to set the permissions at the /user/history level. Files are first written to an intermediate_done folder under /user/history and then moved to /user/history/done.
hadoop fs -setfacl -R -m default:user:unravel:r-x /user/spark/applicationHistory
hadoop fs -setfacl -R -m default:user:unravel:r-x /user/history
hadoop fs -setfacl -R -m default:user:unravel:r-x /tmp/logs
hadoop fs -setfacl -R -m default:user:unravel:r-x /user/hive/warehouse
If you have Spark2 installed, set the ACL of the Spark2 application history folder:
hadoop fs -setfacl -R -m default:user:unravel:r-x /user/spark/spark2ApplicationHistory
Set ACL for existing directories.
hadoop fs -setfacl -R -m user:unravel:r-x /user/spark/applicationHistory
hadoop fs -setfacl -R -m user:unravel:r-x /user/history
hadoop fs -setfacl -R -m user:unravel:r-x /tmp/logs
hadoop fs -setfacl -R -m user:unravel:r-x /user/hive/warehouse
If you have Spark2 installed, set the ACL of the Spark2 application history folder:
hadoop fs -setfacl -R -m user:unravel:r-x /user/spark/spark2ApplicationHistory
Verify the ACL of the target HDFS directories.
hdfs dfs -getfacl /user/spark/applicationHistory
hdfs dfs -getfacl /user/spark/spark2ApplicationHistory
hdfs dfs -getfacl /user/history
hdfs dfs -getfacl /tmp/logs
hdfs dfs -getfacl /user/hive/warehouse
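Reading the getfacl output by eye for several directories is error-prone, so the check can be sketched as a small filter. This is only an illustration: the function name has_user_acl is hypothetical, it assumes the usual one-entry-per-line getfacl output, and it does not account for a mask that reduces the effective permissions.

```shell
# Hypothetical helper: read `hdfs dfs -getfacl` output on stdin and succeed
# only if both the access entry and the default entry for the given user
# grant r-x.
has_user_acl() {
  acl_user="$1"
  acl_text="$(cat)"
  printf '%s\n' "$acl_text" | grep -q "^user:${acl_user}:r-x" &&
    printf '%s\n' "$acl_text" | grep -q "^default:user:${acl_user}:r-x"
}

# Example use on a node with the hdfs client installed (assumption):
#   hdfs dfs -getfacl /user/history | has_user_acl unravel \
#     && echo "/user/history OK" || echo "/user/history is missing an ACL entry"
```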
On the Unravel Server, verify HDFS permission on folders as the target user (unravel, hdfs, mapr, or custom) with a valid Kerberos ticket corresponding to the keytab principal.
sudo -u unravel kdestroy
sudo -u unravel kinit -kt keytab-file principal
sudo -u unravel hadoop fs -ls /user/history
sudo -u unravel hadoop fs -ls /tmp/logs
sudo -u unravel hadoop fs -ls /user/hive/warehouse
Find and verify the keytab:
klist -kt keytab-file
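To confirm that the keytab actually contains the principal you intend to use, the klist listing can be filtered. The sketch below assumes the listing prints the principal as the last column of each entry line, as MIT Kerberos klist -kt does; the function name keytab_has_principal is hypothetical.

```shell
# Hypothetical helper: read `klist -kt keytab-file` output on stdin and
# succeed if any entry line ends with the given principal.
keytab_has_principal() {
  awk -v p="$1" '$NF == p { found = 1 } END { exit !found }'
}

# Example use (keytab-file and principal are placeholders, as in the steps above):
#   klist -kt keytab-file | keytab_has_principal principal \
#     && echo "principal found in keytab"
```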
Warning
If you're using KMS and HDFS encryption and the hdfs principal, you might need to adjust kms-acls.xml permissions in Cloudera Manager for DECRYPT_EEK if access is denied. In particular, the "done" directory might not allow decryption of logs by the hdfs principal.
If you're using "JNI" based groups for HDFS (a setting in Cloudera Manager), you need to add this line to /usr/local/unravel/etc/unravel.ext.sh:
export LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
If Kerberos is enabled, set the new values for keytab-file and principal:
<Unravel installation directory>/manager config kerberos set --keytab /etc/security/keytabs/unravel.service.keytab --principal unravel/server@example.com
<Unravel installation directory>/manager config kerberos enable
Important
Whenever you change Kerberos tokens or the principal, restart all services:
<installation directory>/manager restart