Cloudera Distribution of Apache Hadoop (CDH)

CDH is Cloudera's distribution of Apache Hadoop and related processing, management, storage, and security components (Spark, Hive, Pig, MapReduce, Impala, HBase, Kafka, Sentry, and so on).

This section explains how to deploy Unravel Server and Sensors on the Cloudera distribution of Apache Hadoop (CDH), with Cloudera Manager (CM).

For quick initial installation, you can use the hdfs principal and its keytab. However, for production use you may want to create an alternate principal that has restricted access to specific areas and use its corresponding keytab. This topic explains how to do this.

You can name the alternate principal whatever you prefer; these steps name it unravel. Its name doesn't need to be the same as the local username.

The steps apply only to CDH and have been tested using Cloudera Manager with the recommended Sentry configuration.

  1. Check the HDFS default umask.

    For ACL-based access, the group digit of the HDFS default umask must leave read and execute permission unmasked. This allows Unravel to see subdirectories and read files. The default HDFS umask for both CDH and HDP is 022. The middle digit is the group mask, and ACL entries are masked by this default group mode.

    You can check the HDFS umask setting from either Cloudera Manager or in hdfs-site.xml:

    • In Cloudera Manager, check the value of dfs.umaskmode and make sure the middle digit is 2 or 0.

    • In the hdfs-site.xml file, search for fs.permissions.umask-mode and make sure the middle digit is 2 or 0.
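The umask check above can be sketched as a small shell helper. This is a minimal illustration, not part of Unravel or Hadoop: the function name group_umask_allows_rx is hypothetical, and it assumes the umask is given in octal form (three or four digits), not in symbolic form.

```shell
# Hypothetical helper: succeeds (exit 0) when the group digit of an octal
# HDFS umask is 0 or 2, so that read and execute are not masked for the group.
# Returns 2 when the argument is not a three- or four-digit octal string.
group_umask_allows_rx() {
  case "$1" in
    [0-7][0-7][0-7])      group_digit=$(printf '%s' "$1" | cut -c2) ;;
    [0-7][0-7][0-7][0-7]) group_digit=$(printf '%s' "$1" | cut -c3) ;;
    *) return 2 ;;
  esac
  case "$group_digit" in
    0|2) return 0 ;;   # r and x stay available to the group
    *)   return 1 ;;   # r and/or x is masked out
  esac
}
```

On a host with the HDFS client configuration, one way to feed it the configured value would be group_umask_allows_rx "$(hdfs getconf -confKey fs.permissions.umask-mode)", assuming the client-side configuration matches what the cluster uses.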

  2. Enable ACL inheritance.

    In Cloudera Manager's HDFS configuration, search for the NameNode advanced configuration snippet for hdfs-site.xml and set the dfs.namenode.posix.acl.inheritance.enabled property to true. This works around an issue where HDFS did not comply with the POSIX standard for ACL inheritance. For details, see Apache JIRA HDFS-6962. Cloudera backported the fix for this issue into CDH5.8.4, CDH5.9.1, and later. The property dfs.namenode.posix.acl.inheritance.enabled is set to false in Hadoop 2.x and to true in Hadoop 3.x.

  3. Restart the cluster so that the new value of dfs.namenode.posix.acl.inheritance.enabled (true) takes effect.

  4. Change the ACLs of the target HDFS directories.

    Run the following commands as the global hdfs user to change the ACLs of the target HDFS directories. Run them in the order presented.

    1. Set the ACL for future directories.

      Note

      Be sure to set the permissions at the /user/history level. Files are first written to an intermediate_done folder under /user/history and then moved to /user/history/done.

      hadoop fs -setfacl -R -m default:user:unravel:r-x /user/spark/applicationHistory
      hadoop fs -setfacl -R -m default:user:unravel:r-x /user/history
      hadoop fs -setfacl -R -m default:user:unravel:r-x /tmp/logs
      hadoop fs -setfacl -R -m default:user:unravel:r-x /user/hive/warehouse

      If you have Spark2 installed, set the ACL of the Spark2 application history folder:

      hadoop fs -setfacl -R -m default:user:unravel:r-x /user/spark/spark2ApplicationHistory
    2. Set the ACL for existing directories.

      hadoop fs -setfacl -R -m user:unravel:r-x /user/spark/applicationHistory
      hadoop fs -setfacl -R -m user:unravel:r-x /user/history
      hadoop fs -setfacl -R -m user:unravel:r-x /tmp/logs
      hadoop fs -setfacl -R -m user:unravel:r-x /user/hive/warehouse

      If you have Spark2 installed, set the ACL of the Spark2 application history folder:

      hadoop fs -setfacl -R -m user:unravel:r-x /user/spark/spark2ApplicationHistory
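If the principal has a name other than unravel, or the directory list differs on your cluster, the commands in step 4 can be generated instead of typed by hand. The following is a sketch only: print_acl_commands is a hypothetical helper that prints the hadoop fs -setfacl commands in the same order as above (default ACLs first, then ACLs on existing content) and runs nothing itself.

```shell
# Hypothetical helper: print the setfacl commands for one user and any
# number of HDFS directories. It only prints; it does not call hadoop.
# Usage: print_acl_commands <user> <dir> [<dir> ...]
print_acl_commands() {
  acl_user="$1"
  shift
  # Default ACLs apply to directories and files created later;
  # plain user ACLs apply to what already exists.
  for prefix in "default:user" "user"; do
    for dir in "$@"; do
      printf 'hadoop fs -setfacl -R -m %s:%s:r-x %s\n' "$prefix" "$acl_user" "$dir"
    done
  done
}
```

For example, print_acl_commands unravel /user/spark/applicationHistory /user/history /tmp/logs /user/hive/warehouse prints the eight commands listed in step 4. Review the output before running it as the hdfs user.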
  5. Verify the ACL of the target HDFS directories.

    hdfs dfs -getfacl /user/spark/applicationHistory
    hdfs dfs -getfacl /user/spark/spark2ApplicationHistory
    hdfs dfs -getfacl /user/history
    hdfs dfs -getfacl /tmp/logs
    hdfs dfs -getfacl /user/hive/warehouse
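Reading the getfacl output by eye is error-prone when several directories are involved. The sketch below checks that output for the two entries set in step 4. The function name has_acl_entries is hypothetical, and the check assumes that getfacl prints one ACL entry per line in the form user:NAME:PERMS and default:user:NAME:PERMS.

```shell
# Hypothetical helper: read `hdfs dfs -getfacl <dir>` output on stdin and
# succeed only if both the access entry and the default entry grant r-x
# to the given user.
# Usage: hdfs dfs -getfacl /user/history | has_acl_entries unravel
has_acl_entries() {
  acl_user="$1"
  acl_text=$(cat)
  # Access ACL entry, for content that already exists.
  printf '%s\n' "$acl_text" | grep -q "^user:${acl_user}:r-x" || return 1
  # Default ACL entry, inherited by content created later.
  printf '%s\n' "$acl_text" | grep -q "^default:user:${acl_user}:r-x" || return 1
}
```

A failing check on a directory indicates that one of the two setfacl passes in step 4 was skipped for that path.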
  6. On the Unravel Server, verify HDFS permissions on the folders as the target user (unravel, hdfs, mapr, or custom) with a valid Kerberos ticket corresponding to the keytab principal.

    sudo -u unravel kdestroy
    sudo -u unravel kinit -kt keytab-file principal
    sudo -u unravel hadoop fs -ls /user/history
    sudo -u unravel hadoop fs -ls /tmp/logs
    sudo -u unravel hadoop fs -ls /user/hive/warehouse
    
  7. Find and verify the keytab:

    klist -kt keytab-file
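To confirm that the keytab actually contains the principal you plan to use, the klist output can be filtered. This is a minimal sketch: keytab_has_principal is a hypothetical helper, and it assumes that klist -kt prints the principal as the last whitespace-separated field of each entry line.

```shell
# Hypothetical helper: read `klist -kt <keytab-file>` output on stdin and
# succeed if any entry line ends with exactly the given principal.
# Usage: klist -kt keytab-file | keytab_has_principal principal
keytab_has_principal() {
  awk -v p="$1" '$NF == p { found = 1 } END { exit found ? 0 : 1 }'
}
```

The comparison is an exact, case-sensitive match, so the realm part must be written as it appears in the keytab.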

    Warning

    If you're using KMS and HDFS encryption and the hdfs principal, you might need to adjust kms-acls.xml permissions in Cloudera Manager for DECRYPT_EEK if access is denied. In particular, the "done" directory might not allow decryption of logs by the hdfs principal.

    If you're using JNI-based groups for HDFS (a setting in Cloudera Manager), you need to add this line to /usr/local/unravel/etc/unravel.ext.sh:

    export LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
  8. If Kerberos is enabled, set the new values for keytab-file and principal:

    <Unravel installation directory>/manager config kerberos set --keytab /etc/security/keytabs/unravel.service.keytab --principal unravel/server@example.com
    
    <Unravel installation directory>/manager config kerberos enable
    
    

    Important

    Whenever you change the Kerberos tokens or principal, restart all services by running <Unravel installation directory>/manager restart.