Alluxio and Apache Ranger Best Practices
February 2, 2022

Introduction

Alluxio enables data orchestration for compute in any cloud. It unifies data silos on-premises and across cloud environments to provide the data locality, accessibility, and elasticity needed to reduce the complexities associated with orchestrating data for today’s big data and AI/ML workloads.

Alluxio is designed to help any framework access any data, from any storage at high performance regardless of the environment, which enables an organization to remain agile and competitive in adopting and experimenting with new and existing technologies.

 Figure 1. Alluxio Data Orchestration

Apache Ranger

Many organizations have expanded access to their data lake beyond their initial ETL and batch analytics users, and they need a way to centralize how they define and enforce fine-grained access permissions. Increasingly, enterprise data managers are adopting Apache Ranger to meet that need.

Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. Ranger was created to meet the following goals:

  • Provide centralized security administration to manage all security-related tasks in a central UI or using REST APIs.
  • Provide fine-grained authorization to perform a specific action and/or operation with a Hadoop component or tool, managed through a central administration tool.
  • Standardize the authorization method across all Hadoop components.
  • Enhance support for different authorization methods, such as role-based access control and attribute-based access control.
  • Centralize auditing of user access and administrative (security-related) actions within all the components of Hadoop.

Alluxio and Apache Ranger

Alluxio implements a virtual file system that provides access to heterogeneous data stores, providing a unified namespace along with metadata caching, data caching and policy-driven data management services. To make the Alluxio virtual file system secure, Alluxio provides the following:

  • User Authentication
  • User Authorization
  • Access Control Lists (ACLs)
  • Data Path Authorization
  • Client-side Hadoop Impersonation
  • Auditing
  • Encryption

Alluxio integrates with Apache Ranger using a Ranger Plugin to support the user authorization and auditing mechanisms as shown in Figure 2 below.

 Figure 2. Ranger Authorization with Alluxio

As Apache Ranger administrators define centralized access policies in Ranger, those policies are retrieved and cached locally by the Alluxio master node and are enforced by Alluxio when users make read or write requests to the Alluxio virtual file system.
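The retrieve-and-cache behavior described above can be pictured as a small time-based cache. The following Python sketch is illustrative only: `PolicyCache` and the `fetch_policies` callable are hypothetical names, not part of the Alluxio or Ranger plugin APIs, and the 30-second default only mirrors the poll interval typically configured in ranger-hdfs-security.xml (`ranger.plugin.hdfs.policy.pollIntervalMs`), which should be checked against your own files.

```python
import time
from typing import Callable, Dict, List, Optional


class PolicyCache:
    """Hypothetical sketch of a locally cached copy of Ranger policies."""

    def __init__(self, fetch_policies: Callable[[], List[Dict]],
                 poll_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_policies            # stands in for a call to Ranger Admin
        self._poll_interval_s = poll_interval_s
        self._clock = clock
        self._policies: List[Dict] = []
        self._fetched_at: Optional[float] = None

    def policies(self) -> List[Dict]:
        """Return cached policies, re-fetching them once the poll interval has elapsed."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._poll_interval_s:
            try:
                self._policies = self._fetch()
                self._fetched_at = now
            except OSError:
                # If Ranger Admin is unreachable, keep enforcing the last cached copy.
                if self._fetched_at is None:
                    raise
        return self._policies


# Usage with a fake fetch function and a controllable clock.
calls = []

def fake_fetch():
    calls.append(1)
    return [{"path": "/sensitive_data1", "groups": ["alluxio-users"]}]

now = [0.0]
cache = PolicyCache(fake_fetch, poll_interval_s=30.0, clock=lambda: now[0])
cache.policies()
cache.policies()   # served from the cache
now[0] = 31.0
cache.policies()   # interval elapsed, so the policies are fetched again
print(len(calls))  # 2
```

Serving the last good copy when Ranger Admin is unreachable reflects the idea that enforcement happens locally on the master rather than on every request to Ranger.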

Best Practices

Alluxio supports using Apache Ranger to manage and enforce access to directories and files. There are two ways to use Ranger with Alluxio:

  1. Use Ranger to directly manage access permissions on Alluxio virtual file system paths. Use this method when the Alluxio under file system (UFS) is not HDFS, when Alluxio has two or more under file systems, or when Alluxio’s unified namespace features are used and Alluxio is the main access layer. For example, Alluxio may have an HDFS UFS and an S3-compatible UFS that are mounted together using a UNION UFS.
  2. Have Alluxio enforce existing Ranger policies for an HDFS under file system. Use this method when HDFS access policies are already being managed in Ranger and HDFS is the only under file system.
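The selection rule above can be expressed as a small decision function. This is a hypothetical Python helper (`choose_ranger_option` is not an Alluxio tool) that encodes the two conditions stated in the list, with UFS types given as strings such as "hdfs" or "s3". The case the list does not cover (HDFS only, but no existing Ranger policies) defaults to Option 1 here, which is an assumption.

```python
from typing import Sequence


def choose_ranger_option(ufs_types: Sequence[str],
                         has_existing_hdfs_policies: bool) -> int:
    """Return 1 (Ranger manages Alluxio paths) or 2 (Alluxio enforces HDFS policies).

    Hypothetical helper:
      * Option 2 only when HDFS is the sole under file system AND its
        access policies are already managed in Ranger.
      * Option 1 otherwise (non-HDFS UFS, several UFSes, or a UNION UFS
        where Alluxio is the main access layer).
    """
    kinds = {t.lower() for t in ufs_types}
    only_hdfs = len(ufs_types) == 1 and kinds == {"hdfs"}
    if only_hdfs and has_existing_hdfs_policies:
        return 2
    return 1


print(choose_ranger_option(["hdfs"], has_existing_hdfs_policies=True))        # 2
print(choose_ranger_option(["hdfs", "s3"], has_existing_hdfs_policies=True))  # 1
print(choose_ranger_option(["s3"], has_existing_hdfs_policies=False))         # 1
```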

While it is possible to use Ranger to manage permissions for both Alluxio and the under file system, enabling both at the same time is not recommended, because having multiple sources of truth can be confusing.

Option 1. Ranger manages Alluxio file system permissions

With this option, the Alluxio service plugin needs to be enabled in the Ranger admin console. Because Alluxio uses the HDFS Ranger plugin type, this is done by defining a new HDFS service on the Service Manager page.

Step 1. Create the Alluxio HDFS Service

In the Ranger admin console’s Service Manager page, click on the plus sign (+) to create a new service.

 Figure 3. Create Alluxio Service in Ranger

Ranger then displays the Create Service page, where the Alluxio master node is specified as the service to be targeted. On that page, enter the details for the Alluxio service, including a unique Service Name. If multiple Alluxio environments exist (for example, one for dev, one for test, and several production environments in different data centers), use a specific name for each Alluxio service, such as alluxio-datacenter1-test. Because Alluxio uses the HDFS plugin, the Create Service page shows HDFS properties. In the Namenode URL property, enter the Alluxio master node URI (such as alluxio://alluxio-master:19998).

Figure 4. Ranger Service Properties

Setting Authorization Enabled to Yes requires that all users are authenticated, and most organizations will set the Authentication Type to Kerberos. If the Ranger Admin service is configured with SSL certificates, the Common Name for Certificate property should be set correctly (based on the CN of the SSL certificate), and the Alluxio master node should have access to those certificate files. Note that the Username and Password fields take the Ranger admin username and password, not the Alluxio admin username and password. Clicking the Create button creates the new HDFS service and shows it on the Service Manager page.
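Since Ranger also exposes its administration through REST APIs, the same service could be created programmatically. The Python sketch below only builds the JSON request body and does not send it. The config keys (`fs.default.name` for the Namenode URL, `hadoop.security.authorization`, `hadoop.security.authentication`, `commonNameForCertificate`) are assumed from the stock Ranger HDFS service definition, and the endpoint named in the docstring is an assumption; verify both against your Ranger version. The credentials in the usage line are placeholders.

```python
import json


def build_alluxio_service_payload(service_name: str, master_uri: str,
                                  ranger_admin_user: str, ranger_admin_password: str,
                                  kerberos: bool = True,
                                  cert_common_name: str = "") -> dict:
    """Request body for an HDFS-type Ranger service that targets an Alluxio master.

    Intended for POST <ranger-admin>/service/public/v2/api/service (assumed endpoint).
    """
    if not master_uri.startswith("alluxio://"):
        raise ValueError("Namenode URL should be the Alluxio master URI, "
                         "e.g. alluxio://alluxio-master:19998")
    configs = {
        # Ranger admin credentials, not Alluxio credentials.
        "username": ranger_admin_user,
        "password": ranger_admin_password,
        # Shown as "Namenode URL" on the Create Service page (assumed key).
        "fs.default.name": master_uri,
        # "Authorization Enabled" and "Authentication Type" (assumed keys).
        "hadoop.security.authorization": "true",
        "hadoop.security.authentication": "kerberos" if kerberos else "simple",
    }
    if cert_common_name:
        # Only needed when Ranger Admin is configured with SSL certificates.
        configs["commonNameForCertificate"] = cert_common_name
    return {"name": service_name, "type": "hdfs", "configs": configs}


payload = build_alluxio_service_payload(
    "alluxio-datacenter1-test", "alluxio://alluxio-master:19998",
    "admin", "change-me")  # placeholder credentials
print(json.dumps(payload, indent=2))
```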

Figure 5. New HDFS Service

Step 2. Configure Alluxio Master Nodes

Once the Alluxio Ranger HDFS service is created using the Ranger admin console, the Alluxio master nodes can be configured to use the Ranger HDFS plugin to retrieve and cache Ranger policies. First, copy the core-site.xml, hdfs-site.xml, ranger-hdfs-security.xml, ranger-hdfs-audit.xml and ranger-policymgr-ssl.xml files from the $HADOOP_CONF directory on the HDFS namenode server to the $ALLUXIO_HOME/conf directory on the Alluxio master node servers. Then modify the ranger-hdfs-security.xml file so that it names the Alluxio Ranger HDFS service defined in the Ranger admin console in Step 1 above, like this:

<property>
  <name>ranger.plugin.hdfs.service.name</name>
  <value>alluxio-datacenter1-test</value>
  <description>
    Name of the Ranger service containing policies for this Alluxio instance
  </description>
</property>
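Editing the copied file can also be scripted. The following Python sketch uses the standard library's `xml.etree.ElementTree` to set (or add) a property in a Hadoop-style configuration file; `set_hadoop_property` is a hypothetical helper, the original value `hadoopdev` in the sample is made up, and the file path in the comment is an assumption based on the /opt/alluxio/conf location used in this post.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with <property><name>name</name> set to value (added if missing)."""
    # Keep comments and processing instructions inside the root element (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True, insert_pis=True))
    root = ET.fromstring(xml_text, parser=parser)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing property with this name: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


sample = """<configuration>
  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>hadoopdev</value>
  </property>
</configuration>"""

# For a real file, read e.g. /opt/alluxio/conf/ranger-hdfs-security.xml (assumed path),
# pass its text in, and write the result back after keeping a backup.
print(set_hadoop_property(sample, "ranger.plugin.hdfs.service.name", "alluxio-datacenter1-test"))
```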

The alluxio-site.properties file on the Alluxio master nodes should be changed to enable Ranger integration, like this:

alluxio.security.authorization.plugins.enabled=true
alluxio.security.authorization.plugin.name=<plugin name>
alluxio.security.authorization.plugin.paths=/opt/alluxio/conf
alluxio.security.authorization.permission.umask=077
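The umask of 077 means that paths created through Alluxio are, by default, accessible only to their owner, so new data stays locked down until a Ranger policy grants access. The arithmetic is the usual rule (base mode AND NOT umask). The Python sketch below assumes base modes of 777 for directories and 666 for files, which matches the -rw------- file shown later in this post; the helper names are hypothetical.

```python
def apply_umask(umask: int, is_directory: bool) -> int:
    """Default permission bits for a newly created path under a given umask.

    Assumes base modes of 0o777 for directories and 0o666 for files.
    """
    base = 0o777 if is_directory else 0o666
    return base & ~umask


def to_symbolic(mode: int) -> str:
    """Render permission bits as an ls-style string, e.g. 0o700 -> 'rwx------'."""
    chars = "rwxrwxrwx"
    return "".join(c if mode & (1 << (8 - i)) else "-" for i, c in enumerate(chars))


umask = int("077", 8)  # value of alluxio.security.authorization.permission.umask
print(oct(apply_umask(umask, True)), to_symbolic(apply_umask(umask, True)))    # 0o700 rwx------
print(oct(apply_umask(umask, False)), to_symbolic(apply_umask(umask, False)))  # 0o600 rw-------
```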

The plugin name tells Alluxio to use a specific Ranger HDFS plugin, located in .jar files in the $ALLUXIO_HOME/lib directory. Several versions of Apache Ranger are supported and are implemented with these jar files:

alluxio-authorization-ranger-2.0-cdp-7.1-enterprise-2.7.0-2.4.jar
alluxio-authorization-ranger-0.5-hdp-2.4-enterprise-2.7.0-2.4.jar
alluxio-authorization-ranger-0.7-hdp-2.6-enterprise-2.7.0-2.4.jar
alluxio-authorization-ranger-1.1-hdp-3.0-enterprise-2.7.0-2.4.jar
alluxio-authorization-ranger-1.2-hdp-3.1-enterprise-2.7.0-2.4.jar
alluxio-authorization-ranger-0.6-hdp-2.5-enterprise-2.7.0-2.4.jar
alluxio-authorization-ranger-2.1-privacera-4.7-enterprise-2.7.0-2.4.jar

For example, if Privacera 4.7 is being used, then the plugin name would be specified as ranger-privacera-4.7, and if Hortonworks HDP 2.6 is being used, then the plugin name would be specified as ranger-hdp-2.6.
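The two examples suggest a naming pattern: `ranger-` followed by the distribution and its version, as they appear in the jar file name. The Python sketch below applies that inferred pattern; it is extrapolated from this post's two examples only (for instance, it would yield ranger-cdp-7.1 for the CDP jar), so confirm the exact name against the Alluxio documentation for your release. The lib path in the comment is an assumption.

```python
import re
from pathlib import Path
from typing import List, Optional

# Inferred from this post's examples; not an official Alluxio naming rule.
_JAR_PATTERN = re.compile(
    r"^alluxio-authorization-ranger-(?P<ranger>\d+(?:\.\d+)*)-"
    r"(?P<distro>[a-z]+)-(?P<distro_version>\d+(?:\.\d+)*)-enterprise-.*\.jar$"
)


def plugin_name_from_jar(jar_name: str) -> Optional[str]:
    """Map a Ranger plugin jar file name to a candidate plugin name."""
    match = _JAR_PATTERN.match(jar_name)
    if match is None:
        return None
    return f"ranger-{match['distro']}-{match['distro_version']}"


def available_plugins(lib_dir: str) -> List[str]:
    """Candidate plugin names for the Ranger jars found in lib_dir."""
    names = (plugin_name_from_jar(p.name)
             for p in Path(lib_dir).glob("alluxio-authorization-ranger-*.jar"))
    return sorted(n for n in names if n)


for jar in ["alluxio-authorization-ranger-0.7-hdp-2.6-enterprise-2.7.0-2.4.jar",
            "alluxio-authorization-ranger-2.1-privacera-4.7-enterprise-2.7.0-2.4.jar"]:
    print(jar, "->", plugin_name_from_jar(jar))

# On a master node: available_plugins("/opt/alluxio/lib")  (assumed $ALLUXIO_HOME/lib)
```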

After copying the Ranger XML files and modifying the alluxio-site.properties file, restart the Alluxio master daemons.

Step 3. Restrict Alluxio permissions on sensitive directories 

When a Ranger policy is not available for a specific path, Alluxio falls back to its own POSIX-style permissions to determine whether a user has access to a directory or file. Therefore, it is recommended that sensitive directories be made inaccessible to all users except the privileged root user, while shared directories such as /, /user and /tmp remain open. To enforce this, run the following Alluxio CLI commands:

alluxio fs chmod 777 /
alluxio fs chmod 777 /user
alluxio fs chmod 777 /tmp
alluxio fs chmod 700 /sensitive_data1
alluxio fs chmod 700 /sensitive_data2

Run the same chmod 700 … command on any other sub-directories that should be managed by Ranger policies.
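The fallback described above is ordinary owner/group/other permission checking. As a rough Python model (hypothetical helper, not Alluxio's implementation; it ignores ACLs and treats root as the superuser for simplicity), a user who is neither the owner nor in the group of a 700 directory gets no access at all.

```python
from dataclasses import dataclass
from typing import Set

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # e.g. 0o700


def posix_allows(user: str, groups: Set[str], inode: Inode, wanted: int,
                 superuser: str = "root") -> bool:
    """Return True if the mode bits grant every permission bit in `wanted`."""
    if user == superuser:
        return True  # simplification: the superuser bypasses permission checks
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7   # owner bits
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 0o7   # group bits
    else:
        bits = inode.mode & 0o7          # other bits
    return (bits & wanted) == wanted


sensitive = Inode(owner="root", group="root", mode=0o700)  # after chmod 700
tmp = Inode(owner="root", group="root", mode=0o777)        # after chmod 777
print(posix_allows("user1", {"alluxio-users"}, sensitive, READ | EXECUTE))    # False
print(posix_allows("user1", {"alluxio-users"}, tmp, READ | WRITE | EXECUTE))  # True
```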

When a terminal session is opened on one of the Alluxio nodes and a non-root user attempts to access the /sensitive_data1 directory, a permission denied message like this should be displayed:

$ id
uid=1001(user1) gid=1001(alluxio-users)
$ alluxio fs ls /sensitive_data1
Permission denied by authorization plugin: alluxio.exception.AccessControlException: Permission denied: user=user1, access=--x, path=/sensitive_data1: failed at /, inode owner=root, inode group=root, inode mode=rwx------
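When reviewing denials like the one above, it can help to pull the fields out of the message. This Python sketch parses the message format shown in this post with a regular expression; the pattern is derived only from these examples and may not match other Alluxio versions, so treat it as an assumption. The helper names are hypothetical.

```python
import re
from typing import Dict, Optional

_DENIED = re.compile(
    r"Permission denied: user=(?P<user>[^,]+), access=(?P<access>[-rwx]{3}), "
    r"path=(?P<path>[^:]+): failed at (?P<failed_at>[^,]+), "
    r"inode owner=(?P<owner>[^,]+), inode group=(?P<group>[^,]+), "
    r"inode mode=(?P<mode>[-rwx]{9})"
)


def symbolic_to_octal(mode: str) -> int:
    """Convert an ls-style mode such as 'rwx------' to an integer like 0o700."""
    value = 0
    for ch in mode:
        value = (value << 1) | int(ch != "-")
    return value


def parse_denial(message: str) -> Optional[Dict[str, str]]:
    """Extract user, access, path and inode details from a denial message."""
    match = _DENIED.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["mode_octal"] = oct(symbolic_to_octal(fields["mode"]))
    return fields


msg = ("Permission denied by authorization plugin: "
       "alluxio.exception.AccessControlException: Permission denied: user=user1, "
       "access=--x, path=/sensitive_data1: failed at /, inode owner=root, "
       "inode group=root, inode mode=rwx------")
print(parse_denial(msg))
```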

Step 4. Create Ranger Allow Policies

At this point, the data management and data security teams should review each directory path in the under file system (HDFS, S3, GCS, etc.) and determine which users or user groups should be granted access to each path.

Use the Ranger admin console to define an Allow policy by clicking on the alluxio-datacenter1-test HDFS Service link to display the list of defined policies. 

Figure 6. Alluxio HDFS Service

By default, Ranger creates several policies for the admin users, but no policies exist yet for Alluxio users. Click the Add New Policy button to display the Create Policy page.

Figure 7. List of Ranger Policies

On the Create Policy page, define an Allow policy for a specific user group on the /sensitive_data1 directory, with the recursive option enabled, and grant only Read and Execute permissions. In this example, using the group name alluxio-users applies the policy to all the users in that group.
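The same Allow policy could also be expressed as a policy definition for Ranger's REST API. This Python sketch builds the request body only; the field names (`resources.path.values`, `isRecursive`, `policyItems`, `accesses`) follow the Apache Ranger public v2 policy model but should be verified against your Ranger version, and the endpoint in the docstring and the policy name are assumptions.

```python
import json
from typing import Iterable


def build_allow_policy(service: str, policy_name: str, path: str,
                       groups: Iterable[str], accesses: Iterable[str],
                       recursive: bool = True) -> dict:
    """Request body for an HDFS-type Allow policy (assumed POST /service/public/v2/api/policy)."""
    return {
        "service": service,          # the Alluxio HDFS service created in Step 1
        "name": policy_name,         # hypothetical policy name
        "isEnabled": True,
        "resources": {
            "path": {"values": [path], "isRecursive": recursive, "isExcludes": False},
        },
        "policyItems": [{
            "groups": list(groups),
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
        }],
    }


policy = build_allow_policy(
    service="alluxio-datacenter1-test",
    policy_name="sensitive_data1-read",
    path="/sensitive_data1",
    groups=["alluxio-users"],
    accesses=["read", "execute"],  # Read and Execute only; no "write"
)
print(json.dumps(policy, indent=2))
```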

Figure 8. Create Allow Access Policy

Click the Add button to create the new policy and display the new policy in the list.

Figure 9. New Ranger Policy

Wait about a minute for the policy to be retrieved and cached by the Alluxio master node. Then open a terminal session on an Alluxio node to test the Allow policy. Run the alluxio fs ls command again; it should now successfully list the contents of the sub-directory, like this:

$ id
uid=1001(user1) gid=1001(alluxio-users)
$ alluxio fs ls /sensitive_data1/dataset1/
-rw------- root root 283 PERSISTED 02-01-2022 14:59:45:457 100% /sensitive_data1/dataset1/data-file-001
$ alluxio fs copyFromLocal my_data-file-002 /sensitive_data1/dataset1/
Permission denied by authorization plugin: alluxio.exception.AccessControlException: Permission denied: user=user1, access=--x, path=/sensitive_data1/dataset1/my_data-file-002: failed at /, inode owner=root, inode group=root, inode mode=rwx------

Notice that the Ranger policy allowed read access to the /sensitive_data1/dataset1/ directory but did not allow write access to it (the copyFromLocal command failed). This is because the policy grants only Read and Execute permissions on the /sensitive_data1 directory tree, so the write request falls back to the POSIX permissions, which deny it.
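The outcome above can be modeled as: a recursive policy matches any path under /sensitive_data1, but only the listed access types are allowed, so a write finds no allowing policy and is decided by the POSIX fallback instead. The following Python model is a simplification with hypothetical names (group-based policies only; no Deny policies, user entries, exclusions, wildcards or auditing), not the Ranger plugin's actual evaluation logic.

```python
from typing import Callable, Dict, List, Optional, Set

# Simplified in-memory policies, shaped loosely like the Create Policy page.
POLICIES: List[Dict] = [
    {"path": "/sensitive_data1", "recursive": True,
     "groups": {"alluxio-users"}, "accesses": {"read", "execute"}},
]


def path_matches(policy_path: str, path: str, recursive: bool) -> bool:
    """Exact match, or (if recursive) any descendant of policy_path."""
    base = policy_path.rstrip("/")
    target = path.rstrip("/")
    if target == base:
        return True
    return recursive and target.startswith(base + "/")


def ranger_decision(user_groups: Set[str], path: str, access: str) -> Optional[bool]:
    """True if an Allow policy grants `access`; None means no decision."""
    for policy in POLICIES:
        if (path_matches(policy["path"], path, policy["recursive"])
                and user_groups & policy["groups"]
                and access in policy["accesses"]):
            return True
    return None


def is_allowed(user_groups: Set[str], path: str, access: str,
               posix_check: Callable[[str, str], bool]) -> bool:
    decision = ranger_decision(user_groups, path, access)
    return decision if decision is not None else posix_check(path, access)


def deny_all_posix(path: str, access: str) -> bool:
    # Stand-in for root-owned chmod 700 directories: non-root users are denied.
    return False


groups = {"alluxio-users"}
print(is_allowed(groups, "/sensitive_data1/dataset1/", "read", deny_all_posix))  # True
print(is_allowed(groups, "/sensitive_data1/dataset1/my_data-file-002", "write",
                 deny_all_posix))                                                # False
```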

Later, use Ranger to add or remove user groups or individual users from the Allow and Deny policies. Alluxio periodically retrieves the updated policies, refreshes its local policy cache, and enforces the policies when users make read or write requests to the Alluxio virtual file system.

Option 2. Alluxio enforces existing Ranger policies

With this option, there is no need to enable an Alluxio service plugin in the Ranger admin console, because Alluxio can use the policies defined in the existing HDFS service. The HDFS service should already exist in the Admin console as shown in Figure 10.

 Figure 10. Existing HDFS Ranger Service

However, the Alluxio master node will need to be configured to use Ranger as an authorizer.

Step 1. Configure Alluxio Master Nodes

The Alluxio master nodes can be configured to use the Ranger HDFS plugin to retrieve and cache Ranger policies. Copy the core-site.xml, hdfs-site.xml, ranger-hdfs-security.xml, ranger-hdfs-audit.xml and ranger-policymgr-ssl.xml files from the $HADOOP_CONF directory on the HDFS namenode server to the $ALLUXIO_HOME/conf directory on the Alluxio master node servers. 

Then, the alluxio-site.properties file on the Alluxio master nodes should be changed in two ways.

First, Ranger integration should be enabled, like this:

alluxio.security.authorization.plugins.enabled=true
alluxio.security.authorization.permission.umask=077

Then, if HDFS is mounted as the root UFS, the Ranger plugin should be specified as the plugin to use for the root UFS, like this:

alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.name=<plugin name>
alluxio.master.mount.table.root.option.alluxio.underfs.security.authorization.plugin.paths=/opt/alluxio/conf
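Putting the Option 2 settings together, the master's alluxio-site.properties additions for an HDFS root UFS can be generated from a few inputs. The Python sketch below (hypothetical helpers) emits the property lines shown above; `/opt/alluxio/conf` is the example plugin path from this post, and `ranger-hdp-2.6` is just one example plugin name.

```python
from typing import Dict


def option2_properties(plugin_name: str, plugin_paths: str = "/opt/alluxio/conf",
                       umask: str = "077") -> Dict[str, str]:
    """Properties for Alluxio enforcing existing HDFS Ranger policies on the root UFS."""
    root_opt = "alluxio.master.mount.table.root.option."
    return {
        "alluxio.security.authorization.plugins.enabled": "true",
        "alluxio.security.authorization.permission.umask": umask,
        root_opt + "alluxio.underfs.security.authorization.plugin.name": plugin_name,
        root_opt + "alluxio.underfs.security.authorization.plugin.paths": plugin_paths,
    }


def render_properties(props: Dict[str, str]) -> str:
    """Render key=value lines in the format used by alluxio-site.properties."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


print(render_properties(option2_properties("ranger-hdp-2.6")), end="")
```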

If HDFS is not being mounted as the root UFS, but is being mounted using the nested mount method, then the Alluxio mount command should include the options to specify the Ranger plugin name and plugin paths, like this:

alluxio fs mount \
  --option alluxio.underfs.security.authorization.plugin.name=<plugin name> \
  --option alluxio.underfs.security.authorization.plugin.paths=/opt/alluxio/conf \
  --option alluxio.underfs.version=2.7 \
  /my_hdfs_mount \
  hdfs://<name node>:<port>/
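If the nested mount is scripted, building the command as an argument list avoids quoting mistakes with the backslash continuations. This Python sketch only constructs the argv for the `alluxio fs mount` command shown above and does not run it; the NameNode host and port in the usage line are hypothetical placeholders, and executing it with `subprocess.run(argv, check=True)` assumes the `alluxio` CLI is on the PATH.

```python
import shlex
from typing import List


def hdfs_mount_argv(alluxio_path: str, namenode: str, port: int, plugin_name: str,
                    plugin_paths: str = "/opt/alluxio/conf",
                    hdfs_version: str = "2.7") -> List[str]:
    """Argument list for mounting HDFS at a nested Alluxio path with the Ranger plugin."""
    options = {
        "alluxio.underfs.security.authorization.plugin.name": plugin_name,
        "alluxio.underfs.security.authorization.plugin.paths": plugin_paths,
        "alluxio.underfs.version": hdfs_version,
    }
    argv = ["alluxio", "fs", "mount"]
    for key, value in options.items():
        argv += ["--option", f"{key}={value}"]
    argv += [alluxio_path, f"hdfs://{namenode}:{port}/"]
    return argv


# Hypothetical NameNode host and port; replace with your own.
argv = hdfs_mount_argv("/my_hdfs_mount", "namenode.example.com", 8020, "ranger-hdp-2.6")
print(shlex.join(argv))
# To execute: subprocess.run(argv, check=True)
```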

As in Option 1, the plugin name selects one of the Ranger HDFS plugin .jar files in the $ALLUXIO_HOME/lib directory (see the list of supported versions in Option 1, Step 2). For example, use ranger-privacera-4.7 for Privacera 4.7, or ranger-hdp-2.6 for Hortonworks HDP 2.6.

After copying the Ranger XML files and modifying the alluxio-site.properties file, restart the Alluxio master daemons.

Step 2. Re-format Alluxio Masters 

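For these changes to take effect, the Alluxio master journal needs to be formatted. Because formatting the journal erases all Alluxio file system metadata (including any nested mount points, which must be mounted again afterwards), stop the Alluxio masters before running the following command and start them again when it completes: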

alluxio formatJournal

If using an embedded journal (alluxio.master.journal.type=EMBEDDED), run the command on each master node. If using a UFS journal, run the command once on any master node.
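The rule for where to run the command depends only on the journal type. A small Python sketch (hypothetical helper; the host names are placeholders) makes the difference explicit: every master for an embedded journal, a single master for a UFS journal. Other journal types are rejected here rather than guessed at.

```python
from typing import List


def format_journal_hosts(journal_type: str, masters: List[str]) -> List[str]:
    """Hosts on which to run `alluxio formatJournal`, per alluxio.master.journal.type."""
    if not masters:
        raise ValueError("at least one master host is required")
    kind = journal_type.strip().upper()
    if kind == "EMBEDDED":
        return list(masters)   # each master keeps its own copy of the journal
    if kind == "UFS":
        return [masters[0]]    # the shared UFS journal only needs formatting once
    raise ValueError(f"unsupported journal type: {journal_type!r}")


masters = ["master1", "master2", "master3"]  # hypothetical host names
for host in format_journal_hosts("EMBEDDED", masters):
    # Masters must be stopped first; formatting erases all Alluxio metadata.
    print(f"ssh {host} alluxio formatJournal")
```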

Now Alluxio should use the existing Ranger HDFS service policies to determine access permissions to HDFS UFS directories and files.

Summary

As data stewards and security teams provide broader access to their organization’s data lake environments, having a centralized way to manage fine-grained access policies becomes increasingly important. Alluxio can use Apache Ranger’s centralized access policies in two ways: 1) directly controlling access to virtual paths in the Alluxio virtual file system or 2) enforcing existing access policies for the HDFS under stores. 

To gain some hands-on experience using Alluxio with Apache Ranger, you may deploy Alluxio and Apache Ranger on your own computer using the Alluxio Ranger Best Practices sandbox at: https://github.com/gregpalmr/alluxio-ranger-sandbox. To learn more about Alluxio’s security, refer to the Alluxio documentation at: https://docs.alluxio.io/ee/user/stable/en/operation/Security.html.

