Before I open a JIRA, I'd like to know what you think of a file-based group mapping provider. The idea is as follows.

1. Have a new user group mapping provider, such as FileBasedGroupMapping, which consumes a mapping file like the one below:

$HADOOP_CONF/groupsMapping.txt:
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
...

According to this file, the provider will resolve the group list for each user as:
user1->group1,groupX #same for user2
user3->group2,groupY #same for user4
user5->groupX
user6->groupY

Note that user1 gets group1 directly from the mapping file above; then, since group1 belongs to groupX, user1 must also belong to groupX, so groupX is also one of user1's groups.
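To make the resolution rule concrete, here is a minimal sketch in Python (not the Java provider itself). It assumes each line has the form "group:users subgroups", with users and subgroups comma-separated and a single space between the two lists, as in the example file. The function names parse_mapping and groups_for_user are hypothetical and only for illustration.

```python
from collections import defaultdict, deque

# Example mapping, copied from the file sketched above.
MAPPING = """\
group1:user1,user2
group2:user3,user4
groupX:user5 group1
groupY:user6 group2
"""

def parse_mapping(text):
    """Parse 'group:users subgroups' lines into two dicts (assumed format)."""
    users_of, subgroups_of = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        group, _, rest = line.partition(":")
        parts = rest.split()
        users_of[group] = parts[0].split(",") if parts else []
        subgroups_of[group] = parts[1].split(",") if len(parts) > 1 else []
    return users_of, subgroups_of

def groups_for_user(user, users_of, subgroups_of):
    """Return the user's direct groups plus every group that contains them."""
    # parents[g] lists the groups that directly contain group g.
    parents = defaultdict(list)
    for group, subs in subgroups_of.items():
        for sub in subs:
            parents[sub].append(group)

    result, seen = [], set()
    queue = deque(g for g, users in users_of.items() if user in users)
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # also guards against cyclic group definitions
        seen.add(group)
        result.append(group)
        queue.extend(parents[group])
    return result

users_of, subgroups_of = parse_mapping(MAPPING)
print(groups_for_user("user1", users_of, subgroups_of))  # ['group1', 'groupX']
print(groups_for_user("user5", users_of, subgroups_of))  # ['groupX']
```

A real provider would presumably do the same walk behind Hadoop's group mapping interface and cache the result; the sketch only shows the transitive lookup.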

2. So what are the benefits?

1) It opens a door to role-based access control for Hadoop. As you can see, in the mapping file we can define virtual groups (or roles) like groupX and groupY to hold users and other groups. Such virtual groups can be used just like real groups: for example, assigned to an HDFS file as its owner group, added to an MR queue-level ACL list, or, in HBase/Hive, granted privileges on databases and tables.

2) It makes it possible in HDFS to allow users from more than one group to read/write some file/folder while disallowing others. For example, if we want to allow only user1 plus the users in group1 and group2 to read/write /data/secure, we can define a virtual group in the mapping file as "secureGroup:user1 group1,group2", then chgrp the folder to "secureGroup" and chmod the folder with g+rw.

3) As noted above, this is broadly useful and not just an attempt to resolve a corner case. As you may know, Hive supports HDFS as backend storage, and also role-based access control. Using Hive, one can create a database and then grant some users/groups/roles the CREATE privilege on it. After that, a granted user (granted directly or via a granted group or role) runs a command to create a table in that database. It can pass the access control check in Hive but may still be rejected by HDFS when Hive tries to create a file for the table in the database folder on the user's behalf, simply because the user doesn't have write permission on the folder! Such issues can easily be resolved with this provider.

4) Minor but very convenient: we can use this mapping file and provider to define some users and groups for test purposes, when we don't want to involve ShellBasedGroupMapping or LdapGroupMapping.
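As a rough illustration of the /data/secure example in 2), the sketch below expands a virtual group into its effective set of users and checks who would pass a group-only permission test. It is Python for brevity, and it models only the owner-group membership part of the check, not the real HDFS permission checker. The dicts and the name effective_users are hypothetical.

```python
# Direct users and nested groups, as they would be read from
# "group1:user1,user2", "group2:user3,user4" and
# "secureGroup:user1 group1,group2".
USERS_OF = {
    "group1": ["user1", "user2"],
    "group2": ["user3", "user4"],
    "secureGroup": ["user1"],
}
SUBGROUPS_OF = {
    "group1": [],
    "group2": [],
    "secureGroup": ["group1", "group2"],
}

def effective_users(group, users_of, subgroups_of, _seen=None):
    """Return all users in the group, including users of nested groups."""
    seen = set() if _seen is None else _seen
    if group in seen:
        return set()  # guard against cyclic group definitions
    seen.add(group)
    members = set(users_of.get(group, []))
    for sub in subgroups_of.get(group, []):
        members |= effective_users(sub, users_of, subgroups_of, seen)
    return members

# After chgrp secureGroup and chmod g+rw on /data/secure, these are the
# users the group bits would apply to under the assumptions above.
allowed = effective_users("secureGroup", USERS_OF, SUBGROUPS_OF)
for user in ["user1", "user3", "user5"]:
    print(user, "rw allowed via group" if user in allowed else "not in group")
```

Under these assumptions user1 through user4 fall under the group bits of /data/secure, while user5 does not.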

>2) It makes it possible in HDFS to allow users from more than one group to read/write some file/folder while disallowing others. For example, if we want to allow only user1 plus the users in group1 and group2 to read/write /data/secure, we can define a virtual group in the mapping file as "secureGroup:user1 group1,group2", then chgrp the folder to "secureGroup" and chmod the folder with g+rw.

This feature is available on most operating systems, such as Linux, OS X, and Windows. On Linux, one can get a file's ACL with the 'getfacl' command. On Windows, there is the 'icacls.exe' command. The change may not be trivial, though.

Thanks for your feedback!

Kai
