Permissions

The initial idea for authorization in Howl is to use the hdfs permissions to authorize metadata operations. To do this, we would like to extend createTable() so that it can record a group other than the user's primary group and record the complete unix permissions on the table directory. We would also like partition directories to inherit their permissions and group from the table directory. To keep the metastore backward compatible for use with hive, the following conf variables can be used to achieve these objectives:

Conf properties are preferable to API changes because the complete authorization design for hive is not finalized yet. These properties can be deprecated or removed once that design is in place. They would also be useful to some installations of vanilla hive, since at least DFS-level authorization can then be achieved by hive without the user having to manually perform chgrp and chmod operations on DFS.

Reading data (Select)/Writing data (Insert)

This will simply be governed by the dfs permissions at the time of the read or write, and will result in runtime errors if the user does not have permission.

Create table

Internal/External table without location specified

If the user has permissions on the directory pointed to by hive.metastore.warehouse.dir, then he can create the table.

Internal/External table with location specified

If the user has permissions to the location specified then he can create the table.
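The two create-table cases above differ only in which directory gets checked. Below is a minimal Python sketch. It assumes that "has permissions" means write permission, and it models the directories the user may write to as a plain set of paths. Both are simplifications, not part of this proposal. `directory_to_authorize` and `can_create_table` are hypothetical names. A real check would also have to decide what to do when the given location does not exist yet, which this sketch ignores.

```python
from typing import Optional, Set


def directory_to_authorize(location: Optional[str], warehouse_dir: str) -> str:
    # Without LOCATION the table is created under hive.metastore.warehouse.dir,
    # so that directory is checked; otherwise the given location is checked.
    return location if location else warehouse_dir


def can_create_table(location: Optional[str], warehouse_dir: str,
                     writable_dirs: Set[str]) -> bool:
    # Assumption: "has permissions" is modeled as write permission, and the
    # directories the user may write to are given as a plain set of paths.
    return directory_to_authorize(location, warehouse_dir) in writable_dirs


warehouse = "/user/hive/warehouse"  # illustrative value only
print(can_create_table(None, warehouse, {warehouse}))            # True
print(can_create_table("/data/clicks", warehouse, {warehouse}))  # False
```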

Drop Table

A user can drop a table (internal or external) only if he has write permission on the table directory. A user can have write permission either as the owner of the table or through a group he belongs to. So if the permissions on the table directory allow him to write to it, he can drop the table.
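The owner-or-group reasoning above follows the usual POSIX rule: exactly one class (owner, group or other) applies to a given user. Here is a minimal Python sketch of that rule together with the proposed drop check. `DirStatus`, `has_permission` and `can_drop_table` are hypothetical names. In practice the directory's owner, group and mode would be read from HDFS rather than passed in by hand.

```python
from dataclasses import dataclass
from typing import FrozenSet

READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class DirStatus:
    """Hypothetical stand-in for a directory's owner, group and mode."""
    owner: str
    group: str
    mode: int  # e.g. 0o770


def has_permission(user: str, groups: FrozenSet[str],
                   status: DirStatus, bit: int) -> bool:
    # Exactly one class applies: owner if the user owns the directory,
    # else group if the user is in the directory's group, else other.
    if user == status.owner:
        triplet = (status.mode >> 6) & 0o7
    elif status.group in groups:
        triplet = (status.mode >> 3) & 0o7
    else:
        triplet = status.mode & 0o7
    return bool(triplet & bit)


def can_drop_table(user: str, groups: FrozenSet[str], table_dir: DirStatus) -> bool:
    # Proposed rule: dropping an internal or external table requires
    # write permission on the table directory.
    return has_permission(user, groups, table_dir, WRITE)


table_dir = DirStatus(owner="alice", group="analysts", mode=0o770)
print(can_drop_table("bob", frozenset({"analysts"}), table_dir))     # True
print(can_drop_table("carol", frozenset({"marketing"}), table_dir))  # False
```

Bob can drop the table through group membership even though he is not the owner. Carol falls into the "other" class, which has no bits set.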

Partition permissions

Partition directories will inherit the permissions and group of the table directory.
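One way to read "inherit" in practice: after a partition directory is created, compare its group and mode with the table directory's, then compute the chgrp/chmod steps needed to make them match. The sketch below does this in Python. `DirAttrs` and `inheritance_fixups` are hypothetical names, and the example groups and modes are illustrative. A Java implementation would presumably apply the resulting steps through Hadoop's `FileSystem.setOwner` and `FileSystem.setPermission`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DirAttrs:
    """Hypothetical record of the attributes a partition inherits."""
    group: str
    mode: int


def inheritance_fixups(table_dir: DirAttrs,
                       new_partition_dir: DirAttrs) -> List[Tuple[str, str]]:
    # Compare a freshly created partition directory with the table
    # directory and list the chgrp/chmod steps that would make them match.
    steps: List[Tuple[str, str]] = []
    if new_partition_dir.group != table_dir.group:
        steps.append(("chgrp", table_dir.group))
    if new_partition_dir.mode != table_dir.mode:
        steps.append(("chmod", format(table_dir.mode, "o")))
    return steps


table = DirAttrs(group="analysts", mode=0o750)
created = DirAttrs(group="users", mode=0o755)
print(inheritance_fixups(table, created))
# [('chgrp', 'analysts'), ('chmod', '750')]
```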

Alter table

A user can "alter" a table if he has write permission on the table directory. So any of the following alter table commands are allowed only if the user has write permission on the table directory:

Show tables

Since the top-level warehouse dir will have read/write permissions for everyone, show tables will show all tables (of the default database) to all users.

Show Table/Partitions Extended

A user can issue "show table/partitions extended" on a table only if he has read permissions on the table directory. This query is of the form:

Show partitions

A user can issue "show partitions" on a table only if he has read permissions on the table directory.

Describe table/column/partition

A user can issue "describe table/column/partition" on a table only if he has read permissions on the table directory.
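The Drop, Alter, Show and Describe rules above reduce to a lookup from operation to the bit required on the table directory. The Python sketch below assumes the caller has already resolved which rwx triplet (owner, group or other) applies to the user. The operation labels are illustrative strings, not hive's internal operation names, and `authorize` is a hypothetical helper.

```python
# Bit required on the table directory for each metadata operation,
# following the rules in the sections above. Labels are illustrative.
REQUIRED = {
    "DROP TABLE": "w",
    "ALTER TABLE": "w",
    "SHOW TABLE EXTENDED": "r",
    "SHOW PARTITIONS": "r",
    "DESCRIBE": "r",
}


def authorize(operation: str, effective_perms: str) -> bool:
    """effective_perms is the rwx triplet that applies to the user, e.g. "r-x"."""
    needed = REQUIRED.get(operation)
    if needed is None:
        raise ValueError(f"no rule for operation {operation!r}")
    return needed in effective_perms


print(authorize("DESCRIBE", "r-x"))     # True
print(authorize("ALTER TABLE", "r-x"))  # False
```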

create db

Just like create table, create db will have db.group.name and db.permissions properties, which will dictate the group and permissions of the db directory. These will be set up by the howl cli, and the database directory will need to be updated with the appropriate chgrp and chmod operations. There will be NO inheritance of permissions from the db directory to the table directory; the table directory can have a different group and permissions from the db directory. The user can provide the group the table should belong to in the create table statement. If he chooses not to, the group of the table will be the same as the database group.
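The group defaulting described above is small enough to state as code. The sketch below uses a hypothetical `resolve_table_group` helper. Only the group falls back to the database's; permissions are never taken from the db directory.

```python
from typing import Optional


def resolve_table_group(requested_group: Optional[str], db_group: str) -> str:
    # The group named in the create table statement wins; if none was given,
    # the table directory takes the database directory's group. Permissions
    # are not inherited from the database directory, only this default group.
    return requested_group if requested_group else db_group


print(resolve_table_group(None, "analysts"))       # analysts
print(resolve_table_group("finance", "analysts"))  # finance
```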

use db

use db will be permitted only if the user has read permission on the db directory. Once the "use db" call has been authorized, subsequent operations such as create table will still be authorized according to the rules laid out above. So the user would need write permission on the db directory to be able to create the table directory under it.

If db.tablename syntax is supported (I believe it may not be supported in the initial commit), then create db.tablename will need to check that the user has write permission on db directory.
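Taken together, the database rules above mean a user may be able to "use" a database without being able to create tables in it. The Python sketch below again assumes the user's effective rwx triplet on the db directory is already known. The function names are hypothetical.

```python
def can_use_db(db_perms: str) -> bool:
    # "use db" needs read permission on the db directory.
    return "r" in db_perms


def can_create_table_in_db(db_perms: str) -> bool:
    # Creating a table directory under the db directory, whether after
    # "use db" or via db.tablename, needs write permission on it.
    return "w" in db_perms


perms = "r-x"  # the user's effective triplet on the db directory
print(can_use_db(perms), can_create_table_in_db(perms))  # True False
```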

Comments

This model is built around using group permissions to share table access. A few implementation notes concerning this follow.

  1. Conceptually, it is possible for someone to create a table in directory /x/y/z where they own z but not x or y. If the owner of x or y then changes their permissions so that the table creator (or others in the group the table is owned by) cannot access directory z, then the table creator (or others in the group) will not be able to access the table. For tables under the default directory, this will not be an issue. For tables outside this directory, it will be the responsibility of the user not to put their files in a place where this is likely to happen (note that this is true regardless of the authorization model we adopt).

  2. What permissions will the top-level warehouse directory have? We could choose to give it 777, so that anyone can create tables there. We could also define a howl group and declare that all users who wish to use Howl must be in this group. The top-level directory could then be 770, locking out any users who were not authorized to use Howl. Given that we expect all or almost all data in many installations to be managed by Howl, it is not clear how useful this would be.

  3. There is a security hole in who can drop tables. Since the ability to delete a file is based on write permission on the directory that contains it, not write permission on the file itself, anyone with permission to create tables can drop a table after it is created but before it has any files. (This is no longer a problem once files or directories have been created in the table directory, because removing those requires write permission on the table directory.) For the default warehouse, this means everyone has permission to drop empty tables. For other databases, it means everyone in the database group has this permission. One way around this would be to create an empty file, _empty, under the table directory.

  4. The above vulnerability also applies to database directories and could be resolved in the same way.

  5. In the POSIX permissions model, the narrowest applicable category is used to test whether a specific user has access to a file. Thus if the owner of a file does not have permission to read it, but he is in a group that does, he still will not be able to read the file. To avoid strange side effects of this, I propose that it be an error if the owner has less permission than the group. So rwxr-x--- would be accepted, but r-xrwx--- would return an error.

  6. Execute bits give a user permission to open a directory, which is required to open files in it for either reading or writing. Thus, for all user classes, it should be an error for the read or write bit to be set without the execute bit. So r-x, -wx and rwx would all be valid combinations, but r--, -w- and rw- would generate errors.

  7. Currently, there is one outstanding issue: the warehouse directory needs to have 777 permissions because of an HDFS bug:

https://issues.apache.org/jira/browse/HADOOP-7064

Until that is fixed, the workaround is to have a staging dir. It works as follows:

Implementation Notes

http://wiki.apache.org/pig/Howl/AuthorizationImplNotes
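As a rough illustration of comments 5 and 6 above, the mode checks could be implemented along these lines. This is a sketch only, and `validate_mode` is a hypothetical name. It reads "the owner has less permission than the group" as "the group has some bit the owner lacks", which matches the rwxr-x--- / r-xrwx--- example.

```python
def validate_mode(mode: int) -> None:
    """Raise ValueError if a directory mode breaks the rules in comments 5 and 6."""
    owner, group, other = (mode >> 6) & 0o7, (mode >> 3) & 0o7, mode & 0o7
    # Comment 5: the owner must not have less permission than the group,
    # read here as "the group has no bit that the owner lacks".
    if group & ~owner:
        raise ValueError(f"{mode:o}: owner has less permission than group")
    # Comment 6: in every class, read or write without execute is an error.
    for name, bits in (("owner", owner), ("group", group), ("other", other)):
        if bits & 0o6 and not bits & 0o1:
            raise ValueError(f"{mode:o}: {name} has read/write without execute")


for m in (0o750, 0o570, 0o640):
    try:
        validate_mode(m)
        print(f"{m:o} ok")
    except ValueError as e:
        print(e)
# 750 ok
# 570: owner has less permission than group
# 640: owner has read/write without execute
```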

Howl/HowlAuthorizationProposal (last edited 2011-01-26 18:31:13 by AshutoshChauhan)