This page outlines the design of Howl Security.

Jira for authorization support in Hive

Authorization

The initial plan is for Howl to implement authorization based on HDFS directory permissions. This may be enhanced or replaced by a role-based model in a later release.

Permissions

The initial idea for authorization in Howl is to use the HDFS permissions to authorize metadata operations. To be able to do this, we would like to extend createTable() to add the ability to record a different group from the user's primary group and to record the complete Unix permissions on the table directory. Also, we would like to have a way for partition directories to inherit permissions and group information based on the table directory. To keep the metastore backward compatible for use with Hive, I propose having conf variables to achieve these objectives:

Conf properties are preferable to API changes since the complete authorization design for Hive is not finalized yet. These properties can be deprecated or removed when that is in place. These properties would also be useful to some installations of vanilla Hive, since at least DFS-level authorization can then be achieved by Hive without the user having to manually perform chgrp and chmod operations on DFS.

Reading data (Select)/Writing data (Insert)

This will simply be governed by the DFS permissions at the time of the read or write, and will result in runtime errors if the user does not have the required permissions.

Create table

Internal/External table without location specified

If the user has permissions to the directory pointed to by hive.metastore.warehouse.dir, then he can create the table.

Internal/External table with location specified

If the user has permissions to the location specified then he can create the table.

Drop Table

A user can drop a table (internal or external) only if he has write permission on the table directory. A user can have write permission either by being the owner of the table directory or through a group he belongs to. So if the permissions on the table directory allow him to write to it, he can drop the table.
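The drop-table rule can be illustrated with a minimal sketch of a Unix-style permission check. It assumes the usual owner/group/other evaluation order and uses a hypothetical DirStatus record in place of the real directory status returned by DFS; none of these names come from Howl or Hive.

```python
from dataclasses import dataclass

WRITE = 0o2  # write bit within one rwx triplet


@dataclass(frozen=True)
class DirStatus:
    """Hypothetical stand-in for the owner, group and mode of a table directory."""
    owner: str
    group: str
    mode: int  # Unix permission bits, e.g. 0o750


def has_access(status, user, user_groups, wanted):
    """Pick the owner, group or other triplet for this user and test the wanted bit."""
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in user_groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted


def can_drop_table(status, user, user_groups):
    """Drop is allowed only with write permission on the table directory."""
    return has_access(status, user, user_groups, WRITE)
```

With mode 0o750 the owner can drop the table but a group member cannot; changing the mode to 0o770 extends that ability to the group.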

Partition permissions

Partition directories will inherit the permissions/(owner,group) of the table directory.

Alter table

A user can alter a table only if he has write permission on the table directory. So any of the following alter table commands are allowed only if the user has write permission on the table directory:

Show tables

Since the top level warehouse dir will have read/write permissions for everyone, show tables will show all tables to all users.

Show Table/Partitions Extended

A user can issue "show table/partitions extended" on a table only if he has read permissions on the table directory. This query is of the form:

Show partitions

A user can issue show partitions on a table only if he has read permissions on the table directory.

Describe table/column/partition

A user can issue describe table/column/partition on a table only if he has read permissions on the table directory.

Create db

Just like create table, create db will have db.group.name and db.permissions properties which will dictate the group and permissions of the db directory. These will be set up by the Howl CLI, and the database directory will need to be updated with the appropriate chgrp and chmod operations. There will be no inheritance of permissions from the db directory to the table directory. The table directory can potentially have a different group and permissions from the db directory.

Use db

use db will be permitted only if the user has read permission on the db directory. Once the use db call has been authorized, subsequent operations such as create table will still be authorized based on the rules laid out above. So the user would need write permission on the db directory to be able to create the table directory under it.

If db.tablename syntax is supported (I believe it may not be supported in the initial commit), then create db.tablename will need to check that the user has write permission on db directory.
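The rules in this section can be summarized as a lookup from operation to the directory that is checked and the permission bit that is required. The following is only a sketch: the REQUIRED table, the authorize() function and the (owner, group, mode) tuples are hypothetical names used for illustration, and create table is shown for the case where the table directory is created under a db directory.

```python
READ, WRITE = 0o4, 0o2  # bits within one rwx triplet

# operation -> (which directory is checked, permission bit required)
REQUIRED = {
    "drop table": ("table", WRITE),
    "alter table": ("table", WRITE),
    "show table extended": ("table", READ),
    "show partitions": ("table", READ),
    "describe": ("table", READ),
    "use db": ("db", READ),
    "create table": ("db", WRITE),
}


def allowed(mode, owner, group, user, user_groups, wanted):
    """Unix-style check: owner triplet, else group triplet, else other triplet."""
    if user == owner:
        shift = 6
    elif group in user_groups:
        shift = 3
    else:
        shift = 0
    return ((mode >> shift) & wanted) == wanted


def authorize(op, dirs, user, user_groups):
    """dirs maps "db" and "table" to hypothetical (owner, group, mode) tuples."""
    kind, wanted = REQUIRED[op]
    owner, group, mode = dirs[kind]
    return allowed(mode, owner, group, user, user_groups, wanted)
```

For example, a group member facing a db directory with mode 0o750 passes the use db check but is refused create table, because the group triplet grants read but not write.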

Implementation Details

Howl-specific semantic analyzers

To implement a CLI, Howl will have Howl-specific semantic analyzers in place. The checks outlined above will be made in these Howl-specific semantic analyzers to implement authorization.

Howl CLI

The Howl CLI program will take --group and --perms command-line options which will only apply to create table DDL queries. The value for --group will indicate the name of the Unix group for the table directory. The value for --perms will be of the form rwxrwxrwx, which will indicate the Unix permissions on the table directory. The CLI program will have to partially parse the user-supplied query to look for create table .* and set these values in the HiveConf (for use by the createTable() metastore API). If these options are not supplied, the CLI program should check what the umask is and warn the user that the table will be created with permissions dictated by the umask, and that if this is not intended, the user should drop and re-create the table with the --group and --perms options. Similarly, it should warn when the perms are too permissive, such as rwx for others.
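As a rough illustration of the CLI behaviour described above, the sketch below converts an rwxrwxrwx string into permission bits, detects create table queries with a regular expression, and collects the two kinds of warning. The function names, the warning texts and the choice to treat any write access for others as "too permissive" are assumptions made for this example, not part of the Howl design.

```python
import re

# Partial parse: only checks whether the query starts with "create [external] table".
CREATE_TABLE = re.compile(r"^\s*create\s+(external\s+)?table\b", re.IGNORECASE)
PERMS_FORM = re.compile(r"([r-][w-][x-]){3}")


def parse_perms(perms):
    """Convert a string such as 'rwxr-x---' into Unix permission bits (0o750)."""
    if not PERMS_FORM.fullmatch(perms):
        raise ValueError("perms must look like rwxrwxrwx, got %r" % perms)
    mode = 0
    for ch in perms:
        mode = (mode << 1) | (0 if ch == "-" else 1)
    return mode


def check_options(query, group, perms, umask):
    """Return the warnings the CLI would print for this query (hypothetical wording)."""
    warnings = []
    if not CREATE_TABLE.match(query):
        return warnings  # --group and --perms only apply to create table
    if group is None or perms is None:
        warnings.append(
            "table will be created with permissions dictated by umask %03o; "
            "drop and re-create it with --group and --perms if that is not intended"
            % umask
        )
    if perms is not None and parse_perms(perms) & 0o2:
        # Assumption: write access for others counts as too permissive.
        warnings.append("permissions %s give write access to others" % perms)
    return warnings
```

A call such as check_options("create table t (a int)", "analysts", "rwxrwxrwx", 0o022) returns a single warning about access for others, while a select query returns no warnings at all.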

To be able to do this, we should extend createTable() to add the ability to record a different group from the user's primary group and to record the complete Unix permissions on the table directory. Also, we would like to have a way for partition directories to inherit permissions and group information based on the table directory. To keep the metastore backward compatible for use with Hive, the conf variables discussed above will be used.

The Howl CLI will always set the property partitions.inherit.permissions to true. createTable() should also store these as table properties in the metastore so that a subsequent addPartition() can look at these and also do a chgrp and chmod - the changes in addPartition() should also be implemented.
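One way to picture the hand-off between createTable() and addPartition() is sketched below. The property name partitions.inherit.permissions comes from the text above; the keys used for the group and permissions, and both function names, are hypothetical placeholders. The sketch only computes which chgrp and chmod operations would be needed and does not touch any file system.

```python
INHERIT = "partitions.inherit.permissions"  # property name from the text
GROUP_KEY = "example.table.group"  # hypothetical table property key
PERMS_KEY = "example.table.perms"  # hypothetical table property key


def table_properties(group, mode):
    """Properties a createTable() call could store alongside the table."""
    return {INHERIT: "true", GROUP_KEY: group, PERMS_KEY: format(mode, "03o")}


def partition_actions(props, partition_dir):
    """Operations an addPartition() call could apply to a new partition directory."""
    if props.get(INHERIT, "false").lower() != "true":
        return []  # inheritance disabled: leave the directory as created
    return [
        ("chgrp", props[GROUP_KEY], partition_dir),
        ("chmod", props[PERMS_KEY], partition_dir),
    ]
```

Storing the values as table properties means addPartition() does not need to re-read the table directory's status to know what the partition directory should look like.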

Authentication

One line of thought is to use HTTP as the transport and Thrift as the serialization mechanism. Since in this setup the Howl server would be a Tomcat server, the standard means of authentication for a Tomcat server can be used. The one challenge is that HowlOutputFormat will need to connect to this server from the cluster nodes. Authenticating those requests is difficult since they are made on behalf of the user and not by the user himself.

Design yet to come.

Jira on HTTP servlet support in Thrift

Pradeep