This page lists the authentication use cases related to Howl and outlines the changes required to enable them.

Background and terminology

The Hadoop Security (HadoopS) release uses Kerberos to provide authentication. On a secure cluster, the cluster servers (namenode (nn), jobtracker (jt), datanode, tasktracker) are themselves Kerberos service principals and end users are user principals; users and services mutually authenticate to each other using Kerberos tickets. To authenticate the map/reduce tasks, HadoopS uses security tokens called "delegation tokens" (these are NOT Kerberos tickets but Hadoop-specific security tokens). At job submission time, once the job client has presented the user's Kerberos ticket to authenticate to the namenode and jobtracker, the namenode hands it delegation tokens that the tasks can then use to talk to the namenode. These delegation tokens are stored in the "credential store" for the job, and the jobtracker automatically renews them for the job up to a maximum lifetime of 7 days.
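The renew-until-a-hard-cap behaviour of delegation tokens can be illustrated with a small stand-alone model. This is only a sketch and does not use Hadoop's classes: the class name `DelegationTokenSketch` is hypothetical, the 7-day cap is taken from the text above, and the 24-hour renew interval is an assumption chosen for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy model of a renewable token with a hard maximum lifetime (not Hadoop's classes). */
final class DelegationTokenSketch {
    // The 7-day cap comes from the text; the 24-hour renew interval is an assumption.
    static final Duration MAX_LIFETIME = Duration.ofDays(7);
    static final Duration RENEW_INTERVAL = Duration.ofHours(24);

    private final Instant maxDate; // hard cap; renewals never extend past this
    private Instant expiry;        // current expiry, pushed forward by renew()

    DelegationTokenSketch(Instant issuedAt) {
        this.maxDate = issuedAt.plus(MAX_LIFETIME);
        this.expiry = issuedAt.plus(RENEW_INTERVAL);
    }

    /** True while the token has not passed its current expiry. */
    boolean isValid(Instant now) {
        return !now.isAfter(expiry);
    }

    /** Extends the expiry by one renew interval, clamped to the hard cap. */
    Instant renew(Instant now) {
        if (!isValid(now)) {
            throw new IllegalStateException("token expired at " + expiry);
        }
        Instant candidate = now.plus(RENEW_INTERVAL);
        expiry = candidate.isAfter(maxDate) ? maxDate : candidate;
        return expiry;
    }

    public static void main(String[] args) {
        Instant issued = Instant.parse("2010-12-06T00:00:00Z");
        DelegationTokenSketch token = new DelegationTokenSketch(issued);
        // Play the role of the jobtracker: renew one hour before each daily expiry.
        for (int day = 1; day <= 8; day++) {
            Instant now = issued.plus(Duration.ofHours(24L * day - 1));
            if (!token.isValid(now)) {
                System.out.println("day " + day + ": token can no longer be renewed");
                break;
            }
            System.out.println("day " + day + ": renewed until " + token.renew(now));
        }
    }
}
```

In this model a job that runs longer than the cap loses its token no matter how diligently it is renewed, which is the constraint any Howl-side token handling would also have to respect.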

Oozie use case

Oozie is a service that users use to submit jobs to the HadoopS cluster. It somewhat resembles the Howl server, since the Howl server also needs to act on behalf of users while accessing the DFS. Users authenticate to Oozie, and the Oozie service then acts on behalf of the user while working with the jobtracker or namenode. For this to work, both the namenode and the jobtracker need to recognize the "oozie" principal as a "proxy user" principal (i.e. a principal that can act on behalf of other users). In addition, the namenode and jobtracker need to know the possible IPs for the proxy user service and the list of users or groups (i.e. all users belonging to a listed group are allowed) that the oozie principal can act on behalf of. This proxy user list and its associated information are maintained in a configuration read by the namenode and jobtracker.

Once the user authenticates to Oozie, Oozie authenticates itself to the nn/jt using the oozie principal and uses UserGroupInformation.doAs() to obtain a JobClient object associated with the real user (the doAs() needs the real username, which Oozie gets from the user authentication). Through this process, Oozie adds delegation tokens for the jt and the primary nn to the new JobClient (the JobClient code actually does this in a subsequent submitJob()) so that they are passed on to the launcher map task for the Pig/MR job. If the Pig script or MR job needs to access more than the primary namenode, an Oozie parameter should be used to specify the list of nns to be accessed, and Oozie will get delegation tokens for all of them through the JobClient.
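The proxy-user check that the namenode and jobtracker perform can be sketched as a stand-alone model. This is not the Hadoop implementation: the class `ProxyUserSketch`, its methods, and the sample IP, user, and group names are all hypothetical, and the rule layout (allowed IPs plus allowed users or groups per proxy principal) simply mirrors the description above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a proxy-user check; not the Hadoop implementation. */
final class ProxyUserSketch {
    /** Per-principal rule: IPs it may connect from, plus users and groups it may act for. */
    static final class ProxyRule {
        final Set<String> ips;
        final Set<String> users;
        final Set<String> groups;

        ProxyRule(Set<String> ips, Set<String> users, Set<String> groups) {
            this.ips = ips;
            this.users = users;
            this.groups = groups;
        }
    }

    private final Map<String, ProxyRule> rules = new HashMap<>();
    private final Map<String, Set<String>> groupsOfUser = new HashMap<>();

    static Set<String> setOf(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    void addRule(String proxyPrincipal, ProxyRule rule) {
        rules.put(proxyPrincipal, rule);
    }

    void setGroups(String user, String... groups) {
        groupsOfUser.put(user, setOf(groups));
    }

    /** True if the proxy principal, connecting from remoteIp, may act on behalf of endUser. */
    boolean mayActFor(String proxyPrincipal, String remoteIp, String endUser) {
        ProxyRule rule = rules.get(proxyPrincipal);
        if (rule == null || !rule.ips.contains(remoteIp)) {
            return false; // unknown proxy principal, or request from an unlisted host
        }
        if (rule.users.contains(endUser)) {
            return true; // user is listed explicitly
        }
        Set<String> memberOf = groupsOfUser.getOrDefault(endUser, Collections.<String>emptySet());
        return !Collections.disjoint(memberOf, rule.groups); // user is in an allowed group
    }

    public static void main(String[] args) {
        ProxyUserSketch acl = new ProxyUserSketch();
        // Hypothetical configuration: oozie runs on 10.0.0.5 and may act for the "analysts" group.
        acl.addRule("oozie", new ProxyRule(setOf("10.0.0.5"), setOf(), setOf("analysts")));
        acl.setGroups("alice", "analysts");
        acl.setGroups("bob", "ops");
        System.out.println("oozie for alice: " + acl.mayActFor("oozie", "10.0.0.5", "alice"));
        System.out.println("oozie for bob: " + acl.mayActFor("oozie", "10.0.0.5", "bob"));
        System.out.println("oozie from other host: " + acl.mayActFor("oozie", "10.0.0.9", "alice"));
    }
}
```

A Howl server acting on behalf of users would need an equivalent rule registered for its own principal before any doAs() call on the user's behalf could succeed.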

Changes required in Howl

Use cases with Howl

Howl client running DDL commands

Pig script reading from and writing to tables in Howl

Hive query reading from and writing to tables in Howl

Java Map Reduce job reading from and writing to tables in Howl

Oozie running a Pig script which reads from or writes to tables in Howl

How will Oozie know that the Pig script interacts with Howl? Will this need some change in Oozie to allow the workflow XML to indicate it?

Oozie running a Java MR job which reads from or writes to tables in Howl

How will Oozie know that the Java MR job interacts with Howl? Will this need some change in Oozie to allow the workflow XML to indicate it?

Tools like DAQ invoke Howl API calls to register data

Howl/HowlAuthentication (last edited 2010-12-06 21:41:21 by AlanGates)