Some Discussions about the GSoC Project "Integration with ZooKeeper for location transparency and clustering/failover"

The GSoC project by Kevin Wu: http://wiki.apache.org/general/soc2010-cxf2738-proposal

What is ZooKeeper?

ZooKeeper [1] is a high-performance, open-source coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Data model in ZooKeeper

The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's name space is identified by a path.

http://hadoop.apache.org/zookeeper/docs/current/images/zknamespace.jpg
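
The path rules above can be illustrated with a small helper. This is only a sketch: ZnodePath and its methods are hypothetical names chosen for this page, not part of the ZooKeeper API, and the path /app1/p_1 is just an example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for slash-separated paths; not part of the ZooKeeper API. */
final class ZnodePath {
    private ZnodePath() {}

    /** Splits an absolute path such as "/app1/p_1" into its elements. */
    static List<String> elements(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("path must start with '/': " + path);
        }
        List<String> result = new ArrayList<String>();
        if (path.equals("/")) {
            return result; // the root has no path elements
        }
        // The limit -1 keeps trailing empty strings, so "/a/" is rejected below.
        for (String element : path.substring(1).split("/", -1)) {
            if (element.isEmpty()) {
                throw new IllegalArgumentException("empty path element in: " + path);
            }
            result.add(element);
        }
        return result;
    }

    /** Returns the parent path, or null when the path is the root. */
    static String parent(String path) {
        if (elements(path).isEmpty()) { // also validates the path
            return null;
        }
        int lastSlash = path.lastIndexOf('/');
        return lastSlash == 0 ? "/" : path.substring(0, lastSlash);
    }
}
```

For example, ZnodePath.elements("/app1/p_1") yields the two elements app1 and p_1, and ZnodePath.parent("/app1/p_1") yields /app1.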

ZooKeeper in CXF-DOSGi [2]

The Apache CXF Distributed OSGi (CXF-DOSGi) project provides the Reference Implementation of the Distribution Provider component of the OSGi Remote Services Specification.

The CXF-DOSGi project comes with an RFC 119-compliant Discovery implementation that uses Apache ZooKeeper as the underlying technology for the Discovery Server.

In the CXF-DOSGi project, the CXF-DOSGi Discovery implementation interacts with the ZooKeeper server. The CXF-DOSGi service first publishes its service information, which includes the service interface name and some properties, to the ZooKeeper server. The CXF-DOSGi client then discovers the service information from the ZooKeeper server and can start using the service. When the CXF-DOSGi service goes down, the client is notified.
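
The publish, discover, and notify steps can be sketched with an in-memory stand-in for the discovery server. Everything here is an assumption made for illustration: InMemoryDiscovery, RemovalListener, and the method names are hypothetical and are neither the ZooKeeper API nor the CXF-DOSGi API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the discovery server; not the ZooKeeper API. */
final class InMemoryDiscovery {
    /** Callback invoked when a published service disappears. */
    interface RemovalListener {
        void serviceRemoved(String interfaceName, String address);
    }

    // interface name -> addresses of the services currently published for it
    private final Map<String, List<String>> services = new LinkedHashMap<String, List<String>>();
    private final List<RemovalListener> listeners = new ArrayList<RemovalListener>();

    /** Service side: publish the interface name and the address of the service. */
    synchronized void publish(String interfaceName, String address) {
        List<String> addresses = services.get(interfaceName);
        if (addresses == null) {
            addresses = new ArrayList<String>();
            services.put(interfaceName, addresses);
        }
        if (!addresses.contains(address)) {
            addresses.add(address);
        }
    }

    /** Client side: look up a snapshot of the addresses published for an interface. */
    synchronized List<String> discover(String interfaceName) {
        List<String> addresses = services.get(interfaceName);
        if (addresses == null) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(addresses);
    }

    synchronized void addRemovalListener(RemovalListener listener) {
        listeners.add(listener);
    }

    /** Called when a service goes down; registered listeners are notified. */
    synchronized void unpublish(String interfaceName, String address) {
        List<String> addresses = services.get(interfaceName);
        if (addresses != null && addresses.remove(address)) {
            for (RemovalListener listener : new ArrayList<RemovalListener>(listeners)) {
                listener.serviceRemoved(interfaceName, address);
            }
        }
    }
}
```

With a real ZooKeeper server, ephemeral znodes and watches are the usual mechanisms for this kind of removal and notification; the sketch only mimics the observable behaviour.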

CXF-DOSGi implements location transparency through ZooKeeper.

More details can be found in [3].

How to integrate ZooKeeper and CXF?

To integrate ZooKeeper with CXF so that CXF supports location transparency and server-side clustering, several questions need to be answered. First, what data should be stored, and where? Second, how does a client get and use the location of a service? Third, how is the data on different servers synchronized? Fourth, how should the clustering client be designed?

Following CXF-DOSGi, the data to store is the service interface, the service interface name, the location of the service, and some properties of the service.

The data should be stored in the ZooKeeper service, where other parties can use this information to access the service.
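
One possible shape for a stored entry is sketched below. The class name ServiceRecord, the root path /cxf/services, and the path layout are all assumptions made for illustration; they are not taken from CXF-DOSGi or from the proposal. The address is URL-encoded because it contains slashes, which would otherwise be read as path separators.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical record of the data stored for one service instance. */
final class ServiceRecord {
    /** Assumed root node for all service entries; not defined by the proposal. */
    static final String ROOT = "/cxf/services";

    final String interfaceName;           // e.g. "org.example.Greeter"
    final String address;                 // location of the service
    final Map<String, String> properties; // additional service properties

    ServiceRecord(String interfaceName, String address, Map<String, String> properties) {
        this.interfaceName = interfaceName;
        this.address = address;
        this.properties = Collections.unmodifiableMap(new TreeMap<String, String>(properties));
    }

    /** Path under which this record would be stored: ROOT/interface/encoded-address. */
    String znodePath() {
        return ROOT + "/" + interfaceName + "/" + encode(address);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With this layout, all instances of one interface share a parent node, so listing the children of that node gives the locations that are currently registered.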

The client, or the clustering client, can access ZooKeeper to get the location of a service and then use that location to access the service.

To support clustering, the data on different servers should be synchronized. ZooKeeper provides simple interfaces that help us implement this synchronization.

The clustering client discovers the service information through ZooKeeper and then accesses the service. If the service goes down, the client discovers another service in the cluster and accesses it. The clustering client is a smart proxy.
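
The failover behaviour of such a smart proxy can be sketched as follows. This is a minimal illustration under stated assumptions: FailoverClient is a hypothetical name, the Supplier stands in for a lookup against ZooKeeper, and a RuntimeException is assumed to signal that a service instance is down.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical smart-proxy core: try each discovered address until one succeeds. */
final class FailoverClient {
    // Stand-in for a discovery lookup; a real client would query ZooKeeper here.
    private final Supplier<List<String>> discovery;

    FailoverClient(Supplier<List<String>> discovery) {
        this.discovery = discovery;
    }

    /** Applies the call to each address in turn and returns the first successful result. */
    <T> T invoke(Function<String, T> call) {
        RuntimeException lastFailure = null;
        for (String address : discovery.get()) {
            try {
                return call.apply(address);
            } catch (RuntimeException e) {
                lastFailure = e; // assume this instance is down and try the next one
            }
        }
        throw new IllegalStateException("no reachable service instance", lastFailure);
    }
}
```

The caller passes the actual remote invocation as the function, so the failover loop stays independent of the transport used to reach the service.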

Some Discussions

The clustering client is a smart proxy, so this kind of clustering can only be used by CXF clustering clients. If we use a load balancer in the cluster, the client connects to the load balancer, and the load balancer redirects the service request to a server according to the information in ZooKeeper.

clustering-arch.jpg
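
How the load balancer picks a server is not specified above. As one possibility, the sketch below uses round-robin selection over the list of servers currently registered. Both the strategy and the name RoundRobinBalancer are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selection over the servers currently registered. */
final class RoundRobinBalancer {
    private final AtomicInteger next = new AtomicInteger();

    /** Picks the next server; the list is assumed to come from a ZooKeeper lookup. */
    String choose(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no servers registered");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```

Because the server list is passed in on every call, a server that disappears from the registry simply stops being chosen on the next request.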

References