Google Summer of Code 2008 – Project Proposal

Integrating the Expat Parser into Apache Axis2C


Lasith Eranda Haputhanthri




Apache Axis2C is the C implementation of the Axis2 architecture and is becoming one of the most popular open source products in the world of Web Services. Because it is implemented in C, Axis2C uses Libxml2, a C-based XML parser, as its default XML parser. It also has another XML parser, Guththila, which offers higher performance than Libxml2. However, Apache Axis2C does not support Expat, one of the world's highest-performance XML parsers and the underlying XML parser for the open source Mozilla project, Perl's XML parser, and other open source XML parsers. In this project I am going to integrate the Expat XML parser into Apache Axis2C through the parser abstraction layer. In detail, the parser has to be integrated into AXIOM (Axis Object Model), which is a pull-based object model. Expat does not provide a pull (StAX) API directly, which is what makes this project interesting. Examining the Libxml2 integration in Apache Axis2C shows that Libxml2 has its own pull API, which allows the parser to be wrapped into AXIOM simply by implementing the appropriate methods on top of that API [1] [2]. This project will therefore implement a pull API on top of the Expat parser and use that API to wrap Expat into AXIOM. After successful integration, users will be able to build Apache Axis2C with the Expat parser and run the engine on it.

Benefits to Apache Axis2C

One of the major motivations behind implementing the Axis2 architecture in C was to get higher performance. Two major external factors affect the performance of Apache Axis2C: the XML parser and the data binding. Apache Axis2C supports only ADB, which is the fastest data binding available with Axis2, so the only remaining factor that affects Axis2C performance is the parser. Apache Axis2C can currently be used with the Libxml2 and Guththila XML parsers, which lets users select the parser that best suits their service or client. In benchmarks I ran with both parsers, Guththila performed much better than Libxml2 for services that exchange small data sets, while Libxml2 performed much better than Guththila on large data sets. Since Expat offers much higher performance than Libxml2, it is worth integrating Expat into Apache Axis2C, benchmarking it, and comparing it with Guththila. Expat is a stream-oriented XML parser: the application registers handlers with the parser and then feeds it the document. The document normally does not have to be fed to the parser all at once; because it can be fed in pieces, the parser can start parsing before it has received the whole document. I therefore expect better performance from Expat on large data sets with Apache Axis2C. If Expat outperforms Guththila, the community could consider making Expat the default parser for Apache Axis2C, because from my understanding of Guththila I am somewhat doubtful that it supports the whole XML specification. In terms of specification compliance, Libxml2 and Expat are far better than Guththila, because they are stable projects that are used all over the world.
In my opinion, the only reason the Axis2C community keeps its own XML parser in its code base is performance. If Expat can solve this problem, we can avoid the overhead of maintaining an internal XML parser inside the Apache Axis2C codebase.

Implementation Details

As I discussed earlier, the main challenge in this project is to implement a StAX API on top of the Expat parser.

• As far as I know, the StAX API is used for streams: whenever you are reading a stream and the parser finds events such as START_ELEMENT, END_ELEMENT, CHARACTERS, and CDATA, you can retrieve that information. Expat reports exactly these events, but it delivers them through callbacks rather than on request; it does not provide a StAX API directly. So as the first step I propose writing a StAX API.

• To write the StAX API, I have to register my own handlers for characters, start of elements, and end of elements. Basically I propose implementing xmldecl_handler, start_handler, end_handler, and char_handler. Once these handlers are in place, I will be able to implement the API on top of them. By handing the handlers a data structure with shared variables, we can pass information between them without using global variables.

• After implementing the StAX API, I have to write the wrapper using that API, in the same way that Apache Axis2C wraps Libxml2 using its pull API. This simply means implementing all the methods in the two source files [1], [2], which handle XML reading and writing.

• As an example, I hope to implement methods like axis2_expat_reader_wrapper_next, axis2_expat_reader_wrapper_free, axis2_expat_reader_wrapper_get_attribute_count, etc. in the expat_reader_wrapper.c source, and methods like axis2_expat_writer_wrapper_free, axis2_expat_writer_wrapper_write_start_element, etc. in the expat_writer_wrapper.c source, mirroring the corresponding axis2_libxml2_* methods.

• After successful wrapping, I have to change the build system of Apache Axis2C to enable configuring it with the Expat parser.
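
As a minimal sketch of the steps above (not a final design), the handlers could append events to a queue held in a shared structure, and the pull API could take events from that queue one at a time. All names here (pull_reader_t, pull_reader_next, and so on) are hypothetical, XML_Char is assumed to be plain char, and attribute handling is left out. Expat itself is not called, so that the sketch compiles on its own; the handlers only follow the parameter shape of Expat's callbacks. In a real integration they would be registered with XML_SetElementHandler and XML_SetCharacterDataHandler, and the structure would be passed through XML_SetUserData.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Event kinds a StAX-style reader reports (hypothetical names). */
typedef enum {
    EV_NONE = 0,
    EV_START_ELEMENT,
    EV_END_ELEMENT,
    EV_CHARACTERS
} pull_event_type_t;

typedef struct pull_event {
    pull_event_type_t type;
    char *text;                 /* element name or character data (owned) */
    struct pull_event *next;
} pull_event_t;

/* Shared state handed to every handler as user data; no globals needed. */
typedef struct {
    pull_event_t *head;         /* oldest queued event */
    pull_event_t *tail;         /* newest queued event */
    pull_event_t *current;      /* event most recently returned by next */
    int failed;                 /* set when an allocation fails */
} pull_reader_t;

/* Copy len bytes of s into a new event and append it to the queue. */
static void enqueue(pull_reader_t *r, pull_event_type_t type,
                    const char *s, size_t len)
{
    pull_event_t *ev;
    assert(r != NULL && s != NULL);
    ev = malloc(sizeof *ev);
    if (ev == NULL) { r->failed = 1; return; }
    ev->text = malloc(len + 1);
    if (ev->text == NULL) { free(ev); r->failed = 1; return; }
    memcpy(ev->text, s, len);
    ev->text[len] = '\0';       /* callback data is not NUL-terminated */
    ev->type = type;
    ev->next = NULL;
    if (r->tail != NULL) r->tail->next = ev; else r->head = ev;
    r->tail = ev;
}

/* Push side: same parameter shape as Expat's element and text callbacks. */
void start_handler(void *user_data, const char *name, const char **atts)
{
    (void)atts;                 /* attributes are omitted in this sketch */
    enqueue(user_data, EV_START_ELEMENT, name, strlen(name));
}

void end_handler(void *user_data, const char *name)
{
    enqueue(user_data, EV_END_ELEMENT, name, strlen(name));
}

void char_handler(void *user_data, const char *s, int len)
{
    enqueue(user_data, EV_CHARACTERS, s, (size_t)len);
}

/* Pull side: advance to the next queued event, or EV_NONE if none is left. */
pull_event_type_t pull_reader_next(pull_reader_t *r)
{
    if (r->current != NULL) {   /* release the previously returned event */
        free(r->current->text);
        free(r->current);
    }
    r->current = r->head;
    if (r->current == NULL) return EV_NONE;
    r->head = r->current->next;
    if (r->head == NULL) r->tail = NULL;
    return r->current->type;
}

/* Name or text of the current event; NULL when there is no current event. */
const char *pull_reader_get_text(const pull_reader_t *r)
{
    return r->current != NULL ? r->current->text : NULL;
}

/* Drain and free everything still queued. */
void pull_reader_clear(pull_reader_t *r)
{
    while (pull_reader_next(r) != EV_NONE) { /* discard */ }
}
```

A wrapper function such as axis2_expat_reader_wrapper_next could then feed Expat another chunk with XML_Parse whenever the queue is empty and translate the returned event type into the reader events AXIOM expects. Note that Expat may deliver one run of text through several character callbacks, so a real implementation might need to merge adjacent CHARACTERS events. Suspending the parser from inside a handler with XML_StopParser and continuing with XML_ResumeParser might be an alternative that keeps the queue short.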

About me

I am Lasith Eranda Haputhanthiri and I live and study in Sri Lanka. I am presently an undergraduate at the University of Moratuwa, which is the best engineering university in Sri Lanka. My field of specialization is Computer Science and Engineering. I am very interested in algorithms and web application development, and I am also interested in open source software development.

I have developed a passion for open source development after seeing the contribution it has made to raising awareness in almost all application domains. For students in developing countries like Sri Lanka there is hardly any opportunity to participate in commercial research and development programs. However, the open source community has enabled us to participate in various projects and experience different aspects of programming.

This is my first involvement in Google Summer of Code, and I am willing to complete the above task to the best of my ability. I am a fast learner and a good programmer. In my last year of studies I ranked among the top twenty in the university. I am also expecting a first class degree in Computer Science and Engineering.

I do not plan to do any other work during this summer or to leave the country. However, I am looking forward to getting involved in open source development during my free time. I will be able to devote 40 – 60 hours per week to this project. I will be applying only to the Apache Axis2C project for Summer of Code 2008.

My Experiences

I have been working with C and C++ for the last two years. I have been involved in several development projects, such as a console-driven cricket game, which were developed using C and C++. Through these I have acquired a good knowledge of C and C++.

I have also worked on web application development in JEE. I spent my internship at Integrate System International (Pvt) Ltd, developing web applications using JEE. I also have a good knowledge of JavaScript, and I have become familiar with JavaScript libraries such as jQuery and Prototype. I am also interested in working on open source development. I have downloaded and studied several open source software projects, but I have not yet had the time to make a considerable contribution to them. I think this project will be a good opportunity for me to get involved in open source software development.

I have been programming for three years now, which has given me a wider understanding of the architectural considerations related to different service models.