A Guided Tour of HttpCore (module-main)



Welcome

Welcome, visitors, to this guided tour of HttpCore. I am your tour guide. Could you please come a little closer and gather around me, so I don't have to shout? Thank you, that's much better. I'm about to give you a short introduction, and then we'll visit some interesting places in HttpCore so you can see how it works. Whenever you've got a question, feel free to ask. That's what I'm here for: to answer your questions.

As you probably know, HTTP is a protocol for exchanging messages between a client and a server. It's in widespread use, and it typically runs on top of plain TCP/IP or secure TLS/SSL sockets.
Here at Apache, there is an implementation of the client side of that protocol called the Commons HttpClient. Informally, we also call it "the 3.x codebase" or simply "the old code". It proved quite useful to a lot of people, but the old code has severe limitations in its design. For example, there is a class called HttpMethodBase. It represents a request and a response at the same time, and it also implements logic for processing both. This kind of design where different things are crammed together in a single place makes it really hard to maintain or extend the code.

Therefore we started a new successor project called HttpComponents. Based on the experience gained with the old code, it implements the HTTP protocol with a new approach. Most importantly, it is split into several modules, each dealing with a different aspect of the big problem. As you can gather from its name, the HttpCore module is at the very heart of this effort. It defines the foundation on which all the other modules depend.
HttpCore deals with the representation of HTTP messages and with the transport logic for sending and receiving those messages. It also defines a kind of framework infrastructure so that other modules can plug in functionality. Unlike the old code, HttpCore is not specific to the client side of HTTP communication; it can also be used on the server side. And because it is so fundamentally different from its predecessor, we put all the code into an all-new package hierarchy so you won't confuse the two.

Now, if you would like to follow me to the main hall... it's called package org.apache.http. You may want to keep the JavaDocs at hand; that will make it easier for you to follow my explanations.

Messages

The first problem we had to deal with is the representation of messages. If you don't know how to represent a message, you can't send or receive it, right? So here we have a set of interfaces for the building blocks of an HTTP message. There's the RequestLine for a request and the StatusLine for a response, both containing a ProtocolVersion. ProtocolVersion is so elementary that we made it a class instead of an interface, and of course we have HttpVersion derived from it. Then we have a Header with a name and a value, where the value can consist of multiple HeaderElements. And finally there is the message body, the HttpEntity.
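To make the idea of a header value with several elements concrete, here is a deliberately naive sketch. ToyHeader and ToyHeaderElement are hypothetical classes, not HttpCore's Header and HeaderElement. The parser simply splits on ',' and ';' and ignores quoted strings and the other corner cases a real header parser has to handle.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a header element: a name plus optional ";key=value" parameters.
class ToyHeaderElement {
    final String name;
    final Map<String, String> params = new LinkedHashMap<>();

    ToyHeaderElement(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for a header: a name and a raw value that can be split into elements.
class ToyHeader {
    final String name;
    final String value;

    ToyHeader(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Naive parsing: ',' separates elements, ';' separates parameters.
    // Quoted strings containing ',' or ';' are NOT handled here.
    List<ToyHeaderElement> getElements() {
        List<ToyHeaderElement> elements = new ArrayList<>();
        for (String part : value.split(",")) {
            String[] pieces = part.split(";");
            ToyHeaderElement element = new ToyHeaderElement(pieces[0].trim());
            for (int i = 1; i < pieces.length; i++) {
                String[] kv = pieces[i].split("=", 2);
                element.params.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : null);
            }
            elements.add(element);
        }
        return elements;
    }
}
```

With a value like "text/html, application/xml;q=0.9" this yields two elements. The second element carries a q parameter.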

So, out of these building blocks, we assemble messages. Every HttpMessage has headers, which can be added or removed at will. HttpRequest adds the request line, and HttpEntityEnclosingRequest adds an entity. HttpResponse adds a status line and also an entity. For convenient integration into frameworks that use the Factory pattern, there are factory interfaces for both requests and responses.
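The layering can be sketched with a few hypothetical classes. They only mirror the shape described above: headers common to every message, a request line for requests, a status line for responses, and a factory for creating requests. They are not the HttpCore interfaces. They also use plain strings for the protocol version and leave out entities entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: every message carries an ordered, mutable list of headers.
abstract class ToyMessage {
    private final List<String[]> headers = new ArrayList<>();

    void addHeader(String name, String value) {
        headers.add(new String[] {name, value});
    }

    // Header names are case-insensitive in HTTP, so removal ignores case.
    void removeHeaders(String name) {
        headers.removeIf(h -> h[0].equalsIgnoreCase(name));
    }

    String getFirstHeader(String name) {
        for (String[] h : headers) {
            if (h[0].equalsIgnoreCase(name)) {
                return h[1];
            }
        }
        return null;
    }

    int headerCount() {
        return headers.size();
    }
}

// A request adds the request line: method, URI, protocol version.
class ToyRequest extends ToyMessage {
    final String method;
    final String uri;
    final String version;

    ToyRequest(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    String requestLine() {
        return method + " " + uri + " " + version;
    }
}

// A response adds the status line: protocol version, status code, reason phrase.
class ToyResponse extends ToyMessage {
    final String version;
    final int statusCode;
    final String reason;

    ToyResponse(String version, int statusCode, String reason) {
        this.version = version;
        this.statusCode = statusCode;
        this.reason = reason;
    }

    String statusLine() {
        return version + " " + statusCode + " " + reason;
    }
}

// In the spirit of a request factory: creates requests from a method and a URI.
interface ToyRequestFactory {
    ToyRequest newRequest(String method, String uri);
}
```

A factory can then be a one-liner such as (m, u) -> new ToyRequest(m, u, "HTTP/1.1"). Code that receives the factory never needs to know which concrete request class it gets.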

Now, if you would come over here and have a look through this window into the adjoining room? It's a bit too small for all of us to go into, but you can see the important things from here. It is called package org.apache.http.message. Notice that there are basic implementations for all of the message representation interfaces. You'll hardly need more than those when writing an application that uses HttpCore directly.

All right, are there more questions about the basic implementations? No? Good, then let's move on to the room over there. It is called package org.apache.http.entity. You can find a selection of message entities in there. Message entities are really not that much different from the request entities in HttpClient 3.x, except they are no longer tied to the client side. As in the old code, there are entities getting their content from a string, byte array, file, or input stream. The BasicHttpEntity is what we use when a message is received over a connection. You'll see the connections later today. We also have some advanced stuff for wrapping and buffering entities, and an EntityTemplate that simplifies writing a new entity if you have to.
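As a rough illustration of why different entity types exist, here is a simplified entity abstraction. It borrows the ideas of content, content length and repeatability. The interface and classes, however, are hypothetical and much smaller than HttpEntity and its implementations.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical, simplified entity abstraction.
interface ToyEntity {
    InputStream getContent();   // the body as a stream
    long getContentLength();    // length in bytes, or -1 if unknown
    boolean isRepeatable();     // can getContent() be called more than once?
}

// Content held in memory: the length is known and the content can be re-read.
class ToyByteArrayEntity implements ToyEntity {
    private final byte[] content;

    ToyByteArrayEntity(byte[] content) {
        this.content = content.clone();
    }

    @Override public InputStream getContent() { return new ByteArrayInputStream(content); }
    @Override public long getContentLength() { return content.length; }
    @Override public boolean isRepeatable() { return true; }
}

// A string entity is just the string encoded with an explicit charset.
class ToyStringEntity extends ToyByteArrayEntity {
    ToyStringEntity(String text, Charset charset) {
        super(text.getBytes(charset));
    }
}

// Content from a stream (as for a received message): readable once, length possibly unknown.
class ToyInputStreamEntity implements ToyEntity {
    private final InputStream in;
    private final long length;

    ToyInputStreamEntity(InputStream in, long length) {
        this.in = in;
        this.length = length;
    }

    @Override public InputStream getContent() { return in; }
    @Override public long getContentLength() { return length; }
    @Override public boolean isRepeatable() { return false; }
}
```

Note that the content length counts bytes, not characters: "héllo" has five characters but six bytes in UTF-8. A stream-based entity can only be read once and may not know its length in advance.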

Any more questions about entities? Fine, then let's pass through this door, back into the main hall. We're going to have a look at connections next.

Connections

Connections are needed to send HTTP messages from client to server or the other way 'round. On the interface level, we have the HttpConnection. It allows for checking whether a connection is open, for closing it or shutting it down, and for getting statistical data if such data has been gathered. To actually send and receive messages, you have to use either HttpClientConnection or HttpServerConnection, depending on which side you implement. Obviously, the client connection sends requests and receives responses, whereas the server connection receives requests and sends responses. Messages are passed to and from the connections in terms of the interfaces we have just seen. Sending a message takes two separate calls: one for the message header and one for the message entity. That allows for explicit handling of the expect-continue handshake, for example.
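Here is a sketch of why two calls are useful. ToyClientConnection is a hypothetical class. Its method names are loosely modeled on those of HttpClientConnection, but its signatures are simplified: it writes the request line and headers in one call and the raw body in a second call, to any OutputStream. Transfer codings, buffering and error handling are left out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical, simplified client connection writing HTTP/1.x wire format.
class ToyClientConnection {
    private final OutputStream out;

    ToyClientConnection(OutputStream out) {
        this.out = out;
    }

    // First call: request line plus headers, terminated by an empty line.
    void sendRequestHeader(String requestLine, Map<String, String> headers) throws IOException {
        StringBuilder sb = new StringBuilder(requestLine).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        sb.append("\r\n");
        out.write(sb.toString().getBytes(StandardCharsets.US_ASCII));
    }

    // Second call: the raw entity bytes (no transfer coding in this sketch).
    void sendRequestEntity(byte[] body) throws IOException {
        out.write(body);
    }

    void flush() throws IOException {
        out.flush();
    }
}
```

Between the two calls, a client can stop and wait for the server's interim response. The expect-continue handshake needs exactly this gap.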

Before you ask any more questions, it's probably best we move on into the next room, which is called package org.apache.http.impl. As you can see, there is a whole bunch of connection implementation classes. Don't let that confuse you; it's just for keeping the code maintainable. All you really need to look at are the two classes DefaultHttpClientConnection and DefaultHttpServerConnection. You see, there are the bind operations, where you pass in an open socket to get an open connection. And inherited from a base class, there also is a getSocket method.

Now please, visitors... I know that the connections look very interesting and complicated, but you really don't want to miss the exciting things still coming up. So, if you follow me back to the main hall, and then on to the next room...

Execution

Here we are in package org.apache.http.protocol. This is the home of the framework for executing the higher levels of HTTP. Remember that the lower levels, in particular transport encodings, are dealt with automagically by the connections we have just left. The protocol framework here is concerned with putting the appropriate headers into messages, and with calling the connection methods at the right time in the right sequence.

For example, the expect-continue handshake is dealt with here, both on the client and the server side. For those of you who are not familiar with the details of that handshake, I'll explain it briefly. When sending a message with a body that is large or tricky to generate, clients don't want to risk sending all the message data just to get a simple error response from the server, for example because authentication is required. In that case, the client puts an "Expect: 100-continue" header into the request and sends only the message headers. The server is expected to check the message headers, and to respond with status code 100 (Continue) if it finds everything ready for processing the request entity. Only then will the client send the rest of the request. If the server detects a problem, it responds with the appropriate error status code and the request body is never sent.
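The client's side of the handshake can be modeled in a few lines. This is a toy simulation: ToyServer and ExpectContinueClient are hypothetical, and nothing goes over a network. The sketch also omits the timeout after which real clients typically send the body anyway.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the server's part in the handshake.
interface ToyServer {
    // Looks at the headers only; returns 100 to ask for the body, or a final error status.
    int verifyHeaders(String headers);

    // Processes the body once it has arrived and returns the final status.
    int processBody(String body);
}

// Client side of the expect-continue handshake, without real I/O or timeouts.
class ExpectContinueClient {
    // Records what "went over the wire", in order.
    final List<String> sent = new ArrayList<>();

    int execute(String headers, String body, ToyServer server) {
        String headersWithExpect = headers + "\r\nExpect: 100-continue";
        sent.add(headersWithExpect);                  // 1. send the headers only
        int interim = server.verifyHeaders(headersWithExpect);
        if (interim != 100) {
            return interim;                           // 2a. refused: the body is never sent
        }
        sent.add(body);                               // 2b. 100 Continue: now send the body
        return server.processBody(body);
    }
}
```

With a server that demands an Authorization header, a request without one gets its error status after only the headers have been transmitted.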
Here we have the HttpRequestExecutor, the client-side implementation of protocol execution. It handles the expect-continue handshake, and it also checks whether an incoming response has an entity that needs to be read. For the server side we have HttpService, which checks whether the incoming request has an entity, and uses an HttpExpectationVerifier if the expect-continue handshake is employed. Both use an HttpProcessor to modify and interpret headers.

The framework for setting and interpreting headers is based on interceptors. Those are little classes which take care of one specific aspect, often just a single header. They are collected into a list of interceptors that need to be executed on a message before it is sent, or after it is received. A range of commonly needed interceptors is provided; I'll just pick some examples.
Here we have the RequestUserAgent. It is a request interceptor for outgoing requests, so it is executed on requests on the client side before they are sent. Its only task is to add a User-Agent header, if there is none in the request. If you don't want a User-Agent header to be sent, you just don't add this interceptor to your list.
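A minimal interceptor in the spirit of RequestUserAgent might look like this. The interface and class are hypothetical. Headers are simplified to a map, so a header can't appear twice, and the agent string is simply passed to the constructor.

```java
import java.util.Map;

// Hypothetical interceptor contract: inspect and modify an outgoing request's headers.
interface ToyRequestInterceptor {
    void process(Map<String, String> headers);
}

// Adds a User-Agent header unless the request already has one.
class ToyUserAgentInterceptor implements ToyRequestInterceptor {
    private final String agent;

    ToyUserAgentInterceptor(String agent) {
        this.agent = agent;
    }

    @Override
    public void process(Map<String, String> headers) {
        // Header names are case-insensitive, so "user-agent" counts as already present.
        for (String name : headers.keySet()) {
            if (name.equalsIgnoreCase("User-Agent")) {
                return;
            }
        }
        headers.put("User-Agent", agent);
    }
}
```

An application that sets its own User-Agent keeps it. One that wants no header at all simply leaves the interceptor out of its list.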

A trickier interceptor is RequestContent, also applied on the client side before a request is sent. It checks whether there is an entity in the request and, if so, sets up the Content-Length or Transfer-Encoding header. This is a must-have interceptor if you want to send a request entity. On the server side, ResponseContent does the same for the response.
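The decision a content interceptor has to make can be sketched as follows. ToyContentInterceptor is hypothetical. It takes the entity length as a Long, where null means "no entity" and -1 means "unknown", and it chooses between Content-Length and chunked Transfer-Encoding. Refusing pre-set framing headers is a choice made for this sketch, not a description of RequestContent's exact behavior.

```java
import java.util.Map;

// Hypothetical content interceptor: announces how the request entity is delimited.
class ToyContentInterceptor {

    private static boolean hasHeader(Map<String, String> headers, String name) {
        for (String key : headers.keySet()) {
            if (key.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    /**
     * @param headers       outgoing request headers, modified in place
     * @param contentLength entity length in bytes, -1 if unknown, or null if there is no entity
     * @param http10        true if the request is sent as HTTP/1.0
     */
    void process(Map<String, String> headers, Long contentLength, boolean http10) {
        if (contentLength == null) {
            return; // no entity, nothing to announce
        }
        // This sketch insists that framing headers are set here and nowhere else.
        if (hasHeader(headers, "Content-Length") || hasHeader(headers, "Transfer-Encoding")) {
            throw new IllegalStateException("framing header already present");
        }
        if (contentLength >= 0) {
            headers.put("Content-Length", Long.toString(contentLength));
        } else if (http10) {
            // Chunked transfer coding only exists from HTTP/1.1 on.
            throw new IllegalStateException("length unknown and chunking unavailable in HTTP/1.0");
        } else {
            headers.put("Transfer-Encoding", "chunked");
        }
    }
}
```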

The already mentioned HttpProcessor holds lists of request and response interceptors that should be applied. You set it up once when your application initializes.
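Setting up a processor once and reusing it can be sketched with plain Java functional interfaces. ToyProcessor is a hypothetical stand-in for an HttpProcessor implementation. Its interceptors are just Consumer<Map<String, String>> lambdas, run in the order they were added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical processor: ordered lists of request and response interceptors.
class ToyProcessor {
    private final List<Consumer<Map<String, String>>> requestInterceptors = new ArrayList<>();
    private final List<Consumer<Map<String, String>>> responseInterceptors = new ArrayList<>();

    ToyProcessor addRequestInterceptor(Consumer<Map<String, String>> interceptor) {
        requestInterceptors.add(interceptor);
        return this;
    }

    ToyProcessor addResponseInterceptor(Consumer<Map<String, String>> interceptor) {
        responseInterceptors.add(interceptor);
        return this;
    }

    // Runs every request interceptor, in registration order, before the request is sent.
    void processRequest(Map<String, String> headers) {
        for (Consumer<Map<String, String>> interceptor : requestInterceptors) {
            interceptor.accept(headers);
        }
    }

    // Runs every response interceptor, in registration order, after the response is received.
    void processResponse(Map<String, String> headers) {
        for (Consumer<Map<String, String>> interceptor : responseInterceptors) {
            interceptor.accept(headers);
        }
    }
}
```

Because the processor holds no per-request state, one instance built at startup can serve every request.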

You see this interface here, HttpContext. That is a collection of named attributes, where names are strings and attributes can be any kind of Java object. When a request is executed, it has one specific context. The same holds, of course, when a request is being serviced on the server side. The interceptors, and many other parts of the framework, have access to this context. So your application can put some data, like a password, into the context, and an interceptor picks it up. Conversely, an interceptor can put data into the context, like incoming cookies, and your application picks that up after the execution.
The context is also the place to keep session information, like the cookies that should be sent or passwords that have already been entered. Mind you, core does not handle cookies or authentication. Core is hardcore, it just provides the framework for doing that. The examples show what attributes need to be present in the context for the default interceptors to work. We have synchronized and unsynchronized implementations of the HttpContext interface.
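The two directions of data flow through the context can be sketched as follows. ToyContext and ContextAwareInterceptors are hypothetical. So are the attribute names (example.token, example.received-cookie) and the X-Example-Token header; they stand in for whatever your application and its interceptors agree on. The constructor flag switches between a plain map and a synchronized one, echoing the two kinds of HttpContext implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical execution context: string names mapped to arbitrary objects.
class ToyContext {
    private final Map<String, Object> attributes;

    ToyContext(boolean synchronizedAccess) {
        Map<String, Object> map = new HashMap<>();
        this.attributes = synchronizedAccess ? Collections.synchronizedMap(map) : map;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
    }
}

// Two hypothetical interceptors that talk to the application through the context.
class ContextAwareInterceptors {
    // Outgoing: picks up data the application put into the context.
    static void beforeSend(Map<String, String> requestHeaders, ToyContext context) {
        Object token = context.getAttribute("example.token");
        if (token != null) {
            requestHeaders.put("X-Example-Token", token.toString());
        }
    }

    // Incoming: leaves data in the context for the application to read afterwards.
    static void afterReceive(Map<String, String> responseHeaders, ToyContext context) {
        String cookie = responseHeaders.get("Set-Cookie");
        if (cookie != null) {
            context.setAttribute("example.received-cookie", cookie);
        }
    }
}
```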

Now, if you would kindly follow me to the last stop on our little tour...

Parameters

This is package org.apache.http.params, home of the parameter framework. We introduced the preferences framework in version 3.0 of the old code. The 4.0 version is a natural evolution of that rather than a radical redesign. We keep maps of named parameters in instances of HttpParams. Parameters get attached to HTTP messages, so they are available to all objects involved in processing a message: interceptors, connections, and whatever else other modules are going to add on top of core.
The names of parameters are defined in PNames interfaces, where each interface lists the parameters for a particular part of the framework. We also have Bean classes for these parameter sets. These beans don't store the parameters in their own fields, but put them into a parameter map. This comes in handy if you want to use something like the Spring framework, which can populate beans from configuration files but wouldn't know what to do with a map.
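Here is a sketch of the bean idea. ToyConnectionPNames and ToyConnectionParamBean are hypothetical, as are the parameter names they define. The setters are what a bean-populating tool would call, yet every value ends up in the shared parameter map rather than in a field of the bean.

```java
import java.util.Map;

// Hypothetical PNames-style interface: one place that lists the parameter names.
interface ToyConnectionPNames {
    String SO_TIMEOUT = "example.socket.timeout";
    String TCP_NODELAY = "example.tcp.nodelay";
}

// Hypothetical bean: setters a configuration tool can call, storing values in a parameter map.
class ToyConnectionParamBean {
    private final Map<String, Object> params;

    ToyConnectionParamBean(Map<String, Object> params) {
        this.params = params;
    }

    public void setSoTimeout(int millis) {
        params.put(ToyConnectionPNames.SO_TIMEOUT, millis);
    }

    public void setTcpNoDelay(boolean enabled) {
        params.put(ToyConnectionPNames.TCP_NODELAY, enabled);
    }
}
```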
In the old code, parameters were hierarchical. This feature is still present, we can link a map of parameters with another one providing defaults. However, this feature should never be used by applications directly. Parameters may and will be linked inside the framework, and having both application and framework set up parameter hierarchies would wreak havoc on both.
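To show what linking means, here is a hypothetical parameter set whose lookups fall back to a set of defaults. This is only an illustration of the mechanism, not an invitation to use it in your application. ToyParams is not HttpCore's HttpParams.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter set that falls back to a linked set of defaults.
class ToyParams {
    private final Map<String, Object> local = new HashMap<>();
    private final ToyParams defaults; // may be null: end of the chain

    ToyParams(ToyParams defaults) {
        this.defaults = defaults;
    }

    ToyParams setParameter(String name, Object value) {
        local.put(name, value);
        return this;
    }

    // Local values win; otherwise the lookup continues down the chain of defaults.
    Object getParameter(String name) {
        if (local.containsKey(name)) {
            return local.get(name);
        }
        return defaults != null ? defaults.getParameter(name) : null;
    }
}
```

If the framework and the application each built their own chains over the same objects, nobody could tell which default a lookup would end up at. That is the havoc mentioned above.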

Caution has to be used when updating parameters after they have been passed to the framework. You should avoid updating a parameter set at all while execution or servicing is in progress. The default implementation of HttpParams is unsynchronized, because the framework uses it read-only.
The parameter values themselves should be read-only at all times. So if for example you stored a modifiable map as a parameter value, never modify that map again. If you have to update the parameter set with a new map, then copy the old one, modify the copy, and replace the old value with the modified copy.
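The copy, modify, replace pattern for a map-valued parameter might look like this. ParamValueUpdate and the parameter names in the usage are hypothetical. The old map is never touched, so code still holding a reference to it keeps seeing a consistent value. The new value is wrapped read-only so that nobody modifies it later by accident.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class ParamValueUpdate {
    // Adds or changes one entry of a map-valued parameter without ever mutating the old map.
    static void putEntry(Map<String, Object> params, String paramName, String key, String value) {
        @SuppressWarnings("unchecked")
        Map<String, String> old = (Map<String, String>) params.get(paramName);
        Map<String, String> copy = new HashMap<>();
        if (old != null) {
            copy.putAll(old);                                       // 1. copy the old value
        }
        copy.put(key, value);                                       // 2. modify the copy
        params.put(paramName, Collections.unmodifiableMap(copy));   // 3. replace it, read-only
    }
}
```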

Farewell

I hope you enjoyed our tour of the HttpCore module and found the experience enlightening. If you have any more questions, do not hesitate to post them on the user mailing list. That said, you might want to search the mailing list archives first, in case somebody else already got an answer to a similar question. We are also considering offering guided tours of other modules in the future. We'd be happy if you joined one of those when they become available.

Thank you all, and see you next time!

GuidedTourOfHttpCore (last edited 2009-09-20 21:44:17 by localhost)