The Network Server is capable of managing multiple simultaneous JDBC connections with its clients. We call this "Session Management", and this page discusses some of the intricacies of the implementation.

This page was constructed as part of the work on DERBY-1326, DERBY-1219, DERBY-1338, DERBY-1020, and DERBY-51 (and possibly others?), although the page is intended to endure after the bugs are fixed, providing background information about the implementation.

Introduction

The principal objects involved in session management are the Session and the DRDAConnThread.

The server considers threads to be more expensive than socket connections. The server is generally willing to accept an arbitrarily large number of socket connections from clients at the same time, but has a limit on the number of threads that it is willing to create. So there may be more Session instances than DRDAConnThread instances.

Also, the server is willing to discard the Session instance when the client disconnects, whereas thread creation and destruction are considered expensive, so the server caches and re-uses threads.

So the server tracks threads and sessions separately, and uses a queueing model to associate work requests (Session) with workers (Threads).

Session

There is a Session instance for each physical socket connection that the server has with a client. Via the Session object, the Network Server can access the various Database objects that the session is using.

Session instances are created by the Client Thread, and are tracked in the Network Server's sessionTable.

DRDAConnThread

There are typically several DRDAConnThread instances in the Network Server. These threads carry out work on behalf of Session instances.

DRDAConnThread instances are created by the Client Thread, and are tracked in the Network Server's thread list.

Dispatching work to threads

The server uses a very simple mechanism, called the RunQueue, to rendezvous and dispatch work to threads.

Any DRDAConnThread instance can service any Session instance, so the Network Server uses a very simple first-come, first-served policy for matching up sessions and threads. To avoid starvation, there is a time-slicing mechanism as well.

Conceptually, the RunQueue may be in any of three states at any given time:

Thus, depending on the current balance of work-needing-a-thread versus threads-needing-work-to-do, the RunQueue at any given instant may be viewed either as a queue of work (Session instances), or as a queue of threads (DRDAConnThread instances).

Of course, this is strictly a conceptual viewpoint, because at the physical level, the RunQueue as a collection object (it is a Java Vector) only ever actually holds Session instances. When the server is in the state of "there are idle threads waiting for work to do", there is no explicit collection object which contains that list of idle threads; instead there is merely a "virtual" collection defined as "the threads which are currently waiting on the RunQueue in NetworkServerImpl.getNextSession()".
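
The dual view of the RunQueue can be sketched roughly as follows. This is a simplified illustration, not the actual NetworkServerImpl code: the class names SketchSession and RunQueueSketch are made up for the example, and only the method names runQueueAdd() and getNextSession(), the Vector, and the freeThreads counter are taken from this page. The Vector only ever holds sessions; the "queue of idle threads" exists only as the set of threads blocked in wait().

```java
import java.util.Vector;

// Hypothetical stand-in for the server's Session class.
class SketchSession {
    final int id;
    SketchSession(int id) { this.id = id; }
}

// Simplified sketch of the RunQueue idea (assumed structure, not Derby's code).
class RunQueueSketch {
    // The only real collection: sessions waiting for a thread.
    private final Vector<SketchSession> runQueue = new Vector<>();
    // Size of the "virtual" queue of idle threads blocked in wait().
    private int freeThreads = 0;

    // Called by the thread that accepts connections.
    void runQueueAdd(SketchSession s) {
        synchronized (runQueue) {
            runQueue.addElement(s);
            runQueue.notify();   // wake at most one idle worker
        }
    }

    // Called by a worker; blocks until a session is available.
    // Returns null if the worker is interrupted while waiting.
    SketchSession getNextSession() {
        synchronized (runQueue) {
            freeThreads++;
            try {
                while (runQueue.isEmpty()) {
                    runQueue.wait();
                }
                return runQueue.remove(0);   // first-come, first-served
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            } finally {
                freeThreads--;
            }
        }
    }

    int getFreeThreads() {
        synchronized (runQueue) { return freeThreads; }
    }

    int queuedSessions() {
        return runQueue.size();
    }
}
```

When sessions outnumber threads, queuedSessions() is positive and getFreeThreads() is 0; when threads outnumber sessions, the reverse holds.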

Session Lifetime

A Session instance is created by the Client Thread when a new socket connection is accepted from a client. The Session object is added to the Network Server sessionTable at that time.

The Session object is held until the client disconnects, which is typically handled by throwing a disconnect exception on the server side. The DRDAConnThread catches the disconnect exception, removes the session from the Network Server sessionTable, closes it, and discards it.
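
As a rough sketch of this lifetime (all names here, including DisconnectException and FakeSession, are hypothetical stand-ins rather than Derby's real classes), the accepting side registers the session, and the worker removes, closes, and drops it when the disconnect surfaces as an exception:

```java
import java.util.Hashtable;

// Hypothetical exception standing in for the server's disconnect exception.
class DisconnectException extends Exception {
}

// Hypothetical session that "disconnects" after a fixed number of requests.
class FakeSession {
    final int connNum;
    private int requestsLeft;
    boolean closed = false;

    FakeSession(int connNum, int requests) {
        this.connNum = connNum;
        this.requestsLeft = requests;
    }

    void processRequest() throws DisconnectException {
        if (requestsLeft == 0) {
            throw new DisconnectException();
        }
        requestsLeft--;
    }

    void close() { closed = true; }
}

class SessionLifetimeSketch {
    final Hashtable<Integer, FakeSession> sessionTable = new Hashtable<>();

    // What the accepting thread does for each new socket connection.
    FakeSession accept(int connNum, int requests) {
        FakeSession s = new FakeSession(connNum, requests);
        sessionTable.put(connNum, s);
        return s;
    }

    // What a worker does: service requests until the client goes away.
    void service(FakeSession s) {
        try {
            while (true) {
                s.processRequest();
            }
        } catch (DisconnectException e) {
            sessionTable.remove(s.connNum);  // forget the session
            s.close();                       // release its resources
        }
    }
}
```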

(I think that if the client disconnects at a point where there are currently more Session instances than DRDAConnThread instances, and the client's Session is currently on the RunQueue, then the Network Server will not detect the client disconnect until the next time that the Session moves to the head of the RunQueue and is dispatched to a thread for processing. But I haven't yet set up this situation in the debugger, so I'm not sure about this part.)

Session objects are also closed and discarded when the Network Server shuts down or is restarted.

Session closing and DERBY-1020

Bug DERBY-1020 concerns the proper handling of exceptions that occur while closing the session, especially after the database has been shut down.

There may be some synchronization impact as well, because the exceptions that occur when the database is intentionally shut down are treated as unexpected exceptions, so the session does not go through the normal session shutdown codepath.

Note that DERBY-1020 is the root cause of DERBY-273 (now fixed) and DERBY-803, so there appear to be a number of problems with exceptions that occur during shutdown processing.

DRDAConnThread Lifetime

A DRDAConnThread instance is created by the Client Thread when it detects that a new Session has just been created, and there aren't any free threads currently, and the server has not yet reached its maximum number of threads.

Note that since there is no explicit collection object which contains the list of idle threads, the server code maintains a simple counter of the number of idle threads.

DRDAConnThread instances are only destroyed when the Network Server shuts down or is restarted. The mechanism for shutting down a DRDAConnThread instance is cooperative (that is, we don't call Thread.stop()): the Network Server calls the thread's close() method, which sets a flag on the thread, and the thread checks the flag periodically and exits its run() method if it finds that the flag is set.
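
The cooperative pattern looks roughly like the following minimal sketch. CooperativeWorker is a made-up class for illustration; only the idea of a close() method that sets a flag checked by run() comes from the description above.

```java
// Hypothetical worker illustrating flag-based (cooperative) shutdown.
class CooperativeWorker extends Thread {
    // volatile so the worker sees a write made by the closing thread
    private volatile boolean closed = false;

    // Called by the owning server; only requests shutdown.
    void close() {
        closed = true;
    }

    boolean closed() {
        return closed;
    }

    @Override
    public void run() {
        while (!closed()) {
            // stand-in for servicing one unit of work
            Thread.yield();
        }
        // falling out of run() ends the thread; nothing is stopped forcibly
    }
}
```

One consequence of this design is that a thread blocked in wait() cannot check the flag until something wakes it, which is why the flag alone is not always enough.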

Note specifically, with respect to DERBY-1326, that when the DRDAConnThread notices that it has been closed immediately upon returning from the getNextSession() call in its run() method, it does not perform any shutdown processing for the Session that it was told to process. This can cause a "hang", as discussed in DERBY-1219.

Network Server startup

When the Network Server starts up, it starts the embedded Derby engine, but does not start any Session or DRDAConnThread. Those objects will be created on demand when new client connections are made, as described above.

Network Server shutdown

When the Network Server shuts down, it makes a concerted effort to clean up its resources:

There are several observations worth making here:

The implications of interrupting the DRDAConnThread instances

Thread.interrupt() is a very powerful call, and the fact that we make this call during server shutdown can have unforeseen consequences. Consider, for example, DERBY-1338, in which the Thread.interrupt() call made during server shutdown has the unwanted side effect of interrupting the loading of a class file, causing the DRDAConnThread instance to report a ClassNotFound exception.

In general, it seems like it would be preferable to design a protocol in which we do not need to interrupt the daemon thread instances, or, if we do, that we only do so at a point where we are certain that it is safe to interrupt the threads.

Network Server restart

The Network Server has a feature by which it can restart itself, without needing to shut down and restart the entire application which contains the Network Server. The Network Server appears to do this automatically when it detects that the underlying Derby engine has been shut down. The checkDataSource test contains some code which triggers this restart processing.

When the Network Server restarts itself, it does the following:

There are several observations worth making here.

Synchronization

There are several monitors used by the Network Server session management code for synchronization:

serverStartSync synchronization

The serverStartSync object has exactly one use, which is to single-thread the flow through the startNetworkServer() method so that the network server is always started (or restarted) exactly once, no matter how many threads call this routine at the same time.

The serverStartSync monitor is used in conjunction with the restartFlag boolean, and there is a comment at the head of the startNetworkServer() method regarding the use of these two variables which I don't understand.

sessionTable synchronization

The sessionTable object is a Java Hashtable, which is an automatically-synchronized collection object. Therefore calls to sessionTable.put(), sessionTable.get() and sessionTable.remove() do not need to be synchronized by the caller; the Hashtable will perform internal synchronization automatically.

However, when traversing an enumeration of the elements in the Hashtable, the automatic synchronization does not help, so in these cases the Network Server code explicitly encloses these traversals in synchronized blocks.
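
A minimal sketch of such a traversal follows. The class TraversalSketch and the key and value types are assumptions for the example. Because Hashtable's own methods synchronize on the table object itself, holding that same monitor for the whole loop keeps other threads from calling put() or remove() in the middle of the enumeration.

```java
import java.util.Enumeration;
import java.util.Hashtable;

class TraversalSketch {
    // Counts the entries while holding the table's monitor, so the
    // table cannot change between hasMoreElements() and nextElement().
    static int countSessions(Hashtable<Integer, String> sessionTable) {
        int n = 0;
        synchronized (sessionTable) {
            for (Enumeration<String> e = sessionTable.elements();
                    e.hasMoreElements(); ) {
                e.nextElement();
                n++;
            }
        }
        return n;
    }
}
```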

threadList synchronization

The threadList object is a Java Vector, which is also an automatically-synchronized collection object. So, again, calls to threadList.addElement(), threadList.get(), threadList.size(), etc. do not need to be synchronized by the caller.

However, when traversing the entire threadList, the automatic synchronization does not help, so in these cases the Network Server code explicitly encloses the traversals in synchronized blocks.

Unfortunately, the Network Server code is inconsistent about these traversals. These traversals are enclosed in synchronized blocks:

However, these traversals are not enclosed in synchronized blocks:

This inconsistent synchronization is not explained in the code.

runQueue synchronization

The runQueue object is also a Java Vector, so again it is an automatically-synchronized object, and again the collection traversals need to be explicitly synchronized.

Furthermore, the runQueue object is also used for wait() and notify() calls, so some additional synchronization is needed to perform these operations.

And the runQueue synchronization is also used to protect the freeThreads variable, which is the count of the number of threads currently waiting on the runQueue object.

Unfortunately, the Network Server code is again inconsistent in its use of the explicit synchronization. There is inconsistent or missing synchronization in at least these locations:

And it seems to me that the getFreeThreads() method is inherently dangerous, since even though it is internally synchronized, the instant that the method returns to its caller the value could change.
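
To illustrate the concern, here is a hedged sketch (SpawnDecisionSketch and its members are invented for the example and do not mirror the real Network Server fields beyond the freeThreads idea). A getter that returns the count can only ever give the caller a possibly stale value; one way to avoid acting on stale data is to make the check and the resulting decision under the same monitor.

```java
class SpawnDecisionSketch {
    private final Object lock = new Object();
    private final int maxThreads;
    private int threadCount = 0;
    private int freeThreads = 0;

    SpawnDecisionSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    void threadBecameIdle() { synchronized (lock) { freeThreads++; } }
    void threadBecameBusy() { synchronized (lock) { freeThreads--; } }

    // Internally synchronized, but the value may be out of date as soon
    // as the monitor is released, before the caller can act on it.
    int getFreeThreads() {
        synchronized (lock) { return freeThreads; }
    }

    // Check-and-act in one critical section: returns true if the caller
    // should start a new worker, and reserves the slot for it.
    boolean reserveNewThreadIfNeeded() {
        synchronized (lock) {
            if (freeThreads == 0 && threadCount < maxThreads) {
                threadCount++;
                return true;
            }
            return false;
        }
    }
}
```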

Possible ways to simplify/repair the synchronization

Here are some ideas I've had about possible changes to the synchronization:

freeThreads counter maintenance

As discussed earlier, there is no explicit collection object which contains the set of threads which are currently idle, waiting for work to do.

Instead, the Network Server maintains an integer variable, called freeThreads, to count the number of such threads. Immediately before calling runQueue.wait(), the thread increments this counter, and immediately after returning from runQueue.wait(), the thread decrements this counter.

However, it is also possible for the thread to be interrupted, in which case it doesn't return normally from runQueue.wait(), but rather throws an InterruptedException from that method.

The getNextSession() method handles this possibility by catching the InterruptedException, and decrementing the counter in this case as well.

However, when the Network Server restarts, it also unconditionally sets the counter to 0. This means that there is a latent problem, which I uncovered when I experimented with having the Network Server interrupt the threads during restart processing. What happens is as follows:

The counter then goes negative, and becomes completely useless. The Client Thread thinks that there are plenty of free threads, because the free threads counter is not 0, so it never starts any new threads, but in fact there aren't any free threads (there are -1 or -2 or some number like that).

The symptom is that, after the Network Server is restarted, assuming that it interrupts the threads during restart processing, it then never starts any new threads, although it continues to accept new TCP/IP connections and creates new Session objects.

The fix, I believe, is simply to remove the unconditional setting of the freeThreads counter to 0 in the restart processing, and let the incrementing and decrementing logic in getNextSession() handle all the maintenance of this counter.
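
The arithmetic of the problem can be replayed without any real threads. The sketch below assumes one plausible interleaving (two threads idle, then the restart reset, then both threads interrupted); the class and method names are made up for the illustration.

```java
// Deterministic replay of the counter updates; no real threads involved.
class FreeThreadsCounterSketch {
    private int freeThreads = 0;

    void beforeWait() { freeThreads++; }            // thread goes idle
    void afterWaitOrInterrupt() { freeThreads--; }  // thread leaves wait()
    void restartReset() { freeThreads = 0; }        // unconditional reset
    int get() { return freeThreads; }

    public static void main(String[] args) {
        FreeThreadsCounterSketch c = new FreeThreadsCounterSketch();
        c.beforeWait();            // two idle threads: counter is 2
        c.beforeWait();
        c.restartReset();          // restart forces the counter to 0
        c.afterWaitOrInterrupt();  // each interrupted thread still
        c.afterWaitOrInterrupt();  // decrements on its way out
        System.out.println(c.get());  // prints -2
    }
}
```

Without the restartReset() call, the same sequence ends at 0, which is the behavior the proposed fix relies on.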

Benign wakeups and notifyAll

There is a standard convention for thread programming, which dates back to Andrew Birrell's classic paper "An Introduction to Programming with Threads", for handling the waiting and waking up of threads.

As Birrell explains (pages 14-15), threads which are waiting on a resource should always be coded as:

You will note that the getNextSession() method is already coded in the correct style, which is good.

When the wait loop is coded in this fashion, Birrell explains, one benefit is (p. 15):

Here is where there is a slight issue in the NetworkServer code. In the runQueueAdd() method, note that we currently issue runQueue.notify(). This means that we are waking up at most one thread to go handle the newly-queued Session.

Unfortunately, at this point, runQueueAdd() does not have the proper information to be certain that it is waking up the correct thread. In particular, in the case of a NetworkServer restart, what if some of the waiting threads happened to be "poisoned" (threads that had been closed by the restart code)?

If we simply change runQueueAdd() so that it calls notifyAll(), rather than notify(), then everything will still function properly even if the current mix of threads waiting on the runQueue contains both closed threads and non-closed threads. All of those threads will be awoken; one non-closed thread will eventually grab the session and go run it; all the other threads will either detect that they have been closed and break out of getNextSession() or detect that there are no more Sessions to run and go back to sleep.
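
A small sketch of this idea follows, under stated assumptions: NotifyAllSketch and PoisonableWorker are invented names, sessions are represented by plain strings, and the closed flag is set directly rather than through the real restart code. The wait loop re-tests its condition after every wakeup, so waking every waiter is harmless: a closed thread drops out, and a live thread takes the session.

```java
import java.util.Vector;

// Hypothetical worker that may be "poisoned" (closed) while it waits.
class PoisonableWorker extends Thread {
    private final NotifyAllSketch queue;
    volatile boolean closed = false;  // set by the (assumed) restart code
    volatile String handled = null;   // session this worker ended up with

    PoisonableWorker(NotifyAllSketch queue) {
        this.queue = queue;
        setDaemon(true);
    }

    @Override
    public void run() {
        handled = queue.getNextSession(this);
    }
}

class NotifyAllSketch {
    private final Vector<String> runQueue = new Vector<>();
    private int waiting = 0;

    // Returns a session, or null if the worker was closed or interrupted.
    String getNextSession(PoisonableWorker w) {
        synchronized (runQueue) {
            waiting++;
            try {
                // Re-test the condition after every wakeup.
                while (!w.closed && runQueue.isEmpty()) {
                    runQueue.wait();
                }
                if (w.closed) {
                    return null;  // leave the session for a live thread
                }
                return runQueue.remove(0);
            } catch (InterruptedException e) {
                return null;
            } finally {
                waiting--;
            }
        }
    }

    void runQueueAdd(String session) {
        synchronized (runQueue) {
            runQueue.addElement(session);
            runQueue.notifyAll();  // wake every waiter, closed or not
        }
    }

    int waitingThreads() {
        synchronized (runQueue) { return waiting; }
    }
}
```

With notify() in place of notifyAll(), the single wakeup could go to the closed worker, which would return without taking the session and leave it sitting in the queue.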

Changing the Session Management code

I think that there are a number of problems in the Session Management code which need to be addressed, as part of the bugs DERBY-1326 and DERBY-51, and possibly others.

However, before we proceed to changing the code, I want to make sure that we are correctly understanding the current operation of the code, so I'm only taking the wiki page this far right now, and I intend to return to this page at a later date and start to propose changes to the code, once we've reviewed the material so far.

NetworkServerSessionManagement (last edited 2009-09-20 22:11:54 by localhost)