The DataDictionary implementation contains several major caches of descriptors:

Since a TableDescriptor object is the root of a tree of objects describing that table (its columns, its constraints, its triggers, its conglomerates), caching the TableDescriptor also implicitly caches the table's ColumnDescriptors, its ConstraintDescriptors, its TriggerDescriptors and so forth.
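The effect of caching a tree root can be sketched in a few lines of Java. This is only an illustration: `TableDesc`, `ColumnDesc`, `TableDescCache` and the `diskReads` counter are hypothetical stand-ins invented for this example, not Derby's actual classes, and the "load" step simply fabricates two columns instead of reading the SystemTables.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Derby's descriptor classes; not the real API.
class ColumnDesc {
    final String name;
    ColumnDesc(String name) { this.name = name; }
}

class TableDesc {
    final String name;
    // Child descriptors hang off the table descriptor.
    final List<ColumnDesc> columns = new ArrayList<>();
    TableDesc(String name) { this.name = name; }
}

class TableDescCache {
    private final Map<String, TableDesc> byName = new HashMap<>();
    int diskReads = 0; // counts simulated reads of the system tables

    TableDesc get(String tableName) {
        TableDesc td = byName.get(tableName);
        if (td == null) {
            td = loadFromSystemTables(tableName);
            byName.put(tableName, td);
        }
        return td;
    }

    // Simulates the expensive path: building the whole tree from disk.
    private TableDesc loadFromSystemTables(String tableName) {
        diskReads++;
        TableDesc td = new TableDesc(tableName);
        td.columns.add(new ColumnDesc("ID"));
        td.columns.add(new ColumnDesc("NAME"));
        return td;
    }
}
```

Because only the root is stored in the map, a second `get` for the same table returns the same `TableDesc` instance and therefore the same child `ColumnDesc` objects, with no second simulated disk read.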

Caching is crucial to DataDictionary performance; without it, Derby would constantly need to read metadata from the SystemTables on disk. Sharing one copy of the DataDictionary information in memory among many users also reduces the memory footprint. The DataDictionary therefore tries to read the SystemTables information into memory as rarely as possible, and to hold it in memory as long as possible.

There are two reasons why the DataDictionary can't always do this:

When the database schema is modified, the DataDictionary uses a very simple mechanism: it empties the caches and starts over. It does not make any special efforts to determine which cached information has become invalid, but instead just removes it all.
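The "empty everything and start over" policy can be illustrated with a toy cache. This is a sketch under assumptions, not Derby's implementation: `FlushAllCache`, `onSchemaChange` and the `loads` counter are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy cache; illustrates "flush everything" invalidation only.
class FlushAllCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;
    int loads = 0; // how many times the loader had to run

    FlushAllCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    // Called on any schema change: no analysis of which entries are stale,
    // every entry is dropped and reloaded on demand afterwards.
    void onSchemaChange() { entries.clear(); }

    int size() { return entries.size(); }
}
```

The trade-off is simplicity against reload cost: nothing can be left stale, but the first access to every descriptor after a schema change pays the full load price again.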

LanguageSystem code which accesses the DataDictionary has to follow the reading/writing protocol in order to ensure the correct operation of the caches. This protocol involves calling startReading / doneReading when reading information from the DataDictionary, and calling startWriting / doneWriting when updating the database schema.

For example, CreateTableConstantAction calls startWriting when it is creating a new table in the database. It then generates a new TableDescriptor for the new table and calls addDescriptor to add the information about the table to the SystemTables.
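The shape of the protocol can be sketched as follows. The method names startReading, doneReading, startWriting, doneWriting and addDescriptor come from the text above, but everything else is an assumption made for illustration: `ToyDataDictionary` and `CreateTableSketch` are hypothetical classes, the signatures are simplified and are not Derby's actual method signatures, and the model is single-threaded, so where a real implementation would presumably wait for readers this sketch simply throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded model of the protocol; not Derby's real API.
class ToyDataDictionary {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> systemTables = new HashMap<>();
    private int activeReaders = 0;
    private boolean writing = false;

    void startReading() { activeReaders++; }

    void doneReading() {
        if (activeReaders == 0) throw new IllegalStateException("doneReading without startReading");
        activeReaders--;
    }

    void startWriting() {
        // Simplification: throw instead of waiting for readers to finish.
        if (activeReaders > 0) throw new IllegalStateException("readers still active");
        writing = true;
        cache.clear(); // schema is about to change, so drop all cached entries
    }

    void doneWriting() { writing = false; }

    void addDescriptor(String tableName, String descriptor) {
        if (!writing) throw new IllegalStateException("call startWriting first");
        systemTables.put(tableName, descriptor);
    }

    String getTableDescriptor(String tableName) {
        if (activeReaders == 0 && !writing) throw new IllegalStateException("call startReading first");
        if (writing) return systemTables.get(tableName); // bypass the cache while writing
        return cache.computeIfAbsent(tableName, systemTables::get);
    }

    int cachedCount() { return cache.size(); }
}

class CreateTableSketch {
    // Mirrors the CreateTableConstantAction example: write mode, then add the descriptor.
    static void createTable(ToyDataDictionary dd, String name) {
        dd.startWriting();
        try {
            dd.addDescriptor(name, "TableDescriptor(" + name + ")");
        } finally {
            dd.doneWriting();
        }
    }

    static String lookup(ToyDataDictionary dd, String name) {
        dd.startReading();
        try {
            return dd.getTableDescriptor(name);
        } finally {
            dd.doneReading();
        }
    }
}
```

Pairing each start call with its done call in a try/finally block is one way to make sure the dictionary is never left in reading or writing mode when an exception is thrown.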

In general, the DataDictionary caching mechanism is trouble-free and efficient. However, at times it may be useful to understand its operation, both for performance reasons and for debugging reasons.

For performance analysis, the DataDictionary cache has the following properties:

For debugging, consider DERBY-1724, a case in which DataDictionary caching plays a role. DERBY-1724 was a manifestation of DERBY-1583, an underlying bug involving an incorrect assumption about the ColumnDescriptor object.

A ColumnDescriptor object may or may not have an internal pointer to a corresponding TableDescriptor object. When a ColumnDescriptor is first created by SYSCOLUMNSRowFactory, it does not have a TableDescriptor pointer, because not all ColumnDescriptor objects are necessarily tied to particular tables; some may be expressions computed at runtime, for instance. At the point where FromBaseTable determines that a particular ColumnDescriptor is definitely tied to a particular TableDescriptor, it sets the table descriptor pointer in that ColumnDescriptor.

Since ColumnDescriptor objects are cached, this updated object remains in memory for subsequent use. Code which uses the ColumnDescriptor therefore may or may not find that the table descriptor pointer has already been set, depending on whether the cache has managed to retain the descriptor in memory since the pointer was set. To close the chain of logic, the DERBY-1724 bug script contains a DDL statement (GRANT) in a transaction, which causes the cache to be disabled and thus enables the conditions for the DERBY-1583 bug to be triggered.
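The dependence on cache state can be reproduced in miniature. The sketch below is not the Derby code involved in DERBY-1583: `TableRef`, `ColumnRef`, `ColumnCache` and the `enabled` flag are hypothetical names, and the cache is reduced to a map that can be switched off.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not Derby's real classes.
class TableRef {
    final String name;
    TableRef(String name) { this.name = name; }
}

class ColumnRef {
    final String name;
    private TableRef table; // null until some caller binds it
    ColumnRef(String name) { this.name = name; }
    void setTable(TableRef t) { table = t; }
    TableRef getTable() { return table; }
}

class ColumnCache {
    private final Map<String, ColumnRef> cache = new HashMap<>();
    boolean enabled = true;

    ColumnRef get(String columnName) {
        // With caching disabled, every call builds a fresh, unbound object.
        if (!enabled) return new ColumnRef(columnName);
        return cache.computeIfAbsent(columnName, ColumnRef::new);
    }
}
```

With the cache enabled, a pointer set by one caller is visible to the next caller because both receive the same object. With the cache disabled, each caller gets a new object whose table pointer is still null, so code that assumes the pointer is always set only fails in that mode.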

DataDictionaryCaching (last edited 2009-09-20 22:11:46 by localhost)