Derby Database Corruption Issues

Derby is a very stable database.

However, just like with any DBMS, it is still possible to corrupt a database.

This document explores some of the issues related to database corruption, including things that might cause corruption, and techniques for dealing with a corrupted database.

Categories of Corruption

There are two types of corruption

Possible Causes of Database Corruption

The Derby Developers community is always attempting to ensure database corruption is as unlikely as is possible. However, there are some issues that have shown in the past to lead to database corruption:

Prevention/Preparedness

Matching the list of possible causes above, one can do the following to prepare and prevent database corruption:

Salvage

If the database has become corrupt, and you do not have a usable backup, there are still a couple of approaches for salvaging as much as possible from the damaged databases. This is not a guaranteed mechanism of restoring a database - depending on the corruption, the tools may not be able to access the database, and even if they do, in the end, there may still be data missing, and it's impossible to say what. Having a proper backup plan, and restoring from backup is still a better option.

Optional Tools for working with corrupt databases

There are some optional tools that have been written to facilitate crawling a database in the face of corruption. These are not part of the product distributions, and at time of writing this (December 2013), they are not even checked-in into the source code. To use them (currently), you will have to download the various tools from their JIRAs and compile with access to the derby.jars. These tools are (partly) based on some documentation on the Apache Derby web site:

The optional tools are found in these JIRAs:

Note: currently, to compile LogFileReader.java in trunk, you need to adjust the permissions of the method org.apache.derby.iapi.services.io.RegisteredFormatIds.TwoByte from private to public.

ListOfTables

First it is helpful to understand the connection between tablenames and the .dat filenames. See this page: ListFileNamesOfTables. A quick way to go from container/conglomerate number is to convert the container number from dec to hex. Then add 'c' to the front and '.dat' as extension. If the hex number ends in 0 it is a base table, and if it ends in 1, it is a btree (index).

RawDBReader

The first of these tools is DataFileVTI and RawDBReader.java of DERBY-6136. The JIRA issue provided a .sql file which can be used to access a database. You have to change the path in the .sql.

Note, that you *do* need to know the passwords of the various users to access their schemas. Otherwise, the tool will give an error like this:  The exception 'java.lang.Exception: APP is not the owner of the database 

Note also, that if you've built the tools using trunk, and the database was created with an older Derby version, you need the special alpha-version upgrade flag, e.g.:  java -Dderby.database.allowPreReleaseUpgrade=true org.apache.derby.tools.ij dataFileVTI-1.sql 

Which for instance for the syschemas showed this:

ij> select * from sysschemas;
    SCHEMAID|SCHEMANAME|AUTHORIZATIONID
    -------------------------------------------------------------------
    c013800d-00f8-5b53-28a9-00000019ed88|SYSIBM|DBA
    8000000d-00d0-fd77-3ed8-000a0a0b1900|SYS|DBA
    c013800d-00fb-2641-07ec-000000134f30|SYSCAT|DBA
    c013800d-00fb-2642-07ec-000000134f30|SYSFUN|DBA
    c013800d-00fb-2643-07ec-000000134f30|SYSPROC|DBA
    c013800d-00fb-2644-07ec-000000134f30|SYSSTAT|DBA
    c013800d-00fb-2647-07ec-000000134f30|NULLID|DBA
    c013800d-00fb-2648-07ec-000000134f30|SQLJ|DBA
    c013800d-00fb-2646-07ec-000000134f30|SYSCS_DIAG|DBA
    c013800d-00fb-2649-07ec-000000134f30|SYSCS_UTIL|DBA
    80000000-00d2-b38f-4cda-000a0a412c00|APP|APP
    2faf0139-011a-2edf-f872-00000306a538|TSTSCHEMA|DIDA

A second mechanism to access the database is by registering a custom tool. You need to have a database physically upgraded to at least 10.10 or the syscs_register_tool procedure will not be available:

 call syscs_util.syscs_register_tool('customTool',<register_or_un>,'<name of the class>,'<controlschemaname>',<tableprefix>, '<database_location>','<bootpwd>','<dbo_username>','<dbo_password'>); 

For example:

 call syscs_util.syscs_register_tool('customTool',true,'RawDBReader','CONTROL','RAW_', 'c:/10tst/opttools/corruptdb/TST',null,'APP',null); 

At this point, you can select contents from the corrupted database. Tor instance, if there's a table 'sched' in the schema 'TSTSCHEMA', like so:  select * from RAW_TSTSCHEMA.SCHED; 

I found, that a good way to effectively see all schemas and their tables was to run this query in ij (after issuing maximumdisplaywidth 20;)

ij> SELECT SCHEMANAME, TABLENAME FROM sys.sysschemas s, sys.systables t WHERE s.schemaid = t.schemaid

SCHEMANAME          |TABLENAME
-----------------------------------------
...
TSTSCHEMA          |SCHED
RAW_TSTSCHEMA      |SCHED
...

and you unregister the tool like so;  call syscs_util.syscs_register_tool ('customTool',false,'RawDBReader','CONTROL','RAW_'); 

The schemas starting with 'RAW_' will be removed again when you unregister the tool.

See DERBY-6136: for more examples when the database is encrypted, or using other authentication mechanisms.

TableSignatureReader.java and DataFileReader.java

This is from DERBY-5201. The intention is to read the seg0.

TableSignatureReader

This tool prints out the signature of a table, which is needed for the DataFileReader. This tool did not work with just compiled classes for me, it gave a 'java.sql.Exception: No suitable driver". But it worked with jars (derbyrun.jar) in the classpath.

Usage:  java TableSignatureReader connectionURL schemaName tableName 

where

   connectionURL e.g. "jdbc:derby:db1"
   schemaName case-sensitive schema name, e.g. APP
   tableName case-sensitive table name, e.g. T1

Here's an example use...  java TableSignatureReader "jdbc:derby:db1" SYS SYSALIASES 

...which prints out the following result:

( "ALIASID" char( 36 ), "ALIAS" varchar( 128 ), "SCHEMAID" char( 36 ), "JAVACLASSNAME" long varchar, "ALIASTYPE" char( 1 ), "NAMESPACE" char( 1 ), "SYSTEMALIAS" boolean, "ALIASINFO" serializable, "SPECIFICNAME" varchar( 128 ) )

DataFileReader

Usage:  java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] [ -e $encryptionAttributes $serviceProperties ] 

where

   -v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do not set this flag, the tool just decodes the page headers.
   -d Data signature. This makes a verbose printout turn the field data into objects. $D is a row signature, e.g., "( a int, b varchar( 30 ) )"
   -p Starting page. $P is a number which must be at least 1, the first page to read after the header. Page 0 (the header) is always read.
   -n Number of pages to read. $N is a positive number. Defaults to all subsequent pages.
   -e If the database is encrypted, you must supply the encryption attributes and the location of service.properties.

For example, the following command deserializes all of the records in the SYSCONGLOMERATES file:  java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" 

The following command decodes the entire SYSCOLUMNS conglomerate:  java DataFileReader db/seg0/c90.dat -v -d "( a char(36), b char(128), c int, d serializable, e serializable, f char( 36 ), g bigint, h bigint, i bigint )" 

Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects.

The following example decrypts and deserializes an entire SYSCONGLOMERATES file, dumping the result into an xml file for inspection:

 java DataFileReader wombat/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" -e Wednesday wombat/service.properties > z.xml 

Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined types and for the system columns which are objects.

Here are examples of using this tool on encrypted databases:

 java DataFileReader encryptedDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e "encryptionKey=abcd1234efab5678" encryptedDB/service.properties > ~/junk/z.xml 

 java DataFileReader bootpasswordDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e "bootPassword=mysecretpassword" bootpasswordDB/service.properties > ~/junk/zz.xml 

Other examples of usage:

1) Decode an entire data file, putting the resulting xml in the file z.xml. You can then view that file using a browser like Firefox, which lets you collapse and expand the elements.  java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar( 128), e boolean, f serializable, g boolean, h char( 36 ) )" > z.xml 

2) Pretty-print the file header:

java DataFileReader db/seg0/c20.dat -n 1
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<dataFile>
  <fileHeader>
    <allocPage number="0">
      <formatableID>118</formatableID>
      <pageHeader>
        <isOverFlowPage>false</isOverFlowPage>
        <status hexvalue="1">
          <flag>VALID_PAGE</flag>
        </status>
        <pageVersion>9</pageVersion>
        <slotsInUse>0</slotsInUse>
        <nextRecordID>6</nextRecordID>
        <pageGeneration>0</pageGeneration>
        <previousGeneration>0</previousGeneration>
        <beforeImagePageLocation>0</beforeImagePageLocation>
        <deletedRowCount>1</deletedRowCount>
      </pageHeader>
      <nextAllocPageNumber>-1</nextAllocPageNumber>
      <nextAllocPageOffset>0</nextAllocPageOffset>
      <containerInfoLength>80</containerInfoLength>
      <containerInfo>
        <formatableID>116</formatableID>
        <containerStatus hexvalue="0">
        </containerStatus>
        <pageSize>4096</pageSize>
        <spareSpace>0</spareSpace>
        <minimumRecordSize>12</minimumRecordSize>
        <initialPages>1</initialPages>
        <preAllocSize>8</preAllocSize>
        <firstAllocPageNumber>0</firstAllocPageNumber>
        <firstAllocPageOffset>0</firstAllocPageOffset>
        <containerVersion>0</containerVersion>
        <estimatedRowCount>71</estimatedRowCount>
        <reusableRecordIdSequenceNumber>0</reusableRecordIdSequenceNumber>
        <spare>0</spare>
        <checksum>2463908068</checksum>
      </containerInfo>
      <allocationExtent>
        <extentOffset>4096</extentOffset>
        <extentStart>1</extentStart>
        <extentEnd>10216</extentEnd>
        <extentLength>8</extentLength>
        <extentStatus hexvalue="30000010">
          <flag>HAS_UNFILLED_PAGES</flag>
          <flag>KEEP_UNFILLED_PAGES</flag>
          <flag>NO_DEALLOC_PAGE_MAP</flag>
        </extentStatus>
        <preAllocLength>7</preAllocLength>
        <reserved1>0</reserved1>
        <reserved2>0</reserved2>
        <reserved3>0</reserved3>
        <freePages totalLength="8" bitsThatAreSet="0"/>
        <unFilledPages totalLength="8" bitsThatAreSet="1"/>
      </allocationExtent>
    </allocPage>
  </fileHeader>
  <pageCount>1</pageCount>
</dataFile>

3) Count the number of pages in a data file:

 java DataFileReader db/seg0/c20.dat | grep pageCount <pageCount>9</pageCount> 

4) Decode 3 pages, starting at page 2. This one is a little tricky because the header page is always decoded. So you need to ask for 4 pages (3 data pages plus 1 header page):

 java DataFileReader db/seg0/c20.dat -v -p 4 -n 3 

LogFileReader and ControlFileReader

These are from DERBY-5195: Create tools for browsing the files in the database log directory.

LogFileReader

Usage:  java LogFileReader $logFileName [ -v ] [ -p $P ] [ -n $N ] [ -e $bootPassword $serviceProperties ]  where

   -v Verbose. Deserialize the logged operations. If you do not set this flag, the tool just decodes the wrapper headers.
   -p Starting position. $P is a positive number, the offset of the first log entry to read. This causes the tool to skip reading the file header as well.
   -n Number of records to read. $N is a non-negative number. If you do not specify this flag, the tool prints all subsequent log entries.
   -e If the database is encrypted, you must supply the boot password and the location of service.properties.

The following example decrypts and deserializes an entire log file, dumping the result into an xml file for inspection:

 java LogFileReader wombat/log/log1.dat -v -e Wednesday wombat/service.properties > z.xml 

More examples:

1) Decode an entire log file, putting the resulting xml in the file z.xml. You can then view that file using a browser like Firefox, which lets you collapse and expand the elements. Because the -v switch is specified, the contents of the logged operations are deserialized and <details> elements are populated with the toString() results:

 java LogFileReader db/log/log2.dat -v > z.xml 

2) Pretty-print a log file header:

java LogFileReader db/log/log2.dat -n 0
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<logFile>
  <formatableID>128</formatableID>
  <obsoleteVersion>9</obsoleteVersion>
  <logFileNumber>2</logFileNumber>
  <previousLogRecordEndInstant>
    <logFileNumber>1</logFileNumber>
    <position>11510</position>
  </previousLogRecordEndInstant>
  <logRecords>
  </logRecords>
  <recordCount>0</recordCount>
</logFile>

3) Count the number of entries in a log file:

java LogFileReader db/log/log2.dat | grep recordCount
<recordCount>156</recordCount>

4) Decode 3 log entries, starting at a given record offset:

java LogFileReader db/log/log2.dat -v -p 29363 -n 3
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<logFile>
  <logRecords>
    <wrapper>
      <forwardLength>1818</forwardLength>
      <logInstant>
        <logFileNumber>2</logFileNumber>
        <position>29363</position>
      </logInstant>
      <logRecord>
        <formatableID>129</formatableID>
        <groups hexvalue="180">
          <flag>XA_NEEDLOCK</flag>
          <flag>RAWSTORE</flag>
        </groups>
        <transactionID value="192"/>
        <operation type="org.apache.derby.impl.store.raw.data.PurgeOperation">
          <details>Page Operation: Page(2,Container(0, 81)) pageVersion 75 : Purge : 36 slots starting at 37 (recordId=43) (recordId=44) (recordId=45) (recordId=46) (recordId=47) (recordId=48) (recordId=49) (recordId=50) (recordId=51) (recordId=52) (recordId=53) (recordId=54) (recordId=55) (recordId=56) (recordId=57) (recordId=58) (recordId=59) (recordId=60) (recordId=61) (recordId=62) (recordId=63) (recordId=64) (recordId=65) (recordId=66) (recordId=67) (recordId=68) (recordId=69) (recordId=70) (recordId=71) (recordId=72) (recordId=73) (recordId=74) (recordId=75) (recordId=76) (recordId=77) (recordId=78)</details>
        </operation>
      </logRecord>
      <backwardLength>1818</backwardLength>
    </wrapper>
    <wrapper>
      <forwardLength>90</forwardLength>
      <logInstant>
        <logFileNumber>2</logFileNumber>
        <position>31197</position>
      </logInstant>
      <logRecord>
        <formatableID>129</formatableID>
        <groups hexvalue="180">
          <flag>XA_NEEDLOCK</flag>
          <flag>RAWSTORE</flag>
        </groups>
        <transactionID value="192"/>
        <operation type="org.apache.derby.impl.store.raw.data.InsertOperation">
          <details>Page Operation: Page(1,Container(0, 81)) pageVersion 147 : Insert : Slot=1 recordId=79</details>
        </operation>
      </logRecord>
      <backwardLength>90</backwardLength>
    </wrapper>
    <wrapper>
      <forwardLength>17</forwardLength>
      <logInstant>
        <logFileNumber>2</logFileNumber>
        <position>31303</position>
      </logInstant>
      <logRecord>
        <formatableID>129</formatableID>
        <groups hexvalue="112">
          <flag>LAST</flag>
          <flag>COMMIT</flag>
          <flag>RAWSTORE</flag>
        </groups>
        <transactionID value="192"/>
        <operation type="org.apache.derby.impl.store.raw.xact.EndXact">
          <details>EndXact null Committed : transactionStatus = Committed</details>
        </operation>
      </logRecord>
      <backwardLength>17</backwardLength>
    </wrapper>
  </logRecords>
  <recordCount>3</recordCount>
</logFile>

ControlFileReader

This tool reads the control file in the log directory and pretty-prints it as an xml file.

Usage:

java ControlFileReader db/log/log.ctrl
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<controlFile>
  <formatableID>128</formatableID>
  <obsoleteVersion>9</obsoleteVersion>
  <logCounter>
    <logFileNumber>1</logFileNumber>
    <position>11369</position>
  </logCounter>
  <majorVersion>10</majorVersion>
  <minorVersion>9</minorVersion>
  <subversionRevision>1092193</subversionRevision>
  <flags flags="1">
    <flag>IS_BETA_FLAG</flag>
  </flags>
</controlFile>

java ControlFileReader generated/toursdb/toursdb/log/log.ctrl
<?xml version="1.0" encoding="UTF-8"?><controlFile>
    <formatableID>128</formatableID>
    <obsoleteVersion>9</obsoleteVersion>
    <logCounter>
        <logFileNumber>1</logFileNumber>
        <position>607076</position>
    </logCounter>
    <majorVersion>10</majorVersion>
    <minorVersion>10</minorVersion>
    <subversionRevision>1</subversionRevision>
    <flags flags="0"/>
    <checksum>0</checksum>
</controlFile>

DatabaseCorruption (last edited 2014-12-13 16:32:01 by MikeMatrigali)