Table of Contents
In Derby 10.3, a user might create a table with CHAR datatypes along with other datatypes. The task at hand is if the user has asked for territory based collation, then we want these CHAR datatypes to collate differently than the CHAR datatypes that exist today in 10.2 And if the user hasn't requested territory based collation, then we want these CHAR datatypes to collate the way they do in 10.2 today. So, in short, the CHAR datatype in 10.3 will have different collation behavior depending on what user has requested at the database create time. But as far as the user is concerned, they are just SQL CHAR datatypes and not new SQL datatypes.
In the original proposal, the intention was to introduce new internal CHAR datatype which extended current CHAR datatype in Derby. This would have been implemented by having a new format id associated with the new internal CHAR datatypes. But with that proposal, there was overhead associated with implementing new getter methods in DataValueFactory for this new internal datatype and the type compiler associated with the new internal datatype etc. The other issue with the proposal was that there are many places in the code today where we get character datatypes and all of those cases will have to be inidividually investigated to see which CHAR datatype implementation they should use. So, if the character datatype is getting instantiated for CHAR columns in system tables, then we should use existing CHAR datatype implementation. But, if they were getting instantiated for user table, then the new internal CHAR datatype should be instantiated. AND there will be places where we can't determine which one of the two CHAR implementations should we use, for eg a string value in a query 'abc'.
The second proposal(current) was based on the idea that CHAR with territory based collation differs from the CHAR with default collation in only one aspect and ie how they are collated. Rest everything is same. So, as long as we know at the collation time, which kind of collation we are dealing with, we should be fine and hence there is no need to generate new internal CHAR datatypes. With that proposal, at compile time, when we associate a DataTypeDescriptor (DTD) with a character string type, we tell what kind of collation should be associated with that DTD and how was that collation derived "collation derivation". If the collation derivation is "none", then the collation type should be ignored. Otherwise, the collation type associated with DTD can be UCS_BASIC/TERRITORY_BASED. Char columns associated with SYS schemas will always have UCS_BASIC in DTD associated with them. Char columns from user schema will have UCS_BASIC/TERRITORY_BASED depending on what user has requested through COLLATION attribute in the jdbc url at database create time. Char columns that are not associated with a specific schema will have their DTD marked with collation as described by the rules in the section "Collation Determination" later on in this page. So, as you can see, collation information will be saved at the column level in language layer. Store will follow the same granularity and it will write the collation type for each and every column in it's metadata (ie for char datatypes as well as non-char datatypes). This collation type will make sense for only char datatypes. For the other datatypes, collation type will be ignored.
Some of the complexity is coming from the fact that a single database can have 2 different collations associated with it's columns, ie, SYS schema will always use UCS_BASIC for it's collation. But all the user schemas will use either UCS_BASIC/TERRITORY_BASED collation. If the collation was of only one type for the entire database, the design/implementation would have been far easier and we could keep collation information at database level rather than column level.
The thread for these design proposals can be found on Derby Dev list at http://www.nabble.com/Collation-feature-discussion-p9526316.html and http://www.nabble.com/Collation-implementation-WAS-Re%3A-Should-COLLATION-attribute-related-code-go-in-BasicDatabase--p9496608.html
In Derby 10.3, there will be two character sets associated with the database. These 2 character sets will have identical character repertoire(UCS) but they may have different default collation associated with them depending on the value of JDBC url attribute COLLATION. The 2 character sets will be
a)USER character set - default collation of UCS_BASIC/TERRITORY_BASED depending on the value of jdbc url attribute COLLATION specified at create database time.
b)SQL_IDENTIFIER character set - default collation of UCS_BASIC.
(Section 4.2.7 indicates a character set has a default collation).
As per SQL spec, Section 11.1<schema definition>, there is an optional syntax to associate a character set for a schema at create schema time. Syntax Rule 5 says that if a character set is not specified by the user, then the character set associated with schema is implementation defined. In Derby 10.3, system schemas will be associated with SQL_IDENTIFER character set and all the user schemas will be associated with USER character set. Futher on, General Rule 3 specifies that the character set associated with schema is used as the default character set for all <column definitions>. Based on this, all the user character columns will pick up the collation associated with USER character set and all the system character columns will pick the up the collation associated with SQL_IDENTIFIER character set.
Following section 11.4 SR10b) it is as though any user character column is defined using CHARACTER SET USER and any system column using CHARACTER SET SQL_IDENTIFIER. E.g.
CREATE TABLE T (NAME VARCHAR(100))
is equivalent for the purposes of the SQL Standard to this:
CREATE TABLE T (NAME VARCHAR(100) CHARACTER SET USER)
Note Derby does not actually support the CHARACTER SET clause.
The character set specification for string literals in SQL specification is not as well defined as for <column definitions>. SQL spec Section 5.3<literal>, Syntax Rule 14b says that if the character set is not specified for character string literal, then character string literal's character set will be the character set of the SQL-client module. Derby does not implement SQL-client module, but definition of SQL-client module in Section 13.1 says that SQL-client module definition has mandatory <module name clause> which is defined in Section 13.2 <module name clause>. The Syntax Rule 4 in this section says that if a character set is not specified for the SQL-client module, then it's character set is "implementation-defined". For Derby, this implementation-defined character set for SQL-client module will be USER. And hence the string literals will have the USER character set associated with them. June 4th 2007 The current code in Derby 10.3 uses the collation type of current compilation schema for string literals rather than using the collation type associated with USER character set. There is a Jira entry DERBY-2731 and a thread http://www.nabble.com/Collation-and-string-literals-limitation-in-SQL-standard--tf3868448.html#a10959872 started by Dan to see what should be the correct collation type for string literals.
Some other background information on collation derivation and collation type from SQL spec : SQL spec talks about character string types having collation type and collation derivation associated with them (SQL spec Section 4.2.2 Comparison of character strings). If collation derivation says explicit or implicit, then it means that there is a valid collation type associated with the charcter string type. If the collation derivation is none, then it means that collation type can't be established for the character string type.
a)Collation derivation will be explicit if COLLATE clause has been used for character string type (this is not a possibility for Derby 10.3, because we are not planning to support SQL COLLATE clause in this release).
b)Collation derivation will be implicit if the collation can be determined w/o the COLLATE clause eg CREATE TABLE t1(c11 char(4)) then c11 will have collation of USER character set. Another eg, TRIM(c11) then the result character string of TRIM operation will have collation of the operand, c11.
c)Collation derivation will be none if the aggregate methods are dealing with character strings with different collations (Section 9.3 Data types of results of aggregations Syntax Rule 3aii).
Following lists the collation type association for various occurences of character string types and for operations involving character string types in Derby 10.3
1)String literal - USER will be the character set of string literals. And collation type will be the default collation of USER which can be UCS_BASIC/TERRITORY_BASED. The collation derivation will be implicit. June 4th 2007 The current code in Derby 10.3 uses the collation type of current compilation schema for string literals rather than using the collation type associated with USER character set. There is a Jira entry DERBY-2731 and a thread http://www.nabble.com/Collation-and-string-literals-limitation-in-SQL-standard--tf3868448.html#a10959872 started by Dan to see what should be the correct collation type for string literals.
2)<column definition> (of character string type) SQL spec Section 11.1 <schema definition>, General Rule 3 specifies that the character set associated with schema is used as the default character set for all <column definitions>. Based on this, character columns in user tables will have USER character set associated with them and hence collation type of UCS_BASIC/TERRITORY_BASED. The collation derivation will be implicit.
3)<character string type> (SQL spec section 6.1 <data type> Syntax Rule 3b and 16) - Rule 3b says that collation type of character string type is the character set's collation AND rule 16 says that if <character string type> is not contained in a <column definition>, then an implementation-defined character set is associated with the <character string type>. We define Derby's implementation-defined character set for such <character string type> to be current compilation schema's character set. The collation derivation will be implicit.
4)<cast specification> Section 6.12 Syntax Rule 10 says that if the target data type of CAST is character string type, then collation type of result character string of CAST specification is the collation of the character set of the target data type. Based on 3) above, that will be the collation of the current compilation schema's character set. The collation derivation will be implicit.
5)Result character string types from UPPER, LOWER, TRIM(LTRIM, RTRIM), SUBSTR will have the same collation type and derivation as their operand. This comes from SQL spec Section 6.29 <string value function> Syntax Rules 8, 8, 11d, 4 respectively).
6)CHAR, VARCHAR functions do not look like they are defined in the SQL spec. Their behavior can be defined as similar to CAST ie, the result character string of CHAR/VARCHAR will have the same collation as current compilation schema's character set. The collation derivation will be implicit.
6a) XMLSERIALIZE (defined in SQL Standard Part 14) has the declared type of the explicit character type in the function and thus falls under the rules defined in 3).
7)For user defined functions' that return character string type, the return type's collation will have the same collation as of the character set of the schema that the function is defined in. The collation derivation will be implicit.
8)JDBC parameters (ie. ?) where the type of the parameter is a character type will get their collation from the context, just like it is done for other attributes of a character type (like nullability, length etc. More info about how this works can be found at http://db.apache.org/derby/docs/10.2/ref/rrefsqlj1083019.html). Also look at the thread http://www.nabble.com/Collation-and-parameter-markers-%28-%29-tf3866040.html#a10952369
9)For CURRENT_USER, SESSION_USER, SYSTEM_USER, SQL spec Section 6.4 Syntax Rule 4 says that their collation type is the collation of character set SQL_IDENTIFIER. In Derby's case, that will mean, the collation of these functions will be UCS_BASIC. The collation derivation will be implicit.
10)CURRENT ISOLATION, CURRENT SCHEMA and CURRENT SQLID seem to be Derby specific functions, I didn't find them in the SQL spec. But in order to match their behavior with the other functions covered in 9) above, their return character string type's collation will be the collation of character set SQL_IDENTIFIER. The collation derivation will be implicit.
11)Aggregate operators involving all character string type operands(Concatenation (see 6.28SR5b), CASE, NULLIF, COALESCE) will follow SQL spec Section 9.3 Data types of results of aggregations. In other words, if all the operands have the collation derivation of IMPLICIT and same collation type is associated with them, then the collation type of resultant character string type will be same and it's collation derivation will be IMPLICIT. But if this criteria for equality is not satisfied, then the resultant character string type will have collation derivation of NONE. For CASE, NULLIF and COALESCE, this behavior is implemented in ValueNodeList.getDominantTypeServices() and for CONCAT, the behavior is implemented in ConcatenationOperatorNode.resolveConcatOperation().
12)And last but not least, the collation methods will follow the rules defined by SQL spec in Section 9.13 Collation determination Syntax Rules 2 and 3e. In other words, at least one operand shall have a declared type collation (that means if the comparison is sysChar1|userChar1 > sysChar2|userChar2, then such a comparison will fail because both sides of > operator have collation derivation of none after applying the rules of 10 above) AND every operand whose collation derivation is implicit shall have the same declared type collation (that means if the comparison is sysChar1 > userChar1WithTerritoryBasedCollation, then such a comparison will fail because left operand has collation derivation as implicit and collation type as UCS_BASIC and the right operand has collation derivation implicit and collation type as TERRITORY_BASED)
1)Set the correct collation type for return parameter from user defined functions when the return type is a character string type.
2)Some of the numbered Rules in Collation Derivation sections on this page need to be discussed to make sure we are setting the collation right. This will make sure that in future, when we do start to support different collation for different user schemas, things don't fall apart. The discussion has been started by Dan in couple threads (http://www.nabble.com/Collation-and-parameter-markers-%28-%29-tf3866040.html#a10952369 and http://www.nabble.com/Collation-and-string-literals-limitation-in-SQL-standard--tf3868448.html#a10959872).
3)DERBY-2765 Make sure that all the DTDs for character string types have correct collation assigned to them. Just to be sure that we have not missed anything, we should go through the Derby code and look for character string DTD creation and see if the collation is set correctly for them.
o All store releated code changes have been implemented.
1)Add tests for this feature. This a broad umbrella task but I do want to mention some specific tests that we should be testing. Currently tests for this feature can be found in java/testing/org/apache/derbyTesting/functionTests/tests/lang/CollationTest.java and java/testing/org/apache/derbyTesting/functionTests/tests/lang/CollationTest2.java. Below are test cases that need to be verified, as they are verified will move them below to the completed task section.
a)For both a newly created 10.3 database and an upgraded 10.3 database, make sure that metadata continues to show the scale for character datatypes as 0 (rather than the collation type value). That is, test that the scale of the character datatypes is always 0 and it didn't get impacted negatively by the overloading of scale field as collation type in TypeDescriptor.
b)Test case for recovery - have an outstanding transaction with insert/delete/updates that affect one or more character indexes (all with a collation setting that is different from default collation). Make sure those log records get to the log and then crash the server. Restarting the server will then run through the recovery code and will ensure that we test for correct collation usage at recovery time. Mike has put more info about this in DERBY-2336.
c)CREATE VIEW should have collation type UCS_BASIC/TERRIOTRY_BASED assocatied with it's character columns. The exact collation will be determined by what is the value of the COLLATION attribute. This is same as what would happen for CREATE TABLE. Have a test for global temporary tables with character colums too.
d)Write a test case for store btree recovery which will ensure that Locale info in correctly set in DVF by BasicDatabase during recovery case.
e)Make sure that a soft-upgraded pre-10.3 database continues to work with pre-10.3 release, ie the store level column metadata structure should remain unchanged. This ties in with item 1) under Store section above.
f)Add a test case for hard-upgrade of pre-10.3 database. This will cause a change in store level column metadata structure. This ties in with item 1) under Store section above.
g)Add tests for sorts that can fit in memory and the sorts that have to spill over to disk. It will be good to make sure that correction collation get used for them.
h)Add tests to set the collation attribute on the jbdc url and testing it later with VALUES SYSCS_UTIL.SYSCS_GET_DATABASE_PROPERTY('derby.database.collation') Some possible tests are
h1)connect 'jdbc:derby:c:/dellater/db1;create=true;territory=it'; //no collaiton attribute but is should get set to UCS_BASIC.
h3)connect 'jdbc:derby:c:/dellater/db1;create=true;territory=it;collation=CUSTOM_COLLATION'; //invalid value for collation attribute
h4)connect 'jdbc:derby:c:/dellater/db1;create=true;territory=it;collation=TERRITORY_BASED;collation=TERRITORY_BASED';//give same value twice
h5)connect 'jdbc:derby:c:/dellater/db1;create=true;territory=it;collation=TERRITORY_BASED;collation=UCS_BASIC';//give 2 different values. The last value should get picked
h6)connect 'jdbc:derby:c:/dellater/db1;create=true;territory=it;collation=UCS_BASIC;collation=TERRITORY_BASED';//same as vi but valid values in different order
h7)After database create time, try to give collation attribute again. It should get quitely ignored. connect 'jdbc:derby:c:/dellater/db1;create=true;territory=it;collation=TERRITORY_BASED'; connect 'jdbc:derby:c:/dellater/db1;collation=UCS_BASIC';
i)When adding tests for ALTER TABLE, try both add character column AND increase the length of an existing character column.
j)Upgrade a pre-10.3 database and make sure the upgraded database continues to use UCS_BASIC for all collations.
Metadata query changes
1)Fix the metadata queries so that string literals can be compared with character columns from system schema. String literals will have the USER character set associated with them and system character columns will have SQL_IDENTIFIER associated with them. In order to compare them, both the sides of the comparison should come from same character set. Following casting of string literal will achieve that. syscharcol = CAST(string_literal as varchar(128)).
o No network server changes are planned as part of the 10.3 Collation project.
1)Make sure the space padding at the end of various character datatypes is implemented commented correctly in javadocs. This padding is used in collation related methods. For eg check SQLChar.stringCompare method.
1)Subclasses with non-default-collation have been implemented for character types SQLChar, SQLVarchar, SQLLongvarchar and SQLClob. The new subcalsses are called CollatorSQLChar, CollatorSQLVarchar, CollatorSQLLongvarchar and CollatorSQLClob, respectively. They reside in derby.iapi.types package. This work was done by revisions 516864, 516869 and 518479, 524545.
2)The type definition of a data type is described by DTD (DataTypeDescriptor). This DTD will have two additional attributes called collation type and collation derivation. DTD probably will need getter and setter for the 2 additional attributes.(These new attributes only apply to collation sensitive types, namely char datatypes. For other data types, these attributes will be ignored.) As per SQL spec, the collation derivation can hold 3 values, "explicit", "implicit" and "none". In Derby 10.3, the collation derivation will never be "explicit" because Derby 10.3 does not support SQL Standard's COLLATE clause. In Derby 10.3, the collation derivation can be "implicit" or "none". If collation derivation is "none", then it means the collation type can't be determined. This can happen when an aggregate function is working with operands of different collation types. If the result of such an aggregate function is character string type, then it's collation derivation will be "none", ie it can not be determined. Other than this aggregate "none" case, the collation derivation will always be "implicit" and collation type will be (0)UCS_BASIC/(1)TERRITORY_BASED. Which one of the 2 collation types is picked for a character string type is explained in detail in section "Collation Determination". This work was done by revisions 525568 and 525729, 527033, 527250. The work for this line item went in as part Jira entry DERBY-2524.
3)BasicDatabase needs to set the Locale value in the DVF after DVF has booted. DERBY-2535 Make Locale available in DataValueFactory(DVF). This Locale object will be used to construct Collator object if the user has requested territory based collation. Changes went in as part of revision 527034
4)The check to see if two types are comparable or not has been moved out from TypeCompiler into DTD. The new interface in DTD is aptly called "comparable". The reason for this move is while comparing 2 types, now we should check that collations match too when dealing with string types. TypeCompiler does not have collation information and hence the comparable method in it couldn't use collation in determining the result of the comparable method. That is why the comparable method got removed from TypeCompiler. DTD has the collation information and hence it is the right candidate for the comparable method. DERBY-2569 revision 530910
5)WorkHorseForCollatorDatatypes overrides all the collation related methods so that it uses the non-default Collator. All the non-default-collation-sensitive classes have an instance of WorkHorseForCollatorDatatypes which is used to call the collation related methods. This ensures that these collation related methods are implemented in one central place rather than in all the collation-sensitive classes.
6)Cleanedup the new interface DTD.comparable to use isXXX methods rather than looking for StoredFormatIds. DERBY-2569
1)At compile time, we make sure that the correct character class (ex SQLChar vs CollatorSQLChar) is generated so at run time, we do not need to check what kind of Collator object need to be used for character types. It is all handled at the code generation time and the appropriate runtime methods (ex like method in SQLChar vs like method in WorkHorseForCollatorDatatypes) get called at runtime. The changes for this went in as part of DERBY-2583.
1)Set the correct collation type in SchemaDescriptor. On this page, Section Collation Determination, item 2) says that column definitions pick up their collation from the schema that they are defined in. In order for that to happen, schema should have the right collaiton type associated with it. This work of attaching right collation type to a schema went in as part of DERBY-2528 with revision 526237, 526243 and 526313.
2)If collation attribute is specified without the territory attribute, then the territory defaults to the locale of the JVM. DERBY-2669
1)The TypeDescriptor for a character column always has 0 for scale because scale does not apply to character datatypes. Starting Derby 10.3, this scale field in TypeDescriptor will be overloaded to indicate the collate type of the character. So, if user has requested for TERRITORY_BASED collation, then the scale in TypeDescriptor for user columns(character) will be 1(TERRITORY_BASED). The scale will be always 0(UCS_BASIC) for SYS schema character columns and for databases with collation set to UCS_BASIC. These changes will go in readExternal and writeExternal methods of TypeDescriptor. Using the value 0 for UCS_BASIC will ensure that pre-10.3 databases with scale field as 0 in TypeDescriptor will continue to use UCS_BASIC after upgrade to 10.3, because 0 in scale field corresponds to UCS_BASIC collation type. This work was done by revisions 525568 and 525729. The work for this line item went in as part Jira entry DERBY-2524.
1)At the time of database create time, optional JDBC url attribute COLLATION is validated by the boot code in data dictionary and the validated value of COLLATION(if none specified by user, then it will default to UCS_BASIC which is also the only collation available on pre-10.3 databases) attribute is saved as derby.database.collation property in the properties conglomerate. This work was done by revisions 511283, 527199, 527235.
1)DERBY-2534 Add new api "public StringDataValue getValue(RuleBasedCollator)" on StringDataValue. This method will return either the base DVDs for char datatypes or it will return collation sensitive DVD for char datatypes. As part of this Jira entry, added api which will allow Derby's store code to construct the right character type DVD based on the collation type. The changes for this went in as revision 526668.
1)In Derby 10.2, store uses Monitor to create DVD template rows. The logic of creating DVDs using formatids has been factored out from Monitor into DataValueFactory. This way, all data type related code will go in DVF. More discussion on this topic can be found at http://www.nabble.com/how-should-store-get-an-object-based-on-format-id-and-collation-id--tf3562574.html#a9950394
2)This item is related to item 1. With Derby 10.3, collation type will be the additional metadata in store for each column. So, when store calls DVF to create DVD template row, it passes the formatids and the collation types. DVF assoicates the correct Collator with the DVD for Char datatypes depending on the collation type. And in order to find the correct Collator, DVF needs to know the locale of the database. This locale information is set on DVF using a new method on DVF called void setLocale(Locale). This call is made by BasicDatabase after DVF has finished booting and before store starts booting.
3)This item is related to item 2. When DVF gets called by store to create right DVD for given formatid and collation type, for formatids associated with character datatypes, it first creates the base character datatype class which is say SQLChar. Then it calls getValue method on the DVD with the RuleBasedCollator corresponding to the collation type as the parameter. (This RuleBasedCollator will be null for UCS_BASIC collation). The getValue method returns SQLChar or CollatorSQLChar depending on whether RuleBasedCollator is null or not. getValue is the new method which has been added to the interface StringDataValue.
4)When a character column is added using CREATE TABLE/ALTER TABLE, make sure that the correct collate type is populated in the TypeDescriptor's scale field in the SYS.SYSCOLUMNS table. In order to do this, CREATE TABLE/ALTER TABLE need to get their schema descriptor's collation type. The collation type of the schema descriptor will decide the collation type of the character columns defined in CREATE TABLE/ALTER TABLE. This comes from item 2 under Collation Determination section on this page. ALTER TABLE changes should work for both ADD COLUMN and MODIFY character column increase length. Part of the code changes required for this item has gone in revision 526385 and 526454 as part of DERBY-2530. Some more language changes and some store changes need to go in for this task to finish. Those changes involve passing the collation type from language to the store layer as part of CREATE TABLE and ALTER TABLE.
5)Store column level metadata for collate in Store. Store keeps a version number that describes the structure of column level metadata. For existing pre-10.3 databases which get soft upgraded to 10.3, the structure of column level metadata will remain same as 10.2 structure of column level metadata, ie they will not include collate information in their store metadata. For any conglomerate created in a 10.3 new database or a 10.3 hard upgraded database a new version would be used in Store to include information about the collation for each column's metadata stored. This means that during upgrade, store needs to change the sturcture of column level metadata to include collate information.
6)Calls to sorter did not need to change. The current interface requires that a non-empty template be passed in. That template can be used by the sorter to create new objects with correct collation.
Tests 1)Add tests for this feature. This a broad umbrella task but I do want to mention some specific tests that we should be testing. Currently tests for this feature can be found in java/testing/org/apache/derbyTesting/functionTests/tests/lang/CollationTest.java and java/testing/org/apache/derbyTesting/functionTests/tests/lang/CollationTest2.java. Below are test cases that have been implemented.
e)Make sure that a soft-upgraded pre-10.3 database continues to work with pre-10.3 release, ie the store level column metadata structure should remain unchanged. This ties in with item 1) under Store section above. Upgrade tests pass, we could add more test cases but the basic change is to base table and index level metadata which gets exercised by existing cases.
h8)Connect to a pre-10.3 database in soft upgrade mode (tested for all pre 10.3 db's by upgrade suite) connect 'jdbc:derby:c:/dellater/db102';
h9)Upgrade a pre-10.3 database (tested for app pre 10.3 db's by upgrade suite) connect 'jdbc:derby:c:/dellater/db102;upgrade=true';
Related Bugs/Improvements currently not planned for 10.3 release
Type Bugs Type Improvements
1)CollatorSQLChar has a method called getCollationElementsForString which currently gets called by like method. getCollationElementsForString gets the collation elements for the value of CollatorSQLChar class. But say like method is looking for pattern 'A%' and the value of CollatorSQLChar is 'BXXXXXXXXXXXXXXXXXXXXXXX'. This is eg of one case where it would have been better to get collation element one character of CollatorSQLChar value at a time so we don't go through the process of getting collation elements for the entire string when we don't really need. This is a performance issue and could be taken up at the end of the implementation. Comments on this from Dan and Dag can be found in DERBY-2416.
2)TypeDescriptorImpl's readExternal and writeExternal methods currently save only collation type of a character string type column in SYSCOLUMNS's COLUMNDATATYPE column. Collation derivation of character string type column does not get saved anywhere. In this release of Derby, collation derivation of these persistent character string type columns is always "implicit" and hence it is safe even if we don't save the collation derivation anywhere. In readExternal method, we can always initialize the collation derivation to be "implicit" for character string type columns. But in some future release of Derby, it might be possible to define an explicit collation type for a column using SQL's COLLATE clause. In such a case, the collation derivation of persistent column's won't be implicit, rather it will be explicit. In order to support that, may be we should consider saving the collation derivation starting Derby 10.3 itself. Look at thread http://www.nabble.com/-jira--Created%3A-%28DERBY-2524%29-DataTypeDescriptor%28DTD%29-needs-to-have-collation-type-and-collation-derivation.-These-new-fields-will-apply-only-for-character-string-types.-Other-types-should-ignore-them.-p9842379.html
Until DERBY-1478 is implemented, LanguageBasedOrdering page provides an intermediate solution to achieve language based ordering.