Q: Why does Apache Derby prefer bytecode generation over reflection or other techniques?

A: A SQL interpreter needs to evaluate lots of expressions. Consider the following statement:

  select a+b from t where c*d > e*f;

In this statement, the SQL interpreter has to evaluate

  c*d > e*f

To evaluate expressions like this one, every SQL interpreter that we know of generates bytecode. In most SQL interpreters, the bytecode format is proprietary, and at run time the bytecode is executed by a special-purpose virtual machine built into the SQL interpreter.
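
To make the traditional approach concrete, here is a minimal sketch of a made-up bytecode and the small virtual machine that runs it. Everything in it (the opcodes LOAD, MUL and GT, the class name ToyExpressionVm, the row layout) is hypothetical and invented for illustration; it is not taken from Derby or from any other real SQL engine.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy bytecode and virtual machine, for illustration only.
final class ToyExpressionVm {
    // Opcodes of the made-up instruction set.
    static final int LOAD = 0; // push the column whose row index follows the opcode
    static final int MUL = 1;  // pop two values, push their product
    static final int GT = 2;   // pop right, then left; push 1 if left > right, else 0

    // Program for: c*d > e*f, assuming columns c, d, e, f sit at row indexes 0..3.
    static final int[] PROGRAM = {
        LOAD, 0, LOAD, 1, MUL,
        LOAD, 2, LOAD, 3, MUL,
        GT
    };

    // The interpreter loop: this is the "special virtual machine".
    static boolean evaluate(int[] program, long[] row) {
        Deque<Long> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < program.length) {
            int op = program[pc++];
            switch (op) {
                case LOAD:
                    stack.push(row[program[pc++]]);
                    break;
                case MUL: {
                    long right = stack.pop();
                    long left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                case GT: {
                    long right = stack.pop();
                    long left = stack.pop();
                    stack.push(left > right ? 1L : 0L);
                    break;
                }
                default:
                    throw new IllegalStateException("unknown opcode " + op);
            }
        }
        return stack.pop() != 0L;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(PROGRAM, new long[] {3, 4, 2, 5})); // 12 > 10, prints true
        System.out.println(evaluate(PROGRAM, new long[] {1, 2, 3, 4})); // 2 > 12, prints false
    }
}
```

Note that when such an interpreter is itself written in Java, the loop in evaluate() is in turn executed by the Java virtual machine.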

Derby is different: it doesn't define its own proprietary bytecode, and it doesn't supply a special virtual machine in its execution layer. Instead, Derby uses Java bytecode and the Java virtual machine for expression evaluation.

For a pure-Java database like Derby, the traditional approach would mean evaluating every expression on two virtual machines: first the proprietary virtual machine and then, underneath it, the Java virtual machine. The intended advantages of the Derby approach are:

  1. Derby doesn't have to maintain its own proprietary virtual machine.
  2. Eliminating one of the virtual machines should yield a measurable performance gain for queries that evaluate expressions over many rows.
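
As a rough illustration of the single-virtual-machine idea, the sketch below is a hand-written stand-in for the kind of class a bytecode generator could emit for c*d > e*f. The names DirectBytecodeSketch, RowPredicate and CdGreaterThanEf are hypothetical, as is the row layout; the classes that Derby actually generates look different. The point is only that the expression ends up as ordinary JVM instructions, with no second interpreter loop in between.

```java
// Hypothetical names throughout; this is not Derby's generated code.
final class DirectBytecodeSketch {
    // Assumed shape of a per-row predicate.
    interface RowPredicate {
        boolean test(long[] row);
    }

    // Stand-in for a generated class evaluating: c*d > e*f
    // with columns c, d, e, f assumed to sit at row indexes 0..3.
    static final class CdGreaterThanEf implements RowPredicate {
        @Override
        public boolean test(long[] row) {
            // Compiles to plain JVM instructions (array loads, lmul, lcmp, a branch).
            return row[0] * row[1] > row[2] * row[3];
        }
    }

    // Applies the predicate to every row, as a WHERE clause would.
    static int countMatches(RowPredicate predicate, long[][] rows) {
        int matches = 0;
        for (long[] row : rows) {
            if (predicate.test(row)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[][] rows = { {3, 4, 2, 5}, {1, 2, 3, 4}, {6, 6, 5, 7} };
        int matches = countMatches(new CdGreaterThanEf(), rows);
        System.out.println(matches + " of " + rows.length + " rows match"); // prints 2 of 3 rows match
    }
}
```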

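Reflection is a separate consideration, relevant to expressions that invoke external routines written in Java. Generated bytecode can call such a routine directly, whereas the alternative is to look the routine up and invoke it through reflection. When the bytecode generator was created 15 years ago, Java reflection was an expensive operation. We believe that the performance of reflection has improved significantly since then.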
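
The two ways of calling an external routine can be sketched as follows. The routine doubleIt and the class RoutineCallSketch are hypothetical placeholders for a user-written Java function; the sketch only shows the difference in call style and makes no claim about relative cost on any particular JVM.

```java
import java.lang.reflect.Method;

// Hypothetical example; the routine and class names are invented for illustration.
final class RoutineCallSketch {
    // Stand-in for an external routine written in Java.
    public static int doubleIt(int value) {
        return 2 * value;
    }

    // Reflection: find the method by name at run time, then invoke it.
    static int callReflectively(Class<?> owner, String methodName, int argument) {
        try {
            Method routine = owner.getMethod(methodName, int.class);
            return (Integer) routine.invoke(null, argument); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not invoke " + methodName, e);
        }
    }

    // Direct call: what generated bytecode can do, a plain static invocation.
    static int callDirectly(int argument) {
        return doubleIt(argument);
    }

    public static void main(String[] args) {
        System.out.println(callReflectively(RoutineCallSketch.class, "doubleIt", 21)); // prints 42
        System.out.println(callDirectly(21)); // prints 42
    }
}
```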