HBase Secondary Indexing
This is a design document around different approaches to secondary indexing in HBase.
Eventually Consistent Secondary Indexes using Coprocessors
The basic idea is to use an additional (secondary) table for each index on the main (primary) table. A coprocessor binding to a family would be used to define a given secondary index on that family (or specific column(s) within it). The WAL would be used to ensure durability and a shared queue makes the secondary update async from the callers POV. Normal HBase timestamps would be used for any conflict resolution and to make operations idempotent.
When a Put comes in to the primary table, the following would happen (assuming a single index update to a single secondary table for now):
1. Generate WALEdit for primary table
2. Generate a new, special kind of WALEdit for secondary table update
3. Append and sync both WALEdits to the HLog
4. Apply primary table edits to MemStore and commit RWCC
5. Add secondary table edit job to shared work queue
6. Return to client
The shared queue would be a thread or threadpool that picks up these secondary table edit jobs and applies them using a normal Put operation to the secondary table.
On failover of primary table, primary edits would be replayed normally, and secondary edits would be applied to the secondary table/server as is done with the shared queue.
The big open questions are around how to deal with the WAL and replay.
The secondary table could be offline because of another RS failure, so we may have long-waiting secondary updates. How can we guarantee all secondary updates are applied when evicting an old HLog? Also, we want to avoid over-exploiting operations being idempotent and not just aggressive reapplying everything.
Do we need to keep track of each HLog and it's pending secondary updates and prevent log eviction until they are done?
Or should the workers applying secondary edits write back into the WAL that the edit is complete (and thus durable on the other server so does not need to be replayed if this one fails over)?
Or we could tie secondary edits to each memstore, and the flushing of a memstore can only happen if its secondary edits have all been applied, which would tie in with the existing semantics around log eviction... but that has other implications and won't really help with preventing too much over replay.
Other open questions:
- Creation of secondary tables (auto-bootstrapped? part of coprocessor init? manual?)
- Read API
- Declaration of indexes via API or shell syntax rather than programatically with a coprocessor-per-index
- Creation of indexes on existing tables (build of indexes based on current data and kept up to date)
- Option to apply secondary update in a synchronous fashion (if you want to take performance hit and have stronger consistency of the index)
- Storing of primary table data in secondary table to provide single-lookup denormalized join
Secondary Indexes using Optimistic Concurrency Control
These are implemented by Transactional HBase / IndexedTable.
Currently this lives here: https://github.com/hbase-trx/hbase-transactional-tableindexed
In-memory Secondary Indexes for Indexed Scans
This was implemented once but I'm not sure where it lives anymore.