In Blur 0.2 we are going to drop the concepts of Rows and Records. They were artificial constructs layered on top of Lucene documents, and they only caused confusion.
Blur will follow Lucene’s data structure as closely as possible.
A Document contains one or more fields. Each field has a String name, a byte array value (a ByteBuffer in Thrift), and a Type that defines what kind of data the value holds.
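The data model above can be sketched in plain Java. This is only an illustration. The class names `Document`, `Field`, and `FieldType` and the listed enum constants are hypothetical and are not the actual Blur 0.2 Thrift types. The UTF-8 and big-endian encodings are also assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical field types; the actual Type values in Blur 0.2 may differ.
enum FieldType { STRING, TEXT, INT }

// A field: a String name, a raw byte value, and a Type describing those bytes.
final class Field {
    final String name;
    final ByteBuffer value;
    final FieldType type;

    Field(String name, ByteBuffer value, FieldType type) {
        this.name = name;
        this.value = value.asReadOnlyBuffer(); // read-only view: bytes cannot be written through this field
        this.type = type;
    }

    // Encodes a string as UTF-8 bytes (an assumption about the wire encoding).
    static Field ofString(String name, String s, FieldType type) {
        return new Field(name, ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)), type);
    }

    // Encodes an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static Field ofInt(String name, int v) {
        return new Field(name, ByteBuffer.allocate(Integer.BYTES).putInt(0, v), FieldType.INT);
    }
}

// A Document is just a non-empty list of fields; there is no Row or Record wrapper.
final class Document {
    private final List<Field> fields;

    Document(Field... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("a Document needs at least one field");
        }
        this.fields = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(fields)));
    }

    List<Field> fields() {
        return fields;
    }
}
```

The Type tells a reader how to decode the bytes. For example, an `INT` field is read back with `value.getInt(0)`.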
- Documents can be added to, updated in, and deleted from a table.
Documents added or updated within a single call (Thrift or MapReduce) will be guaranteed to be written to the same Lucene segment. Some specialized queries will rely on this guarantee. See `IndexWriter.addDocuments` and `IndexWriter.updateDocuments` in the Lucene API for more details.
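The "one call, one segment" contract can be modeled with a toy in-memory table. This is not Blur's or Lucene's implementation. The class `ToyTable` and its methods are hypothetical, and documents are simplified to `Map<String, String>`. To keep the model small, each call gets its own segment. Real Lucene may place several batches in one segment, but it never splits a single batch across segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model of the "one call, one segment" contract (hypothetical, not Blur/Lucene code).
final class ToyTable {
    // Each inner list is one "segment"; a parallel boolean[] tracks deletions.
    private final List<List<Map<String, String>>> segments = new ArrayList<>();
    private final List<boolean[]> deleted = new ArrayList<>();

    // All documents passed in one call are written together into one new segment.
    int add(List<Map<String, String>> docs) {
        segments.add(Collections.unmodifiableList(new ArrayList<>(docs)));
        deleted.add(new boolean[docs.size()]);
        return segments.size() - 1; // index of the segment holding the whole batch
    }

    // Marks every live document whose field equals value as deleted; returns the count.
    int delete(String field, String value) {
        int count = 0;
        for (int s = 0; s < segments.size(); s++) {
            List<Map<String, String>> seg = segments.get(s);
            boolean[] del = deleted.get(s);
            for (int d = 0; d < seg.size(); d++) {
                if (!del[d] && value.equals(seg.get(d).get(field))) {
                    del[d] = true;
                    count++;
                }
            }
        }
        return count;
    }

    // Loosely mirrors IndexWriter.updateDocuments: delete matches, then add the batch as one block.
    // Unlike Lucene, this toy is not atomic with respect to concurrent readers.
    int update(String field, String value, List<Map<String, String>> docs) {
        delete(field, value);
        return add(docs);
    }

    int liveCount() {
        int live = 0;
        for (boolean[] del : deleted) {
            for (boolean d : del) {
                if (!d) live++;
            }
        }
        return live;
    }

    List<Map<String, String>> segment(int index) {
        return segments.get(index);
    }
}
```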
In the MapReduce framework, several Documents will be grouped into a single add or update call by having the Mappers emit a common output key. The key type will likely need to be defined by the end user, based on their needs. The key itself will not be indexed. If the end user wants to use that information, they will need to add the key data to the Document itself.
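The grouping idea can be simulated in plain Java without any Hadoop APIs. In this sketch, the hypothetical `GroupingSketch` class stands in for the shuffle and the reducer. Each key group becomes one add call. The key is not indexed unless it is explicitly copied into a field, and the field name passed as `copyKeyTo` is illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of MapReduce grouping (hypothetical; no Hadoop classes used).
final class GroupingSketch {
    // One mapper output pair: a user-defined grouping key plus one document.
    static final class Emit {
        final String key;
        final Map<String, String> doc;

        Emit(String key, Map<String, String> doc) {
            this.key = key;
            this.doc = doc;
        }
    }

    // Stand-in for the shuffle: values sharing a key end up in the same group.
    // TreeMap keeps keys sorted, as MapReduce sorts keys before the reduce phase.
    static Map<String, List<Map<String, String>>> shuffle(List<Emit> emits) {
        Map<String, List<Map<String, String>>> groups = new TreeMap<>();
        for (Emit e : emits) {
            groups.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e.doc);
        }
        return groups;
    }

    // Stand-in for the reducer: each group becomes ONE add call (one batch).
    // The key is dropped unless copyKeyTo names a field to copy it into.
    static List<List<Map<String, String>>> reduce(
            Map<String, List<Map<String, String>>> groups, String copyKeyTo) {
        List<List<Map<String, String>>> addCalls = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, String>>> g : groups.entrySet()) {
            List<Map<String, String>> batch = new ArrayList<>();
            for (Map<String, String> doc : g.getValue()) {
                Map<String, String> copy = new LinkedHashMap<>(doc);
                if (copyKeyTo != null) {
                    copy.put(copyKeyTo, g.getKey());
                }
                batch.add(copy);
            }
            addCalls.add(batch);
        }
        return addCalls;
    }
}
```

Passing `null` for `copyKeyTo` shows the default behavior, where the grouping key is used only for batching. Passing a field name shows the opt-in case, where the user chooses to store the key in each Document.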
Analyzers will allow more customized indexes to be created. This functionality is currently wrapped up in the TableDescriptor; that will need to change.