This page describes the design of the HBase mailbox module, Ioan Eugen Stan's GSoC 2011 project.

Everyone is welcome to update this page and add their ideas.

Data Model

Some information gathered from JIRA:

* one row for the mailbox metadata (mailboxId, uidvalidity, namespace, username ...).
* one row for the message metadata (mailboxId, uid, size, headers, flags, messagecontentId ...).
* one row per message content, where I split the message content into 1 MB parts and put each "raw" byte[] in a new column. This makes sure we don't get columns that are too big (not sure if this is also needed for HBase; in Cassandra big columns are a problem).
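The 1 MB splitting described in the last bullet could look roughly like the sketch below. It is only an illustration: the class name ContentChunker and its methods are hypothetical, and it works on plain byte arrays without touching HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the existing code base.
final class ContentChunker {
    /** Part size taken from the description above: 1 MB per column. */
    static final int PART_SIZE = 1024 * 1024;

    /** Splits the raw content into consecutive parts of at most partSize bytes. */
    static List<byte[]> split(byte[] content, int partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<byte[]> parts = new ArrayList<>();
        int offset = 0;
        while (offset < content.length) {
            int len = Math.min(partSize, content.length - offset);
            byte[] part = new byte[len];
            System.arraycopy(content, offset, part, 0, len);
            parts.add(part);
            offset += len;
        }
        return parts;
    }

    /** Reassembles the parts in the order they were produced. */
    static byte[] join(List<byte[]> parts) {
        int total = 0;
        for (byte[] part : parts) {
            total += part.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, out, pos, part.length);
            pos += part.length;
        }
        return out;
    }
}
```

Each element of the returned list would then be stored in its own column of the content row, in list order, so that reading the columns back in order and calling join restores the original message.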

- If you look at the classes, we could have Mailbox, Subscription and Message tables.
- A row per mailbox, subscription and message
- The unanswered questions are: 1. The structure of the rowkey? 2. Header and Property as separate tables or as additional columns in the message row?

messagesMetaData(CF): {
  mailboxId/uid: {
    uid: 1,
    mailboxId: 184e-ske1-igk2-gj71,
    flags.recent: true,
    flags.deleted: true,
    flags.seen: true,
    flags.deleted: false,
    flags.seen: false,
    flags.flagged: true,
    bodyOctets: 19484,
    fullContentOctets: 10304,
    properties: namespace::localname::value;;namespace2::localname2::value2,
    headers: byte[],
    mediaType: text,
    subType: plain,
    textualLineCount: 24
  }
}
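The properties column above packs several (namespace, localname, value) triples into one string, with "::" between fields and ";;" between entries. A minimal sketch of encoding and decoding that format follows. PropertyCodec and Property are hypothetical names, and the sketch assumes that a namespace or local name never contains "::" and that no field contains ";;" (no escaping is done).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for the "properties" column shown above.
final class PropertyCodec {
    static final String FIELD_SEP = "::";
    static final String ENTRY_SEP = ";;";

    /** One property: namespace, local name and value. */
    static final class Property {
        final String namespace;
        final String localName;
        final String value;

        Property(String namespace, String localName, String value) {
            this.namespace = namespace;
            this.localName = localName;
            this.value = value;
        }
    }

    /** Joins the properties into a single column value (no escaping, see assumptions). */
    static String encode(List<Property> properties) {
        StringBuilder sb = new StringBuilder();
        for (Property p : properties) {
            if (sb.length() > 0) {
                sb.append(ENTRY_SEP);
            }
            sb.append(p.namespace).append(FIELD_SEP)
              .append(p.localName).append(FIELD_SEP)
              .append(p.value);
        }
        return sb.toString();
    }

    /** Parses a column value back into properties. */
    static List<Property> decode(String column) {
        List<Property> out = new ArrayList<>();
        if (column.isEmpty()) {
            return out;
        }
        for (String entry : column.split(ENTRY_SEP, -1)) {
            // Limit 3: anything after the second "::" belongs to the value.
            String[] fields = entry.split(FIELD_SEP, 3);
            if (fields.length != 3) {
                throw new IllegalArgumentException("malformed property: " + entry);
            }
            out.add(new Property(fields[0], fields[1], fields[2]));
        }
        return out;
    }
}
```

If values may contain the separators, either an escaping rule or one column per property would be needed instead; that choice is tied to the open question about Header and Property storage.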

messagesContent(CF): {
  mailboxId/uid: {
    1: byte[],
    2: byte[],
    3: byte[]
  }
}
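Both column families above are keyed by mailboxId/uid. One possible byte layout for that key, offered as a sketch and not as a decision on the open rowkey question, is the UTF-8 mailbox id, a "/" separator, and the uid as a fixed-width 8-byte big-endian number. HBase orders rows by comparing key bytes as unsigned values, so a fixed-width big-endian uid keeps the messages of a mailbox sorted by uid. MessageRowKey and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row key helper: <mailboxId> '/' <uid as 8 big-endian bytes>.
final class MessageRowKey {
    /** Common prefix of all rows belonging to one mailbox. */
    static byte[] prefix(String mailboxId) {
        return (mailboxId + "/").getBytes(StandardCharsets.UTF_8);
    }

    /** Builds the row key for one message. */
    static byte[] of(String mailboxId, long uid) {
        if (uid < 0) {
            throw new IllegalArgumentException("uid must not be negative");
        }
        byte[] prefix = prefix(mailboxId);
        // ByteBuffer writes longs in big-endian order by default.
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(uid)
                .array();
    }

    /** Reads the uid back from the last 8 bytes of a row key. */
    static long uidOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES).getLong();
    }

    /** Unsigned lexicographic comparison, mirroring how row keys are ordered. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

A textual uid ("9", "10") would not sort numerically, which is why the sketch uses a fixed-width binary encoding.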

UID Generation

How are we going to generate the UID? Probably with the HBase Increment class.
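The idea behind using an atomic increment can be sketched without an HBase dependency. In the code below, CounterStore is a hypothetical stand-in for an atomic per-row counter; with the HBase client it would presumably be backed by an atomic increment such as incrementColumnValue on the table, which returns the new counter value. The in-memory implementation exists only so the sketch runs on its own.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of per-mailbox UID generation via an atomic counter.
final class UidProviderSketch {
    /** Stand-in for an atomic counter cell; returns the value after the increment. */
    interface CounterStore {
        long increment(String rowKey, long amount);
    }

    /** In-memory implementation used only to make the sketch runnable. */
    static final class InMemoryCounterStore implements CounterStore {
        private final ConcurrentMap<String, Long> counters = new ConcurrentHashMap<>();

        @Override
        public long increment(String rowKey, long amount) {
            // merge is atomic on ConcurrentHashMap and returns the new value.
            return counters.merge(rowKey, amount, Long::sum);
        }
    }

    private final CounterStore store;

    UidProviderSketch(CounterStore store) {
        this.store = store;
    }

    /** Returns the next UID for the mailbox; the first UID handed out is 1. */
    long nextUid(String mailboxId) {
        return store.increment(mailboxId, 1L);
    }
}
```

Because every caller gets the value produced by its own increment, two concurrent appends to the same mailbox never receive the same UID, and UIDs within a mailbox only grow.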

Indexes

None for the time being

Queries

List here all needed queries and the Scanner implementation that will support them.
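As a starting point, the query "all messages of a mailbox" maps naturally onto a range scan from a start row to an exclusive stop row. The sketch below simulates this with a sorted in-memory map instead of a real table. It assumes a row key made of the mailbox id, "/", and an 8-byte big-endian uid; MailboxScanSketch and all of its members are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a prefix range scan over message rows.
final class MailboxScanSketch {
    /** Unsigned lexicographic byte order, the order in which row keys are sorted. */
    static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Stand-in for the message table: sorted row key -> row content.
    private final NavigableMap<byte[], String> rows = new TreeMap<>(ROW_ORDER);

    static byte[] prefix(String mailboxId) {
        return (mailboxId + "/").getBytes(StandardCharsets.UTF_8);
    }

    static byte[] rowKey(String mailboxId, long uid) {
        byte[] p = prefix(mailboxId);
        return ByteBuffer.allocate(p.length + Long.BYTES).put(p).putLong(uid).array();
    }

    void put(String mailboxId, long uid, String content) {
        rows.put(rowKey(mailboxId, uid), content);
    }

    /** Exclusive stop row: the prefix with its last byte ('/') incremented. */
    static byte[] stopRowFor(byte[] prefix) {
        byte[] stop = prefix.clone();
        stop[stop.length - 1]++; // '/' (0x2f) becomes '0' (0x30), no overflow
        return stop;
    }

    /** Returns the uids of all messages in the mailbox, in ascending order. */
    List<Long> uidsIn(String mailboxId) {
        byte[] start = prefix(mailboxId);
        byte[] stop = stopRowFor(start);
        List<Long> uids = new ArrayList<>();
        for (byte[] key : rows.subMap(start, true, stop, false).keySet()) {
            uids.add(ByteBuffer.wrap(key, key.length - Long.BYTES, Long.BYTES).getLong());
        }
        return uids;
    }
}
```

With a real table, the same start and stop rows would be given to a scanner; a uid range query would only change the two boundary keys.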

Below is a dump of the JPA queries, which can help check that the data model is correct.

From AbstractJPAMessage: @NamedQueries({

From JPAMailbox

From JPASubscription @NamedQueries({

})

GSoC2011HBaseMailbox (last edited 2011-06-16 06:38:36 by EricCharles)