Differences between revisions 1 and 2
Revision 1 as of 2011-06-16 06:28:38
Size: 221
Editor: EricCharles
Comment:
Revision 2 as of 2011-06-16 06:38:36
Size: 7526
Editor: EricCharles
Comment:
This page documents the design of the mailbox HBase module, Ioan Eugen Stan's GSoC 2011 project.

Everyone is welcome to update it or add ideas.

= Data Model =
Below is some information gathered from JIRA:
{{{
* one row for the mailbox metadata (mailboxId, uidvalidity, namespace, username ...).
* one row for the message metadata (mailboxId, uid, size, headers, flags, messagecontentId ...).
* one row per message content, where I split the message content into 1 MB parts and put each "raw" byte[] in a new column. This makes sure we don't get too big a column (not sure whether this is also needed for HBase; in Cassandra big columns are a problem).
}}}
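The 1 MB split described above could look roughly like the following. This is only a sketch: the class name ContentChunker and its methods are hypothetical, and the code works on plain byte arrays without touching HBase. Each returned part would become one column (1, 2, 3, ...) of the content row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of James or HBase.
class ContentChunker {
    // 1 MB, the part size mentioned in the JIRA note.
    static final int PART_SIZE = 1024 * 1024;

    // Splits content into consecutive parts of at most partSize bytes.
    static List<byte[]> split(byte[] content, int partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<byte[]> parts = new ArrayList<>();
        for (int from = 0; from < content.length; from += partSize) {
            int to = Math.min(from + partSize, content.length);
            parts.add(Arrays.copyOfRange(content, from, to));
        }
        return parts;
    }

    // Concatenates the parts back together in column order.
    static byte[] join(List<byte[]> parts) {
        int total = 0;
        for (byte[] part : parts) {
            total += part.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] part : parts) {
            System.arraycopy(part, 0, out, pos, part.length);
            pos += part.length;
        }
        return out;
    }
}
```

Reading a message back then means fetching the columns in numeric order and joining them.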
{{{
- If you look at the classes, we could have Mailbox, Subscription and Message tables.
- A row per mailbox, subscription and message.
- The unanswered questions are: 1. The structure of the rowkey? 2. Header and Property as separate tables or as additional columns in the message row?
}}}
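One possible answer to the row key question, given purely as an assumption for discussion: concatenate the mailbox id, a separator, and the uid as 8 big-endian bytes. HBase orders rows by comparing key bytes lexicographically, so a fixed-width big-endian uid keeps the messages of a mailbox adjacent and in uid order, which a decimal string like "10" versus "2" would not. The class MessageRowKey below is hypothetical and assumes non-negative uids and mailbox ids that never contain "/".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical row key layout: <mailboxId> '/' <uid as 8 big-endian bytes>.
class MessageRowKey {
    static byte[] of(String mailboxId, long uid) {
        byte[] prefix = (mailboxId + "/").getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(uid) // ByteBuffer writes big-endian by default
                .array();
    }

    // Unsigned lexicographic byte comparison, the ordering a sorted
    // key-value store applies to row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With this layout, all rows of one mailbox share a common prefix, so a range read over that prefix returns them sorted by uid.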
{{{
messagesMetaData(CF): {
  mailboxId/uid: {
    uid: 1,
    mailboxId: 184e-ske1-igk2-gj71,
    flags.recent: true,
    flags.deleted: true,
    flags.seen: true,
    flags.deleted: false,
    flags.seen: false,
    flags.flagged: true,
    bodyOctets: 19484,
    fullContentOctets: 10304,
    properties: namespace::localname::value;;namespace2::localname2::value2,
    headers: byte[],
    mediaType: text,
    subType: plain,
    textualLineCount: 24
  }
}

messagesContent(CF): {
  mailboxId/uid: {
    1: byte[],
    2: byte[],
    3: byte[]
  }
}
}}}
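The properties column above packs several (namespace, localname, value) triples into one string, with "::" between fields and ";;" between entries. A minimal sketch of that encoding follows. The class PropertyCodec is hypothetical, and it assumes that namespaces and local names never contain "::" and that no field contains ";;". A real implementation would need escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for the "ns::local::value;;ns2::local2::value2" format.
class PropertyCodec {
    // Each property is a String[3]: {namespace, localname, value}.
    static String encode(List<String[]> properties) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : properties) {
            if (sb.length() > 0) {
                sb.append(";;");
            }
            sb.append(p[0]).append("::").append(p[1]).append("::").append(p[2]);
        }
        return sb.toString();
    }

    static List<String[]> decode(String encoded) {
        List<String[]> properties = new ArrayList<>();
        if (encoded.isEmpty()) {
            return properties;
        }
        for (String entry : encoded.split(";;")) {
            // Limit 3 so that a value containing "::" stays in one piece.
            String[] fields = entry.split("::", 3);
            if (fields.length != 3) {
                throw new IllegalArgumentException("malformed property: " + entry);
            }
            properties.add(fields);
        }
        return properties;
    }
}
```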
= UID Generation =
How are we going to generate the UID? Probably with the HBase Increment class.
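To make the requirement concrete, here is an in-memory stand-in for a per-mailbox counter. It does not use HBase at all. The class UidCounter is hypothetical and only shows the contract an atomic increment on a counter cell would have to provide: each call returns a value one higher than the previous one for the same mailbox, even under concurrent callers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory model of a per-mailbox UID counter.
class UidCounter {
    private final ConcurrentHashMap<String, AtomicLong> lastUid = new ConcurrentHashMap<>();

    // Atomically increments and returns the next UID for the mailbox.
    // The first UID handed out for a mailbox is 1.
    long nextUid(String mailboxId) {
        return lastUid
                .computeIfAbsent(mailboxId, id -> new AtomicLong())
                .incrementAndGet();
    }
}
```

In the HBase module the counter would presumably live in a column of the mailbox row, so that the increment is atomic on the server side rather than in one JVM.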

= Indexes =
None for the time being

= Queries =
List here all needed queries and the Scanner implementation that will support them.
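As a starting point for that list, the sketch below models how UID range queries map onto a range read over sorted row keys. It uses a TreeMap as a stand-in for a table sorted by row key and does not call HBase. The class MessageIndex, the "mailboxId/zero-padded uid" string key, and the method names are all assumptions made for illustration. Uids are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model: rows sorted by key, like a table sorted by row key.
class MessageIndex {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    // Zero-padding to 19 digits makes string order match numeric uid order.
    static String key(String mailboxId, long uid) {
        return String.format(Locale.ROOT, "%s/%019d", mailboxId, uid);
    }

    void put(String mailboxId, long uid, String message) {
        rows.put(key(mailboxId, uid), message);
    }

    // Models findMessagesInMailboxBetweenUIDs: both bounds inclusive, like BETWEEN.
    List<String> betweenUids(String mailboxId, long from, long to) {
        if (from > to) {
            return new ArrayList<>();
        }
        return new ArrayList<>(
                rows.subMap(key(mailboxId, from), true, key(mailboxId, to), true).values());
    }

    // Models findMessagesInMailboxAfterUID, which selects uid >= :uidParam.
    List<String> fromUid(String mailboxId, long from) {
        return betweenUids(mailboxId, from, Long.MAX_VALUE);
    }
}
```

Queries that filter on flags (recent, seen, deleted) do not follow from the key order and would need a filter over the mailbox range or a separate index.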

Below is a dump of the JPA queries, which can help check that the data model is correct.

From AbstractJPAMessage:
{{{
@NamedQueries({
    @NamedQuery(name="findRecentMessagesInMailbox",
            query="SELECT message FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.recent = TRUE"),
    @NamedQuery(name="findUnseenMessagesInMailboxOrderByUid",
            query="SELECT message FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.seen = FALSE ORDER BY message.uid ASC"),
    @NamedQuery(name="findMessagesInMailbox",
            query="SELECT message FROM Message message WHERE message.mailbox.mailboxId = :idParam"),
    @NamedQuery(name="findMessagesInMailboxBetweenUIDs",
            query="SELECT message FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid BETWEEN :fromParam AND :toParam"),
    @NamedQuery(name="findMessagesInMailboxWithUID",
            query="SELECT message FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid=:uidParam"),
    @NamedQuery(name="findMessagesInMailboxAfterUID",
            query="SELECT message FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid>=:uidParam"),
    @NamedQuery(name="findDeletedMessagesInMailbox",
            query="SELECT message.uid FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.deleted=TRUE"),
    @NamedQuery(name="findDeletedMessagesInMailboxBetweenUIDs",
            query="SELECT message.uid FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid BETWEEN :fromParam AND :toParam AND message.deleted=TRUE"),
    @NamedQuery(name="findDeletedMessagesInMailboxWithUID",
            query="SELECT message.uid FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid=:uidParam AND message.deleted=TRUE"),
    @NamedQuery(name="findDeletedMessagesInMailboxAfterUID",
            query="SELECT message.uid FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid>=:uidParam AND message.deleted=TRUE"),

    @NamedQuery(name="deleteDeletedMessagesInMailbox",
            query="DELETE FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.deleted=TRUE"),
    @NamedQuery(name="deleteDeletedMessagesInMailboxBetweenUIDs",
            query="DELETE FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid BETWEEN :fromParam AND :toParam AND message.deleted=TRUE"),
    @NamedQuery(name="deleteDeletedMessagesInMailboxWithUID",
            query="DELETE FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid=:uidParam AND message.deleted=TRUE"),
    @NamedQuery(name="deleteDeletedMessagesInMailboxAfterUID",
            query="DELETE FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.uid>=:uidParam AND message.deleted=TRUE"),

    @NamedQuery(name="countUnseenMessagesInMailbox",
            query="SELECT COUNT(message) FROM Message message WHERE message.mailbox.mailboxId = :idParam AND message.seen=FALSE"),
    @NamedQuery(name="countMessagesInMailbox",
            query="SELECT COUNT(message) FROM Message message WHERE message.mailbox.mailboxId = :idParam"),
    @NamedQuery(name="deleteMessages",
            query="DELETE FROM Message message WHERE message.mailbox.mailboxId = :idParam"),
    @NamedQuery(name="findLastUidInMailbox",
            query="SELECT message.uid FROM Message message WHERE message.mailbox.mailboxId = :idParam ORDER BY message.uid DESC"),
    @NamedQuery(name="deleteAllMemberships",
            query="DELETE FROM Message message")
})
}}}
From JPAMailbox:
{{{
@NamedQueries({
    @NamedQuery(name="findMailboxById",
        query="SELECT mailbox FROM Mailbox mailbox WHERE mailbox.mailbox.mailboxId = :idParam"),
    @NamedQuery(name="findMailboxByName",
        query="SELECT mailbox FROM Mailbox mailbox WHERE mailbox.name = :nameParam and mailbox.user is NULL and mailbox.namespace= :namespaceParam"),
    @NamedQuery(name="findMailboxByNameWithUser",
        query="SELECT mailbox FROM Mailbox mailbox WHERE mailbox.name = :nameParam and mailbox.user= :userParam and mailbox.namespace= :namespaceParam"),
    @NamedQuery(name="deleteAllMailboxes",
        query="DELETE FROM Mailbox mailbox"),
    @NamedQuery(name="findMailboxWithNameLikeWithUser",
        query="SELECT mailbox FROM Mailbox mailbox WHERE mailbox.name LIKE :nameParam and mailbox.user= :userParam and mailbox.namespace= :namespaceParam"),
    @NamedQuery(name="findMailboxWithNameLike",
        query="SELECT mailbox FROM Mailbox mailbox WHERE mailbox.name LIKE :nameParam and mailbox.user is NULL and mailbox.namespace= :namespaceParam"),
    @NamedQuery(name="countMailboxesWithNameLikeWithUser",
        query="SELECT COUNT(mailbox) FROM Mailbox mailbox WHERE mailbox.name LIKE :nameParam and mailbox.user= :userParam and mailbox.namespace= :namespaceParam"),
    @NamedQuery(name="countMailboxesWithNameLike",
        query="SELECT COUNT(mailbox) FROM Mailbox mailbox WHERE mailbox.name LIKE :nameParam and mailbox.user is NULL and mailbox.namespace= :namespaceParam"),
    @NamedQuery(name="listMailboxes",
        query="SELECT mailbox FROM Mailbox mailbox")
})
}}}
From JPASubscription:
{{{
@NamedQueries({
    @NamedQuery(name = "findFindMailboxSubscriptionForUser",
        query = "SELECT subscription FROM Subscription subscription WHERE subscription.username = :userParam AND subscription.mailbox = :mailboxParam"),
    @NamedQuery(name = "findSubscriptionsForUser",
        query = "SELECT subscription FROM Subscription subscription WHERE subscription.username = :userParam")
})
}}}

GSoC2011HBaseMailbox (last edited 2011-06-16 06:38:36 by EricCharles)