Differences between revisions 4 and 5
Revision 4 as of 2009-09-20 21:45:15
Size: 6460
Editor: localhost
Comment: converted to 1.6 markup
Revision 5 as of 2018-04-13 05:44:43
Size: 0
Editor: JoanTouzet
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
#redirect How_to_store_hierarchical_data
= Storing Hierarchical Data in CouchDB =

For the original article this example was taken from, see http://blog.paulbonser.com/2008/07/04/storing-hierarchical-data-in-couchdb/
For comparison with how this is usually done with relational DBs, see http://www.sitepoint.com/article/hierarchical-data-database/

== Storing the Tree ==

Store the full path to each node as an attribute in that node's document.

For example:

    {"_id":"Food", "path":["Food"]},
    {"_id":"Fruit", "path":["Food","Fruit"]},
    {"_id":"Red", "path":["Food","Fruit","Red"]},
    {"_id":"Cherry", "path":["Food","Fruit","Red","Cherry"]},
    {"_id":"Tomato", "path":["Food","Fruit","Red","Tomato"]},
    {"_id":"Yellow", "path":["Food","Fruit","Yellow"]},
    {"_id":"Banana", "path":["Food","Fruit","Yellow","Banana"]},
    {"_id":"Meat", "path":["Food","Meat"]},
    {"_id":"Beef", "path":["Food","Meat","Beef"]},
    {"_id":"Pork", "path":["Food","Meat","Pork"]}

In a real system you'd probably want to use some sort of UUID instead of descriptive strings, since conflicts between node names could be bad. In fact, it'd probably be much faster to just use numbers, since comparisons on numbers are generally much faster. For the purposes of this example, however, it's much easier to understand if it's descriptive text.

Once that data is in your DB, it's time to get it out again!

== Retrieving the whole tree ==

The CouchDB map function to retrieve the whole tree is nice and simple:

function(doc) {
    emit(doc.path, doc)

Using the path as the key, the documents will be sorted as above, with each parent immediately followed by its children.

One option to get the data into an actual tree would be to add a reduce function to the view:

function(keys, vals) {
    tree = {};
    for (var i in vals)
        current = tree;
        for (var j in vals[i].path)
            child = vals[i].path[j];
            if (current[child] == undefined)
                current[child] = {};
            current = current[child];
        current['_data'] = vals[i];
    return tree;

'''''Note: don't use this reduce function, since it doesn't take the rereduce parameter into account, and would most likely not work correctly if a rereduce was done.'''''

Another option would be to use a client-side function to accomplish the same thing, for example in Python:

class TreeNode(dict): pass

def tree_from_rows(list):
    tree = {}
    for item in list:
        current = tree
        for child in item.value['path']:
            current = current.setdefault(child, TreeNode())
        current.data = item.value
    return tree

This code does the job nicely and allows me to use the same function to build a tree from several different views without duplicating code.

== Getting a subtree ==

To get all the nodes which are underneath a specific node, I implemented the view's reduce function as follows:

function(doc) {
    for (var i in doc.path) {
        emit([doc.path[i], doc.path], doc)

Again, this is pretty simple. The only difference from the last view is that I can now query this view with a startkey and endkey (see the HttpViewApi) to get only nodes under a certain node. You could actually do that with the previous view, except you'd have to include the full path to the node in the startkey, which is a bit too much.

For example, if you had CouchDB running on your machine right now with the above example data loaded and went to http://localhost:5984/tree/_view/tree/descendants?startkey=[%22Fruit%22]&endkey=[%22Fruit%22,{}]

== How Many Descendants ==

Getting the number of descendants for a given node is simple. The view is as follows:

'descendant_count': {
    'map': 'function(doc) { for (var i in doc.path) { emit(doc.path[i], 1) } }',
    'reduce': 'function(keys, values) { return sum(values) }'

This will count the parent node as well, so you will probably want to subtract one from it at some point. To use this view simply call it with the key parameter set to the id of the desired root node.

== Getting the immediate children of a node ==

Sometimes you just want to get a list of nodes which are immediately under a given node. This can be done by using a map with the following map function:

function(doc) {
    emit([doc.path.slice(-2,-1)[0], doc.path], doc)

This map function simply takes the second-to-last element from the path and uses that as the first element in the key. You can query this view in the same way as the "getting a subtree" view above.

== Adding a node ==

Adding a node to the tree is fairly simple. Set the new node's path to be the path of the desired parent node with the new node's ID appended to the end. That's it.

== Deleting a node ==

Deleting a node is a bit trickier since any given node may have some number of children. You can get the list of nodes in the subtree as outlined above and then do a bulk update to delete each of them.

Depending on the data being stored, deleting the whole sub-tree might not ever be something you want to do, in a discussion forum, for example, you might want to simply delete a single offensive post, leaving any replies which might have been posted. Even in this case, it's more likely that you'd want to set a flag indicating the deletion rather than actually deleting the post.

== Moving a node to another parent ==

This is an instance where being able to update just certain fields in a document would be handly, since bulk-updating a large chunk of documents could start to kill performance.

Either way, if something needs to be reparented, it's just a matter of getting all nodes which are children of a certain node, then doing a bulk update to change their paths to wherever they need to be.

This part worries me a bit, because there's a chance that somebody else could add a new child node while you are in the process of moving the sub-tree, leaving that new node dangling by itself in a sub-tree which no longer exists. I'm not sure of the best approach to avoid such a problem.

There is some example Python code to handle using the above views to fetch trees available at http://git.paulbonser.com/?p=couchdb.git;a=summary