Differences between revisions 7 and 8
Revision 7 as of 2013-08-02 21:02:11
Size: 18607
Editor: 178
Revision 8 as of 2018-04-12 14:09:44
Size: 0
Editor: JoanTouzet
## page was renamed from The_CouchDB_Vision

= The CouchDB Vision Proposal (NS) =

This is a WIP to move items from [[http://markmail.org/search/?q=%22What%27s+our+Why%3F%22+list%3Aorg.apache.couchdb.dev+order%3Adate-forward|the "What's our Why?" thread]] to a wiki page. I am hoping to form a concrete proposal that I will bring to the dev@ list and vote on. I would like to incorporate as much feedback and perspective as possible, but cannot promise to accommodate everyone! If you have a comment, please post a note to the list. :) — Noah


=== Notes ===

"We believe in challenging the status quo. We believe in thinking different. We do that with great design and a focus on the user experience. We just happen to make computers."

If you talk about what you believe, you will attract those that believe what you believe.

When you talk about what you believe, people will join you for their own reasons, for their own purpose.

What you do simply serves as proof of what you believe.

"Martin Luther King gave his 'I have a dream' speech, not his 'I have a plan' speech."

Our existing message stinks.

We need to figure out what we stand for, what we believe in. And then we figure out how we're gonna do that.

This will define a consistent internal vision for the project and will help us to attract people who believe in what we believe.

Once we have our why, it can inform our how.

When we're talking about product direction we can say "well, how is this related to what we're trying to do here?"

Whatever this ends up looking like, I think this is how we should talk about CouchDB. This structure could be a template for anything. A talk, a sales pitch, the homepage itself. The important thing is that we start from "why?" and we build up from foundations.

From Jan:

"The number one thing that people did NOT like about CouchDB is that it is confused. CouchDB has a torn identity, half database, half application server. It wasn’t clear (and I am part responsible for this) what CouchDB is and wants to be. In everybody’s defence, I think, it just took a while to figure it out. Now is a good time to put our findings in writing and fix this."

"The number one request from people was to clear up CouchDB’s story, to have a clear, bold vision that captures people and that they can easily understand and share and support and move forward."

"Before I lay it out, I understand that I will be ruffling some feathers. I think that is both necessary and healthy. I think the picture I am going to paint will make a lot of people in the CouchDB community happy, some with concessions, but I utterly and strongly believe that this vision of what CouchDB is has the power to set the course for the next five years of the project and attract a whole lot of new people both as users and contributors."


I want to learn from understanding what the PRIMARY and SECONDARY features for CouchDB are. I already feel a bit bad that there are two PRIMARY ones (“a database” *and* “that replicates”), but I think that is as few as it gets.

I want CouchDB’s new identity to be a database that replicates. I want to provide a slide deck for a “CouchDB in 25 minutes” presentation* that everybody can take and give and customise, but I want one of the first things you say to be “CouchDB is a database that replicates”. I want anyone inside the CouchDB developer community (you!), when asked what CouchDB is, to answer “CouchDB is a database that replicates” and then follow up explaining what we mean, and *then* add a few more of the SECONDARY features that you particularly like.

(Noah's commentary: how does this play with the idea that everything we do should stem from our "why". why are we building a database that replicates? what's our vision? what do we stand for? i think both models are compatible. our existing approach is to say "what" couchdb is. jan's suggestion is to start with "how", and then get to "what". i am suggesting that we add another one on top of that, and start with "why". then say "how", and then say "what". i don't think these are incompatible. apple might have "challenge the status quo" as a "why", but its marketing can still lead with the one sentence "how" in the same vein as "couchdb is a database that replicates". some thinking / discussion to do here. i think it will depend on context. homepage, talk, etc, etc. even jan's talk that was linked starts, essentially, with the "why". his "why" is listed as "i <3 the web", "i <3 reliable web infrastructure"! so jan is already doing this in his talks! so maybe this is a template. "we/i love X. we believe in Y. [BEAT] which is why i hack on couchdb. it's a database that replicates". voila! @@ this needs bringing back to the list, or working into a questions/issues section)

@@ go through https://dl.dropboxusercontent.com/u/82149/CouchDB-in-25-Minutes.pdf

I want people who barely look at CouchDB to comment on an unrelated Hacker News thread and write “…CouchDB is a database that replicates, maybe that is a better fit for your problem”.

I want the CTO of the newly funded startup to think “I seem to have a replication problem to solve, maybe CouchDB can help.”

I want to move CouchDB’s development forward, and when we ask ourselves whether to add a feature, we run it by our PRIMARY feature set and ask “does it support ‘CouchDB is a database that replicates’” and if it does we go ahead and build it, and if it doesn’t we may consider it as a SECONDARY feature, or we discard it altogether.

(Noah's commentary: again, i'm gonna go one step higher than this. and i'm gonna suggest that we also ask ourselves "how does this help us work towards our why?")

(I don’t actually care what the final slogan will be, and please bike-shed
 this to no avail, but it should capture what I mean with “CouchDB is a
 database that replicates”, a phrase that we can burn into everybody’s
 head that captures CouchDB’s PRIMARY feature, its PRIMARY value
 proposition, the ONE thing that explains WHY we are excited about

[comments on plugin system elided]


Apache CouchDB is the focal point for The Replicating Society.


If I were to define CouchDB today based on my rcouch experience and the ports I did, I would say: "Apache CouchDB allows you to handle and synchronize your data between different locations and devices in quasi-realtime, over and on the web, in a P2P manner without a SPOF".

So for me CouchDB isn't only a database that replicates. It is also a way to ease the usage of your data, the way you can view it in your applications or directly on the web and over the web.

== Why? ==

 * What's your purpose?
 * What's your cause?
 * What's your belief?

Note: values should be verbs. i.e. "to make distributed data easy". you can't do nouns. easy to worm out of. if your values are verbs, you can measure them, commit to them, challenge people on them. see http://www.startwithwhy.com/Learn/LearningLibrary.aspx?control=ViewGalleryPhotos&HideLink=1&GalleryID=10&photoID=29&cat=1 for more

Suggestions made:

 * peer-to-peer replication of apps and datasets
 * your data, everywhere
 * "relax"
 * Painless distributed systems
 * Decentralised web
 * Put the data where you need it
 * "I have a dream that distributed data will be easy"
 * "CouchDB almost wants to be the Git, for databases"
 * "We believe that distributed data should be easy"

From Jan:

"I want to live in a world where people are empowered to understand and are capable to decide where their data lives. I want to live in a world where developers build apps that support that, not because they went out of their way to implement it, but because it is a feature of the software platform they are using."

"I want to be able to help people improve their lives in regions of the world where ubiquitous network access isn’t — and sometimes that is just a major western capital’s subway — but more likely is it a lesser developed location, or a rural area that will never see mobile broadband, let alone wired broadband because there is no financial incentive."

"I want to live in a world where technology solves more problems than it creates. One of those ways is allow people to use software wherever they are in whatever context they need it in. More often than not, that means far away from fast network access[...]"

"My primary motivation for working on Apache CouchDB is to help build the world I want to live in[...]"


I want to live in a world where people are empowered to understand and are capable to decide where their data lives.

I want to live in a world where technology solves more problems than it creates.

My primary motivation for working on Apache CouchDB is to help build the world I want to live in.

== How? ==

 * How do we do it?
 * How does our product differentiate?
 * How are we different?
 * How are we better?

Suggestions made:

 * Schema-less/document-oriented
 * Replication
 * "of the web"
 * "some kind of big data handling"
 * "couchdb on their mobile"
 * we take care of your data
 * we take care of exchanging your data
 * we take care of rendering your data
 * We handle your data / you handle display
 * Painless multi-master replication
 * Effortless clustering and sharding
 * Co-location of data, queries, and views
 * Deep browser and platform integration
 * Built of the Web
 * Database runs anywhere

From Jan:

"In the past year I have interviewed a fair number of people, let’s say 50, from those who have heard about CouchDB to users to core devs."

"The ONE feature that makes CouchDB relevant is multi-master replication. There is no exception, this is the ONE thing that makes CouchDB exceptional. NOBODY else has that, and even the decent proprietary solutions that are just coming to market suck where we KICK ASS."

"There are many other things that people like about CouchDB: reliability, no schema, HTTP interface, the view system, etc. But NONE of these people would care if CouchDB didn’t have multi-master replication."

CouchDB is a database that replicates.

Think of it as git for your data-layer. Not in a sense where you manage text files and diff and merge, but in the sense that you have a local version of your data and one or multiple remote ones and you can seamlessly move your data between them, back and forth and crossover.

Imagine a local checkout of your data that you can work on, and then share it with Lucie across the table, she finds some issues and fixes up the data, and shares it with Tim across the room. Tim fixes two more issues and you pull both their changes into your copy. We conclude the whole thing is golden and we push it to staging, where our continuous integration runs and decides that the data is good to go into production, so it pushes it to production. There the data is picked up from various clients, some mobile over there, some web over here, a backup system in the Tokyo office…
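
As a rough illustration of the "pull their changes, then push to staging" flow above, here is a minimal Python sketch that builds (but does not send) replication requests for CouchDB's `POST /_replicate` HTTP endpoint, which takes a JSON body with `source` and `target` fields. All host names and database names are hypothetical, authentication is left out, and the helper `build_replication_request` is an illustrative name, not part of any CouchDB client library.

```python
import json
import urllib.request


def build_replication_request(server, source, target, continuous=False):
    """Build (but do not send) a POST /_replicate request.

    `server` is the CouchDB node asked to run the replication;
    `source` and `target` are database URLs.
    """
    body = {"source": source, "target": target}
    if continuous:
        # Keep replicating as new changes arrive instead of stopping once caught up.
        body["continuous"] = True
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts: pull Lucie's changes into the local copy,
# then push the local copy on to staging.
LOCAL = "http://localhost:5984"
pull = build_replication_request(
    LOCAL, "http://lucie.example:5984/data", LOCAL + "/data"
)
push = build_replication_request(
    LOCAL, LOCAL + "/data", "http://staging.example:5984/data"
)
```

Passing either request to `urllib.request.urlopen` against a running CouchDB would start the replication. Swapping `source` and `target` is all it takes to reverse the direction, which is the "back and forth" part of the git analogy.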

Or you have hospitals in remote regions in Africa that collect local health data, like how many malaria infections a region has. They all share their results over unreliable mobile connections, and the data still makes it eventually, maybe with a few hours' delay. The malaria expert in the capital city sees an increased outbreak of some illness and is able to send out medicine in time to help the patients. Today, the expert takes months to travel between the hospitals to collect that data manually, only to find out that there was a lethal outbreak two months ago and everybody died.

(Somebody built this, CouchDB does save lives, I get teary every time I tell this story (like now). Our work doesn’t get more noble than this.)

Or imagine millions of mobile users with access to terabytes of data in the cloud, replicating the bits they need to their phones and tablets, allowing super-fast low-latency access for a stellar user experience, while giving access to sheer amounts of data and allowing full write access on the mobile device to be replicated back to the cloud when connections exist.

(Our friends at Cloudant have a couple of those customers.)

That is the power of CouchDB.


Replication is the PRIMARY feature of CouchDB. “is a database” means “stores your data, safely and securely”, “that replicates” highlights the primary feature.

do these bits belong here or in previous section? - There are many more very cool features of CouchDB, even the details on how we achieve reliability and data safety or how replication works are mindblowingly cool. The simple HTTP interface, the JSON store, the app-server features, map reduce views, all very excellent things that make CouchDB unique, but it is very important to understand that they are SECONDARY features.

(@@ does this bit go into the "what" bit? need to research difference. think we can lead with replication as the primary feature, but include it in the "what"?)


@@ where does this bit go? should it even be included? might be worth punting the whole "couch-like" stuff to a separate doc, and only referencing it from this vision statement?

And then, CouchDB is one more thing. CouchDB isn’t just the Erlang implementation of this whole replicating database idea. CouchDB is also the wire protocol, the specification that makes all the magic work. Apache CouchDB is the focal point for The Replicating Society*.

(* cue your Blade Runner jokes)

Apache CouchDB is THE standard for data freedom and exchange and is the clearing house, the centre for an ecosystem that includes fantastic projects like PouchDB and the TouchDBs, Max Ogden’s `dat` and whatever else follows these. Not saying we merge those projects in, they can stand on their own, but we should embrace everything that makes the interoperable replication world a reality.

http://couchdb.apache.org is going to be the centre of the data replication universe.

(Noah's commentary: I think we should call this "Couch" and capitalise on the "-DB"-less name that people have used elsewhere. this should be a reclamation effort on our part, to own, and define what a "couch-like" system is. this needs further discussion on the list.)


The ONE feature that makes CouchDB relevant is multi-master replication.

== What? ==

 * What do we do?
 * What do we make?

Suggestions made:

 * Erlang
 * JavaScript
 * MapReduce
 * hoodie
 * kanso
 * erica
 * couchapp
 * Message hub (nodejitsu, hoodie are using couchdb as a message hub somehow)

Jan outlines his idea of a "core":

 * remote & local replication
 * MR-views & GeoCouch enabled by default (ideally abstracted away with nice “query dsl”)
 * HTTP interface
 * Fu/Fauxton
 * configuration
 * stats
 * docs
 * plugin system with Erlang (and in the future JavaScript support via Node.js)



Note also:

"And yes, this explicitly includes things like shows and lists and update functions and rewrites and vhosts. We should make it super simple to add these, but for a default experience, they are very, very confusing. We should have a single plugin “CouchApp Engine” which includes Benoit’s vision of CouchApps done right that is just a click away to install."

Jan lays out our "specs":

 * Apache CouchDB implements the CouchDB vision: It is a database that replicates.

 * Document Database:
   * Data records are standard JSON.
   * Unlimited Binary data storage with attachments.
   * (alternatively arbitrary mime docs with special rules for JSON docs)

 * Fault-tolerant:
   * Data is always safe. Tail-append storage ensures no messing with already committed data.
   * Errors are isolated, recovery is local and doesn’t affect other parallel requests.
   * Recovery of fatal errors is immediate. There is no “fixup phase” after a restart.
   * Software updates and bugfix deployment without downtime.

 * Highly Concurrent:
   * Erlang makes good use of massively parallel network server installations.
   * Garbage collection happens roughly on a per-request basis. GC in one request doesn’t affect other requests.

 * Cluster / BigCouch / Big Data:
   * Includes a Dynamo-style clustering and cluster-management feature that allows spreading data and load over multiple physical machines.
   * Scales up to Petabytes of data.

 * Secondary 2D and 3D indexing
   * Using incremental and asynchronous index updates for high-performance queries.

 * Makes good use of hardware:
   * Tail-append storage allows for serial write access to storage media, which is a best-case-scenario for spinning disks and SSDs.

 * Small Core & Flexible Plugin System:
   * Some features are only useful for a small group of people, these can be installed with a super simple plugin management system that is built into the admin interface.
   * Get new features with a click or tap.
   * Plugins can be written in Erlang (and in JavaScript in the future).

 * Cross Platform Support
   * Runs on any POSIX UNIX as well as Windows.
   * Support for some embedded devices like Android and RaspberryPi.
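
The "tail-append storage" items in the specs above can be illustrated with a toy model. The Python sketch below is an assumption-laden illustration, not CouchDB's actual on-disk format: it stores one JSON record per line, only ever appends to the end of the file, and on startup ignores a torn final write without touching earlier records. The class name `AppendOnlyStore` and the file name are made up for this example.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy append-only store: committed records are never overwritten."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # doc_id -> latest committed document body
        self._load()

    def _load(self):
        # Rebuild the in-memory index by scanning the log from the start.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final write: stop here, earlier records are intact
                self.index[record["id"]] = record["doc"]

    def put(self, doc_id, doc):
        # Updates are appended as new records; old records stay in place.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"id": doc_id, "doc": doc}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.index[doc_id] = doc

    def get(self, doc_id):
        return self.index[doc_id]


path = os.path.join(tempfile.mkdtemp(), "toy.log")
store = AppendOnlyStore(path)
store.put("doc1", {"n": 1})
store.put("doc1", {"n": 2})  # appended; the first record is untouched

# Simulate a crash in the middle of a third write.
with open(path, "a", encoding="utf-8") as f:
    f.write('{"id": "doc1", "doc": {"n"')

recovered = AppendOnlyStore(path)  # reopening needs no separate repair step
```

Because writes only go to the tail, a crash can damage at most the record being written. That is the intuition behind the "no fixup phase" claim, though the real implementation is considerably more involved.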


The ONE feature that makes CouchDB relevant is multi-master replication.


An HTTP API, a small code base. The HTTP API is more important than some are saying today. Of course we could use a binary protocol, and it would be faster. But it is just a matter of time: with HTTP 2.0 coming at the end of the year and implementations using SPDY already working, the CouchDB HTTP API will be exchanged as a binary stream. Using CouchDB over and on the web is really one of its key features.

I didn't have to query them across multiple tables, simply map them then query them to match some pattern. I didn't have to organize them at first in tables or columns. I just had to store my documents and create views (indexes) on them. The views can be edited later, but the documents, the way I store the data, don't change. This perfectly fit the way I code, iterating over features and sometimes completely changing the way I'm using/viewing the data in the code. CouchDB gave me a way to manipulate data that I hadn't had for a while, not since I played with HyperCard or Lotus Notes.
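
The "store documents first, add views later" workflow described here can be sketched in a few lines of Python. This is a toy model only: real CouchDB map functions are written in JavaScript and call `emit(key, value)`, and the view index is kept on disk. The helper names `build_view` and `query`, and the sample documents, are invented for illustration.

```python
def build_view(docs, map_fn):
    """Run map_fn over every document and return rows sorted by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, value, doc["_id"]))
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows


def query(view, startkey, endkey):
    """Return the rows whose key falls in the inclusive range."""
    return [row for row in view if startkey <= row[0] <= endkey]


# Schema-less documents: no tables or columns were declared up front.
docs = [
    {"_id": "a", "type": "post", "author": "lucie", "title": "Hello"},
    {"_id": "b", "type": "post", "author": "tim", "title": "Fixes"},
    {"_id": "c", "type": "comment", "author": "tim"},
]


def posts_by_author(doc):
    # The map function decides which documents appear in the view and under which key.
    if doc.get("type") == "post":
        yield doc["author"], doc["title"]


view = build_view(docs, posts_by_author)
tim_posts = query(view, "tim", "tim")
```

Changing how the data is viewed means writing a different map function and rebuilding the view. The stored documents stay exactly as they were.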

Incremental views and the way CouchDB stores data are designed for the new storage we have today (SSDs and others).