Wednesday, December 29, 2010
MongoDB is one of the best of the new crop of post-relational or post-schema-driven persistence engines designed for large-scale cloud storage. Like CouchDB, another document-centric and schema-less cloud-scale engine, MongoDB offers several robust features: MapReduce support over its data, attribute-level indexing, auto-sharding, in-place updating, and high availability. What MongoDB does not have is equally interesting: a mandatory schema and SQL-like restrictions on data access and programming. Written in C++ and designed for multi-language access (Java first and foremost, it seems), MongoDB is what I would term an instance-oriented cloud store: instances (documents) can be highly variant in structure, and the system still scales elegantly and continues to perform. A lot of this has to do with the manner in which auto-sharding and replica management happen behind the scenes. This automagical behavior is reminiscent of the old(er) world of RDBMS storage, which dominated the enterprise computing space from the late 1980s through the early 2000s. MongoDB, however, has jettisoned many of the sacred cows of SQL-based storage: homogeneous structure, convoluted graph-oriented operations (subqueries), and scale limitations rooted in locking managers that were designed for I/O characteristics that simply do not obtain in the cloud space.
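To make a few of these ideas concrete, here is a toy, in-memory sketch in Python of a schema-free store with hash-based sharding, attribute-level indexing, and in-place updates. It is not MongoDB's API or its implementation: the class `TinyDocStore` and its methods (`insert`, `ensure_index`, `find`, `update`) are hypothetical names invented for this illustration, and the CRC32 routing is only a stand-in for whatever a real auto-sharding layer does. It also assumes indexed values are hashable.

```python
import zlib
from collections import defaultdict


class TinyDocStore:
    """Toy in-memory model of a sharded, schema-free document store (illustrative only)."""

    def __init__(self, num_shards=3):
        # Each shard maps a document id to a plain dict (the "document").
        self.shards = [dict() for _ in range(num_shards)]
        # indexes[field][value] -> set of ids of documents holding that value
        self.indexes = {}

    def _shard_for(self, doc_id):
        # Deterministic hash routing; a stand-in for real auto-sharding.
        return self.shards[zlib.crc32(str(doc_id).encode()) % len(self.shards)]

    def insert(self, doc_id, doc):
        # No schema check: any dict is accepted, whatever its fields.
        self._shard_for(doc_id)[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def ensure_index(self, field):
        # Build an attribute-level index over documents that have the field.
        index = defaultdict(set)
        for shard in self.shards:
            for doc_id, doc in shard.items():
                if field in doc:
                    index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())
            return [self._shard_for(i)[i] for i in sorted(ids, key=str)]
        # No index on this field: scan every shard.
        return [d for s in self.shards for d in s.values() if d.get(field) == value]

    def update(self, doc_id, field, value):
        # In-place update: mutate the stored document and fix up the index.
        doc = self._shard_for(doc_id)[doc_id]
        if field in self.indexes and field in doc:
            self.indexes[field][doc[field]].discard(doc_id)
        doc[field] = value
        if field in self.indexes:
            self.indexes[field][value].add(doc_id)


store = TinyDocStore()
# Two documents with different shapes live side by side.
store.insert(1, {"name": "Ada", "lang": "C++"})
store.insert(2, {"title": "Notes", "tags": ["cloud"], "lang": "Java"})
store.ensure_index("lang")
store.update(1, "lang", "Java")
found = store.find("lang", "Java")
```

The point of the sketch is only that documents of different shapes can share one store, that an index can be declared on a single attribute after the fact, and that the caller never needs to know which shard holds a given document.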
I have had a chance to play with MongoDB, Membase, Hive, Cassandra, and CouchDB, and can definitely say that this feels like the inevitable direction of cloud-scale storage and computation (see Spark, Dryad, MapReduce, etc.). Microsoft has proprietary auto-sharding storage systems (SQL Azure and Azure XStore), which I shall write about on another occasion. All that said, schema-free storage, indexing, auto-sharding, and high performance make MongoDB a compelling offering.