I am not Martin Scholl

nor am I Ludwig Wittgenstein 

MongoGate — or let's have a serious NoSQL discussion

A widely shared rant about MongoDB is ranking high on Hacker News right now. In case you haven't stumbled upon it, here is the link: http://pastebin.com/raw.php?i=FD3xe6Jt

I'd like to call the opinion leak over at Pastebin the MongoGate rant.
[Update: As it turns out, it is a hoax. My criticism still holds.]

But limiting the discussion to MongoDB alone is plain wrong. The marketing fluff around NoSQL is the problem, and I'd like to tell you why I think so.

Picturing The NoSQL Landscape

Let me first sketch the NoSQL landscape. A declaration up front: this is my biased view as a founder who has lately been working in the shiny land of NoSQL.

So what is NoSQL, from my point of view?

  1. Enabling Scalability: NoSQL is a technical approach to a problem many companies have with relational databases. The problems differ, but in the end it's all about scalable performance. The general trade-off is between availability and consistency.
  2. Model Data Differently: The NoSQL movement tries to start a new way of thinking about data, and of modelling data storage more closely along application requirements.
  3. Be Different: NoSQL is also a way of being different (not necessarily better, but different). My understanding is that for many developers and companies in the game, it is ultimately about being a not-like-SQL data store, as a way of saying "I do not drive European cars."

Yes, I'm oversimplifying here. Read on, this is a rant.

The NoSQL Assembler Language

What the NoSQL-movement clearly has enabled is a way to reliably store and scale Key-Value-like data access. I would like to call this the data storage assembler level.

The only problem is: we (the NoSQL community) treat the innovation of a scalable KV store like a sacred cow: "My god, I can push x thousand KV ops through this Redis [or substitute any other KV store out there] thing". Now let's view this from the 10,000-foot perspective of a developer who just wants to start a new "cat picture community" (aka social network for <X>). Must she or he care about this? Should they care? Is key-value access really the issue here?

Or should a developer rather care about getting some software built that helps manage and share cat photos, instead of managing data access as if it were the revival of assembler programming?
Everybody knows assembler can eventually yield the highest-performing application code out there (given many, many years of extreme assembler programming and a lot of bit trickery in the programmer's toolkit).

But: do you know x86 Assembler? Do you write in it? Or do you use higher-level language abstractions?

The "Model Data Differently" Myth

What a blatant lie this is! As a developer you are free to choose, but don't ask for something different:

  • Key-Value: we had that already. It's assembler. Put in whatever you want, and live with nasty side effects like "WTF, who overwrote my nicely cached value with that garbage?!" or "How do I manage my keyspace?" or "What is this nice MD5-hashed key for again?!" (please, don't ask for efficient key listing here, it's just key-value, OK, bro?!)
  • DocumentDB: Store it as a "structured" document, or: "Put in whatever data you like (but no blobs or other "strange" stuff, please). Whether that doc is complete or whatever, well, that is completely your problem, not mine, dear developer!"
  • Highly-Dimensional DBs: (this is Cassandra and HBase land, folks) "Dear developer, what about this nice mind-blowing way of modelling data? Isn't it cool to store your timeline in a super column family wrapped in a column family hashed by the username appended with a timestamp?"

So, you as a developer are free to choose from rich ways to express yourself within these three categories. Oh wait, they do not model your application well? Well, that's your problem!
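
To make the key-value complaints concrete, here is a minimal sketch. It assumes a toy in-memory store that MD5-hashes its keys; the class `TinyKV` and the key names are made up for illustration and do not correspond to any real product's API. It shows a silent overwrite and why listing keys by prefix does not work once the keys are hashed.

```python
import hashlib


class TinyKV:
    """Toy in-memory stand-in for a key-value store that hashes its keys (hypothetical)."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _hash(key: str) -> str:
        # The store only ever sees this digest, never the readable key.
        return hashlib.md5(key.encode("utf-8")).hexdigest()

    def put(self, key: str, value: str) -> None:
        # Last writer wins; nothing warns about replacing an existing value.
        self._data[self._hash(key)] = value

    def get(self, key: str):
        return self._data.get(self._hash(key))

    def stored_keys(self):
        # All the store can list are opaque hex digests.
        return sorted(self._data)


kv = TinyKV()
kv.put("cat:42:caption", "Grumpy on the sofa")
kv.put("cat:42:caption", "garbage from another module")  # silent overwrite
kv.put("cat:43:caption", "Sleepy in a box")

overwritten = kv.get("cat:42:caption")

# Trying to list everything that belongs to cat 42: the stored keys are
# hex digests, so the readable prefix never matches anything.
prefix_matches = [k for k in kv.stored_keys() if k.startswith("cat:42:")]

print(overwritten)
print(prefix_matches)
```

In a setup like this, the application has to maintain its own list of keys per cat somewhere else, which is exactly the keyspace management burden described above.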

The Being Different Myth

Instead of using the relational model, what the NoSQL movement brings to the table is that you can now choose from other ways to reach your own hell. You are free to pick KV stores, document DBs or other more complex ways of expressing yourself (besides some SQL stuff, of course).

But does this really transcend the current state of the art? Is this really different from SQL-based systems?!

I bet not. You have more options to choose from, but they still involve an application/data-store impedance mismatch. Developing applications hasn't become much easier since the NoSQL trend began. Clearly, the NoSQL movement has questioned the state of the art in plenty of positive ways, but its net positive effect on programmers and productivity remains to be proven.

Issues Unsolved

Let me give you some examples of core problems that the NoSQL movement still hasn't solved:

  • Managing Highly-dimensional data and access to it: I am not referring to <Key, Subkey>-like constructs as in the HBase / Cassandra way of thinking. I'm thinking of e.g. geo/spatial data here. Where are the solutions?
  • Storing and Managing Timelines as in Twitter / Facebook: everybody knows timelines, and many have them integrated into their toolkit or application. But where are the scalable ways to store timelines constructed from a social graph?
  • Scalable Sorted Indexing: Look at what Microsoft does with its Azure storage and learn what else has to be done to have seriously performing sorted indexes that scale and adapt dynamically to their usage and query patterns.
  • Truly Scalable Secondary Indexes: Also, secondary indexes, quite a new construct in the NoSQL world, often have serious scalability strings attached. Hashing keys first, which effectively destroys their order, and then broadcasting across your cluster to get hold of your keys and values is simply not scalable, folks! Live with it: not every problem can be solved by consistent hashing! And don't get me started on the consistency of those secondary indexes...
  • Storing, Managing and Processing Events in a way that yields much better insight into running systems, detects anomalies, and much, much more (Twitter's Storm is a real start though)
  • [I will stop here for now]
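
The secondary-index point in the list above can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular store works: a four-node cluster, hypothetical split points for range partitioning, and a hash-partitioned variant in which the coordinator has no way of knowing where neighbouring keys ended up. All names (`NODES`, `SPLITS`, the functions) are invented for illustration.

```python
import bisect

NODES = 4

# Hypothetical split points for range partitioning:
# node 0 owns keys < "g", node 1 owns ["g", "n"), node 2 owns ["n", "t"),
# node 3 owns everything from "t" upwards.
SPLITS = ["g", "n", "t"]


def range_node(key: str) -> int:
    """Node that owns `key` when the keyspace is split by key order."""
    return bisect.bisect_right(SPLITS, key)


def nodes_for_range_ranged(lo: str, hi: str) -> set:
    """Key order is preserved, so only the nodes between lo and hi are contacted."""
    return set(range(range_node(lo), range_node(hi) + 1))


def nodes_for_range_hashed(lo: str, hi: str) -> set:
    """Hashing scatters neighbouring keys, so any node may hold a match:
    the query has to be broadcast to the whole cluster."""
    return set(range(NODES))


ranged = nodes_for_range_ranged("a", "c")
hashed = nodes_for_range_hashed("a", "c")

print(f"range-partitioned: contact {len(ranged)} of {NODES} nodes")
print(f"hash-partitioned:  contact {len(hashed)} of {NODES} nodes")
```

With hashed placement the number of nodes contacted per range query grows with the cluster, whatever the size of the result; with ordered placement it depends on the width of the queried range. The price of ordered placement is hot spots and rebalancing, which this sketch ignores.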

These problems are at the core of many issues application developers face. Let me conclude: the NoSQL world still has no serious answers to them (Riak developers with their secondary indexes might disagree, though).

What we need

In my opinion, the MongoGate issue seems to be a real and serious issue for the users and developers of MongoDB. But let me say this: it is just the symptom, not the cause.

The problem is: NoSQL is not a solution at all. It's a trade-off. You exchange a hard problem (scalability) and get several other problems in return. These trade-offs lie in development effort, application/data-store impedance mismatches, manageability, lack of third-party tools, application complexity due to eventual consistency, and many other issues. They all have to be communicated more openly, more freely.

What we do not need is another MongoGate. We (as NoSQL data store developers) need to focus on the problem more closely. We had better focus on applications. We had better help application developers build their stuff in a simpler, more streamlined way and with a higher degree of scalability (in that order, not the other way 'round).

And finally: We need to solve the really hard problems!

Comments (12)

Nov 06, 2011
foljs said...
What the NoSQL-movement clearly has enabled is a way to reliably store and scale Key-Value-like data access.

Er, actually no. The whole MongoGate is based on the premise that it did neither the one (reliably store) nor the other (reliably scale).

Nov 06, 2011
Martin Scholl said...
A minor headline edit: NoSQL counter examples -> Issues Unsolved
Nov 07, 2011
Casey Strouse said...
I think that the point they make about using NoSQL for rapid prototyping really hits the nail on the head; this is what IMO Mongo and pals is best for. Often I find that once I do something with Mongo the client wants the production version of their application to use MySQL anyway. Mongo does save me a lot of time in getting my concepts and ideas off the ground though and so I think indeed there's a place for NoSQL that's here to stay.
Nov 07, 2011
Kevin Ferron said...
Well the hype is pretty dangerous.. of course nosql solutions are not a panacea, nor was the selling point ever 'it makes developers jobs easier'... I think the critical thing you are missing on is.. there are some situations where even a sharded rdbms simple cannot scale.. not even just efficiently.. just no.. I know it drives long time sql experts crazy, but if I need 7 hotswappable boxes to deal with 100k t/s, as opposed to 50 sharded master slaves with detailed replication and failover rules to get even half the throughput (no way on writes with real ACID).. guess which "trade-off" i'm going to take?
Nov 07, 2011
Martin Scholl said...
Kevin, absolutely. I don't drink the relational kool-aid either btw.

Making developers' lives simpler is an important topic because it makes the technology more broadly applicable. It ultimately leads to the biggest impact (just look at how MySQL has driven startups for so long now).
It is just that data storage and application land have become so disconnected that even the NoSQL movement with its more flexible data storage methods hasn't closed this gap appropriately. As a company we are working in this field and have seen the results of having a more application-driven data storage layer at hand. The effects are much better and the return higher than we ever anticipated.
Nov 07, 2011
Martin Scholl said...
Made a 2nd update and linked to http://news.ycombinator.com/item?id=3205573
Nov 07, 2011
dfeldman said...
I'm really new to the NoSQL world, but started looking at it via CouchBase due to its ability to replicate data offline in a mobile app. There's a lot of complexity to understand in CouchBase itself, and it's new enough that the documentation can be frustrating. But offline replication in a SQL world (or any world where you have to write it yourself) is tough, so I remain intrigued.

Again, it's a trade-off, and I'd be psyched to see more (and more nuanced) options for automatic replication.

Nov 08, 2011
jim snavely said...
look up the CAP principle. you can only have two of the following: consistency, availability and partition tolerance. nosql lets you tune those trade offs.
Dec 06, 2011
Alexander Trefz said...
*Er, actually no. The whole MongoGate is based on the premise that it did neither the one (reliably store) nor the other (reliably scale).*

So, Yahoo is neither reliable nor scalable since, what? 1994 or so? Cause they use a custom noSQL DB since the beginning.

Dec 06, 2011
jim snavely said...
that may or may not be true of mongoDB. But even then, you can't extrapolate that to all nosql databases. Cassandra seems to scale pretty well for facebook.
Dec 06, 2011
Alexander Trefz said...
But the article questions not only Mongo but the whole NoSQL-idea itself. This concern is technically invalid as Yahoo proves. To state that mongo might be unreliable or does not scale well is a completely different thing.
Dec 24, 2011
Tipubd said...
Very Informatic Post
