There is a rant on MongoDB ranking high on Hacker News right now. In case you haven't stumbled upon it yet, here it is: http://pastebin.com/raw.php?i=FD3xe6Jt
I'd like to call this opinion leak over at Pastebin the MongoGate rant.
[Update: As it turns out, it was a hoax. My criticism still holds.]
But limiting the discussion to MongoDB alone is plain wrong. The marketing fluff around NoSQL is the real problem, and I'd like to tell you why I think so.
Picturing The NoSQL Landscape
Let me first sketch the NoSQL landscape. But let me declare one thing up front: this is my biased view as a founder who has lately been working in the shiny land of NoSQL.
So, what is NoSQL from my point of view?
- Enabling Scalability: NoSQL is a technical approach to a problem many companies have with relational databases. The specific problems differ, but in the end it's all about scalable performance. The general trade-off is between availability and consistency.
- Model Data Differently: The NoSQL movement is trying to start a new way of thinking about data, one that models data storage more closely along application requirements.
- Be Different: NoSQL is also a way to be different (not necessarily better, just different). My understanding is that for many developers and companies in the game, it ultimately comes down to being not-like-SQL-based data stores, as a way of saying "I do not drive European cars."
Yes, I'm oversimplifying here. Read on, this is a rant.
The NoSQL Assembler Language
What the NoSQL movement has clearly enabled is a way to store data reliably and to scale key-value-style data access. I would like to call this the assembler level of data storage.
The only problem is that we (the NoSQL community) treat the innovation of a scalable KV store like it was the holy cow: "My god, I can push x thousand KV ops through this Redis [or substitute Redis with any other KV store out there] thing!" Now let's view this from the 10,000-foot perspective of a developer who just wants to start a new "cat picture community" (aka social network for <X>). Must she or he care about this? Should they? Is key-value access really an issue here?
Or should a developer rather care about how to build software that helps manage and share cat photos, instead of managing data access as if assembler programming were having a revival?
Everybody knows assembler can eventually yield the highest-performing application code out there (given many, many years of extreme assembler programming and a lot of bit trickery in one's toolkit).
But do you know x86 assembler? Do you write in it? Or do you use higher-level language abstractions?
The "Model Data Differently" Myth
What a blatant lie this is! As a developer, you are free to choose; just don't ask for anything different:
- Key-Value: we had that already. It's assembler. Put in whatever you want, and live with nasty side effects like "WTF, who overwrote my nicely cached value with that garbage?!" or "How do I manage my keyspace?" or "What is this nice MD5-hashed key for again?!" (and please, don't ask for efficient key listing here; it's just key-value, OK, bro?!)
- Document DB: Store it as a "structured" document, or rather: "Put whatever data you like in it (but no blobs or other 'strange' stuff, please). Whether that doc is complete or whatever, well, that is completely your problem, not mine, dear developer!"
- Highly-Dimensional DBs (this is Cassandra and HBase land, folks): "Dear developer, what about this nice, mind-blowing way of modelling data? Isn't it cool to store your timeline in a super column family wrapped in a column family, hashed by the username with a timestamp appended?"
So, as a developer, you are free to choose from these three categories of rich ways to express yourself. Oh wait, they do not model your application well? Well, that's your problem!
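The "who overwrote my nicely cached value" complaint is the classic lost-update problem of a bare get/put interface. Here is a minimal Python sketch, assuming a hypothetical in-memory store (`VersionedKV`, `Entry` and their methods are made up for illustration, not any product's API): a blind `put` lets the last writer win silently, while a version-checked compare-and-set turns the silent overwrite into a conflict the application can detect and retry. Some real KV stores offer comparable primitives (memcached's `gets`/`cas` commands, for example), but the developer still has to use them correctly, which is exactly the assembler-level bookkeeping described above.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: str
    version: int


class VersionedKV:
    """Hypothetical in-memory KV store where every key carries a version counter."""

    def __init__(self) -> None:
        self._data: dict[str, Entry] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Entry | None:
        with self._lock:
            return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        # Blind write: the last writer wins and earlier values vanish silently.
        with self._lock:
            old = self._data.get(key)
            self._data[key] = Entry(value, old.version + 1 if old else 1)

    def cas(self, key: str, value: str, expected_version: int) -> bool:
        # Compare-and-set: write only if nobody changed the key since we read it.
        # A missing key counts as version 0.
        with self._lock:
            old = self._data.get(key)
            current = old.version if old else 0
            if current != expected_version:
                return False
            self._data[key] = Entry(value, current + 1)
            return True


store = VersionedKV()
store.put("cat:42:caption", "Grumpy")

# Two clients read the same version of the value...
a = store.get("cat:42:caption")
b = store.get("cat:42:caption")

# ...client A updates first; client B's stale write is rejected instead of clobbering A.
store.cas("cat:42:caption", "Grumpy Cat", a.version)
print(store.cas("cat:42:caption", "garbage", b.version))  # False: B must re-read and retry
print(store.get("cat:42:caption").value)                  # Grumpy Cat
```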
The Being Different Myth
Instead of the relational model, what the NoSQL movement brings to the table is that you can now choose other ways to reach your own hell. You are free to pick KV stores, document DBs or other, more complex ways of expressing yourself (besides some SQL stuff, of course).
But does this really transcend the current state of the art? Is this really different from SQL-based systems?!
I bet not. You have more options to choose from, but they all still involve an impedance mismatch between application and data store. Developing applications hasn't become much easier since the NoSQL trend began. Clearly, the NoSQL movement has questioned the state of the art in plenty of positive ways, but its net positive effect on programmers and productivity remains to be proven.
Issues Unsolved
Let me give you some examples of core problems that the NoSQL movement still hasn't solved:
- Managing Highly-Dimensional Data and Access to It: I am not referring to <Key, Subkey>-like constructs as in the HBase/Cassandra way of thinking. I'm thinking of, e.g., geospatial data here. Where are the solutions out there?
- Storing and Managing Timelines as in Twitter/Facebook: everybody knows timelines, and many have integrated them into their toolkit or application. But where are the ways to store timelines constructed from a social graph in a scalable way?
- Scalable Sorted Indexing: Look at what Microsoft does with its Azure storage and learn what else has to be done to get seriously performant sorted indexes that scale and adapt dynamically to their usage and query patterns.
- Truly Scalable Secondary Indexes: Secondary indexes, quite a new construct in the NoSQL world, often come with serious scalability strings attached. First hashing keys, effectively destroying their order, and then broadcasting to your whole cluster to get hold of your keys and values simply does not scale, folks! Live with it: not every problem can be solved by consistent hashing! And don't get me started on the consistency of those secondary indexes...
- Storing, Managing and Processing Events: doing this in a way that yields much better insight into running systems, detects anomalies, and much, much more (Twitter's Storm is a real start, though).
- [I will stop here for now]
These problems are at the core of many issues application developers have. Let me conclude here: the NoSQL world still has no serious answers to them (though Riak developers, with their secondary indexes, might disagree).
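The secondary-index complaint can be made concrete. Below is a minimal Python sketch of a toy cluster, assuming hypothetical `Ring` and `Node` classes, node names and an `owner` attribute (none of this is Riak's or any other product's API). Primary keys are placed on a consistent-hash ring, and each node keeps a local secondary index over only its own records. A primary-key lookup touches exactly one node, but a lookup by secondary attribute, or a range scan over primary keys whose order the hashing destroyed, has to be broadcast to every node and merged.

```python
from __future__ import annotations

import bisect
import hashlib
from collections import defaultdict


def _hash(s: str) -> int:
    # MD5 only spreads keys around the ring here; it is not used for security.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)


class Node:
    """One storage node: its records plus a *local* secondary index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: dict[str, dict] = {}
        self.by_owner: dict[str, set[str]] = defaultdict(set)  # owner -> keys on this node only

    def put(self, key: str, record: dict) -> None:
        self.records[key] = record
        self.by_owner[record["owner"]].add(key)


class Ring:
    """Toy consistent-hash ring with a few virtual points per node."""

    def __init__(self, names: list[str], vnodes: int = 8) -> None:
        self.nodes = {n: Node(n) for n in names}
        self._points = sorted((_hash(f"{n}#{i}"), n) for n in names for i in range(vnodes))
        self._hashes = [h for h, _ in self._points]
        self.nodes_contacted = 0  # counts per-node requests to make the fan-out visible

    def _node_for(self, key: str) -> Node:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self.nodes[self._points[i][1]]

    def put(self, key: str, record: dict) -> None:
        self._node_for(key).put(key, record)

    def get(self, key: str) -> dict | None:
        # Primary-key lookup: the hash names exactly one node to ask.
        self.nodes_contacted += 1
        return self._node_for(key).records.get(key)

    def find_by_owner(self, owner: str) -> list[str]:
        # Secondary-index lookup: no node knows where matches live, so broadcast and merge.
        hits: list[str] = []
        for node in self.nodes.values():
            self.nodes_contacted += 1
            hits.extend(node.by_owner.get(owner, ()))
        return sorted(hits)

    def range_scan(self, lo: str, hi: str) -> list[str]:
        # Hashing destroyed key order, so a range scan over [lo, hi) is a broadcast too.
        hits: list[str] = []
        for node in self.nodes.values():
            self.nodes_contacted += 1
            hits.extend(k for k in node.records if lo <= k < hi)
        return sorted(hits)


ring = Ring([f"node{i}" for i in range(10)])
for i in range(1000):
    ring.put(f"photo:{i:04d}", {"owner": f"user{i % 50}"})

ring.nodes_contacted = 0
ring.get("photo:0007")
print(ring.nodes_contacted)              # 1
ring.nodes_contacted = 0
print(len(ring.find_by_owner("user7")))  # 20
print(ring.nodes_contacted)              # 10
```

Every node added to the cluster widens the broadcast, which is the scalability string attached. Keeping a global index partitioned by the indexed value avoids the broadcast on reads, but then a single write has to update index entries on other nodes, which is where the consistency worries come from.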
What we need
In my opinion, MongoGate points to a real and serious issue for the users and developers of MongoDB. But let me say this: it is just a symptom, not the cause.
The problem is this: NoSQL is not a solution at all. It's a trade-off. You trade away one hard problem (scalability) and get several other problems in return. These trade-offs lie in development effort, application/data-store impedance mismatches, manageability, lack of third-party tools, application complexity due to eventual consistency, and many other issues. They all have to be communicated more openly and more freely.
What we do not need is another MongoGate. We (as NoSQL data store developers) need to focus more closely on the problem. We had better focus on applications. We had better help application developers build their stuff more simply, in a more streamlined way, and with a higher degree of scalability (in that order, not the other way 'round).
And finally: We need to solve the really hard problems!