I am not Martin Scholl

nor am I Ludwig Wittgenstein 

5 Simple Steps for Less Procrastination and more Getting Things Done

My personal rules for less procrastination to get more things done:

  1. Start the day by creating or doing something useful: this ranges from slicing an apple for your breakfast to learning to play the piano or doing sports.
  2. At work, the first thing you create is your daily to-do list.
  3. Start working on at least one of the to-do list's items for a longer period of time, at least 30 minutes.
  4. Only then dare to start your email / twitter / whatever client
  5. goto 3.

Comments [2]

On Kodak

I owned and used analog photo gear for only some two years, so I
wasn't exposed to Kodak as much as you probably were.
That is why I really enjoyed reading this article on Kodak as a company
that made nice products and changed how photography is connected with
daily life. You might know all the details and what "Kodakery" means,
but I didn't:

http://www.theatlantic.com/technology/archive/2012/01/the-triumph-of-kodakery-the-camera-maker-may-die-but-the-culture-it-created-survives/250952/

The article is also a tale about our (so-called "IT") industry, and about what the memes of
important companies were back then and are now.

Comments [0]

MongoGate — or let's have a serious NoSQL discussion

There is a well-known rant on MongoDB ranking high on Hacker News right now. In case you haven't stumbled upon it, here is the link to it: http://pastebin.com/raw.php?i=FD3xe6Jt

I'd like to call the opinion leak over at Pastebin the MongoGate rant.
[Update: As it turns out, it is a hoax. My criticism still holds.]

But limiting the discussion to MongoDB alone is plain wrong. The marketing fluff around NoSQL is the problem. And I'd like to tell you why I think this is the case.

Picturing The NoSQL Landscape

Let me first picture the NoSQL landscape. And let me declare this up front: this is my biased view as a founder who has lately been working in the shiny land of NoSQL.

So what is NoSQL from my point of view?

  1. Enabling Scalability: NoSQL is a technical approach to a problem many companies have with relational databases. The problems differ, but in the end it's all about scalable performance. The general trade-off is between availability and consistency.
  2. Model Data Differently: What the NoSQL movement tries to start is a new way of thinking about data and how to model data storage more closely along application requirements.
  3. Be Different: NoSQL is also an approach to being different (not necessarily better, but different). My understanding is that for many developers and companies in the game, it ultimately is about being Not-like-SQL-based data stores, as a way to say "I do not drive European cars."

Yes, I'm oversimplifying here. Read on, this is a rant.

The NoSQL Assembler Language

What the NoSQL-movement clearly has enabled is a way to reliably store and scale Key-Value-like data access. I would like to call this the data storage assembler level.

The only problem is: we (the NoSQL community) treat the innovation of a scalable KV-store as if it were a sacred cow: "My god, I can push x thousand KV-ops through this Redis [or substitute Redis with any other KV-Store out there] thing". Now let's view this from the 10,000-foot perspective of a developer who just wants to start a new "cat picture community" (aka social network for <X>). Must she/he care about this? Should she/he care? Is Key-Value access really an issue here?

Or should a developer rather care about how to get some software built that helps manage and share cat photos, instead of managing data access as if it were the revival of Assembler programming?
Everybody knows Assembler eventually yields the highest-performing application code out there (given many, many years of extreme Assembler programming and a lot of bit-trickery in one's toolkit).

But: do you know x86 Assembler? Do you write in it? Or do you use higher-level language abstractions?

The "Model Data Differently" Myth

What a blatant lie this is! As a developer you are free to choose, but don't ask for something different:

  • Key-Value: we had that already. It's assembler. Put in whatever you want, live with nasty side-effects like "WTF, who overwrote my nicely cached value with that garbage?!" or "How do I manage my keyspace?" or "What is this nice MD5-hashed key for again?!" (please, don't ask for efficient key listing here, it's just key-value, OK, bro?!)
  • DocumentDB: Store it as a "structured" document, or: "Put in it whatever data you like (but no blobs or other 'strange' stuff, please). Whether that doc is complete or whatever, well, this is completely your problem, not mine, dear developer!"
  • Highly-Dimensional DBs: (this is Cassandra/HBase land, folks) "Dear developer, what about this nice mind-blowing way of modelling data? Isn't it cool to store your timeline in a super column family wrapped in a column family hashed by the username appended with a timestamp?"

So, you as a developer are free to choose from rich ways to express yourself from these 3 categories. Oh wait, they do not model your application well? Well, that's your problem!
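To make the key-value complaints a bit more concrete, here is a minimal sketch in Python. It assumes a plain in-memory dict as a stand-in for a KV store; the class `TinyKV`, its method names and the key names are hypothetical and not taken from any real product.

```python
class TinyKV:
    """Hypothetical stand-in for a key-value store: opaque keys, opaque values."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Last write wins: the store has no idea who "owns" a key.
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def keys_with_prefix(self, prefix):
        # No index over the keyspace: listing means scanning every key.
        return sorted(k for k in self._data if k.startswith(prefix))


store = TinyKV()

# Two unrelated components pick the same key by accident.
store.put("user:42", {"name": "Alice"})      # profile service
store.put("user:42", "<html>cached</html>")  # page cache clobbers it

clobbered = store.get("user:42")

# The application has to invent its own keyspace conventions.
store.put("profile:42", {"name": "Alice"})
store.put("profile:43", {"name": "Bob"})
profile_keys = store.keys_with_prefix("profile:")
```

Nothing in the store prevents the overwrite, and the prefix convention lives only in the application's head. That is the "assembler level" I am complaining about.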

The Being Different Myth

Instead of using the relational model, what the NoSQL movement brings to the table is that you can now choose from other ways to reach your own hell. You are free to pick KV-stores, Document DBs or other more complex ways of expressing yourself (alongside some SQL stuff, of course).

But does this really transcend the current state of the art? Is this really different from SQL-based systems?!

I bet not. You have more options to choose from, but they still involve an impedance mismatch between application and data store. Developing applications hasn't become much easier since the NoSQL trend. Clearly, the NoSQL movement has questioned the state of the art in plenty of positive ways, but the positive net effect on programmers and productivity still remains to be proven.

Issues Unsolved

Let me give you some examples for core problems that the NoSQL movement still hasn't solved:

  • Managing Highly-dimensional data and access to it: I am not referring to <Key, Subkey>-like constructs as in the HBase / Cassandra-way of thinking. I'm thinking of e.g. geo/spatial data here. Where are the solutions out there?
  • Storing and Managing Timelines as in Twitter / Facebook: everybody knows timelines, and many have them integrated in their toolkit / application. But where are the scalable ways to store timelines constructed from a social graph?
  • Scalable Sorted Indexing: Look at what Microsoft does with its Azure storage and learn what else has to be done to have seriously performing sorted indexes that scale and adapt dynamically along their usage and query patterns.
  • Truly Scalable Secondary Indexes: Also, secondary indexes, quite a new construct in the NoSQL world, often come with serious scalability strings attached. First hashing keys, which effectively destroys their order, and then broadcasting to your whole cluster to get hold of your keys and values simply is not scalable, folks! Live with it, not every problem can be solved by consistent hashing! And don't get me started on the consistency of those secondary indexes...
  • Storing, Managing and Processing Events in a way to yield much better insight into running systems, detect anomalies, and much much more (Twitter's Storm is a real start though)
  • [I will stop here for now]

These problems are at the core of many issues application developers have. Let me conclude here: The NoSQL world still has no serious answers to these problems (Riak developers with their secondary indexes might disagree here though).
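The point about hashed keys and cluster broadcasts can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular data store works: the four-node "cluster" is just a list of in-memory collections, all function names are hypothetical, and range partitioning is my own choice for the contrast.

```python
import hashlib

NUM_NODES = 4


def hash_node(key):
    # Hash partitioning: neighbouring keys land on unrelated nodes.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


def build_hash_cluster(keys):
    nodes = [set() for _ in range(NUM_NODES)]
    for key in keys:
        nodes[hash_node(key)].add(key)
    return nodes


def range_query_hashed(nodes, low, high):
    # Key order is gone, so every node must be asked (a broadcast).
    hits = [k for node in nodes for k in node if low <= k < high]
    return sorted(hits), len(nodes)


def build_range_cluster(keys):
    # Range partitioning: each node owns one contiguous slice of the sorted keys.
    ordered = sorted(keys)
    size = -(-len(ordered) // NUM_NODES)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def range_query_ranged(nodes, low, high):
    # Only nodes whose slice overlaps [low, high) are contacted.
    touched = [n for n in nodes if n and n[0] < high and n[-1] >= low]
    hits = [k for n in touched for k in n if low <= k < high]
    return hits, len(touched)


keys = ["user%03d" % i for i in range(100)]
hashed = build_hash_cluster(keys)
ranged = build_range_cluster(keys)

hashed_hits, hashed_nodes = range_query_hashed(hashed, "user010", "user020")
ranged_hits, ranged_nodes = range_query_ranged(ranged, "user010", "user020")
```

Both queries return the same ten keys, but the hashed layout has to contact all four nodes while the ordered layout contacts one. The sketch ignores the hard parts (rebalancing, hot ranges, index consistency), which is exactly where the unsolved work lies.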

What we need

In my opinion, the MongoGate issue seems to be a real and serious issue for the users and developers of MongoDB. But let me say this: it is just the symptom, not the cause.

The problem is: NoSQL is not a solution at all. It's a trade-off. You exchange one hard problem (scalability) and get several other problems in return. These trade-offs lie in development effort, impedance mismatches between application and data store, manageability, lack of 3rd party tools, application complexity concerns due to eventual consistency, and many other issues. They all have to be communicated more openly, more freely.

What we do not need is another MongoGate. We (as NoSQL data store developers) need to focus on the problem more closely. We had better focus on applications. We had better help application developers develop their stuff in a simpler, more streamlined way and with a higher degree of scalability (in that order, not the other way 'round).

And finally: We need to solve the really hard problems!

Comments [12]

12 Lessons Steve Jobs Taught Guy Kawasaki

Taken from this video with Guy Kawasaki on the 12 lessons he learned from working with Steve Jobs twice


Comments [0]

Sunday Reading 09/25

Comments [0]

XML Reports for Erlang Common Test Runs

Within our Continuous Integration process, we heavily depend on running tests using Erlang's excellent Common Test (CT) framework. With CT's help we not only can run tests in a moderately distributed fashion, but more importantly we have plenty of options to run subsets and different combinations of configuration options in a really easy way.

But there always was one downside: those tests didn't show up in Hudson. Although CT generates nice HTML logs of the test runs, the only way to integrate CT with Hudson was to use an error return code to signal failure.

Read the rest of this post »

Filed under  //   CI   common_test   ct   erlang   surefire  

Comments [0]

11-week29 Favorite Readings

Readings, blogs and publications I have enjoyed this week [in no particular order]:

Read the rest of this post »

Filed under  //   CS   VC   funding   m/r   mapreduce   opensource  

Comments [0]

A Call for an Honest and Sustainable OpenSource Software License

Finding a proper license for our BigData Software «i(2)»

Usually, all the hip web2.0 guys that recently got VC money start their first blog post by laying out their vision, telling the world that their invention will mean a revolution to everybody's daily life. I will not do so.

Instead, I am free to admit: I don't know what the heck I'm actually doing right now. To make a long story short: we are writing software that shall be a toolkit for data-driven software. Think of the role of Hadoop for processing unstructured data or RoR for developing web-apps. We aim to produce a toolkit and framework to really simplify the creation of advanced data-driven software: non-humbly speaking, the Ruby-on-Rails for data-driven apps.

Now this is far from being concrete enough to give you a basic understanding of what we do, which is clearly a topic for a future blog post. Instead, this post shall be about an obvious question: which license shall we pick for our software?

Finding a proper license that matches our business model and is fair at the same time

Our business model is quite clear: make money by helping other companies make money with our software. This comprises training, enhancing our software, writing custom data apps, etc. All in all, nothing spectacular.

But like every software company thinking about OpenSource'ing their core product, we fear that somebody just takes the software and refuses to give something back (not necessarily meaning money) to the original authors or to us as a company. Since our software is really interesting for companies making money by selling data or offering data-driven services, we think it is normal and fair to demand a fair share of the revenue that our software enables them to earn.

What we have looked at so far:

  • GPLv2 / GPLv3: well, GPL is not suitable for a framework
  • LGPL: suitable, but we are unsure how developers think about LGPL. Please use posterous's comment function so we can find out!!
  • Apache License 2.0 and the classic Berkeley (BSD) license: neither ensures the aforementioned tit-for-tat

On the other hand, there is a more severe problem that we also don't know how to resolve right now: when we apply a tit-for-tat scheme to the companies using the i(2) software, we also have to come up with a model that lets developers benefit from i(2) when they have contributed.

The impedance mismatch between licensing code and business model

I have come to believe this is caused by a fundamental issue in the current OpenSource licensing regime. The abstract nature of current OpenSource software licenses is to manage actions on/for what is being licensed: software as a collection of binaries and its backing source code. Please notice, a business model has no place in the licenses. IMHO there is a clear impedance mismatch between managing the rights around software as an IP problem and managing revenue as a business problem.

Our current research result is: there is no "tit-for-tat OpenSource license" out there that fits our needs. Our needs are simple and not really new for any OpenSource company: enable developers to happily extend the software while a good enough revenue stream is ensured, such that the community at large can benefit from the software it develops.

Fine. How have companies solved this issue in the past?

Companies hack the licensing regime. Certain licenses are chosen solely for their side-effects to ultimately secure a certain business model. GPL is the most prominent one as can be seen with MySQL AB.

But what is missing still? An OpenSource software license that takes sharing and the generation of revenue directly into account. The license of my dreams goes something like this:

« Do whatever you like with the software and its source code but do not hold us liable for anything. When you start earning decent money, pay us a fair dollar.»

A Call for an Honest and Sustainable OpenSource Software License

I'm neither a lawyer nor an experienced OpenSource business person, so please don't shoot the messenger. Instead, please help us in our quest to solve this OpenSource and tit-for-tat problem. In my opinion, the current OpenSource landscape, and ultimately the OpenSource licensing offerings, are simply not sufficient. We need a license that ensures that both parties can benefit from each other: software users and software writers. We need a license that makes sure developers and companies backing OpenSource software have a sustainable stream of revenue to keep the software alive and make a decent dollar with it.

Please help us get such a license and leave a comment. Or contact me directly via martin (at) infinipool.com as well as @zeit_geist.

Filed under  //   i2   opensource  

Comments [5]