Saturday, May 31, 2014

Memories of the way we were...

The fascinating thing about Hadoop is the obviousness of its evolutionary needs. For example, MapReduce coupled with reliable scale-out storage had a powerful - even revolutionary - effect for organizations with both large volumes of data and multi-structured data. Out of the gate, Hadoop unlocked data "applications" that were, for all intents and purposes, previously unimplementable. At the same time, it didn't take much imagination to see that separating the compute model from resource management would be essential for future applications that did not fit well with MapReduce itself. It took a lot of work and care to get YARN defined, implemented and hardened, but the need for YARN itself was fairly obvious. Now it is here, and Hadoop is no longer just about "batch" data processing.

Note, however, that it takes a lot of work to make the evolutionary changes available. In some cases, bolt-on solutions have emerged to fill the gap. For key-value data management, HBase is a perfect example. Several years ago, Eric Baldeschwieler was pointing out that HDFS could have filled that role. I think he was right, but getting "HBase-type" functionality implemented via HDFS would have been a very long path indeed. In that case, the community filled the gap with HBase, and it is being "back integrated" into Hadoop via YARN in a way that will make for a happier co-existence.

Right now we are seeing multiple new bolt-on attempts to add functionality to Hadoop. For example, there are projects to add MPP databases on top of Hadoop itself. It's pretty obvious that this is at best a stopgap again - and one that comes at a pretty high price - and I don't know of anyone who seriously thinks that a bolt-on MPP is ultimately the right model for the Hadoop ecosystem. Since the open source alternatives look to be several years away from being "production ready", that raises an interesting question: is Hadoop evolution moving ahead at a similar or even more rapid rate to provide a native solution - a solution that will be more scalable, more adaptive and more open to a wider range of use cases and applications - including alternative declarative languages and compute models?

I think the answer is yes: while SQL on Hadoop via Hive is really the only open source game in town for production use cases - and it's gotten some amazing performance gains in the first major iteration on Tez that we'll talk more about in the coming days - it's clear that the Apache communities are beginning to deliver a new series of building blocks for data management at scale and speed: Optiq's Cost Based Optimizer; Tez for structuring multi-node operator execution; ORC and vectorization for optimal storage and compute; HCat for DDL. But what's missing? Memory management. And man, has it ever been missing - that should have been obvious as well (and it was - it is one reason that so many people are interested in Spark for efficient algorithm development).

What we've seen so far are the two extremes available when it comes to supporting memory management (especially for SQL) - all disk and all memory. An obvious point here is that neither is ultimately right for Hadoop. This is a long-winded intro to point to two interrelated pieces by Julian Hyde and Sanjay Radia unveiling a model that is being introduced across multiple components, called Discardable In-memory Materialized Query (DIMMQ). Once you see this model, it becomes obvious that the future of Hadoop for SQL - and not just SQL - is being implemented in real time. Check out both blog posts:

http://hortonworks.com/blog/dmmq/

http://hortonworks.com/blog/ddm/


Sunday, May 04, 2014

Berdyaev on Dostoevsky

I just finished reading Nikolai Berdyaev's interpretative study on Dostoevsky. On the one hand, this is a work that will be difficult to digest without reading at least the four major novels: Demons (or The Possessed), Crime and Punishment, The Idiot and The Brothers Karamazov - as well, I might add, as The Adolescent (or A Raw Youth, as it is sometimes titled). Berdyaev pursues his themes by reference to both characters and arguments that appear in those works. On the other hand, he does such a fine job of concisely presenting major thematic elements and positions that may be non-obvious to American or English-language readers that I would, with some hesitation, recommend it as a "preface" to reading Dostoevsky's novels. In the latter case, some substantial portion of the discussion would be lost on the reader, but the context it provides overall would certainly be helpful to those approaching the great author's oeuvre for the first time. In particular, the theme of "spiritual" freedom as a necessary condition for human development seems to be a correct reading of Dostoevsky, and Berdyaev works this idea out from a number of angles. And happily, Berdyaev is quite comfortable criticizing some of Dostoevsky's pointedly bad ideas as well.

There are two things I would note as well - Berdyaev is a fascinating critic and character in the development of Russian philosophy, specifically among the religiously inspired philosophers who in some way were heirs to Soloviev; and Berdyaev operates in the role of a philosophical social commentator rather than within a primarily theological tradition - in this respect he is very different from contemporaries like Sergius Bulgakov or Pavel Florensky. I am most familiar with him through his earlier work, including The Meaning of the Creative Act. This book, of course, echoes Berdyaev's thinking, but he is quite clear in distinguishing his critique from the views of his subject, which makes the book all the more valuable in that it seems to avoid projecting his reading of Dostoevsky onto Dostoevsky himself. Of course, others may disagree with this - and perhaps my own reading of both authors is colored by my own interpretation.

However, this certainly weighs on the question of how I would rank Berdyaev's critique of Dostoevsky: while it is not the subtlest discussion I have read, it is one of the simplest and, in my view, "most correct" readings of the author I have encountered. I would go so far as to suggest that Berdyaev's work deserves a primary place in the secondary literature on Dostoevsky. In fact, I would place it alongside Joseph Frank's monumental intellectual and literary biography as a recommended companion to Dostoevsky's novels.

Addendum: I should have mentioned Berdyaev's final assessment: "So great is the worth of Dostoevsky that to have produced him is by itself sufficient justification for the existence of the Russian people in the world." And that, my friends, is in my view true.