Wednesday, July 23, 2014
Monday, July 21, 2014
A few weeks back, Mike Olson of Cloudera spoke at Spark Summit on how Spark relates to the future of Hadoop. The presentation can be found here:
In particular I want to draw attention to the statement made at 1:45 in the presentation that describes Spark as the "natural successor to MapReduce" - it becomes clear very quickly that what Olson is talking about is batch processing. This is fascinating as everyone I've talked to immediately points out one obvious thing: Spark isn't a general purpose batch processing framework - that is not its design center. The whole point of Spark is to enable fast data access and interactivity.
The guys that clearly "get" Spark - unsurprisingly - are DataBricks. In talking with Ion and company, it's clear they understand the use cases where Spark shines - data scientist driven data exploration and algorithmic development, machine learning, etc. - things that take advantage of the memory mapping capabilities and speed of the framework. And they have offered an online service that allows users to rapidly extract value from cloud friendly datasets, which is smart.
Cloudera's idea of pushing SQL, Pig and other frameworks on to Spark is actually a step backwards - it is a proposal to recreate all the problems of MapReduce 1: it fails to understand the power of refactoring resource management away from the compute model. Spark would have to reinvent and mature models for multi-tenancy, resource managemnet, scheduling, security, scaleout, etc that are frankly already there today for Hadoop 2 with YARN.
The announcement of an intent to lead an implementation of Hive on Spark got some attention. This was something that I looked at carefully with my colleagues almost 2 years ago, so I'd like to make a few observations on why we didn't take this path then.
The first was maturity, in terms of the Spark implementation, of Hive itself, and Shark. Candidly, we knew Hive itself worked at scale but needed significant enhancement and refactoring for both new features on the SQL front and to work at interactive speeds. And we wanted to do all this in a way that did not compromise Hive's ability to work at scale - for real big data problems. So we focused on the mainstream of Hive and the development of a Dryad like runtime for optimal execution of operators in physical plans for SQL in a way that meshed deeply with YARN. That model took the learnings of the database community and scale out big data solutions and built on them "from the inside out", so to speak.
Anyone who has been tracking Hadoop for, oh, the last 2-3 years will understand intuitively the right architectural approach needs to be based on YARN. What I mean is that the query execution must - at the query task level - be composed of tasks that are administered directly by YARN. This is absolutely critical for multi-workload systems (this is one reason why a bolt on MPP solution is a mistake for Hadoop - it is at best a tactical model while the system evolves). This is why we are working with the community on Tez, a low level framework for enabling YARN native domain specific execution engines. For Hive-on-Tez, Hive is the engine and Tez provides the YARN level integration for resource negotiation and coorindation for DAG execution: a DAG of native operators analogous the the execution model found in the MPP world (when people compare Tez and Spark, they are fundamentally confused - Spark could be run on Tez for example for a much deeper integration with Hadoop 2 for example). This model allows the full range of use cases from interactive to massive batch to be administered in a deeply integrated, YARN native way.
Spark will undoubtedly mature into a great tool for what it is designed for: in memory, interactive scenarios - generally script driven - and likely grow to subsume new use cases we aren't anticipating today. It is, however, exactly the wrong choice for scale out big data batch processing in anything like the near term; worse still, returning to a monolithic general purpose compute framework for all Hadoop models would be a huge regression and is a disastrously bad idea.
Sunday, July 20, 2014
Dependent Rational Animals
One of the criticisms of MacIntyre is that his critique of rational ethics is, on the one hand, devastating; on the other hand, his positive case for working out a defense of his own position - a revivification of social ethics in the Aristotelian-Thomist tradition(s) was somewhat pro forma. I think this is legitimate in so far as it relates to After Virtue itself (I believe I have read the latest edition - 3 - most recently), though I am not enough of a MacIntyre expert to offer a defensible critique of his work overall.
I do, however, want to draw attention to Dependent Rational Animals specifically in this light. Here MacIntyre begins with is the position of human as animal - as a kind of naturalist starting point for developing another pass at the importance of the tradition of the virtues. What is most remarkable is that in the process of exploring the implications of our "animality" MacIntyre manages to subvert yet another trajectory of twentieth century philosophy, this time as it relates to the primacy of linguistics. The net effect is to restore philosophical discourse back toward the reality of the human condition in the context of the broader evolutionary context of life on earth without - and this I must say is the most amazing part of this book - resorting to fables-masked-as-science (evolutionary psychology).
Monday, July 07, 2014
Thursday, July 03, 2014
Yet this is exactly what Ryszard Kapuscinski accomplishes in his series of talks published as The Other. Here, the Polish journalist builds on his experience and most importantly on the reflections on the Lithuanian-Jewish philosopher Emmanual Levinas to reflect on how the encounter with the Other in a broad, cross cultural sense is the defining event - and opportunity - in late (or post) modernity. For Kapuscinski, the Other is the specifically the non-European cultures in which he spent most of his career as a journalist. For another reader it might be someone very much like Kapuscinski himself.
There are three simple points that Kapuscinski raises that bear attention:
1) The era we live in provides a unique, interpersonal opportunity for encounter with the Other - which is to say that we are neither in the area of relative isolation from the Other that dominated much of human history nor are we any longer in the phase of violent domination that marked the period of European colonial expansion. We have a chance to make space for encounter to be consistently about engagement and exchange, rather than conflict.
2) This encounter cannot primarily technical, its must be interpersonal. Technical means are not only anonymous but more conducive to inculcating mass culture rather than creating space for authentic personal engagement. The current period of human history - post industrial, urbanized, technological - is given to mass culture, mass movements, as a rule - this is accelerated by globalization and communications advances. And while it is clear that the early "psychological" literature of the crowd - and I am thinking not only of the trajectory set by Gustave LeBon, but the later and more mature reflections of Ortega y Gasset - were primarily reactionary, nonetheless they point consistently to the fact that the crowd involves not just a loss of identity, but a loss of the individual: it leaves little room for real encounter and exchange.
While the increasing ability to encounter different cultures offers the possibility of real engagement, at the same time modern mass culture is the number one threat to the Other - in that it subordinates the value of whatever is unique to whatever is both common and most importantly sellable. In visiting Ukraine over the last few years, what fascinated me the most were the things that made the country uniquely Ukrainian. Following a recent trip, I noted the following in a piece by New York Times columnist Nicholas Kristof on a visit to Karapchiv: "The kids here learn English and flirt in low-cut bluejeans. They listen to Rihanna, AC/DC and Taylor Swift. They have crushes on George Clooney and Angelina Jolie, watch “The Simpsons” and “Family Guy,” and play Grand Theft Auto. The school here has computers and an Internet connection, which kids use to watch YouTube and join Facebook. Many expect to get jobs in Italy or Spain — perhaps even America."
What here makes the Other both unique and beautiful is being obliterated by mass culture. Kristof is, of course, a cheerleader for this tragedy, but the true opportunity Kapuscinski asks us to look for ways to build up and offer support in encounter.
3) Lastly and most importantly, for encounter with the Other to be one of mutual recognition and sharing, the personal encounter must have an ethical basis. Kapuscinski observes that the first half of the last century was dominated by Husserl and Heidegger - in other words by epistemic and ontological models. It is no accident, I think, that the same century was marred by enormities wrought by totalizing ideologies - where ethics is subordinated entirely, ideology can rage out of control. Kapuscinski follows Levinas in response - ultimately seeing the Other as a source of ethical responsibility is an imperative of the first order.
The diversity of human cultures is, as Solzhenitzyn rightly noted, the "wealth of mankind, its collective personalities; the very least of them wears its own special colors and bears within itself a special facet of God's design." And yet is only if we can encounter the Other in terms of mutual respect and self-confidence, in terms of exchange and recognition of value in the Other, that we can actually see the Other as a treasure - one that helps ground who I am as much as reveals the treasure for what it is. And this is our main challenge - the other paths, conflict and exclusion, are paths we cannot afford to tread.
Sunday, June 08, 2014
By the way, it's not entirely the case that the English are unable to create something of the same texture today - several times during the performance I thought of Judith Weir's one person, unaccompanied opera King Harald's Saga. Weir's work is much shorter, principally a musical composition and less poetically rich, so it is difficult to compare the two directly: Beowulf remains the provenance of a balladeer first and foremost, and this is a genre that more and more feels lost to our world - poetry today rarely seems to be meant to be read allowed and even more rarely follows epic formats. This is a lost social phenomena, for which we are impoverished: in fact, the last long work of a balladeer I read was Ethiopian Enzira Sebhat, itself a medieval work dedicated to the Virgin Mary. The closest - though only indirectly comparable - works to the Enzira Sebhat that I am aware of currently being composed are akathistos hymns of the Russian Orthodox tradition. And while many of those recent compositions are less-than-accomplished literary works, they unquestionably represent a rich and living and at times very beautiful means of transmission of communal memory and values. I am not aware of any recent akathistos compositions that have the expressive beauty and originality of the Byzantine hymnographer Romanos the Melodist, the modern akathist has sometimes proven a source of inspiration for exceptionally great art: the late Sir John Tavener's setting of the "thanksgiving akathist" being perhaps the most significant case in point.