Description

Naiad is an investigation of data-parallel dataflow computation in the spirit of Dryad and DryadLINQ, but with a focus on incremental computation. Naiad introduces a new computational model, differential dataflow, operating over collections of differences rather than collections of records, and resulting in very efficient implementations of programming patterns that are expensive in existing systems. [Source: Microsoft Research]
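The phrase "collections of differences rather than collections of records" can be illustrated with a small sketch. Assume a collection is a multiset of records, and an update is a batch of (record, delta) pairs, where +1 adds a copy and -1 removes one. An operator such as "count tweets per user" can then update its output by looking only at the keys the deltas touch, rather than recounting everything. The names `apply_diffs` and `IncrementalCountByKey` and the user/tweet data are hypothetical. This is plain Python, not Naiad's API (Naiad exposes a LINQ-style interface), and not its actual implementation.

```python
from collections import Counter, defaultdict


def apply_diffs(collection, diffs):
    """Accumulate (record, delta) pairs into a multiset, dropping records whose count reaches zero."""
    result = Counter(collection)
    for record, delta in diffs:
        result[record] += delta
        if result[record] == 0:
            del result[record]
    return result


class IncrementalCountByKey:
    """Hypothetical operator: maintains per-key counts and emits only output differences."""

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, input_diffs):
        # Remember each touched key's count before this batch.
        touched = {}
        for (key, _value), delta in input_diffs:
            if key not in touched:
                touched[key] = self.counts[key]
            self.counts[key] += delta
        # Retract the old (key, count) record and assert the new one, for changed keys only.
        output = []
        for key, old in touched.items():
            new = self.counts[key]
            if old != new:
                if old != 0:
                    output.append(((key, old), -1))
                if new != 0:
                    output.append(((key, new), +1))
        return output


if __name__ == "__main__":
    op = IncrementalCountByKey()
    # Initial input: alice posts two tweets, bob posts one.
    out1 = op.update([(("alice", "t1"), +1), (("alice", "t2"), +1), (("bob", "t3"), +1)])
    print(out1)  # [(('alice', 2), 1), (('bob', 1), 1)]
    # alice deletes one tweet: only her count is revisited.
    out2 = op.update([(("alice", "t1"), -1)])
    print(out2)  # [(('alice', 2), -1), (('alice', 1), 1)]
    view = apply_diffs(apply_diffs({}, out1), out2)
    print(dict(view))  # {('alice', 1): 1, ('bob', 1): 1}
```

Here the second update costs work proportional to the one changed record, not to the whole input. Naiad's differential dataflow generalizes this idea well beyond a single counting operator.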

"Our goal with Naiad was to address one of the recurring requests for systems like Dryad and DryadLINQ, incremental recomputation, but in so doing [we] found that the necessary mechanisms gave rise to a new computational model, differential dataflow, capable of efficiently processing substantially more complex computations than current systems support, namely incremental and arbitrarily nested iterative dataflow computation."
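The quote distinguishes incremental computation (reacting to changes in the input) from iterative computation (repeating a step until the result stops changing, i.e. reaches a fixed point). As a hedged illustration of the iterative half only, the sketch below computes graph reachability with semi-naive iteration. Each round expands only the nodes discovered in the previous round (the "difference" from the last round), and the loop stops when a round adds nothing new. The function name and the example graph are hypothetical. Semi-naive evaluation is a standard textbook technique named here plainly; it is not Naiad's implementation, which additionally reuses work across changes to the input.

```python
from collections import defaultdict


def reachable(edges, roots):
    """Semi-naive fixed-point iteration: each round expands only the nodes
    first discovered in the previous round, stopping when nothing new appears."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)

    reached = set(roots)
    frontier = set(roots)  # the 'difference' produced by the last round
    rounds = 0             # includes the final round that discovers nothing new
    while frontier:
        rounds += 1
        discovered = set()
        for node in frontier:
            discovered |= adjacency[node]
        frontier = discovered - reached  # keep only genuinely new nodes
        reached |= frontier
    return reached, rounds


if __name__ == "__main__":
    graph = [(1, 2), (2, 3), (3, 1), (4, 5)]
    print(reachable(graph, {1}))  # ({1, 2, 3}, 3)
```

Expanding only the frontier avoids recomputing the full result in every round, which is the same intuition of working with differences applied inside a single loop.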

Microsoft researcher Frank McSherry joins us to discuss what this all means and how it could be useful in the big data problem space (a big problem space...). Demos included, of course.

Can you explain more about the .FixedPoint() LINQ method? I found the naiad.pptx presentation, which has a lot of animations, but I'm unable to follow it without the narration that presumably went with it. I'll check out the paper, but if it's anything like this video, perhaps there are some prerequisite materials you can recommend?

From a practical standpoint, I'm interested in how to integrate this with a data source like SQL Server for persistence. To maintain a Twitter-like service optimized for individual user/viewer queries, I assume we'd keep the data in an intuitive, application-specific schema of Users, Tweets, and Mentions. As changes are committed to the persistent data store (SQL Server), we must separately notify the Naiad cluster of each change; the Naiad cluster members (or is it just the Controllers?) can then service user queries quickly. Is it advisable for this notification to come from the database layer rather than from the application layer, using something like SQL Server Query Notifications? Or does the high volume of changes we might expect rule out Query Notifications? The benefit of listening to SQL Server directly for relevant changes would be relative transparency to existing applications and less chance of the data getting out of sync because someone forgot a call to Naiad.

Alternatively, perhaps Naiad becomes the primary data store and is augmented to persist its dataflow dataset in a Naiad-friendly relational schema?