The original paper was published in 2013 by researchers at Microsoft Research. Interestingly, it was republished in 2016 as a short article, with most of the authors by then working at Google.

On tracking a message's position (in the compute graph) via timestamp "modification":

Messages in a timely dataflow system flow only along edges, and their timestamps are modified by ingress, egress, and feedback vertices.

The modification is more like an augmentation:

Each event has a timestamp and a location (either a vertex or edge), and we refer to these as a pointstamp

On using this pointstamp to determine a watermark:

Since events cannot send messages backwards in time, we can use this structure to compute lower bounds on the timestamps of messages an event can cause
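
A minimal sketch of these two ideas (mine, not code from the paper): timestamps are tuples that ingress augments with a loop counter, feedback increments, and egress truncates; the least timestamp among outstanding pointstamps then acts as the watermark. The Pointstamp class and the example locations are illustrative only.

```python
# Illustrative sketch, not Naiad's implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Pointstamp:
    timestamp: tuple   # e.g. (3,) outside a loop, (3, 1) inside one
    location: str      # a vertex or edge name, e.g. "loop-body" or "A->B"

def ingress(t):   # entering a loop context: append a new loop counter
    return t + (0,)

def feedback(t):  # traversing the loop's back edge: increment the counter
    return t[:-1] + (t[-1] + 1,)

def egress(t):    # leaving the loop context: drop the counter
    return t[:-1]

# Messages never move backwards in time, so the (lexicographically) least
# timestamp among outstanding pointstamps is a lower bound -- a watermark --
# on every timestamp those events could still produce.
outstanding = [Pointstamp((3, 1), "loop-body"), Pointstamp((5,), "output")]
watermark = min(p.timestamp for p in outstanding)   # -> (3, 1)
```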

On using this pointstamp and watermark to determine a window:

By applying this computation to the set of unprocessed events, we can identify the vertex notifications that may be correctly delivered.
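
A rough sketch of the frontier test implied by this (illustrative, not the paper's algorithm): a notification requested for timestamp t at a vertex may be delivered once no unprocessed pointstamp could still result in a message with timestamp at most t at that vertex. The could_result_in below is a deliberately conservative stand-in that ignores graph structure.

```python
def could_result_in(p, q):
    # Conservative approximation for a loop-free graph: compare timestamps
    # only.  The real relation also uses reachability and the
    # ingress/feedback/egress adjustments along the path.
    (pt, _ploc), (qt, _qloc) = p, q
    return pt <= qt

def deliverable(notification_t, vertex, outstanding):
    """True once no unprocessed event could still produce a message with
    timestamp <= notification_t at this vertex."""
    return not any(could_result_in((t, loc), (notification_t, vertex))
                   for (t, loc) in outstanding)

print(deliverable((1,), "B", [((2,), "A->B")]))   # True: nothing can reach (1,)
print(deliverable((3,), "B", [((2,), "A->B")]))   # False: (2,) could still arrive
```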

On the distinction between the logical graph and the physical one (in the case of a distributed implementation), and on the implications of using either to determine (compute) watermarks and windows:

using the logical graph ensures that the size of the data structures used to compute the relation depends only on the logical graph and not the much larger physical graph

On routing and executing messages vs coordination notifications:

When faced with multiple runnable actions (messages and notifications to deliver) workers break ties by delivering messages before notifications, in order to reduce the amount of queued data. Different policies could be used, such as prioritizing the delivery of messages and notifications with the earliest pointstamp to reduce end-to-end latency
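
A toy illustration of the two policies (not from the paper), over a worker's queue of runnable actions:

```python
runnable = [
    ("notification", (2,), "reducer"),
    ("message",      (5,), "mapper"),
    ("message",      (1,), "mapper"),
]

# Policy 1: drain messages before notifications, keeping queued data small.
drain_order = sorted(runnable, key=lambda a: a[0] == "notification")

# Policy 2: earliest pointstamp first, reducing end-to-end latency.
latency_order = sorted(runnable, key=lambda a: a[1])
```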

On the performance characteristics of routing 're-entrant' messages in iterative/cyclical graphs:

Without support for re-entrancy, the implementations of many iterative patterns would overload the system queues. Re-entrancy allows the vertex implementation to coalesce incoming messages in [the 'on receive' handler], and thereby reduce overall memory consumption.
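
A sketch of the coalescing pattern (Naiad's API is C#; this is an illustrative Python rendering): rather than buffering every message, the vertex folds incoming values into a per-timestamp aggregate in its receive handler and emits one result per timestamp when notified.

```python
from collections import defaultdict

class SummingVertex:
    def __init__(self, notify, send):
        self.partial = defaultdict(int)   # timestamp -> running sum
        self.notify = notify              # request a callback at a timestamp
        self.send = send                  # emit an output message

    def on_recv(self, timestamp, value):
        if timestamp not in self.partial:
            self.notify(timestamp)        # ask to be notified when t is complete
        self.partial[timestamp] += value  # coalesce instead of queueing

    def on_notify(self, timestamp):
        self.send(timestamp, self.partial.pop(timestamp))
```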

We adapt the approach for progress tracking based on a single global frontier to a distributed setting in which multiple workers coordinate independent sets of events using a local view of the global state.

The worker does not immediately update its local occurrence counts as it dispatches events, but instead broadcasts (to all workers, including itself) progress updates
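
A simplified sketch of that protocol (single process standing in for the broadcast; names are mine): each worker keeps occurrence counts per pointstamp and applies every worker's updates, including its own.

```python
from collections import Counter

class Worker:
    def __init__(self, peers):
        self.counts = Counter()   # local view: pointstamp -> outstanding events
        self.peers = peers        # all workers, including self

    def dispatch(self, consumed, produced):
        # Do not touch self.counts directly; describe the change and broadcast it.
        update = Counter(produced)
        update.subtract(Counter(consumed))
        for w in self.peers:
            w.apply(update)

    def apply(self, update):
        self.counts.update(update)

workers = []
w = Worker(workers); workers.append(w)
w.apply(Counter({((0,), "input"): 1}))                  # one event outstanding
w.dispatch(consumed=[((0,), "input")], produced=[((0,), "A->B")])
# w.counts now records the event moving from "input" to the edge "A->B"
```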

On stragglers:

we view the resulting micro-stragglers as the main obstacle to scalability for low-latency workloads

On related work:

MillWheel is a recent example of a streaming system with punctuations (and sophisticated fault-tolerance) that adopts a vertex API very similar to Naiad’s, but does not support loops

We refer to the recomputed state as the partial solution of the iteration, and henceforth distinguish between two different kinds of iterations: bulk iterations, and incremental iterations.

On 'classes of operators' in a dataflow system:

It is interesting, however, to distinguish between certain classes of operators. First, we distinguish between operators that produce output by consuming one record, called record-at-a-time operators, and operators that need to consume multiple records before producing output, called group-at-a-time operators
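
A small illustration of the distinction (my example, not the paper's): a map emits output per input record, while a grouping aggregation must consume every record of a group before emitting anything.

```python
from collections import defaultdict

records = [("a", 1), ("b", 2), ("a", 3)]

# Record-at-a-time: each record independently produces output.
doubled = [(k, 2 * v) for k, v in records]

# Group-at-a-time: output only after consuming every record of a group.
groups = defaultdict(list)
for k, v in records:
    groups[k].append(v)
sums = {k: sum(vs) for k, vs in groups.items()}   # {"a": 4, "b": 2}
```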

On encoding a 'partial solution update' mutation in a dataflow:

In imperative programming, updating the partial solution is achievable by [...] passing a reference to the state of the partial solution and modifying that shared state. Dataflow programs (like functional programs) require that the operators/functions are side effect free. [...] We hence express an update of a record in the partial solution through the replacement of that record.
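
The same toy update expressed both ways, imperatively by mutating shared state and in dataflow style by emitting a replacement record that the runtime applies by key:

```python
# Imperative: mutate the shared partial solution in place.
partial_solution = {"v1": 10, "v2": 7}
partial_solution["v2"] = 5

# Dataflow style: the step function is side-effect free and instead emits
# the record that should replace the old one; the runtime applies it by key.
def apply_replacements(solution, replacements):
    return {**solution, **dict(replacements)}

next_solution = apply_replacements({"v1": 10, "v2": 7}, [("v2", 5)])
```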

On why a dataflow operator/function needs to be side effect free:

A non-transparent side effect would void the possibility of automatic parallelization, which is one of the main reasons to use dataflow programming for large scale analytics.

On the applicability of iterative dataflow:

Every algorithm that can be expressed via a message-passing interface can also be expressed as an incremental iteration.

On the fundamental optimization of the incremental iteration:

By computing a delta set instead of the next partial solution, we ensure that the iteration returns fewer records when fewer changes need to be made to the partial solution.

On characteristics of the step function:

Since ∆ must return two data sets, it is necessarily a non-tree DAG.

Because the delta set D is the result of a dataflow operator, it is naturally an unordered bag of records, rather than a set. D may hence contain multiple different records that are identified by the same key, and would replace the same record in the current partial solution.

we allow the optional definition of a comparator for the data type of S. Whenever a record in S is to be replaced by a record in D, the comparator establishes an order between the two records.
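
A sketch of applying a delta set D to the partial solution S under these rules (names are mine, not Stratosphere's API): D may carry several candidates per key, and the comparator decides which candidate, if any, replaces the current record.

```python
def apply_delta(S, D, better):
    """S: dict key -> record; D: iterable of (key, record); better(a, b) is the
    comparator, True when record a should win over record b."""
    next_S = dict(S)
    for key, candidate in D:
        current = next_S.get(key)
        if current is None or better(candidate, current):
            next_S[key] = candidate
    return next_S

# e.g. connected components: keep the smaller component id per vertex.
S = {"v1": 4, "v2": 9}
D = [("v2", 7), ("v2", 3), ("v3", 8)]                 # two candidates for v2
print(apply_delta(S, D, better=lambda a, b: a < b))   # {'v1': 4, 'v2': 3, 'v3': 8}
```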

On Pregel:

Pregel is a graph processing adaptation of bulk synchronous parallel processing. Programs directly model a graph, where vertices hold state and send messages to other vertices along the edges. By receiving messages, vertices update their state.

It is straightforward to implement Pregel on top of Stratosphere’s iterative abstraction: the partial solution holds the state of the vertices, the workset holds the messages. The step function ∆ updates the vertex state from the messages in the workset and creates new update messages.

Pregel combines vertex activation with messaging, while incremental iterations give you the freedom to separate these aspects
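
A compact sketch of this mapping (illustrative only): the partial solution is a dict of per-vertex state, the workset is a list of messages, and one application of the step function consumes the messages, updates the changed vertices, and emits new messages along their edges, so only changed vertices stay "active".

```python
def superstep(state, messages, edges, combine, update):
    """state: vertex -> value; messages: list of (vertex, value);
    edges: vertex -> neighbours; combine folds incoming messages per vertex;
    update(current, incoming) returns (new_value, changed)."""
    inbox = {}
    for v, m in messages:
        inbox[v] = combine(inbox[v], m) if v in inbox else m
    new_state, out = dict(state), []
    for v, m in inbox.items():
        value, changed = update(state.get(v), m)
        if changed:                       # only changed vertices send messages,
            new_state[v] = value          # which is what makes it incremental
            out.extend((n, value) for n in edges.get(v, []))
    return new_state, out

# Single-source reachability sketch: propagate the minimum distance.
edges = {"a": ["b"], "b": ["c"]}
state, msgs = {"a": 0}, [("b", 1)]
state, msgs = superstep(state, msgs, edges, combine=min,
                        update=lambda cur, m: (m, cur is None or m < cur))
```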

On how the core innovation of Spark — RDDs — actually contribute to an improvement over Hadoop/MapReduce:

Resilient Distributed Datasets (RDDs), a set of in-memory data structures able to cache intermediate data across a set of nodes, in order to efficiently support iterative algorithms.

On RDD data provenance:

Spark is built on top of RDDs (read-only, resilient collections of objects partitioned across multiple nodes) that hold provenance information (referred to as lineage) and can be rebuilt in case of failures by partial recomputation from ancestor RDDs.
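
A minimal PySpark sketch of both points, caching for iteration and lineage for recovery; it assumes a local Spark installation, and the input path is hypothetical.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "lineage-demo")
edges = sc.textFile("hdfs:///graph/edges.txt") \
          .map(lambda line: tuple(line.split())) \
          .cache()                      # kept in memory, reused every iteration

for _ in range(3):
    counts = edges.map(lambda e: (e[0], 1)).reduceByKey(lambda a, b: a + b)

print(counts.toDebugString())           # lineage: textFile -> map -> reduceByKey
sc.stop()
```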

On configuring task parallelism (for a given workload):

Spark’s default parallelism parameter (spark.def.parallelism) refers to the default number of partitions in the RDDs returned by various transformations

Flink’s default parallelism parameter (flink.def.parallelism) allows the use of all the available execution resources

in Flink the partitioning of data is hidden from the user
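
For reference, the full key in current Spark releases is spark.default.parallelism, and Flink's counterpart is the parallelism.default entry in flink-conf.yaml (or the -p flag of flink run); the shorter names above appear to be the paper's shorthand. A minimal PySpark sketch:

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[4]")
        .setAppName("parallelism-demo")
        .set("spark.default.parallelism", "64"))   # tune per dataset / cluster
sc = SparkContext(conf=conf)
print(sc.defaultParallelism)
sc.stop()
```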

On shuffle tuning:

The default buffer size is pre-configured in both frameworks to 32 KB, but it can be increased on systems with more memory, leading to less spilling to disk and better results.
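
A hedged Spark-side example: spark.shuffle.file.buffer is the shuffle write buffer the quote refers to (32k by default). I am not certain of the exact Flink key, so only the Spark knob is shown.

```python
from pyspark import SparkConf

conf = (SparkConf()
        .setMaster("local[4]")
        .setAppName("shuffle-tuning-demo")
        .set("spark.shuffle.file.buffer", "64k"))   # larger buffer, less spilling
```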

On configuration for performance tuning and fault tolerance:

Spark needs a careful parameter configuration (for parallelism, partitions etc.), which is highly dependent on the dataset, in order to obtain optimal performance. In Flink’s case, one needs to make sure that enough memory is allocated so that its CoGroup operator, which builds the solution set in memory, can be executed successfully.

On operationalizing these engines:

While extensive research efforts have been dedicated to optimize the execution of MapReduce based frameworks, there has been relatively less progress on identifying, analyzing and understanding the performance issues of more recent data analytics frameworks like Spark and Flink.

On performance evaluation:

Overall, most of the previous work typically focuses on some specific low-level issues of big data frameworks that are not necessarily well correlated with the higher level design. It is precisely this gap that we aim to address in this work by linking bottlenecks observed through parameter configuration and low level resource utilization with high-level behavior in order to better understand performance.

On separating pipeline models from engines:

Identifying and understanding the impact of the different architectural components and parameter settings on the resource usage and, ultimately, on performance, could trigger a new wave of research on data analytics frameworks. Apache Beam [47] is such an example, proposing a unified framework based on the Dataflow model [48], that can be used to execute data processing pipelines on separated distributed engines like Spark and Flink.

An event id consists of three fields: ServerIP, ProcessID and Timestamp.
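
A sketch of that record (field types are my assumption):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventId:
    server_ip: str    # ServerIP
    process_id: int   # ProcessID
    timestamp: int    # Timestamp (see the note on global ordering below)
```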

On timestamp global ordering:

Note that generating the timestamp for each event id on each server locally may be adversely impacted by clock skew on the local machine. To limit the skew to S seconds, we use the TrueTime API [8] that provides clock synchronization primitives. By using GPS and atomic clocks, the primitives guarantee an upper bound on the clock uncertainty. Each server generating event ids executes a background thread that sends an RPC to one of the TrueTime servers every S seconds to synchronize the local clock.
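
A rough sketch of the background sync loop; truetime_sync is a hypothetical stand-in for the RPC to a TrueTime server.

```python
import threading, time

def start_clock_sync(truetime_sync, skew_bound_s):
    def loop():
        while True:
            truetime_sync()            # RPC to a TrueTime server
            time.sleep(skew_bound_s)   # resync every S seconds to bound the skew
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```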

The node-local dispatcher does a form of filtering, only sending an event to the "joiner" if it has not already been processed:

Before sending events to the joiner, the dispatcher looks up each event id in IdRegistry to make sure the event has not been joined. This optimization technique significantly improved performance as observed from the measured results (Section 4).
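
A sketch of the dispatcher-side filter (names are mine; the IdRegistry is abstracted as an object with a contains lookup):

```python
def dispatch(events, id_registry, send_to_joiner):
    for event in events:
        if id_registry.contains(event.id):   # already joined -> skip the joiner
            continue
        send_to_joiner(event)
```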

On transactional atomicity:

when a joiner commits a click id to the IdRegistry, it also sends a globally unique token to the IdRegistry (consisting of the joiner server address, the joiner process identifier, and a timestamp) along with the click id.
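
A sketch of the commit-with-token idea (my rendering, not Photon's code): because the token uniquely identifies the joiner's attempt, a retried commit of the same click id by the same joiner can be recognized and still treated as a success, which keeps the commit idempotent.

```python
import os, socket, time
from dataclasses import dataclass

@dataclass(frozen=True)
class CommitToken:
    joiner_addr: str
    joiner_pid: int
    timestamp_us: int

def make_token():
    return CommitToken(socket.gethostname(), os.getpid(), int(time.time() * 1e6))

class IdRegistry:
    def __init__(self):
        self._committed = {}              # click_id -> token

    def commit(self, click_id, token):
        existing = self._committed.get(click_id)
        if existing is None:
            self._committed[click_id] = token
            return True
        return existing == token          # same joiner retrying -> still success
```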

“Steve created the only lifestyle brand in the tech industry,” Larry Ellison said.

Speaking of which, process is usually based on old and legacy ways of doing things. As a result, it rarely adapts to new requirements, is often complex and burdensome, and brings with it all sorts of baggage like training and documentation overhead. To an organization delivering software, process is "scar tissue" and should be removed.

“Steve Jobs once asked me for some advice about retail, but I said, ‘I am not sure at all we are in the same business.’ I don’t know if we will still use Apple products in 25 years, but I am sure we will still be drinking Dom Pérignon.”

Technology is predicated on change; luxury, however, is predicated on heritage and connection to tradition.

But one thing I would strongly caution against is accepting the current industry rubric about “content.” That term does a lot of nefarious rhetorical work in the media industry (essentially arguing that platforms, intermediaries and infrastructures should have more cultural as well as economic importance than the culture carried and enacted in the medium).

Business models have changed to encourage obsolescence — a really noxious phenomenon when applied to culture and one on which our descendants will judge us, along with our general destruction of the environment.

Design up to now is widely conceived as directing the complex assembly and composition of elements into consumable “products” (including buildings). This process also typically bestows a consciously created veneer of aesthetic novelty onto these products, as a way of promoting their (temporary) desirability. (Though often characterized as great art, it is only rarely regarded as such by later generations.) This linear process continues with the rapid obsolescence and disposal of the products, and the creation of new and improved products (with fashionably new artistic veneers) to replace them.

This is a fundamentally unsustainable process.

Adaptive design is a continuous (and continuously beneficial and profitable) process of transformation, in which novel aspects are typically combined with enduring and recurrent ones. Artistic aspects have to work in service to this evolutionary pattern, and not be allowed to dictate it. Design, according to this definition, creates a transformation “from existing states to preferred ones,” as the great polymath Herbert Simon put it.

As software pioneer Ward Cunningham described it, specifying the desired behaviors always required elaborate definitions and standards, while, paradoxically, generating them often only required the identification (through a process of adaptive iteration) of a much simpler set of generative rules.

not to specify the behavior desired, but rather, create the conditions in which that behavior is most likely to be generated.

To those less kind, Bitcoin has become synonymous with everything wrong with Silicon Valley: a marriage of dubious technology and questionable economics wrapped up in a crypto-libertarian political agenda that smacks of nerds-do-it-better paternalism.

Silicon Valley has a seemingly endless capacity to mistake social and political problems for technological ones, and Bitcoin is just the latest example of this selective blindness.

Working in technology has an element of pioneering, and with new frontiers come those who would prefer to leave civilization behind. But in a time of growing inequality, we need technology that preserves and renews the civilization we already have. The first step in this direction is for technologists to engage with the experiences and struggles of those outside their industry and community.