<div class="gmail_extra">And this is the bit that concerns me the most. At scale you should only be making two assumptions: (1) everything breaks all the time, and (2) you will have network partitions. Checkpoint/restart is a lazy option that has no place in modern software. Yet there doesn't seem to be any priority placed on going beyond checkpoint/restart and rethinking software architecture. I would argue that's as important as figuring out manycore, if not more so.</div>