Comments on DBMS Musings: "Partitioned consensus and its impact on Spanner's latency" (newest first)

Daniel Abadi — 2018-12-14 12:08 -08:00
Don't worry --- I wasn't talking about you when I said somebody accused me of hiding latency. That was somebody else (either on Twitter or HackerNews or elsewhere --- I don't remember). But obviously our conversation on my previous post was primarily about latency, and half of the section on "Latency for linearizable read-only transactions" came out from that conversation.

AndyAndRachel — 2018-12-14 11:49 -08:00
Ha, yes, I figured you were referencing me. I do want to point out that I never "accused [you] of purposely avoiding discussing latency". That would require me to know your intent, which I'd never presume to do. My point was that to get a complete and fair understanding of the tradeoffs involved, it's essential to discuss latency in geo-distributed use cases. This latest post suggests that you agree.

Regarding comparisons of latency, I'll make two points (assuming geo-distributed use cases):

1) You mentioned that "it is possible to group users into partitions such that many of their interactions will be with other users within that partition (e.g. partition by a user's location)".
I'd strengthen this: not only is it possible, it is *vital* if you want users around the world to enjoy low latencies for common operations. Every serious commerce application will do this. Today that's typically done by hosting separate DBs in each region of the world (i.e., today's architectures are already partitioned). A database worthy of the "global" adjective should give those same latency and consistency advantages for data that can be partitioned, but *also* allow cross-region transactions for data that can't be partitioned. This is not something that unified-consensus systems can do without sacrificing either consistency or latency. But partitioned-consensus systems can do this (at the cost of accepting clock-skew risk).

2) It's both possible and practical to use a distributed txn protocol that can ack the client after paying latency equivalent to only one round of consensus (at the cost of more recovery work if the coordinator fails). I'm not sure what's been published on this topic, and it's challenging to discuss details in a blog comment. But I will say that the latency advantage you're asserting is due to choice of algorithm, not due to some limitation that is theoretically impossible to address. Referring back to my example in my previous comment, I fully expect that tomorrow's partitioned-consensus systems will commit that transaction in ~100ms rather than ~200ms.

AndyAndRachel — 2018-12-14 10:05 -08:00
Yes, I saw that.
I just wanted to make sure I understood the "Latency for write transactions" section correctly, since it was worded in a way that confused me.

Daniel Abadi — 2018-12-14 09:45 -08:00
See also my conclusion, where I state what you said in your comment explicitly.

Daniel Abadi — 2018-12-14 09:43 -08:00
Hi AndyAndRachel,
Thanks for coming back. Your comments on my previous post were part of the inspiration for this post.

The post states: "the preparation stage for unified consensus is approximately 10-15ms". Note the words "the preparation stage". The first paragraph never estimates the total cost of consensus.

AndyAndRachel — 2018-12-14 09:39 -08:00
Can you clarify the "Latency for write transactions" section? Paragraph 2 says that consensus is possible in 10-15ms for a unified-consensus system, while paragraph 3 says consensus requires 10-400ms for a partitioned-consensus system. That seems to suggest that there are cases where the partitioned system might take (400/15) ≈ 27x as long. I don't think that's what you're saying, right?

To illustrate, and to make sure I'm correctly understanding, take the following example. Say that (using Calvin terms) replica A is in a US DC and replica B is in an EU DC. Say that round-trip latency is 100ms. A client transaction is initiated from the US DC that includes writes to multiple partitions. Calvin will take ~100ms to commit that txn and ack the client (ignoring the batch accumulation time). A partitioned-consensus system with a traditional 2PC protocol would take ~200ms. Is that correct? If so, then the takeaway would be that writes to geo-distributed replicas take roughly 2x as long for a partitioned-consensus system using traditional 2PC.
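The ~100ms vs. ~200ms numbers in the example above can be sketched as a toy back-of-the-envelope model. This is only an illustration of the accounting in the comment, not either system's actual protocol: it assumes two replicas (US and EU) with a 100ms cross-region round trip, that one geo-replicated consensus round costs one round trip, and that traditional 2PC runs its prepare and commit decisions sequentially, each backed by its own consensus round before the client is acked.

```python
# Toy latency model for the Calvin-vs-2PC example above (assumptions:
# two regions, 100ms round trip, one consensus round = one round trip).

CROSS_REGION_RTT_MS = 100

def consensus_round_ms(rtt_ms: int) -> int:
    """One round of geo-replicated consensus across the two regions."""
    return rtt_ms

def unified_consensus_commit_ms(rtt_ms: int) -> int:
    """Calvin-style unified consensus: the multi-partition transaction is
    sequenced in a single consensus round, after which replicas execute it
    deterministically (batch accumulation time ignored, as in the example)."""
    return consensus_round_ms(rtt_ms)

def partitioned_2pc_commit_ms(rtt_ms: int) -> int:
    """Partitioned consensus with traditional 2PC: the prepare decision and
    the commit decision are each replicated via their own consensus round
    before the client can be acked."""
    return 2 * consensus_round_ms(rtt_ms)

print(unified_consensus_commit_ms(CROSS_REGION_RTT_MS))  # 100 (~100ms)
print(partitioned_2pc_commit_ms(CROSS_REGION_RTT_MS))    # 200 (~200ms)
```

Under this model, a one-round-commit protocol of the kind point 2 describes would amount to collapsing the two sequential consensus rounds in `partitioned_2pc_commit_ms` into one, matching the commenter's expectation of ~100ms.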