
The CS community recently discussed extending the Q&A session that occurs after each talk at a conference into a more formal written Q&A. More specifically, this was raised during the business meeting at SOSP, and the proposal was to publish the results in SIGOPS OSR. The idea was that this written extension to Q&A could really get to the bottom of the issues raised, and it wouldn’t let speakers avoid questions by saying, “Let’s take that offline.” There was some pushback against this, with arguments like “most questions are just misunderstandings” and “that will add a lot of pointless work for speakers/authors.”

In this post I’ll examine the questions asked at the end of my SOSP talk on COPS. We’ll look at a summary of each question and my written response, and then hopefully we’ll be able to conclude whether a written Q&A is a good idea or not. The full transcript of each question, with comments and clarifications added in square brackets, is included ahead of each summary.

Question 1

Hussam: So I’m actually a bit confused by how you achieve partition tolerance. If my operations are going to block until a server that has a dependency that I depend on responds back to me, I can talk to a datacenter perform an operation, that datacenter gets partitioned away, I talk to another datacenter but that datacenter didn’t see any of the operations that I depend on so I’m going to block.

Wyatt: Sure, so I think the question, if I can paraphrase the question, is that you have multiple datacenters, it’s possible you depend on something that’s being replicated from one datacenter, that datacenter gets partitioned away, and then your updates aren’t going to show up to a third datacenter until these updates are propagated from the now partitioned datacenter. Is that correct?

Hussam: Yeah, and I’m blocking meanwhile.

Wyatt: You’re not blocking anywhere. These operations won’t show up right away. Causal consistency doesn’t say, “I see things right away.” It’s not strong consistency like that. You’ll still get to see consistent values, they just won’t be super up-to-date.

Hussam: I’ll take it offline.

Question Summary: The question could be interpreted two ways, so we’ll look at both.

Interpretation A: “What happens if a client is partitioned from the datacenter they are accessing?” (Note: Much of the feedback and questions after the talk were questions like A, so I think this is what Hussam meant.)

Written Answer: The clients of our system are the web servers collocated in the datacenter with the storage cluster, so they won’t be partitioned. What you are really asking about are not the direct clients of the storage system, but the human who is a client of a web browser who is a client of a web server who is a client of the storage system. Our system doesn’t provide consistency directly for those clients three levels away, but we think it’s an important and interesting problem, and we’re actively thinking about it.

Interpretation B: “What happens if a datacenter that is replicating data you depend on is partitioned?” (This is what I interpreted Hussam to mean at the time.)

Written Answer: No operations will ever block, but your new put operations won’t show up in other datacenters until their dependencies have shown up in that datacenter. So there is no blocking, but this comes at the cost of not guaranteeing your updates show up everywhere immediately.
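To make Interpretation B concrete, here is a minimal Python sketch of a datacenter that applies a replicated put only once the put's dependencies are visible locally. It is an illustration under assumptions, not the COPS implementation: the names `Put`, `Datacenter`, `receive_replicated`, and the integer `version` field are all hypothetical, and a single in-memory dictionary stands in for the storage cluster.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Put:
    key: str
    value: str
    version: int
    deps: tuple = ()  # hypothetical: (key, version) pairs this put depends on


@dataclass
class Datacenter:
    visible: dict = field(default_factory=dict)  # key -> (version, value)
    pending: list = field(default_factory=list)  # replicated puts awaiting deps

    def get(self, key):
        # Local reads never wait; they return whatever is currently visible.
        entry = self.visible.get(key)
        return entry[1] if entry else None

    def _satisfied(self, put):
        # A dependency is satisfied once that key is visible at that version or newer.
        return all(
            key in self.visible and self.visible[key][0] >= version
            for key, version in put.deps
        )

    def _apply(self, put):
        current = self.visible.get(put.key)
        if current is None or put.version > current[0]:
            self.visible[put.key] = (put.version, put.value)

    def receive_replicated(self, put):
        # Queue the put, then apply every queued put whose dependencies are visible.
        self.pending.append(put)
        progress = True
        while progress:
            progress = False
            for queued in list(self.pending):
                if self._satisfied(queued):
                    self._apply(queued)
                    self.pending.remove(queued)
                    progress = True


# A comment that depends on a photo arrives before the photo does.
dc = Datacenter()
photo = Put("photo", "beach.jpg", version=1)
comment = Put("comment", "nice photo!", version=2, deps=(("photo", 1),))

dc.receive_replicated(comment)
comment_before = dc.get("comment")  # the read returns immediately, with no value yet

dc.receive_replicated(photo)
comment_after = dc.get("comment")   # visible now that the photo is visible
```

The read of `comment` never blocks in either case. The only effect of the missing dependency is that the newer value is not visible yet, which is the trade-off described above.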

Question 2

Maysam: Let’s put details, implementations, and your wide-area setting aside. From an abstract point of view I see lots of similarities between your model [causal+ consistency] and snapshot isolation. First, both of you might maintain multiple versions of data. Second, both of you talk about snapshots. And third, both of you try to detect and avoid write-write conflicts. I wonder about the differences.

[Note: COPS does not avoid write-write conflicts. We only allow single key put operations, so we can only have write-write conflicts between two put operations. These can happen and are then either resolved by the last-writer-wins rule or the convergent conflict handler function.]

Wyatt: Are you asking me about the difference between this [causal+ consistency] and what Jinyang just talked about, PSI, or just Snapshot Isolation in general?

Maysam: In general.

Wyatt: In general, snapshot isolation is sort of a database property, so it’s a stronger consistency than what you get [with causal+]. Snapshot isolation you can do these transactions that have reads and writes and things like that. We don’t have that in our system. What we have in our system is we’re guaranteeing you low latency. Things will always complete right away, very quickly, no matter what.

Maysam: But look at this from an abstract point of view. I want to compare causal+ from an abstract point of view to snapshot isolation.

Wyatt: This is sort of tricky. In the last talk Jinyang had this spectrum of consistency models. What she was showing you was more from the database side, where you have these transactions that involve multiple keys at the same time, and multiple updates, and multiple operations and things like that. And we’re more from the shared memory side or something like that, where all of these things involve one operation at a time. So how exactly they interact, it’s a very complex graph of how these consistency models interact. I would say Snapshot Isolation is definitely a stronger property than what we provide, but we do so with better performance characteristics.

Maysam: But you talk about write-write conflicts. Write-write conflict make sense if you have write conflicts between two transactions. You didn’t call it transactions, but I guess in the paper you call it context or something like that. You didn’t call it transactions but you call it context or something like that. You give it a different name. But still it is kind of context, but it is kind of transactions.

[Maysam is confused here. The context we describe in the paper is part of the client API for identifying different clients; it has nothing to do with transactions.]

Wyatt: So we only have read transactions. You can only read multiple values in a transaction.

Maysam: So when you talk about write-write conflicts, is it between [trailed off]

Wyatt: Write-write conflicts? We can have write-write conflicts in our system, but we have to use the last writer wins rule, or we have to use some sort of application specific function that is going to resolve these conflicts for us.

[Again, write-write conflicts are only for two puts to the same key. There are not general transactions in COPS.]

Maysam: But, to have write-write conflicts, you first need to [cut off]

Ant Rowstron (Session Chair): I think we need to take this offline and head onto the next question.

Question Summary: What are the differences between snapshot isolation and causal+ consistency?

Written Answer: Causal+ consistency deals with single-key put operations and single- or multi-key get operations. Snapshot isolation is stronger than causal+ because it deals with general transactions that can include many different put and get operations. In addition, snapshot isolation ensures there are never conflicting transactions in the system (it avoids write-write conflicts), while causal+ doesn’t have the notion of a general transaction but does allow, and then resolve, conflicting writes to the same key (it embraces single-key write-write conflicts).
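As a rough illustration of what "allow and then resolve" means for two puts to the same key, here is a small Python sketch of last-writer-wins resolution. The `Versioned` record, the `(timestamp, node_id)` ordering, and the `resolve` helper are assumptions made for this example. The optional `handler` argument stands in for an application-specific convergent conflict handler.

```python
from typing import Callable, NamedTuple, Optional


class Versioned(NamedTuple):
    timestamp: int  # assumed: a Lamport-style timestamp for the put
    node_id: int    # assumed: tie-breaker so any two versions are ordered
    value: str


def last_writer_wins(a: Versioned, b: Versioned) -> Versioned:
    # Order by (timestamp, node_id) only; the value never decides the winner.
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


def resolve(
    current: Optional[Versioned],
    incoming: Versioned,
    handler: Callable[[Versioned, Versioned], Versioned] = last_writer_wins,
) -> Versioned:
    # The first write to a key has nothing to conflict with.
    if current is None:
        return incoming
    return handler(current, incoming)


# Two concurrent puts to the same key, delivered in opposite orders.
put_x = Versioned(timestamp=5, node_id=1, value="x")
put_y = Versioned(timestamp=5, node_id=2, value="y")

replica_1 = resolve(resolve(None, put_x), put_y)
replica_2 = resolve(resolve(None, put_y), put_x)
```

Both replicas end up holding `put_y` regardless of delivery order, so the conflict is accepted and then resolved to a single converged value rather than being prevented up front as in snapshot isolation.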

Question 3

Marcos: You made a case that gets are not enough therefore you need get transactions. [Wyatt says “yes”]. The previous person was asking about other types of transactions. You could also make the argument that puts are not enough and you need put transactions. In fact, you need more general transactions. And you mentioned that you have more the perspective of a shared-memory system, but there we have transactional memory as well. And so, I’m wondering without general transactions isn’t that the same thing as trying to go to war with rocks and stones when you have machine guns available, which is what general transactions are.

Wyatt: I would agree with the first half of what you said and strongly disagree with the second half. So I think put transactions are important and it’s something that I’m thinking about. What else did you say? General transactions. My view of your work in the previous paper and this work is that they’re sort of complementary approaches. Like, we really want to have low latency, we say operations must be really really fast. In your work, you say, “We have to have these transactions. We have to avoid write-write conflicts.” I think there’s places for both of these and I think ultimately you’d have some sort of system that would join the two. And I don’t think this is like using rocks, this is like using something that you know is going to be really fast. I’m never going to have to do that slow 2PC across the wide area [unlike in Walter].

Question Summary: Can you compare COPS and Walter? (Walter was the system described in the previous talk, one of whose authors asked this question.)

Written Answer: The two systems provide complementary approaches. COPS guarantees successful low-latency operations at the cost of not providing general transactions. Walter guarantees conflict-free general transactions at the cost of allowing transactions to abort and (sometimes) having to do wide-area locking via two-phase commit, which is directly incompatible with low latency.

Question 4

Ant Rowstron (Session Chair): Can we keep the last two questions very short. Marc, is it a question?

Marc: A comment and a question.

Ant: Can we have the question?

Marc: The comment is, I think your causal+ property is much too strong, you can get exactly the same results with something a lot simpler. But we can take that offline. The question is, you said explicit dependency tracking is novel. It’s been around for a long time. It’s been beaten to death.

Wyatt: No, no, no. I didn’t mean to say explicit dependency tracking itself is a novel technique. Doing this in conjunction with decentralized replication is a new technique.

Marc: The question is, vector clocks were invented because explicit dependency tracking is complicated and slow. So I’m really puzzled why didn’t you just use vector clocks.

[Note: I misunderstood Marc’s question here; see the response to what he was actually asking in the “Written Answer” below. I thought he wanted to know why we use Lamport timestamps (small, fixed size) to establish a causal order instead of (much larger) vector clocks that give a more precise order.]

Wyatt: So we don’t use vector clocks because we’re talking about really big systems. And when we have this really big system, like let’s say I have a thousand nodes, then I’m going to have a vector clock with a thousand entries in it. [Marc (while Wyatt is still speaking): Yeah but there are compressed versions of that.] So it’s going to be huge compared to the small amount of metadata we’re propagating around normally.

Written Answer: We use explicit dependencies because they are compatible with distributed verification, whereas vector clocks are not. Vector clocks would need a centralized serialization point in each datacenter to ensure that updates from other datacenters are applied in the correct causal order.
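The size argument from my spoken answer (which addressed the question as I misheard it) can be sketched in a few lines of Python. This compares a textbook Lamport clock with a textbook, uncompressed vector clock. The class and method names are hypothetical, and the sketch ignores the compressed vector clock variants Marc mentioned.

```python
class LamportClock:
    """One integer per node, regardless of how many nodes exist."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def on_receive(self, received_time):
        # Jump past anything we have heard about, then count the receive itself.
        self.time = max(self.time, received_time) + 1
        return self.time


class VectorClock:
    """One counter per node, so the timestamp grows with the system."""

    def __init__(self, node_index, num_nodes):
        self.node_index = node_index
        self.entries = [0] * num_nodes

    def local_event(self):
        self.entries[self.node_index] += 1
        return list(self.entries)

    def on_receive(self, received_entries):
        # Element-wise maximum, then count the receive on our own entry.
        self.entries = [max(a, b) for a, b in zip(self.entries, received_entries)]
        self.entries[self.node_index] += 1
        return list(self.entries)


NUM_NODES = 1000  # the figure used in the spoken answer

lamport_stamp = LamportClock().local_event()
vector_stamp = VectorClock(node_index=0, num_nodes=NUM_NODES).local_event()

lamport_fields = 1                # a single integer travels with each update
vector_fields = len(vector_stamp) # one entry per node travels with each update
```

With a thousand nodes the uncompressed vector timestamp carries a thousand entries where the Lamport timestamp carries one, at the price of a less precise order.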

Question 5

Ant Rowstron (Session Chair): Okay, let’s go for the last one. Is it quick?

Question from an unidentified audience member, “Un” for short.

Un: Where is the metadata stored physically?

Wyatt: Metadata? It’s physically stored both on servers and in the client library.

Un: On the servers where? Like in memory, or … My question is actually, “How do you deal with corruption of metadata or failures on the side?”

Wyatt: So failures inside a datacenter. We looked at this like, this is not what our main contribution is. And we took existing techniques like chain replication, that give you this strong consistency, that give you this fault tolerance inside these datacenters. We said we’ll just build on top of that, that’s not where our contribution is. And in terms of dealing with bit flips, you’d probably want checksums in your system. I think Amazon came out with that, “we really want that, it screwed things up awhile ago.”

Un: Thanks.

Question Summary: How do you deal with different types of failures?

Written Answer: That’s not where our innovation is, so we just used existing techniques to deal with failures (currently, chain replication).

Discussion

In reviewing the questions, it seems pretty clear that almost all of them stem from confusion surrounding parts of the system that were gone over quickly or skipped in the talk. These are good questions to have immediately after a talk, because other people in the audience are probably confused about the same things. However, the questions only make sense with the context from either the talk or the paper, and almost all of them would be clarified by reading the paper.

So let’s break down the potential audience for the extended answers:

1) OSR readers who didn’t see or don’t remember the talk and didn’t read the paper. The questions and answers wouldn’t make any sense to these people.

2) OSR readers who saw the talk, didn’t read the paper, and remember the talk over a month later. Based on how much I remember from talks I saw a month ago, I don’t think this will be a very populous group.

3) OSR readers who read the paper. The paper should cover everything that was asked about, so the extra written answers should be unnecessary. (E.g., Section 2 and Fig. 1 answer question 1, and Related Work answers question 3.)

4) People who watched the talk on YouTube. This audience is relatively large: the video of the talk has 224 views after being up for about a week. They have exactly the same context as IRL audience members, and I know they have some of the same questions. For instance, Todd Hoff, who wrote a post about COPS on his High Scalability blog, also thought of question 4: why not use vector clocks? Given that I misinterpreted the question at the time, it’s good to have a correct answer here!

So while the audience for written answers in OSR would be tiny, I think there is an audience for more detailed answers to questions: YouTube viewers! I’m now all for written answers to questions, but I think that a blog like this one, and not OSR, is the appropriate venue for publishing them!