On Fri, Jun 12, 2009 at 10:16:45PM -0400, Steven Rostedt wrote:
>
> On Sat, 13 Jun 2009, Frederic Weisbecker wrote:
>
> > On Wed, Jun 10, 2009 at 03:53:14PM -0400, Steven Rostedt wrote:
> > > From: Steven Rostedt <srostedt@redhat.com>
> > >
> > > This adds the design document for the ring buffer and also
> > > explains how it is designed to have lockless writes.
> > >
> > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > > ---
> > >  Documentation/trace/ring-buffer-design.txt |  949 ++++++++++++++++++++++++++++
> > >  1 files changed, 949 insertions(+), 0 deletions(-)
> > >  create mode 100644 Documentation/trace/ring-buffer-design.txt
> > >
> > > diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt
> > > new file mode 100644
> > > index 0000000..cca290b
> > > --- /dev/null
> > > +++ b/Documentation/trace/ring-buffer-design.txt
> > > @@ -0,0 +1,949 @@
> > > +		Lockless Ring Buffer Design
> > > +		===========================
> > > +
> > > +Copyright 2009 Red Hat Inc.
> > > +   Author:   Steven Rostedt <srostedt@redhat.com>
> > > +  License:   The GNU Free Documentation License, Version 1.2
> > > +               (dual licensed under the GPL v2)
> > > +
> > > +Written for: 2.6.31
> > > +
> > > +Terminology used in this Document
> > > +---------------------------------
> > > +
> > > +tail - where new writes happen in the ring buffer.
> > > +
> > > +head - where new reads happen in the ring buffer.
> > > +
> > > +producer - the task that writes into the ring buffer (same as writer)
> > > +
> > > +writer - same as producer
> > > +
> > > +consumer - the task that reads from the buffer (same as reader)
> > > +
> > > +reader - same as consumer.
> > > +
> > > +reader_page - A page outside the ring buffer used solely (for the most part)
> > > +    by the reader.
> > > +
> > > +head_page - a pointer to the page that the reader will use next
> > > +
> > > +tail_page - a pointer to the page that will be written to next
> > > +
> > > +commit_page - a pointer to the page with the last finished non nested write.
> > > +
> > > +cmpxchg - hardware assisted atomic transaction that performs the following:
> > > +
> > > +   A = B iff previous A == C
> > > +
> > > +   R = cmpxchg(A, C, B) is saying that we replace A with B if and only if
> > > +      current A is equal to C, and we put the old (current) A into R
> > > +
> > > +   R gets the previous A regardless if A is updated with B or not.
> > > +
> > > +   To see if the update was successful a compare of R == C may be used.
> > > +
> > > +The Generic Ring Buffer
> > > +-----------------------
> > > +
> > > +The ring buffer can be used in either an overwrite mode or in
> > > +producer/consumer mode.
> > > +
> > > +Producer/consumer mode is where the producer were to fill up the
> > > +buffer before the consumer could free up anything, the producer
> > > +will stop writing to the buffer. This will lose most recent events.
> > > +
> > > +Overwrite mode is where the produce were to fill up the buffer
> > > +before the consumer could free up anything, the producer will
> > > +overwrite the older data. This will lose the oldest events.
> > > +
> > > +No two writers can write at the same time (on the same per cpu buffer),
> > > +but a writer may preempt another writer, but it must finish writing
> > > +before the previous writer may continue. This is very important to the
> > > +algorithm. The writers act like a "stack".
> > > +
> > > +
> > > +  writer1 start
> > > +     <preempted> writer2 start
> > > +         <preempted> writer3 start
> > > +                     writer3 finishes
> > > +                 writer2 finishes
> > > +  writer1 finishes
> > > +
> > > +This is very much like a writer being preempted by an interrupt and
> > > +the interrupt doing a write as well.
> > > +
> > > +Readers can happen at any time. But no two readers may run at the
> > > +same time, nor can a reader preempt another reader. A reader can not preempt
> > > +a writer, but it may read/consume from the buffer at the same time as
> > > +a writer is writing, but the reader must be on another processor.
> > > +
> > > +A writer can preempt a reader, but a reader can not preempt a writer.
> > > +But a reader can read the buffer at the same time (on another processor)
> > > +as a writer.
> > > +
> > > +The ring buffer is made up of a list of pages held together by a link list.
> > > +
> > > +At initialization a reader page is allocated for the reader that is not
> > > +part of the ring buffer.
> > > +
> > > +The head_page, tail_page and commit_page are all initialized to point
> > > +to the same page.
> > > +
> > > +The reader page is initialized to have its next pointer pointing to
> > > +the head page, and its previous pointer pointing to a page before
> > > +the head page.
> > > +
> > > +The reader has its own page to use. At start up time, this page is
> > > +allocated but is not attached to the list. When the reader wants
> > > +to read from the buffer, if its page is empty (like it is on start up)
> > > +it will swap its page with the head_page. The old reader page will
> > > +become part of the ring buffer and the head_page will be removed.
> > > +A new head page goes to the page after the old head page (but not
> > > +the page that was swapped in).
> > >
> >
> > I wonder if you could reformulate this last sentence. It took me
> > some time to understand it.
>
> Yuck, that last sentence is ugly.
>
> >
> > I first understood it as:
> >
> > """
> > A new page which comes from nowhere is
> > going to become a (and not "the") head page. Moreover, it will
> > be pointed by old_head_page->next...(which is actually true btw),
> > but this new head page will not be the next pointer on the page
> > that has just been swapped in.
> > """
> >
> > Well, actually may be it's because my english understanding is a bit....
>
> No, I think I wrote that at 3am.
>
> How about this:
>
> "The page after the inserted page (old reader_page) will become the new
> head page."
>
> ?

Perfect!
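
To check that reading of the new sentence, here is a minimal user-space
sketch of the swap. It is not code from the patch: the names
buffer_page, ring_sketch, ring_init() and swap_reader_page() are made up
for illustration, it assumes a ring of at least two pages, and it is
single-threaded only (no writer, no cmpxchg), so it shows just the
pointer shuffling and where the head ends up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: only the list links matter here. */
struct buffer_page {
	struct buffer_page *next;
	struct buffer_page *prev;
	int id;
};

/* Hypothetical ring state: the head page plus the spare reader page. */
struct ring_sketch {
	struct buffer_page *head_page;
	struct buffer_page *reader_page;
};

/* Link n pages into a circle and set up the reader page outside of it. */
static void ring_init(struct ring_sketch *rb, struct buffer_page *pages,
		      int n, struct buffer_page *reader)
{
	int i;

	assert(n >= 2);
	for (i = 0; i < n; i++) {
		pages[i].id = i;
		pages[i].next = &pages[(i + 1) % n];
		pages[i].prev = &pages[(i + n - 1) % n];
	}
	rb->head_page = &pages[0];

	/* The reader page points into the ring but is not part of it. */
	reader->id = -1;
	reader->next = rb->head_page;
	reader->prev = rb->head_page->prev;
	rb->reader_page = reader;
}

/*
 * Swap the reader page with the head page (single-threaded sketch).
 * Returns the old head page, which now belongs to the reader.
 */
static struct buffer_page *swap_reader_page(struct ring_sketch *rb)
{
	struct buffer_page *old_head = rb->head_page;
	struct buffer_page *inserted = rb->reader_page;

	/* Splice the old reader page into the slot of the old head page. */
	inserted->next = old_head->next;
	inserted->prev = old_head->prev;
	old_head->prev->next = inserted;
	old_head->next->prev = inserted;

	/* The page after the inserted page becomes the new head page. */
	rb->head_page = inserted->next;

	/* The old head page is now the reader page, outside the ring. */
	rb->reader_page = old_head;
	old_head->next = rb->head_page;
	old_head->prev = inserted;

	return old_head;
}
```

With three pages 0 -> 1 -> 2 and the head on page 0, the swap hands
page 0 to the reader, leaves the ring as (old reader page) -> 1 -> 2,
and moves the head to page 1.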

>
> >
> > >
> > > +
> > > +Once the new page is given to the reader, the reader could do what
> > > +it wants with it, as long as a writer has left that page.
> > > +
> > > +A sample of how the reader page is swapped: Note this does not
> > > +show the head page in the buffer, it is for demonstrating a swap
> > > +only.
>
> Note above.
>
> > > +
> > > +  +------+
> > > +  |reader|          RING BUFFER
> > > +  |page  |
> > > +  +------+
> > > +                  +---+   +---+   +---+
> > > +                  |   |-->|   |-->|   |
> > > +                  |   |<--|   |<--|   |
> > > +                  +---+   +---+   +---+
> > > +                   ^ |             ^ |
> > > +                   | +-------------+ |
> > > +                   +-----------------+
> > >
> >
> > But may be you could also show the head page at the same time,
> > that would help the readers IMO (not those on the ring buffer,
> > but at least those from real life who can preempt several things..)
>
> I could add the H, but I just wanted to concentrate on the swap without
> having too many details. But if you think the H would help, I'm fine with
> it.
>

You're right. It's better to only keep the page swapping picture. The
header page is explained just after anyway.