Causeway: A message-oriented distributed debugger

Causeway is an open source postmortem distributed debugger for examining the behavior of distributed programs built as communicating event loops. Its message-oriented approach follows the flow of messages across process and machine boundaries.

Getting Started

The simplest way to get started is to launch Causeway from a command line shell and then open one of the examples.

$ cd e/src/esrc/scripts
$ rune causeway.e-swt

From the Welcome view, select an example program from the Help menu.

Optionally, the sources and trace logs can be specified on the command line.

$ rune -Dsrc=<srcRootDir> causeway.e-swt <logs>...

Java's default memory settings are sufficient for the examples, but larger programs need more stack and heap space.
Use the -Xss (stack) and -Xmx (heap) options to increase Java's default memory sizes. Follow the amount with m for megabytes or k for kilobytes.
Note that the format does not follow the name=value convention. The -J option tells rune to pass the option through to Java.

$ rune -J-Xss128m -J-Xmx128m causeway.e-swt

Setting Causeway's debug flag enables a Debug view. As events are selected in the viewer, the Debug view shows the corresponding trace record in the log file.

$ rune -Dcauseway_debug=true causeway.e-swt

Waterken Example (Ajax-style)

This Java program ran on a Waterken server that was instrumented to generate trace logs in Causeway's format. The program is a distributed implementation of a procedure for handling new purchase orders.

Before an order is placed, certain conditions must be met: the item is in stock and available, the customer’s account is in good standing, and the delivery options are up to date. An object residing in the “buyer” process (or vat) has remote references to objects residing in the “product” and “accounts” processes. The buyer queries the remote objects with asynchronous message sends. This example uses Ajax-style continuation-passing: a request carries a callback argument to which a response should be sent.

The screenshot below shows the principal views from Causeway's postmortem display for this example.

Causeway Viewer

Process-order view: This view lists events in chronological order, organized by vat. For example, clicking the "buyer" tab shows all events logged by the buyer vat. The events are ordered by turn number and, within each turn, by anchor number.
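
The ordering used by the process-order view can be illustrated with a minimal sketch. The flat record layout here (the "vat", "turn", "anchor", and "kind" keys) is a simplification assumed for this example, not Causeway's actual trace log format.

```python
from collections import defaultdict

def process_order(events):
    """Group events by vat, then order each group by (turn, anchor)."""
    by_vat = defaultdict(list)
    for ev in events:
        by_vat[ev["vat"]].append(ev)
    return {
        vat: sorted(evs, key=lambda e: (e["turn"], e["anchor"]))
        for vat, evs in by_vat.items()
    }

# Hypothetical records, deliberately out of order.
events = [
    {"vat": "buyer", "turn": 2, "anchor": 1, "kind": "Got"},
    {"vat": "product", "turn": 1, "anchor": 1, "kind": "Got"},
    {"vat": "buyer", "turn": 1, "anchor": 2, "kind": "Sent"},
    {"vat": "buyer", "turn": 1, "anchor": 1, "kind": "Sent"},
]
tabs = process_order(events)
buyer_order = [(e["turn"], e["anchor"]) for e in tabs["buyer"]]
```

Each key of tabs corresponds to one tab in the view; buyer_order holds the (turn, anchor) pairs for the buyer vat in display order.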

Message-order view: This view shows the order in which events caused other events by sending messages. Message order is reflected in the outline structure: nested events were caused by the parent event. Causeway assigns each vat a color so we can see when message flow crosses vat boundaries.
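
One plausible way to derive such an outline is sketched below. It assumes two causality rules that are a reading of the description above rather than a statement of Causeway's implementation: a Got event is caused by the Sent event carrying the same message, and any other event is caused by the Got event that started its turn. The record keys ("id", "vat", "turn", "kind", "message") are hypothetical.

```python
from collections import defaultdict

def message_order(events):
    """Return (roots, children) describing the causal outline."""
    # The Got event that started each (vat, turn), if one was logged.
    turn_start = {(e["vat"], e["turn"]): e["id"]
                  for e in events if e["kind"] == "Got"}
    # The Sent event that carried each message.
    sender = {e["message"]: e["id"]
              for e in events if e["kind"] == "Sent"}
    roots, children = [], defaultdict(list)
    for e in events:
        if e["kind"] == "Got":
            parent = sender.get(e["message"])
        else:
            parent = turn_start.get((e["vat"], e["turn"]))
        if parent is None:
            roots.append(e["id"])
        else:
            children[parent].append(e["id"])
    return roots, children

def outline(roots, children, depth=0):
    """Render the outline as indented lines, one event per line."""
    lines = []
    for node in roots:
        lines.append("  " * depth + node)
        lines.extend(outline(children.get(node, []), children, depth + 1))
    return lines

events = [
    {"id": "s1", "vat": "buyer", "turn": 1, "kind": "Sent", "message": "m1"},
    {"id": "g1", "vat": "product", "turn": 2, "kind": "Got", "message": "m1"},
    {"id": "s2", "vat": "product", "turn": 2, "kind": "Sent", "message": "m2"},
    {"id": "g2", "vat": "buyer", "turn": 3, "kind": "Got", "message": "m2"},
]
roots, children = message_order(events)
tree = outline(roots, children)
```

In this sample, every level of nesting alternates between the buyer and product vats, which is where the viewer's per-vat coloring would change.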

Stack Explorer and Source view: These views are familiar from sequential debugging.

Individual tree items represent events and their descriptive labels depend on the information available in the trace record for the event. Causeway labels the tree items according to the following priority.

The "text" field string. This field is required for Comment records. It is optional for Sent, SentIf, and Resolved records.

If the trace record has at least one stack trace with a source and span, a single line of source code from the source file specified in the top stack entry.

If there's no span, the source file name and function name specified in the top stack entry.

If nothing else, a Causeway comment.
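
The priority order above can be sketched as a single function. The record layout (a "text" key and a "stack" list whose entries have "source", "name", and a 1-based line number in "span"), the sources dictionary, and the fallback string are all assumptions made for illustration. This sketch also falls back to the file and function name when the source file is not loaded, which is a design choice of the example.

```python
def event_label(record, sources):
    """Pick a label for a tree item, trying the most specific source first."""
    # 1. An explicit "text" field wins.
    if record.get("text"):
        return record["text"]
    stack = record.get("stack") or []
    if stack:
        top = stack[0]
        source, span = top.get("source"), top.get("span")
        lines = sources.get(source)
        # 2. A source and span: show that line of source code.
        if span is not None and lines and 1 <= span <= len(lines):
            return lines[span - 1].strip()
        # 3. No usable span: show the file name and function name.
        if source:
            return f"{source}: {top.get('name', '?')}"
    # 4. Nothing else: a generic placeholder (hypothetical wording).
    return "(no description)"

# Hypothetical source text, keyed by file name.
sources = {"Buyer.java": ["void buy() {", "    teller.run(true);", "}"]}
with_span = {"stack": [{"source": "Buyer.java", "name": "Buyer.buy", "span": 2}]}
without_span = {"stack": [{"source": "Buyer.java", "name": "Buyer.buy"}]}
```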

Menu Commands

File >> Set Source Root... points Causeway at the root directory of the sources.

(Limitation: A folder cannot be selected directly. Open the folder and select the files in it.)

File >> Export... translates Causeway's message graph (DAG) to the GraphViz DOT format and writes the dot file to a local disk. The dot file is a human-readable text file. It specifies a graph using the DOT language. GraphViz must be downloaded and installed to see the graph visualization. The graph below was generated for the Waterken example described above.
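
To show what a DOT description of a message graph looks like, here is a minimal sketch that writes one from a list of edges. The node names, the label format, and the function itself are hypothetical. The file Causeway exports may be structured differently.

```python
def to_dot(edges, vat_of):
    """Render a message graph as DOT text.

    edges  -- (cause, effect) pairs of event ids
    vat_of -- maps each event id to the vat that logged it
    Assumes ids and vat names contain no double quotes.
    """
    lines = ["digraph messages {"]
    for node in sorted(vat_of):
        lines.append(f'  "{node}" [label="{node} ({vat_of[node]})"];')
    for cause, effect in edges:
        lines.append(f'  "{cause}" -> "{effect}";')
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(
    [("s1", "g1"), ("g1", "s2")],
    {"s1": "buyer", "g1": "product", "s2": "product"},
)
```

With GraphViz installed, a file containing this text can be rendered with a command such as dot -Tpng graph.dot -o graph.png, where graph.dot is whatever name the file was saved under.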

Search >> Find Lost Messages reports sent messages that were not received. The message graph is searched for Sent or SentIf events having no corresponding Got event. Of course, the message may actually have been received, but the event was not logged.
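
The search described above amounts to a set difference. The following sketch assumes simplified records with hypothetical "kind" and "message" keys.

```python
def find_lost_messages(events):
    """Return Sent/SentIf events whose message has no matching Got event."""
    received = {e["message"] for e in events if e["kind"] == "Got"}
    return [
        e for e in events
        if e["kind"] in ("Sent", "SentIf") and e["message"] not in received
    ]

events = [
    {"kind": "Sent", "message": "m1"},
    {"kind": "Got", "message": "m1"},
    {"kind": "Sent", "message": "m2"},    # never received, or Got not logged
    {"kind": "SentIf", "message": "m3"},  # likewise
]
lost = find_lost_messages(events)
```

As the text notes, an entry in lost only means that no Got record was found in the logs, not that delivery necessarily failed.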

Tools >> Set Filter Options... presents all source files seen during parsing of the trace logs. Individual files can be filtered out.
(Limitation: These settings are not persistent across launches.)

Context Menu Commands

Bookmark bookmarks the currently selected event.

Find Multiples finds the multiple causes of a joining event.
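
As a rough illustration, a joining event can be thought of as a node in the message graph with more than one incoming edge. The sketch below works on hypothetical (cause, effect) pairs and is not taken from Causeway's implementation.

```python
from collections import defaultdict

def causes_by_event(edges):
    """Map each event to the list of events that caused it."""
    causes = defaultdict(list)
    for cause, effect in edges:
        causes[effect].append(cause)
    return causes

def find_multiples(edges, event):
    """Return the causes of `event` if it has more than one, else []."""
    found = causes_by_event(edges).get(event, [])
    return found if len(found) > 1 else []

# Hypothetical graph: two answers join before the order is placed.
edges = [
    ("start", "stockOk"),
    ("start", "creditOk"),
    ("stockOk", "placeOrder"),
    ("creditOk", "placeOrder"),
]
joined = find_multiples(edges, "placeOrder")
```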

Search Stacks

(Limitation: Not all source lines ... and currently, there is no visual indication...)

The corresponding Got record matches on message. The message delivery in the product vat starts a new turn, turn 2.
Because this is the top of a new turn, stack capture is limited, and getting a source span through Java reflection is not practical.

Reporting true to teller has two log entries: a Sent and its corresponding Got.

The registration of when-blocks, logged as SentIf records, is filtered from the message-order view, as it doesn't contribute to the understanding of message flow. However, these records do appear in process-order, as shown below.

Performance Issues in Waterken

Due to the expense of stack capture in Java, tracing in Waterken incurs roughly an order-of-magnitude performance penalty. If tracing is off, there is no penalty.

Waterken guarantees in-order message delivery, and if a connection is dropped, there is enough information to determine partial success. For example, if two messages (msg1, msg2) are sent from vat A to vat B, they are guaranteed to be processed in the order sent. If the connection is dropped after msg1 is successfully sent, then when the connection is re-established, it is known that only msg2 must be resent.
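
The msg1/msg2 scenario can be modeled with sequence numbers and acknowledgements. This is a generic sketch of in-order delivery with resend, written for illustration. It is not Waterken's actual protocol, and the class and attribute names are hypothetical.

```python
class Receiver:
    """Processes messages strictly in sequence order."""

    def __init__(self):
        self.processed = []

    def deliver(self, seq, msg):
        # Accept only the next expected message; ignore duplicates and gaps.
        if seq == len(self.processed):
            self.processed.append(msg)
        # Acknowledge with the number of messages processed so far.
        return len(self.processed)


class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.outbox = []  # every message queued, in send order
        self.acked = 0    # how many the receiver has confirmed
        self.wire = []    # what was actually transmitted (for illustration)

    def send(self, msg, connected=True):
        self.outbox.append(msg)
        if connected:
            self.flush()

    def flush(self):
        # Transmit everything not yet acknowledged, oldest first.
        for seq in range(self.acked, len(self.outbox)):
            self.wire.append(self.outbox[seq])
            self.acked = self.receiver.deliver(seq, self.outbox[seq])


vat_b = Receiver()
vat_a = Sender(vat_b)
vat_a.send("msg1")                   # delivered and acknowledged
vat_a.send("msg2", connected=False)  # connection dropped before transmission
vat_a.flush()                        # reconnect: only msg2 goes out
```

Because the sender remembers how many messages were acknowledged, reconnecting transmits msg2 only, and a duplicate delivery of an already processed sequence number is ignored by the receiver.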

The identifiers used to support these guarantees are also used for tracing. The advantage of these multi-purpose identifiers is that there is no overhead when tracing is off (i.e., unique message identifiers used only for tracing are not sent over the wire).

Note: Resending a message after a connection is re-established can result in two identical Sent events being logged. Causeway notices when the event records are identical and ignores the duplicate.
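
One simple way to ignore identical records is to compare a canonical serialization of each one. The sketch below assumes the records have already been parsed into plain dictionaries with hypothetical keys. How Causeway itself compares records is not described here.

```python
import json

def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving log order."""
    seen = set()
    unique = []
    for rec in records:
        # Sorting keys makes the comparison independent of field order.
        key = json.dumps(rec, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"kind": "Sent", "message": "m2", "turn": 3, "anchor": 1},
    {"anchor": 1, "turn": 3, "message": "m2", "kind": "Sent"},  # resend
    {"kind": "Got", "message": "m2", "turn": 4, "anchor": 0},
]
deduped = drop_duplicates(records)
```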

See Also

Our current development effort is to generalize Causeway to support asynchronous message-passing programs running on event loop-based platforms in general, not just E. Our initial focus has been on the Waterken server.