
The analysis of the semantics of +RTS -s and the list of needed new events

Here is a sample output of +RTS -s, annotated with the new events required to simulate it in ThreadScope.
More concrete event proposals will follow after discussion.
Note that eventually we'd like to generate such a summary for any user-selected time interval of the run,
and for this we may need more or different events, or we may choose to skip some information (for example,
the INIT and EXIT times do not make sense for an interval).
Here is a screenshot
of what we can already do using the current set of events. As it happens, in this case we can do as much
for the whole run as for any time interval.

237,179,528 bytes allocated in the heap

We'd need an event for each memory allocation to calculate the total amount allocated in the heap.
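A minimal sketch of that summation. It assumes a hypothetical allocation event that records its timestamp, capability and size in bytes. `AllocEvent` and its fields are illustrative and do not correspond to an existing GHC event. The optional `start`/`end` arguments show how the same sum restricts to a user-selected interval.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AllocEvent:
    """Hypothetical event: `nbytes` allocated on capability `cap` at time `ts` (ns)."""
    ts: int
    cap: int
    nbytes: int


def total_allocated(events: Iterable[AllocEvent],
                    start: int = 0, end: Optional[int] = None) -> int:
    """Bytes allocated in [start, end); the whole run when `end` is None."""
    return sum(e.nbytes for e in events
               if e.ts >= start and (end is None or e.ts < end))
```

Nothing in the sum depends on each event describing exactly one allocation. The same code would therefore work if the allocation were instead reported in larger chunks.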

52,785,584 bytes copied during GC

An event summarising all the copying done, probably emitted after the end of each GC.
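A sketch of summing such per-GC summaries. It assumes a hypothetical `GCCopiedSummary` record that carries the end-of-GC timestamp, the generation collected and the bytes copied. Attributing each GC to the interval in which it ends is a choice made here, not something the RTS prescribes. Keeping the generation also gives the per-generation split mentioned below.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class GCCopiedSummary:
    """Hypothetical event emitted once after each GC ends."""
    ts: int            # timestamp of the end of the GC (ns)
    gen: int           # oldest generation collected by this GC
    copied_bytes: int  # bytes copied by all GC threads during this GC


def bytes_copied(events: Iterable[GCCopiedSummary], start: int = 0,
                 end: Optional[int] = None) -> Tuple[int, Dict[int, int]]:
    """Total bytes copied by GCs ending in [start, end), plus a per-generation split."""
    per_gen: Counter = Counter()
    for e in events:
        if e.ts >= start and (end is None or e.ts < end):
            per_gen[e.gen] += e.copied_bytes
    return sum(per_gen.values()), dict(per_gen)
```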

17,272,336 bytes maximum residency (5 sample(s))

Either we add a separate event for that, perhaps emitted only after a major GC, when we know how much memory
the program really uses (the docs explain the "n samples" above by saying residency is "only checked during major garbage collections"),
or we try to calculate it from all the events (but how often should we sample?), in particular from memory deallocation events and GC memory-freeing events.
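A sketch of the first option. It assumes a hypothetical `LiveBytesSample` event emitted at the end of each GC, carrying the live heap size and a flag that says whether the GC was major. Following the docs quoted above, only major-GC samples are used, and the number of samples is reported alongside the maximum.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class LiveBytesSample:
    """Hypothetical event emitted after each GC with the live heap size."""
    ts: int          # timestamp (ns)
    live_bytes: int  # bytes live in the heap after this GC
    major: bool      # True if this was a major GC


def max_residency(samples: Iterable[LiveBytesSample], start: int = 0,
                  end: Optional[int] = None) -> Tuple[int, int]:
    """(maximum live bytes, number of samples), using major-GC samples in [start, end)."""
    live = [s.live_bytes for s in samples
            if s.major and s.ts >= start and (end is None or s.ts < end)]
    return max(live, default=0), len(live)
```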

Ask JaffaCake whether RequestParGC is enough to tell sequential from parallel GCs, and why some StartGC
events follow neither RequestParGC nor RequestSeqGC (at least on the same capability).
Split the current GC events by generation and into sequential/parallel
(if RequestParGC is not enough to tell them apart). Note that we don't want
to report the CPU time, only the elapsed time, and that's fine.
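A sketch of that classification. It assumes the events arrive in timestamp order as simplified `(timestamp, capability, name)` tuples rather than the real ghc-events types. Each StartGC takes the kind of the most recent RequestSeqGC or RequestParGC on the same capability since that capability's previous StartGC. When there is no such request, the GC is labelled `unknown`, which is exactly the case the question above asks about.

```python
from typing import Dict, Iterable, List, Tuple

# Simplified, hypothetical event shape: (timestamp_ns, capability, event name).
Event = Tuple[int, int, str]


def classify_gcs(events: Iterable[Event]) -> List[Tuple[int, int, str]]:
    """Label each StartGC as 'seq', 'par' or 'unknown'.

    The label comes from the most recent RequestSeqGC/RequestParGC on the same
    capability since that capability's previous StartGC. Events must be in
    timestamp order.
    """
    pending: Dict[int, str] = {}  # capability -> kind of its outstanding request
    labelled = []
    for ts, cap, name in events:
        if name == "RequestSeqGC":
            pending[cap] = "seq"
        elif name == "RequestParGC":
            pending[cap] = "par"
        elif name == "StartGC":
            # No request seen on this capability: the case we need to ask about.
            labelled.append((ts, cap, pending.pop(cap, "unknown")))
    return labelled
```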

Parallel GC work balance: 1.00 (6391526 / 6375794, ideal 2)

This is quite convoluted: the total number of words copied during parallel GCs divided
by the sum, over all parallel GCs, of the maximal number of words copied by any single thread
in that GC (dividing both figures by the number of parallel GCs gives the same ratio, so it can also
be read as an average over parallel GCs). Events needed: the events added above suffice,
but quite a bit of extra state will have to be maintained when reading
the events. Ask JaffaCake if +RTS -s could be modified to make this figure simpler.
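A sketch of the computation. It assumes a hypothetical per-GC record listing how many words each GC thread copied; `GCThreadsCopied` is illustrative, not an existing event. The sketch pairs the two figures in parentheses as sums over parallel GCs of, respectively, the total copied and the per-GC maximum. That reading is an assumption, though it is consistent with the two numbers in the sample having similar magnitudes. GCs run by a single thread are skipped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class GCThreadsCopied:
    """Hypothetical per-GC record: words copied by each GC thread in one GC."""
    per_thread_words: List[int]


def work_balance(gcs: Iterable[GCThreadsCopied]) -> Tuple[float, int, int]:
    """Return (balance, total words copied, sum of per-GC maxima) over parallel GCs."""
    total = 0
    max_sum = 0
    for gc in gcs:
        if len(gc.per_thread_words) < 2:
            continue  # single-threaded GC: assumed not to count towards the figure
        total += sum(gc.per_thread_words)
        max_sum += max(gc.per_thread_words)
    balance = total / max_sum if max_sum else 0.0
    return balance, total, max_sum
```

Under this reading the balance lies between 1, when one thread did all the copying in every parallel GC, and the number of GC threads, when the copying was perfectly even. The second bound is presumably what the "ideal 2" in the sample refers to.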

Ask JaffaCake how the "tasks" relate to the "threads" for which we generate
events. For now, we can add to the existing GC events the information about which task
does the job, but we may miss something this way. By the way, is the time between the GCIdle and GCWork events counted
as GC time of the task? More generally, is GCIdle useful for us here in any way?
Similarly, can we calculate the MUT (elapsed) times just by taking the total
time a thread (or capability?) runs and subtracting the GC time and any pauses (and are all
pauses visible through the events we already have?).
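A sketch of the run-time-based approach, again over simplified `(timestamp, capability, name)` tuples in timestamp order. For each capability it sums the intervals between RunThread and StopThread, clipped to the selected interval, and closes any thread still running when the log or interval ends. The sketch assumes that a capability stops its Haskell thread before taking part in a GC, so that GC time and other pauses fall outside these intervals. Whether every pause really shows up this way is the open question above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Simplified, hypothetical event shape: (timestamp_ns, capability, event name).
Event = Tuple[int, int, str]


def mut_elapsed_per_cap(events: Iterable[Event], start: int = 0,
                        end: Optional[int] = None) -> Dict[int, int]:
    """Nanoseconds each capability spent running Haskell threads within [start, end)."""
    running_since: Dict[int, int] = {}  # cap -> timestamp of its last RunThread
    mut: Dict[int, int] = defaultdict(int)
    last_ts = start

    def close(cap: int, since: int, until: int) -> None:
        # Clip the run interval [since, until) to [start, end) and accumulate it.
        lo = max(since, start)
        hi = until if end is None else min(until, end)
        mut[cap] += max(0, hi - lo)

    for ts, cap, name in events:  # assumed to be in timestamp order
        last_ts = max(last_ts, ts)
        if name == "RunThread":
            running_since[cap] = ts
        elif name == "StopThread" and cap in running_since:
            close(cap, running_since.pop(cap), ts)
    # Threads still running when the log (or the interval) ends.
    for cap, since in running_since.items():
        close(cap, since, last_ts if end is None else end)
    return dict(mut)
```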

needs updating (not sure for which GHC version, though). Otherwise, we have enough
events for that (we calculate this using the SparkCounters events,
but we could also use the precise per-spark events).
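A sketch of the counter-based calculation. The `SparkCounters` class below only loosely mirrors the SparkCounters event: its field names are simplified here and the remaining-sparks count is left out. Because the counters are cumulative per capability, the figure for each capability over an interval is its last sample at or before the interval's end minus its last sample at or before the interval's start. The per-capability figures are then summed.

```python
from dataclasses import dataclass, fields
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class SparkCounters:
    """Cumulative per-capability spark counts (simplified from the SparkCounters event)."""
    created: int = 0
    dud: int = 0
    overflowed: int = 0
    converted: int = 0
    fizzled: int = 0
    gcd: int = 0


def spark_summary(samples: Iterable[Tuple[int, int, SparkCounters]],
                  start: Optional[int] = None,
                  end: Optional[int] = None) -> SparkCounters:
    """Spark counts for (start, end], summed over capabilities.

    `samples` are (timestamp_ns, capability, counters) in timestamp order;
    start=None means the beginning of the run, end=None its end.
    """
    before: Dict[int, SparkCounters] = {}  # cap -> last sample at or before start
    upto: Dict[int, SparkCounters] = {}    # cap -> last sample at or before end
    for ts, cap, counters in samples:
        if start is not None and ts <= start:
            before[cap] = counters
        if end is None or ts <= end:
            upto[cap] = counters
    total = SparkCounters()
    for cap, last in upto.items():
        first = before.get(cap, SparkCounters())
        for f in fields(SparkCounters):
            delta = getattr(last, f.name) - getattr(first, f.name)
            setattr(total, f.name, getattr(total, f.name) + delta)
    return total
```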

(Note that there may be more entries here, e.g., for profiling.)
We can sum up the GC time from the GC events. We'd also like to have the MUT
figure, but it's not obvious whether we can get it from all the thread (task)
events that we have (or add above). It's also not clear whether adding the events
needed to get the other times is worth it. After any extra events
are added, let's see if we can get any more of these summary times,
perhaps by adding a minor event emitted just once. Note that the INIT time is
necessary for the Productivity figure below (INIT does not count as "productive
time" in +RTS -s).
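A sketch of summing GC time from StartGC/EndGC pairs, using simplified `(timestamp, capability, name)` tuples in timestamp order. In a parallel GC several capabilities are in GC at the same time, so the per-capability GC intervals are merged. The result is the wall-clock time during which at least one capability was in GC, clipped to the selected interval.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Simplified, hypothetical event shape: (timestamp_ns, capability, event name).
Event = Tuple[int, int, str]


def gc_elapsed(events: Iterable[Event], start: int = 0,
               end: Optional[int] = None) -> int:
    """Wall-clock ns within [start, end) during which some capability was in GC."""
    gc_since: Dict[int, int] = {}          # cap -> timestamp of its open StartGC
    intervals: List[Tuple[int, int]] = []
    for ts, cap, name in events:           # assumed to be in timestamp order
        if name == "StartGC":
            gc_since[cap] = ts
        elif name == "EndGC" and cap in gc_since:
            intervals.append((gc_since.pop(cap), ts))

    total = 0
    cur: Optional[Tuple[int, int]] = None  # merged interval being built
    for lo, hi in sorted(intervals):
        lo = max(lo, start)                 # clip to the selected interval
        hi = hi if end is None else min(hi, end)
        if hi <= lo:
            continue
        if cur is None or lo > cur[1]:
            if cur is not None:
                total += cur[1] - cur[0]
            cur = (lo, hi)
        else:
            cur = (cur[0], max(cur[1], hi))
    if cur is not None:
        total += cur[1] - cur[0]
    return total
```

Summing the per-capability durations instead would count a parallel GC once for every participating capability. That is closer to a CPU-time figure than to the elapsed time we want.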

%GC time 89.5% (75.3% elapsed)

This line does not appear in the threaded RTS, so we disregard it.

Alloc rate 564,074,971 bytes per MUT second

The events added above should suffice, except that we use elapsed time,
not CPU time.
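A small sketch of the rate. It assumes the MUT elapsed time has already been obtained, in nanoseconds like eventlog timestamps, for example along the lines of the MUT discussion above. `alloc_rate` is an illustrative name. For scale, the sample's 237,179,528 bytes at 564,074,971 bytes per MUT second correspond to roughly 0.42 MUT seconds.

```python
def alloc_rate(bytes_allocated: int, mut_elapsed_ns: int) -> float:
    """Bytes allocated per elapsed MUT second (0.0 if there was no MUT time)."""
    if mut_elapsed_ns <= 0:
        return 0.0
    return bytes_allocated / (mut_elapsed_ns / 1e9)
```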

Productivity 59.0% of total user, 76.6% of total elapsed

The events added above should be enough. Again, we only do the elapsed case.
Ask JaffaCake whether it's OK that the "% of total elapsed" in +RTS -s is actually the CPU time divided by the elapsed time, not elapsed divided by elapsed.
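A sketch of the elapsed-only figure. It assumes that the elapsed time splits into INIT, MUT, GC and EXIT, so that MUT elapsed = total - INIT - GC - EXIT. The function name and its arguments are illustrative. INIT is subtracted because, as noted above, it does not count as productive time in +RTS -s.

```python
def productivity_elapsed(total_ns: int, init_ns: int, gc_ns: int, exit_ns: int) -> float:
    """Percentage of elapsed time spent in MUT, assuming total = INIT + MUT + GC + EXIT."""
    if total_ns <= 0:
        return 0.0
    mut_ns = total_ns - init_ns - gc_ns - exit_ns
    return 100.0 * mut_ns / total_ns
```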