I think this is even more useful than 'make stackcheck' style static analysis, because it measures the actual stack footprint at runtime.

That way we can prove or disprove theories about whether a regression was caused by a stack overflow.

We can also do practical profiling of the worst-case stack footprint - much like the latency tracer works, just in a different 'space' than latency.

A couple of suggestions:

- could we somehow include per-entry stack frame size information in the stacktrace output as well? This probably needs an extension to save_stack_trace() though.

- it would be nice to introduce a threshold, and automation to emit a warning exactly once during bootup if that threshold is ever exceeded. That way automated testing efforts like randconfig testing would automatically do worst-case stack footprint testing as well.

- please link this tracer plugin into the stack redzone check mechanism we already have. Right now our stack overflow warnings are statistical: they only trigger if an irq handler happens to notice a deep stack, or if task teardown happens to see corruption of a specific area of the stack - neither of which is particularly reliable in practice. One thing to be careful about is when to print: I'd suggest introducing a stack overflow warning that uses only early_printk() and not the regular printk(). This is a true emergency mechanism, and getting the message out in time (before we self-destruct by corrupting the task structure) is important.