ftrace_caller can gather and save the current r2 contents, no problem;
but the point is, r2 needs to be restored *after* the replacement function
returns.
I see 3 ways to accomplish this:

1st: make _every_ replacement function aware of this, and make it restore
the TOC manually just before each return statement.

2nd: provide a global hook to do the job, and use a stack frame to execute it.

3rd: have a global hook like solution 2, but let it have its own data
structure, I'd call it a "shadow stack", for the real return addresses.
See struct fgraph_cpu_data in kernel/trace/trace_functions_graph.c
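
To make solution 3 a bit more concrete, here is a minimal user-space sketch
of such a shadow stack. It is only an illustration, not kernel code: the
names shadow_entry, shadow_stack, shadow_push, shadow_pop and SHADOW_DEPTH
are all made up for this mail, and the fixed depth is an arbitrary
assumption. The idea is that the entry hook pushes the real return address
together with the caller's r2, and the global return hook pops both again.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_DEPTH 32 /* hypothetical fixed depth, chosen arbitrarily */

/* One saved frame: what the return hook needs to undo the redirection. */
struct shadow_entry {
	unsigned long ret_addr; /* real return address (LR at entry) */
	unsigned long toc;      /* caller's r2 (TOC pointer) at entry */
};

/* Hypothetical per-task or per-CPU shadow stack. */
struct shadow_stack {
	struct shadow_entry entries[SHADOW_DEPTH];
	size_t depth;
};

/* Called from the entry hook. Returns 0 on success, -1 if the stack is full. */
static int shadow_push(struct shadow_stack *s, unsigned long ret_addr,
		       unsigned long toc)
{
	if (s->depth >= SHADOW_DEPTH)
		return -1;
	s->entries[s->depth].ret_addr = ret_addr;
	s->entries[s->depth].toc = toc;
	s->depth++;
	return 0;
}

/* Called from the return hook. Returns 0 on success, -1 if the stack is empty. */
static int shadow_pop(struct shadow_stack *s, struct shadow_entry *out)
{
	if (s->depth == 0)
		return -1;
	s->depth--;
	*out = s->entries[s->depth];
	return 0;
}
```

In a real implementation the push would happen in ftrace_caller before
branching to the replacement function, with LR redirected to the return
hook; the hook would then pop the entry, reload r2 from it and branch to
the saved return address. A failed push would have to fall back to not
redirecting the call at all.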

Using heuristics to determine whether the call was local or global
makes me feel highly uncomfortable; one day it will break and
nobody will remember why.

Balbir, the problem with your patch is that it goes only halfway from
my solution 2 towards solution 1. When you call a helper function on return,
you need a place to store the real return address.

I'll try to demonstrate solution 1 as well, but you probably won't like
that either...