Profiling .NET Core app on Linux

I’m still busy learning how to troubleshoot .NET Core apps on Linux. By now I more or less understand where to start if I need to analyze .NET Core memory usage, but that’s not the most frequent problem we usually have. Most of the time the main problem is high CPU usage, which obviously leads to the question “What is this thing doing now??”.

On Windows I’d usually start by analyzing application logs, trace files or performance reports. If that wasn’t enough, I’d download the PerfView tool and profile the running executable itself. How many threads are there? Which functions do they execute the most? What are their call stacks? And so forth. The only downside of that tool is that I never read its documentation, so the whole troubleshooting experience sometimes resembled a meditation on the monitor, worship of the holy Google, and wild guessing.

While logs and traces are obviously there on Linux as well, I was wondering: can I profile the app in a way similar to what I’d do on Windows? The answer is “Oh, yes”. Monitor, worship, guessing – it’s all there.

There are multiple tools out there, but the basic toolkit for profiling a .NET Core app on Linux seems to be the perf utility along with lttng and perfcollect. Let’s have a look at all of them.

The playground (profiling starts later)

We’ll need a .NET Core demo app, Linux host and some dev tools installed on it. With the help of Vagrant and VirtualBox, I can have all of that.

Firstly, the demo project, which basically does some pointless math in a loop until we stop it:
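The original listing didn’t survive in this copy, but based on the perf output described later (the hot path goes through a DoSomeMath method calling Aggregate with two lambdas, one multiplying and one adding), a minimal reconstruction might look like this – the class name, the range bounds, and the exact expressions are my guesses, not the original code:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        // Burn CPU forever, until the process is killed
        while (true)
        {
            DoSomeMath();
        }
    }

    static void DoSomeMath()
    {
        var numbers = Enumerable.Range(1, 1000);

        // Two Aggregate calls with two lambdas:
        // one for multiplication, one for addition
        var product = numbers.Aggregate(1.0, (acc, n) => acc * n);
        var sum = numbers.Aggregate(0.0, (acc, n) => acc + n);
    }
}
```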

Sampling function calls with perf

It’s surprising how easy it is to get started with perf. Don’t get me wrong, you can spend an evening reading the manuals, getting into the underlying concepts and even trying to understand how to interpret the results. But at the same time there’s no complex setup, and we’re just a few commands away from getting something meaningful in response.

The Vagrantfile already installed perf, so here’s what we can do: start our dotnet app in the background and perf it.

Starting the demo .NET Core app

```shell
dotnet build
#Microsoft (R) Build Engine version 15.5.179.9764 for .NET Core
#...
#Build succeeded.
#    0 Warning(s)
#    0 Error(s)
dotnet bin/Debug/netcoreapp2.0/calc.dll &
#[1] 13689
```

Starting the app in the background returned the process ID – 13689 – which is handy, because this is exactly what we need to pass to perf. Along with that, we’ll kindly ask it to collect call stacks (-g) and then use its record subcommand to begin sampling.

Now start perf, keep it running for a dozen seconds or so, and then terminate it with Ctrl+C:

Start perf

```shell
sudo perf record -p 13689 -g
#^C[ perf record: Woken up 4 times to write data ]
#failed to mmap file
#[ perf record: Captured and wrote 0.899 MB perf.data ]
```

And now, a drumroll – viewing the results:

Display results

```shell
sudo perf report
```

Well… it’s colorful, but not quite useful. The table basically shows which functions were executing while we were taking our samples, and in what percentage of samples they were found. The “Children” column tells in how many samples the function’s child functions were running, and “Self” tells, well, how long the function was running its own code. Duh.

If I were looking for bottlenecks, I’d probably pay close attention to entries whose “Self” value is high enough and close to their “Children” count, and I can clearly see four of that kind. What I don’t see is their names!

Fortunately, we can tell the CLR to emit additional debugging info, which perf will then use to convert function addresses to their names. The only thing we need to do for that to happen is to set the COMPlus_PerfMapEnabled environment variable to 1, either for the dotnet executable specifically or for the whole shell session (export COMPlus..):

Restart the app and profiler

```shell
COMPlus_PerfMapEnabled=1 dotnet bin/Debug/netcoreapp2.0/calc.dll &
#[1] 13723
sudo perf record -p 13723 -g
#^C[ perf record: Woken up 14 times to write data ]
#failed to mmap file
#[ perf record: Captured and wrote 3.378 MB perf.data ]
```

There’s one thing, though. We used sudo to start perf, so most likely its perf.data result file will be owned by root. However, the CLR debugging data was generated by dotnet, which runs as a regular user. Normally that would make perf report a bit paranoid, but we can force it to ignore such nuances with the -f flag:

View results

```shell
sudo perf report -f
```

As you can see, some CoreCLR symbols are still missing, but all the program’s methods and function names are there! By drilling down into the parent-child tree I can see that the hottest functions are Aggregate and the two lambdas we passed to it: one for multiplication and one for addition.

Viewing results with FlameGraph

perf‘s tabular result is useful, but we can make it even better by converting the profiling data into a flame graph. That’s fairly easy to do: we’ll need to download the FlameGraph project and use two Perl scripts from it: stackcollapse-perf.pl and flamegraph.pl:
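The exact commands didn’t survive in this copy, so here’s a sketch of the usual FlameGraph workflow – the repository URL and script names are the standard ones from Brendan Gregg’s project, and perf script is assumed to run in the directory containing our perf.data:

```shell
# Grab the FlameGraph scripts
git clone --depth 1 https://github.com/brendangregg/FlameGraph

# Dump the samples, fold the stacks, render the SVG
sudo perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.svg
```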

As I’m running a headless Ubuntu machine, it would be tricky to view an SVG file on it. However, the VM has python2.7 installed and its port 80 is forwarded to port 8080 of the host machine (see the Vagrantfile), so I can serve the SVG file with Python’s built-in web server and view the picture through the browser on the host machine:

Start web server

```shell
sudo python -m SimpleHTTPServer 80
```

Cool! That’s way better. The graph is fully interactive, so we can see additional details when moving the mouse over the functions, or zoom into particular call stacks.

Collecting trace events with lttng

There are certain types of events which would be tricky to collect with perf. For instance, garbage collections, assembly loads, thrown or caught exceptions – we probably could see some traces of them in the perf report, but most likely they would be buried under other data, or would happen between the samples and therefore be missed completely.

On the other hand, the dotnet runtime can publish different kinds of tracing events, which we can consume with a tool called lttng. The only things we need for that are lttng itself and the COMPlus_EnableEventLog environment variable set to 1.

Let’s change our demo code a little bit so it throws and immediately silences an exception. The challenge then will be to find out whether or not we can track those exceptions down. We actually had the same problem in production once, when one of our counters told us that the number of thrown exceptions had increased almost ten times between releases, but because they were caught and ignored we didn’t know where they actually came from.

So, the changes in the code:

Throwing exception

```csharp
//...
DoSomeMath();
try
{
    throw new Exception("Completely useless exception");
}
catch
{
    // Good luck finding it
}
//...
```

Build/restart:

Start the app with tracing enabled

```shell
dotnet build
# ...
COMPlus_EnableEventLog=1 dotnet bin/Debug/netcoreapp2.0/calc.dll &
# [1] 13824
```

To start collecting events we need to create a new lttng session, enable the events we’re going to listen to, and actually start listening. Then, when we’ve presumably collected enough data, we stop the collector and destroy the lttng session.

It sounds pretty straightforward, but where do we find out which events are available in the dotnet runtime? The good news is that the perfcollect script, which can also be used for tracing and profiling, lists all of them. This is just a small extract of what it actually has:

Exception events

```
DotNETRuntime:ExceptionThrown_V1
DotNETRuntime:ExceptionCatchStart
DotNETRuntime:ExceptionCatchStop
DotNETRuntime:ExceptionFinallyStart
DotNETRuntime:ExceptionFinallyStop
DotNETRuntime:ExceptionFilterStart
DotNETRuntime:ExceptionFilterStop
DotNETRuntime:ExceptionThrownStop
```

As we’re interested in exception events only, we can use a wildcard mask to subscribe to all of them:

Using lttng

```shell
lttng create mysession
#Session mysession created.
#Traces will be written in /home/ubuntu/lttng-traces/mysession-20171226-060602
lttng enable-event --userspace --tracepoint DotNETRuntime:Exception*
# UST event DotNETRuntime:Exception* created in channel channel0
lttng start
# Tracing started for session mysession
# Few seconds later
lttng stop
# Waiting for data availability
# Tracing stopped for session mysession
lttng destroy
# Session mysession destroyed
```

It would be tricky to analyze lttng’s binary reports if we didn’t have the babeltrace utility. I wasn’t the one who installed it, so it probably came along with lttng itself. lttng create returned the path to the trace files (see the output above), so without further ado:
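The viewer command itself got lost in this copy; assuming babeltrace is on the PATH, it would be something like this, pointing at the trace directory printed by lttng create (the grep is just my addition to cut the noise):

```shell
# Pretty-print the binary trace; filter for the exception events
babeltrace ~/lttng-traces/mysession-20171226-060602 | grep ExceptionThrown
```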

It doesn’t tell where exactly the exception fired, and it doesn’t have a stack trace either, but it does tell the exception type, its message and the name of the method it was caught in! More than enough to track this thing down.

Sampling and tracing with perfcollect

In dotnet‘s repository there’s a tool called perfcollect, which gathers almost the same data as perf and lttng do. In fact, perfcollect is nothing more than a bash script which, you guessed it, calls perf and lttng. However, there might be a few reasons to use it:

It installs almost all of its dependencies by itself, including perf and lttng, and this works on all major flavors of Linux.

Like PerfView on Windows, it puts everything it managed to collect into a single zip archive, so it might be easier to move it between production and development machines.

It can use .NET Core’s crossgen utility, which might extract a few more debugging symbols from native DLLs than you’d get by simply using perf and lttng.

perfcollect has a view subcommand, which under the hood uses perf or babeltrace to display the collected data (it’s simpler to know just one command).

perfcollect‘s results file can be brought to a Windows machine and analyzed with PerfView.

It’s also quite easy to use. As this is just a bash script, simply download it, make it executable and call install to bring in all the dependencies.

Install perfcollect

```shell
curl -OL http://aka.ms/perfcollect
chmod +x perfcollect
sudo ./perfcollect install
#...
```

Because under the hood it’s still going to use perf and lttng, both of our environment variables have to be set:
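The collection commands are missing from this copy; a sketch of how it would look, where the trace name sampleTrace is arbitrary and my own choice (check ./perfcollect --help for the exact options in your version):

```shell
# Both variables set, app in the background
COMPlus_PerfMapEnabled=1 COMPlus_EnableEventLog=1 dotnet bin/Debug/netcoreapp2.0/calc.dll &

# Collect for a while, stop with Ctrl+C, then view the archive
sudo ./perfcollect collect sampleTrace
sudo ./perfcollect view sampleTrace.trace.zip
```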

As you can see, perfcollect used the perf and babeltrace viewers we’re already familiar with.

Getting extra debugging symbols with crossgen

If perfcollect can find the crossgen utility, the data it collects will be a little bit more readable – we’ll have debugging symbols from native libraries. However, finding the correct crossgen and putting it into the right place might be tricky.

Firstly, the simplest way to get crossgen is to build our app in self-contained mode – in other words, to publish it. When we do so, NuGet will automatically download crossgen and put it somewhere under the ~/.nuget directory:
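The publish commands didn’t make it into this copy; a sketch, where the runtime identifier linux-x64 is typical for this setup but the exact path under ~/.nuget may differ on your machine:

```shell
# Publish as self-contained, which pulls in crossgen via NuGet
dotnet publish -r linux-x64

# See where NuGet put it
find ~/.nuget -name crossgen
```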

Secondly, we need to find the right place. perfcollect will look for crossgen in the same folder as the libcoreclr.so file used by our dotnet executable. The location of that file varies. If the app is running as a self-contained application, the right libcoreclr.so will be in the application’s publish folder.

If it’s not self-contained, like in our case, dotnet will use libcoreclr.so from its shared runtime directory. I’m using .NET Core SDK 2.1.3, which comes with runtime 2.0.4, so I’d expect the correct path to have 2.0.4 in it.

And it seems that I’m right:

Find correct libcoreclr.so

```shell
sudo find / -name libcoreclr.so
# ...
# /usr/share/dotnet/shared/Microsoft.NETCore.App/2.0.4/libcoreclr.so
```

I probably wouldn’t do that in production, but inside a disposable VM I don’t see any problem in copying crossgen into the runtime folder itself:
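The copy command is missing here; a sketch, assuming crossgen ended up under the ~/.nuget path NuGet normally uses for this runtime – adjust the source path to whatever find actually printed on your machine:

```shell
# Put crossgen next to the libcoreclr.so that dotnet actually loads
sudo cp ~/.nuget/packages/runtime.linux-x64.microsoft.netcore.app/2.0.4/tools/crossgen \
        /usr/share/dotnet/shared/Microsoft.NETCore.App/2.0.4/
```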

If we start perfcollect again, it won’t complain about a missing crossgen anymore. However, I didn’t notice any visual difference in the output with or without it. Maybe it would become more obvious if I copied the trace result to a Windows machine and analyzed it with PerfView, or profiled a more complex example. But even like this the output looks good enough.

Conclusion

The biggest challenge for me, when I thought about how to profile .NET Core on Linux, was: what question should I even ask Google? The gap between what I usually do and the Linux profiling/tracing ecosystem was just too big. What’s more, I can’t say there are lots of articles about this topic out there. Maybe people just stopped making bugs altogether.

However, I was lucky to find a series of articles (like this one) by Sasha Goldshtein, which provided a very good place to start. In fact, he has already covered these tools, which makes me wonder: is it plagiarism to write about tools somebody else has already written about, or not yet? Anyway, if you’re wondering what else is out there, be sure to check his articles out.