Posted by kdawson on Saturday November 13, 2010 @10:25AM from the truth-of-kernel dept.

francis-giraldeau writes "Linux Tracing Toolkit (LTTng) provides high-performance kernel tracing for Linux. This is the killer app for system level debugging and performance tuning. It's now easier than ever to install, with packages released for Ubuntu Maverick. The short introduction to kernel tracing shows how to interpret a simple kernel trace and relate it to strace. I would like to ask Slashdot readers what they would expect as features for a kernel tracing analysis tool, because I'm starting my PhD on this topic and looking for ideas. Also, I wonder why LTTng is not mainline yet. Will Linus Torvalds see the light in 2011?"

Ummm, a PhD is about original contribution. Implementing a "feature" that someone else imagines is about algorithm implementation. An original contribution is about recognizing a hard problem and at least coming up with a novel solution. Ideally, the candidate also provides a thorough description of the problem in the language of her/his discipline. For the most part, if someone can describe a "feature" they'd like to see for LTTng, then they know the problem they want to solve. Defining the problem is ofte

The reason is that I would like to make my research useful for tracing users, and I think the best way to do it is to ask people what they really need. I will give credit to those who helped me, why not? ;-)

What is the goal of your work? Do you want to compare kernel tracing solutions and identify critical features in the process of coming up with a reasonable taxonomy? Do you want to implement something? Do you have a specific application for kernel tracing (e.g. informing performance tuning measures in enterprise environments which would probably be of interest to businesses)? Just throwing together a list of desired features is not going to be of interest to anyone, I guess. You have to come up with a motivation for each of the features, argue why this feature is necessary for the application at hand or for any application of kernel tracing in general, and cite literature that gives evidence for your assumptions and conclusions. Maybe if you told people what kind of work you're interested in, what the interests of your advisor(s) are, and in which research context (department, university) you are working, they could make sensible suggestions as to which features might be interesting to you.

Kernel tracing instrumentation is ready; now we need decent analysis tools. The problem is that there is so much data that it's hard to interpret. For the project, I have to come up with something that is new and better than what is already known.
For example, we could get a better analysis than bootchart, or auto-detect bottlenecks in a system (disk, CPU, memory, network, etc.). There is some work being done to integrate userspace and kernel space tracing, virtual machine and host traces, and dynamic and static trace points. A distro could record a trace in the background and send this information along with the core dump when an application crash occurs. These are all just ideas!
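
A minimal sketch of the bottleneck auto-detection idea, assuming a trace has already been reduced to "task was blocked on resource X from t1 to t2" intervals. The event tuple format and the function names are hypothetical and are not LTTng's trace format or API; the sketch only shows that the simplest version is an aggregation over wait intervals.

```python
from collections import defaultdict

# Hypothetical event format (NOT LTTng's): (resource, start_ns, end_ns),
# meaning some task was blocked on `resource` during that interval.


def blocked_time_by_resource(events):
    """Sum the time spent waiting on each resource."""
    totals = defaultdict(int)
    for resource, start_ns, end_ns in events:
        totals[resource] += end_ns - start_ns
    return dict(totals)


def likely_bottleneck(events):
    """Return the resource with the largest total wait, or None if no events."""
    totals = blocked_time_by_resource(events)
    if not totals:
        return None
    return max(totals, key=totals.get)


if __name__ == "__main__":
    sample = [
        ("disk", 0, 500),
        ("cpu", 100, 150),
        ("disk", 600, 900),
        ("network", 200, 400),
    ]
    print(blocked_time_by_resource(sample))
    print(likely_bottleneck(sample))
```

A real tool would have to derive those intervals from scheduler, block I/O and network events first, which is where most of the work is.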

Here is another idea for you. How about hardware assisted "dynamic" (aka dynamically hooked) tracepoints via a custom Xen-like bare metal hypervisor? The OS and therefore its contained malware would know nothing of the inspection process, and best of all it could be OS independent if done at the hardware level. The control/diagnostics software could be running in a VM right next to the OS under test. Boot the hypervisor from CD and then load the original machine's OS. Stealth rootkits would be a thing of the

You're right, we can analyse abnormal situations with tracing. For example, if you have a trace of a system with correct behavior and one with malware, it could be possible to do a "trace diff" and see what's different. As you may expect, this is not a trivial diff!
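
To make the "trace diff" idea concrete, here is a deliberately naive sketch that treats each trace as a plain list of event names and compares how often each name occurs. The representation, the function names and the 0.1 threshold are assumptions made for illustration. A real diff would also have to deal with ordering, timing and per-process context, which is why it is not trivial.

```python
from collections import Counter


def event_profile(trace):
    """Relative frequency of each event name in a trace (a list of names)."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def trace_diff(baseline, suspect, threshold=0.1):
    """Event names whose relative frequency changed by more than `threshold`.

    The value is suspect minus baseline, so a positive number means the
    event is more common in the suspect trace.
    """
    base = event_profile(baseline)
    susp = event_profile(suspect)
    diffs = {}
    for name in set(base) | set(susp):
        delta = susp.get(name, 0.0) - base.get(name, 0.0)
        if abs(delta) > threshold:
            diffs[name] = delta
    return diffs


if __name__ == "__main__":
    good = ["read"] * 5 + ["write"] * 5
    bad = good + ["connect"] * 10
    print(trace_diff(good, bad))
```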

I'm finishing up a PhD in scalability & performance analysis, and have done a lot of work in instrumentation. A userland instrumentation tool is part of my final research. Instrumentation is in a terrible, terrible state -- save a few points of light -- and I'm happy to see someone else in this area!!

So, as you're starting out, some tips:

1) If you haven't already done so, investigate dtrace. While available on Mac OS & FreeBSD, it

I really liked Solaris. I have an Ultra 5, as upgraded as it can be, running an old version of Solaris 10. Oracle has since decided I don't even deserve to be able to download BIOS updates, never mind the OS. Stuff like that makes me want to avoid anything that says SUN on it.

DTrace is Open Source, Free Software (FSF certified), thus, the fact that it's owned by Oracle doesn't really matter much. You don't need Solaris to use it; DTrace is fully functional in MacOS X and FreeBSD (in the latter, userland dtracing is available from 8.2).

With DTrace, you have to know what you are looking for in advance, while LTTng can trace in the background in flight recording mode and record everything that is going on. Afterward, you have all the information you need, and this is invaluable when you have a hard-to-reproduce bug!
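
For readers unfamiliar with flight recording, the idea is a fixed-size buffer that always holds the most recent events and is only dumped when something goes wrong. The sketch below illustrates that concept with a bounded deque. The class and method names are made up for the example, and it says nothing about how LTTng actually implements its buffers.

```python
from collections import deque


class FlightRecorder:
    """Keep only the most recent `capacity` events; older ones are overwritten."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry once it is full.
        self._buffer = deque(maxlen=capacity)

    def record(self, event):
        """Append an event, evicting the oldest one if the buffer is full."""
        self._buffer.append(event)

    def snapshot(self):
        """Return the retained events, oldest first (e.g. when a bug shows up)."""
        return list(self._buffer)


if __name__ == "__main__":
    recorder = FlightRecorder(capacity=3)
    for i in range(5):
        recorder.record(f"event-{i}")
    print(recorder.snapshot())
```

The point of the design is that tracing cost stays bounded in memory no matter how long the system runs before the bug appears.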

"Not that much point having a tracing tool if an inexperienced admin cannot safely use it on a live system which has a problem. "

Right. Because everyone knows the best place to develop, debug, and profile code is on a production machine, and the person doing the development should be a system administrator, preferably with minimal experience.

I would say many people do know that the best place to understand the performance of a system in production is in production. If the vendor's support techs can give an admin commands to run and know that a typo here or there will not result in a panic, then that is a very useful feature.

The profiling/tracing issue is moot. Either way the ability to have robust tracing/profiling/debugging tools that can start from the beginning or attach to a process in progress and safely report as much as possible is great for production environments.

The quickest way to characterize any problem is with low level trace information. Trying to think through all the possible differences between a test and production environment *usually* can produce results eventually, but stack traces, syscalls, and more s

"No confusion here. DTrace is useful for both profiling and tracing. More details here"

As is LTTng, so I guess your "point" is pointless. Furthermore, more details can easily be found on the LTTng website, including a comparison to DTrace. And DTrace doesn't do Linux, making it completely useless as a competitor to LTTng. Also, it was your ridiculous claim that such a tool is of no value unless it can be used by an inexperienced system administrator that I was rebutting.

OK. Now just tell me that you wouldn't chase down the problem, but you would have an inexperienced system administrator do it. Then tell me you would never use the tool in any other way, and I'll concede that the OP's point that the tool was useless was spot on.

The best place to investigate a problem that manifests itself on a production machine and cannot be easily reproduced on a development environment may be that machine - especially when doing it is safe. With DTrace, it is. With e.g. SystemTap - it's not.

Binary packages are easy to install, that's it. I don't know of other LTTng integration inside a distro. If you prefer patching your own kernel and compiling tools from git repository, you're free to do it.

So far, LTTng has mainly been integrated in embedded distros: WindRiver Linux, Montavista Linux and STLinux currently ship with LTTng. What is new and specific to Ubuntu here is that, by installing the LTTng packages from the PPA, it is now possible to easily deploy the LTTng kernel and userspace tracers on a desktop-oriented distribution.

Maybe I'm reading slashdot too early on a weekend morning, but I find the last statement of the summary particularly offensive. It seems like everyone who has some sort of kernel widget wants a PR campaign to get it included in the mainline. How about you finish your Ph. D. first and provide some convincing evidence as to why every single person running Linux has to have the tool? The trace tools are available as a package for anyone who wants them now. Why should the mainline be burdened with maintaining the package unless a significant number of users need it?

We have been waiting for decent kernel tracing for a decade, while LTTng is readily available today. It's better than other tools like perf, ftrace and dtrace. Microsoft Windows has had Event Tracing for Windows since 2003, and if Linux wants to be taken seriously, LTTng has to be in mainline and available without kernel patching. And I think that users should not have to be experts to use that kind of tool.

Yeah, because Linux will never be taken seriously. Give me a break. If I want to trace the performance of a particular chunk of code within Linux, I don't need a tool to do it; I have the source code and the ability to modify it. If my boss comes to me and tells me "hey, I need this to run faster than competitor xyz", you had better believe I am going to make that happen with or without LTT. Sure, it may make it easier to do so, but if shooting for performance it would also be the first thing I disable.

Actually... While I was the maintainer, IBM had a team of people working on LTT for a period of 3 years before pulling the plug on their involvement, because they saw that all the money they were pouring in there wasn't leading to a mainlining.

Why were they interested in kernel tracing? Well... When a customer of theirs has one of his 10,000 servers misbehaving in production, they can't afford telling him to just take it offline for diagnostics. They have to find (and fix) the problem in the field. There

Linux isn't taken seriously at all! It only has close to 50% of the server market share and a near monopoly on supercomputers. Look, when you have something workable we might talk, but until then you're just another PhD that has produced absolutely nothing of value.

You might be a Ph.D. student, but apparently you are disconnected from industrial reality. Linux not being taken seriously? Are you f* kidding me? Is that going to be part of your problem statement at the start of your dissertation?

Sorry, I'm not going to bother registering - I read /. quite steadily but don't usually feel the need to add more than what's already said. You can google me around, though, I'm easy to find.

FWIW, I introduced LTT in 1999 and lobbied kernel developers for inclusion for 6 years before giving maintainership to someone else. LTTng is in fact a complete rewrite of LTT and I've got little to do with the project these days. I had little to do with its authoring and it likely has none of my code.

Because the mainline refuses, at length, to provide a stable API for the package to target. The kernel documentation basically says "if you think you want a stable API, you actually want to get your package into the mainline kernel (and if your licensing terms won't let you do this, your problem)." To which my response is: fine, but in that case it's your responsibility to accept any and all reasonably coded modules, even if only a tiny proportion of users will want to enable them.

"see the light" - you are making the assumption of it being something that casts light; I suspect Linus Torvalds judges on presented evidence and so far apparently judges the argument hasn't been carried.

The LTTng maintainer has been working for months (years?) to get kernel tracing into decent shape. These days the Linux tracing support is wonderful, and not just for LTT - perf, ftrace and systemtap are awesome tools (and more powerful than LTTng in some ways). In fact perf can do everything the web page describes, and it seems simpler to my taste

Trace points in the kernel are available, and this is great, but there is much more to it than that. You need a good lockless ring buffer so as not to impact performance, and all the infrastructure around it. For example, you can't do flight recording with perf, and its performance impact is greater due to less sophisticated ring buffers.

Are you talking about a tracing tool or a data analysis tool? If you plan to use LTT to get a trace and then help the user analyse it, maybe you are more into analysis than tracing. Then your question could be a bit misleading. Anyway, you would probably end up trying it all out, adding some features to make tracing easier as you use the existing stuff, analyse the results and so on as you progress.
And if you are into trace data analysis (as opposed to tracing) then your domain of kernel trace

Yeah, data mining techniques may be relevant given the huge trace sizes that we can get: trace reduction techniques, algorithms to index data. One of the things that is particular to trace analysis is the temporal nature of events, which may lead to something...
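
As one small illustration of indexing that exploits the temporal nature of events, the sketch below sorts events by timestamp once and then answers "what happened between t1 and t2" with binary search. The `TimeIndex` class and the (timestamp, payload) event shape are assumptions for the example only. A multi-gigabyte trace would need an on-disk index, not an in-memory list.

```python
from bisect import bisect_left, bisect_right


class TimeIndex:
    """Index events by timestamp for fast time-window queries."""

    def __init__(self, events):
        # events: iterable of (timestamp, payload); sort once by timestamp.
        self._events = sorted(events, key=lambda e: e[0])
        self._timestamps = [t for t, _ in self._events]

    def window(self, start, end):
        """Return events with start <= timestamp <= end, in time order."""
        lo = bisect_left(self._timestamps, start)
        hi = bisect_right(self._timestamps, end)
        return self._events[lo:hi]


if __name__ == "__main__":
    index = TimeIndex([(30, "irq"), (10, "open"), (20, "read"), (40, "close")])
    print(index.window(15, 30))
```

Each query costs two binary searches plus the size of the result, instead of a scan over the whole trace.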

There's nothing to indicate he's changed his views. Linus has otherwise remained largely silent on the issue since this comment in 2000. Normally that would indicate no change, so the onus is really on you to show any evidence that would indicate any new opinion. Yes, various debugging facilities have made it into the mainline kernel. Show me the evidence that Linus now likes debuggers, or more to the original point, that he now thinks kernel development should be easy.

He has expressed similar sentiments more recently as well (e.g. in 2007, on git's use of C vs C++):

C++ is a horrible language. It's made more horrible by the fact that a lot
of substandard programmers use it, to the point where it's much much
easier to generate total and utter crap with it. Quite frankly, even if
the choice of C were to do *nothing* but keep the C++ programmers out,
that in itself would be a huge reason to use C.

Yeah, maybe it's just an Ubuntu problem, but both my laptop (Inspiron 700m) and my desktop (with an Nvidia card) crash, about 10% of the time on the desktop and 40% on the laptop, when viewing videos and occasionally when listening to music. It's pretty sad, as I need this stuff for my classwork. I pulled an old Celeron XP box out of the closet, transplanted some RAM into it, and everything works fine. Pretty sad.

One of the biggest selling points for DTrace is its scripting language. It is extremely powerful, and you can find dtrace scripts shared by others that let you do very powerful system stats gathering (e.g. here [goo.gl]).
How about doing something similar for LTTng? You could even do something simple like Lua hooks for LTTng.