Category Archives: 18165

The Semantic Logging Application Block (SLAB) documentation discusses semantic tracing here. This is great documentation, but obscures the fact that semantic tracing can be done on any platform in any language. It’s just a style of tracing.

You create a class, or a very small number of classes, dedicated to tracing (pseudo-code)

With IntelliSense at every step this is about a dozen keystrokes. The only specific knowledge you need is the name of the Logger class – and with Visual Studio IntelliSense, you’ll get that easily if you include “Logger” anywhere in the name. Just imagine how simple that is!

Benefits

Discoverability and easy access are just two of the benefits of this style of logging.

The call to the trace method states what is happening and helps to document the calling source code. This is especially true if you use named parameters.

The resulting trace output is strongly typed and easily filtered and sorted on specific named parameter values.

All access to the underlying tracing technology is isolated. Stuck with TraceSource now, but intimidated by the potential of future performance issues? Move to semantic tracing and the future change to USE EventSource and ETW will involve a single file. Any change that doesn’t involve the signature, even a technology change, is isolated to a single class.

Another implication of isolated tracing is that details can be added and removed as they are available via the underlying technology – things like thread and user.

Details about the trace are also consistent. The calling code is unconcerned about issues like level (error, warning, information), channel, and keywords. These details can be changed at any point in the application life cycle.

The result is a trace strategy that is clear and consistent.

ETW, EventSource and SLAB

Both EventSource and SLAB encourage semantic tracing, assuming you avoid creating generic trace methods like “Log” and “Warning.” Semantic logging involves giving a nice name to the method, along with strongly typed parameters.

You might want vague methods to keep you going one afternoon when you encounter something new you want to trace, but use them as a temporary measure.

Whether a particular event is a warning, information or an error is in the nature of the action, not the nature of this particular occurrence of the action – or it’s not very well defined, and it’s not semantic.

Summary

One of the most important tools in improving your trace performance is semantic tracing. Simply put, it means that all trace calls are to a common class, or small set of classes, that have methods indicating each specific thing you will trace along with strongly typed parameters. The trace method and each trace parameter has a very clear name allowing IntelliSense to guide correct calls.

There are several versions of EventSource. .NET 4.5 has the initial version, which is improved in .NET 4.5.1. You want to use the 4.5.1 version if you possibly can. The diagnostics, in particular are significantly better.

The CLR team has also done a pre-release version of EventSource on NuGet. This version has a couple of important features, including even better diagnostics and Windows Event Viewer support. That’s right, you have to use the latest and greatest NuGet version if you have events you want admins to see in EventViewer.

Switching from EventSource in .NET 4.5.1 to the NuGet version of EventSource just requires changing the references and namespaces.

The Patterns and Practices team provides SLAB (Semantic Logging Application Block) that uses EventSource extensibility to offer a more familiar experience. In particular, SLAB offers traditional listeners so you can avoid ETW complexities during development. At runtime, you probably want ETW performance, but the tools still stink and you probably don’t want to use them during development.

The P&P team thinks all this multiple version of EventSource stuff is confusing. Well, they are right, it is confusing.

Rather than offer a version of SLAB for the NuGet version of EventSource, the P&P team decided to stick with a single version of SLAB that targets the .NET 4.5.1 version of EventSource.

If you want a version of SLAB that targets the NuGet version of EventSource, download the SLAB source code and make the same changes you’d make to your own code – change the references and namespaces. Here Grigori Melnik says SLAB 1.1 will improve the test suite to better support this modification.

ETW tracing is blazingly fast. The operating system tosses the trace work like accessing files onto another thread. It does this at a very low level allowing very, very fast tracing when tracing is turned on. If tracing is turned off, the performance hit is even smaller. But why worry about tracing turned off!?!?! The goal is creating a trace system you can leave on in production!

OK, that’s a good sound bite, but while I think you should have some tracing that you leave on all the time, you might also want some high frequency event that you have on only some of the time, such as when you’re doing a particular kind of evaluation.

Here are some rough performance metrics I did a while ago – this is on my laptop with a fair amount of open applications.

int/string

Object Overload

Event Off

9ns

100 million/sec

20ns

50 million/sec

Event Watched

410ns

2 million/sec

720ns

1 million/sec

Don’t get carried away with the details of these numbers, and do your own metrics if you feel the details are important. A nanosecond is one billionth of a second. Dedicating the CPU to tracing (a loop) allows about 2 million trace calls/sec with tracing turned on, 100 million with tracing turned off. That’s really, really fast.

But, Ok, you aren’t going to dedicate 100% of your CPU to tracing.

More or less, if you allow 1% of your system resources for tracing, with it turned on, you can do about twenty thousand trace events per second (20,000!). OK, there’s latency and throughput, but I’ll argue at 1% you just don’t care. Doing a super high throughput system with 10,000 transactions/second? You can probably still afford a few trace calls per request – yes, in production (benchmark that in your specific scenario).

By the way, the specification suggests that the hit of using the fallback overload that takes parameter array of objects might have an impact of 10-20 fold. I didn’t see that. The spec explains how to add more overloads – but it’s ugly, requires unsafe code and not something I think you initially need to worry about. It’s about double, and you can add required overloads when you need tracing at half a millionth of a second, instead of a millionth of a second.

The spec also suggests that you might see a performance increase using a locally cached copy of the logger, rather than repeated hits to a static property on the log class. I have not been able to measure any performance improvement, so this unnecessarily complicates your code (unless you are using DI, which I recommend for many, many things, but not tracing).

When you turn tracing off, the impact drops by nearly two orders of magnitude.

You can’t get anywhere close to this kind of performance out of any other trace technique I know of. Certainly not from Trace.Write, TraceSource, or Log4Net.

Tracing is what you include in your code so you know what’s happening during development and more importantly in production. Supporting players like .NET and the operating system also have trace output.

ETW stands for Event Tracing for Windows and is the tracing mechanism built into the Windows operating system.

.NET provides good ETW support starting with .NET 4.5 with the EventSource class. This creates an ETW provider. You need external tools (not yet available in Visual Studio) to create and view traces.

ETW tracing with the EventSource class is fundamentally different than Trace and TraceSource. It is a different pipeline, configured with ETW tools, and can be turned on and/off without restarting your app.

ETW does the slow part of tracing on a separate thread. Trace, TraceSource, Log4Net and most other trace solutions do not do this, or not with the massive efficiency and blazingly fast performance of ETW.

Semantic tracing logically decouples from trace technology, isolates trace decisions like severity, is strongly typed (at the app), and states what is happening in the app, not details of what the trace should say.

Semantic tracing can be done with any trace technology, allowing you to switch later to ETW if you can’t do it now.

The Semantic Logging Application Block (SLAB from P&P) uses EventSource, avoids ETW tools during development, eases the transition to ETW, and has docs on using a semantic approach to tracing.

Jason asked a really, really good question on StackOverflow. I’m answering it here, because it’s a wordy answer. The good news, is the answer is also in my about-to-be-released ETW Pluralsight video on ETW (everything is in their hands, just not posted yet, hopefully next Thursday!). I’ve also got some additional blog posts coming, but today, let me just answer Jason’s question.

“it is unclear how some of the properties are best used to create an intuitive custom trace”

Jason goes on to categorize Event attributes as “intuitive” and “non-intuitive”. I’m throwing out that distinction and covering all of them. And the most important advice might be the last on Message.

Channel

ETW supports four basic channels and the potential for custom channels. EventSource does not support custom channels (if you have a user story, contact me or the team). The default channel and the only one currently supporting in-line manifests is the Debug channel.

The Channel parameter exists only in the NuGet version and only for the purpose of accessing the additional channels, primarily the admin channel to access EventViewer for messages to admins. I was one of the people that fought for this capability, but it is for a very limited set of cases. Almost all events logically write to your channel – the default channel – the Debug channel.

To write to EventViewer, you need to write to the Admin channel and install a manifest on the target computer. This is documented in the specification, in my video, and I’m sure a couple of blog posts. Anything written to the admin channel is supposed to be actionable by ETW (Windows) guidelines.

Use Operational and Analytic channels only if it is part of your app requirements or you are supporting a specific tool.

In almost all cases, ignore the Channel parameter on the Event attribute and allow trace events to go to the Debug channel.

Level

For the Admin Channel

If you are writing to the admin channel, it should be actionable. Information is rarely actionable. Use warning when you wish to tell them (not you, not a later dev, but ops) that you want them to be concerned. Perhaps that response times are nearing the tolerances of the SLA. Use error to tell them to do something. Perhaps that someone in the organization is trying to do something they aren’t allowed to do. Tell them only what they need to know. Few messages, but relatively verbose and very clear on what’s happening, probably including response suggestions. This is “Danger, danger Will Robinson” time.

For the Debug Channel

This is your time-traveling mind meld with a future developer or a future version of yourself.

I’m lucky enough to have sat down several times with Vance, Dan and Cosmin and this is one of the issues they had to almost literally beat into my head. The vast majority of the time, your application can, and probably should run with the default information turned on.

If you’re looking at an event that clearly represents a concern you have as a developer – something you want to scare a later developer because it scares you – like a serious failed assert – use warning. If someone is holding a trace file with ten thousand entries, what are the three things or the ten things you think tell them where the problem is? If they are running at the warning (not informational level) what do they really, truly need to know?

If it’s an error, use the error level.

If it’s a massively frequent, rarely interesting event, use verbose. Massively frequent is thousands of times a second.

In most cases, use the default informational level for the Level parameter of the Event attribute. Depending on team philosophy, ignore it or record it.

Keywords

If you have verbose events, they need to be turned on and off in an intelligent fashion. Groups of verbose events need keywords to allow you to do this.

Warnings and Error levels do not need keywords. They should be on, and the reader wants all of them.

The danger of missing an event so vastly outweighs the cost of collecting events that informational events should be turned on without concern for keywords. If keywords aren’t going to be used to filter collection, their only value is filtering the trace output. There are so many other ways to filter the trace, keywords are not that helpful.

In most cases, use the Keywords parameter of the Event attribute only for verbose events and use them to group verbose events that are likely to be needed together. Use Keywords to describe the anticipated debugging task where possible. Events can include several Keywords.

Task

On the roller coaster of life, we just entered one of the scary tunnels – the murky world of ETW trace event naming. As far as ETW is concerned, your event is identified with a numeric ID. Period.

Consumers of your trace events have a manifest – either because it’s in-line (default for Debug channel, supported by PerfView and gradually being supported by WPR/WPA) or installed on the computer where the trace is consumed. The manifest does not contain an event name that is used by consumers.

Consumers, by convention, make a name from your Task and Opcode.

EventSource exists to hide the weirdness (and elegance) of ETW. So it takes the name of your method and turns it into a task. Unless you specify a task. Then it uses your task as the task and ignores the name of your method. Got it?

In almost all cases, do not specify a Task parameter for the Event attribute, but consider the name of your method to be the Task name (see Opcode for exception).

Opcode

I wish I could stop there, but Jason points out a key problem. The Start and Stop opcodes can be very important to evaluating traces because they allow calculation of elapsed time. When you supply these opcodes, you want to supply the Task to ensure proper naming.

And please consider the humans. They see the name of the method, they think it’s the name displayed in the consumer. For goodness sakes make it so. If you specify a task and opcode, ensure that the method name is the concatenation. Please

This is messy. I’m working on some IDE generation shortcuts to simplify EventSource creation and this is a key reason. I think it will help, but it will require the next public release of Roslyn.

Almost never use an Opcode parameter other than Start/Stop.

When using Start/Stop Opcodes, also supply a Task and ensure the name of the method is the Task concatenated with the Opcode for the sake of the humans.

Version

The version parameter of the Event attribute is available for you and consumers to communicate about whether the right version of the manifest is available. Versioning is not ETW’s strength – events rarely changed before we devs got involved and now we have in-line manifests (to the Debug channel). You can use it, and the particular consumer you’re using might do smart things with it. And even so, the manifest is or is not correctly installed on any machines where installed manifests are used.

Overall, I see some pain down this route.

The broad rule for versioning ETW events is don’t. That is do not change them except to add additional data at the end (parameters to your method and WriteEvent call). In particular, never rearrange in a way that could give different meaning to values. If you must remove a value, force a default or marker value indicating missing. If you must otherwise alter the trace output, create a new event. And yes, that advice sucks. New events with “2” at the end suck. As much as possible, do up front planning (including confidentiality concerns) to avoid later changes to payload structure.

Initially ignore the Version parameter of the Event attribute (use default), but increment as you alter the event payload. But only add payload items at the end unless you can be positive that no installed manifests exist (and I don’t think you can).

Message

Did you notice that so far I said, rarely use any of the parameters on the Event attribute? Almost never use them.

The Message parameter, on the other hand, is your friend.

The most important aspect of EventSource is documenting what the event needs from the caller of the code. It’s the declaration of the Event method. Each item passed should be as small as possible, non-confidential, and have a blazingly clear parameter name.

The guy writing against your event sees an available log method declaration like “IncomingDataRequest(string Entity, string PrimaryKey).” Exactly how long does it take him to get that line of code in place? “IncomingRequest(string msg)” leaves the dev wondering what the message is or whether it’s even the correct method. I’ve got some stuff in my upcoming video on using generics to make it even more specific.

Not only does special attention to Event method parameters pay off by speeding the writing of code that will call the Event method (removing all decision making from the point of the call), but (most) consumers see this data as individual columns. They will lay this out in a very pretty fashion. Most consumers allow sorting and filtering by any column. Sweet!

This is what Strongly Typed Events are all about.

Parameters to your method like “msg” do not cut it. Period.

In addition to the clarity issues, strings are comparatively enormous to be sticking into event payloads. You want to be able to output boatloads of events – you don’t want big event payloads filling your disks. Performance starts sucking pretty quickly if you also use String.Format to prepare a message that might never be output.

Sometimes the meaning of the parameter is obvious from the name of the event. Often it is not. The contents of the Message parameter is included in the manifest and allows consumers to display a friendly text string that contains your literals and whatever parts of the event payload seem interesting. Sort of like String.Format() – the “Message” parameter is actually better described as a “format” parameter. Since it’s in the manifest, it should contain all the repeatable parts. Let the strongly typed data contain only what’s unique about that particular call to the trace event.

The Message parameter uses curly braces so you feel warm and fuzzy. That’s nice. But the actual string you type in the parameter is passed to the consumer, with the curly braces replaced with ETW friendly percent signs. Do not expect the richness of String.Format() to be recognized by consumers. At least not today’s consumers.

By splitting the data into strongly typed chunks and providing a separate Message parameter, the person evaluating your trace can both sort by columns and read your message. The event payload contains only data, the manifest allows your nice wordy message. Having your beer and drinking it too.

Not sold yet? If you’re writing to a channel that uses installed manifests, you can also localize the message. This can be important if you are writing to the admin channel for use in EventViewer.

Almost always use Message so consumers can provide a human friendly view of your strongly typed event payload.

Summary

There are four basic rules for EventSource usage:

Give good Event method names

Provide strongly typed payload data – consider confidentiality – and work to get payload contents right the first time (small where possible)

Use the Message parameter of the event attribute for a nice human friendly message

For every other Event attribute parameter – simplify, simplify, simplify. Use the defaults unless you are trying to do something the defaults don’t allow

He does a great job explaining each of them, but I disagree with the second optimization and wanted to add some comments on the others based on some rough metrics I ran to check performance.

To start with the summary, EventSource plays very nice with ETW and so maintains amazing, absolutely amazing, performance characteristics.

Optimization # 1: Use IsEnabled() before WriteEvent()

If all you’re doing is calling WriteEvent, there’s no preparatory work, then I haven’t seen a significant performance difference. Basically, the first thing WriteEvent does is check whether it’s enabled, and the one extra method call is awfully hard to measure.

However, if you’re doing preparatory work that has any expense, you should absolutely use this. Preparatory work most often happens in complex calls, where a public method marked with the NonEvent attribute calls a private method that is the actual Event definition. In this case, you need to check whether the class is enabled in that public method, before the preparatory work (I realize this might be obvious, but it’s important).

As Muhammad pointed out, if you have keywords or levels associated with the event, you can check them. But the overarching principle is simplicity, so I think only applies in the far more rare verbose/keyword scenario than in the more common scenario where you use the defaults except for provider name and event id.

That said, if you always use IsEnabled() as soon as you arrive in your EventSource class, you will remember to do it on the rare occasion it matters.

Optimization # 2: Convert static field fetch to local member fetch

Muhammad recommends instantiating an instance of the logger in each class that uses it, instead of accessing a static field in the logger class. I tested instantiating the instance vs. a call to the static method for the reverse reason – because some documentation suggested it was slower to instantiate rather than reuse. I could not measure a difference, so arguably this is a matter of opinion. I stated that disagree with Muhammad, not that he is wrong.

I believe there are two principles that override performance, since the perf difference is somewhere between small and non-existent.

The first is that I want to minimize the greatest possible extent code (textual) dependency between my application classes and my logger. Unfortunately, for some technical reasons EventSource does not easily support interfaces, and therefore does not easily support dependency injection (look for an upcoming blog post on that). So, you’re stuck with the dependency of a direct call, or the actual class name. Since the direct call is easier to fix later if you choose to introduce the complexity of DI with EventSource, I believe it is preferable.

I also do not want multiple copies of the logger in case a future way I am logging requires shared state. So, my preferred way to instantiate blocks creation of multiple instances:

Optimization #3: Minimize the number of EventSources in the application

I didn’t actually measure this, because I think performance here is irrelevant.

Each EventSource class maps to a single ETW provider with its name, id and manifest.

Designing your providers is one of the most critical aspects of creating your ETW system. I believe one per component and some sharing (with associated hazards), and Muhammad suggests one per type, which is far more than I would use. Regardless, there is no other aspect – not criticality level, not channel, not opcodes, not keywords – that is in the ballpark of provider names (ids) when it comes to your design.

Later in a consumer, you will almost certainly sort and filter on the provider names. You will collect data based on provider names, and the more you have the more chance that you’ll miss key events.

Create exactly the number of EventSources you need and no more. When in doubt, take the path that results in a simpler strategy.

This is the place you can change the performance characteristic of your ETW tracing – for the good or for the bad.

As Muhammad explains well, the WriteEvent method has a finite number of overloads. If you pass parameters that do not match one of these overloads, a paramarray of objects is used. Now, I didn’t anything close to the 10-20 times proposed in the spec, but it double the time, which is enough to get my attention.

There are two scenarios where you’ll wind up with the object overload. The simplest to fix is passing enums. Accept the enum into your EventSource method as the corresponding enum so that the manifest correctly manages the enum. And then cast the enum to an int in the WriteEvent call like this:

If I check the overloads for WriteEvent, there is an (int, string, int) overload so this call will not hit the object overload. But, it’s quite easy to create a collection of parameters that don’t match one of the overloads. I do not believe that it is appropriate to rearrange parameters to your public method just to match an overload – and the parameters to the WriteEvent call must match the order of parameters passed to your public method.

The fix for this isn’t quite as trivial – it’s easy, but ugly. This is explained in the EventSource specification, and section 5.6.2.2 Creating fast WriteEvent overloads explains how to create the overloads. I always track down the spec as the attachment to this post on Vance Morrison’s blog.

Summary

· None of these performance tips will be observable in your application if you use a listener other than the blazingly fast ETW.

· Based on the metrics I ran a while back, I believe the only one of these performance tips worth worrying about is avoiding the object overload, and it isn’t as bad as anticipated in the spec.

· There are other considerations, at least consistency, that are important for the other three items.

As you dig deeper into ETW, side issues like this come up. The isolation of all EventSource aspects into the single class – inherent in its design – means these complexities always affect only a single location in your application if you encounter them.