Main navigation

Getting more out of Sierra’s logs

There was a time when the logs in OS X were a mine of useful information, not that they had to be mined that hard. You could open Console, Show all, and scroll back to see what happened following the last startup, or when that app unexpectedly quitted on you. In Yosemite, that all started to get more cluttered, as more and more processes were chattering away. By El Capitan, the logs had become so congested that, if you didn’t know exactly what you were looking for, they were almost useless.

Console showing log entries for a typical restart.

At that stage, OS X was using a fairly traditional logging system: processes which felt the need to record significant events, errors, and failures did so by writing a plain text string to one of a group of log files. All Console had to do was open those on demand, and let you view the contents.

Over the years, there have been two main purposes for inspecting logs: developers (particularly Apple’s engineers) who are often trying to fix bugs in their products, and support people (including users themselves) who are often trying to work around the consequences of bugs. Their interests are unfortunately conflicting: developers want a great deal of detail about the product they are working on, while support staff and users want logs to be generally very sparse except for the events leading up to the problem which they are trying to solve.

Sierra’s new logs

Then, when announcing and detailing macOS Sierra at WWDC in June 2016, Apple explained that logs were about to change. Alarm bells should, perhaps, have started ringing when it was announced that Apple’s engineers “want as much logging on all the time as possible”. It was also decided to move all existing logging to the new system. Although this does not prevent apps from writing to the few other logs which remain in traditional text format, all conventional calls to write to regular system logs now end up in the new log system.

Sierra’s logs are not written or stored as text, but in a binary database format (.tracev3 files). This means that the only practical way to access these new logs is using Apple’s tools, either the Console app, or the log command in Terminal. Although Console has no intrinsic ability to browse logs already collected, you can (if you really want to) use log to export a section of existing logs to a .logarchive file, and read that into Console to explore it.

Too much information

With such a huge amount of information being streamed to the logs all the time, log messages are classified into five different levels:

Fault

Error

Default

Info

Debug.

This should, in theory, have made it much easier to see through the incessant chatter and view only the most important log entries. However, in practice the vast majority of log entries (of all levels of severity, it appears) are made at the Default level. Very few indeed occur at the Error or Fault level, and in any case there is no supported method for viewing only entries of Fault or Error levels, at least when it comes to examining previously collected logs.

If you know Consolation, LogLogger, or the log command, you may be aware that there are two options when viewing logs --info and --debug. These respectively include entries at the Info and Debug levels. log show (hence LogLogger and Consolation) has no option to confine entries to Error or Fault level.

You might also think that you could design a predicate filter to limit the levels used. There are two possible candidates here, eventType and messageType. However, eventType matches the type of event (logEvent, traceEvent, etc.) not the level, and messageType currently does not support Error or Fault.

Predicate filters currently supported in Console and log (and therefore available in LogLogger and Consolation) are thus unable to tell directly how important or significant log entries are. They can filter on the basis of the ‘subsystem’, which is a new feature in Sierra and little-supported even by the many components of Sierra itself, by the process or sender which originated the log event, or by basic patterns of content in the log messages themselves. But even used with quite complex predicates, they can only be very rough filters to hone in on the significant events which interest support staff or users.

sysdiagnose?

Indeed, it is telling that Apple’s log engineers stress the importance of sysdiagnose for capturing data to look at bugs and problems. You can invoke sysdiagnose using gestures:

on a Mac, press Shift+Control+Option+Command+. (full stop or period)

in iOS, press Volume Up+Volume Down+Power; there is a slight vibration on an iPhone to signify the start

on a Watch, press and hold the Digital Crown and Side Button for 1 second, which results in a slight haptic

on an Apple TV, press Play/Pause and Volume Down; older remote controls may need these to be held for 5 seconds.

You can then transfer the sysdiagnose dump using iTunes, where necessary.

I have elsewhere considered the huge amount of information generated by sysdiagnose, and the complexities of interpreting it. It is even more of a monster than the new logs are, and even more difficult for support staff or a user to diagnose a problem.

Filtering log entries

The new log database should be an ideal place to obtain carefully filtered entries. Each log entry there consists of a very precise timestamp, the thread number of the process writing the entry, the level (Fault to Debug, as above), an activity code (almost invariably 0), the process ID (as used in Activity Monitor), the name of the process writing the entry, its ‘subsystem’ where available, and the text of the entry itself.

However, Apple has already told us that it has no plans to enable access to the logs by third parties, at least until macOS 10.13 or later. So what we cannot do today using Console and log, and tools such as LogLogger and Consolation, we cannot even work around.

All my experience so far with the new logging system is that it only becomes usable and useful when you use filters, based on predicates and time periods, skilfully. This means working out roughly when the index event(s) occurred, then limiting your searches to the period around that, and applying predicates designed to bring out the entries that are most useful to you.

The simplest illustration is, of course, the commonest reason that users take to their logs, to check whether Time Machine has been making backups without error. Thankfully Time Machine is one of the few defined subsystems, and it is one which logs very little beyond its essentials. It also runs every hour or so, but not between. So it is very simple to set up an appropriate time period and predicate which will select out all Time Machine entries.

Could we use similar techniques to filter new-style logs to contain just ‘the important bits’ which we could read in Yosemite and earlier, then?

Success in doing so depends on our prior knowledge of those important log entries, and our ability to design predicates which will include the right entries, and exclude those we don’t want. Because our criteria are heavily dependent on the content of the entry, i.e. the eventMessage CONTAINS specific strings, developing a successful filter is not easy. From my experience of looking at log entries, I think that the heuristics would actually need to parse the messages in order to assess their significance. Simply filtering on text content would not work.

There are three other factors which need to be borne in mind.

First, during critical periods, there are a lot of log entries. A typical trouble-free startup into Sierra may result in well over 100,000 log entries being written. Reducing those to 100 or less is a fearsome task.

Second, several important processes which used to write to the old logs do not appear to make any entries in the new ones. My best example is launchd, the master process launcher, which makes everything apart from the kernel and itself happen. I have not seen a single entry in a new log which relates to launchd, and suspect that Apple has deliberately suppressed such entries. There is thus a lot of important information which does not seem to exist in the new logs.

Finally, log has bugs which affect its filtering. Before even considering the accuracy of its elaborate predicate system, a much simpler filter for log entries are start and end times. These have consistently failed to include all log entries within the defined time window, according to the width of that window.

I test each new update to Sierra to see whether this issue has been addressed. My test is simple: I count the entries returned from an ‘all entries’ search for a given period of a minute or so, when that minute is set exactly by start and end times, and when it is contained within a broader window of two or three minutes.

Each release of Sierra has shown that, in any given minute, the precise time window returns far fewer log entries than the count of entries for the same period set within a broader window. The disparities are not small, either: for 10.12.3, the narrow window returned 330 entries in the minute, the broader window 806 in that same minute.

If the log command is incapable of filtering accurately on the basis of time, then I fear for its ability to apply complex predicates with any accuracy, but that is far harder to test.

So logs are now useless?

No, they are just harder to make use of, and getting the best out of them is an important skill for support staff. That is why I developed first LogLogger, then Consolation. They should be much easier to use than the log command, but you’re going to have to learn how to get the best out of the logs. I have also posted some tips and suggestions here, here, and here.

Related

6Comments

It does seem that either Apple really doesn’t care, or somebody made a conscious effort to make it harder for non-Apple people to troubleshoot problems in macos. This is a logical step for a company that builds a computer with all the essential hardware soldered into the motherboard, batteries glued, keyboards that are extremely difficult to replace etc etc. From the consumer’s intended perspective, the computer is not a sum of its parts, it is a whole piece and if anything is wrong with it then the only way to have it repaired is taking it to Apple or Apple’s partners who are qualified to service it (in exchange to draconian fees). Now they are extending this policy into software. Of course it doesn’t work this way in the real world and eventually third parties will figure out the way around.

I don’t have much time to experiment because I don’t own a mac myself and while at work, I am always very busy. But I think that we should not write off complex predicates before trying them out.

I have tried complex predicates out: they only make matters worse!
The only way that a complex predicate built with OR terms would work would be for each of the terms which are ORred together to return a very small number of hits, i.e. to have a high success rate (and not to miss any important hits either). The limitations of using predicates include that they give no access to any sophisticated pattern-matching such as regex. So inevitably, if their hits are to include all the important hits, they also return a high proportion of misses.
Let’s say one term looking for Bluetooth problems returns 10 hits and 20 misses (which is very optimistic when searching the log messages). When you OR that with another similar return, you end up with 20 hits and 40 misses. And so it goes on – every OR only increases the total of misses, until you end up with, say, 100 hits, buried in maybe 500-1000 misses.
You cannot use OR to combine sets of imprecise results, because all you do is increase the number of misses to the point where you are just wading through hundreds and thousands of junk messages again.log works best in the way that I have described. It may be possible to write a much better way of searching the log database to minimise the junk returns. But Apple currently does not let us – log is the only tool available.
The summary is that Console and log are not written for support or users, but for developers – particularly Apple engineers – who are concerned about their own product, not more general problem-solving.
Howard.

Of course, OR terms won’t weed out all the junk. But then this is not the intention of OR terms. They are there to define the scope of interest. To deal with the junk will take multiple (many) NOT terms. Let’s imagine that someone has come up with a list of 20 OR terms to limit the scope and a list of 100 NOT terms that are enough to filter out the junk. Do you think this will work?

Nope. As I wrote, the only way to produce really effective filters on the current log entries is actually to parse the contents of the entries. Even obvious clues like ‘error’ and ‘fail’ can appear in both important and unimportant messages. There is no simple way to tell them apart. Even experienced humans struggle to do so!
What we need is much better use of different levels – that is their purpose. If we could then just filter out all the default messages, we’d be left with the important ones. But it doesn’t work like that yet, and in any case log show cannot do that either.
Howard.