Saturday, December 08, 2007

About a month ago I noticed the WiFi on my 8GB iPhone starting to get a little flaky. My iPhone would no longer connect to my home network, which is basically an AirPort Extreme closed network using WPA2 and MAC address filtering; just the standard wireless stuff. I didn't want to believe that the phone was broken because then I'd have to send it in and spend a week without a phone, or talk to a "genius" and beg for a new phone or something, so I just kept putting off digging into the problem any further. If I tried a restore and it didn't fix it, then I'd be forced to go through all the hassle that I desperately wanted to avoid.

Over my two-week Thanksgiving holiday trip in Chicago and Ohio my iPhone took another turn for the worse. During our flight back to California we ended up stranded in Chicago—sitting on the plane—for 9 hours before finally taking off for the 5-hour flight. (My wife and I also had our 4-month-old daughter with us during this most enjoyable time.) Then it happened. About 1 hour into our 9-hour wait (14 hours total) my iPhone battery died, and I had barely been using the phone! Now I was unable to get weather updates, catch up on Family Guy, or use the free O'Hare WiFi that I could still reach from the plane. It was terrible.

When I got back I realized that something was really wrong with my iPhone. With a full charge it would only last for about 6–8 hours with no use! Now I had no choice but to take it in to the geniuses at the Apple store.

Anyway, to make a long story not too much longer, I brought my broken iPhone into the Apple store, talked to a genius and she cheerfully grabbed a brand new 8GB iPhone for me, switched out the SIM card, and sent me on my way in a matter of minutes. I was so pleasantly surprised about how fantastic and easy the iPhone support was. It was great. I should have done this a month ago! Thanks Apple.

Wednesday, October 31, 2007

I wrote an article about Exploring Leopard with DTrace for MacTech Magazine. Check it out when you get a chance. It was very fun to write because DTrace is just so totally freaking awesome. The most difficult part of writing it was limiting it to "magazine length"—I felt like I could go on for a few hundred pages.

Now that this is out, I'll probably start posting some DTrace fun on this blog as I get time.

Monday, October 29, 2007

In a previous post I talked about one way to do compile-time assertions in C and Objective-C. The example used works fine, but it has some drawbacks. Specifically, each call to COMPILE_ASSERT* needs to have a unique message string, otherwise an error is given due to the attempt to redefine a typedef.

One obvious and easy solution to this problem is to put each typedef in its own lexical scope by wrapping it in a do { ... } while (0). This would work, but then we would lose the ability to use the compile-time assertions in global scope or in header files. With regular runtime assertions this probably isn't a big deal, but having compile-time assertions in a header can be incredibly useful. For example, your code may expose some tweakable knobs by #defining constants, but it might be important that one of the constants is always less than another. This is a perfect place to use a compile-time assertion. Having the assertion itself right in the header file will help ensure the code's correctness and can also serve as a form of documentation.

Since we want to retain the ability to use these assertions anywhere, including in headers, we need to find another solution. Well, another solution is to make sure the typedef'd identifier is unique. We could simply put this burden on the caller and tell them that their message strings must be unique within a given scope (which probably isn't that big of a burden in reality), but we can do better.

We can use the C preprocessor symbol __LINE__ to include the current line number in the typedef identifier name. That should guarantee that the identifiers are unique in most cases (there are some corner cases where this is not exactly true). The only trick here is rigging up the preprocessor macros to do what we want. Here are the macros that I came up with:

We can see that the usage of COMPILE_ASSERT worked two times in a row, with the exact same message string, and it worked in the global scope. This is just what we wanted.

The weird part is that we need 3 levels of macros, and one of them doesn't look like it actually does anything at all (the middle, pass-through level). That pass-through macro is needed because of the way in which the preprocessor expands macros. Macros are expanded by doing multiple passes over a given line until all the macros have been evaluated. However, once a macro is expanded, the resulting tokens are not checked again for more macros until the next pass. This is explained in section 12.3 of The C Programming Language, Second Edition.

Also, when writing and debugging macros, it's very useful to use gcc -E which stops after the preprocessing stage and dumps the preprocessed file to standard output.

Sunday, October 28, 2007

I am certainly no UI guru—my favorite UI is a Unix shell. But I think Comcast's DVR user interface is perhaps the worst UI ever; at least in the top (or would it be bottom?) 10. It seems like it's always doing exactly what I don't want. Nothing is intuitive. It gets in my way. It irritates me every single day.

Here is an example of one of the Comcast DVR's idiotic UIs. I was recording The Bourne Supremacy when I decided to quickly change channels to see what was on the news. Instead of just changing the channel, it wasted my time with this idiotic question asking whether I wanted to stop the recording and change the channel or just change the channel. How dumb!? Just change the friggin' channel—that's what I told it to do in the first place. At least this time the destructive action wasn't the default.

Thursday, October 11, 2007

I was screwing around this morning and I needed some random words to test something with. The words needed to be real words, not just random sequences of characters (btw, you can generate a random sequence of 8 characters from the shell using jot -r -c 8 a z | rs -g 0 8). In this case, I decided to simply grab a random word from /usr/share/dict/words.

Hmm, but how do I grab a random word from a file? My solution was to generate a random number in the range [1..n] where n is the number of lines in the file, cat -n the file so that line numbers are printed, grep for the line matching the random number, then print out the second column. It looks like this:
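The pipeline itself is missing from this copy of the post. Here is a sketch of the approach described, wrapped in a function; jot(1) is BSD/Mac-only, so a $RANDOM fallback (my addition) is included for other systems:

```shell
# random_word FILE -- print one random line from FILE.
# The post ran this against /usr/share/dict/words.
random_word() {
    n=$(wc -l < "$1")
    # jot(1) is BSD/Mac OS X; fall back to $RANDOM elsewhere.
    r=$(jot -r 1 1 "$n" 2>/dev/null || echo $(( (RANDOM % n) + 1 )))
    # Number the lines, grab the one matching our random number,
    # and print the second column (the word itself).
    cat -n "$1" | grep -w "^ *$r" | awk '{print $2}'
}

if [ -f /usr/share/dict/words ]; then
    random_word /usr/share/dict/words
fi
```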

Thursday, September 20, 2007

I've been pretty busy with the new baby lately and haven't had much time to post anything interesting. That's not necessarily going to change today, but some folks may find this useful anyway.

There are a few ways to do compile-time asserts that work in C and Objective-C. One way is to simply write a macro that evaluates a boolean expression that can be evaluated at compile time, then, in the false case, have it do something illegal that the compiler will complain about. For example, below we just declare a macro STATIC_ASSERT that takes a boolean expression and a message string, and if test evaluates to false, it will try to create a typedef for a char array with a negative size. The compiler will complain, and there you'll have your static assert. The message string is used as part of the typedef name as a way to put useful text in the output, but that means it must not contain any spaces.

Monday, August 06, 2007

Friday, July 13, 2007

I originally wrote this as if it were a "best practices" document, but then what are best practices? Who says they're the best? And what the heck do I know and why would anyone listen to me? So, now it's just a post about software engineering practices that I, personally, value.

Don't forget to design your code

It is extremely important to do some upfront design on all but the most simplistic software. It can even be fun, so it's surprising to see how often it's skipped or glossed over. One reason might be that people start prototyping something, they see that it works, and they get very excited. Then, they start feeling like they're almost done, so they begin fleshing out the prototype to bolt on the rest of the features that the software ultimately needs. Before they know it, they've got a big mess of unmaintainable software that clearly had little forethought. This can be OK for simple projects, but for anything beyond your weekend coding frenzy it's not a good idea. Prototyping is a great way to test new ideas and verify existing assumptions, but don't forget that it's just a prototype.

Writing software is not like constructing a building, in that you don't need to have rigorous blueprints with every precise measurement listed down to the last excruciating detail. But you do still need to do a lot of design work prior to writing your software. Software is built using abstract ("soft") concepts and objects, and you need to figure out what these concepts and objects are and how they interact and fit together. You need to study the problem domain in which you're working and figure out how it will map to your code. These objects typically make up the domain model of your code, and it's very important that they are well thought out and match reality as much as possible. For example, if you and your team always talk about how "a customer has an account", it should be a red flag to see that your code shows the Account class is the one that "has a" Customer object (or perhaps they're not related at all in the code). This reality-code mismatch is not uncommon and it can harbor a nasty nest of bugs. These domain objects are often used heavily throughout the code, and a poorly designed domain model can quickly infect the rest of your code.

While designing your code you should also be open to new and potentially wild ideas. Sweeping refactorings during design time have a very low cost; you typically only need to change a few figures in OmniGraffle or crumple and recycle a CRC card. Also, be very pragmatic. Don't immediately dismiss an idea just because of one potential complication. Maybe that complication will never actually arise. Maybe only a small percentage of your users will even care about it. If the idea is otherwise brilliant, it may be worth a small price. Don't expect to please everyone; you can't do it. Be pragmatic. Be pragmatic. Be pragmatic.

Lastly, expect your design to change as you go, because it will; they always do. But this is OK. Once you're in the trenches writing the code, you may discover a much better way to do something. You shouldn't ignore this. Think about the tradeoffs and go with the best idea.

Refactor early and often

Most software engineers today understand that software development is an iterative process; they know what refactoring is, and most say they do it. However, there is often a gray area around when and what to refactor and whether or not it's too late to refactor. One common argument against refactoring is that there's no time for it. But as J. Hank Rainwater said in his book Herding Cats: A Primer for Programmers Who Lead Programmers, "If you don't have time to do the job right, when will you have time to do it again?" Delaying refactoring means delaying reaping the benefits of the well-factored code, and working with well-factored code is much easier, and takes much less time, than working with poorly factored code. Delaying refactoring could be a huge cost in development time, bugs, performance, and more. Refactoring does not imply that you'll "slip the schedule." It may take you a couple days upfront to do the refactoring (or minutes, or months; it depends) but a good refactoring can save everyone time in the long run because the code will be easier to write, read, debug, understand, extend, maintain, etc.

It's also common to ignore the need to refactor by saying "we'll push that off until R2." Sometimes it may be acceptable (although, "R2" often never comes) to delay a refactoring until the next release; my opinion is that it's not OK to push off fixing fundamental, core architectural problems. Problems in core code can have a devastating effect on the rest of the code. For example, consider a simple word processor application that has a Document class that conceptually consists of a list of DocumentPage classes. If Document doesn't have a getDocumentPages() method, then all clients of this code may have to implement their own logic to find all the DocumentPages for a given Document instance. Maybe some clients do it right; maybe some do it wrong and create bugs. The best case you can hope for here is that your code only suffers from the unwanted duplication of code. Perhaps a better alternative would be to put the logic for finding all the DocumentPages in one method inside the Document class itself (and then unit test it!).

So, when's it time to refactor? Always. Now. What should you refactor? Anything that needs it. When you see code that needs refactoring, fix it then and there if possible. If it requires a larger change, discuss the refactoring with your team and figure out when it can be done. If you find yourself cursing a certain chunk of code daily, it may just need to be refactored—do it. Be vigilant with your code. Take pride in your code. Strive to make your code clean, readable, and

... as simple as possible, but no simpler.

—Albert Einstein.

Use design patterns, but don't abuse them

The use of OO design patterns can be surprisingly controversial. Some folks say they don't know design patterns by name, but that they innately use them anyway because they're just that good at coding. Others swear by patterns; they think the GoF book is the final word on the subject, and they can even get overly patterns-happy writing a Hello World program. I think somewhere in the middle is the best place to be.

A design pattern is simply a way of documenting a solution to a problem. Each pattern has a name, collectively forming a vocabulary with which engineers can intelligently discuss these solutions. Design patterns are to object-oriented programming what algorithms are to functional programming: they are named solutions to common problems. It's much clearer to say that you sorted some numbers using heapsort, rather than explaining the whole heapsort algorithm in detail. Any worthwhile engineer would know what you're talking about. Similarly, it's clearer to say that NSAttributedString is a Decorator, or that NSNotificationCenter implements the Observer pattern, than it is to actually explain how they're implemented. The patterns vocabulary conveys a lot of information in just a few words; it's succinct.

As we learned from Spider-Man, with great power comes great responsibility. Just because we know of a pattern doesn't mean we should use it. One of the most misused patterns is the Singleton. The Singleton is probably the easiest of the classic GoF design patterns to understand, which may be why engineers who are new to design patterns abuse it. I already discussed the issues with Singletons in a previous post (The Singleton Smell), so I won't go into it all here. But in a nutshell: Singletons are effectively global variables—avoid them as such, they make for code that's tightly coupled and difficult to test, they limit the reusability of the code, they can cause threading problems, etc. When I see code with lots of Singletons I immediately try to think of ways to refactor the Singletons away. Now, before you flame me saying how much you love and need your Singletons, let me just say that they *can* actually serve a purpose. Although, really, I only know of a few classes that actually should be Singletons.

Unit test your code

Some engineers think of unit testing as an annoying administrative task much like adding a new cover sheet to your TPS report. But the reality is that unit tests are a great benefit to the engineer. They give you confidence that your code is correct. Good code must be semantically correct and do what it claims to do. Unit tests allow you to verify this by formalizing a test case in code that can—and should—be run often. If you have a method that claims to accept NULL as an argument, but you never pass it a NULL, how do you know it works when passed a NULL?

Unit tests are also useful because they allow you to act as the client of the class and can often expose a clumsy API or code that may need refactoring. Once you realize the class needs to be refactored, the unit test will give you confidence that you didn't break anything while refactoring. If you find that your code is very difficult to unit test, then it probably needs to be refactored. If you can't figure out how to use and test your own class, how can you expect anyone else to? Classes that are difficult to unit test are often so because they don't have a well-defined role or responsibility; don't build "kitchen sink" classes that have way too much responsibility.

Unit tests are supposed to test the smallest "unit" of an application. This unit may be a command line program, a function, or the usual for an object-oriented program: a class. Don't try to write a unit test that tests your entire application from end-to-end; that's not a unit test. A class should have a clear role and responsibility and the unit test should ensure this.

Never stop learning

Read a lot! Spend time keeping up with current technologies, methodologies, design ideas, etc. What you know today may be irrelevant tomorrow. Don't be that grumpy guy who's learned all he's going to and is content with his current state of knowledge. There are a lot of smart people in this industry; learn from them. When you learn something new and cool, share it with others. I'm sure your teammates would appreciate a new trick that will make their lives easier. And, if you're ever in need of that new trick to show off, read up on ssh port forwarding—that's always a crowd favorite.

Saturday, June 09, 2007

I'm looking forward to WWDC this year. I think all the Leopard features they've announced so far are cool (particularly DTrace), but I hope the "top secret" Leopard features are even cooler; I really want some super-slick new eye candy. Although, at this point, I'm WAY more excited for the iPhone :-)

Thursday, May 10, 2007

A long time ago I wrote a little script called snagdar.sh that simplified fetching Darwin source. It broke when Open Darwin went away and Apple started requiring your ADC login to download some sources.

I finally got around to updating the script so that it now works with the current Darwin source at http://www.opensource.apple.com/darwinsource/. However, since it now requires your ADC login name and password, you must create a ~/.snagdarpass file with this information.

Let's see how it works from scratch. We first need to get snagdar.sh. You can download it from here, or you can get it by doing the following.

Friday, April 27, 2007

The command line Google Calculator that I posted a while ago broke recently when Google changed the HTML on a page. However, the new HTML has a div tag around the calculator answer, so it's now much easier to parse.

Anyway, the source and project files for the new gcalc can be downloaded from the original post. Or if you have the source yourself, the new XPath for the answer is .//div[@id='res']/table[1]/tr[1]/td[3]/font/b

Friday, March 09, 2007

Most of us are familiar with typical user accounts associated with Unix systems, such as root, nobody, and daemon. Mac OS X has an additional interesting account for a user named "unknown". Unknown has the UID number 99, which is treated specially within the kernel as well as some user-level libraries. The special properties afforded to unknown are needed to make device sharing between computers as painless as possible. Let us look at what makes unknown so special.

User unknown, or more precisely, the user with a UID of 99 (we will use "user unknown" or "user 99" interchangeably throughout this document), is treated specially in the following ways:

A file owned by UID 99 appears to be owned by whoever is viewing it (see the caveat immediately following)

Volumes mounted with the MNT_IGNORE_OWNERSHIP flag treat all files as if they were owned by UID 99

An important caveat to the first bullet above is that this special treatment does not apply to root. If root views a file owned by unknown, the file appears as it actually is—owned by user 99. Let us look at an example.

We can see here that I created the file file.txt, changed its owner and group to 99, but the file continues to show that I own it. However, if I use sudo to list the file as root, then we can see that the real owner of the file is indeed unknown. Further, we can verify the behavior when we list the file as another, non-root user.

This shows the logic used when retrieving the attributes of a vnode (basically, a vnode is an in-kernel structure that represents a file). We see that if the vnode is owned by UID 99, and the current calling process is not root, then the kernel changes the vnode's UID to that of the calling process. The equivalent logic for handling a GID of 99 is not shown here. This is exactly the behavior that was demonstrated above.

The second special property of user unknown mentioned above was that volumes mounted with the MNT_IGNORE_OWNERSHIP flag cause all files to appear as if they were owned by user unknown. Additionally, new files will be created with an owner and group of unknown. In many cases, the MNT_IGNORE_OWNERSHIP flag can be controlled on a per-volume basis by checking the "Ignore ownership on this volume" checkbox in the volume's "Get Info" Finder window. However, it can also be set by specifying MNT_IGNORE_OWNERSHIP when calling mount(2).

We can determine whether or not a volume has this flag set by using the following C program.

We can see here that the mounted volume for my iPod shuffle is ignoring ownership. This means that all files on the iPod should appear to be owned by me (or whomever, depending on the rules discussed above), and files created on the iPod should be created as user 99. Let us look at an example.

This special behavior is also handled in the VFS layer of the kernel—it's actually handled about 5 lines above the vnode_getattr() snippet discussed above. The relevant code from the function is highlighted here.

We see that if the MNT_IGNORE_OWNERSHIP flag is specified, the mnt_fsowner value of the mounted file system is consulted. If that value is KAUTH_UID_NONE, then the kernel hardcodes a value of 99—user unknown. Following that, we go through the same logic as before for handling files owned by 99.

One question this brings up is: what if the mnt_fsowner value is not KAUTH_UID_NONE? In that case, the files on the volume will appear to be owned by the user specified in mnt_fsowner. In the kernel, HFS+ is the only file system that actually makes use of this feature. This fact is actually commented in several places with /* XXX 3762912 hack to support HFS filesystem 'owner' */.

Common Questions and Answers

Does this mean that all users can see files owned by user 99?

No. There is more than simply ownership involved in deciding whether or not you can view a file. For example, if the mode of a file that you own is 000, then you will not be able to read that file. Furthermore, if you are denied access to any directory in a file's path, you will be unable to read it. These are just a few of the reasons why this answer is "no".

Is user 99 only given this special treatment on volumes mounted with MNT_IGNORE_OWNERSHIP?

No. User 99 is treated the same on all volumes mounted under Mac OS X.

Why was this stuff done?

The folks at Apple would know for sure; however, I assume it was added to simplify the sharing of devices (e.g., thumb drives and iPods) among computers. If this were not done, then your real UID would be consulted when determining your access to a file. And the fact that your UID may differ on different computers could make this whole process troublesome.

Should I uncheck "Ignore ownership on this volume" on my devices?

Maybe. If the device is shared among several computers, like an iPod or a thumb drive, then you probably want to leave that box checked (see the answer to the previous question). However, if you have a 500GB external drive that you always leave attached to one machine, then unchecking that box is probably a good idea.

In the first paragraph, you mention that some user-level libraries treat user 99 specially. What are you referring to?

UPDATE: Some Carbon APIs do return incorrect information when displaying metadata about files owned by "unknown" to a root process—they show root as owning the file, when they should report it as user 99. This issue may be in the Carbon framework itself, or in the system calls used to retrieve the information (I haven't looked into it).

Monday, February 19, 2007

I stopped by Costco today to see if it was worth a membership. I was greeted by a lovely iPod display just a few feet inside the door, but something was amiss. Needless to say, I left sans membership (there were other reasons as well).

Wednesday, February 07, 2007

Hey folks. I have been working on some pretty neat stuff lately, but I haven't had much time to post. I'll try to get an interesting post out the door soon.

I'm also giving zsh a shot as my default shell. I've always liked bash because it's very powerful and can do some very cool things, but zsh seems to have just about everything bash has, and then some. One of the biggest benefits is that zsh lets you have a shell prompt on the right-hand side of the screen; really cool.

moby[jgm]> cd /var/log                                                 ~
moby[jgm]>                                                      /var/log
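That right-hand prompt comes from zsh's RPROMPT parameter. A minimal sketch of the relevant ~/.zshrc line (the prompt string here is my choice; %~ expands to the current directory):

```shell
# In ~/.zshrc: show the current directory on the right-hand side
# of the terminal.  zsh hides it automatically when a long command
# line would collide with it.
RPROMPT='%~'
```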

Most of the advanced shell tricks, such as process substitution, also work in zsh, so you likely won't miss much there. Zsh can't do the bash trick of creating sockets by referencing files in /dev/tcp/host/port, but as cool as that is, I never actually found a use for it (nc is almost always more versatile in this respect).

So far zsh seems pretty darn cool, and it's an easy switch from any bourne-style shell.

Monday, January 22, 2007

As many of you know, a little over a week ago Google announced the open sourcing of MacFUSE. FUSE allows functional file systems to be implemented in user-space programs, without the need to write any kernel code.

Amit already did all the hard work with MacFUSE; I just wanted to play with it. I thought it might be cool to stick a file system interface on Spotlight, so in my free time I came up with SpotlightFS. It is basically a MacFUSE file system that creates true smart folders, where the folders' contents are dynamically generated by querying Spotlight. This differs from Finder's version of smart folders, which are really plist files with a .savedSearch file extension. Since SpotlightFS smart folders are true folders, they can be used from anywhere—including the command line!

SpotlightFS is currently available for download from the MacFUSE Downloads page. Feel free to check it out.

Monday, January 15, 2007

Typical Unix users cringe at the thought of putting spaces in file names. Mac users, on the other hand, frequently put spaces in file names because it's natural and may read better. This means that Mac OS X Unix geeks need to make sure their shell commands (and shell scripts) work correctly when faced with spaces in file names. Below I outline a few simple ways to properly deal with this.

find(1) has a -exec option, which allows you to specify a command to be executed for each file found. The executed command may also take arguments, any of which may be the string {} which will be replaced by the path of the found file. The command and its arguments are not subject to further expansion of shell patterns, so it's safe for {} to stand in for a file with a space in the name. For example,

$ find ~/Library -name '* *' -exec ls {} \;
[... output omitted ...]

(Notice that our find command is looking for files with spaces in the name.)

find's -exec option is OK for some situations, but since it forks a process to run the specified command for each file found, that can be a lot of unnecessary forking around. This is where find's -print0 combined with xargs -0 comes in. The idea here is that find will print out all the matching files, but instead of separating them by new lines, it will separate the files by the NULL byte ('\0' in C—the same character that terminates all strings in C). Then xargs -0 will read in strings that are separated by NULLs and will execute the specified command with as many paths from find as are allowed on the command line. The following command will create a tar file containing all the files in my Library folder that contain spaces in their names.
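The tar command itself is missing from this copy of the post. Here is a sketch of the pipeline as described, wrapped in a small function (the function and archive names are mine); it uses tar's r (append) mode so the archive survives xargs splitting a long file list across several tar invocations:

```shell
# tar_spaced DIR ARCHIVE -- append every file under DIR whose name
# contains a space to ARCHIVE.  'r' appends, so if xargs has to run
# tar more than once the earlier entries aren't clobbered the way
# 'c' (create) would clobber them.
tar_spaced() {
    find "$1" -name '* *' -print0 | xargs -0 tar rf "$2"
}

# As in the post:
# tar_spaced ~/Library spaces.tar
```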

Sometimes you need to do more than run one command on a filename. In this case, you'd like to use a loop to process each file. Maybe something like for file in $(find ~/Library -name '* *'); do [... body of loop ...]; done. The problem here is that the for-loop splits its input on white space (like the shell does), so filenames with spaces will be split up and treated as multiple files. One solution to this problem is to use a while-loop and the read command. Something like the following should work.
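The loop isn't shown in this copy of the post; here is a sketch of the while/read version described, using ~/Library and ls to match the surrounding examples:

```shell
# read grabs one whole line at a time, spaces and all, so each
# space-containing filename reaches the loop body intact.
find ~/Library -name '* *' | while read filename; do
    ls -ld "$filename"      # note: $filename stays quoted in the body
done
```

(read -r would additionally protect backslashes in filenames, though the version described in the post didn't use it.)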

This code works because find outputs matching files one line at a time. read will read one line worth of data and assign it to the variable specified as its argument (in this case our variable is filename). Notice that within the loop's body we need to quote the variable $filename when we use it.

There are other ways to deal with filenames with spaces, but these are the common techniques I find myself using the most.

Monday, January 08, 2007

I'm super tired right now. I'm trying to get some last minute work done because I won't have any time tomorrow. I'm supposed to get up at 3am so I can pick some folks up at 4:15 and head up to Macworld. If we get there by 5am and I still get stuck in an overflow room, I'm not going to be too happy. Actually, some folks (about 15) I work with are camping out there tonight to ensure they get a seat! I guess I'm too lame for that.

Thursday, January 04, 2007

I assume everyone knows about the defaults(1) command, and that it can be used from the command line to manipulate applications' preferences. But what some folks may not know is that it can also be used to muck with almost any plist file. This can be very useful when tearing through plists from the command line or in a shell script.

For example, I can display TextEdit's CFBundleIdentifier with the following command.
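The command is missing from this copy of the post; its shape, as described, would be along these lines (Mac OS X only; note that defaults(1) takes the plist path without the .plist extension):

```shell
# Read a single key out of TextEdit's Info.plist.
defaults read /Applications/TextEdit.app/Contents/Info CFBundleIdentifier
# -> com.apple.TextEdit
```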

Not only can you read the plists, but you can also write to them using the normal defaults write /path/to/file key value technique.

However, I do have one question myself about this. In the TextEdit examples above, is there a way for me to specify that I want to read just the value of, say, the NSSendTypes key, which is in a dictionary, that is in an array, which is itself the value of the NSServices key? I'd love to be able to do this, but I'm not sure of the syntax (or if it's even possible).

Oh, and also take note of this warning from the man page.

WARNING: The defaults command will be changed in an upcoming major release to only operate on preferences domains. General plist manipulation utilities will be folded into a different command-line program.

Tuesday, January 02, 2007

As you may have seen, Apple announced a beta of Dashcode today. Dashcode is an easy-to-use development environment for creating Dashboard widgets. It comes with a few cool widget templates, one of which is for creating simple widgets to display RSS feeds; so, clearly I had to spend the 10 clicks it took to make a simple widget for this blog.