LOG_EXPR(x) is a macro that prints out x, no matter what type x is, without having to worry about format-strings (and related crashes from eg. printing a C-string the same way as an NSString). It works on Mac OS X and iOS. Here are some examples,

Pretty straightforward, really. The biggest convenience so far is having the expression printed out, so you don’t have to write out a name redundantly in the format string (eg. NSLog(@"actionURL = %@", actionURL)). But LOG_EXPR really shows it’s worth when you start using scalar or struct expressions:

LOG_EXPR(self.window.windowLevel);

self.window.windowLevel = 0.000000

LOG_EXPR(self.window.frame.size);

self.window.frame.size = {320, 480}

Yes, there are expressions that won’t work, but they’re pretty rare for me. I use LOG_EXPR every day. Several times. It’s not quite as good as having a REPL for Cocoa, but it’s handy.

How It Works

The problem is how to pick a function or format string to print x, based on the type of x. C++’s type-based dispatch would be a good fit here, but it’s verbose (a full function-definition per type) and I wanted to use pure Objective-C if possible. Fortunately, Objective-C has an @encode() compiler directive that returns a string describing any type it’s given. Unfortunately it works on types, not variables, but with C99 the typeof() compiler directive lets us get the type of any variable, which we can pass to @encode(). The final bit of compiler magic is using stringification (#) to print out the literal string inside LOG_EXPR()‘s parenthesis.

The first and last lines are a way to put {}‘s around the macro to prevent unintended effects. The do{}while(0); “loop” does nothing else.

First evaluate the expression, _X_, given to LOG_EXPRonce, and store the result in a _Y_. We need to use typeof() (which had to be written __typeof__() to appease some versions of GCC) to figure out the type of _Y_.

Now we have enough information to call a function, VTPG_DDToStringFromTypeAndValue() to convert the expression’s value to a string. We pass it the _TYPE_CODE_ string, and the address of _Y_, which is a pointer, and has a known size. We can’t pass _Y_ directly, because depending on what _X_ is, it will have different types and could be of any size.

VTPG_DDToStringFromTypeAndValue() returns nil if it can’t figure out how to convert a value to a string.

The VTPG_DDToStringFromTypeAndValue() Function

It’s derived from Dave Dribin‘s function DDToStringFromTypeAndValue(), and is pretty straightforward: strcmp() the type-string, and if it matches a known type call a function, or use +[NSString stringWithFormat]:, to turn the value into a string.

The First Step Twords Fixing Your Macro Problem is Admitting it…

So yeah, maybe I went a little wild with macros here…

But it took out some WET-ness of the original code, and prevents me from accidentally mixing up types in a long wall of ifs, eg.

If I were cool, I’d use NSDictionarys to map from the @encode-string to an appropriate format string or function pointer. This is conceptually cleaner; less error-prone than using macros; and almost certainly faster. Unfortunately, it gets a little tricky with functions, since I need to deference value into the proper type.

One final note from my testing, I could do away with the strcmp()s, because directly comparing @encode string pointers (eg if(typeCode == @encode(NSString*)) works. I don’t know if it will always work though, so relying on it strikes me as a profoundly Bad Idea. But maybe that bad idea will give someone a good idea.

Limitations

Arrays

C arrays generally muck things up. Casting to a pointer works around this:

__func__

Because it is a static const char [], __func__ (and __FUNCTION__ or __PRETTY_FUNCTION__) need casting to char* to work with LOG_EXPR. Because logging out a function/method call is something I do frequently, I use the macro:

#define LOG_FUNCTION() NSLog(@"%s", __func__)

long double (Leopard and older)

On older systems, LOG_EXPR won’t work with a long double value, because @encode(long double) gives the same result as @encode(double). This is a known issue with the runtime. The top-level LOG_EXPR macro could detect a long double with if((sizeof(_X_) == sizeof(long double)) && (_TYPE_CODE_ == @encode(double))). But I doubt this will ever be necessary.

I haven’t actually written any code that uses long double, because I use NSDecimal, or another base-10 number format, for situations that require more precision than a double.

Scaling and Frameworks

Growing LOG_EXPR to handle every type is a lot of work. I’ve only added types that I’ve actually needed to print. This has kept the code manageable, and seems to be working so far.

The biggest problem I have is how to deal with types that are in frameworks that not every project includes. Projects that use CoreLocation.framework need to be able to use LOG_EXPR to print out CoreLocation specific structs, like CLLocationCoordinate2D. But projects that don’t use CoreLocation.framework don’t have a definition of the CLLocationCoordinate2D type, so code to convert it to a string won’t compile. There are two ways I’ve tried to solve the problem

Comment-out framework-specific code

This is pretty self-explanatory, I’ll fork VTPG_Common.m and un-comment-out code for types that my project needs to print. It works, but it’s drudgery. Programmers hate that.

Hardcode type info

The idea is to hard-code the string that @encode(SomeType) would evaluate to, and then (since we know how SomeType is laid out in memory) use casting and pointer-arithmetic to get at the fields.

For example:

//This is a hack to print out CLLocationCoordinate2D, without needing to #import <CoreLocation/CoreLocation.h>
//A CLLocationCoordinate2D is a struct made up of 2 doubles.
//We detect it by hard-coding the result of @encode(CLLocationCoordinate2D).
//We get at the fields by treating it like an array of doubles, which it is identical to in memory.
if(strcmp(typeCode, "{?=dd}")==0)//@encode(CLLocationCoordinate2D)
return [NSString stringWithFormat:@"{latitude=%g,longitude=%g}",((double*)value)[0],((double*)value)[1]];

This Just Works in a project that includes CoreLocation, and doesn’t mess up projects that don’t. Unfortunately it’s horribly brittle. Any Xcode or system update could break it. It’s not a tenable fix.

Areas for Improvement

When I have time, I plan to write a general parser for @encode()-strings. This will let me print out anystruct, which mostly solves the type-defined-in-missing-framework problem, and would let LOG_EXPR Just Work with types from all kinds of POSIX/C libraries.

May 24, 2010

Another counter-intuitive finding is that scam victims often have better than average background knowledge in the area of the scam content. For example, it seems that people with experience of playing legitimate prize draws and lotteries are more likely to fall for a scam in this area than people with less knowledge and experience in this field. This also applies to those with some knowledge of investments. Such knowledge can increase rather than decrease the risk of becoming a victim.

What’s a spurious relationship?

Here’s one: People who eat ice cream are more likely to drown. Both incidence of ice cream eating and rates of drowning are related to summertime. The relationship between ice cream and drowning is spurious. That is, there is no relationship. Yet they appear related because they are both related to a third variable.

This isn’t practical, but I think it’s neat that it’s doable in C99. The implementation I present here is incomplete and for illustrative purposes only.

Background

C’s implementation of variadic functions (functions that take a variable-number of arguments) is characteristically bare-bones. Even though the compiler knows the number, and type, of all arguments passed to variadic functions; there isn’t a mechanism for the function to get this information from the compiler. Instead, programmers need to pass an extra argument, like the printfformat-string, to tell the function “these are the arguments I gave you”. This has worked for over 37 years. But it’s clunky — you have to write the same information twice, once for the compiler and again to tell the function what you told the compiler.

Inspecting Arguments in C

Argument Type

I don’t know of a way to find the type of the Nth argument to a varadic function, called with heterogeneous types. If you can figure out a way, I’d love to know. The typeof extension is often sufficient to write generic code that works when every argument has the same type. (C++ templates also solve this problem if we step outside of C-proper.)

Argument Count (The Good Stuff Starts Here)

By using variadic macros, and stringification (#), we can actually pass a function the literal string of its argument list from the source code — which it can parse to determine how many arguments it was given.

For example, say f() is a variadic function. We create a variadic wrapper macro, F() and call it like so in our source code,

x = F(a,b,c);

The preprocessor expands this to,

x = f("a,b,c",a,b,c)

Or perhaps,

x = f(count_arguments("a,b,c"),a,b,c)

where count_arguments(char *s) returns the number of arguments in the string source-code string s. (Technically s would be an argument-expression-list).

Example Code

Here’s an implementation for, iArray(), an array-builder for int values, very much like JavaScript‘s Array() constructor. Unlike the quirky JavaScriptArray(), iArray(3) returns an array containing just the element 3, [3], not an uninitilized array with 3 elements, [undefined, undefined, undefined]. Another difference: iArray(), invoked with no arguments, is invalid, and will not compile.

This macro is pretty straightforward. It’s given a variable number of arguments, represented by __VA_ARGS__ in the expansion. #__VA_ARGS__turns the code into a string so that count_arguments can analyze it. (If you were doing this for real, you should use two levels of stringification though, otherwise macros won’t be fully expanded. I choose to keep things “demo-simple” here.)

This is a dangerously naive implementation and only works correctly when iArray() is given a straightforward non-empty list of values or variables. Basically it’s the least code I could write to make a working demo.

Since iArray must have at least one argument to compile, we just count the commas in the argument-list to see how many other arguments were passed. Simple to code, but it fails for more complex expressions like f(a,g(b,c)).

Just as you'd expect, this code allocates enough memory to hold countints, and fills it with the remaining count arguments. Bad things happen if < count arguments are passed, or they are the wrong type.

I didn't even try to correctly parse any valid argument-expression-list in count_arguments. It's non trivial. I'd rather deal with choosing the correct MAX3 or MAX4 macro in a few places than maintain such a code base.

So this kind of introspection isn't really practical in C. But it's neat that it can be done, without any tinkering with the compiler or language.

September 17, 2009

A programming language is a tool for handling design complexity. That’s what all of computer science is, really — languages, libraries, type systems, garbage collectors, everything you learn about programming. They’re ways to build more and more complex designs without losing your grip.

The way you manage complexity is to be able to ignore it. A good programming tool lets you forget about some part of the problem, so that you can focus on some other part. And it ensures that when you return to the parts you forgot, you haven’t accidentally broken them.

Years ago, When I was taking to programmers about what college I wanted to attend, I had in interesting conversation about how Computer Science education is an utter failure at preparing students for real-world programming. Outside of Software Development, no technical field accepts (sometimes prefers) candidates with “N years of experience” in place of a degree. I’m not sure I know why CS education fails so badly and universally. But my current best guess is that it’s because school never exposes you to enough complexity. Projects have to end in a semester. You never have to deal with a multimillion-line program, written by hundreds of co-workers, dozens of which you need to collaborate with, at unexpected times, for surprising reasons.

Author Nicholas Carr coined the phrase digital sharecroppers to describe those of us who create content on community Web 2.0 sites….He says we are like sharecroppers after the Civil War — tilling land that we don’t own, barely eking out a living, while someone else who owns the land, benefits.

But clearly facebook gives us something of value, even if it’s fun, not money. (That said, I’ve got adds blocked though CSS).

However, I think Digital Sharecroppers is an appropriate term for contributers to restrictive open source projects. The real wealth of a programming project is the code, not the final product. And licenses like the GPL don’t let contributers share in that wealth by using it in their own proprietary projects.

The future of advertising needs to be selling – that is, enabling – relevance instead of selling scarce space, time, or eyeballs. The future needs to be about adding value – relevance – rather than selling scarcity (extracting what the market will bear). I’m not sure whether Digg’s system is a step in that direction; Batey’s right that there could be unintended consequences. But it’s worth watching. I hope Digg shares data and experience in its fascinating experiment.

The idea is to remap Caps Lock so that while holding it, J-I-L-K work like the arrow keys to move the cursor. Since we have more clear arrow keys, I don't think that's the best use for caps lock, but ideas for using it are always interesting.

It's a dangerous mistake to believe that statistical research is somehow more scientific or credible than insight-based observational research. In fact, most statistical research is less credible than qualitative studies. Design research is not like medical science: ethnography is its closest analogy in traditional fields of science. User interfaces and usability are highly contextual, and their effectiveness depends on a broad understanding of human behavior

The problem with qualatative "research" however is that it's not clear how to make it objective and verifiable.

Nielson makes a good point that 95% confidence ⇒ 1/20 results are wrong. But that's why scientists reproduce experiments, and why it’s so necessary to be able to do that.

My keyboard has 26 letters, 10 numbers, and 12 symbol keys, like ~. All but spacebar make a different symbol when I hold down shift, giving me 93 characters to use in my passwords. But the number of words that can make-up a pass-phrase is easily in the 100,000s. Estimating exactly how big is a bit tricky, but I will stick with 250,000 here (I think it’s an undercount, more on this later).

We Know How To Talk

The human brain has an amazing aptitude for language. But “passwords” aren’t really words, so they don’t tap into this ability. In fact, we often use words to try and remember the nonsense-characters of a password.

Wouldn’t it make more sense to just use the words directly, if we can remember them more easily?

Hard For Computers, Not Hard For Us

People feel that if security system A is harder for them to use then system B, then A must be harder for an attacker to bypass. But the facts don’t always match this intuition.

What authentication code do you think is harder for a bad guy to hack, the 7 character strong password “1Ea.$]/”, or the mnemonic for the first 3 characters, “One Elvis Amazon”? Certainly “1Ea.$]/” is harder for a person to remember. It feels like it should be harder to break. But a computer, not a person, is going to be doing the guessing, and all it cares about is how big the search space is. There are 937 possible 7 character passwords. Let’s say there are 250,000 possible English words (more on that figure later). Then there are 250,0003 3 word combinations — meaning an attacker would have to do 260 times more work to guess “One Elvis Amazon” than to guess “1Ea.$]/”.

With pass phrases, easier for the good guys is also harder for the bad guys.

Different conjugations (can) count as different words in pass-phrases. There’s only one entry in a dictionary for swim, but swim, swimming, swam, etc. make for distinct pass-phrases (eg. “Elvis swims fast”, “Elvis swam fast”, etc. Both phrases don’t show up in a google search by the way.) So the real number of words should be a few fold larger than a dictionary indicates.

But given how terrible we are at picking good passwords, and how good we are at remembering non-nonsense-words, I am optimistic that we can remember pass-phrases that are orders of magnitude harder to guess than the “good” passwords we can’t remember today.

Fewer Ways To Fail

We’ve all locked ourselves out of an account because of typos or caps lock. But pass-phrases can be more forgiving.

Pass-phrases are caseinsensitive. There’s no need to lock someone out over “ELvis…”.

Common typos can be auto-corrected, much as google automatically suggests words. Consider the authentication attempt “Elvis Swimmms fast”. The system could recognize that “Swimms” isn’t a word, and try the most likely correction, “Elvis Swimms fast” — if it matches, then there’s no reason to ask the user if it’s what they really meant. (Note that only one pass-phrase is checked per login attempt.) I don’t have hard data here, but given how successful google is at interpreting typos, I’d expect such a system to work very well.

Pass-phrases might be more difficult on Phones, and similarly awkward to write with devices. Writing more letters means more work. Predictive text can only do so much. Repeatedly typing 3 letters and accepting a suggestion is clearly more work then just tapping out 6 characters. Additionally, there are security concerns with a predictive text system remembering your pass-phrase, or even a small part of it.

But for computers, pass phrases look like a clear usability win.

Easily Secure Conclusion

(In case you were wondering that was a unique phrase when I wrote this.) Using pass-phrases over passwords (which are really pass-strings-of-nonsense-sybols-that-nobody-can-remember) makes a system significantly harder to crack. Pass-phrases are easier for humans to remember, and a system that uses them can be very forgiving. But as always, the devil is in the details. It’s terrifying to be an early adopter of a new security practice, even if it seems sound.

May 15, 2009

I started writing a list of ways I thought Objective-C could be improved, and I realized that many of my wishes involved more compact syntax. For example [array objectAtIndex:1] is so verbose I think it diminishes readability, compared to array[1].

I can’t quite match that brevity (can you, by using Objective-C++?), but with a one-line category, you can say, x = [array:1];.

My question is: do you find this compact “syntax” useful at all, or is it added complexity with no substantial code compression? Personally I think the latter, but the number of wishes I had involving more concise Objective-C syntax makes me wonder…

I have latched onto an idea, but don’t have the resources to follow up on it: could a static-analysis tool identify repeated patterns of code, across manycodebases, that should be extracted out as subroutines and higher-level functions? How universal would these “emergent libraries” be?

Most programmers would never bother to create such a small function. But several words of code are saved whenever one occurrence of the dup2()/close() pattern is replaced with one call to fd_move(); replacing a dozen occurrences saves considerably more code than were spent writing the function itself. (The function is also a natural target for tests.) The same benefit scales to larger systems and to a huge variety of functions; fd_move() is just one example. In many cases an automated scan for common operation sequences can suggest helpful new functions, but even without automation I frequently find myself thinking “Haven’t I seen this before?” and extracting a new function out of existing code.

What’s particularly fascinating to me are the new operations we might find.

Before I was exposed to the Haskell prelude I hadn’t known about the fundamentally useful foldl and foldr operations. I had written dozens of programs that used accumulation, but it’s generalization hadn’t occurred to me — and probably never would have. Static analysis can help uncover generalizations that we might have missed, or didn’t think were important, but turn out in practice to be widely used operations.

April 30, 2009

This is a collection of sources on what constitutes an acceptable delay. It’s very much a work in progress, and will be updated when I stumble into new information. I’m very interested in any insights, experience, or sources you may have.

Based on some experiments I did back at IBM, delays of 1/10th of a second are roughly when people start to notice that an editor is slow. If you can respond is less than 1/10th of a second, people don’t perceive a troublesome delay.

In A/B tests (at Amazon.com), we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue. (eg 20% drop in traffic when moving from 0.4 to 0.9 second load time for search results).

David Eagleman’s blog post Will you perceive the event that kills you? is an engaging look at how slow human perception is, compared to mechanical response time. For example, in a car crash that takes 70ms from impact until airbags begin deflating, the occupants are not aware of the collision until 150-300 milliseconds (possibly as long as 500 milliseconds) after impact.