25 November, 2008

If I asked you to write a source control system, you might say: "Pff. That's easy, dude! Just store a copy of the document every time you make a change. When you need a particular revision, you just access that copy of the document." And although your system would work, I'd say: Think about it!

Storing a full copy of the document for every change is wasteful; revision changes are small: a line of text here and a line of text there. Why would you store a full copy of the document just for a small change?

At this point you may be tempted to say: "Well, then just store the original document, and store the changes as small delta files. We can then apply the deltas to get you any revision you want." This, of course, is a much better solution as far as storage is concerned; however, I'd still say: Think about it!

Under normal circumstances, users access the most recent revision of a document far more often than any other revision. Furthermore, rebuilding that revision by starting from the original document and applying every delta is computationally expensive. It would then seem that having to apply all these deltas for our most common operation is a bad idea. Seeing this, how can we further improve our system?

Well, how about we store the most recent version of the document instead of the original? This would mean we would have to store deltas to take us all the way back to the original version, but that's OK - it's not much different than what we were thinking about doing before. However, with this change, we can now perform our most common operation (return the current version of a document) in constant time. Also, the expensive operation (returning an older version of the document) now occurs on rare occasions. Better, huh?
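Here's a minimal sketch of that idea in Python, just to make it concrete. Everything in it is my own assumption for illustration: the names `ReverseDeltaStore`, `make_delta`, and `apply_delta` are hypothetical, documents are treated as lists of lines, and the deltas are computed with the standard library's `difflib.SequenceMatcher` rather than with whatever a real source control system would use.

```python
from difflib import SequenceMatcher


def make_delta(source, target):
    """Return the edits that turn the `source` lines into the `target` lines."""
    delta = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replace source[i1:i2] with the matching slice of target.
            delta.append((i1, i2, target[j1:j2]))
    return delta


def apply_delta(source, delta):
    """Apply a delta produced by make_delta and return the new list of lines."""
    result = list(source)
    # Work from the end so that earlier indices stay valid.
    for i1, i2, replacement in reversed(delta):
        result[i1:i2] = replacement
    return result


class ReverseDeltaStore:
    """Keeps the latest revision in full, plus deltas that walk backwards."""

    def __init__(self, lines):
        self.latest = list(lines)
        # back_deltas[k] turns revision k + 1 into revision k.
        self.back_deltas = []

    def commit(self, new_lines):
        # Store how to get from the new revision back to the old one.
        self.back_deltas.append(make_delta(new_lines, self.latest))
        self.latest = list(new_lines)

    def get(self, revision=None):
        head = len(self.back_deltas)
        if revision is None or revision == head:
            # The common case: no deltas need to be applied.
            return list(self.latest)
        lines = self.latest
        # The rare case: walk back from the head to the requested revision.
        for delta in reversed(self.back_deltas[revision:]):
            lines = apply_delta(lines, delta)
        return lines
```

With this layout, `store.get()` hands back the current revision without touching a single delta, while `store.get(0)` pays for every delta between the head and the original.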

And now that I'm out of ideas for our source control system, I'm going to go back and "Think about it!" a little bit more, because I'm sure there's still lots of room for improvement.

20 November, 2008

In my fun-as-a-rock database class I recently got an assignment to correct misspellings in a file full of city names.

Now, there are two ways to do spell checking: the Microsoft way, and the Google way. Care to guess which is the wrong way to do it? Yup, you got it: the Microsoft way sucks! OK, maybe it didn't suck back in the 17th century when Spanish monks were doing all the spell checking known to mankind (which I think consisted of 3 or 4 individuals that actually knew how to read, or cared about spelling for that matter).

So, if the Microsoft spell checker and the Google spell checker could talk, what would they say?

Microsoft would say: Listen buster! My dictionary contains all the correct words in the universe; either you comply or you don't. Got it?

Google would say: What do I know about spelling? I'm just trying to figure out a way to make more money from all this content I just indexed. Oh, and by the way, that word you just typed, it looks awfully close to this other word I see a lot in my index. Is that what you meant?

The problem with the Microsoft approach should be obvious, but it's important to point out that the Google approach is not without faults either.

The biggest problem with the Google approach is that to some extent it's a form of crowdsourcing. If your crowd can't spell, then you're toast.

Last, but not least, I'd just like to show you some pseudo code on how I implemented my spell checker:

Read all the city names in the file while keeping track of every variation we've seen and how many times we've seen it (in a hash, dictionary, etc). Take the most popular spelling for each city, and call that the correct spelling.

To correct word X, calculate its edit distance to all the correct spellings. Chances are word X is really the "correct spelling" it most closely resembles.

Figure out what to do if you've never seen X before.
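For the curious, the steps above might look roughly like this in Python. This is a sketch under my own assumptions, not the code I turned in: the function names are made up, "edit distance" is taken to mean plain Levenshtein distance, spellings within `max_distance` edits of a more popular spelling are assumed to be variations of the same city, and a word that isn't close to anything we know is simply left alone.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def build_dictionary(names, max_distance=2):
    """Pick the most popular spelling of each city as the correct one."""
    counts = Counter(names)
    correct = []
    # Visit spellings from most to least frequent; anything close to an
    # already accepted spelling is assumed to be a variation of it.
    for name, _ in counts.most_common():
        if all(edit_distance(name, known) > max_distance for known in correct):
            correct.append(name)
    return correct


def correct_word(word, correct, max_distance=2):
    """Return the closest correct spelling, or the word itself if none is close."""
    if not correct:
        return word
    best = min(correct, key=lambda known: edit_distance(word, known))
    # Assumption: a word far from every known spelling is left untouched.
    return best if edit_distance(word, best) <= max_distance else word
```

The `max_distance` cutoff is doing a lot of work here: set it too low and real typos slip through, set it too high and two different cities with similar names get merged into one.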

And that concludes today's post. Now if I could just get Google to write grammatically correct sentences for me, I'd never have to worry about proofreading my posts ever again.

Disclaimer: I would just like it to be known that I'm in no way a MS hater; in fact, I'm somewhat of a MS fan. I'd also like it to be known that I'm not a Google fanboy; in fact, I'm a little afraid of them - they read my email, and I'm sure they're the new federal agency that's in charge of spying on citizens.

18 November, 2008

Glen Wagley, after reading my post on pair programming, said to me: "I understand what you mean about cutting corners. But I don't do it anymore; it's not worth it".

That was a slap in the face. Here I am blogging about ways to be a better programmer, yet I still cut corners myself.

All of this led me to think that the number one thing you can do to become a better programmer is to have courage and integrity: if you see code that needs to be refactored, refactor it; if you know you need to throw away some of your code, don't hesitate, throw it away; write your unit tests first; write your documentation... you get the idea: do those things you know you should do even though you don't always want to.

I guess what I'm trying to say is: if you're Mormon, CTR. If you're not Mormon, please contact your local LDS missionaries; they'll be glad to teach you what CTR means. Seriously, however, it's sad that we have sites like The Daily WTF where we laugh about the crap we, so-called "pros", write. I know that the profession is new, but that should be no excuse; I don't see a site where surgeons laugh about all the times they've left scissors inside their patients (OK, I really haven't searched and there's probably one out there). I realize we all make honest mistakes, but we should draw the line somewhere.

Now that I'm done ranting, and while I'm on the topic, here's a list of other things you could do to become a better programmer (in no particular order): learn languages & platforms other than the one you currently use; learn to write good prose (in more than one language?); learn to touch type (as Steve Yegge suggests here); read tons of technical books and even more non-technical books; learn how to market your ideas; be humble and learn from others (regardless of their title/position); use a text editor effectively...

I have a ton of these, but I'd rather hear from you now. What do you do to become a better programmer?

Update: Since writing this article, a good buddy of mine wrote a rather inspirational story on standing up for what's right. If you so desire, you can find said story here.

15 November, 2008

Yes, I'm going to try to convince you to write less code. But, this should be an easy feat; I have Shakespeare also advocating my cause:

Brevity is the soul of wit.

Shakespeare's truism is readily apparent in good code: code that expresses complex concepts in succinct statements is beautiful and worthy of admiration. For several reasons, all other things being equal, smaller code is better code.

Fewer lines of code mean fewer bugs and lower maintenance cost. There are, however, other less apparent reasons for which you should try to write as little as possible:

You won't fall prey to the temptation of writing code that's not immediately necessary. In other words, you'll be YAGNI compliant. :)

It is better to be thought a fool than to write code and remove all doubt. Joking aside, however, the more you write, the more likely you are to make a mistake.

You won't get locked into poor decisions. I was watching Abrams & Cwalina speak at PDC today. One of their comments really struck me: they said, and I'm paraphrasing, that refactoring and correcting design mistakes in frameworks is harder when you have more code than what's absolutely necessary. Specifically, they regretted adding a public constructor to the System.Exception base class. That's it! Just one public constructor too many! And although they wish they could change it, they simply can't. If you make a poor decision, you'll have to maintain it.

You're less likely to repeat yourself. Or, to phrase this positively, you'll be DRY compliant.

Finally, by writing less code, you'll avoid the temptation to over-engineer your solutions.

Learning to be concise in code is hard; it takes effort and patience. You'll have to refactor ruthlessly and mercilessly, but it will be worth the effort. Even though you won't have much code to show off, you'll be proud of it.

13 November, 2008

Have you ever heard of tracer ammunition? Well, this post is kind of a tracer post. In a minute you'll see why.

I recently wrote a small utility to change the character encoding of very large files. I'm thinking about writing a GUI for my utility and making it freely available. Yup, that's right: for free.

Except, before I go through the trouble of writing the GUI, I thought I'd find out if there's any interest in such a tool or not. If you're here, reading this, that's enough to tell me you're interested. Now, if you really need it right now, email me and I'll be happy to send you the command line tool.

And now you know why this is a tracer post. See? I did learn something from The Pragmatic Programmer! Or from The 4 Hour Workweek. Take your pick; they're both excellent books.
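In case you're wondering what a tool like that has to do, here's a rough Python sketch of the general idea. To be clear, this is not my utility: the function name `transcode` and its parameters are made up for illustration, and the only trick is to read and write the file in chunks so that a very large file never has to fit in memory.

```python
def transcode(src_path, dst_path, src_encoding, dst_encoding, chunk_chars=1 << 20):
    """Copy src_path to dst_path, changing the character encoding on the way."""
    # newline="" keeps the original line endings instead of translating them.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            # In text mode, read(n) returns at most n decoded characters,
            # so a multi-byte character is never split across chunks.
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

For example, `transcode("cities.txt", "cities-utf8.txt", "latin-1", "utf-8")` would convert a hypothetical Latin-1 file to UTF-8.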

06 November, 2008

I really enjoy pair programming, and so I thought I'd write about some of the benefits I've seen from pair programming.

i. Pair programming increases job satisfaction. Believe it or not, I crave the interaction with other geeks; I need the mental stimulus that comes from talking with my peers.

ii. Pair programming increases code quality and decreases bug counts. I take pride in my craft; I like to write good code. Unfortunately, under pressure I cut corners all over the place (telling myself I'll refactor later). However, when I have someone watching over my shoulder, it's a lot harder for me to write hacky code.

But having someone watch what I'm doing isn't just about the guilt trip. I also appreciate having someone immediately available to discuss ideas and to steer me away from potential problems. This alone literally saves me hours in wasted effort.

iii. Pair programming makes better programmers. I can't even begin to tell you how much I've learned from sitting next to someone while they code. Now, I must admit, probably 90% of what I've learned has little or nothing to do with programming. But it doesn't matter! Believe it or not, programming is a social activity; to put it simply, good code cannot be developed in isolation.

iv. Pair programming makes programmers "faster". I honestly feel I'm 3 times more effective when I'm pair programming than when I'm sitting by myself in my cube. This is probably because when I'm stuck, or I have a question, I have someone immediately available to help. Also, there's a bit of added pressure to be faster since every hour at the keyboard is really 2 man hours at the keyboard.

v. Now, what if I told you that all of these benefits come with a fairly low cost? Well, the good news is that they do: according to this study, the cost is only about a 15% increase in development time. Not bad, huh?

So, if you're still not pair programming, go talk to your boss and start practicing now. You won't regret it!

04 November, 2008

Disclaimer: Admittedly, I got this idea from Jeff Atwood, but I think I've improved on it quite a bit.

At work I have an ASP.NET application that needs to check whether the Department of the Treasury has published a new OFAC list (a list of people with whom we can't do business). If there's a new list, we need to parse it, store it, and make sure that none of our customers have popped up on the new list.

In phase 2, we plan to move this functionality into a Windows service, but for now this is how I made this all happen in the background:

First, we start with the interface for our worker objects:

public interface IAsyncWorker
{
    // The name of the worker object
    string Name { get; }

    // Returns the next time the object should run
    DateTime AbsoluteExpirationTime { get; }

    // Does the actual work the worker needs to do
    void DoWork();
}

I like this version better than Jeff's because it removes all the conditional statements the CacheItemRemoved() method would have had if we had not created an IAsyncWorker interface.

This has been working great in our initial tests, but we still plan to move this to an external Windows service at some point.

We're not worried about running out of threads (the thread does come out of the AppPool), since our task only needs to run every 24 hours. However, you might run into issues if you need your code to execute under a different identity than the threads in the AppPool.

This is a great technique: it gives you the ability to do async tasks with very little overhead.

01 November, 2008

The LINQ debugger is great: it saves you the trouble of having to use something like LINQPad (another great product I wish I had written), and it lets you see what your LINQ query will look like in SQL and what it will return.

If you're like me, you'll be shocked you hadn't heard about this VS plugin before.