Stuff

If you value your sanity, don't accept external files in your own programs. Parsing someone else's output is just going to give you headaches.

Okay, it's not realistic, but I can dream, can't I?

Digital cameras that produce AVI files, particularly Motion JPEG encoded ones, have been a bit of a problem for me because they sometimes produce files that are marginal or non-compliant. One common problem is that the video stream contains JPEG frames with custom Huffman tables (DHT markers), which according to Microsoft's original MJPEG spec you're not supposed to do. Instead, the Huffman tables are supposed to be omitted from the stream and a fixed set assumed, for speed and simplicity. VirtualDub's internal MJPEG decoder was written with this in mind, so it won't decode streams that have custom Huffman tables, and if you don't have an MJPEG codec installed, you'll get decode errors. I haven't gotten around to rewriting the decoder so that it can handle custom tables, since it was sort of meant to be a fallback to begin with.
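For what it's worth, spotting the offending frames is cheap. Here is a minimal sketch in Python (not VirtualDub's actual code) that walks the JPEG marker segments of a single frame and reports whether a DHT segment shows up before the scan data. The function name is made up for illustration, and it assumes the frame is a plain JPEG stream starting at SOI.

```python
import struct

# Markers that stand alone, with no length field after them:
# TEM (0x01), SOI (0xD8), EOI (0xD9) and RST0..RST7 (0xD0..0xD7).
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))

def has_custom_huffman_tables(frame: bytes) -> bool:
    """Return True if a DHT (0xFFC4) segment appears before the scan data."""
    pos = 0
    while pos + 2 <= len(frame):
        if frame[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = frame[pos + 1]
        if marker == 0xFF:          # fill byte; the real marker follows
            pos += 1
            continue
        if marker == 0xC4:          # DHT: the frame carries its own tables
            return True
        if marker == 0xDA:          # SOS: entropy-coded data starts here
            return False
        if marker in STANDALONE:
            pos += 2
            continue
        if pos + 4 > len(frame):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack_from(">H", frame, pos + 2)
        pos += 2 + length           # the length field counts its own two bytes
    return False
```

A decoder that only supports the fixed tables could use a check like this to fail early with a clear message instead of producing decode errors partway through.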

Anyway, I received a sample file today from another digital camera that has a new problem, producing broken AVI files. This time it isn't the video stream, but the RIFF structure itself: the data after the first video frame (00dc) chunk is just garbage. If you open it in VirtualDub or a standard video player, it plays fine, because the outermost part of the RIFF structure is fine and that's enough to get to the index. Usually, AVI parsers use the index whenever they can, and thus they'll read the file since the index points directly to the frames. Anything that tries walking the RIFF structure, though, will barf because it's invalid within the LIST/movi chunk that holds the frames. Dumping the file, it looks like whoever wrote the camera firmware decided to save five minutes by just seeking to the next sector boundary instead of actually writing a proper JUNK padding chunk. Sigh.
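To make the failure concrete, here is a rough sketch of what "walking the RIFF structure" amounts to, written in Python rather than anything resembling VirtualDub's parser. The helper names are hypothetical. Each chunk is a four-character code, a 32-bit little-endian size, the payload, and a pad byte if the size is odd; a walker that validates sizes trips over garbage immediately, while an index-driven reader never looks at it.

```python
import struct

def chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Build one RIFF chunk: FourCC, little-endian size, payload, even padding."""
    pad = b"\x00" if len(payload) & 1 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad

def walk_chunks(buf: bytes, start: int, end: int):
    """Yield (fourcc, data_offset, size) for each chunk in buf[start:end].

    Raises ValueError as soon as a header is truncated or a chunk
    claims to extend past the end of its parent.
    """
    pos = start
    while pos < end:
        if pos + 8 > end:
            raise ValueError(f"truncated chunk header at offset {pos}")
        fourcc, size = struct.unpack_from("<4sI", buf, pos)
        data = pos + 8
        if data + size > end:
            raise ValueError(f"chunk {fourcc!r} at offset {pos} overruns its parent")
        yield fourcc, data, size
        pos = data + size + (size & 1)   # chunks start on even offsets
```

With this, a movi body built as `chunk(b"00dc", ...) + chunk(b"JUNK", bytes(8)) + chunk(b"00dc", ...)` walks cleanly, whereas replacing the JUNK chunk with eight raw `0xFF` bytes makes the walker raise at the first byte after frame zero.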

AVI, like many formats, suffers from decay due to the "works well enough" effect. In fact, just about any format in popular use will have this problem when the programs that read it don't do strict validation and the people who use those programs don't care about conformance. It's like having to deal with XML files where the angle brackets don't match because the person who wrote them figured that everyone uses regexes to parse XML anyway.

A while ago, I went hunting around for the expressions needed to implement Photoshop's layer blending modes. The basic ones like Lighter Color and Screen are easy to find or discern, but some of the lighting modes are harder to do. The best reference I've found so far is from a PDF blend mode addendum on Adobe's site:

I've been thinking of putting together a new desktop machine -- to replace the ancient Socket 754 AMD64 machine that is currently serving as a door stop -- and most likely it'd be Sandy Bridge based. The nice thing about this is that I'd then be able to experiment with Advanced Vector Extensions (AVX). Currently my main machine is a laptop with a Core i7, so the highest CPU instruction set I have available is SSE 4.2. Of course, when I actually looked at AVX again, I found out to my disappointment that it's floating-point only, like the original SSE was, and the AVX2 integer version won't arrive until a future chip, which pretty much torpedoed most of the ideas I had for using it.

Why not just switch to floating point?

Well, the main reason is that it would nuke the benefit of trying to use AVX in the first place, which is higher parallelism. AVX uses 256-bit vectors instead of 128-bit vectors, so it can process twice the number of elements per operation and thus get double the throughput. However, most of the data I work with is in bytes, so going to 32-bit floats means dividing throughput by four. Multiplying by two and dividing by four doesn't work in your favor. Then there are other reasons:
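The arithmetic is simple enough to spell out. This is just a back-of-the-envelope sketch in Python, with a made-up helper name, counting elements per vector and ignoring latency, conversion cost, and everything else:

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits

sse_byte_lanes = lanes(128, 8)     # 8-bit integers in a 128-bit SSE register
avx_float_lanes = lanes(256, 32)   # 32-bit floats in a 256-bit AVX register

# Elements per operation after the switch, relative to before.
relative_throughput = avx_float_lanes / sse_byte_lanes
```

That works out to 16 byte lanes against 8 float lanes, so the wider registers leave you at half the per-instruction element count you started with.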

There generally isn't much flexibility in conversion to and from narrow integer formats. SSE, for instance, only really wants to convert floats to/from signed 32-bit integers, and anything else is slow to deal with.

Floating-point operations often have higher latency.

It takes more memory and thus more memory bandwidth.

It's harder to safely manipulate addresses in floating-point math. (Not impossible, but the ice is thinner.)

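You can't use algorithms that require addition and subtraction to be exact and associative, like a running-sum moving average. (Floating-point addition is commutative, but a value added to the sum and later subtracted doesn't necessarily cancel out.)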

No free saturate on add/subtract operations.

No cheap average operation. (Hey, it's very important for some applications!)

You have to worry about NaN disease.
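The moving average point in the list above is easy to demonstrate. The sketch below is in Python, so the floats are doubles rather than the 32-bit floats a vector unit would use, and the sample values are deliberately extreme to make the effect obvious; with 32-bit floats the same drift shows up at much smaller magnitudes. The function name is made up for illustration.

```python
def windowed_sums(samples, window):
    """Running sum over the last `window` samples, updated incrementally.

    Each step adds the newest sample and subtracts the one that just
    left the window, which is only correct if those two operations
    cancel exactly.
    """
    total = 0
    out = []
    for i, x in enumerate(samples):
        total += x
        if i >= window:
            total -= samples[i - window]
        out.append(total)
    return out

big = 2 ** 60
exact = windowed_sums([big, 1, 1, 1], 2)                   # integer arithmetic
drifted = windowed_sums([float(big), 1.0, 1.0, 1.0], 2)    # floating point
```

With integers the last two sums come out as 2, as they should. With floats the small samples are rounded away while the large one is in the window, so the sum ends up at 0.0 and stays wrong from then on, because nothing ever recomputes it from scratch.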

It's definitely not just a question of switching to vector float types. That isn't to say there aren't advantages to going FP, of course:

Vector divide and square root. (I once thought about implementing Photoshop's Soft Light blending mode in fixed point, but I took one look at the blending equation and said screw it, floats it is.)

Automatic decent rounding on every operation. Managing error and rounding is a headache in most fixed point routines.

Generally easier and more straightforward implementation of algorithms.
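For a sense of why Soft Light is unpleasant in fixed point, here is a sketch of the formula as given in the PDF blend mode definitions, to the best of my recollection. Treat it as an assumption: Photoshop's own implementation may differ in detail. Both inputs are normalized to [0, 1], `cb` is the backdrop and `cs` is the source.

```python
import math

def soft_light(cb: float, cs: float) -> float:
    """Soft Light blend of backdrop cb with source cs, both in [0, 1]."""
    if cs <= 0.5:
        # Darkening half: no square root needed.
        return cb - (1.0 - 2.0 * cs) * cb * (1.0 - cb)
    # Lightening half: cubic near black, square root elsewhere.
    if cb <= 0.25:
        d = ((16.0 * cb - 12.0) * cb + 4.0) * cb
    else:
        d = math.sqrt(cb)
    return cb + (2.0 * cs - 1.0) * (d - cb)
```

A piecewise function with a cubic on one side and a square root on the other is a few lines in floating point. In fixed point, each branch needs its own scaling and rounding analysis, and the square root needs a table or an iterative approximation.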

AVX does appear to have some niceties for integer routines, like 3-argument syntax, but truth be told, I haven't had too many problems with excess register moves lately. It's a bit of a bummer to go from "yeah, this would probably run much faster with 256-bit vectors" to "hmm, I'd have to convert this to floats and then it would probably run slower." :-/

One of the primitives I always dread when writing a 2D library is a line drawing routine. You might ask why you'd write your own line drawing routine, but there are lots of reasons to do so. Perhaps you need an off-screen renderer in a private buffer, or you're working with a matrix that's something other than an image, such as a grid for line-of-sight raycasting. In any case, ask anyone what algorithm to use for drawing a line, and they're likely to say "Bresenham" -- a well-known and simple line drawing algorithm.
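For reference, a common all-octant integer formulation of Bresenham looks roughly like this. It's a sketch in Python with a made-up function name, it does no clipping whatsoever, and it's one of several equivalent ways to write the algorithm.

```python
def bresenham(x0: int, y0: int, x1: int, y1: int):
    """Yield the integer grid points on the line from (x0, y0) to (x1, y1)."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                  # combined error term for both axes
    while True:
        yield (x0, y0)
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:               # step along x
            err += dy
            x0 += sx
        if e2 <= dx:               # step along y
            err += dx
            y0 += sy
```

The whole thing is integer adds and compares, which is why it's the default answer. Note that it yields one point per step along the major axis, so the cost is proportional to the length of the line.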

The next thing that happens is that someone gives you coordinates outside of your grid, and your line routine happily crashes.

This means, of course, that you need to clip the lines. The cheesy way is to use a PutPixel() routine that rejects outside points. This works, and is more formally known as guard band clipping when done in moderation, but it has the downside that if someone hands you a billion-pixel-long line, your line drawing routine might take an awfully long time to complete. What you need is a clipping algorithm that gives the section of the line within the clipping bounds without actually stepping through the whole thing.
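Here's a small sketch of the cheesy approach in Python, with hypothetical names, to show where the time goes. The stepper is an ordinary integer Bresenham loop; the only "clipping" is a bounds check at the point where the pixel would be written.

```python
def line_points(x0: int, y0: int, x1: int, y1: int):
    """Integer Bresenham stepper; yields every point on the line, clipped or not."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield (x0, y0)
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def draw_line_guarded(width: int, height: int, x0: int, y0: int, x1: int, y1: int):
    """Return (plotted points, steps taken) for a width x height grid."""
    plotted = []
    steps = 0
    for x, y in line_points(x0, y0, x1, y1):
        steps += 1
        if 0 <= x < width and 0 <= y < height:   # the PutPixel() rejection test
            plotted.append((x, y))
    return plotted, steps
```

Drawing from (0, 3) to (100000, 3) on an 8x8 grid plots 8 pixels but takes 100001 steps to do it. Nothing crashes, but the work scales with the length of the line rather than the size of the grid.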

Now, ask or search for a line clipping algorithm, and you'll invariably get an answer like Cohen-Sutherland... which unfortunately is wrong. Okay, it's not actually wrong -- it's a valid algorithm, your routine won't scribble outside of the grid and crash anymore, and it'll draw something that looks like a clipped line. Take a closer look, however, and you'll find that it doesn't clip lines quite right. Here's why: