Well, yes, it's awesome, but it's clearly within the limits of what a C64 can do: 4-bit audio, a low sample rate, and quite possibly some of the effects precalculated - e.g. breaking the sound into frequency bands ahead of time and interleaving them on disk, then combining them to play back is a lot easier (just adds + averaging) than doing a realtime FFT.
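To make the "adds + averaging" point concrete, here's a toy sketch (my own illustration, not anything from the demo) of how precomputed bands could be mixed at playback time with no FFT in sight:

    import numpy as np

    # Toy illustration: the expensive band-splitting happens offline;
    # playback is just an add and an average per sample.
    rate = 4000
    t = np.arange(rate) / rate
    low_band = np.sin(2 * np.pi * 220 * t)    # precomputed; read from disk in reality
    high_band = np.sin(2 * np.pi * 1760 * t)  # precomputed, interleaved alongside it

    playback = (low_band + high_band) / 2     # cheap enough for a 1MHz CPU, in principle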

But yes, mad props for the effort.

Some other C64 coders do "remakes" of Amiga and PC demos. Artistically, they look like the same effects those big-box demos did, but they only have a fraction of the polygons or screen size, so it falls within the limits of what a 1MHz CPU can do (with some genius coding). The big trick is forcing the comparison: your brain will remember the old Amiga or PC demo and fill in the gaps so the C64 doesn't have to :)

quite possibly some of the effects precalculated - e.g. breaking the sound into frequency bands ahead of time and interleaving them on disk, then combining them to play back is a lot easier (just adds + averaging) than doing a realtime FFT.

Given that this appears to be his previous "C64MP3", just with some controls added into the playing app, it's even less than that.

First, yes, there's precalculation going on. It's a series of matlab files that takes gigs of RAM and forever to run.

Then, it uses two voices of a SID. The first voice does the coarse envelope and fundamental frequency by setting the volume and pitch on a SID channel, and harmonics are done by loading one of 256 pre-calculated PCM samples into the voice. They're all the same length, but by adjusting the pitch on the channel, the fundamental and harmonics go up in the same measure. The second voice is just a white-noise channel, an attempt at dealing with percussion, sibilants, etc. The player for all of this was released in February of this year. Really, if you think about it, it's more akin to a SID-assisted MOD player than an MP3 player, plus an offline encoder that turns an audio file into a high-tempo "MOD-alike" file that this app can play. Not too surprising, considering it was written by a guy with extensive experience with MODs.
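If I've read that description right, the playback loop is little more than table lookups and register writes. A rough Python sketch of the idea (all names are hypothetical; the real thing is 6510 assembly):

    from dataclasses import dataclass

    @dataclass
    class Voice:
        waveform: bytes = b""
        pitch: int = 0
        volume: int = 0
        noise: int = 0

    def play_frame(frame, voice1, voice2, waveforms):
        wf_index, pitch, volume, noise_level = frame
        voice1.waveform = waveforms[wf_index]  # one of 256 fixed-length PCM samples
        voice1.pitch = pitch                   # fundamental; harmonics scale with it
        voice1.volume = volume                 # coarse envelope
        voice2.noise = noise_level             # white noise for percussion/sibilants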

Now, look at the new demo. In the first couple of minutes you've got speed adjustment (leaving pitch the same), pitch-shift (leaving speed the same), bass boost and some basic EQ stuff. (I couldn't tell by ear what the "autotune" one is supposedly doing.) Is any of that "realtime DSP"? No. Speed adjustment just changes the speed of running through the samples (leaving the SID running at the same settings for more or less time before moving to the next set). Pitch adjustment just adds a constant to the pitches that are sent to the SID. The bass boost and EQ stuff uses the high/low/bandpass filters already on the SID.
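A sketch of why those two effects are nearly free in this scheme (same hypothetical frame format as above): speed only changes how fast you step through the frame list, and pitch-shift only adds a constant before the value reaches the SID.

    def adjust(frames, speed=1.0, pitch_offset=0):
        out = []
        pos = 0.0
        while pos < len(frames):
            wf, pitch, vol, noise = frames[int(pos)]
            out.append((wf, pitch + pitch_offset, vol, noise))
            pos += speed  # >1.0 = faster tempo at the same pitch; <1.0 = slower
        return out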

Next: echo, "tube distortion", a quantizer ("Grungelizer"), a volume compressor, dither, and master volume. Okay, to be honest, I have no clue what trick he's using for echo. Distortion is probably done on the SID. The quantizer is just doing a couple of bitshifts on the PCM samples as they're streamed to the SID. The compressor just works on the envelope volumes, rather than processing any of the actual samples. The dither probably just programs the spare voice to do a little low-level noise. And volume, again, just has to multiply the envelope volumes.
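To make the quantizer point concrete: crushing sample resolution really is just a couple of bitshifts, which costs only a handful of cycles per sample on a 6510. A sketch (mine, not the demo's code):

    def grungelize(sample, bits_dropped=3):
        # Zero out the low bits: two shifts, no multiplies anywhere.
        return (sample >> bits_dropped) << bits_dropped

    assert grungelize(0b10110111, 3) == 0b10110000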

So basically, as with any demo, the secret is to cheat. In some sense, yes, it's doing what it looks like it's doing -- the audio is there for you to see. In another sense, it's not doing anything like what you assume it's doing. It's doing things that are a lot cheaper and dirtier.

Edit: Download links, for both the older "C64MP3" demo and the new "Cubase64" one. I haven't read the Cubase64 source yet -- time for me to get some sleep -- but I will.

A long time ago, I almost did a realtime Fourier-transform music spectrum analyzer on a Z80 processor - the one used in the Sinclair Spectrum computers, not very different from a C64. It's totally doable, with careful low-level programming and careful crafting of your algorithms.

1. compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;

2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and

3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks.

To be able to handle this in real-time, steps 1, 2 and 3 are done already in the audio compression step, and so we can handle time stretch on-the-fly. Job done.
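For the curious, here's a minimal numpy sketch of those three steps for time-stretching. It's not the demo's actual code and it skips the phase bookkeeping a real phase vocoder needs, so it will sound rough, but it shows where the heavy math lives (and why you'd want to precompute it):

    import numpy as np

    def stretch(x, factor, n=1024, hop=256):
        win = np.hanning(n)
        frames = [np.fft.rfft(win * x[i:i + n])
                  for i in range(0, len(x) - n, hop)]       # step 1: forward STFT
        out_hop = int(hop * factor)                         # step 2: resample the blocks
        y = np.zeros(out_hop * len(frames) + n)
        for k, f in enumerate(frames):                      # step 3: inverse + overlap-add
            y[k * out_hop:k * out_hop + n] += win * np.fft.irfft(f)
        return y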

And how's the compression done?

For the lower 4 kHz of the audio playback, the encoding process takes some 25 minutes to run through, using a state-of-the-art PC. It works like this:

Find the fundamental frequency of the sound. For a human singing voice, this equals the note you'd play on the piano. Or something in between notes on the piano.

Resample the complete 2-minute song into a constant-pitch audio sample. This sounds really strange, since the tempo of the song is lost, and the voice is a robotic one-note-song.

Extract some 15000 small pieces of this song which we now will call "formant waveforms". These waveforms are actually loopable, since we have chosen a fixed frequency for all of them.

Compare all formant waveforms, and find out which of them sound the same.

Remove formant waveforms that are similar until there's only 255 of them left.

Make a couple of lists with this information:

Which formant waveform shall we play now?

At which fundamental frequency shall we play it?

And with what volume?

This sounds pretty straightforward - but it does not solve the problem. We only had some 50kB of memory left for audio data. To start with, we want to choose a formant waveform size. A female voice does not contain any frequencies below 150Hz. That's why we love them, isn't it? So, let's choose a lowest frequency of 150Hz.
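The pruning step a few paragraphs up ("remove formant waveforms that are similar until there's only 255 of them left") might look something like this greedy merge. This is my guess at the shape of it, not the actual Matlab; note it's quadratic per pass over 15000 waveforms, which fits the "gigs of RAM and forever to run" description:

    import numpy as np

    def prune(waveforms, keep=255):
        wfs = [np.asarray(w, dtype=float) for w in waveforms]  # all the same length
        while len(wfs) > keep:
            # find the closest pair by mean squared difference...
            i, j = min(((a, b) for a in range(len(wfs))
                               for b in range(a + 1, len(wfs))),
                       key=lambda p: np.mean((wfs[p[0]] - wfs[p[1]]) ** 2))
            del wfs[j]  # ...and drop one of the near-duplicates
        return wfs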

In fact, it makes me think this is the C64-audio equivalent of what State of the Art and Nine Fingers did for video - it was full screen video of dancers, but instead of sampling pixels and trying to compress them, they converted them to vector graphics.

The FFT stuff is not done on the 6510, so it has nothing to do with the 1MHz of the C64. It clearly says so in the quote.
You guys (kyz and malefic_puppy) are probably aware of that, but just to prevent people from getting the wrong impression.

Fair enough, but what if someone were to make a demo like this for a modern PC, using and optimizing for every bit of power the computer had? Though that would take so much planning and time, I'm not even sure anyone could do it.

To be honest, 95% of that is "what modern games are". Sure, you're not optimizing for every bit of power, but you are optimizing for the vast majority of what's cost-effective to do.

If you spent an extra three years and three hundred programmer years, you could maybe make a game that works at twice the speed at most, but of course you could have done the same thing by just waiting a year and a half for computers to get twice as fast.

Pex and I were co-workers (before the startup we were at folded), and yeah - he's a brilliant guy. Finished his bachelor's and master's in three years. As with all true übernerds, he's a great guy to be around and extremely friendly and talkative. He has a great sense of work-life balance, didn't sweat the crap a corporate environment inevitably puts on you, and only worked 4 days a week too. :)

Damn, I miss that place - all the people were brilliant. Pex was certainly a bright one, but the rest of the people weren't bad either. Too bad putting a bunch of smart people in a room and making money are two very different things. :)

It's considered an audio-codec test tune due to the fidelity of the voice. It's one of those things where it gets really easy to hear audio compression artifacts, and it was one of the original trial sounds for the MP3 codec back in the day.

Never mind "cymbal hiss" or "drums sounding thin" or for that matter a trumpet in jazz, this track is hell to get properly compressed.

Didn't know the fun fact, but yeah. I read an article with the original developer discussing it, basically saying they had been quite happy with MP3 until he heard that song on the radio, decided to try it, and realized it sounded like shit - it was nearly back to the drawing board.

SVT (Swedish television) made a series of documentaries called "Hitlåtens historia" (Story of the hit song) which were excellent. I think it was Suzanne Vega - Tom's Diner, Metallica - Enter Sandman, Roxette - The Look and Jay-Z - Hard Knock Life. The Tom's Diner one contained an interview with the Fraunhofer guy.

Go ahead. Voice fidelity is a hard thing to get right, it seems. We're quite sensitive to changes in this area, which means that most codecs are specifically tuned to give it more bits of data, simply because they cannot compress it properly.

The guy wrote a modern software DSP (digital signal processor) on ancient hardware, and did it in real time. He was showing how the shit he was able to do on the ancient hardware wasn't readily available until years after said hardware came out.

Real-time digital signal processing involves running somewhat heavy math on thousands of numbers with a strict time budget. The result is adding effects to sound -- like echo, autotune or an equalizer.

This demo is impressive because commercial DSP programs normally require high-end machines and gobs of memory. Meanwhile, the C64 only runs at 1 MHz and only has 64K of RAM. Playing 4 kHz sound means it only has about 250 cycles to work on each sample (1,000,000 cycles per second / 4,000 samples per second), plus it has other work to do, like drawing the UI. That's not a lot.
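For scale, here's the textbook delay-line echo -- a per-sample loop with one multiply-accumulate, which is exactly the kind of thing that has to fit in that ~250-cycle budget (and part of why the commenter above suspects the demo's echo is faked some cheaper way):

    def echo(samples, delay=1000, gain=0.5):
        out = list(samples)
        for i in range(delay, len(out)):
            out[i] += gain * out[i - delay]  # one multiply-accumulate per sample
        return out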

To be fair to the pros, though: the demo does cheat as much as physically possible. That's really what makes it impressive -- the creativity of the cheats.

Well, once upon a time I was a very geeky kid. I had a favourite fantasy author - Tamora Pierce. In her books there was a certain knight in the background, whom I thought was essential to many of the things happening in the two series. His name was Raoul of Golden Peak; translated to Danish, 'peak' would become 'lake'.

That's my reason. And why Silkeborg? Well, the band Alphabeat originated from Silkeborg, and they met and connected through Silkeborg Gymnasium. ;)

The C64 doesn't have built-in support for sample playback. You play back samples by playing all kinds of nasty tricks (the basic method is to change the volume a few thousand times a second, which causes an audible "click", thus giving you one-bit samples) that use a significant percentage of the total available processing power.
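The refined version of the trick writes 4-bit values into the SID's master volume register ($D418) at the sample rate. A sketch of the idea (the register address is real; the sid_write helper is made up):

    def play_via_volume_register(pcm8, sid_write):
        for s in pcm8:                 # 8-bit unsigned samples
            sid_write(0xD418, s >> 4)  # keep the top nibble as the 4-bit volume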

Doing pretty much anything to manipulate the samples being played back without significantly affecting the sample quality is impressive.

(And for the record, the C64 doesn't have any opcodes for multiplication, only shifts and adds, so doing large numbers of multiplications is extremely costly.)
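For anyone who hasn't done it: here's the shift-and-add algorithm in Python, which maps almost line-for-line onto the 6502's shift (ASL/LSR) and add-with-carry (ADC) instructions:

    def mul_shift_add(a, b):
        result = 0
        while b:
            if b & 1:        # low bit of the multiplier set?
                result += a  # add the shifted multiplicand
            a <<= 1          # shift multiplicand left
            b >>= 1          # shift multiplier right
        return result

    assert mul_shift_add(13, 11) == 143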

Nope. I remember how amazed I was when I got my first Amiga, and had the luxury of programming a M68k with all kinds of fancy instructions and addressing modes, including multiplication.

Though even on early M68k and x86 CPUs, using built-in multiplication was to be avoided if at all possible. On the 68000, a multiplication took a minimum of 70 clock cycles, compared to a minimum of 6 or 8 for an addition - anything that could be expressed as less than a handful of adds/shifts between registers would be quicker to do that way.

One of IBM's last decimal machines (the internal format for numbers was BCD, with sign and parity bits) was the 1620, whose nickname was CADET ("Can't Add, Doesn't Even Try") because addition was accomplished using lookup tables rather than adders (similarly for subtraction and multiplication), and there was no DIVIDE instruction at all; division was done in software.

Now, machines before the 1620 did add with hardware adders, but don't think that any of the basic arithmetical operators is fundamental to having a working computer.

To bring this back around to 'playing music on computers that should be about as musical as a dead possum', here's something the 1620 could do:

There used to be a program for the 1620 that worked like this: you put an AM radio on the CPU console and tuned it for the loudest noise. (They generated a lot of random RF noise that could play havoc on nearby electronic equipment.) Then you fed in a deck of cards with the program. The radio would play "Stars and Stripes Forever" and the line printer would play the drum rolls.

I read this title without seeing the comma in-between "Fuck" and "this". It made way for a completely different perspective while reading this article. After watching the video, I was like, "Yeah, fuck this guy!"

Cool. I remember back in 1985 I typed in a program on my C64 to give me digital audio. It could play back music from a standard music cassette put in the tape drive. The secret was to send the digital audio signal through the 4-bit volume register fast enough to cause the speakers to vibrate. Voila: 4-bit digital audio.

There was a study about how many hours are wasted every day in offices waiting for Windows to start up, the user to log in, Outlook to start, Word and Excel to start, printing to finish...

Just take a look at syscall traces (strace in Linux, Process Monitor for Windows). Most of the time is wasted waiting on I/O, loading DLLs, checking for registry keys (or checking ld.so.conf or searching for special files in /etc) on each DLL (.so in the Linux world) load, paging things in and out of memory, opening directories and enumerating their contents, etc.

And don't even get me started on bad default timeouts. Who the fuck came up with a more-than-5-second timeout for network discovery in Windows Networking?

Yes, processing power is cheap, but PCs are very complex systems. (And that's a semi-good argument for why things-that-should-be-simple turn out to be abominations mentioned above.)

I can write an app that connects to a machine using HTTP over TCP/IP, grabs an XML feed, parses it, downloads a file, and then plays its audio in a manageable amount of code using Python or C++/Qt. Doing the same with assembly is just insanity. Heck, even doing that with C is a pain in the ass.

The more abstraction levels that our computers deal with, the more abstracted programming languages we need to maintain ease of implementation. Otherwise, you're just spending your time repeating work - time which could be better spent working on what makes your app unique.

But your app does none of those things, it's all 3rd party libraries doing the work and you're little more than a code monkey stitching it all together.

I'm not trying to be offensive; it's just that I've been doing that shit for years, and I find it a soul-destroying activity that prevents me from really understanding the machine.

Another thing: too many people confuse libraries with languages. While it would still be easier to write all the stuff in a higher-level language than assembly language, if all you're doing is calling into these libraries, that's not much more difficult in assembly than in any other language.

Try to write a program in your favorite language without calling into any libraries, then try writing a program in assembly with full access to any and all libraries, then try writing both without any access to libraries. I think you'll find that it's not so much the language, but the combined efforts of other programmers you're getting the productivity increases from.

The thing I've found with assembly is that it encourages a roll-your-own mindset for just about everything, and few people have the patience for that. But to say those things are insanity in assembly is true not because the language is hard (it's actually pretty easy), but because there are far fewer third-party libraries out there written in assembly for assembly programmers, so you tend to start off reinventing the wheel.

Hell, with a proper selection of macros, you won't even know you're programming in it.

But your app does none of those things, it's all 3rd party libraries doing the work and you're little more than a code monkey stitching it all together.

And it's thanks to this abstraction that I can create useful products in a reasonable time frame. I don't have to be an expert in DSP and generate my own MP3 decoding scheme, someone else knows it better than I ever will, and implemented a good library for it. I don't have to worry about the ins and outs of TCP/IP datagrams, someone else has taken care of that in another library. I don't have to concern myself with UTF-8 vs. ASCII parsing, that's been handled too.

There's nothing wrong with using abstractions in order to make your job easier. The average home computer isn't a simple device whose sole concern is perhaps a lone RS232 peripheral with a disk drive and text-based interface anymore.

It's now this tool that does advanced 3D graphics, with translucent and animated GUIs, that networks with other computers through several levels of abstraction (physical -> ethernet -> TCP/IP -> bittorrent). To do all of this from scratch in assembly is to understand every single aspect all the way through, and I don't think there's enough hours in a lifetime to cover that. Nor is it even a reasonable proposition.

You yourself are not immune from your philosophy either. What are macros except a level of abstraction above, in order to make your job easier? Oh sure, the end product is all in assembly, but you could say the same of any language that compiles into machine code.

In any case, I'm a proponent of the right tool for the right job. I have nothing against assembly, but I think its only value these days is for embedded systems and things like microcontrollers, where clock cycles and memory are at a premium. For home computers, you might as well use Python. It's not like you'll even notice a performance difference for most uses.

And it's thanks to this abstraction that I can create useful products in a reasonable time frame. I don't have to be an expert in DSP and generate my own MP3 decoding scheme, someone else knows it better than I ever will, and implemented a good library for it. I don't have to worry about the ins and outs of TCP/IP datagrams, someone else has taken care of that in another library. I don't have to concern myself with UTF-8 vs. ASCII parsing, that's been handled too.

This is true. I'm not one to denigrate the code monkey since I myself mostly am a code monkey. But sometimes you want more than that. I could not fathom continuing what I'm doing as a business programmer for the rest of my life and continue to call it programming. It's the difference between hand crafting your own car and taking your spot on the assembly line, knowing a very limited aspect of what you're doing, but doing it well, though you might not understand anything other than your place in the big picture.

There's nothing wrong with using abstractions in order to make your job easier. The average home computer isn't a simple device whose sole concern is perhaps a lone RS232 peripheral with a disk drive and text-based interface anymore.

And maybe that's striking close to where we went wrong. Why can't we have standards that allow hardware to be programmed easily? Take graphics drivers, for instance. Back in the days of DOS it was mind-numbingly simple to get into mode 13 and, from there, it was just tossing data into memory. While BIOS wasn't the speediest option around, it wasn't difficult. Even setting that up by hand was relatively easy.

Today everything is different. Hardware manufacturers close off all the programming details of their hardware and release drivers that only work under certain operating systems, when there really is no need for it. I don't consider this an improvement by any stretch of the imagination. All hardware really should have its own free, independent layer upon which other things can be hosted and all functionality is exposed, and all hardware should have open specifications in the same way Intel or AMD instructions are open. Assembly language would be more accessible in such an environment.

This was the goal of an operating system, but in my mind an operating system should just be a thing that abstracts the hardware, not fancy window-manager GUIs and gobs of applications. Obviously those have a place too, but it would be nice to be able to escape them when one wishes. I used to have a PC that had an assembler and debugger built into the BIOS. You'd just press a key during boot and there you were. In less than a week you could have a FORTH written well enough to do useful stuff, or even the beginnings of your own window manager. What happened to that? Why do I have to spend weeks or months drilling down from within Windows to get to the hardware?

It's now this tool that does advanced 3D graphics, with translucent and animated GUIs, that networks with other computers through several levels of abstraction (physical -> ethernet -> TCP/IP -> bittorrent). To do all of this from scratch in assembly is to understand every single aspect all the way through, and I don't think there's enough hours in a lifetime to cover that. Nor is it even a reasonable proposition.

Maybe not, but it's damn fun and useful to know. Most programmers I know these days spend a shitload of extra time finding and fixing bugs in their code because they throw their hands up once they see assembly language while debugging. It's sad too, because it's not that hard to understand.

You yourself are not immune from your philosophy either. What are macros except a level of abstraction above, in order to make your job easier? Oh sure, the end product is all in assembly, but you could say the same of any language that compiles into machine code.

This is true. Again, I'm not going to denigrate abstraction. I think abstraction is good, but I do think it could be made easier to peel the onion, so to speak, if one finds they must go deeper.

In any case, I'm a proponent of the right tool for the right job. I have nothing against assembly, but I think its only value these days is for embedded systems and things like microcontrollers, where clock cycles and memory are at a premium. For home computers, you might as well use Python. It's not like you'll even notice a performance difference for most uses.

The educational value of assembly is not to be underestimated either. I think everyone should learn assembly. I too am a proponent of the right tool for the right job. My beef is that too many people are unaware of what tools are at their disposal to begin with. They don't know that assembly language is a laser-precision instrument; they just look at it and it scares them, so they toss it back in the box and stick with their hammers and shovels, ignoring any and all opportunities for surgery on the CPU because the tools they have are "good enough". They don't really know how "good" good can be.

It's the difference between hand crafting your own car and taking your spot on the assembly line, knowing a very limited aspect of what you're doing, but doing it well, though you might not understand anything other than your place in the big picture.

I would argue that the notion of the big picture is relative. For one person, that might consist of understanding exactly what the computer is doing and how it is able to do its magic.

For another, the priority might be elsewhere. It might be how to create the best algorithm to provide the most accurate results for processing geophysical data. In which case, going down to such a low level is actually a liability - as it requires a higher amount of code to get the algorithm running (esp. if more complex operations such as FFTs get involved.) The more code you have, the more opportunity there is for errors. More coding costs time. Errors cost time. Time that could be invested in making the algorithm better.

The educational value of assembly is not to be underestimated either. I think everyone should learn assembly. I too am a proponent of the right tool for the right job. My beef is that too many people are unaware of what tools are at their disposal to begin with. They don't know that assembly language is a laser-precision instrument; they just look at it and it scares them, so they toss it back in the box and stick with their hammers and shovels, ignoring any and all opportunities for surgery on the CPU because the tools they have are "good enough". They don't really know how "good" good can be.

I agree: assembly is highly educational. Necessary? Well I have a tougher time arguing for that. Programming is getting so easy that it's becoming an incidental tool of people's jobs. Sort of like a calculator. Do you need to know how it works inside to make full use of it? I'm sure it helps, but I wouldn't say it's necessary.

Certainly, I haven't found much of anything to disagree with in this discussion. Maybe it's not necessary, but my knowledge of assembly has saved my ass, where others gave up, more times than I care to remember.

Obviously it's necessary for someone; i.e. the people who write the software on which the other layers of software are written; people who will inevitably make mistakes, and people you'll have to wait on to fix things because you don't know how to fix them yourself. I have a lot of application patches written in assembly at work that fix bugs in certain closed-source apps, some of them no longer actively maintained and would take far too long to rewrite.

This skill is an example of where knowledge of assembly is useful if you want to get things done fast, or at all; waiting for fixes that may never arrive can be a showstopper and hold up development. Not knowing how to properly debug an application with modules that have no available source will slow you down as well, and if you learn assembly, the added bonus is that everyone thinks you're awesome for being able to fix things that everyone else in your department says can't be fixed.

I will never lose my job because I have something no one else there does.

An alternative perspective is to view the existence of libraries as the equivalent of "standing on the shoulders of giants". All of science and mathematics is about extending what has already been created. Doing everything from first principles is ultimately infeasible.

Yes, but there's still the trend of not teaching people how to be efficient any more. It's a disease of cheap hardware, where it's easier to just throw another box at the problem than optimise that bit of code.

Most schools have computational complexity classes, where students learn Quicksort, heapsort, merge sort and whatever. And that's great. But they forget that we're not programming Turing machines, and most of the time what matters is data access.

Oh, it did, but it failed to mention anything remotely useful for, you know, complex software systems engineering. (Hm, maybe we had a course about design patterns and project management. Everyone hated it; the teacher was just a typical boring-to-death competition winner.)

It's going to change: Performance per core is increasing slowly enough now that people need to get better at either parallelization or optimizing single core performance, and parallelization is hard for many problems.

Modern compilers can outperform human assembly programmers. There are also cases where just-in-time compilers do a better job than standard compilers because they can use run-time conditions as part of their processing.

It's only power if it's being used. If we all spent the amount of time this guy did to get applications to work under absurd tight constraints, we would truly be wasting the power. 99.9% of the practical applications would never be completed to use any processing power. Now that would be shameful.

That's true. On the other hand, most programmers these days are barely competent enough to be allowed near a keyboard. I keep banging my head against the wall whenever I need to debug some performance problem, strace/ltrace an application and instantly see that some moron is causing 3-4 times as many context switches as needed, for example, or does something that wastes ridiculous amounts of memory.

Most of the time an app would be far less resource-hungry if the people who wrote it had the faintest clue what they were doing, or at least spent an hour or two actually reading up on the subject. (Biggest pet peeves: any network client library that does small reads instead of doing non-blocking reads into a larger buffer - if you ever find yourself giving single- or double-digit byte lengths to read(), for example, I probably would want to break your legs; same thing if you do a large number of small malloc() calls without a damn good reason, and that damn good reason better include an explanation of why a pool/arena allocator wouldn't make a difference.)

Some of my biggest hates (though it's been a few years since I've looked at them, so hopefully they've improved): T1lib (used to do hundreds or thousands of malloc(4) calls and similarly ridiculous sizes in cases where it could allocate an array and dish out pointers from it), the MySQL client libraries (tons of 4-byte read()s for no good reason, and that's before I start bitching about the lack of async support), Postgres/pg_dump (who thought it was a good idea to read a whole table into memory, then write it to disk, then read it back from disk and send it over a socket when you want to dump a table, instead of just streaming the data straight to the socket? Seriously...)

Leaky abstractions. They probably don't want 4 byte read()s, but they do want to say stream.getInt(), and they designed the wire protocol such that they can't say ahead of time how many bytes the entire query/row/block will take, so the person who implements "stream" can't take any chances and read() more bytes than immediately needed, because he might ask for too many and end up blocking the application, or even if he did it async, he'd have to wait for an arbitrary time-out of the final block so he can get the straggling bytes.

They probably don't want 4 byte read()s, but they do want to say stream.getInt()

Then they should implement a stream class / set of functions / whatever, not assume that read() will do what they want without a massive performance overhead when even the most basic testing will tell them otherwise.

and they designed the wire protocol such that they can't say ahead of time how many bytes the entire query/row/block will take, so the person who implements "stream" can't take any chances and read() more bytes than immediately needed, because he might ask for too many and end up blocking the application or even if he did it async,

Here's the pattern to follow to avoid that problem:

Create a buffer.

On each read request, first copy whatever you can from the buffer.

If the buffer does not have enough data to fill the entire request, refill the buffer with select() + read(), or read() on an async socket, depending on requirements. Repeat until the request has been satisfied, possibly with data remaining in the buffer. (As an optimization, you can handle larger read requests directly into the target memory area instead of into the buffer.)

So instead of read() on a raw socket you call your own little library function that encapsulates the buffering and ensures you never call read() with small buffer sizes.
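Here's a minimal sketch of that pattern in Python rather than C (blocking recv() for brevity, where the C version would use select() + read(); the class and method names are mine): callers get their stream.getInt() convenience, and the kernel only ever sees big reads.

    class BufferedReader:
        def __init__(self, sock, bufsize=65536):
            self.sock = sock
            self.bufsize = bufsize
            self.buf = b""

        def read_exact(self, n):
            # Serve requests from the buffer; refill in large chunks.
            while len(self.buf) < n:
                chunk = self.sock.recv(self.bufsize)  # one big syscall, not many tiny ones
                if not chunk:
                    raise EOFError("connection closed mid-message")
                self.buf += chunk
            data, self.buf = self.buf[:n], self.buf[n:]
            return data

        def get_int(self):
            # The stream.getInt() convenience from upthread, now costing a
            # memory copy instead of a 4-byte syscall. (Big-endian assumed;
            # byte order is a protocol detail.)
            return int.from_bytes(self.read_exact(4), "big")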

Any serious library that encapsulates sockets does, or should do, the equivalent of this. The problem is generally programmers who think they know what they are doing and call the low-level APIs directly without understanding the costs.

he'd have to wait for an arbitrary time-out of the final block so he can get the straggling bytes.

No, that's not how it'd work. You request whatever is available now, either with select()/poll()/epoll() + read() or by setting the socket async (depending on what fits with the rest of your app). If the buffer doesn't fill up totally, that doesn't matter, as long as you get enough to fulfill the request. Otherwise you go into another cycle. This uses slightly more resources when the app is idle, but it's far more efficient when network traffic is high.

For the final block, the data the app is requesting will arrive, mark the socket as readable, and get read; the read() will just complete with far less than the full buffer filled.

However, for earlier data blocks, assuming the sending app isn't retarded, the read()s will succeed with much larger chunks of data.

Double-digit percentage increases in throughput are routine when fixing apps that do small reads. I've had apps triple throughput by fixing stuff like this.

Double-digit percentage increases in throughput are routine when fixing apps that do small reads. I've had apps triple throughput by fixing stuff like this.

Word.

I write unarchiving software and I regression test it by comparing MD5sums of unarchived files to known good MD5sums.

I had an 80,000% speedup when, instead of writing the unpacked data to disk (file by file, just using the same filename) and then MD5summing that file, I simply made the unpacker feed an MD5 routine straight from memory.

I would have thought that the data wouldn't even touch the disk seeing as it was such a temporary file. I was wrong. Linux was actually ensuring it got written to the HDD, which was the main bottleneck, and no amount of profiling told me that - just common sense.
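The fix, in miniature -- Python's hashlib standing in for whatever MD5 routine the tester actually uses; feed the hash from memory and the disk never gets touched:

    import hashlib

    def checksum(chunks):
        md5 = hashlib.md5()
        for chunk in chunks:   # chunks come straight from the unpacker, in memory
            md5.update(chunk)  # no write(), no temp file, no disk at all
        return md5.hexdigest()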

Touching the HDD when you don't have to is really bad. Touching the HDD on a network share when you don't have to... and being careless, inexperienced or even an idiot doesn't explain it and I start wondering if it is a malicious attempt at job security.

I was once called in to fix a broken app. It processed a medium amount of data (a few hundred thousand records) and output about 10GB of generated PDFs. This ordinarily took about 4-5 days. The last batch was broken somehow (nobody could tell us why, and the original developers were unavailable), and we had two days to make it better, with lots of money at stake. It took us about a day to find out what was wrong, and that it was impossible to salvage anything, which meant we had to run the entire thing again - but in just one day this time, so we had to make it fast.

One of the things the app did was generate barcode images (for printing reasons, there were only about 400 distinct ones), which were saved to an NFS share (overwriting any old image if it was there), only to be read back again right away using a different library, then reading a PDF from the same NFS share, stamping the image onto the file, and writing it back. By not saving the images (using a regular byte array to convert between the two inane libraries), and grafting this process onto the PDF generation step so we saved another unnecessary store and read, we reduced the time of that part from about 36 hours to 5 seconds.

Processing power is a cheap commodity these days. Human power is still at a premium so why waste it doing assembly when the programming job can get done much faster and less buggy by using a high-level language?

I've read pretty much all of this subthread, and even though I agree that we as human programmers require abstraction layers in order to get things done on time, for maintainability, to avoid reinventing the wheel, etc., I still think that most likely we are not taking advantage of the true potential of the hardware we have today. It's difficult to know what we could accomplish if we squeezed the maximum potential out of a modern computer. Maybe the problem resides in our humanity. Maybe computers should be programming themselves.

There's no state tracking. I would optimize the mainswitch with some Input/Output, and then run the signal through some binary-adapted pitch modulators. That would get the job done using half the amperes. Come to think of it, this would also obviate the need for either packing OR unpacking the frame.

This particular demo is interesting because so much of the work happened offline, on a fast modern PC. The data that he's 'playing' is a hyper-efficient representation of the song that couldn't have been created in the 1980s, unless you had the keys to the Cray-2 at the NSA.

The problem of how to trade offline processing time for improved realtime performance is an interesting one to tackle, if you're involved in DSP engineering work. The fact that his work led to the ability to auto-tune a voice on a Commodore 64 is almost beside the point.