I wrote some code to re-implement random sound effect pitch-shifting in Doom,
specifically for the Chocolate-Doom project. This page is just an info-dump of the
notes and details that I made along the way.

Measurements

A big part of the project was measuring the behaviour of the pitch-shift in old
versions of Doom. One approach to this would be to run it through a disassembler
and trace the machine code. I know some people are using this approach to figure
out how the OPL code works.

I took a different approach: knowing a bit about the pitch calculations from the
Linux Doom source code, I made small modifications to the EXE to make the
random number generator predictable. I also modified the game data so that every
sound effect was replaced with a single, one-second sine wave, tuned to middle C.
I then started the game up, triggered some sound effects and recorded the output,
so that I could compare the result against the input sine wave.
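
As a rough illustration of the test tone and the comparison step, here is a minimal sketch. The sample rate of 11025 Hz, the unsigned 8-bit format and the function names are assumptions made for the example, and the zero-crossing count is just one simple way to compare a recording against the input; it is not necessarily how the measurements on this page were made.

```python
import math

SAMPLE_RATE = 11025   # assumed sample rate for the test tone
MIDDLE_C_HZ = 261.63  # approximate frequency of middle C


def sine_samples(freq_hz=MIDDLE_C_HZ, seconds=1.0, rate=SAMPLE_RATE):
    """Return unsigned 8-bit PCM samples (0..255) for a sine tone."""
    count = int(rate * seconds)
    return bytes(
        int(round(127.5 + 127.5 * math.sin(2 * math.pi * freq_hz * n / rate)))
        for n in range(count)
    )


def estimate_freq(samples, rate=SAMPLE_RATE):
    """Estimate frequency by counting upward crossings of the midpoint (128)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 128 <= b)
    return crossings * rate / len(samples)


def pitch_ratio(recorded, reference, rate=SAMPLE_RATE):
    """Ratio of the recorded tone's frequency to the reference tone's."""
    return estimate_freq(recorded, rate) / estimate_freq(reference, rate)
```

Dividing the estimated frequency of a recorded effect by that of the original tone gives a ratio that can be compared across pitch values.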

switch.wad: a single sound effect, as with sine.wad, but replacing SWITCH (a Heretic sound effect).

moan.py: opens doom1.wad (expecting the 1.2 shareware IWAD) and replaces all directory entries for sound effects with one pointing at DSPODTH1.

sine.py: opens doom2.wad and reads out all the sound directory entries, then opens sine.wad and reads all its data. Writes out sine2.wad, mapping every doom2 sound effect to the one sound effect in sine.wad.

sizesfx.py: opens doom2.wad and prints out how many bytes are used by sound effects.
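
For a flavour of what a script like sizesfx.py involves, here is a minimal sketch that walks a WAD directory and totals the sound effect lumps. It is a reconstruction for illustration, not the original script: the function name is made up, and treating every lump whose name starts with "DS" as a digital sound effect is an assumption.

```python
import struct


def sfx_bytes(wad_path):
    """Sum the sizes of all lumps whose names start with 'DS'."""
    total = 0
    with open(wad_path, "rb") as f:
        # Header: 4-byte identifier, lump count, directory offset (little-endian).
        ident, num_lumps, dir_offset = struct.unpack("<4sii", f.read(12))
        if ident not in (b"IWAD", b"PWAD"):
            raise ValueError("not a WAD file")
        f.seek(dir_offset)
        for _ in range(num_lumps):
            # Directory entry: file offset, size, 8-byte NUL-padded name.
            filepos, size, raw_name = struct.unpack("<ii8s", f.read(16))
            name = raw_name.rstrip(b"\0").decode("ascii", "replace")
            if name.startswith("DS"):  # assumption: DS* lumps are sound effects
                total += size
    return total
```

Calling `sfx_bytes("doom2.wad")` would then return the byte count for that file.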

Doom 1.2 shareware

255    113    1.06     1.117
247    121    1.029    1.054
113    127    1.005    1.007
16     128    1        1
141    131    0.971    0.9765
135    137    0.928    0.9296
0      144    0.883    0.875

The red line ("c-d") was the behaviour of my patch against Chocolate-Doom at the time of testing,
and the orange line ("exp") was an improved algorithm I was testing. I eventually settled on that
algorithm, but you can see there's a little room for improvement.
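
One observation on the Doom 1.2 figures above: if the first two numbers in each row are read as the fixed RNG value and the resulting pitch value (an assumption), then every row fits pitch = 128 + 16 - (rng & 31). The sketch below only checks that arithmetic. The function name and the use of 128 as the "norm" are illustrative, and this is not a claim about what the original EXE actually does.

```python
# First two figures from each row of the Doom 1.2 measurements,
# read here (as an assumption) as (RNG value, pitch value).
ROWS = [
    (255, 113),
    (247, 121),
    (113, 127),
    (16, 128),
    (141, 131),
    (135, 137),
    (0, 144),
]


def pitch_from_rng(value, norm=128):
    """Hypothetical relationship that reproduces the figures above."""
    return norm + 16 - (value & 31)


def rows_match(rows=ROWS):
    """True if every (rng, pitch) pair fits pitch_from_rng."""
    return all(pitch_from_rng(rng) == pitch for rng, pitch in rows)
```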

Heretic 1.3

Heretic has 15 distinct pitch values, and calls M_Random twice when generating them.
This made things tricky with the approach I was taking, because I couldn't just fix
the RNG to one value: I had to write 'stripes' of values across it, instead. I ended
up writing scripts to make this easier:

Compared to the Doom shifting, we're quite a lot further out for Heretic, but it's
still roughly right, and sounds OK in-game.
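
The 'stripes' idea can be sketched as follows, assuming the RNG is a 256-entry lookup table that is stepped through one entry per call (as in the Linux Doom source). The names here are hypothetical, and finding and patching the table inside the EXE is left out entirely.

```python
def striped_table(values, size=256):
    """Repeat `values` across a table of `size` bytes."""
    return bytes(values[i % len(values)] for i in range(size))


class TableRandom:
    """Minimal model of a table-driven RNG: bump the index, return the entry."""

    def __init__(self, table):
        self.table = table
        self.index = 0

    def next(self):
        self.index = (self.index + 1) & 0xFF
        return self.table[self.index]


# With a two-value stripe, two consecutive calls always see one of each value.
rng = TableRandom(striped_table([10, 200]))
pair = (rng.next(), rng.next())
```

The stripe length needs to divide 256 evenly for the pattern to survive the index wrapping around, and any other caller of the RNG between the two calls will shift the phase.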

Pitch shifting code

I guessed pretty early that the Doom/DMX code wasn't doing a "proper" pitch-shift,
but was instead stretching or squashing the sample, changing its length and pitch
together. My initial stabs therefore resized the sound effect by the same ratio as
the pitch value against the "norm". This turned out to be inverted, which makes some
kind of sense: a higher pitch value results in a higher pitch, which means a shorter
play time and a shorter buffer.

The solution we went with was a variation on my very first hack: map source to
destination buffer cells based on their percentage offset from the start of the
buffer. In short, just copying a subset of samples over, or doubling up some to
make up the required buffer length, but not modifying the samples in any way.
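
A minimal sketch of that mapping, assuming integer samples in a list or bytes object. The function name is made up, and this is an illustration of the idea rather than the code that went into Chocolate-Doom.

```python
def resize_nearest(src, dst_len):
    """Stretch or squash `src` to `dst_len` samples by copying samples only.

    Each destination cell takes the source sample at the same proportional
    offset from the start of the buffer; no sample values are modified.
    """
    src_len = len(src)
    if src_len == 0 or dst_len <= 0:
        return []
    return [src[i * src_len // dst_len] for i in range(dst_len)]
```

A caller might choose `dst_len` as `len(src) * 128 // pitch`, taking 128 as the "norm" pitch and following the inverse relationship described above. Both the constant and the exact formula are assumptions here, not necessarily what Chocolate-Doom does.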

Interpolation

Proper resampling is more involved. I wrote a slight improvement on the above
which did interpolation of cells in the 'pitch-up' case: every source sample
is mixed into the output buffer, depending on which cell it would contribute
to. Where multiple source samples land in the same destination cell, they are
averaged with an even weighting. A further improvement again would be to use a
non-even weighting, based on how closely the cells match up.
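
The evenly-weighted version might look something like this. It is a sketch under the assumption of integer samples and a destination buffer no longer than the source (the 'pitch-up' case); the function name is hypothetical.

```python
def resize_average(src, dst_len):
    """Shrink `src` to `dst_len` samples, averaging samples that share a cell."""
    src_len = len(src)
    if not 0 < dst_len <= src_len:
        raise ValueError("expects 0 < dst_len <= len(src)")
    sums = [0] * dst_len
    counts = [0] * dst_len
    for i, sample in enumerate(src):
        # Same proportional mapping as before, but from source to destination.
        cell = i * dst_len // src_len
        sums[cell] += sample
        counts[cell] += 1
    # Every cell receives at least one sample because dst_len <= src_len.
    return [s // c for s, c in zip(sums, counts)]
```

A non-even weighting would replace the simple count with weights based on how much of each source sample overlaps the destination cell.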

I actually thought this sounded better for the 8-bit 11kHz samples from Doom,
but when I tried reworking it to support higher quality samples, I got a lot
of noise, and would have had to write a low-pass filter. Rather than sort it
out, we just used the first iteration, which sounded good enough and was fast
enough to perform frequently at runtime.

Memory management

Chocolate-Doom had a sound-effect caching scheme in place before I started
adding pitch-shifting. Every in-game sound effect is pre-cached at game
start-up and stored in a priority-list. When a sound effect is played, it's
promoted to the top of the list. If the list reaches a certain size, sounds
are purged from the bottom of the list.
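
In outline, a scheme like that behaves like the sketch below. The class and method names are invented for the example, and measuring the list's size in bytes (rather than, say, a count of entries) is an assumption; Chocolate-Doom's actual implementation is in C and differs in detail.

```python
from collections import OrderedDict


class SoundCache:
    """Priority list of sounds: most recently played first, purge from the end."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # first item = top of the list

    def add(self, name, data):
        if name in self.entries:
            self.used -= len(self.entries.pop(name))
        self.entries[name] = data
        self.entries.move_to_end(name, last=False)  # insert at the top
        self.used += len(data)
        self._purge()

    def play(self, name):
        self.entries.move_to_end(name, last=False)  # promote to the top
        return self.entries[name]

    def _purge(self):
        # Drop sounds from the bottom until we are back under the threshold.
        while self.used > self.max_bytes and len(self.entries) > 1:
            _, data = self.entries.popitem(last=True)
            self.used -= len(data)
```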

My first attempt at adding pitch-shifting inserted the shifted sound effects
into the priority list. The trouble with this was that the threshold for throwing
out sound effects was set pretty high, and memory usage grew quite a lot with
the extra sound effects. Here's the memory usage for Doom's DEMO1 and Doom 2's
DEMO3, with the caching. The lines represent different game sample rates. p0
means shifting disabled, p1 means enabled.

In the end we decided to just not cache pitched sound effects. They are re-calculated
every time they are needed. The red and blue lines on these graphs correspond to the
earlier ones; the green lines are pitch shifting on with no caching.

Before we decided to just not cache, I was planning to tweak the purging algorithm
to throw out pitched sounds first, and possibly set a second, lower waterline for
pitched sounds. I recorded a long-ish session of Doom 2 and compared the memory
usage of pitch on and cache versus pitch off, to see if it 'topped out'. I think
this is inconclusive (play time was approximately 15 minutes).