Let's Design and Build a (mostly) Digital Theremin!

"I use to ask Fred what your theremin could sound like, as beautiful as the best analog theremin? He said it can sound like anything you want it too. How far in the future are we from excellent sound?" - Christopher

DSP is extremely similar to analog electronics, but things that are as easy as falling off a log in analog can be quite challenging in digital, hence the steep learning curve. Guitar digital multi-effects are a good example: it took the designers a long time to figure out non-aliased distortion, and to get amplifier and speaker emulation "close enough" to the real thing. But once a problem like that is solved it's set in stone, and much easier/cheaper to replicate exactly. Presets are a billion times easier to do in digital as well. Delay effects (phasing, flanging, echo, reverb) are easy in DSP but a huge challenge in analog.

It can sound like anything given enough programmer expertise and effort, and real-time processor cycles.

"I know finding a top Thereminist for the demo can be difficult."

Yah, it's not a Field of Dreams kinda thing (if you build it they will come). I'm hoping my proximity to NYC will help in this respect.

I'm finding volume control, such as in the mixer, to be even more involved than expected.

If you look at audio taper pots, they have a certain slope from 100% rotation down to ~65%, a soft knee, then a less steep slope from there to ~25% rotation, then pooping out to zero. The log response is thus approximated piece-wise by linear segments. Mixing console faders are similar, but use a piece-wise approximation built from log responses, with the upper ~1/2 section covering a range of ~20dB (presumably the gain side of the slider), and the ~1/3 section below that dropping at a steeper log slope, followed by two more log slopes, and ending on a definite designed-in physical dead zone.

You can't use a perfect log response here because you want zero output when you set the control to zero, but zero output isn't possible on a log scale, only smaller and smaller outputs. Consulting "Small Signal Audio Design" by Douglas Self, the response of the Baxandall active gain control is perhaps the thing to emulate. It gives a fairly (+/-3dB) logarithmic ~30dB or so output change (at a ~44dB rate) over the top 80% of the input control range, and then dive-bombs in a smooth but non-logarithmic manner to zero. I'm thinking maybe a polynomial can do this.

My problem with all this is: what's the ideal response for a volume control? Does an ideal even exist?

[EDIT2] Just tried cube on the prototype mixer - it works like a champ! Saves a bunch of cycles too (don't need to do low res EXP2).

[EDIT3] Just tried simple squaring, it's even better! Wow, excellent volume taper and exponentiation with a single assembly instruction!
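
As a rough sketch of why squaring behaves like a usable volume taper, the snippet below maps a linear control position to a gain and prints the result in dB. The function names and the clamping to [0, 1] are assumptions for illustration only; on the real hardware the squaring is a single multiply on a fixed-point value.

```python
import math

def square_taper(position: float) -> float:
    """Map a linear control position in [0, 1] to a gain by squaring it."""
    p = min(max(position, 0.0), 1.0)  # clamp (assumed control range)
    return p * p

def cube_taper(position: float) -> float:
    """Same idea with a cube, which falls off faster."""
    p = min(max(position, 0.0), 1.0)
    return p * p * p

def gain_db(gain: float) -> float:
    """Convert a linear gain to dB; zero gain is reported as -inf."""
    return -math.inf if gain <= 0.0 else 20.0 * math.log10(gain)

for pos in (1.0, 0.75, 0.5, 0.25, 0.1, 0.0):
    print(f"{pos:4.2f}  square {gain_db(square_taper(pos)):8.2f} dB"
          f"  cube {gain_db(cube_taper(pos)):8.2f} dB")
```

Unlike a true log law, both curves reach exactly zero output at zero position. Half travel gives about -12 dB with squaring and about -18 dB with cubing.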

I think a lot of the log response choices in analog circuit volume controls are for reasons other than what is absolutely optimal, e.g. the Baxandall removes the absolute value of the pot from the equation.

I'm running into squaring everywhere it seems, squares and powers of squares x^(2^n). I'm using it in the tuner to weight the LED that's on, in the pitch quantizer to weight the nearest chromatic note, and in the mixer to taper the volume and do linear to exponential conversion. Wonder where it will spring up next? Squaring curves upward (convex) when plotted in the linear domain and downward (concave) when the output is plotted in the logarithmic domain, so it's kind of a crossover function.

Speaking of the mixer, in the interim where I don't have full-blown vocal synthesis going on, I've got some basic waveforms such as sine, triangle, ramp, and square. These are easy to generate from the NCO ramp, but the sharp edges generate quite audible aliasing, so I find myself not using anything other than the sine wave at all, and the sine wave is getting kind of boring. One semi-fix may be to start with a sine wave and mutate it in ways that don't produce slope discontinuities. Squaring it gives more of a pulse wave shape, square root gives more of a square wave shape. Square root is expensive, so flipping it, squaring it, and flipping it back gives a similar looking result. Squaring or ~square rooting quadrants I & IV, and II & III independently can give even harmonic content while retaining smoothness at the transitions.
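
Here is one possible reading of the sine mutation idea as a floating-point sketch. The function names, the sign-preserving form of the squaring, and the way the quadrants are selected from the phase are all assumptions; the actual fixed-point code may well differ.

```python
import math

def soft_pulse(s: float) -> float:
    """Square the magnitude and keep the sign: a narrower, pulse-like shape."""
    return math.copysign(s * s, s)

def soft_square(s: float) -> float:
    """Flip, square, flip back: a cheap stand-in for a square root."""
    a = 1.0 - abs(s)
    return math.copysign(1.0 - a * a, s)

def shaped_sine(phase: float, rising_shaper=None, falling_shaper=None) -> float:
    """Sine of phase (in cycles) with optional per-half shaping.

    Quadrants IV and I (the rising half) and quadrants II and III
    (the falling half) can each be given their own shaper.
    """
    phase %= 1.0
    s = math.sin(2.0 * math.pi * phase)
    rising = phase < 0.25 or phase >= 0.75
    shaper = rising_shaper if rising else falling_shaper
    return shaper(s) if shaper else s

# One cycle with only the rising half shaped (even and odd harmonics).
cycle = [shaped_sine(n / 64, rising_shaper=soft_square) for n in range(64)]
```

Both shapers have a continuous slope through zero, and the halves meet at the sine peaks where the slope is already zero, so switching shapers there adds no corner. Shaping only one half breaks the half-wave symmetry, which is what lets even harmonics in.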

I've coded up a precision state-variable filter and can't wait to try it with the noise source. I also have a "quick" version that has some tuning error at higher cutoffs, but uses fewer cycles, which should be useful for formants and the like.

============

[EDIT] Woah, squaring portions of sines and inverted sines works pretty good for generating harmonics without too much aliasing. And the state variable filter is working much better than I was expecting - it's pretty fantastic actually!

[EDIT2] Yet another squaring: to expand the resonant end of the state variable filter damping control value. Quite nice!
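
For anyone following along, a minimal sketch of a state variable filter with a squared damping control might look like this. It uses the textbook Chamberlin form, which is an assumption on my part and not necessarily what either the precision or the "quick" version does; the class name, the 48 kHz sample rate, and the small damping floor are also just illustrative.

```python
import math

class StateVariableFilter:
    """Textbook Chamberlin state variable filter (illustrative only)."""

    def __init__(self, cutoff_hz: float, damping_knob: float,
                 sample_rate: float = 48000.0):
        # Tuning coefficient; reasonable well below sample_rate / 6.
        self.f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate)
        # Squaring the [0, 1] knob spreads out the low-damping
        # (high resonance) end. The floor avoids zero damping (assumed).
        self.damping = max(damping_knob * damping_knob, 1e-3)
        self.low = 0.0
        self.band = 0.0

    def step(self, x: float):
        """Process one sample; returns (low-pass, band-pass, high-pass)."""
        self.low += self.f * self.band
        high = x - self.low - self.damping * self.band
        self.band += self.f * high
        return self.low, self.band, high

svf = StateVariableFilter(cutoff_hz=1000.0, damping_knob=0.5)
outputs = [svf.step(1.0 if n == 0 else 0.0) for n in range(8)]  # impulse
```

With the squared mapping, half of the knob travel covers damping values from 0 to 0.25, so the resonant settings get most of the resolution.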

Today was rewrite day. I noticed the NCO was generating a super high frequency when it should have been 0Hz. Turned out to be underflow related to the pitch quantization pulling things into the nether regions. The tuner and pitch quantizer now share more common code, and things are more modular (subroutines) and straightforward in their implementation.

Yesterday and today were vocal formant filter day. I stuck a low-pass and two band-pass state variable filters (all in parallel) in the signal chain, along with UI menu access. Vocal synthesis seems highly reliant on the harmonic content of the glottal waveform, and my inverse squared (one or more squarings) two quadrant sine wave processing seems to give a pretty good starting point. The filter Q doesn't seem all that important in terms of adjustment, though the level and certainly the frequency are. Three formants are sufficient to do fairly realistic vocal stuff. Mixing in some noise and a final low-pass adds to the realism, though these are just initial investigations. I'll post some videos soon. It can do a female sudden intake of breath that's pretty scary! (And I'm beginning to understand the vocal tract physics behind it all - here's a good paper: http://www.cs.princeton.edu/~prc/PRCThesis.pdf, and a fun web thingie: https://dood.al/pinktrombone/)

I learned the term "vocal fry" (new to me) from the Perry Cook PhD thesis I linked to in my previous post. It's speaking in a very low pitched register so that the vocal cords produce a "popping" impulse sound (think of how Noam Chomsky talks, or how many young people tend to speak lately). This is pretty easy to do, synthesis-wise, as the vocal cord waveform has to be rich in harmonics, and slowing it down (lowering the pitch) gives this kind of effect. The "pops" cause the "ringing" (Q) of the vocal tract resonances to be more noticeable.

What's harder to do is simulate the dynamics of the vocal cords. The paper discusses open/close times of the cords, open constant times, etc. and how the waveforms can be synthesized with minimal aliasing. I need to implement something here tied to volume dynamics, because what I'm doing now sounds like someone adjusting the volume control on a stereo, not a person talking louder and softer.

The paper also discusses the design of a physical model of the vocal tract, something I was burning to try but am now a bit colder on. In many ways physical modeling is what you want to do because the adjustment "knobs" on it usually closely correspond to physical parameters (string stiffness, resonant tube length, etc.) and can therefore be more intuitively tailored. And a lot of desired behavior (resonance, decay rate, spectral coloring, etc.) is a natural consequence of stimulating the model.

I'm finding three formants (band pass filters) in parallel can give a fairly realistic voice sound if properly stimulated. The human vocal tract is basically a 'Y' pipe, where the vocal cords are at the bottom, and the two upper paths are the nasal and mouth resonators. The mouth resonance is complicated by the jaw opening and closing, and the tongue and the lips, while the nasal resonance is largely constant (if not stopped by the back of the tongue). A fourth somewhat fixed band pass filter can fill in the throat and face sounds that naturally radiate when humming and such. The mouth and nasal pipes are open on the ends, so some of the sound reflects back into the pipe with reversed phase.

My signal chain right now is a mutated sine wave for the glottal sounds. Flipping and repeatedly squaring two of the normalized quadrants gives a pretty nice even and odd harmonic content without a lot of aliasing, and doing this to all four quadrants gives an odd-harmonics-only rounded square wave, which unfortunately does alias at higher squarings and frequencies, though nowhere near as badly as a true square. The harmonic control is quite convenient, as I have it arranged to do two quadrant squaring for positive parameter adjustments, and four quadrant squaring for negative parameter adjustments (and a pure sine wave for parameter = 0). In parallel with this mutated sine wave I have white noise feeding a tracking or fixed state variable filter. Both the sine and noise are volume controlled via the left-hand antenna. These feed into a formant filter bank, consisting of four band pass filters in parallel. The banks are split into two screens, and I can disable either half (or both) of the filters with a parameter, and when all are disabled there is an automatic pass-through. Having two banks of two is handy, because I'm using two filters for the mouth resonance, and two filters for the nasal and throat resonance. Turning off the mouth filters sounds like humming or nose breathing.
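
The bank arrangement described above could be sketched roughly as follows. Everything here is hypothetical: the class names, the Chamberlin-style band-pass, and especially the frequencies, dampings and levels, which are placeholders and not measured formant values.

```python
import math

class BandPass:
    """Chamberlin-style state variable filter, band-pass output only."""

    def __init__(self, freq_hz, damping, level, sample_rate=48000.0):
        self.f = 2.0 * math.sin(math.pi * freq_hz / sample_rate)
        self.damping = damping
        self.level = level
        self.low = 0.0
        self.band = 0.0

    def step(self, x):
        self.low += self.f * self.band
        high = x - self.low - self.damping * self.band
        self.band += self.f * high
        return self.level * self.band

class FormantBank:
    """Two banks of two parallel band-pass filters with enable flags."""

    def __init__(self, mouth, nasal_throat):
        self.banks = [mouth, nasal_throat]
        self.enabled = [True, True]

    def step(self, x):
        if not any(self.enabled):
            return x  # automatic pass-through when everything is off
        return sum(flt.step(x)
                   for bank, on in zip(self.banks, self.enabled) if on
                   for flt in bank)

# Placeholder settings, purely for illustration.
bank = FormantBank(
    mouth=[BandPass(700.0, 0.1, 1.0), BandPass(1200.0, 0.1, 0.5)],
    nasal_throat=[BandPass(250.0, 0.2, 0.5), BandPass(2500.0, 0.1, 0.25)],
)
bank.enabled = [False, True]  # mouth off: the "humming" case
```

Summing the band-pass outputs keeps the filters in parallel, so disabling one bank simply removes its two terms from the sum without disturbing the other bank.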

It's quite fun to play around even with this basic setup. Sharp intake of breath causes turbulence at the lips, so simply mixing in unfiltered noise is a close approximation. Breathing is just noise through the formant filter. The two-axis continuous control the Theremin provides is highly suited to this sort of thing. It's kind of weird adjusting the formants though. It's rather like summing sine waves manually to get a certain tonal color: unless you're close you don't know it, and your ear hears the separate things you are adjusting until it all suddenly pulls together. The human brain has quite strict categories for what sounds human; fortunately it isn't too difficult to fake it out with simple signals and filters.

And... yet another use for squaring! Instead of converting the formant filter frequency parameter control from linear to exponential via EXP2, I'm using one squaring. This nicely expands the low end of the adjustment and is essentially free (a single assembly instruction). So all three formant filter controls (frequency, damping, mixing) now employ squared parameters (as does the noise tracking filter when it's in manual tuning mode).
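
To see how a squared frequency control compares with a true exponential one, here is a small sketch. The 100 Hz to 5000 Hz range and the function names are assumptions chosen only to make the numbers concrete.

```python
import math

def squared_freq(knob: float, low_hz: float = 100.0,
                 high_hz: float = 5000.0) -> float:
    """Map a [0, 1] knob to a frequency using a single squaring."""
    return low_hz + knob * knob * (high_hz - low_hz)

def exp2_freq(knob: float, low_hz: float = 100.0,
              high_hz: float = 5000.0) -> float:
    """Map the same knob with a true exponential (EXP2 style) law."""
    octaves = math.log2(high_hz / low_hz)
    return low_hz * 2.0 ** (knob * octaves)

for knob in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{knob:4.2f}  squared {squared_freq(knob):7.1f} Hz"
          f"  exp2 {exp2_freq(knob):7.1f} Hz")
```

The squared curve is not a real exponential, but it does give the low end of the range far more knob travel than a straight linear mapping would, at the cost of one multiply.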

Linear all the way from touching the antenna to my hand at the side of my body! I honestly never expected to achieve anything quite this perfect nor as easily adjustable. I know I yack about it too much, but it's my pride and joy, and the very cornerstone of the best Theremins.