All posts for the month October, 2012

Ok, I’m now at my deal breaker – filtering. In real-world hardware analog circuits, filtering is relatively easy. Just add a tunable r-c (resistor-capacitor) circuit, or an op-amp. In software, it becomes a bit less straightforward.

The French mathematician Joseph Fourier developed a way to convert real-time-based measurements into the frequency domain, and back. Now known as the Fourier Transform, this bit of math lets us take time slices, determine their frequency components, zero out the higher frequencies as desired, and then invert the transform to get the time slice back for sending it to the speakers.

(I may want to re-do this illustration. Technically, the frequency data should be a series of differentiable vertical lines. We’ll just pretend that this shows a block of pink noise.)

Easier said than done. Java doesn’t have a built-in Fourier transform class. There is a downloadable pack from netlib.org, but I don’t have an extractor for .tgz files yet. Fortunately, the FT is a very common exercise project for university students, and there are some public domain Java class implementations on the net (none of them fully optimized, though).

There’s no point to my explaining the FT here. The important things to know are that the general version of the FT designed for computers is called the DFT (Discrete Fourier Transform), that an optimized version is the FFT (Fast FT), that even the FFT is slow, and that the FFT requires time samples to be a power of 2 (e.g. – 1,024 or 2,048 samples).

I obtained my copy of the FFT class implementation from the Princeton site. It uses a separate class file called Complex.java that I had to hunt for. Save both files in the same directory as the software synth app, making sure that the file names match the class names used within the files (i.e. – FFT.java and Complex.java). Also, if you’re using Netbeans, make sure to add a line at the beginning of each file for the package (I used “package my.adsr;”). Using the new code is just a matter of creating objects of the FFT and Complex classes, and making Complex arrays for holding the original time slice, the FFT frequency results and the time-domain results from the inverse FFT (IFFT).
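Since I can’t reproduce the Princeton classes here, the following is a stripped-down stand-in that shows the same time → frequency → time round trip. It’s a plain O(N²) DFT on parallel re/im arrays rather than the FFT/Complex classes, so all the names are mine:

```java
// Self-contained illustration of the forward/inverse transform round trip.
// This is a slow O(N^2) DFT, NOT the optimized Princeton FFT class -- it
// just shows the time -> frequency -> time flow described above.
class DftRoundTrip {
    // Forward DFT: time samples (real only) -> spectrum as {re[], im[]}.
    public static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n], im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double a = -2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(a);
                im[k] += x[t] * Math.sin(a);
            }
        }
        return new double[][] { re, im };
    }

    // Inverse DFT: spectrum -> time samples (real part only).
    public static double[] idft(double[] re, double[] im) {
        int n = re.length;
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            for (int k = 0; k < n; k++) {
                double a = 2.0 * Math.PI * k * t / n;
                x[t] += re[k] * Math.cos(a) - im[k] * Math.sin(a);
            }
            x[t] /= n;
        }
        return x;
    }
}
```

The real FFT class does the same job in O(N log N); the point is just that running the inverse transform hands the original time slice back, ready to go to the speakers.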

Note that I had problems with Netbeans suddenly complaining that the FFT file needs “this” or “super” for referring to objects, but the errors mysteriously went away when I saved the file. (All I did was change the formatting a little and save the file, and that was that. I don’t know if I’ll get the error again when I open Netbeans in the future.)

I’ll add the actual code later, after I’m done fixing it.

For the moment, I’ll just talk about concepts.

Restrictions:
I’m using a sampling rate of 16000 for the sound engine. The Java timer function only resolves to 1 ms intervals. I want to take time slices between 10 and 20 ms for various reasons (partly to keep array sizes small, and to allow for timely responses to changes to the MIDI keyboard controls when the user makes them). The FFT needs array sizes to be a power of 2. The FFT arrays need to be large enough to precisely resolve my 16,000 samples/second, but small enough to not affect the sound playback at 10 – 20ms intervals. I started out by choosing 1,024 for the array sizes.

My largest sound slices are 320 samples. So, for the first 3 calls from the timer, I have to keep track of where in the fftBuffer array I am, and just do a System.arraycopy (I know this is slow. I don’t know how to do this with a Component object yet). Run the FFT, process the frequency data, then run the IFFT and send the results to the sound engine. Cut the first 320 elements from fftBuffer (System.arraycopy(fftBuffer, 320, fftBuffer, 0, fftBuffer.length – 320);) and repeat. Call 4 is trickier: the buffer array is only 1,024 elements, and the new 320 samples would fall at positions 960 through 1,279. This is handled simply by cutting the first 256 elements and then copying the samples to the end of fftBuffer. For all subsequent calls, I shift out the oldest 320 samples and copy the new 320 to the end. One of the advantages (in my opinion) of having fftBuffer larger than the incoming sample size of 320 is that this gives me some smoothing of the FFT results in case the incoming sound changes abruptly at a sample boundary.
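The buffer juggling above can be sketched like this (fftBuffer, push and SLICE are my names, and the real app tracks its position differently; this just shows the System.arraycopy shuffling):

```java
// Sketch of the rolling 1,024-sample FFT window fed 320 samples per timer
// tick. Calls 1-3 append; call 4 drops 256 old samples; later calls drop 320.
class SlidingWindow {
    static final int WINDOW = 1024;
    static final int SLICE = 320;
    double[] fftBuffer = new double[WINDOW];
    int fill = 0; // how many valid samples are in the buffer so far

    // Called once per timer tick with 320 fresh samples.
    public void push(double[] slice) {
        if (fill + SLICE <= WINDOW) {
            // Buffer not full yet: just append.
            System.arraycopy(slice, 0, fftBuffer, fill, SLICE);
            fill += SLICE;
        } else {
            // Shift out just enough old samples to make room (256 on call 4,
            // a full 320 on every call after that), then append.
            int drop = fill + SLICE - WINDOW;
            System.arraycopy(fftBuffer, drop, fftBuffer, 0, fill - drop);
            System.arraycopy(slice, 0, fftBuffer, fill - drop, SLICE);
            fill = WINDOW;
        }
    }
}
```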

In my test program, I used a sample tone of MIDI note number 89, which is 1,396 hz. Printing the real components of the FFT and then plugging them into Excel, I can find the peak in “bin” 89, which corresponds to 1,390.625 hz (at 16,000 samples/second with 1,024 bins, each bin is 15.625 hz wide). In fact, I’m running the oscillator through the ADSR first, so the energy levels are spread out across several bins. But the main peak is at 1,390.625 hz.

Another note is that the FFT “folds” the calculated results around fftBuffer size/2. What this means to us is that array elements 511 and 513 hold the same frequency, as do 510 and 514, 509 and 515, and so on (element 512, the Nyquist bin, is its own mirror). If we just write 0.0 from element 512 to 1023, we lose half our sound volume from the IFFT output. My work-around is to do something like the following:
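A sketch of the idea (plain re/im arrays stand in for the Complex arrays, and the names are mine): when low-pass filtering, zero each bin together with its mirror at N − k, so the surviving bins keep both halves of their energy and the IFFT output keeps its full volume.

```java
// Sketch of the symmetric low-pass work-around: zero a bin AND its mirror
// (k and N-k) together, instead of just wiping the top half of the array.
class SymmetricFilter {
    // Keep bins 0..cutoffBin-1 and their mirrors; zero everything between.
    public static void lowPass(double[] re, double[] im, int cutoffBin) {
        int n = re.length; // e.g. 1024
        for (int k = cutoffBin; k <= n - cutoffBin; k++) {
            re[k] = 0.0;
            im[k] = 0.0;
        }
    }
}
```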

One of the more interesting things about the old-style analog synths was that the connections between modules were made using patch cables. As long as a particular feature of any given module had a jack, some other feature from some other module could be connected to it (matching up inputs with outputs, obviously). With more modern synths, the connection combinations have been greatly simplified, in part because few musicians need every single one, and partly to just keep the circuit designs simple. But with software, there’s no particular reason not to go back to the old approach, and allow the user to decide which permutations to make use of at any given moment.

So, I went back to my hardware schematic and said to myself “self, let’s feed a sawtooth wave into the ADSR sustain input”.

This turned out to be a lot harder than I’d expected, because what I really wanted was a single cycle of the sawtooth. I could try matching up one cycle of an oscillator to the time I want the sustain level to change and then try to trigger the ADSR release phase, but it makes more sense to expand the circuit toolkit by adding a new module – the oneShot.

In electronics, there is a circuit called a multivibrator (closely related to the flip-flop). It can be configured to free-run as an oscillator (astable), or to trigger one time and then reset after a preset period (monostable). This latter operating mode is commonly called a “one-shot” (i.e. – it flips then flops once). And that’s what I wanted.

However, as I was building up the oneShot class, I realized that there was no way to reset it automatically at the end of the ADSR cycle, if my plan also included gate arpeggiating the ADSR. That is, if oscillator 1 creates my tone, and I run that into ADSR 1, and I add oscillator 2 to gate the ADSR, my trigger circuit would run once, stop, and then not run any more because the cnt member wouldn’t be zeroed with each cycle of oscillator 2.

The difficulty stems from the osc class not having a “gate edge” event output. Determining whether the oscillator is on the rising edge or falling edge is easy as long as you save the last gate state within the class, but I wasn’t doing that, either. So, I had to go back to the osc class and replace the gateOut() method and add the lastLevelState and lastEdgeState variables. With these, the oscillator can output a risingEdge or fallingEdge event.
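The edge-detection part boils down to something like this (a sketch with my own names; the real osc class folds it into gateOut() and its waveform code):

```java
// Sketch of gate edge detection: remember the last gate level and report
// a rising or falling edge whenever it changes.
class GateEdge {
    private boolean lastLevelState = false;

    public static final int NO_EDGE = 0;
    public static final int RISING_EDGE = 1;
    public static final int FALLING_EDGE = 2;

    // Call once per sample with the current gate level.
    public int edge(boolean level) {
        int result = NO_EDGE;
        if (level && !lastLevelState)  result = RISING_EDGE;
        if (!level && lastLevelState)  result = FALLING_EDGE;
        lastLevelState = level;
        return result;
    }
}
```

With an event like this available, a downstream module (the oneShot, say) can zero its cnt member on every rising edge of oscillator 2, which is exactly what the gate-arpeggiated ADSR setup needs.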

The oneShot class is essentially the osc class, with the addition of starting and stopping values, and an inverted triangle waveform.

What I really want the oneShot to do is take over the last value from decay, ramp up or down during sustain, and then maintain the last value of sustain in order to use it for setting the value to use during the release phase. In the case of the sawtooth and reverse sawtooth waveforms, I’m actually outputting a rising or falling ramp wave. If I still want to use a sawtooth, then I can go back to the osc class and use that instead.

I’m starting to make use of constants, for soft-coding sampling rate and waveform numbers, too.

I’m still getting some clicking in the output audio, but through experience I’m realizing that it’s almost always because I’m doing something wrong to insert a cycle that either changes too fast, or goes to zero between operations for some reason (as when changing ADSR phases). But, I am slowly learning, and modifying my code to fix that as I go along.

The next example uses a oneShot to change the ADSR sustain level over time, and a second oscillator to gate arpeggiate the ADSR. vca2 is used for adjusting the max volume from osc2 between 0 and 1.0 for controlling the sustain times.

The next three circuits are pretty short and simple, so I’ll put them together here.

keyboard

The keyboard class is very straightforward. When the user presses a key (either the Play button in the Java GUI, or a real key on the Roland A-300), the MIDI standard tells us that a NOTE_ON MIDI message should be generated, which includes the note number (0-127) and the velocity the key was pressed at (0-127). When the user releases the key, a NOTE_OFF message should be generated, including the note number. It’s up to us to convert the MIDI note number to an actual frequency value. I took the piano frequencies from wikipedia, extended them for numbers beyond the plain 88 piano keys and put that in an array called keyFreqAry. After this, I use the NOTE_ON(int n, int v) and NOTE_OFF(n) methods to mimic the MIDI events. gateOut() lets me send the gate signal to the oscillators or ADSR. And noteOut() sends the specified frequency value to the target oscillator.
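For reference, the numbers in a table like keyFreqAry can also be computed directly with the standard equal-temperament formula (this is a stand-in for the lookup, not the actual class code):

```java
// MIDI note number to frequency, using the equal-temperament formula
// (A4 = MIDI note 69 = 440 Hz). Each semitone multiplies by 2^(1/12).
class MidiNote {
    public static double frequency(int noteNumber) {
        return 440.0 * Math.pow(2.0, (noteNumber - 69) / 12.0);
    }
}
```

As a sanity check, note 89 works out to about 1,396.9 hz, matching the FFT test tone mentioned in the earlier post.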

The Voltage-Controlled Amplifier was originally a hardware circuit that allowed the operator to include volume control in the analog synth. Generally, it was patched in between the ADSR output and the speaker amp as a form of pre-amp. I need to use the VCA in order to scale the +/- 1.0 magnitude audio signal to be audible. The sound engine expects a short value of +/- 16K for max volume. Additionally, I may want to apply a “DC offset” to the signal for things like two-tone arpeggiating (e.g. – between 200 hz and 400 hz). So, max. voltage is selectable from 0 to 16,384; and offset technically can go from -16K to +16k, although +/- 8000 might be more reasonable.
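The VCA math itself is tiny; a minimal sketch (names are mine) is just scale, offset, and clamp to the short range the sound engine expects:

```java
// Sketch of the VCA stage: scale the +/-1.0 signal up to the sound
// engine's short range, apply an optional DC offset, and clamp.
class Vca {
    private double maxVolume;  // 0 to 16,384
    private double offset;     // nominally -16K to +16K

    public Vca(double maxVolume, double offset) {
        this.maxVolume = maxVolume;
        this.offset = offset;
    }

    public short nextSlice(double signalIn) {
        double v = maxVolume * signalIn + offset;
        if (v > Short.MAX_VALUE) v = Short.MAX_VALUE;
        if (v < Short.MIN_VALUE) v = Short.MIN_VALUE;
        return (short) v;
    }
}
```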

Noise is an important part of sound, especially if you look at snare drums and electric guitars. In the real world, getting pure “white” noise (absolutely random) was quite difficult. “Pink” noise was more common, in that certain frequencies were more dominant in the frequency domain. With a computer, generating random numbers between 0 and 1.0 is very easy, and the “color” of the noise may approach “white” depending on the algorithm used. However, noise gets overpowering very quickly and could cause hearing damage if it’s too strong. Plus, you may not always want a solid block of hiss in your signal. So, I included 4 noise “waveforms” to choose from.

By “density”, I mean that a random value will be generated every ‘x’ milliseconds, where you pick the value of x.

For Brownian, I start at zero, and add a small +/- random value to that. Next time, I add another +/- random value. This gives me a “random walking pattern” that should wander all over the place.
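The Brownian generator might look something like this (a sketch; the stepSize parameter is my stand-in, and the per-‘x’-milliseconds density logic is left out):

```java
import java.util.Random;

// Sketch of the "Brownian" noise waveform: start at zero and add a small
// +/- random step each sample, clamping to the +/-1.0 signal range so the
// random walk can't wander out of bounds.
class BrownianNoise {
    private final Random rng = new Random();
    private double level = 0.0;
    private final double stepSize;

    public BrownianNoise(double stepSize) {
        this.stepSize = stepSize;
    }

    public double nextSlice() {
        level += (rng.nextDouble() * 2.0 - 1.0) * stepSize; // +/- random step
        if (level > 1.0) level = 1.0;
        if (level < -1.0) level = -1.0;
        return level;
    }
}
```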

With simple experimentation, I’ve found that only 10% noise is enough to make a regular signal interesting (much more than that threatens to be painful to the ears). Straight random is very harsh, while random interval Brownian noise is almost “velvet-like”.

Sound doesn’t simply “happen”. Usually, there’s some kind of rise time from 0 to full amplitude, and some measurable fall time back to zero. If we look at a flute, the amplitude grows relatively slowly and drops off a bit faster. A snare drum, on the other hand, is close to instant on and instant off. This change in volume really doesn’t have much to do with the type of waveform used for the audio signal (sine or squarewave). Instead, it’s the “shape” of that signal’s amplitude over time, or its “envelope”. That is, different instruments have different “envelopes”. This is a concept that took me years to really figure out.

While real-world sounds have a wide variety of envelope types, early analog synthesizers employed straight-line formats that were generally composed of at least 3 common parts, while more complex envelopes are built on the basic 3. They are:

A – Attack
D – Decay
S – Sustain

Attack is the rise time, or the speed at which the audio signal goes from 0 to full volume.
Decay is the first fall time, or the speed at which the audio signal goes from full volume to the sustain level.
Sustain is the level, or the audio signal amplitude that plays while the keyboard key remains pressed.

In this “ADS” system, when the user presses a keyboard key, the sound goes from 0 to full volume at the rate given by attack. If the key is still pressed, the sound immediately goes from full volume to the sustain level at the rate given by decay. While the key remains pressed, the sound stays at the sustain level. When the user stops pressing the key, the sound goes straight to 0. As an example, if A = 1s, D = 0.6s, S = 0.4, we get the following envelope (this is how the Gakken SX-150 Mark II works).

From what I’ve read, old Moog synthesizers had a “hold” parameter of 20 ms between attack and decay to add “punch” to the envelope output. And, ADSR systems add a release parameter to specify how quickly to go to zero when the user lets go of the key. In a Moog-style ADSR, with R = 0.2s, we get the following envelope.

An even greater elaboration would be to have an attack2/peak level/release combination instead of just release for something of a “wah-oh” effect at the end, but I haven’t taken the time to address that complication yet. Thinking about it, it wouldn’t be that hard to write, I guess.

Most analog synths have A, D and R controls that go from 0 to 4 seconds or so, either in linear increments or a gradient scale. For my purposes, since the Roland keyboard dials go from 0 to 127, I’m looking at roughly 30 ms steps, which is more than small enough (for most purposes). For S and full volume, I’m sticking with 1.0 and employing a VCA (voltage-controlled amplifier) approach for volume control. So, full volume will be 1.0, and S will go from 0 to 1.0 in 0.01 steps (0 to 100).

Triggering for Attack and Release occurs with the gate signal. It’s an edge trigger, so I’m storing the last gate value and checking if it’s changed. If so, I store the new value and advance the envelope phase pointer to either the A or R modes. Everything else is just a matter of incrementing the cnt counter and advancing phases from A to Punch to D to S automatically. Now, there’s a serious problem with introducing clicks in the sound if any given time parameter is set to 0 (i.e. – Decay = 0 ms), or when going from Punch to Decay when cnt == the Punch value. This is because normally saying “if cnt == punch, set mode = decay” means that there’s one time slice where there’s no processing of the envelope and the method returns 0. That 0 val is what causes the click. So, my if-statements get a bit kludgy in checking for all combinations of cnt and ADSR timing settings. The good part is that now that the code’s written, it’s easy enough to extend by adding attack2/peak level at a later date.
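The boundary fix can be sketched like this (a simplified ADS-only version with my own names; Punch, Release and the gate logic are left out): advance the phase first, then compute the slice, so a phase boundary or a 0 ms setting never emits that stray 0 value.

```java
// Sketch of click-free phase advancing: any phase whose time is already
// used up is skipped BEFORE the slice value is computed, so a 0-length
// attack or decay never produces a zero-valued output sample.
class AdsEnvelope {
    static final int ATTACK = 0, DECAY = 1, SUSTAIN = 2;
    int mode = ATTACK;
    int cnt = 0;
    int attack, decay;  // phase lengths in samples (may be 0)
    double sustain;     // 0.0 to 1.0

    public AdsEnvelope(int attack, int decay, double sustain) {
        this.attack = attack;
        this.decay = decay;
        this.sustain = sustain;
    }

    public double nextSlice() {
        // Advance through exhausted phases first (handles attack == 0
        // and/or decay == 0 in a single call).
        if (mode == ATTACK && cnt >= attack) { mode = DECAY; cnt = 0; }
        if (mode == DECAY && cnt >= decay)   { mode = SUSTAIN; cnt = 0; }

        double ret;
        if (mode == ATTACK)     ret = (double) cnt / attack;
        else if (mode == DECAY) ret = 1.0 - (1.0 - sustain) * cnt / decay;
        else                    ret = sustain;
        cnt++;
        return ret;
    }
}
```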

One elaboration I added is gate event triggers. I can use these to trigger additional oscillators to run only during the Punch phase, or during Release. A future mod will be to add an Invert mode, which is just “return(1.0 – ret * signalIn)”, for flipping the envelope upside-down.

I’ll use a for-loop this time to demonstrate how the 20 ms buffers are built up for the sound engine, using a single oscillator and the adsr.

for(int pCnt = 0; pCnt < 320; pCnt++){
    shortBuffer.put( (short)( 16000 * ( adsr1.nextSlice( osc1.nextSlice() ) ) ) );
}
An amazingly fascinating possibility opens up if you look closely at that above circuit diagram. The zigzag lines with the arrows indicate a fixed value being assigned to each of ADS and R; e.g. – from the dials of the Roland keyboard. But what if you route the output of an LFO (low frequency oscillator) to, say, Attack? The rate value for Attack would change over time and Attack would no longer be linear. Good choices for waveforms would be the sawtooth and reverse sawtooth, which could be triggered by the Attack Gate event. You’d need to multiply the LFO output by a scalar, since the LFO only goes from 0 to 1.0, but that’s what the upcoming VCA class is for. Just make sure that one cycle of the reverse sawtooth aligns with the Attack time you want.

(The actual shapes will depend on the algorithm used to feed the Attack input, and could result in the two output envelopes being exchanged.)

The mook on Marie Curie is both good and bad this time. It’s bad in that the artwork is overly stylized (none of the characters look like their photos and Marie was never that attractive) and Marie’s personality is extremely softened to make her fit partially into the stereotype of “the good wife” (partially, in that she’s still portrayed as a dedicated researcher). It’s good in that there’s actual scientific theory and an explanation of how she achieved her early discoveries. When I was growing up, my textbooks made almost no mention of Curie, outside of just her name as one of the early researchers into the science of radioactivity. Which is a shame, because she not only developed a way to extract radium from pitchblende, but she discovered both radium and polonium, was the first woman to receive a Nobel Prize (in physics, 1903, shared with her husband Pierre, and fellow researcher Henri Becquerel), received a second in 1911 (solo this time, for chemistry), helped establish a scientific explanation of radiation (a term that she herself coined), raised two daughters on her own after Pierre was killed in a road accident, was the first female professor at La Sorbonne, and drove one of the world’s first mobile x-ray labs through Paris in order to assist the injured during WW I (she was the director of the Red Cross Radiological Service, with 20 vehicles and 200 radiological units throughout France). Her oldest daughter, Irene, assisted with the mobile units, and along with her own husband (Frederic Joliot-Curie) received a shared Nobel in 1935 in chemistry for their work on artificial radioactivity. (Making the Curie family the first to have 5 Nobel medals.) Marie died at age 66 from aplastic anemia caused by prolonged radiation exposure.

The intro manga for mook #37 has Merrino’s older sister, Mohea, receiving a diary from Mami’s mother. Mohea then proceeds to run around the house to document the family’s behavior (Mami is a late riser and doesn’t notice when Mohea gives her shaving cream instead of toothpaste. Mami’s mother is a great cook. Youichi and Merrino always fight over who gets the last snack, which Mami usually takes and splits with Mohea. Mami’s father leaves for work before everyone else wakes up, and returns when they’re all asleep). She tries staying up one night until the father comes back from work, and the family finds her passed out in the hallway (mimicking a famous incident in Curie’s past), and the mother says that this is one time when he wasn’t scheduled to come home. The wrap-up has Mohea sending her report to the Sheep planet and winning the “No-baahh” Prize. There’s no cash award or medal, so Mami’s mother bakes her a cake with Mohea’s likeness drawn on top in frosting. Merrino vows to win the No-baahh prize himself, by gorging on every single kind of snack on Earth.

The main manga is by TOBI (Rooftop Princess, Girl With Glasses; there’s nothing on TOBI in English, and the Japanese wiki article only lists 4 titles, although Girl With Glasses was turned into an OAV by Media Factory). If treated as a generic manga, it’s ok. The artwork isn’t very inspired, but it’s not really horrible. The problem is that this is just a manga-ized version of western characters who have been drastically prettied up so that they don’t look anything like their photos. Marie starts out as a precocious bookworm who turns into a devoted wife and mother. This is very much in keeping with the Japanese notion of a “good girl”, and may not come close to reality. Anyway, the story begins with two of Maria Sklodowska’s older sisters stacking chairs up behind her as she reads a book. Maria knocks the chairs down when she stretches after finishing reading, and doesn’t really notice the noise. The older one, Bronislawa, is jealous because Maria reads and writes at a higher level than she does, even though she’s 4 years younger. The family lives in the Kingdom of Poland, with both parents, 4 girls and a boy. When Maria was 9, her oldest sister died, followed by their mother 2 years later. At the time, Poland’s universities didn’t admit female students, so Bronislawa vows to go to France to study medicine to help protect the rest of the family. Maria takes a job as a private tutor to help raise money for Bronislawa’s education. In the story, Maria falls in love with one of the boys her age that she’s tutoring, but overhears his father forbidding his son from marrying “a peasant teacher”. Crushed, she concentrates more on her own studies and work. Bronislawa gets married to a doctor, and is in a position to give money to Maria to come to France and study as well. Maria moves in with her sister and brother-in-law, then gets her own apartment. She studies so hard that she forgets to eat and passes out. She’s discovered lying on the floor by her sister the next day.
Maria ignores the other students that make passes at her, but attracts the attention of Pierre Curie, one of the professors at La Sorbonne, and he proposes to her when she nears graduation. They get married, with Maria wearing a black lab coat as a wedding dress, and settle down in Paris. She changes her name to Marie, the closest French pronunciation.

Pierre sets up his own lab, and Marie splits her time between raising their first daughter, Irene, housekeeping, and their research. At the time, Henri Becquerel had reported finding a strange energy coming from pitchblende, which was known to contain uranium. No one knew how uranium worked, so Marie and Pierre decide to study this energy. Marie developed a process of melting pitchblende and removing the crystallized metals that formed afterward, with the result being that the crystals put out more energy than an equal weight of uranium did. Through further study, Marie showed that pitchblende also contained other radioactive materials, leading to the discovery of polonium (named after the Kingdom of Poland, which had been carved up into 3 separate countries by that time). Pierre asks her at one point if the work load is too great for her, and she says she doesn’t mind, since she loves her husband and daughter so much (a scene taken out of any shojo manga). The announcement that they’ve won the 1903 Nobel for physics, shared with Becquerel, takes them by surprise. As does the letter from America offering to buy the patent on distilling polonium from pitchblende. The family could use the massive sum of cash offered, but Marie instead states that scientific discoveries belong to everyone and makes the process public for free. Their second daughter, Eve, was born in 1904, and Pierre was struck by a horse-drawn cart and killed in a street accident in 1906. The last page shows Marie conquering her grief, going on to teach at La Sorbonne, and operating one of her mobile x-ray units along with Irene during WW I. Interspersed with the biography are science sidebars discussing radiation, radium, Henri Becquerel, and how radiation can be used to identify fake diamonds.

The textbook section contains photos of Marie, her family and her lab, pictures of the Nobel certificates, and shots of their old home and La Sorbonne. The text describes her upbringing, the research into polonium and radium, and her marriage with Pierre (when they got married, instead of the usual elaborate honeymoon and expensive presents, they bought themselves 2 bicycles and toured Europe). The last 2 pages highlight other scientific breakthrough moments, since Marie’s discovery of radiation occurred accidentally, when she walked into the lab with the lights off and saw the beaker of crystals glowing blue. Examples include Archimedes leaping out of his tub, Newton and the falling apple, Fleming sneezing on a culture sample, Roentgen seeing a special photographic film glowing in the dark, and Kouichi Tanaka’s discovery of a way to perform mass spectrometric analysis of biological macromolecules.

Overall, ignoring the artistic licenses taken in this mook, #37 does a decent job in presenting a brief pictorial overview of Marie Curie’s life and accomplishments, while adding just enough science as to be educational without being obtrusive. There are notable omissions, such as her connection to the Red Cross, the vilification she received from the French right-wing press for being a foreign-born woman in France, as well as the fall-out from a year-long affair in 1911 with physicist Paul Langevin, who had been separated from his wife at the time. So, if you want a deeper understanding of Marie Curie the person, I suggest getting a good biography on her.

I left off in part 3 with a working oscillator class, but no explanation for how to use it. This time, I’ll talk a little more about waveform generation theory. Taking the class from the last entry, let’s create an object called “osc1”.

(The arrow plus wiggly on the left represents a variable voltage input for selecting the frequency via some external control or dial.)
osc osc1 = new osc(1000, 0, 0.5); // Freq, Waveform, Ratio
speakerBuffer[ptr] = osc1.nextSlice();

Note: speakerBuffer[] is pseudo-code. I’ll use real code in later entries.

The constructor takes a fixed frequency value, a waveform type and the ON/OFF ratio. As mentioned before, the amplitude for the sound signal is -1.0 to +1.0 and will be centered at 0. The available waveforms right now are:

So, I have an oscillator producing a 1KHz sinewave. (Ignoring the fact that I need a for-loop and to increment cnt in the loop.) Pretty boring. However, remember that audible sound is from roughly 30 hz to 12KHz. What happens if we go below that?

The LFO
An LFO, or Low Frequency Oscillator, is simply a regular oscillator running between 0.031 hz and 200 hz. Meaning that my osc class is already both a regular audio oscillator and an LFO. (In the real world, a low frequency oscillator needs a different design, which is why they’re treated differently.)

Say I want to create an arpeggiator. You may remember from the K-Gater series that an arpeggiator is a circuit that turns a note on and off. The Korg Kaossilator Pro has a simple gate arp that lets you either toggle a voice on and off with a variable rate and a fixed ratio, or a variable ratio and a fixed rate. My oscillator class ratio member lets me choose between 0.0 (0% on) and 1.0 (100% on). Essentially, it lets me make a rectangle wave.

The gate member is used to zero the waveform output if gate is false. The gateOut method is true if the waveform output is greater than 0, false otherwise.

Starting to get the picture? Just two oscillators and we’re already having fun.
Instead, let’s go with beats. We get a beat frequency when two oscillators are running in parallel but at slightly different frequencies.

Here, osc2 is running 5% faster than osc1, producing a 1kHz tone that sounds like it’s turning on and off at a 50 hz rate. If we change freq, we still get that 5% beat frequency.
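The beat arithmetic, as a standalone snippet (raw sines instead of the osc class, so it carries none of the class’s state): two tones at 1,000 hz and 1,050 hz summed together pulse at the 50 hz difference frequency.

```java
// Two oscillators at slightly different frequencies produce a beat: the
// summed signal's envelope pulses at the difference frequency (50 hz here).
class Beats {
    static final double SAMPLING_RATE = 16000.0;

    // One sample of osc1 (1000 hz) plus osc2 (1050 hz), scaled to +/-1.0.
    public static double beatSample(int cnt) {
        double t = cnt / SAMPLING_RATE;
        double s1 = Math.sin(2.0 * Math.PI * 1000.0 * t);
        double s2 = Math.sin(2.0 * Math.PI * 1050.0 * t);
        return (s1 + s2) / 2.0;
    }
}
```

At a 50 hz beat, the envelope nulls out every 10 ms; sample 160 (t = 0.01 s) sits right on one of those nulls.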
Or, another choice. How about the next step in arpeggiating – switching between two frequencies?

In a normal hardware synthesizer, electronic circuits usually output one kind of thing – voltage or current. An oscillator’s output may change 1000 cycles a second to make a 1KHz sinewave output, but it’s still just voltage. If we want to control the oscillator, we can apply an input voltage that then makes the oscillator run at different speeds. This is a VCO (Voltage Controlled Oscillator). In a way, a normal oscillator that we can tune using a manual dial is also a VCO, it’s just that it’s intended only for manual tuning. Either way, the purpose of a VCO is to let you change the output frequency by changing some input voltage. Most VCOs are non-linear, meaning that smaller voltages produce small changes in frequency, and higher voltages produce bigger changes. Typical VCOs are 1V per octave (0V-5V = 5 octaves). Other VCOs are linear, with fixed frequency steps per 1 mV. My oscillator is linear right now, and it doesn’t know the difference between “voltage”, “current” or “frequency” – they’re all just unit-less numbers to us.

What if I want to arpeggiate between 200 hz and 300 hz 4 times a second?

osc osc1 = new osc(100, 0, 0.5);
osc osc2 = new osc(4, 1, 0.5);
double base = 250.0;
double offset = 50.0;
osc1.setFreq(base + offset * osc2.nextSlice()); // osc2 outputs a -1.0 or +1.0 squarewave.
speakerBuffer[ptr] = osc1.nextSlice();
Here, osc2 produces a 4 hz squarewave with an output that goes from -1.0 to +1.0, which is then scaled and applied to an offset before being fed into osc1 as its operating frequency specification. Technically, the value of 100 in the osc1 constructor is being overridden, but I have to put something in there because I only have the one constructor method. osc1 then produces the arpeggiated output. To change the arp rate, I just enter a new value of osc2 frequency. A new value of ratio would change the time spent at 200 hz or 300 hz. To change the two arpeggiated frequencies, just change the values of base and offset.
One last idea. How about frequency sweeping? Say I want the sound to go smoothly from 100 hz to 800 hz and back?

This is effectively the exact same setup as for the second arpeggiator circuit, with the only real difference being that osc2 is outputting a linear triangle wave. So, if we instead try the sine, sawtooth or reverse sawtooth waveforms, we’ll get completely different effects at different frequencies and offsets. (Actually, osc2 sinewaves at 5-10 hz are just weird.)

All this with just 2 oscillators and nothing else. Adding a third could let you gate arp a frequency sweep…

If we look at Dick Baldwin’s tutorial on sound synthesis, we’ll see that there’s a method specifically for setting up an “audio format” object. This object defines how the computer is going to accept audio data going to an internal mixer prior to being sent to the speakers. One parameter of this object is sampling rate. It only accepts the values 8000, 11025, 16000, 22050 and 44100. This is important for several reasons.

First, sampling rate sets the number of bytes we need to generate per second. This will be a fixed value for the entire time the program is running.

Second, it sets the timing for our program. That is, if we pick 16KHz, then the timing is going to be 62.5 us for every sound function we create.

Third, we have to worry about aliasing. This is when the number of samples per second is insufficient for recreating our waveform in real time. Generally, we need at least 2 samples per cycle of a sinewave to be able to make something that sounds like a sinewave to our ears. In essence, this sets the highest frequency we can play to 1/2 the sampling rate. If you use a 16K rate, then the highest frequency we can create from the oscillator is 8KHz. (Technically, at 8 KHz, all we’re getting is shrill squeaking. We lose the sinewave shape after 4 KHz.)

Fourth, the above 3 reasons are interrelated. We can use a faster sampling rate, but if the computer runs slow, we’re going to get “buffer underruns” and clicking. We can instead use a lower rate, which means better performance on slower computers, but no sounds above 8 KHz, or even 4KHz (for the 8000 sampling rate).
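Setting the format up with the javax.sound.sampled API looks something like this (the 16-bit, mono, little-endian choices here are my reading of the constraints above, not necessarily what the synth app actually uses):

```java
import javax.sound.sampled.AudioFormat;

// Audio format object for the sound engine: 16,000 samples/sec, 16-bit
// signed PCM, mono, little-endian.
class FormatSetup {
    static final float SAMPLING_RATE = 16000F;

    public static AudioFormat makeFormat() {
        return new AudioFormat(
                SAMPLING_RATE,  // samples per second
                16,             // bits per sample
                1,              // mono
                true,           // signed
                false);         // little-endian
    }
}
```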

The human ear generally can only discern sounds from about 20 Hz to 15 kHz, and that deteriorates as we get older. At age 35, it may be closer to 30 Hz to 12 kHz for men. Additionally, on my computer, any sounds below 50 Hz are very difficult to hear, and below 20 Hz they turn into simple random clicks. Fortunately, if we make SAMPLING_RATE a constant, we can easily tweak it to determine the best setting for us on a computer-by-computer basis.

Yeah, so what?

Well, let’s say that we want to create a sinewave for making a simple 1 kHz tone, and we then want to run it through the envelope generator to shape the volume output before sending it to the speakers. We need to do each operation once on a per-sample basis. I.e. – get the value out from the oscillator, apply the envelope to it, then send it to the speaker. Increment the sample counter, get the next value from the oscillator, shape it, send it to the speaker, etc. We can build a short loop and perform the oscillator and envelope operations 100-200 times, saving them to a buffer prior to dumping the buffer to the sound engine, but the individual oscillator and envelope operations need to be synced per-sample. And the best way to do this is to use a simple integer counter to track which sample we’re on. The best part about using a counter is that we can change the oscillator or ADSR parameters in real time and the counter will let us perform our calculations with minimal complications.

We’ll worry about the adsr() function later. What’s important is that we could change the value of frequency whenever we wanted, going from 400 Hz to 800 Hz and back, and the method osc() would cope with it.
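To make that concrete, here’s a minimal sketch of the per-sample loop. My osc() here is a bare sine formula and adsr() is a do-nothing placeholder, so this only illustrates the counter idea, not the real methods:

```java
public class SampleLoop {
    static final int SAMPLE_RATE = 16000;
    public static double frequency = 400.0;  // can be changed at any time
    public static int cnt = 0;               // the per-sample counter

    // Stand-in oscillator: a bare sine formula driven by the counter.
    public static double osc() {
        return Math.sin(2.0 * Math.PI * frequency * cnt / SAMPLE_RATE);
    }

    // Stand-in envelope: passes the sample through at full volume for now.
    public static double adsr(double sample) {
        return sample * 1.0;
    }

    public static void main(String[] args) {
        double[] buffer = new double[200];   // 12.5 ms worth at 16 kHz
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = adsr(osc());         // one oscillator + one envelope op per sample
            cnt++;                           // advance to the next sample
        }
        // buffer would now be handed off to the sound engine
    }
}
```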

—————-

Ok, oscillators.

A hardware oscillator is a circuit that varies its output between two values. That is, it “oscillates” back and forth. One kind of oscillation is the sinewave. It is easy to generate in software, and produces kind of a “wind through a tube” whistle tone. Other waveforms that can be useful for a synthesizer are the squarewave, sawtooth, reverse sawtooth and triangle waves. If they are within my above range restriction (50 Hz to 12 kHz), they are audible to the human ear.

I’ll put methods for making each waveform into my osc class. This will let me easily use 3 or 4 oscillator objects at a time for a wide variety of purposes.

I’m putting cnt within the class to allow me to run multiple oscillators independently. If I want to sync them together, I can call their resetCnt() methods all at the same time.

Calculating sines is very time consuming, but for the most part it’s just a constant times cnt. So, I’ll speed things up by pulling the constant part out and recalculating it only when the frequency changes. As for the other waveforms, it’s easy to get one “cycle” by doing a modulus calculation on the frequency (cnt % (sampleRate / freq)), which gives me a number from zero to sampleRate / freq – 1. The calculations for each waveform should be easy to figure out.

This leaves ratio, gate and lastVal. The amplitude of all waveforms will be from -1.0 to +1.0, and there’s no DC offset (I’ll deal with that in the section on the VCA). I talked about gate in the intro section – this is just an on/off signal that can be used for controlling a given oscillator, or controlling any other circuit from a given oscillator. Normally, ratio is 50% (on half the time, off the other half), but we may want other settings if the oscillator is used for arpeggiating. I’ll talk about this in the next blog entry. lastVal just stores the last output value for those times when you want the osc output, but don’t want to increment cnt.
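Pulling those pieces together, a stripped-down version of the osc class might look like this. Only three waveforms are shown, the field and method names are my guesses at the shape rather than the actual code, and I’ve assumed a gated-off oscillator outputs silence:

```java
public class Osc {
    static final double SAMPLE_RATE = 16000.0;

    public double freq;
    public double ratio = 0.5;      // squarewave duty cycle, 50% by default
    public boolean gate = true;     // on/off control for this oscillator
    public double lastVal = 0.0;    // last raw output, readable without advancing cnt
    public int cnt = 0;             // per-oscillator sample counter
    private double sineConst;       // 2*pi*freq/rate, recalculated on frequency change

    public Osc(double freq) { setFreq(freq); }

    public void setFreq(double f) {
        freq = f;
        sineConst = 2.0 * Math.PI * freq / SAMPLE_RATE;
    }

    public void resetCnt() { cnt = 0; }   // call on several oscs together to sync them

    public double sine() { return out(Math.sin(sineConst * cnt++)); }

    public double square() {
        return out(phase() < ratio ? 1.0 : -1.0);
    }

    public double sawtooth() {
        return out(2.0 * phase() - 1.0);  // ramp from -1.0 up to +1.0
    }

    private double phase() {              // 0.0 .. 1.0 within the current cycle
        double period = SAMPLE_RATE / freq;
        return (cnt++ % period) / period;
    }

    private double out(double v) {
        lastVal = v;                      // remember the raw value
        return gate ? v : 0.0;            // gated off means silence
    }

    public static void main(String[] args) {
        Osc o = new Osc(440.0);
        for (int i = 0; i < 5; i++) System.out.println(o.sine());
    }
}
```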

(All rights belong to their owners. Images used here for review purposes only.)

Naomi Uemura is not a household name in the west, but he is still very famous in Japan as its first, and most successful, world adventurer. He set world records as the first person to solo-climb the tallest peaks on five continents (his conquests include McKinley, Mount Kilimanjaro, Aconcagua, Mont Blanc, the Matterhorn and Mount Everest), the first to ride the full length of the Amazon River solo, and the first to reach the North Pole by dogsled solo. He initially planned to reach the South Pole by dogsled in 1982 with help from Argentina, but the outbreak of the Falklands War caused him to postpone that, and instead he tried climbing McKinley again during the winter for training. He was last seen around Feb. 14, 1984 during his descent from the peak. High winds and bad weather prevented an aerial rescue and he failed to reach his base camp. He’d just turned 43 (his birthday was Feb. 12). Among his innovations were a sail for the dogsled, and a bamboo pole rig for catching himself if he fell into a crevasse.

The intro manga has all of the kids out climbing a wooded hill and they get lost. Merrino accuses Utako of leaving the trail to chase a butterfly; Utako says it’s because Daichi wanted to go to a weird cliff; Daichi blames Youichi for trying to find water, which in turn was because Mami needed a drink due to Merrino’s only bringing spicy snacks along to eat. Ken tells them to all shut up. Because of his interstellar travels, he’d gotten good at preparing for trips – he brought 3 days of snacks, bandages and solar reflective ponchos. However, his slip of the tongue almost gives him away as an alien and he shouts for Study Bell to start the next lesson. In the wrap-up, it’s getting dark and the kids are getting scared. Ken hears a wolf calling, giving him directions to the nearest highway. In the bushes, the Wolf Trio have been spying on our heroes, and the female leader was the one to help them out. Her minion asks why she didn’t take the opportunity to announce her love to him, and she gets all flustered.

The main manga (by Hiroshi Kashiwaba, who has almost no credits in the Japanese wiki) is mostly faithful to Naomi’s own accounts of his adventures, and the artwork is not overly westernized. Naomi’s nose has again been made too thin, compared to his photos, but it’s not that distracting. The story picks up with Naomi dog sledding along the Arctic ice cap, and having to untangle some knotted harness ropes. The dogs pull free and his entire team runs off, leaving Naomi to figure out how to cover 60 km on foot. Half the team returns and he makes it to town safely. He considers giving up this adventuring thing, but uses the experience to strengthen his resolve. He then flashes back to when he was age 19 and participating in his university’s mountain climbing club. He grew up in the mountains, but was physically unfit. A fellow club member showed him a photo of McKinley, and Naomi was smitten by the idea of seeing glaciers. He built his body up, graduated from university, and went to the U.S., where he made money picking grapes in California.

Naomi continued on to France to try climbing Mont Blanc, but a fall into a crevasse showed him how dangerous the ice could be. So, he lied about being a skier to get a job at a ski resort. Because he was such a hard worker, when the resort learned about his fib, they decided to not fire him. He then went on to summit Gojyuba Kan (ゴジュンバ・カン登頂: seems to be a smaller mountain in the Himalayas; I can’t find an entry for it), Mont Blanc, Kilimanjaro and Aconcagua. While in Peru, Naomi decided that he wasn’t going to just focus on mountains, and next tackled the 6,000 km-long Amazon on a raft on his own, finishing in 2 months. In 1970, he followed a Japanese team up Everest, separating from them to reach the peak by himself. The sight from the top of the world spurred him on to go for the South Pole. He went to Greenland and lived in an Eskimo village for several months to learn how to control a dog sled, which brings the story up to the fiasco on the Arctic ice cap. But, he made it back safely, and gained skills in hunting with a rifle and skinning seals for food. The Falklands War prevented him from setting off from Argentina for the Antarctic in 1982, so he went to McKinley in early 1984 for training and disappeared.

The textbook section is completely focused on Uemura this time, with photos of him, his university friends, shots of him on various peaks, and one with one of the Eskimo kids he befriended. There’s a map of his various treks, pictures of him picking grapes in California and working a ski lift in France, and some of the gear he used (including his sextant and the rifle and hunting blind used to kill the polar bear that terrorized him). The text talks about his upbringing in Hyougo prefecture, the different jobs he took to raise money, his primary adventures, and the polar bear incident. One night, he awoke to find the bear in his tent. It ate all of his food and snuffled his sleeping bag before leaving. He staked out the tent the next day, and shot it when it returned for more food.

Overall, this volume is one of the more personal ones, weaving Uemura’s adventures in with his own triumphs in such a way that we can learn more about what he was good at, rather than just getting a handful of vignettes to sit through. This is in contrast with, say, the Honda volume, where we learn nothing about bike engine design, which was the one thing Honda himself was really good at. With Uemura, we can at least see a little about what it takes to control a dogsled. Recommended.

Ok, I was going to hold off on writing this section on the sound engine until later, when I had it fully working. However, it’s now close enough as to justify at least talking about it. I was also going to go into a long discussion of why I made various choices, but the code is long enough as is, and there’s no point to prolonging things.

There are three parts of a software synth, in my opinion, that are needed right at the beginning – a working oscillator (to make something to hear for verifying that everything else works), the ADSR (to show that things are working in real-time) and the part that delivers the sound waveform to the speakers. Arguably, the ADSR can be left to later, but there’s no point to having the speaker section working if there’s no tone to play, and vice versa. So, the speaker section and the oscillator kind of went hand-in-hand.

If we look at Dick Baldwin’s Java audio tutorials, specifically the one on synthesized sounds, we’ll see that most of what we need is right there. The method playOrFileData() sets up a source data line to the speakers; listenThread() takes a pre-built waveform and sends it out through the source data line, and the synGen class is what pre-builds the waveform to be played. The problem with this arrangement, for my purposes, is that it makes a 2-second long waveform and then plays it. It’s not real-time and it’s not particularly responsive. (I don’t want to press a keyboard key and then wait 2 seconds to be able to press the next key.)

For convenience’s sake, I’m going to call the following code the “sound engine”. It will consist of a listener that runs permanently in the background waiting for something to play, a timer to call the method for making 10-20ms sound slices, the slice method itself, a method for starting the listener, and a method to stop it when the program exits.

The listener start method needs to determine the audio format we’re using (sample rate, number of channels (1 or 2), sample size (8 or 16 bits) and serial bit order), and then use that to open a data line, which then is used for opening a source data line. The last step is to launch the listener. We need to create some supporting variables as well. (Again, I apologize for WordPress’ stripping out of formatting. A link to the formatted textfile with the code fragment can be found at the end of this entry.)
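A hedged sketch of that start method, using the standard javax.sound.sampled calls (the class and method names and the 16 kHz mono format are my choices; error handling is minimal):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class SpeakerLine {
    static final float SAMPLING_RATE = 16000f;

    // Describe the data we'll be writing: 16-bit, mono, signed, little-endian.
    public static AudioFormat makeFormat() {
        return new AudioFormat(SAMPLING_RATE, 16, 1, true, false);
    }

    // The line info ties the format to a SourceDataLine (output toward the mixer).
    public static DataLine.Info lineInfo(AudioFormat fmt) {
        return new DataLine.Info(SourceDataLine.class, fmt);
    }

    // Open and start the line; the background listener then writes slices to it.
    public static SourceDataLine openLine() throws LineUnavailableException {
        AudioFormat fmt = makeFormat();
        SourceDataLine line = (SourceDataLine) AudioSystem.getLine(lineInfo(fmt));
        line.open(fmt);
        line.start();
        return line;
    }
}
```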

In Baldwin’s tutorial, he creates a 64K buffer array, which, because it has a fixed length, ensures that the listener will take 2 seconds to dump and play back the sound. Since I’m using a 16000 sampling rate, and running between 10 and 20ms of data at a time, the array can be 320 or 640 bytes in size. This part needs tweaking, but the idea is that while one slice of waveform is being dumped to the speakers, I want to be building up the next slice to have it ready when needed, to avoid data run-outs (running out of data too quickly, which results in clicking noises). So, I’m using a double buffer and alternating between them. I’m trying a staggered approach, where the first slice of waveform is 10ms, and then all subsequent slices are 20ms each. It works in principle, but I’m kind of guessing at the buffer sizes and I really need to sit down and make sure I’m doing this just right.
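The buffer arithmetic is worth double-checking; with 16-bit mono samples it works out like this (the helper name is mine):

```java
public class BufSize {
    // Buffer size in bytes: rate * slice length * 2 bytes per 16-bit mono sample.
    public static int bufferBytes(int sampleRate, int sliceMs) {
        return sampleRate * sliceMs / 1000 * 2;
    }

    public static void main(String[] args) {
        System.out.println(bufferBytes(16000, 10)); // 320 bytes for a 10 ms slice
        System.out.println(bufferBytes(16000, 20)); // 640 bytes for a 20 ms slice
    }
}
```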

First, start the listener.

public adsrUI() {
    initComponents();
    startSend2SpeakersListener(); // Get the play data listener running in the background
}

// Listener that does the actual work of sending data to the speakers.

// playBuffer[] is fixed length, meaning that there’s no real clue that this listener has reached the end when sending data to the
// audioInputStream buffer. To get around this issue, I’ll use the gotData flag to show when new data is ready for buffering. Then,
// I need to mark the first byte of the audioInputStream buffer in order to have .reset() return to it at the end of buffering.
// One important point to remember is that the listener is running non-stop in the background and will keep trying to play whatever
// was last in the buffer if we just reset it to the beginning using the .reset() method. So, I need to make gotData false as well.
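Since WordPress mangled the listener code itself, here’s a bare-bones reconstruction of what the comments describe – a background thread that watches gotData and dumps playBuffer to the source data line. This version skips the audioInputStream mark/reset bookkeeping and writes straight to the line, so treat it as an illustration, not the actual code:

```java
import javax.sound.sampled.SourceDataLine;

public class SendListener {
    public volatile boolean running = true;
    public volatile boolean gotData = false;
    public byte[] playBuffer = new byte[320];    // 10 ms at 16 kHz, 16-bit mono

    public void startListener(SourceDataLine line) {
        Thread t = new Thread(() -> {
            while (running) {                    // naive busy-wait; fine for a sketch
                if (gotData) {
                    gotData = false;             // clear it, or the last slice replays forever
                    line.write(playBuffer, 0, playBuffer.length);
                }
            }
        });
        t.setDaemon(true);                       // don't keep the app alive on exit
        t.start();
    }

    public void stopListener() { running = false; }
}
```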

I talked about timers in the K-Gater blog series. I’m just setting up a simple timer and using a counter to determine how many milliseconds have passed. Then I call the slice generator at 10ms or 20ms intervals.
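Sketched out, the timer-plus-counter arrangement might look like this (a hypothetical 1 ms javax.swing.Timer; the 20 ms interval and all the names are mine):

```java
import javax.swing.Timer;

public class SliceTimer {
    public int msCount = 0;   // milliseconds elapsed so far
    public int slices = 0;    // how many slices we've asked for

    // Hypothetical 1 ms timer; each tick bumps the counter and checks the interval.
    Timer timer = new Timer(1, e -> tick());

    public void tick() {
        msCount++;
        if (msCount % 20 == 0) {   // every 20 ms, build the next slice
            makeSlice();
        }
    }

    public void makeSlice() {
        slices++;                  // the real method fills the next waveform buffer
    }

    public void start() { timer.start(); }
}
```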

And here’s where I do the actual work of making the waveform to be played, slice by slice. I’ll be talking about this section a lot in the future. But, basically, the idea is as I mentioned above. I want the double buffers to be staggered, so while one is dumping data to the speakers, the other is getting a new waveform slice. So, the first buffer takes 10 ms of data, and then they’re all 20 ms long. I use the variable gotData to tell the listener to check the buffers for something to play (and the listener sets it to false at the end of dumping each slice). Otherwise, the only real magic is in the for-loop, and I need to wait until I talk about the oscillator to get into any details.

As mentioned at the beginning, I get clicks when the waveform amplitude or frequency changes too quickly. I think this is due to the low sampling rate. At 16,000 samples/second, I can only produce an 8 kHz wave anyway, and since the human ear can sense up to at least 12 kHz, I’m planning eventually to go to a 22,050 or 44,100 rate.

Ok, things are slowing down here again. I’m still waiting for the next 50 Famous People issue that I want to come out. I’ve finished K-Gater (for right now) and there’s no new news from Gakken. To fill the gap, I think I’ll start up a more-or-less weekly (for right now) “diary” on the progress of a new Java app I’m working on.

I’ve mentioned that I love synthesizers but that I don’t have the money or space for them. I suggested that I might write something in Java following the completion of K-Gater, but that I was putting it off because I didn’t know how much of a performance hit an FFT would cause. However, I find myself with some unexpected free time, and the Java example code needed to write a synth app is proving really difficult to pull together from what’s available on the net. This is very similar to the situation I faced with controlling external MIDI hardware with Java, and I’d written K-Gater just to have a real-world example of external hardware MIDI control “out on the net”. So, I’ve decided to tackle a Java synth app for the same reasons.

One difference this time, though, is that I’ve found the tutorial pages from Dick Baldwin, a professor at Austin Community College. He’s written over 600 tutorials on C++, Java and other programming languages, but it’s only the 10 files on Java Audio that interest me right now. These are a good start, and the one on synthesizing sound lays down some of the major principles that I’m looking for. The problem is that Dick, like most of the other people I’ve looked at, addresses the complete generation of the sound prior to playing it back, using buffers and clips. This doesn’t work for my purposes, because I want to change envelope and oscillator settings in real-time while the key is being pressed. The existing tutorials separate generation from playback, building up the entire sample (or reading it from an existing stream from a file), and afterward playing it. I need to break the generation phase up into 10ms or 50ms slices, run it through the FFT, play that, read the MIDI keyboard controller dials, change software settings and then generate another 50ms time slice.

In essence, I need a buffer I can access from outside a listener, appending data as needed, and then dumping the buffer when the user releases the keyboard key. The advantage to deleting the buffer at the end of one note and the start of another is that I can then assign buffers to separate keys, and allow the user to play overlapping notes. Depending on the final approach I take, the overlapping notes will allow me to add a “glide” feature, common to some hardware synths.

I’ll end things here with a short description of the “gate” concept.

In a hardware synthesizer, the gate signal acts like an on-off switch. You can either just turn on an oscillator and let it run independent of the keyboard (letting you make music with just the dials and not the keyboard keys), or you can send the gate signal to the oscillator, turning it on only when the key is pressed. In this second case, we’d need to know what MIDI note number is associated with the key (0-127) so that we can convert it to specific oscillator frequencies (as with a VCO – a voltage-controlled oscillator).
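The MIDI-note-to-frequency conversion is the standard equal-temperament formula, anchored at note 69 (A4) = 440 Hz; a one-method sketch:

```java
public class NoteFreq {
    // Equal temperament: each semitone is a factor of 2^(1/12); note 69 = 440 Hz.
    public static double midiToFreq(int note) {
        return 440.0 * Math.pow(2.0, (note - 69) / 12.0);
    }

    public static void main(String[] args) {
        System.out.println(midiToFreq(69));  // 440.0
        System.out.println(midiToFreq(81));  // 880.0, one octave up
    }
}
```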

Additionally, we have the envelope generator. This circuit changes the volume of the output sound based on time. Typically, the envelope follows the Attack, Decay, Sustain and Release pattern, and it starts with the rising edge of the gate signal and ends with the falling edge. This means that even if we leave the oscillator running, we wouldn’t hear anything out of the ADSR envelope generator until it (the ADSR) got the gate signal to start the Attack phase.
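As a sketch of that pattern, here’s a piecewise-linear envelope in code. The times, levels and names are all made up for illustration; a real ADSR would track the gate edges itself:

```java
public class Adsr {
    // Times are in samples, levels run 0.0 to 1.0; these values are illustrative.
    public int attack = 160;     // 10 ms at 16 kHz
    public int decay = 160;      // 10 ms
    public double sustain = 0.6; // hold level while the key stays down
    public int release = 320;    // 20 ms

    // Envelope value while the gate is on (t = samples since the rising edge).
    public double gatedLevel(int t) {
        if (t < attack) return (double) t / attack;                       // ramp up to 1.0
        if (t < attack + decay) return 1.0 - (1.0 - sustain) * (t - attack) / decay;
        return sustain;                                                   // hold
    }

    // Envelope value after the gate drops (t = samples since the falling edge).
    public double releaseLevel(int t) {
        if (t >= release) return 0.0;
        return sustain * (1.0 - (double) t / release);
    }
}
```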

Why does “gate” matter? Well, if we add gate inputs to each circuit, we have the ability to turn them on and off as desired (regardless of whether it’s an oscillator, ADSR, inverter or a frequency filter). Then, instead of using the keyboard to generate the gate signal, we could use an oscillator outputting a very slow squarewave to make an arpeggiator.
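That arpeggiator idea – deriving the gate from a slow squarewave – reduces to one comparison per sample (a sketch; the names are mine, and ratio is the duty cycle mentioned earlier):

```java
public class ArpGate {
    static final double SAMPLE_RATE = 16000.0;

    // Gate state at sample cnt, from a squarewave at lfoFreq with the given duty ratio.
    public static boolean gateAt(int cnt, double lfoFreq, double ratio) {
        double period = SAMPLE_RATE / lfoFreq;   // samples per LFO cycle
        return (cnt % period) / period < ratio;  // true = note on
    }

    public static void main(String[] args) {
        // A 4 Hz, 50% squarewave: on for the first 2000 samples of every 4000.
        System.out.println(gateAt(0, 4.0, 0.5));
        System.out.println(gateAt(2000, 4.0, 0.5));
    }
}
```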