You can read more about it here: http://arduino.cc/forum/index.php/topic,53604.msg383164.html#msg383164

Anyway, the sound effect it plays is generated via additive synthesis. Using a lot of fixed-point math, a sine table, and 16-bit phase accumulators, I add two to six sine waves together and modulate their pitch and volume with two more, in addition to the modulation I perform using the current speed setting of the prop. In total, I have ten phase accumulators being updated, and I do the frequency modulation by adjusting the phase accumulator step values directly, which I can do because there is a 1:1 mapping of step value to frequency, calculated like so: (frequency/samplerate)*65536. The samples generated in this manner are then fed into a 1024-byte buffer; my DAC interrupt increments an index into that buffer as it pulls data from it.
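To make the step formula above concrete, here is a minimal sketch of it in portable C++. It is not the original code: the helper names phaseStepFor and advancePhase are made up for illustration, and the 256-entry sine table indexed by the top 8 bits of the phase is an assumption based on the description.

```cpp
#include <cstdint>

// Sample rate taken from the post.
constexpr uint32_t kSampleRate = 31250;

// Hypothetical helper: convert a frequency in Hz to a 16-bit phase step,
// step = (frequency / sampleRate) * 65536, rounded to the nearest integer.
constexpr uint16_t phaseStepFor(uint32_t frequencyHz) {
    return static_cast<uint16_t>(
        (static_cast<uint64_t>(frequencyHz) * 65536u + kSampleRate / 2) / kSampleRate);
}

// Hypothetical helper: advance a 16-bit accumulator by one sample.
// The unsigned 16-bit value wraps past 65535 on its own, so no modulus
// is needed. Returns the top 8 bits, which would index an (assumed)
// 256-entry sine table.
inline uint8_t advancePhase(uint16_t &phase, uint16_t step) {
    phase = static_cast<uint16_t>(phase + step);
    return static_cast<uint8_t>(phase >> 8);
}
```

Because frequency and step are proportional, adding a step offset raises the pitch by a fixed number of Hz, which is why the modulation can be done on the step values directly.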

The problem is, my code isn't fast enough. My DAC needs to be updated 31,250 times a second so that I can generate high-pitched chirping noises, but my testing quickly revealed that the update function is lagging behind quite a bit. How much exactly, I'm not sure: when I tried to display how many samples behind the function calculated it was, based on the indexes into the buffer, the numbers were all over the place, and my main loop slowed to a crawl when I allowed it to update the entire buffer if need be. I managed to work around the issue by letting the function that generates the samples update only one quarter of the buffer at a time, and that seems to have worked because my sounds are largely repetitive, but it's a kludge, and I fear that it is impacting the quality of my sound output and causing my servos to jitter occasionally.
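For the "how many samples behind" measurement, one possible way to compute the lag from the two buffer indexes is a masked subtraction. This is only a sketch: playIndex is a hypothetical name for the index the DAC interrupt advances, and nextIndex is the generator's write index.

```cpp
#include <cstdint>

constexpr uint16_t kBufferSize = 1024;             // buffer size from the post
constexpr uint16_t kBufferMask = kBufferSize - 1;  // valid because 1024 is a power of two

// Number of samples that have been played but not yet regenerated.
// playIndex: where the DAC interrupt reads next (hypothetical name).
// nextIndex: where the generator writes next.
// The subtraction is done in unsigned 16-bit arithmetic, so a read index
// that has wrapped past the write index still gives the right distance.
inline uint16_t samplesBehind(uint16_t playIndex, uint16_t nextIndex) {
    return static_cast<uint16_t>(playIndex - nextIndex) & kBufferMask;
}
```

Two caveats. The result cannot tell 0 samples behind apart from 1024 behind, so once the generator falls a whole buffer back the number wraps. Also, on an 8-bit AVR a 16-bit index that is written by an interrupt is not read atomically, so it should be copied with interrupts briefly disabled; reading it unprotected is one possible reason for readings that jump around.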

So here is the code which generates the samples for my DAC. If you have any suggestions for how to make this something like 4x as fast, I'd love to hear them:

const int pitchChange = 2000;//1400*1.5; // How much the frequency of the sound effect should rise between 0% extension and 100% extension.
const int pitchStepChangeMax = (pitchChange / 31250.0) * 65536; // This value, scaled by the current PKE speed, is added to the step values for the phase accumulators which generate the sine waves used to recreate the PKE's sound. It is another way of representing the frequency change when the PKE speeds up.
int pitchStepChange; // The calculated amount by which we should adjust the step values for all the frequency phase accumulators.

byte pkeVolumeFixed; // 0.256 fixed-point representation of current PKE volume level.
byte volume; // 0.256 fixed-point representation of current volume level after 1-2hz oscillation has been applied, and just before it is applied to the current sample being rendered.

const word sectionLength[4] = {256, 128, 256, 384}; // The number of samples in each of the four sections which make up the sound effect. (5 sines, 5 sines w/ noise, 5 sines, 2 sines)
static double sectionDivider = 1; // Section length randomly varies between /1 /2 and /4. This value signifies the number of bits to shift the values by to accomplish that. [0..2]
static word sectionIndex = 0; // Incremented as we step through each section.
static byte section; // The section currently being rendered.
static byte pulseCount; // A pulse is a group of 5 sections. We count how many have been rendered so we can change the timing between pulses every second or so.

static int nextIndex = 0; // The next sample in the buffer which needs to be updated.
long sample = 0; // The new sample to be placed into the buffer. We may be able to reduce this to a byte or a word.

sample = sample - 128*2; // Center sample around 0. We need to do this before dividing to get the "average" because we aren't dividing by the same number as the number of samples, and the volume is being reduced a bit, so the math doesn't work out otherwise.

sectionIndex++; // Increment the number of samples we have rendered in this section.
if (sectionIndex >= (sectionLength[section]*sectionDivider)) { // If we have rendered all the samples needed for this section, move on to the next section.

  sectionIndex = 0;
  section++;

if (section == 4) { // If we have gone past the last section, select a new sectionDivider, and start again from the first section.

pulseCount++; // Keep track of the number of pulses (chord pairs) we have rendered at this speed.

if (pulseCount >= 32) {

pulseCount = 0;

// At a 31250hz sample rate, and standard pulses being 1024 samples long, there are around 30 pulses per second.
// If we want to change the pulse rate every second or so, (as defined by sectionLength[section]>>sectionDivider) then changing the rate every 32 pulses will do the job.
// Since sectionDivider allows us to speed up the pulses, if we want to wait for the same period of time before changing speeds, we must remember to multiply the desired pulseCount by sectionDivider,
// or divide pulseCount by sectionDivider before comparing it.

// Random() is a slow function but since we're only using it once in a while, it might be okay.

case 0: // Select from any of the three speeds at the "slowest" (quietest, lowest pitch) setting.

//sectionDivider = random(0, 3); // random(0, 3) returns a long greater than or equal to 0 and less than 3, so the range will be [0..2].
//noise = pgm_read_byte(noisetable + noiseIndex);

How am I compiling it? I'm not sure what you mean. I'm using the Arduino IDE if that's what you're asking.

As for how much each line of code is executed...

The function is called once per main loop. The main loop reads the analog inputs, and has a couple 1ms delays in it to do so. Around 64 samples will be played in the time it takes for those delays to execute, but if the function is fast enough it should be able to catch up.
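As a back-of-the-envelope check of the figures above, here is the arithmetic as a small sketch. The 16 MHz clock is an assumption (the thread does not state the board's clock speed), and the helper names are made up for illustration.

```cpp
#include <cstdint>

constexpr uint32_t kSampleRate = 31250;   // from the post
constexpr uint32_t kCpuHz = 16000000;     // ASSUMPTION: 16 MHz Arduino clock

// Samples the DAC consumes during a delay of the given length
// (truncated to a whole number of samples).
constexpr uint32_t samplesDuring(uint32_t microseconds) {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(kSampleRate) * microseconds / 1000000u);
}

// Average CPU cycles available per sample if nothing else were running.
constexpr uint32_t cyclesPerSample() {
    return kCpuHz / kSampleRate;
}
```

Two 1 ms delays cost about 62 samples, close to the "around 64" estimate. Under the 16 MHz assumption the whole budget is 512 cycles per sample, shared between the DAC interrupt, the main loop, and ten oscillators, so a single software float multiply per sample would take a large bite out of it.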

There is a bit of setup code at the start of the function for variables which change only when the speed changes. Since the speed can change only outside the function, and when the analog inputs are read, I can avoid having to recalculate those variables as long as I'm in the function refilling the buffer.

Most of the rest of the code in the function is executed once per sample. The oscillators are updated once per sample as well, and there are ten of those. Now that I think about it, I suppose I could unroll that oscillator update loop and save a few cycles there by executing one fewer conditional jump per sample.

At the end of the function there's some code with some conditionals. This is what I'm referring to:

sectionIndex++; // Increment the number of samples we have rendered in this section.
if (sectionIndex >= (sectionLength[section]*sectionDivider)) { // If we have rendered all the samples needed for this section, move on to the next section.

I suppose I could precalculate sectionLength[section]*sectionDivider. The sectionlength and sectiondivider can't change until that condition is met and I've moved on to the next section.

The code inside that conditional is executed roughly only once every 1024 samples though, so unless that is really really slow I shouldn't need to worry too much about it.

I cannot see why sample should be a long. It contains the sum of up to 6 byte values, divided and shifted to be 0-centered, then multiplied by a byte volume. It should be a signed 16-bit int... This code would be 2 to 4x quicker to execute.

volume = ((long) pkeVolumeFixed * sine) >> 8; // [0..255]
is doing a 32-bit multiply, when a 16-bit result would do, I think. If the compiler generates a call to the 32-bit multiply, this is costly: more than 32 clocks. A 32-bit multiply is 4x slower than a 16-bit multiply, which is 4x slower than an 8-bit multiply, which is 2 clocks.
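To illustrate the 16-bit suggestion, here is a hedged sketch comparing the two forms. It assumes sine is an unsigned byte in [0..255], as the [0..255] comment suggests; the function names are made up for illustration.

```cpp
#include <cstdint>

// Form from the thread: promotes to long (32 bits on AVR) before multiplying.
inline uint8_t scaleVolume32(uint8_t pkeVolumeFixed, uint8_t sine) {
    return static_cast<uint8_t>((static_cast<long>(pkeVolumeFixed) * sine) >> 8);
}

// 16-bit form: the largest product is 255 * 255 = 65025, which fits in an
// unsigned 16-bit value. The cast to an unsigned type matters on AVR, where
// int is 16 bits and a plain byte * byte is done in signed int, which
// cannot hold 65025.
inline uint8_t scaleVolume16(uint8_t pkeVolumeFixed, uint8_t sine) {
    return static_cast<uint8_t>((static_cast<uint16_t>(pkeVolumeFixed) * sine) >> 8);
}

// Exhaustively compare both forms over every pair of 8-bit inputs.
inline bool volumeFormsAgree() {
    for (uint16_t v = 0; v < 256; ++v) {
        for (uint16_t s = 0; s < 256; ++s) {
            if (scaleVolume32(static_cast<uint8_t>(v), static_cast<uint8_t>(s)) !=
                scaleVolume16(static_cast<uint8_t>(v), static_cast<uint8_t>(s))) {
                return false;
            }
        }
    }
    return true;
}
```

Since both inputs are bytes, the check covers all 65,536 cases, so the narrower multiply can be swapped in without changing the sound.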

-

The only other opportunity in the code for improvement that I can see is the logic around the sectionIndex, section number and the switch(section).

If you could pull this logic out of the sample loop, and have three functions one for section02(), section1() and section2() you might get a speed up.

This does depend on the rest of the code's structure.

-

Is there a simulator environment in which you can profile the execution of the code?

Professionally, I refuse to optimise without measurement.

In this play environment it is fun, almost because the measurement tools seem to be missing, but this is a lot of code to get running fast and correctly, and profiling tools would help a lot.

You calculate the phase[0...9] as 16-bit numbers but only use one byte of each in the inner loop. You would make the code easier to execute (so faster) if you stored the bit of it you need to use outside the loop.

The phase accumulators are incremented after each sample is calculated in advance of the next sample to be rendered. I'm pretty sure I can't do that calculation outside the main loop.

Quote

I cannot see why sample should be a long. It contains the sum of up to 6 byte values, divided and shifted to be 0-centered, then multiplied by a byte volume. It should be a signed 16-bit int...

You may be right about that. It's hard to keep track of how large the values will get. I think I made it a long at some point to make sure that an overflow wasn't the problem. When I first tried this code after writing it, it didn't work at all. That's why there's a few floating point values used in there in the code I commented out. I was trying to simplify things until I found the source of the problem, then I went back and optimized it again. I'll double check that and the other variables to see what I can make smaller.

Quote

If you could pull this logic out of the sample loop, and have three functions one for section02(), section1() and section2() you might get a speed up.

Yeah, I was looking at that after I posted the code. Not really sure how to go about optimizing that in a clean manner. Right now it's like:

for {
  switch {
    case:
    case:
    case:
  }
}

But to avoid the case in the loop I'd need to flip it around something like this:

repeat {
  switch {
    case: repeat {} until we've rendered all samples in this section
    case: repeat {} until we've rendered all samples in this section
    case: repeat {} until we've rendered all samples in this section
  }
} until we've rendered all samples

And all those case statements would have a ton of repeated code in them. Which isn't good if I want to be able to maintain the code and tweak it. Then I'd have to copy all the tweaks three times every time. And I've been doing tons of tweaking to get the sound right.
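One possible way to flip the loop around without copying the shared code three times is to make the section a template parameter. This is only a sketch under assumptions: mixSample is a placeholder standing in for the real sine mixing, and all names here are hypothetical rather than taken from the original sketch.

```cpp
#include <cstdint>

// Placeholder mixer standing in for the real per-section sine mixing.
// Section is a compile-time constant, so the compiler can resolve the
// switch when it instantiates each version, leaving a single branch.
template <uint8_t Section>
inline uint8_t mixSample(uint16_t n) {
    switch (Section) {
        case 1:  return static_cast<uint8_t>(n ^ 0x55);  // "5 sines w/ noise" stand-in
        case 3:  return static_cast<uint8_t>(n >> 1);    // "2 sines" stand-in
        default: return static_cast<uint8_t>(n);         // sections 0 and 2
    }
}

// Everything that is common to all sections is written once, here.
// Only the call to mixSample differs between instantiations.
template <uint8_t Section>
uint16_t renderRun(uint8_t *out, uint16_t count) {
    for (uint16_t i = 0; i < count; ++i) {
        out[i] = mixSample<Section>(i);
    }
    return count;
}

// The section is selected once per run of samples, not once per sample.
uint16_t renderSection(uint8_t section, uint8_t *out, uint16_t count) {
    switch (section) {
        case 0:  return renderRun<0>(out, count);
        case 1:  return renderRun<1>(out, count);
        case 2:  return renderRun<2>(out, count);
        default: return renderRun<3>(out, count);
    }
}
```

A tweak to the shared part of renderRun then applies to every section automatically, at the cost of somewhat larger compiled code because the loop body is instantiated once per section.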

Quote

Is there a simulator environment in which you can profile the execution of the code? Professionally, I refuse to optimise without measurement. In this play environment it is fun, almost because the measurement tools seem to be missing, but this is a lot of code to get running fast and correctly, and profiling tools would help a lot.

I don't know of any such tool. But I guess I should look for that thing you mentioned which will let me look at the generated assembly code. I think it's gonna be hard to find the relevant section to look at in that though. I imagine it's not going to have any of the variable names and be a mess of stuff being pushed and popped off the stack and I won't be able to make heads or tails of it.

Hm... one thing I can do with the phase accumulators is not increment them all every loop. As I mentioned, I could unroll that loop where I increment the accumulators. But rather than do that, instead I could increment the accumulators I use in the switch statement that selects which ones to combine. I think I did it the way I did originally because I was trying to make the code simpler and more generalized, and also because I thought it might improve the sound quality, but now I think that's probably not necessary.

static double sectionDivider = 1; // Section length randomly varies between /1 /2 and /4. This value signifies the number of bits to shift the values by to accomplish that. [0..2]
static word sectionIndex = 0; // Incremented as we step through each section.
static byte section; // The section currently being rendered.

sectionIndex++; // Increment the number of samples we have rendered in this section.
if (sectionIndex >= (sectionLength[section]*sectionDivider)) { // If we have rendered all the samples needed for this section, move on to the next section.
  sectionIndex = 0;
  section++;

Is doing a float (double is not implemented on AVR-GCC) multiply each time through the loop. That is a couple of hundred cycles.

You could count down from sectionLength[section]*sectionDivider to 0.
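A minimal sketch of the count-down idea, using integers only. It assumes the divider is stored as a shift count of 0, 1 or 2, as the code comment describes, rather than as a double; SectionCounter and tick are hypothetical names.

```cpp
#include <cstdint>

const uint16_t sectionLengthBase[4] = {256, 128, 256, 384};  // lengths from the post

struct SectionCounter {
    uint8_t section = 0;        // section being rendered [0..3]
    uint8_t dividerShift = 0;   // ASSUMPTION: 0, 1 or 2, i.e. length /1, /2 or /4
    uint16_t remaining = 256;   // samples left in the current section
};

// Call once per sample. Returns true when a new section has just started.
inline bool tick(SectionCounter &c) {
    if (--c.remaining != 0) {
        return false;                        // common case: one decrement, one test
    }
    c.section = (c.section + 1) & 3;         // wrap from section 3 back to 0
    c.remaining = sectionLengthBase[c.section] >> c.dividerShift;
    return true;                             // the length is recomputed only here
}
```

The table lookup and shift now run once per section rather than once per sample, and the per-sample path contains no floating-point work at all.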

The comment about restructuring is a bit of a guess. I thought the outer function might look something like this:

const word sectionLength[4] = {256, 128, 256, 384}; // The number of samples in each of the four sections which make up the sound effect. (5 sines, 5 sines w/ noise, 5 sines, 2 sines)
static double sectionDivider = 1; // Section length randomly varies between /1 /2 and /4. This value signifies the number of bits to shift the values by to accomplish that. [0..2]
static byte pulseCount; // A pulse is a group of 5 sections. We count how many have been rendered so we can change the timing between pulses every second or so.

pulseCount++; // Keep track of the number of pulses (chord pairs) we have rendered at this speed.

if (pulseCount >= 32) {
  pulseCount = 0;
  // At a 31250hz sample rate, and standard pulses being 1024 samples long, there are around 30 pulses per second.
  // If we want to change the pulse rate every second or so, (as defined by sectionLength[section]>>sectionDivider) then changing the rate every 32 pulses will do the job.
  // Since sectionDivider allows us to speed up the pulses, if we want to wait for the same period of time before changing speeds, we must remember to multiply the desired pulseCount by sectionDivider,
  // or divide pulseCount by sectionDivider before comparing it.

// Random() is a slow function but since we're only using it once in a while, it might be okay.
switch ((int)(pkeSpeed*3)) { // With an (int) conversion, fractional portions are discarded, so everything except pkeSpeed = 1.0 is mapped to 0, 1, or 2.

static word phase[phaseAccumulators]; // Phase accumulator. Used as index into sine table with >> 8. [0..65535]. No modulus is needed with these accumulators as they automatically wrap back to 0 once they hit 65535.
word phaseStep[phaseAccumulators]; // Step value by which to increment phase[] each sample.

const int pitchChange = 2000;//1400*1.5; // How much the frequency of the sound effect should rise between 0% extension and 100% extension.
const int pitchStepChangeMax = (pitchChange / 31250.0) * 65536; // This value, scaled by the current PKE speed, is added to the step values for the phase accumulators which generate the sine waves used to recreate the PKE's sound. It is another way of representing the frequency change when the PKE speeds up.
int pitchStepChange; // The calculated amount by which we should adjust the step values for all the frequency phase accumulators.

byte pkeVolumeFixed; // 0.256 fixed-point representation of current PKE volume level.
byte volume; // 0.256 fixed-point representation of current volume level after 1-2hz oscillation has been applied, and just before it is applied to the current sample being rendered.

const word sectionLengthBase[4] = {256, 128, 256, 384}; // The number of samples in each of the four sections which make up the sound effect. (5 sines, 5 sines w/ noise, 5 sines, 2 sines)
static word sectionLength = 256; // Desired length of section currently being rendered.

static double sectionDivider = 1; // Section length randomly varies between /1 /2 and /4. This value signifies the number of bits to shift the values by to accomplish that. [0..2]
static word sectionIndex = 0; // Incremented as we step through each section.
static byte section; // The section currently being rendered.
static byte pulseCount; // A pulse is a group of 5 sections. We count how many have been rendered so we can change the timing between pulses every second or so.

static int nextIndex = 0; // The next sample in the buffer which needs to be updated.
int sample = 0; // The new sample to be placed into the buffer.

sample = sample - 128*2; // Center sample around 0. We need to do this before dividing to get the "average" because we aren't dividing by the same number as the number of samples, and the volume is being reduced a bit, so the math doesn't work out otherwise.

nextIndex++; // Increment the index.
nextIndex = nextIndex & 1023; // Wrap nextIndex when it reaches the end of the buffer. Bitwise AND is much faster than modulo.

// Update which section we are in.

sectionIndex++; // Increment the number of samples we have rendered in this section.
if (sectionIndex >= sectionLength) { // If we have rendered all the samples needed for this section, move on to the next section.

  sectionIndex = 0;
  section++;

if (section == 4) { // If we have gone past the last section, select a new sectionDivider, and start again from the first section.

pulseCount++; // Keep track of the number of pulses (chord pairs) we have rendered at this speed.

if (pulseCount >= 32) {

pulseCount = 0;

// At a 31250hz sample rate, and standard pulses being 1024 samples long, there are around 30 pulses per second.
// If we want to change the pulse rate every second or so, (as defined by sectionLength[section]>>sectionDivider) then changing the rate every 32 pulses will do the job.
// Since sectionDivider allows us to speed up the pulses, if we want to wait for the same period of time before changing speeds, we must remember to multiply the desired pulseCount by sectionDivider,
// or divide pulseCount by sectionDivider before comparing it.

// Random() is a slow function but since we're only using it once in a while, it might be okay.

case 0: // Select from any of the three speeds at the "slowest" (quietest, lowest pitch) setting.

//sectionDivider = random(0, 3); // random(0, 3) returns a long greater than or equal to 0 and less than 3, so the range will be [0..2].
//noise = pgm_read_byte(noisetable + noiseIndex);

The code seems much faster now, though this has changed the character of the sound that's being produced. It has also sped up the animation of my LEDs. Or rather, it's no longer slowing them down. The LEDs are supposed to be updated at 30 Hz, and I hadn't noticed the original sound update code was running so slowly that it was affecting them.

I guess I'll have to retweak the sound after I do some more testing and optimization. I think I will count the number of samples rendered so far and the number of samples played so far, and divide one by the other to get a ratio which will tell me just how much faster the playback speed is than my update speed. Maybe I can now get it closer to the ideal I was shooting for, which was to have it sound identical to the film.
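The rendered-versus-played comparison could be sketched roughly like this, using integer math so the measurement itself stays cheap. ThroughputMeter and renderedPercent are hypothetical names; where the two counters are incremented is left to the real sketch.

```cpp
#include <cstdint>

struct ThroughputMeter {
    uint32_t rendered = 0;  // incremented by the sample generator
    uint32_t played = 0;    // incremented by the DAC interrupt
};

// Rendered samples as a percentage of played samples, truncated.
// 100 means the generator is keeping up; 25 means it produces one
// sample for every four that are played.
inline uint16_t renderedPercent(const ThroughputMeter &m) {
    if (m.played == 0) {
        return 0;  // nothing played yet, avoid dividing by zero
    }
    return static_cast<uint16_t>(
        static_cast<uint64_t>(m.rendered) * 100u / m.played);
}
```

On the real hardware the played counter would be written from the interrupt, so it would need to be declared volatile and copied with interrupts briefly disabled before being used in the division.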

Oh, and as for avr-objdump, I've been looking into that but have yet to find where the Arduino IDE is sticking the generated object files so I can disassemble them.