Sound Source Triangulation Game Using Atmega32

The goal of this project is to determine the time and location of a sound source in all three dimensions (x,y,z) using an economical and easily reproducible setup.

To accomplish this goal, we decided to triangulate the sound source using a four-microphone configuration. We used the Atmel Mega32 microcontroller to detect the sound pulses from the microphones. The triangulation calculations are then run on the gathered data, yielding the three-dimensional position and the time of the sound source. The next part of the project was to apply this position-finding algorithm to a simple game on the TV: the game prompts the player to clap at a certain position. After the player claps at the correct location, the game prompts the player to clap at a new location. The game keeps track of the number of mistakes made by the player.

High Level Design

The sound triangulation scheme was influenced by the Global Positioning System (GPS), which triangulates the location of a receiver using satellites and the speed of light.

The triangulation idea rests on the assumption that the speed of sound is constant. Under that assumption, the distance between a microphone and the sound source follows from the speed of sound and the time the sound wave takes to propagate to the microphone. The figure below depicts this idea.

The equation below describes the relationship between the speed of sound, c, the location of the sound source (x, y, z), the location of the nth microphone (xn, yn, zn), the time of the sound source transmission (t), and the time of the sound source reception (tn).
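From the symbols defined above, the relationship can be written as follows. This is a reconstruction from the stated definitions (distance travelled equals speed of sound times travel time), so the exact form used in the original figure may differ, for example by being squared on both sides.

```latex
\sqrt{(x - x_n)^2 + (y - y_n)^2 + (z - z_n)^2} = c\,(t_n - t), \qquad n = 1, \dots, 4
```

With four microphones this gives four equations in the four unknowns x, y, z, and t.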

These equations allow us to update our initial guess to get closer to the actual solution. We found that it takes about 5 iterations to arrive sufficiently close (within 1 m) to the actual solution. We used Matlab to verify these equations by working problems backward from known solutions. Details of this testing are discussed later.
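One way to write such an update is as a Newton-Raphson step on the four range equations. The form below is a sketch consistent with the later references to a Jacobian matrix A, a vector b, and a function named negFunction; the residual f_n, the distance d_n, and the unknown vector v are notation introduced here, and the report's own Equations 3 and 4 may be arranged differently.

```latex
f_n(x, y, z, t) = \sqrt{(x - x_n)^2 + (y - y_n)^2 + (z - z_n)^2} - c\,(t_n - t),
\qquad d_n = \sqrt{(x - x_n)^2 + (y - y_n)^2 + (z - z_n)^2}

\frac{\partial f_n}{\partial x} = \frac{x - x_n}{d_n}, \quad
\frac{\partial f_n}{\partial y} = \frac{y - y_n}{d_n}, \quad
\frac{\partial f_n}{\partial z} = \frac{z - z_n}{d_n}, \quad
\frac{\partial f_n}{\partial t} = c

A_{n} = \begin{bmatrix}
\dfrac{\partial f_n}{\partial x} & \dfrac{\partial f_n}{\partial y} &
\dfrac{\partial f_n}{\partial z} & \dfrac{\partial f_n}{\partial t}
\end{bmatrix}, \qquad b_n = -f_n

v^{(k+1)} = v^{(k)} + A^{-1} b, \qquad v = (x, y, z, t)^{T}
```

Here A_n is the nth row of the 4x4 matrix A, evaluated at the current guess. The iteration stops once the magnitude of the correction A^{-1} b is small enough.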

The schematic provided above is mostly self-explanatory. A sound source (hand clap) triggers the microphones. The pulses are received by the Mega32, which time stamps each of them. The time stamps only need to be accurate relative to one another. Using these time stamps, we apply the triangulation algorithm to converge to a solution. This solution is passed to the game interface, which processes it and prompts for a new location.

A hardware/software tradeoff emerged with regard to pulse detection and timing. Initially, we wanted to send the microphone outputs to the ADC unit in the Mega32 and process the signal in code to determine whether a sound source had been detected. This approach would require minimal hardware but considerable software signal processing, and we would need to deal with multipath errors, which are difficult to identify in a voltage signal. The other option was to use amplifiers and Schmitt triggers to send a CMOS logic signal to the port pins of the MCU. This scheme requires additional hardware but makes the signal much easier to decode in software. In the end, we chose the second option and used the port pins. Its biggest advantage is that it makes it easier to adhere to the strict timing requirements of the composite signal sent to the TV. It would be very difficult to poll the ADC on 4 channels and process the signal while also generating the sync pulses required for the TV. The resulting time-stamping errors would have a large effect on the triangulated solution.

Hardware Design

Initially we tried to use 4 omni-directional microphones for our project. These produced little to no change in output. After days of testing and debugging these microphones, we decided to try uni-directional microphones. These gave a much better result using the exact same setup.

The microphones were hooked up as specified by the data sheet given in the references. The 1 µF capacitor on the output terminal of the microphone blocks any unwanted DC offset. The signal is then given a DC offset of 2.5 V before being sent to the first op-amp, which is configured as a non-inverting amplifier with a gain of about 100 (1 MΩ / 10 kΩ). The amplified microphone signal is passed through a diode to rectify it. The RC circuit (51 kΩ / 1 µF) holds the voltage at the peaks of the signal, with the expected RC discharging behavior. The output of this block is sent to the Schmitt trigger. The 1 MΩ resistor causes most of the feedback voltage to drop across it, which keeps the Schmitt trigger's hysteresis window narrow. The 10 kΩ resistor and potentiometer set the trigger voltage. The potentiometer is adjusted until the circuit gives a clean pulse for sound sources.

This circuit outputs a 5 V signal when no sound source is detected. When the user claps within approximately 3 m of the microphone, the circuit outputs a 0 V signal. These levels are easily detected by the port pins of the MCU.

From the microphone circuits, the signals are routed to the STK-500/Mega 32 port pins. We use Port A as inputs for the microphone signals. These signals are inverted and sent to Port C which drives the LEDs on the STK-500. This causes LEDs to light up when a sound source is detected. This serves as a check to make sure that the microphones are able to pick up a sound impulse.

Software Design

Matlab Triangulation Verification

To verify our triangulation calculations, we worked various problems backward in Matlab using our equations. These simulations used a predefined sound source location and a predefined microphone configuration to determine the times at which the microphones would receive the signal. Using these times as inputs to our triangulation calculations, we successfully recovered the predefined sound source location. However, we discovered a limit on convergence: because our system of equations has multiple solutions, the iteration converges to the correct location only if the sound source is well contained within the microphone configuration. Otherwise it converges to a different location and time that also satisfy our equations. Even in a GPS system, the receivers are well contained within the satellite configuration. We therefore decided to build a structure large enough for a player to play comfortably inside it.

We ran a test simulation with predefined parameters. The sound source was set at (1,1,1) at a time of 0. The microphone positions were defined so that the sound source was well contained, similar to the configuration in our problem. After 4 iterations, our code output the following:

The coordinates of the sound source are as follows:

x – 1.000007 inches

y – 1.000018 inches

z – 1.000028 inches

The time of transmission was 0.000063 seconds

It took 4 iterations.

This test case was one of a few used to verify the mathematics of the triangulation calculations. The Matlab test code is provided at the end of the report.

Port Polling Scheme

To keep our code compatible with the TV signal generation, we polled Port A during the horizontal sync interrupt, which occurs every 63.625 µs. At a speed of sound of 300 m/s, this sampling period introduces a time-stamping error equivalent to about 0.0190875 meters (63.625 µs × 300 m/s), which is an acceptable amount of error in each signal. Each microphone signal was routed to a unique Port A pin. The state of each pin was stored in an array (alreadyTrigger). If a pin goes low and its state is not already triggered, its alreadyTrigger entry is set to 1 and the time is recorded. After the first pin is triggered, a separate timing variable is started. If this timer exceeds 10 milliseconds, the state variables for each microphone are reset and the timer is reset. This case signifies that too few microphones were triggered (for example, only 1), which is not sufficient to triangulate. When all 4 microphones have been triggered, a flag is set to signal that the system is ready to perform the triangulation calculations.
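The polling logic described above can be sketched as a plain C function that is called once per horizontal sync with the sampled port value. Only the array name alreadyTrigger comes from the report; the other names, the assignment of the microphones to bits 0 to 3, and the timeout of 157 sync periods (10 ms / 63.625 µs, rounded) are assumptions made for illustration. On the Mega32 the argument would be the value read from Port A inside the sync interrupt.

```c
#include <stdint.h>

#define NUM_MICS      4
#define TIMEOUT_LINES 157   /* about 10 ms in 63.625 us sync periods */

uint8_t  alreadyTrigger[NUM_MICS]; /* 1 once microphone n has fired       */
uint32_t triggerTime[NUM_MICS];    /* sync count at which microphone fired */
uint32_t lineCount;                /* free-running sync counter            */
uint16_t windowTimer;              /* sync periods since first trigger     */
uint8_t  numTriggered;             /* microphones fired in this window     */
uint8_t  readyToTriangulate;       /* set when all four have fired         */

/* Clear the capture state so a new clap can be recorded. */
void reset_capture(void)
{
    for (uint8_t n = 0; n < NUM_MICS; n++)
        alreadyTrigger[n] = 0;
    windowTimer = 0;
    numTriggered = 0;
    readyToTriangulate = 0;
}

/* Call once per horizontal sync. Bit n of pins is the (active-low)
 * output of microphone circuit n. */
void poll_microphones(uint8_t pins)
{
    lineCount++;
    if (readyToTriangulate)
        return;                     /* hold results until consumed */

    for (uint8_t n = 0; n < NUM_MICS; n++) {
        if (!(pins & (1u << n)) && !alreadyTrigger[n]) {
            alreadyTrigger[n] = 1;
            triggerTime[n] = lineCount;
            numTriggered++;
        }
    }

    if (numTriggered == NUM_MICS) {
        readyToTriangulate = 1;
    } else if (numTriggered > 0 && ++windowTimer > TIMEOUT_LINES) {
        reset_capture();            /* too few microphones heard the clap */
    }
}
```

Because the loop runs a fixed four times with little work per pass, the cost per call stays small and nearly constant, which suits an interrupt that must also generate the TV sync timing. The main loop would read triggerTime once readyToTriangulate is set and then call reset_capture.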

Triangulation

The functions listed below are implemented to perform the triangulation calculations shown earlier:

inverse4 - This function takes a 4X4 matrix and calculates its inverse. Each element of the inverse is calculated by multiplying specific elements of the original matrix together. A more advanced algorithm such as Gauss-Jordan elimination may have been faster. However, the technique we implemented runs in a nearly constant number of cycles with no branches. This gives us predictable behavior that can be used with the TV under strict timing constraints.

multMatVect4 - This function multiplies a 4X4 matrix with a 4X1 vector and returns the result.

addVect4 - This function adds two 4X1 vectors and returns the result.

function1 - Evaluates an intermediate expression that is used in the function dfdvar.

dfdvar - This function calculates the derivatives from Equation 3 for specific values. A matrix of values, A from Equation 4, is calculated using this function.

negFunction - This function calculates each element of the vector b from Equation 4.

norm4 - This function returns the magnitude of a vector. This is useful in determining whether the solution has converged.

TV Game

Our game prompts the player to clap at a specific location. The game then determines the actual position where the player claps and reports this position using the TV. The game then proceeds to prompt the player to clap at a different location. In order to have enough time to complete the calculations, we reduced the resolution on the TV.

Rescoping

Unfortunately, we were unable to triangulate on sound sources. To diagnose the issue, we acquired the time stamps for a particular sound source and used them as inputs to our Matlab test code (mentioned earlier). This analysis showed that the Jacobian matrix, A, from Equation 4 could not be inverted. The matrix was ill-conditioned: its entries were badly scaled relative to one another. This narrowed the problem down to the time-stamped inputs. To verify the time-stamping technique, we tested the relative time stamps against oscilloscope readings of the same signal. We observed that any two measurements were within 0.2 milliseconds of each other. This variation could be explained by the limited precision of the manual oscilloscope measurements. Therefore, we believe that the time-stamping algorithm functions correctly. However, when we created a sound source in approximately the same location every time and observed the results on the oscilloscope, we noticed variations of a few milliseconds. Taking the speed of sound as roughly 300 m/s, each millisecond corresponds to 30 cm of position error. Variations of this scale would certainly cause our triangulation calculations to fail. This leaves the microphone circuit as the source of the problem: our circuit was not suitable for measuring the time differences inherent to our system.

The meter-long wires connecting the microphones to the proto-boards cannot be the cause. The propagation delay of electromagnetic signals through wires is insignificant compared to that of sound in air (on the order of 3×10^8 m/s versus 300 m/s). Likewise, the propagation delay of the operational amplifiers, which is on the order of nanoseconds, can be ruled out. However, the RC circuits used throughout the design could cause problems. The RC time constants were on the order of milliseconds (51 kΩ × 1 µF = 51 ms). The charging of the 1 µF capacitors would cause phase delays, which would completely throw off our time-stamp measurements. We tried using smaller capacitors to decrease charge times, but we were unable to make the circuit function properly: the time constant became too small to hold the amplitude at the peaks long enough for the Schmitt trigger to fire reliably. In addition, the capacitors used in the circuit had tolerances of 20%. These variations across the 4 microphones would cause relative time delays that would be detrimental to our calculations.

Our original game was to determine the location at which a player claps. Due to time constraints and unexpected errors, we had to rescope our game to be simpler; our game now prompts the player to clap in a quadrant. This is determined by identifying which microphone pulse arrives first. However, one of our microphones malfunctioned during this rescope and the game has been further rescoped to include only three regions.
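Identifying the region from the first pulse to arrive reduces to finding the smallest time stamp. A minimal C sketch follows; the function name first_microphone and the triggerTime array are hypothetical, and the code assumes the time-stamp counter does not wrap around within a single clap. Setting NUM_MICS to 3 would match the final three-region version.

```c
#include <stdint.h>

#define NUM_MICS 4

/* Return the index (0 to NUM_MICS - 1) of the microphone whose pulse arrived
 * first, i.e. the one with the smallest time stamp. Ties go to the lowest
 * index. */
uint8_t first_microphone(const uint32_t triggerTime[NUM_MICS])
{
    uint8_t first = 0;
    for (uint8_t n = 1; n < NUM_MICS; n++) {
        if (triggerTime[n] < triggerTime[first])
            first = n;
    }
    return first;
}
```

The returned index can then be mapped to the region nearest that microphone. Unlike the full triangulation, this comparison depends only on the ordering of the time stamps, so it tolerates timing errors much better than the position solve does.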