Autotune.NET

Description

We've all cringed as a hopelessly out of tune contestant appears on the latest episode of “American Idol.” Occasionally, there's a contestant who manages to be pitch perfect all the way through—right until they flub the final note. And in the cutthroat world
of televised auditions, sing one slightly flat note and you're out.

So what takes care of a bad-pitch day? Autotune—an effect that corrects the pitch of your voice so you'll never again sing out of tune. And now, with the power of modern microprocessors, autotune is possible in real-time, allowing singers to benefit from
its almost magical powers during live concerts.

The company most famous for its autotune effect is Antares.
Antares Auto-Tune currently retails for $249, and a stripped down version is available for $100. In addition to simply improving the pitch of a dodgy singer, autotune can be used to create unique robotic sounding vocal effects, a technique massively popular
in recent years thanks to its use by artists such as T-Pain and the group behind the “Auto-Tune the News” YouTube videos. In 1998, when the effect was first used on
Cher's “Believe” single, the producer used such extreme settings that instead of subtly adjusting the pitch, autotune “snapped” instantaneously to the nearest “correct” note.

Here is a nerdy example of what Autotune can do.

How does Autotune work?

An autotune effect has two parts. The first is pitch detection, which calculates the dominant frequency of the incoming signal, and is the reason autotune is normally used on monophonic audio sources (i.e. playing one note at a time, not whole chords).
So, if your guitar is out of tune, you're out of luck (Celemony's Melodyne product, however, features some incredible capabilities for pitch-shifting polyphonic audio).

The second stage is pitch shifting, or “correcting” a given note. However, the bigger the pitch shift required, the more artificial the end result will be, and it is worth noting that absolutely perfect pitch is not always desirable. Sometimes the
blended notes resulting from vibrato, for example, are an important part of the performance, and eliminating them would be detrimental.

Creating a .NET Autotune Algorithm

For this project, we will be creating an autotune effect for .NET. Serious recording enthusiasts should just go out and buy a decent autotune effect, but to have a little fun, we'll see if we can make an autotune that can give us a poor man's Cher effect
(or T-Pain effect, if you prefer).

To get started, I searched to see if there were some pre-existing open source autotune implementations, which brought me to
awesomebox, a project created by Ravi Parikh and Keegan Poppen while they were students at Stanford University. They kindly gave me permission to make use of their code, which uses an auto-correlator for pitch
detection and an open source pitch-shifting algorithm from audio DSP expert
Stephan M. Bernsee.

Porting C++ to C#

Although porting from C/C++ to C# is not exactly fun, there is enough similarity in the syntax that it is possible to complete without too many changes. You need to remember, however, that a
long in C/C++ (32 bits on Windows) corresponds to an int in C#, not a C# long (which is 64 bits).

Additionally, the C# compiler is fussier than C/C++ when it comes to casting between floats, doubles, and ints. Putting the “f” suffix on numeric literals sorts out most of these compiler errors.

Pointers can be a pain. I tend to replace them with integer variables used to index into an array. You can of course use unsafe code, but that limits your options if you plan to port to Silverlight or Windows Phone 7 at a later date, neither of which allow
unsafe code or interop into unmanaged code. The necessary mathematical functions are available in the System.Math class.
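
As a trivial illustration (not taken from the ported code), a C-style pointer walk can usually be replaced by an index variable:

c#:

// C might write: float *p = data; *p++ = sample;
// in C# we can keep an explicit index into the array instead:
float[] data = new float[1024];
float sample = 0.5f;
int p = 0;
data[p++] = sample; // "advance the pointer" by incrementing the index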

Recording itself is handled by NAudio's WaveIn class, which raises a DataAvailable event as blocks of audio arrive. The WaveInEventArgs contains the number of bytes recorded (e.BytesRecorded) and the buffer containing those bytes (e.Buffer). The handler does two things with the recorded data. First, it calls WriteToFile, which uses the
WaveFileWriter class from NAudio to write the data to disk:

c#:

// before we start recording, set up a WaveFileWriter...
writer = new WaveFileWriter(waveFileName, recordingFormat);
// ... every block we receive we write it to the WaveFileWriter:
writer.WriteData(buffer, 0, bytesRecorded);
// ... and when recording stops we must call Dispose to finalize the
// .WAV file properly
writer.Dispose();
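
For context, here is a minimal sketch of how such a recording handler can be wired up with NAudio's WaveIn class; the WriteToFile helper and field names here are assumptions based on the description above rather than the project's exact code:

c#:

// using NAudio.Wave; (sketch only - WriteToFile and field names are assumptions)
waveIn = new WaveIn();
waveIn.WaveFormat = recordingFormat;
waveIn.DataAvailable += OnDataAvailable;
waveIn.StartRecording();

private void OnDataAvailable(object sender, WaveInEventArgs e)
{
    // e.Buffer holds the recorded bytes, e.BytesRecorded says how many of them are valid
    WriteToFile(e.Buffer, e.BytesRecorded);
}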

Converting Audio to Floating Point and Back Again

Once recording has completed, we have a WAV file on which to perform our autotune effect. However, our WAV file consists of 16 bit samples (i.e.
System.Int16, aka short). In other words, we have a sequence of byte pairs, each of which represents a number in the range -32768 to 32767. For the digital signal processing we will be performing, it's best to have a sequence of floating point numbers
(System.Single or float) in the range -1.0f to 1.0f. This is a common requirement, so NAudio provides a utility class called
Wave16ToFloatProvider to convert audio from short to float. Here's how the code that applies the autotune algorithm to a WAV file works, step by step (a sketch follows the list):

First we use a WaveFileReader to open the file that we've just created, which contains 16 bit audio samples

Then we use the Wave16ToFloatProvider to perform the conversion to floating point samples

Next we pass it through our autotune effect (the AutoTuneWaveProvider). We'll explain how this works later

Then we use the WaveFloatTo16Provider to convert back to 16 bit samples ready for saving to WAV (we could save a 32 bit WAV, but it would be rather wasteful of disk space)

Having set up the audio pipeline, we can read from the WaveFloatTo16Provider and pull audio right through from the WAV file. We need to read in block sizes that are a power of 2, since we are passing the data through FFTs. If we want to read arbitrary block
sizes, we need to introduce another element into our pipeline to buffer up enough data to pass through the FFTs

Finally, we use the WaveFileWriter to write the data we have read into a WAV file
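
Pulling those steps together, a sketch of that processing code might look like this (variable names such as autoTuneSettings and outputFileName are illustrative rather than taken verbatim from the project):

c#:

// using NAudio.Wave; (sketch only)
using (var reader = new WaveFileReader(waveFileName))
{
    // 16 bit PCM -> 32 bit IEEE floating point
    IWaveProvider floatStream = new Wave16ToFloatProvider(reader);
    // apply the autotune effect
    IWaveProvider tuned = new AutoTuneWaveProvider(floatStream, autoTuneSettings);
    // 32 bit float -> 16 bit PCM, ready to be written back out as a WAV file
    IWaveProvider outStream = new WaveFloatTo16Provider(tuned);
    using (var writer = new WaveFileWriter(outputFileName, outStream.WaveFormat))
    {
        // read in power-of-two sized blocks, since the effect uses FFTs internally
        byte[] buffer = new byte[8192];
        int bytesRead;
        do
        {
            bytesRead = outStream.Read(buffer, 0, buffer.Length);
            writer.WriteData(buffer, 0, bytesRead);
        } while (bytesRead > 0 && writer.Length < reader.Length);
        // the length check stops us looping forever, since the effect pads its final block
    }
}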

The AutoTuneWaveProvider

As we saw in the last code snippet, the AutoTuneWaveProvider is the piece in our audio pipeline that actually performs the autotune effect. It implements the NAudio
IWaveProvider interface, which allows it to be used in the pipeline for real-time playback if necessary, even though our example code is not doing this (see the section on performance later). Here's the AutoTuneWaveProvider constructor:

VB.Net

Public Sub New(ByVal source As IWaveProvider, ByVal autoTuneSettings As AutoTuneSettings)
    Me.autoTuneSettings = autoTuneSettings
    If source.WaveFormat.SampleRate <> 44100 Then
        Throw New ArgumentException("AutoTune only works at 44.1kHz")
    End If
    If source.WaveFormat.Encoding <> WaveFormatEncoding.IeeeFloat Then
        Throw New ArgumentException("AutoTune only works on IEEE floating point audio data")
    End If
    If source.WaveFormat.Channels <> 1 Then
        Throw New ArgumentException("AutoTune only works on mono input sources")
    End If
    Me.source = source
    Me.pitchDetector = New AutoCorrelator(source.WaveFormat.SampleRate)
    ' alternative pitch detector:
    ' Me.pitchDetector = New FftPitchDetector(source.WaveFormat.SampleRate)
    Me.pitchShifter = New SmbPitchShifter(Settings, source.WaveFormat.SampleRate)
    Me.waveBuffer = New WaveBuffer(8192)
End Sub

Some points to notice:

We pass in a source IWaveProvider—this is where the data will be coming from

We check that the source is in the right format—floating point mono input.

We also pass in an AutoTuneSettings object. This not only encapsulates the settings for autotune, it also makes it possible to adjust the settings in real-time while the effect is running

We then create the two key components of our autotune effect: a pitch detector (which uses an autocorrelator), and a pitch shifter

Finally we create a buffer to use for audio processing. This could be a plain byte[] array, but we use
WaveBuffer from NAudio because it uses
a clever trick that allows us to treat a byte[] as a float[] without using unsafe code or having to copy all of the data
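
A minimal illustration of the WaveBuffer trick, where the same underlying memory is exposed both as bytes and as floats:

c#:

// using NAudio.Wave;
WaveBuffer waveBuffer = new WaveBuffer(8192);  // size in bytes
byte[] asBytes = waveBuffer.ByteBuffer;        // view the buffer as bytes
float[] asFloats = waveBuffer.FloatBuffer;     // the same memory viewed as floats
asFloats[0] = 0.25f; // writes the same four bytes that asBytes[0..3] would see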

The key method on any implementation of IWaveProvider is its Read method. This is where the audio consumer, usually the sound card or a WaveFileWriter, asks for data. The data must be supplied as a byte array, and if at all possible you should
return exactly the number of bytes you were asked for (if you can't, an extra layer of buffering is usually required, or audio playback will be choppy). Here's what our implementation of the Read method does, step by step (a sketch follows the list):

First we need to read from our source (in our case, a WAV file converted to floating point samples)

If we get less data than we were expecting, we know that means we're at the end of the file, so we'll just pretend we got a full buffer

We then work out how many audio ‘frames' are present (which is the same as the number of samples since this is mono audio). It's floating point audio, so frames equal bytes divided by four

We then pass the data through our pitch detector algorithm (see below)

Next we use some experimental code to stabilize pitch detection by reporting the previous frequency when no pitch is picked up

Finally we pass the data into our pitch shifter, including details of the detected pitch
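
Putting those steps together, a sketch of the Read implementation could look roughly like this (field names such as waveBuffer and previousDetectedPitch, and the ShiftPitch parameter order, are assumptions based on the description above):

c#:

public int Read(byte[] buffer, int offset, int count)
{
    if (waveBuffer == null || waveBuffer.MaxSize < count)
    {
        waveBuffer = new WaveBuffer(count);
    }
    // 1. read from the source (a WAV file converted to floating point samples)
    int bytesRead = source.Read(waveBuffer.ByteBuffer, 0, count);
    // 2. a short read means we're at the end of the file, so pretend the buffer is full
    if (bytesRead > 0) bytesRead = count;
    // 3. mono IEEE float, so one frame = one sample = four bytes
    int frames = bytesRead / 4;
    // 4. detect the dominant pitch of this block
    float pitch = pitchDetector.DetectPitch(waveBuffer.FloatBuffer, frames);
    // 5. experimental smoothing: reuse the previous pitch if none was detected
    if (pitch == 0f)
    {
        pitch = previousDetectedPitch;
    }
    else
    {
        previousDetectedPitch = pitch;
    }
    // 6. shift the pitch, writing the output into the caller's buffer
    //    (parameter order here is illustrative)
    WaveBuffer outBuffer = new WaveBuffer(buffer);
    pitchShifter.ShiftPitch(waveBuffer.FloatBuffer, pitch, 0f, outBuffer.FloatBuffer, frames);
    return frames * 4;
}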

Now that we've seen the big picture of the AutoTuneWaveProvider, let's drill down into its two main components—the pitch detector and pitch shifter.

Pitch Detection with Autocorrelation

The pitch detection part of autotune is vital to getting good results. If it can't accurately detect the input pitch, it will incorrectly calculate how much the pitch needs to be adjusted. However, high quality pitch detection is quite difficult to get right.
First of all, the microphone may well pick up background noise. Second, when you sing into a microphone, the signal is not a single pure frequency; it also contains "harmonics" at other frequencies.

The good news is that we need to detect only the primary pitch.

The awesomebox algorithm makes use of “autocorrelation” for its pitch detection, but I made a few small tweaks to how the algorithm is implemented in an attempt to improve its accuracy. Autocorrelation
has the advantage of being a relatively quick process. The basic principle is that if a signal is periodic, it will “correlate” well with itself when shifted forward (or backwards) one cycle.

Let's say we are looking to see if the note "Middle C" is being sung. The frequency of Middle C is around 262Hz. If we are sampling at 44.1kHz (which is standard for CD quality audio), then we will expect the signal to repeat approximately every 168 samples
(44100/262). Accordingly, for every sample in the buffer, we multiply it by the sample 168 samples earlier and sum the products. We do this for every possible offset that corresponds to a frequency in the range we want to detect (I am using 85Hz to
300Hz, which is adequate for pitch detecting vocals). The offset with the highest score gives the most likely frequency.

Let's have a look at how an autocorrelation algorithm works, starting with the constructor of the
AutoCorrelator class (a sketch of the class follows the list of points below):

First of all, we pre-calculate some values based on the minimum and maximum frequencies we are looking for. Remember that lower frequencies are harder to detect than higher frequencies, so don't set minFreq too low. maxOffset and minOffset are the maximum
and minimum backwards distances (in samples) we will search when looking for a match.

Notice that the audio comes in as an array of floating point numbers. NAudio performs this conversion from 16 bit audio for us by using
Wave16ToFloatProvider

We store the previous buffer. This allows us to look backwards for correlation

We then work through each and every possible integer offset within our range, and calculate a correlation value

The correlation is calculated as the sum of products of each sample with the sample one lag earlier

If it is the largest so far, we store the “lag” value (i.e. number of samples back that we correlated with)

Notice that we return 0 (i.e. no frequency detected) if we don't find a strong frequency. This noise threshold may need to be tweaked depending on your input

Finally, we convert into a frequency with the formula sampleRate / maxLag
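
Putting the constructor and the detection loop together, a sketch of the AutoCorrelator might look like this; the exact noise threshold and the housekeeping details are assumptions rather than the project's exact code:

c#:

// using System; (a sketch of the AutoCorrelator)
class AutoCorrelator
{
    private readonly float sampleRate;
    private readonly int minOffset; // lag corresponding to the highest frequency we look for
    private readonly int maxOffset; // lag corresponding to the lowest frequency we look for
    private float[] prevBuffer;

    public AutoCorrelator(int sampleRate)
    {
        this.sampleRate = sampleRate;
        // look for pitches roughly between 85Hz and 300Hz
        this.minOffset = sampleRate / 300;
        this.maxOffset = sampleRate / 85;
    }

    public float DetectPitch(float[] buffer, int frames)
    {
        if (prevBuffer == null)
        {
            prevBuffer = new float[frames];
        }
        float maxCorr = 0f;
        int maxLag = 0;
        // try every lag in our range and keep the one that correlates best
        for (int lag = maxOffset; lag >= minOffset; lag--)
        {
            float corr = 0f;
            for (int i = 0; i < frames; i++)
            {
                int oldIndex = i - lag;
                // reach back into the previous buffer when the lag takes us before sample 0
                float sample = (oldIndex < 0) ? prevBuffer[frames + oldIndex] : buffer[oldIndex];
                corr += sample * buffer[i];
            }
            if (corr > maxCorr)
            {
                maxCorr = corr;
                maxLag = lag;
            }
        }
        // remember this block so the next call can look backwards into it
        Array.Copy(buffer, prevBuffer, frames);
        // below the noise threshold (an assumed value) we report "no pitch detected"
        if (maxLag == 0 || maxCorr < 0.0001f) return 0f;
        return sampleRate / maxLag;
    }
}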

I wrote some unit tests to measure the accuracy of detection with sine waves (which admittedly are the easiest to detect). Here are the results for audio sampled at 44.1kHz:

Test Frequency    Detected Pitch
109.99Hz          108.35Hz
116.53Hz          118.23Hz
123.46Hz          123.18Hz
130.80Hz          129.71Hz
138.58Hz          140.00Hz
146.82Hz          148.48Hz
155.55Hz          154.74Hz
164.80Hz          163.33Hz
174.60Hz          172.94Hz
184.98Hz          183.75Hz
195.98Hz          194.27Hz
207.63Hz          206.07Hz
219.98Hz          219.40Hz
233.06Hz          234.57Hz
246.92Hz          247.75Hz
261.60Hz          256.40Hz
277.16Hz          139.56Hz
293.64Hz          146.03Hz

Notice that the detected frequencies from the final two tests are roughly half the correct value. This doesn't matter for our purposes, since it simply means the frequency has been detected one octave below the correct note.

To improve on the accuracy of the autocorrelator's results, there are a couple of things you can do:

Run a band-pass filter before-hand, to remove any frequencies outside of the desired range

Combine the results with those obtained from a different technique, such as counting zero-crossings

Pitch Detection with the Fast Fourier Transform

I decided to implement an alternative pitch detection algorithm to see if I could get better results. A different approach is to use the
Fast Fourier Transform, which converts signals from the “time domain” into the “frequency domain.”

The basic approach is to take a block of samples (which must be a power of 2, e.g. 1024), and run the FFT on them. The FFT takes complex numbers as inputs, which for audio signals have zero imaginary parts. The implementation I am using expects the real and imaginary
parts interleaved in the input buffer. Here's our code setting up fftBuffer
with interleaved samples:
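
Here's a sketch of that set-up, assuming buffer is the current block of mono float samples and frames is its length; the exact variable names in the project may differ:

c#:

// sketch: prepend the previous block, interleaving real and imaginary parts
if (prevBuffer == null)
{
    prevBuffer = new float[frames];
}
float[] fftBuffer = new float[frames * 4]; // (frames * 2 complex samples) * 2 values each
int index = 0;
for (int n = 0; n < frames; n++)
{
    fftBuffer[index++] = prevBuffer[n]; // real part
    fftBuffer[index++] = 0f;            // imaginary part
}
for (int n = 0; n < frames; n++)
{
    fftBuffer[index++] = buffer[n];     // real part
    fftBuffer[index++] = 0f;            // imaginary part
}
// remember this block so the next call can prepend it (50% overlap)
Array.Copy(buffer, prevBuffer, frames);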

Notice that we prepend the previous buffer we were passed. This is a common way of increasing the accuracy and resolution of an FFT by using overlapping windows, and can be further extended to store three previous buffers, allowing us to have 75% overlapping
windows instead of just the 50% that we have in this example.

For better peak frequency detection, the signal that is passed into the FFT is best pre-processed with a
“windowing” function. There are several to choose from, each with its own strengths and weaknesses. I used the Hamming window, which is a fairly common choice:
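
The standard Hamming window can be applied as each sample is copied into fftBuffer; a simple helper might look like this (the real code may fold the multiplication into the copy loop above):

c#:

// Hamming window coefficient for sample n of an N sample block
private float HammingWindow(int n, int N)
{
    return 0.54f - 0.46f * (float)Math.Cos((2 * Math.PI * n) / (N - 1));
}

// applied while filling fftBuffer, e.g.
// fftBuffer[index++] = buffer[n] * HammingWindow(n, frames);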

The next step is to pass our interleaved buffer to the FFT algorithm. I am using Stephan Bernsee's here, though there is an alternative implementation in NAudio that I could have used. Since the same function can be used for an inverse FFT, the -1 parameter (rather counter-intuitively)
means "do a forwards FFT." It processes the data in place, which is fine since we don't need to keep the contents of the input buffer:

c#:

// assuming frames is a power of 2
SmbPitchShift.smbFft(fftBuffer, frames, -1);

VB.Net

' assuming frames is a power of 2
SmbPitchShift.smbFft(fftBuffer, frames, -1)

Once we have completed the FFT, we are ready to interpret its output. The output of the FFT consists of complex numbers (again real followed by imaginary in our buffer), which represent frequency “bins.”

We start off by calculating the bin size and working out which bins correspond to the range of frequencies we are interested in detecting:
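
In code, that works out roughly as follows, where fftLength is the number of samples passed to the FFT (8192 here) and minFreq/maxFreq are the limits of the vocal range we search:

c#:

// each FFT output bin covers sampleRate / fftLength Hz
float binSize = (float)sampleRate / fftLength;  // e.g. 44100 / 8192 ≈ 5.4Hz
int minBin = (int)(minFreq / binSize);          // first bin worth inspecting
int maxBin = (int)(maxFreq / binSize);          // last bin worth inspecting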

For example, if our sample rate is 44.1kHz and we analyse a block of 1024 samples, then each bin represents around 43Hz, which is hardly the granularity we are looking for. To increase resolution we need to analyse a bigger chunk of samples (a higher sample rate doesn't help here, since bin size is the sample rate divided by the FFT length). Our approach is to use overlapping blocks of 8192 samples, reading 4096 new samples each time. This gives a resolution of around 5Hz, which is much more acceptable.

Now we can calculate the magnitude or “intensity” for each frequency by calculating the sum of squares (strictly we should then take the square root, but we don't need to since we are just looking for the largest value):
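
A sketch of that loop, reading the interleaved real/imaginary pairs out of fftBuffer:

c#:

float maxIntensity = 0f;
int maxBinIndex = 0;
for (int bin = minBin; bin <= maxBin; bin++)
{
    float real = fftBuffer[bin * 2];
    float imaginary = fftBuffer[bin * 2 + 1];
    // magnitude squared is enough for finding the peak - no square root needed
    float intensity = real * real + imaginary * imaginary;
    if (intensity > maxIntensity)
    {
        maxIntensity = intensity;
        maxBinIndex = bin;
    }
}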

Since we have identified the bin with the maximum intensity, we can calculate the detected frequency:

c#:

return binSize * maxBinIndex;

VB.Net

Return binSize * maxBinIndex

I don't currently specify a minimum threshold for maxIntensity, but perhaps there should be one: if the maximum intensity were very low, the FFT pitch detector could return zero to indicate that no pitch was detected, rather than returning an answer that is probably inaccurate.

Let's have a look at how the FFT pitch detector does:

Test Frequency    Detected Pitch
109.99Hz          107.67Hz
116.53Hz          118.43Hz
123.46Hz          123.82Hz
130.80Hz          129.20Hz
138.58Hz          139.97Hz
146.82Hz          145.35Hz
155.55Hz          156.12Hz
164.80Hz          166.88Hz
174.60Hz          172.27Hz
184.98Hz          183.03Hz
195.98Hz          193.80Hz
207.63Hz          209.95Hz
219.98Hz          220.72Hz
233.06Hz          231.48Hz
246.92Hz          247.63Hz
261.60Hz          263.78Hz
277.16Hz          274.55Hz
293.64Hz          296.08Hz

As can be seen, it correctly picks out the primary frequencies of the higher notes, but overall it doesn't get much closer than the autocorrelator, so I've left the autocorrelator as the default algorithm. You can, however, swap in the FFT detector in the code if
it works better for the material you are auto-tuning.

There are ways of using the phase information from the FFT output to increase the accuracy of pitch detection even further, but I have left that as an exercise for the reader!

Pitch Shifting

The next step is to determine how much we will shift the pitch. The simplest way to do this is to look for the musical pitch that is closest to the detected pitch. The amount to shift by is then simply the ratio of those two frequencies.

There are, however, some additional considerations. First, we may want to select a subset of musical notes that are acceptable. For example, only notes in the key of C#, or maybe F# minor pentatonic. This may require a slightly more radical adjustment.

Second, depending on the effect we are after, we may not want to instantaneously jump to the new frequency. The code I am using utilizes a fairly rudimentary “attack” time parameter, allowing you to gradually move to the new frequency.

The actual DSP for the pitch-shifting effect is more or less untouched from Stephan Bernsee's code, because it works really well. Bernsee's code makes use of the Fast Fourier Transform, plus a bunch of clever mathematics, which I
almost understand, but not quite well enough to try to explain here! You're better off reading an article in which the man himself explains
how it works.

The class that manages the pitch-shifting algorithm is called SmbPitchShifter
and inherits from a PitchShifter base class. It does the bulk of its work in the
ShiftPitch function:

The inputPitch parameter is set to the frequency detected by the PitchDetector. The
targetPitch parameter is currently unused, but will be used to specify the target pitch in real-time when accepting input from, say, a MIDI keyboard. In any case, we call
UpdateSettings in order to see if any of the autotune algorithm settings have changed since last time.

Next we calculate the amount by which we need to shift the pitch. A shift factor of 1 means no change. We don't allow the shift factor to go above 2 or below 0.5, since those figures already represent a whole octave change:

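A sketch of that step, where SnapFactor stands in for the snapping function described next (it finds the nearest allowed note and returns the ratio needed to reach it):

c#:

// SnapFactor (described below) finds the nearest allowed note and returns the
// ratio needed to move the detected pitch onto it
float shiftFactor = SnapFactor(detectedPitch);
// never shift by more than an octave in either direction
if (shiftFactor > 2.0f) shiftFactor = 2.0f;
if (shiftFactor < 0.5f) shiftFactor = 0.5f;
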
The way this function works is that it runs through MIDI notes 0-120 and, for each note that is selected as one of the valid pitches we support, remembers the “corrected frequency,” which can be calculated from the MIDI note number with the following formula:

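That is the standard equal-temperament mapping from MIDI note number to frequency (A4 = MIDI note 69 = 440Hz), for example:

c#:

// frequency in Hz of a given MIDI note number
double frequency = 440.0 * Math.Pow(2.0, (midiNote - 69) / 12.0);
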
Obviously, a detected pitch is likely to fall somewhere in between two valid notes, so we correct to whichever valid note is closest to the detected frequency.

The snapFactor function is also responsible for implementing the attack time parameter. This allows the destinationFrequency to be slowly moved to the target note over the duration of the attack period. Having calculated our shift factor, we are now
ready to pass our data on to the actual pitch-shifting algorithm:
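
Assuming the ported routine keeps a signature similar to Bernsee's original smbPitchShift, the hand-off might look something like this (all of the variable names here are illustrative):

c#:

// sketch: parameters follow Bernsee's original signature, which the port may vary
SmbPitchShift.smbPitchShift(shiftFactor, frames, fftFrameSize, oversampling,
    sampleRate, sourceBuffer, destinationBuffer);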

The final thing we do in the ShiftPitch function is keep a record of the pitch shifts we have made. These are stored in a queue (maximum of 5000 entries) and are very useful for diagnosing what is going on if you are not getting the results you wanted
from the algorithm:

Performance

Performance, as you might expect in a managed application that has not been extensively optimized, was not good. Using my laptop, I could autotune one minute of audio in about 90 seconds. Obviously, that rules out real-time autotuning. I decided to profile
the application to see if there were any quick ways I could improve things.

The profiling tools in Visual Studio revealed that 20% of the time was spent on pitch detection and 80% on pitch shifting. Unfortunately, there were not many options for optimisation, since further investigation pointed to calls to
Math.Sin taking the bulk of the time. Creating lookup tables could possibly save a little more time.

Fortunately, we have another option for speeding things up. The pitch-shifting algorithm takes an "oversampling" parameter, which by default is set to 32, the highest value. However, we can trade off speed for quality. Setting it to 16 meant that I could
autotune a minute of audio in 55 seconds (on my 2.4GHz Core2Duo laptop), which is real-time, but only just. Setting it to 8 reduced that down to 36 seconds. The results still sounded reasonable, so I have left it set at 8 in the code.

An alternative way of speeding it up though would be to swap in a different pitch-shifting algorithm. You could start by trying out one I created as part of the
Skype Voice Changer project previously featured on Coding4Fun, which is also able to operate in real-time (although I haven't done any quality comparisons).

Creating a Test GUI

Rather than starting from scratch, I decided to build upon
.NET Voice Recorder, a WPF application I created for
a previous Coding4Fun article. This takes advantage of the
NAudio .NET audio library for audio recording and playback capabilities. The GUI has three screens: the first lets you select the input device used for recording, the second records a short voice clip, and the third allows you to edit a small portion of the saved audio.

Here's a screenshot of the second screen showing a recording in progress:

And here's the screen that allows you to trim the recording, preview it, and save it as WAV:

As you can see, I have added a new button allowing access to the autotune effect settings. On this screen, you can select which notes are valid, and you can also adjust the "attack time" if you prefer not to go for the robotic effect. I've included a drop-down
menu that automatically selects the appropriate notes for various keys.

When you click "Apply," the autotune effect is applied on a background thread (while you wait) and then you are returned to the screen that lets you play back your recording and hear how it sounds. If you like, you can then go back and change the autotune
settings (or turn the effect off).

MVVM Light

The original VoiceRecorder application used an
MVVM (model-view-viewmodel) architecture for binding data to each view. I have updated it to make use of Laurent Bugnion's excellent
MVVM Light library. This removes the need for my own RelayCommand and ViewModelBase classes, and also lets me replace my ViewManager with a more extensible approach built on the event aggregator ("Messenger")
that is included with MVVM Light. This allows me to navigate quickly from one view to another by sending out a message on the event aggregator:
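
For example, a view model can request navigation by broadcasting a message; NavigateMessage and ShowView here are hypothetical names rather than the project's exact types:

c#:

// using GalaSoft.MvvmLight.Messaging;
// somewhere in a view model: ask whoever is listening to show another view
Messenger.Default.Send(new NavigateMessage(View.AutoTuneSettings));

// and in the class that manages views, register for that message type
Messenger.Default.Register<NavigateMessage>(this, m => ShowView(m.TargetView));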

Getting the Best out of Autotune

Unfortunately, autotune is an effect that doesn't always produce the desired result. Obviously, if you want great autotune, you're best off buying a commercial implementation, but here are a few tips for getting the most out of an autotune algorithm:

Get a good quality recording. Avoid background noise, hum, and too quiet or too loud (distorted) recordings. If you need to sing against a backing track, play it in headphones so the microphone doesn't pick it up.

Know what key you are singing in. This is where the test application won't help you out, since if you don't know what key you are singing in, you can hardly expect to be able to select the appropriate key from the list. You might even be singing
in between two keys! If you have an in-tune musical instrument at hand, play a note to give yourself a starting pitch.

Choose your scale. The easiest option is to just go with a chromatic scale, which means all 12 notes are valid. However, you can try to make the autotune force you into a monotone. Pentatonic scales are useful for instant gratification. They have
five notes in them, and so long as you stick to those five, almost anything you sing will fit in with a backing track in your chosen key.

Adjust the attack time. An attack time of zero is great for the robot effect. A longer attack time will smooth out transitions.

Why is it “warbling”? With this autotune algorithm, it is quite common to get a “warbling” effect. This is because it is either changing its mind about what note to pitch shift to (because the pitch detector is not providing stable pitch detection),
or because the pitch detector couldn't determine a pitch at all, so your voice isn't being pitch shifted. You can play around with the release slider (and may need to modify the release algorithm) if you want to eliminate warbling.

Taking it further

.NET Voice Recorder is open source
and hosted on CodePlex in a Mercurial repository. So what are you waiting for? Make a fork and have a go at improving it:

Improve the pitch detector algorithm. I have already given some suggestions for how this can be done. One idea that might be worth experimenting with is making it “hold” the detected frequency for a short period until a strong, new frequency is detected.

Display the detected pitches. The autotune effect stores data of the detected pitches as well as the pitches it attempts to convert to. You might display this information to the user, perhaps underneath the waveform, so they can see what it is detecting.
(It currently outputs this information with Debug.WriteLine, which is useful for debugging purposes).

Suggest an appropriate scale. Instead of leaving it to the user to select which key to snap the notes to, how about suggesting a scale automatically based on the notes that were detected?

Allow direct input of desired pitch. Instead of letting the autotune effect try to work out what note it should be targeting at any given time, it is far more effective to let the user specify what note they would like to shift to. This could be done
by entering the note on a MIDI keyboard in real-time, or by drawing the notes in, perhaps on a "piano-roll" control. Or you could implement a simple domain-specific language to specify the desired notes. For example:

0:00.0 C#

0:01.5 E

0:02.7 G#

Port it to Windows Phone 7. This would be quite a cool effect to have on your phone. Of course, you might need to optimize performance a bit more to save battery life. And you'll want to put your graphic designer hat on to give it a more beautiful
look than it currently has.

About the Author

Mark Heath is the author of several open source .NET applications and libraries, including
NAudio and the
Skype Voice Changer. He works for NICE Systems, developing applications that search, display, and play back vast amounts of multimedia data. He has a blog,
Sound Code, and you can follow him on his sporadically updated
Twitter account.

Comments

I have also implemented a pitch detection algorithm, which is used to display a real-time pitch graph in a VST plugin (although VST uses an unmanaged API, my plugin is written in C# and uses reverse P/Invoke). This plugin is used as a visual guide to train a singer to sing in key (or for vocal exercises), and can also display a "grade" based on previously entered notes (or a MIDI clip) that have to be hit throughout the song. My algorithm is loosely based on auto-correlation, but it is heavily modified to solve the following two problems with it:

Auto-correlation is slow: since you need to compare all samples with all of the shifted samples, and repeat that once for every single frequency you want to detect, this quickly becomes slow.

The way I solved this is by first down-sampling to reduce the overall number of samples to work with (with a filter to prevent aliasing, which also conveniently removes frequencies I don't care about), and then by running 3 passes, one each with low, medium, and high resolution.

In the first pass, I only test a total of 5 samples that are spaced out within the two sample windows. You can quickly tell there just isn't going to be correlation if those 5 samples are all different from the shifted window's equally spaced samples. The second pass uses more samples, and its detected frequency becomes the starting point for the final pass, which uses more samples still.

Auto-correlation is inaccurate, especially at higher frequencies: as you have clearly seen in your results, the higher the frequency, the less accurate it becomes. This is because a higher frequency waveform has fewer total samples per cycle, and you are stepping in whole sample values, so the frequency steps become coarser the higher it is.

To solve this, for the 3rd pass in my algorithm (at which point the search is centered around the frequency that was detected in the second pass), I use interpolation in order to compare samples that are not limited to whole numbers. So sample 0 will be at position 0.0, sample 1 will be at position 0.674, or whatever. This allows me to space the sample steps so that they sit exactly at the frequency I want to detect during that pass, as opposed to being quantized into ever-coarser frequency steps. I use a 4-point, 3rd-order Hermite interpolator.

Each high resolution step is 1.005 times the previous frequency, so I don't use linear frequency steps. Also, once I have found the two steps with the highest correlation, I interpolate between those two, so the final detected frequency has even higher resolution than the third pass's step size.

This results in a very fast and accurate pitch detection algorithm. From my tests, the accuracy is within 0.1% of the input frequency, which IIRC, is about 50 times higher than what humans can distinguish.

Can you please give the code for pitch detection? I have implemented the FFT algorithm and just need to detect the pitch at a high frequency. How can I choose the amplitude threshold to get the correct peaks? If the amplitude threshold is too high I get no peaks; if it's too low I get incorrect peaks.

@BitFlipper: do you have the pitch detection code? I have implemented an FFT and need a pitch detection algorithm. I need to detect all peaks in the spectrum, for example all peaks at the 18000Hz frequency. I am also working with an amplitude threshold, but if it's too high it doesn't show me the peaks, and if it's too low it doesn't show them correctly.

Unfortunately right now my code isn't quite fit for public release. I think there are some dependencies on parts of other code that I don't want to post and which isn't really pitch-related. For instance I have a DSP class that the pitch-correction algorithm uses, but most of that code is unrelated to it.

If I have some time available I will clean it up and post it. Most likely before the end of the weekend.

OK I had some time to clean up my code. I am working on creating a CodePlex project in order to publish it. I also first want to create some sort of sample app in order to demonstrate the code in use (even though you need just three lines of code to instantiate and get your first pitch results back). Hopefully I will be done with it by this weekend.

Hi Mark,"Autotune.NET" is really very different topic. I am very excited to read this blog. Very interesting and useful.After reading your blog only i came to know about this technology in dotnet. Thanks for giving your knowledge to us. Your code also works well.http://gloriatech.com/microsoft-net-development-services.aspx

OK, I finally had some time to isolate my pitch tracker class and create a CodePlex project. I would be interested to find out whether you can use my pitch tracker in your project, and what the results are.

From my tests my algorithm has an error of less than 0.02% over the frequency range of 55Hz to 1.5kHz. Accuracy is unaffected by the amplitude, frequency, or complexity of the waveform.

Hi Mark, thanks for the code, it is rather interesting, but I cannot understand one thing. In this code:

public float DetectPitch(float[] buffer, int frames)
{
    if (prevBuffer == null)
    {
        prevBuffer = new float[frames];
    }
    float maxCorr = 0;
    int maxLag = 0;
    for (int lag = maxOffset; lag >= minOffset; lag--)
    {
        float corr = 0;
        // sum of squares
        for (int i = 0; i < frames; i++)
        {
            int oldIndex = i - lag;
            float sample = ((oldIndex < 0) ? prevBuffer[frames + oldIndex] : prevBuffer[oldIndex]);
            corr += (sample * buffer[i]);
        }
        ...

Here we first initialize the array (prevBuffer = new float[frames]), so evidently all its members have zero as their value. Then we have sample = prevBuffer[frames + oldIndex] or sample = prevBuffer[oldIndex], but as prevBuffer contains only zero values, sample will always be zero. Could you explain this, or maybe I am wrong?

@Eric: Hi, it is a very nice introduction to recording voice on .NET from a microphone. I am looking for a streaming audio recorder to record voice or sound that is being played through my computer's audio device.

Hi there, I want to know if it's possible (or whether there is an algorithm I could use) to take a byte array of sound data and change its pitch or frequency... I did it with SoundEffectInstance in XNA, but there is no way I could save the stream that was passed in. I would greatly appreciate any help. This is some code that I did (I want to change the pitch of bStream). Thank you very much: