Introduction

I've been playing electric guitar for the last two years, but only a few months ago I started to learn with a professional teacher. The level of complexity rose incredibly - from just trying to play some chords, I'm now playing difficult licks which are pretty hard to get right when a new song is learned. My guitar teacher uses an application to slow down the playback - Amazing Slow Downer.

It is commercial and I have no plans to pay money for it (surely it is affordable, but open source is better and I can tweak it to my preferences).

An open source alternative that I found is called BestPractice. This utility works fine on Windows XP, but it crashes on Windows 7 Pro 64 bit, the playback quality could be better, the UI could be better designed (e.g., the volume bar has a tiny thumb button) and, most importantly, it is missing a key feature - Presets. I downloaded the source code, but found out that it was written in Borland C++ Builder - not quite the development environment I was looking forward to developing in.

User interface and presets are worth a closer look. In order to get my guitar to have amazing sound effects, I purchased the Boss ME-70 (as shown below) a while ago.

This piece of hardware is truly amazing, not just because of the great effects and sound quality, but because the UI design is brilliant. There are no menus - everything is just laid out for immediate intuitive usage. The best part (for me) is the presets - you can manually change some settings (for example: Amplifier emulation mode, Compressor, and Delay), then once you're happy with the sound, you can keep it by simply saving it to a persistent preset. This is great for practice - you come back the next day and that preset is waiting for you. No need to set it all up from scratch.

I really wanted these presets in a playback practice tool - the ability to define a lick with some slow down factor and perhaps some volume, maybe also set a loop that allows repetitive practice on that section. Once that "playback section" was properly defined, I wanted to be able to save to a preset just like I do with my Boss ME-70.

Previously, when I used BestPractice, I had to write down all the settings on paper and then manually 'dial' them in every time I started practice. With Practice#, I wanted to make these settings persistent and also be able to save the same settings as two or more presets, each with a different slow down (speed), so the lick can be played, for example, at 70% when you start, 85% when you feel more confident and finally 100% for regular speed. So once I had my internal 'Business Requirement', it was time to move on to Design and Implementation.

Architecture

I decided to go with .NET 2.0, C# and Windows Forms because the .NET Framework/C# is easy and fast to develop with (in particular UI), Windows Forms is very powerful and Visual Studio Express is free.

The architecture diagram shows the different layers of Practice#. Starting from top to bottom:

Core Audio Logic - Contains all the specific audio processing logic that is needed for Practice# to function. Controls and coordinates the other frameworks and libraries (NAudio, SoundTouch, Vorbis#). Does not handle any User Interface logic.

NAudio - (3rd Party) An audio playback platform that handles all the lower level API needed to play audio on an operating system.

Vorbis# - (3rd Party) A library that reads Ogg Vorbis files and returns the compressed samples as uncompressed PCM samples that can be processed by NAudio.

LibFlacSharp - A managed C# Interop wrapper to libFlac

libFlac - (3rd Party) A C library that reads FLAC files and returns the compressed samples as uncompressed PCM samples that can be processed by NAudio.

Design

Time Stretching

The biggest problem was how to stretch time - how to change the audio speed (or tempo) without changing its pitch. What exactly does that mean? If you have some audio file (e.g. mp3 or wav) and you just play it faster by keeping only every second sample, then it will run twice as fast as the original audio, but the pitch (tone of sound) will also double, i.e., go up by one octave - sounding like cartoons. The same problem happens when you slow down an audio file in a 'naïve' brute-force way: the pitch goes down, e.g., a singer who used to be an Alto becomes a Bass. That's not good - the sound should have the same pitch, just slower.
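The effect of the naive approach can be checked with a few lines of C#. The following is only a sketch for illustration and is not part of Practice#: the class and method names are made up, and the pitch is estimated very roughly by counting upward zero crossings of a pure sine tone.

```csharp
using System;

// Illustrative helper (hypothetical names, not Practice# code).
public static class NaiveSpeedChange
{
    // Generates 'count' samples of a sine tone at freqHz for the given sample rate.
    public static float[] Sine(double freqHz, int sampleRate, int count)
    {
        var samples = new float[count];
        for (int i = 0; i < count; i++)
            samples[i] = (float)Math.Sin(2.0 * Math.PI * freqHz * i / sampleRate);
        return samples;
    }

    // Naive 2x speed-up: keep every second sample and discard the rest.
    public static float[] KeepEverySecondSample(float[] input)
    {
        var output = new float[(input.Length + 1) / 2];
        for (int i = 0; i < output.Length; i++)
            output[i] = input[2 * i];
        return output;
    }

    // Counts upward zero crossings, i.e. roughly the number of wave cycles.
    public static int CountCycles(float[] samples)
    {
        int cycles = 0;
        for (int i = 1; i < samples.Length; i++)
            if (samples[i - 1] < 0f && samples[i] >= 0f) cycles++;
        return cycles;
    }

    // Rough pitch estimate when the samples are played back at sampleRate.
    public static double EstimatePitchHz(float[] samples, int sampleRate)
    {
        double seconds = (double)samples.Length / sampleRate;
        return CountCycles(samples) / seconds;
    }
}
```

For a one second 440 Hz tone sampled at 8000 Hz, NaiveSpeedChange.KeepEverySecondSample returns half as many samples, and NaiveSpeedChange.EstimatePitchHz on the result comes out at roughly twice the original estimate. The audio got shorter, but the pitch moved with it, which is exactly what a time stretching library has to avoid.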

A somewhat similar problem is how to change the pitch of the audio without changing its speed. That use-case is not as useful as the prior use-case but can still be used in some cases - for example for matching keys. Both use cases - Time Stretching and Pitch Changing are similar since they only change one parameter (time or pitch) but without affecting the other playback parameter (pitch or time, respectively).

The theory behind this problem is very interesting, but beyond the scope of this article. For those who wish to learn more about the theory behind Time Stretching and Pitch Changing, please refer to this page on DSPDimension. The topic is described there in comprehensive detail, and different algorithms are compared using audio examples.

The basic requirements for a Time Stretching library are:

It has to be open source and LGPL.

It has to run on Windows 32/64.

The audio quality must be good.

An adequate API has to be provided and it has to work with .NET properly. Managed code is preferred but not mandatory.

It must have high performance (low CPU usage) and small latency. High latency is OK for batch utilities, but a front end practicing playback utility must have low latency.

SoundTouch

The only candidate library that matched these requirements was SoundTouch.
SoundTouch is an LGPL C++ library that provides an API for performing Time Stretching and Pitch Changing.
SoundTouch's quality is pretty good as can be heard from these samples. The main challenge with this library was how to use it from a managed .NET application since it is a native C++ DLL.

SoundTouchSharp

In order to achieve this integration between .NET and SoundTouch, a wrapper was written, called SoundTouchSharp. Basically, SoundTouchSharp is a C# Interop class that wraps the native SoundTouch C++ DLL and exposes the DLL's functions as a managed C# API.
Note: SoundTouchSharp can be used outside the scope of Practice# by any application that needs to implement Time Stretching or Pitch Changing. If, for example, an ASP.NET Web Application needs that functionality, it can use SoundTouchSharp together with SoundTouch.

The main API methods are:

/// <summary>
/// Sets new tempo control value. Normal tempo = 1.0, smaller values
/// represent slower tempo, larger faster tempo.
/// </summary>
public void SetTempo(float newTempo)

/// <summary>
/// Adds 'numSamples' pcs of samples from the 'samples' memory position into
/// the input of the object. Notice that sample rate _has_to_ be set before
/// calling this function, otherwise throws a runtime_error exception.
/// </summary>
public void PutSamples(float[] pSamples, uint numSamples)

/// <summary>
/// Adjusts book-keeping so that given number of samples are removed from beginning of the
/// sample buffer without copying them anywhere.
///
/// Used to reduce the number of samples in the buffer when accessing
/// the sample buffer directly
/// with 'ptrBegin' function.
/// </summary>
public uint ReceiveSamples(float[] pOutBuffer, uint maxSamples)

The method SetTempo() sets the tempo (or speed) of the playback. It should be set before putting samples into the SoundTouch queue. The method PutSamples() puts samples into the SoundTouch queue. To receive these samples back as time stretched (or pitch changed) samples, the method ReceiveSamples() needs to be called. When the tempo is not 100% (i.e., regular speed), it is important to understand that the number of received samples is different from the number of samples put in the queue, due to the inherent nature of time stretching. Therefore, the client calling ReceiveSamples() needs to take this into account - ReceiveSamples() needs to be called until the internal buffer in SoundTouch has no more samples to return. One call of PutSamples() with X samples might require more than one call to ReceiveSamples(), depending on the tempo value set in SetTempo().
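The calling pattern described above can be sketched as follows. This is not the Practice# code: ITimeStretcher is a hypothetical interface that only mirrors the three method signatures quoted above, and FakeStretcher is a toy stand-in (it simply repeats or skips samples, so it is not a real time stretcher) whose only purpose is to make the sketch runnable without the native SoundTouch DLL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface mirroring the three methods quoted above.
public interface ITimeStretcher
{
    void SetTempo(float newTempo);
    void PutSamples(float[] pSamples, uint numSamples);
    uint ReceiveSamples(float[] pOutBuffer, uint maxSamples);
}

// Toy stand-in, NOT a real time stretcher: it repeats or skips samples so
// that the number of output samples differs from the number put in.
public sealed class FakeStretcher : ITimeStretcher
{
    private readonly Queue<float> _ready = new Queue<float>();
    private float _tempo = 1.0f;

    public void SetTempo(float newTempo) { _tempo = newTempo; }

    public void PutSamples(float[] pSamples, uint numSamples)
    {
        int outCount = (int)(numSamples / _tempo);
        for (int i = 0; i < outCount; i++)
        {
            int src = Math.Min((int)(i * _tempo), (int)numSamples - 1);
            _ready.Enqueue(pSamples[src]);
        }
    }

    public uint ReceiveSamples(float[] pOutBuffer, uint maxSamples)
    {
        uint n = 0;
        while (n < maxSamples && _ready.Count > 0)
            pOutBuffer[n++] = _ready.Dequeue();
        return n;
    }
}

public static class StretchDemo
{
    // Puts one block in, then keeps receiving until nothing more comes back.
    public static List<float> Process(ITimeStretcher stretcher, float[] input, uint chunkSize)
    {
        var output = new List<float>();
        var chunk = new float[chunkSize];

        stretcher.PutSamples(input, (uint)input.Length);

        uint received;
        while ((received = stretcher.ReceiveSamples(chunk, chunkSize)) > 0)
        {
            for (uint i = 0; i < received; i++)
                output.Add(chunk[i]);
        }
        return output;
    }
}
```

With the tempo set to 0.5 before the call, putting 100 samples into the fake stretcher yields 200 samples back, spread over several ReceiveSamples() calls when chunkSize is smaller than that. The important part is the loop: the caller never assumes one receive per put.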

Audio Playback Framework

I had some experience with audio playback on Windows, but that was done in C++ with DirectSound, and dealing with DirectSound directly was, generally speaking, a pain. My goal was to write a practice tool that allows practicing, without wasting too much time on core technologies such as playing sound.

DirectSound is also unmanaged. Luckily, there is a very good managed library for Audio processing and playback - NAudio. NAudio takes care of all of the low level APIs (like DirectSound) and provides a simple interface which is easy to use, but also easy to extend. With NAudio, I managed to play sound files after a few minutes, but for dynamic time stretched playback, that was not enough. The main requirement from an interactive practicing tool is to be able to change the tempo on-the-fly. Therefore, a special audio processor was needed, one that can handle samples of different tempos.

Based on an idea that appeared on NAudio's discussion site, the AdvancedBufferedWaveProvider class was created. It manages a queue of audio buffers, each starting at a different time (CurrentTime). The AdvancedBufferedWaveProvider doesn't hold too many audio buffers in its queue. Instead, new audio buffers are dynamically added to the queue all the time, as needed, with a dynamic time stretch parameter.

This technique allows the user to change the tempo (or pitch) on-the-fly and get the sound to change with low latency.
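The idea of a short queue of timed buffers can be sketched like this. It is a simplified illustration and not the actual AdvancedBufferedWaveProvider: the class names, the maxQueued limit and the silence padding are assumptions, and the sketch does not depend on NAudio (it only imitates the shape of a Read(buffer, offset, count) call).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical buffer type: raw audio bytes plus the song time they start at.
public sealed class TimedBuffer
{
    public readonly byte[] Data;
    public readonly TimeSpan CurrentTime;

    public TimedBuffer(byte[] data, TimeSpan currentTime)
    {
        Data = data;
        CurrentTime = currentTime;
    }
}

// Simplified sketch of a short playback queue (not the real class).
public sealed class BufferQueueSketch
{
    private readonly Queue<TimedBuffer> _queue = new Queue<TimedBuffer>();
    private readonly object _lock = new object();
    private readonly int _maxQueued;
    private TimedBuffer _current;
    private int _currentOffset;

    public BufferQueueSketch(int maxQueued) { _maxQueued = maxQueued; }

    // Time of the buffer that is currently being played.
    public TimeSpan CurrentTime { get; private set; }

    // The producer adds freshly processed buffers; a short queue keeps latency low.
    public bool TryAdd(TimedBuffer buffer)
    {
        lock (_lock)
        {
            if (_queue.Count >= _maxQueued) return false; // full, try again later
            _queue.Enqueue(buffer);
            return true;
        }
    }

    // The playback side pulls bytes; missing data is padded with silence.
    public int Read(byte[] buffer, int offset, int count)
    {
        int written = 0;
        lock (_lock)
        {
            while (written < count)
            {
                if (_current == null || _currentOffset >= _current.Data.Length)
                {
                    if (_queue.Count == 0) break;
                    _current = _queue.Dequeue();
                    _currentOffset = 0;
                    CurrentTime = _current.CurrentTime;
                }
                int n = Math.Min(count - written, _current.Data.Length - _currentOffset);
                Buffer.BlockCopy(_current.Data, _currentOffset, buffer, offset + written, n);
                _currentOffset += n;
                written += n;
            }
        }
        Array.Clear(buffer, offset + written, count - written);
        return count;
    }
}
```

Because only a few buffers are ever queued, a tempo change made by the user affects the very next buffers that the processing thread produces, so the change is heard after a short delay rather than after a long pre-rendered stretch of audio.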

Ogg Vorbis

I really like open source products and open source formats. That is why I felt odd with an LGPL utility that can only play back WAV (uncompressed and basically a Microsoft format) or MP3 (compressed but proprietary). Ogg Vorbis is compressed and free, so why not use it? Unfortunately, NAudio (as of the writing of this article) does not support playback of Ogg Vorbis files out of the box. I had to come up with some solution. Luckily, there is an LGPL library named Vorbis# (or csvorbis) that does provide Ogg Vorbis playback support for managed code. Vorbis# is a port of the Jorbis Java library, which itself is a port of the original Xiph Vorbis decoder, which was written in C.

In order to allow Vorbis# to work with NAudio, an assembly (project) was written: NAudioOggVorbis. NAudioOggVorbis encapsulates the Vorbis# code and also provides an NAudio Ogg Vorbis adapter class, OggVorbisFileReader, which plugs into NAudio because it inherits from the NAudio core abstract class WaveStream. NAudio takes care of the audio playback logic and commands OggVorbisFileReader to return buffers and/or change file positions when needed. OggVorbisFileReader delegates these requests to a Vorbis# class, VorbisFile, which takes care of decoding the compressed Ogg Vorbis packets into uncompressed PCM packets. VorbisFile is a high level wrapper of the decoding logic that is implemented by the many Vorbis# classes. Using it as a single point of entry keeps the client code in OggVorbisFileReader nice and clean.

The code in OggVorbisFileReader that uses VorbisFile to decode the next packet:

Once PCM packets are returned to NAudio, NAudio plays them as if they came from any other source - i.e., NAudio has no idea that the packets were originally Ogg Vorbis encoded packets, nor should it care about this fact. That approach proved to be quick and easy to implement - Ogg Vorbis files are played back (and slowed down) just like WAV and MP3 files. Mission accomplished.
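The adapter arrangement described in this section can be illustrated with a generic sketch. None of the types below are the real NAudio or Vorbis# classes: PcmSourceSketch stands in for an abstract playback stream such as WaveStream, IPacketDecoder stands in for a decoder facade such as VorbisFile, and FakeDecoder just serves bytes from an array in small packets so the example runs on its own.

```csharp
using System;

// Hypothetical stand-in for the playback framework's abstract stream class.
public abstract class PcmSourceSketch
{
    public abstract int Read(byte[] buffer, int offset, int count);
    public abstract long Position { get; set; }
}

// Hypothetical stand-in for a decoder facade.
public interface IPacketDecoder
{
    // Decodes up to 'count' PCM bytes; returns 0 at the end of the stream.
    int DecodeInto(byte[] buffer, int offset, int count);
    void SeekToByte(long pcmByteOffset);
}

// The adapter: the player sees a plain PCM stream, the decoder does the work.
public sealed class DecoderAdapterSketch : PcmSourceSketch
{
    private readonly IPacketDecoder _decoder;
    private long _position;

    public DecoderAdapterSketch(IPacketDecoder decoder) { _decoder = decoder; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            // One decoder call may return fewer bytes than requested.
            int n = _decoder.DecodeInto(buffer, offset + total, count - total);
            if (n <= 0) break;
            total += n;
        }
        _position += total;
        return total;
    }

    public override long Position
    {
        get { return _position; }
        set { _decoder.SeekToByte(value); _position = value; }
    }
}

// Fake decoder for the sketch: hands out bytes in packets of limited size.
public sealed class FakeDecoder : IPacketDecoder
{
    private readonly byte[] _pcm;
    private readonly int _packetSize;
    private int _pos;

    public FakeDecoder(byte[] pcm, int packetSize) { _pcm = pcm; _packetSize = packetSize; }

    public int DecodeInto(byte[] buffer, int offset, int count)
    {
        int n = Math.Min(Math.Min(count, _packetSize), _pcm.Length - _pos);
        Array.Copy(_pcm, _pos, buffer, offset, n);
        _pos += n;
        return n;
    }

    public void SeekToByte(long pcmByteOffset) { _pos = (int)pcmByteOffset; }
}
```

The player only ever calls Read and sets Position on the adapter, so swapping the decoder for another format does not change anything on the playback side.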

FLAC

Encouraged by the success of adding Ogg Vorbis playback (and later WMA playback, the details of which I left out), I decided to go on and add support for one more file format (decoder) - FLAC. Ogg Vorbis is a great open source audio format but it is lossy. A Free Lossless Audio Codec (AKA FLAC) which is also compressed would be a nice thing to have.

FLAC is starting to emerge as 'THE' format for audiophiles - its usability is really superb (in terms of quality and file size) and it is very fast to decode.

So, after some Googling and research, I stumbled upon a very nice demo (written by Stanimir Stoyanov) that shows how to decode FLAC files in C#. The demo was decoding FLAC by communicating with the official libFlac C API through P/Invoke calls.

I took Stan's code, added some missing APIs needed for decoding (e.g. metadata, absolute seek) and refactored it into a new managed C# integration layer for the libFlac API: LibFlacSharp. I've put the Decoder and Encoder APIs together, though Practice# only uses the decoding API.

LibFlacSharp is a class that can be used by any C# client (not just for Practice#).

Its main decoding API is: (The API is documented very well on the libFlac web site.)

The second thing I did was to write an NAudio File Reader adapter, somewhat similar to the Ogg Vorbis one but more complex. libFlac returns Frames whose size is not equal to the NAudio buffer size. libFlac also works with callbacks, which are inherently not as easy to work with as direct control. The callbacks are synchronous, as libFlac has no threads, but it is still somewhat cumbersome.

I found an elegant solution to these issues - the FLAC Frame is read into an intermediate samples buffer. Then, when NAudio needs new samples to play, there is first an attempt to pull samples from the intermediate buffer (if any samples are available there).

After the buffer samples are used (if at all), and if there are still samples to fill the NAudio playback buffer, then a request is made to libFlac (through LibFlacSharp) to get one more FLAC Frame.

This design pattern is really not new, but in the context of Practice# it reminds me of playing a bagpipe, so I will call it the Bagpiper Design Pattern.

With a real bagpipe, a bagpiper is blowing air into the bag at his will (FLAC frames) but the actual playing (NAudio Playback) is continuous using the air in the bag (Intermediate FLAC Samples Buffer).
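A minimal sketch of this intermediate buffer idea is shown below. It is not the Practice# FLAC reader: the class name is made up, and the libFlac callback machinery is replaced by a plain delegate (decodeNextFrame, an assumption for this sketch) that returns the next decoded frame, or null at the end of the file.

```csharp
using System;

// Sketch of the "bag": leftover frame samples are kept between Read calls.
public sealed class FrameBufferedReader
{
    private readonly Func<float[]> _decodeNextFrame; // returns null at end of stream
    private float[] _pending = new float[0];         // intermediate samples buffer
    private int _pendingOffset;

    public FrameBufferedReader(Func<float[]> decodeNextFrame)
    {
        _decodeNextFrame = decodeNextFrame;
    }

    // Fills 'dest' with up to 'count' samples and returns how many were written.
    public int Read(float[] dest, int offset, int count)
    {
        int written = 0;
        while (written < count)
        {
            if (_pendingOffset >= _pending.Length)
            {
                // The bag is empty: ask the decoder for one more frame.
                float[] frame = _decodeNextFrame();
                if (frame == null) break;
                _pending = frame;
                _pendingOffset = 0;
                continue;
            }

            // First use whatever is still left in the intermediate buffer.
            int n = Math.Min(count - written, _pending.Length - _pendingOffset);
            Array.Copy(_pending, _pendingOffset, dest, offset + written, n);
            _pendingOffset += n;
            written += n;
        }
        return written;
    }
}
```

The playback side can ask for any number of samples per call, while the decoder keeps producing frames in its own size. The leftover part of a frame simply waits in the intermediate buffer for the next Read.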

UI Design

As mentioned above, I was aiming for a user interface which would be productive and intuitive by being close in spirit to the Boss ME-70. Some other elements that I liked, like the Loop controls and Now buttons, were inspired by the design of the BestPractice utility. Minimal menus are used (only for Recent Files and the About Form) and there are only three modal dialogs (Open File, Preset Description and the About Form) - all other operational aspects of the tool are laid out directly on the form. The 4 presets resemble the 4 Boss preset pedals - only one is active at a time. To write to a preset, the Write button (Floppy Disk icon) has to be clicked - at this point, the LEDs of all presets light up in red, waiting for the user to select which preset to write into. Once a preset is clicked, the preset settings are written to a file.

To cancel the Preset Write Mode, simply click the Write button again, and all preset LEDs will revert to green (regular mode). While in Preset Write Mode, clicking on another preset essentially acts as a 'Copy Preset' function, because the current preset's settings are also written into the selected preset.

Each audio file automatically gets its own preset file - all these preset files are kept in a user folder, %LOCALAPPDATA%\PracticeSharp (e.g. on my Win7 laptop, it is: C:\Users\Yuval\AppData\Local\PracticeSharp). This is good for a few reasons - each file gets its presets persisted without requiring menus, and when a file is re-opened, the correct presets are loaded automatically. If a preset needs to be reset to default values, the user has to click and hold the Eraser icon for a second or more; the current preset then blinks a few times with an orange LED, and all its settings revert to default. Once again: no menus or modal dialogs.
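As a rough illustration of the per-file preset location, the folder can be resolved through the standard .NET API for %LOCALAPPDATA%. The sketch below is an assumption about how such a lookup could be written, not the Practice# code. In particular, the preset file naming (audio file name plus a ".presets.xml" suffix) is hypothetical, since the article does not state the actual naming scheme or file format.

```csharp
using System;
using System.IO;

// Illustrative only: class name and file naming are assumptions.
public static class PresetPathSketch
{
    // Resolves %LOCALAPPDATA%\PracticeSharp for the current user.
    public static string GetPresetFolder()
    {
        string localAppData =
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
        return Path.Combine(localAppData, "PracticeSharp");
    }

    // Hypothetical naming scheme: <audio file name>.presets.xml
    public static string GetPresetPath(string audioFilePath)
    {
        string fileName = Path.GetFileName(audioFilePath) + ".presets.xml";
        return Path.Combine(GetPresetFolder(), fileName);
    }
}
```

Deriving the preset path from the audio file is what makes the presets load automatically: opening the same song again leads to the same preset file, without the user ever picking one.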

Implementation - Audio Processing

The heart of Practice# is the PracticeSharpLogic class. It contains the Audio processing thread which is implemented by ProcessAudio and does the following things:

Processes the samples through an Equalizer DSP effect, to put the equalizer into effect

The first important code to notice is the reading of samples from the input file (shown in the image above as first green rectangle from top). This is achieved with NAudio's WaveChannel class.
Note: There is a nice trick I used for converting between a float array and a byte array without copying, so it costs practically no extra CPU or memory. The class is ByteAndFloatsConverter and it is based on this discussion.
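One well-known way to get such a copy-free conversion in .NET is an explicit-layout struct in which a byte[] field and a float[] field share the same offset. The sketch below assumes that this is the kind of trick meant here; it uses made-up names and is not the actual ByteAndFloatsConverter source.

```csharp
using System;
using System.Runtime.InteropServices;

// Both fields occupy the same slot, so they refer to the same array object.
[StructLayout(LayoutKind.Explicit)]
public struct ByteFloatOverlay
{
    [FieldOffset(0)] public byte[] Bytes;
    [FieldOffset(0)] public float[] Floats;
}

public static class OverlayDemo
{
    // Returns a float[] view over the same memory as 'bytes' (no copy is made).
    // Caution: the view's Length still reports the BYTE count, so only the
    // first bytes.Length / 4 elements are valid.
    public static float[] ViewAsFloats(byte[] bytes)
    {
        var overlay = new ByteFloatOverlay();
        overlay.Bytes = bytes;
        return overlay.Floats;
    }
}
```

This relies on how the runtime lays out arrays rather than on a guarantee of the C# language, so it should be used with care: always compute the number of valid floats from the byte count, and never index past it. On newer .NET versions, MemoryMarshal.Cast over a Span is a safer way to get a similar reinterpreting view.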

The second important code to notice is putting the read samples into SoundTouch, via SoundTouchSharp, for DSP processing (shown in the image above as the second green rectangle from top). The requested tempo and pitch are set before the call (in SetSoundSharpValues).

Finally, the third and perhaps most important code receives the processed samples back from SoundTouch and then puts them in the queued buffered player where they will be played by NAudio (shown in the image above as the third green rectangle from top).

1.5.0:

Fixed speed and pitch track bar mouse behavior. Values are now rounded up properly and the ticks are 'sticky'

Added a 'Show technical log' (F12) use-case. It is useful for viewing & sending the log if things don't work properly for some reason

Fixed issue when loading a file after an existing file was loaded: there was a short playback of the old song (SoundTouch buffers were not flushed properly and had some leftover samples). Not affecting stability, but annoying.

New feature! Vocal Suppression (AKA Voice Removal or Karaoke), note: works on Stereo files only

1.3.0:

1.2.0:

Released: 2/9/2011

Improved slow down playback quality by fine tuning the SoundTouch engine. There is a significant improvement in sound quality when playback is slowed down, in particular for singing/speech parts but also for music.

Manual settings are now used instead of the default automatic settings provided by SoundTouch

Added TimeStretchProfiles, to support custom tuning of the SoundTouch engine

Fixed minor bug with positionLabel handling: Clicking on it was not working in Pause mode

Position Reset (Back to start) keyboard shortcut changed from Home (used by other TrackBars by default) to F5

1.0.1:

Released: 1/22/2011

Note: I apologize, but I released a bad 1.0 version, 1.0.1 replaces 1.0.

Added Wix/MSI setup, with dotNetInstaller boot strapper

'Initialized' Status -> Renamed to 'Ready'

Fix: When application was loaded, the previous file did not show the loop boundaries ("bar"). Only after playing the file, it then showed up.


About the Author

I've been punching code since the age of 9 when I got my first computer - a Sinclair Spectrum with 48 KB of RAM!
That was a great time, when peeks and pokes were the way to do stuff.
Along the way I moved on to PC and never left it (EDIT: Since 2010 a false statement - I fell in love with Android).
I wrote in X86 Assembly, Logo, Basic, C, C++, Pascal, Delphi, Java and, in the last 13 years, C#.

I am also an amateur photographer using a Nikon D100, taking pictures mostly of scenery & nature.
Some of my pictures are presented at: My photo gallery
(Titles are in Hebrew, but pictures have an international language.. )
