Windows 7 Audio Fundamentals

Extraordinary Robot

This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder. The video also covers speech recognition using Kinect. For the built-in example this demo is based on, and for the speech demo in C#, check out your "My Documents\Microsoft Research KinectSDK Samples\Audio" directory. You can download the Visual Basic examples here. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides.

[h=3]Setup[/h]The steps below assume you have set up your development environment as explained in the "Setting Up Your Development Environment" video.
[h=1]Task: Designing Your UI[/h]We’ll add in a Slider and two Button controls, and we'll also use some stack panels to be sure everything lines up nicely:

XAML

http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="

[h=3]Creating Click events[/h]For each button, we'll want to create a click event. Go to the properties window (F4), select the RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click event. Do the same for the PlayButton so that the PlayButton_Click event is wired up as well.

[h=1]Task: Working with the KinectAudioSource[/h]The first task is to add in the Kinect Audio library:

C#

using Microsoft.Research.Kinect.Audio;

Visual Basic

Imports Microsoft.Research.Kinect.Audio

[h=2]Threading and apartment states[/h]From this point forward, we'll be dealing with threading, since the microphone array requires a multithreaded apartment (MTA) state but WPF runs in a single-threaded apartment (STA) state. To find out more about apartment states, check out the MSDN page on it: http://msdn.microsoft.com/en-us/library/system.threading.apartmentstate.aspx.
This is easy to work around—we just have to keep note of how we access different items. We'll accomplish this by creating a new thread that will do the actual recording and file saving.
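To make the apartment-state difference concrete, here is a minimal console sketch rather than part of the sample itself. It assumes a Windows .NET Framework console project. The names ApartmentSketch and ReportApartment are hypothetical, and the STAThread attribute on Main stands in for the WPF user-interface thread.

```vb
Imports System
Imports System.Threading

Module ApartmentSketch
    ' Hypothetical worker: reports the apartment state of the thread it runs on.
    Sub ReportApartment()
        Console.WriteLine("Worker thread: {0}", Thread.CurrentThread.GetApartmentState())
    End Sub

    ' STAThread mimics the single-threaded apartment that a WPF UI thread uses.
    <STAThread()> Sub Main()
        Console.WriteLine("Main thread: {0}", Thread.CurrentThread.GetApartmentState())

        Dim worker As New Thread(New ThreadStart(AddressOf ReportApartment))
        ' The apartment state has to be chosen before the thread is started.
        worker.SetApartmentState(ApartmentState.MTA)
        worker.Start()
        worker.Join()
    End Sub
End Module
```

The same pattern applies to the recording thread: the UI thread stays in its own apartment and only the worker is marked MTA.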
We'll create two variables and an event outside the RecordButton_Click event to help deal with the cross-threading issue. The FinishedRecording event will allow us to notify the user-interface thread that we're done recording:

Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler

Now that we can keep track of necessary information, we'll create a new method to do the recording. This is the method we'll tell the new thread to execute:

Imports System.Threading

Now we'll create the thread and do some simple end-user management in the RecordButton_Click event. First we'll disable the two buttons, store the recording length chosen with the slider, and create a unique file name. Then we'll create a new Thread and use the SetApartmentState method to give it an MTA state:

Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)
    RecordButton.IsEnabled = False
    PlayButton.IsEnabled = False
    _amountOfTimeToRecord = RecordForTimeSpan.Value
    _lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"
    Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
    t.SetApartmentState(ApartmentState.MTA)
    t.Start()
End Sub

[h=1]Task: Capturing Audio Data[/h]From here, this sample and the built-in sample are pretty much the same. There are only three differences: the FinishedRecording event, a dynamic playback time, and the dynamic file name. Note that the WriteWavHeader function is exactly the same as the one in the built-in demo. Since we use different types of streams, we'll add the System.IO namespace:
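For readers who do not have the built-in sample at hand, the capture step can be sketched without a Kinect attached. This is not the SDK's WriteWavHeader. The names WritePcmWavHeader and CopyAudio are hypothetical, a MemoryStream of silence stands in for the microphone stream, and the 16 kHz, 16-bit mono PCM format is an assumption about the audio being captured.

```vb
Imports System
Imports System.IO
Imports System.Text

Module WavCaptureSketch
    ' Assumed capture format: 16 kHz, 16-bit, mono PCM.
    Private Const SampleRate As Integer = 16000
    Private Const BitsPerSample As Short = 16
    Private Const Channels As Short = 1

    ' Writes a canonical 44-byte RIFF/WAVE header for dataLength bytes of PCM audio.
    Sub WritePcmWavHeader(ByVal output As Stream, ByVal dataLength As Integer)
        Dim blockAlign As Short = CShort((Channels * BitsPerSample) \ 8)
        Dim writer As New BinaryWriter(output, Encoding.ASCII)
        writer.Write(Encoding.ASCII.GetBytes("RIFF"))
        writer.Write(36 + dataLength)             ' RIFF chunk size: file size minus 8
        writer.Write(Encoding.ASCII.GetBytes("WAVE"))
        writer.Write(Encoding.ASCII.GetBytes("fmt "))
        writer.Write(16)                          ' size of the PCM fmt chunk
        writer.Write(CShort(1))                   ' format tag 1 = uncompressed PCM
        writer.Write(Channels)
        writer.Write(SampleRate)
        writer.Write(SampleRate * blockAlign)     ' average bytes per second
        writer.Write(blockAlign)
        writer.Write(BitsPerSample)
        writer.Write(Encoding.ASCII.GetBytes("data"))
        writer.Write(dataLength)
        writer.Flush()                            ' not disposed: the caller owns the stream
    End Sub

    ' Copies at most dataLength bytes from audioSource to output; returns the bytes copied.
    Function CopyAudio(ByVal audioSource As Stream, ByVal output As Stream,
                       ByVal dataLength As Integer) As Integer
        Dim buffer(1023) As Byte
        Dim total As Integer = 0
        While total < dataLength
            Dim wanted As Integer = Math.Min(buffer.Length, dataLength - total)
            Dim count As Integer = audioSource.Read(buffer, 0, wanted)
            If count <= 0 Then Exit While         ' source ended early
            output.Write(buffer, 0, count)
            total += count
        End While
        Return total
    End Function

    Sub Main()
        Dim seconds As Double = 0.5               ' hypothetical slider value
        Dim dataLength As Integer = CInt(seconds * SampleRate) * (BitsPerSample \ 8) * Channels

        ' Stand-in for the microphone stream: silence of the requested length.
        Using fakeMic As New MemoryStream(New Byte(dataLength - 1) {})
            Using wavFile As New FileStream("sketch.wav", FileMode.Create)
                WritePcmWavHeader(wavFile, dataLength)
                Dim copied As Integer = CopyAudio(fakeMic, wavFile, dataLength)
                Console.WriteLine("Copied {0} bytes of audio data", copied)
            End Using
        End Using
    End Sub
End Module
```

Writing the header first works here because the data length is known up front from the requested recording time. If recording could stop early, the two size fields would need to be rewritten afterwards.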

Public Sub New()
    InitializeComponent()
    AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub

Since that event will return on a non-UI thread, we'll need to use the Dispatcher to get back onto the UI thread so we can re-enable those buttons:

Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
    ' AddressOf is required to pass the method as a delegate
    Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub

Private Sub ReenableButtons()
    RecordButton.IsEnabled = True
    PlayButton.IsEnabled = True
End Sub

And finally, we'll make the MediaElement play back the audio we just saved! We'll also verify both that the file exists and that the user recorded some audio:

Using source = New KinectAudioSource
    source.FeatureMode = True
    source.AutomaticGainControl = False 'Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly 'No AEC for this sample
End Using

With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier:

Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"
Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()

Next, a "grammar" needs to be set up, which specifies the words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green", and "blue".

C#

using (var sre = new SpeechRecognitionEngine(ri.Id))
{
    var colors = new Choices();
    colors.Add("red");
    colors.Add("green");
    colors.Add("blue");

    var gb = new GrammarBuilder();
    // Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture;
    gb.Append(colors);

    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);
    sre.LoadGrammar(g);
}

Visual Basic

Using sre = New SpeechRecognitionEngine(ri.Id)
    Dim colors = New Choices
    colors.Add("red")
    colors.Add("green")
    colors.Add("blue")

    Dim gb = New GrammarBuilder
    ' Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture
    gb.Append(colors)

    ' Create the actual Grammar instance, and then load it into the speech recognizer.
    Dim g = New Grammar(gb)
    sre.LoadGrammar(g)
End Using

Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected: