Overview

Stephen Hawking is one of the most famous people using speech synthesis to communicate

In this article we are going to explore the Speech API library that’s part of the TTS SDK that helps you reading text and speaking it. We’re going to see how to do it programmatically using C# and VB.NET and how to make use of LINQ to make it more interesting. The last part of this article talks about…… won’t tell you more, let’s see!

Introduction

The Speech API library that we are going to use today is represented by the file sapi.dll which’s located in %windir%System32SpeechCommon. This library is not part of the .NET BCL and it’s not even a .NET library, so we’ll use Interoperability to communicate with it (don’t worry, using Visual Studio it’s just a matter of adding a reference to the application.)

Implementation

In this example, we are going to create a Console application that reads text from the user and speaks it. To complete this example, follow these steps:

As an example, we’ll create a simple application that reads user inputs and speaks it. Follow these steps:

If you are using Visual Studio 2010 and .NET 4.0 and the application failed to run because of Interop problems, try disabling Interop Type Embedding feature from the properties on the reference SpeechLib.dll.

Building Talking Strings

Next, we’ll make small modifications to the code above to provide an easy way to speak a given System.String. We’ll make use of the Extension Methods feature of LINQ to add the Speak() method created earlier to the System.String. Try the following code:

I Love YOU ♥

Let’s make it more interesting. We are going to code a VBScript file that says “I Love YOU” when you call it. To complete this example, these steps:

Open Notepad.

Write the following code:

CreateObject("SAPI.SpVoice").Speak "I love YOU!"

Of course, CreateObject() is used to create a new instance of an object resides in a given library. SAPI is the name of the Speech API library stored in Windows Registry. SpVoice is the class name.

Save the file as ‘love.vbs’ (you can use any name you like, just preserve the vbs extension.)

Now open the file and listen, who is telling that he loves you!

Microsoft Speech API has many voices; two of them are Microsoft Sam (male), the default for Windows XP and Windows 2000, and Microsoft Ann (female), the default for Windows Vista and Windows 7. Read more about Microsoft TTS voices here.

A second honeymoon with Microsoft Agent. Do you remember our article Programming Microsoft Agent in WinForms and our sample application PartIt? After you have included your desired agent in your application, are you feeling bad with the default popup menu? If so, then you are in the right place (welcome :).)

Enough talking, let’s get to work! First, prepare your code that loads the Microsoft Agent and brings it to the screen.

Now, let’s complete it. Get to the code that loads the agent character (e.g. calls the Characters.Load() method of the agent control object, AxAgentObjects.AxAgent) and just disable the AutoPopupMenu flag/property of the character object, AgentObjects.IAgentCtlCharacterEx. This flag/property determines whether to allow the default popup menu or not.

Next comes the interesting point. When the character is clicked, the ClickEvent event of the agent control (AxAgent) fires. So the next step is to handle this event and to provide your code that brings up your custom context menu. Consider the following code:

Overview

Microsoft Agent is an unprecedented technology to create innovative, new conversational interfaces for applications and Web pages. It provides powerful animation capability, interactivity, and versatility, with incredible ease of development.

Microsoft Agent is a technology that provides a foundation for more natural ways for people to communicate with their computers. It is a set of software services that enable developers to incorporate interactive animated characters into their applications and Web pages. These characters can speak, via a Text-to-Speech (TTS) engine or recorded audio, and even accept spoken voice commands. Microsoft Agent empowers developers to extend the user interface beyond the conventional mouse and keyboard interactions prevalent today.

Enhancing applications and Web pages with a visible interactive personality will both broaden and humanize the interaction between users and their computers.

There are a limitless number of roles and functions that developers can create for these personalities to perform.

A welcome host could greet new users and provide a guided tour the first time a computer is turned on, an application is run, or a Web site is browsed.

A friendly tutor could lead someone through a task or a decision tree with instructions step-by-step along the way.

A messenger could deliver a notification or alert that a new e-mail has arrived and then offer to read it to you.

An assistant could perform tasks for you like looking up information on the Internet and then reading it out loud.

Although, this lesson focuses on Windows Forms, you can take advantage of Microsoft Agent in Web applications, and also XAML applications. For Web application, you can refer to the Microsoft Agent Platform SDK documentation. For XAML applications, it is the same as Windows Forms, except that you will need to add Microsoft Agent Control inside a Sysetm.Windows.Forms.Integration.WindowsFormsHost WPF control.

Microsoft Agent characters

Microsoft Agent characters can be found in various applications and Websites, including Office 2003 and its ascendants, it comes as the Office Assistant. Figure 1 shows the Clippit office assistant. And figure 2 shows Merlin the most widely-used Microsoft Agent character in Websites and applications.

Figure 1 - Clippit

Figure 2 - Merlin

There’re plenty of characters available for download. Though, Merlin is included by default in Windows. In addition, Office 2003 comes with many characters including Clippit.

Microsoft Agent SDK

Programming Microsoft Agent is very easy. But, in order to start programming you need to download the Microsoft Agent SDK. It’s available for downloadalong with its documentation in the Microsoft Agent SDK download page on Microsoft. In addition, you need to download the Speech API and Text-to-Speech (TTS) Engine in order to enable the speech feature of Microsoft Agent. These packages are available in the Speech SDK in this page. The most recent version is 5.1.

The Text-to-Speech (TTS) Engine is used to translate text into voice.

It is worth mentioning that, if you need to extend speech feature of Microsoft Agent to support other languages other than English, you need to download your preferred language package from the Speech SDK page.

In addition, you are not end up using characters created for you. If you need to create your own characters you can download the Agent Character Editor and start using character creation.

Take into consideration that, for the application to run correctly on user’s machine, the user need to install two packages, Microsoft Agent and Microsoft TTS. Remember to inculde them as prerequisites in your application installer package.

After downloading and installing the Microsoft Agent SDK, you may notice that Microsoft Agent SDK offers you many components for many purposes. The most component that we are interested in is the ActiveX COM library AgentCtl.dll that contains the Microsoft Agent support features for Windows Forms. It contains the Microsoft Agent Control that’s the door for Microsoft Agent Windows Forms programming. This component -and many others- resides on %windir%MSAgent.

Microsoft Agent Windows Forms Support

Because AgentCtl.dll is an ActiveX component you cannot use it directly in your code. You need to create a RCW (Runtime-Callable Wrapper) component (assembly) that will act as a bridge between .NET and the COM component. This is done automatically while adding the ActiveX component control to your toolbox in Visual Studio. Another way is to use the AxImp.exe tool that comes with the .NET Framework SDK.

You might be wandering why we haven’t used the TlbImp.exe tool? Because this is an ActiveX COM component contains ActiveX controls. Therefore, we used the AxImp.exe.

Because .NET and COM cannot call each other directly, two types of wrapper components exist in the space, RCWs (Runtime-Callable Wrappers) and CCWs (COM-Callable Wrappers). RCWs are .NET assemblies act as the bridge between .NET and COM. Conversely, CCWs are COM components act as the bridge between COM and .NET.

This component contains a single control Microsoft Agent Control that will be added automatically to the toolbox. After dragging it to the form, two components will be added automatically to the project references, AgentObjects and AxAgentObjects. These are the interoperability RCW wrapper components for the COM component. AxAgentObjects is the wrapper for the control and its related types. On the other hand, AgentObjects component is the wrapper for other types in the COM component. You might have noticed that a square-shaped control of type AxAgentObjects.AxAgent added to the form. This control will not be visible at runtime. Figure 4 shows the form.

Figure 4 – Microsoft Agent Control

AgentCtl uses STA (Single-Threaded Apartment) threading model. Means that the COM component supports -and bounds- only to a single thread.

To enable STA for your application you need to adorn the Main() function with the STAThreadAttribute attribute. Conversely, MTA (Multi-Threaded Apartment) threading model means that the COM component supports and can be accessed by multiple threads.

If neither STAThreadAttribute nor MTAThreadAttribute attributes has not been set, the threading model by default is MTA. Another way to set the threading model is the System.Threading.Thread.SetApartmentState instance (opposite of static) method.

To know whether a COM component supports STA, MTA, or both threading models, you can check the registry value ThreadingModel on HKCRCLSIDInprocServer32.

Loading Characters

After adding the AxAgent control to the form, you can now order the control to load the character(s) and show it.
AxAgent control contains a property Characters that’s of type AgentObjects.IAgentCtlCharacters that acts as a dictionary collection contains the loaded characters and its names.

For loading a character we use the Load() method of the Characters collection. For getting the loaded character we use the Character() method that returns an object of type AgentObjects.IAgentCtlCharacterEx that represents the loaded character.

Here’s the code for loading the character Merlin and showing it:

// Here I'll specify the character name
// and the character location.
this.axAgent1.Characters.Load(&quot;Merlin&quot;,
@&quot;C:WindowsMSAgentcharsmerlin.acs&quot;);
AgentObjects.IAgentCtlCharacterEx character = this.axAgent1.Characters.Character(&quot;Merlin&quot;);
// A single argument contains a value to whether
// to animate the character while showing it
// or skipping the animation.
// You should set this to null or false.
// Set it to True to skip the animation.
character.Show(null);

In addition to Load() and Character() methods of the IAgentCtlCharacters object, you can use the Unload() method to unload a given loaded character.

Also IAgentCtlCharacterEx supports the Hide() method for hiding the character, that’s opposite to the Show() method.

Ordering the Character to Speark

After creating the character and showing it you can order it to speak whether a specific text using the Text-to-Speech (TTS) Engine or a recorded audio file. The following code segment makes the character introduces itself using coded string first and a recorded audio file next:

As you have noticed, the Speak() method can take either a text for the first argument or a file path for the second argument. Figure 5 shows Merlin while speaking the text.

Figure 5 – Merlin Speaks

Take a look at the Think() method, it is very similar to Speak().

Moving the character

You can move the character around the screen by using the MoveTo() method.

character.MoveTo(100, 100, null);

This method takes three arguments, the first and second argument is the location on the screen that you wish to move the character to. The third argument takes a value indicating the moving speed. If this argument is null, then the default value 1000 is used. If you specified the value 0 the character will move without playing an animation. Obviously, a value under 1000 will slow down the character move, and a value greater than 1000 will speed up the move.

I had reasons to merge this section with the animation section. But, I had reasons too to put it in a single section, I think it’s a special type of animations!

Character Animation

Every character has its own animations. Although, many characters share many animation names but that’s not required.

Every animation has a name. And to get the animation names that a character supports, you need to dig into the AnimationNames property that’s of type AgentObjects.IAgentCtlAnimationNames which implements the System.Collections.IEnumerable interface.

Be aware that animation names that contain directions like left and right, it’s the left and right of the character not you.

In addition to the Play() method, the character object supports two methods for stopping playing animations, Stop() and StopAll(). Stop() stops the animation that’s specified by its single argument. StopAll() stops all animation of a specific type. This method has a single argument that can takes a string value of three “Move”, “Play”, and “Speak” to stop moving, animation or speaking commands. In addition to the three values, StopAll() can take a null value, and that means stopping all types of animations.

You might have noticed that many methods of the character object returns an object of type AgentObjects.IAgentCtlRequest. This type encapsulates a request to the character. If IAgentCtlRequest.Status is non-zero then the operation failed, otherwise succeeded.

Pay attention to the methods that return IAgentCtlRequest object, and the methods that requires it as an argument like Stop() method.

Another type of animation is gesturing to a specific point. Try this animation using the GestureAt() method.

IAgentCtlCharacterEx Properties

This list contains the most common properties of the character object (IAgentCtlCharacterEx):

AutoPopupMenu:
A Boolean property indicates whether to allow the character pop-up menu or not. Pop-up menu is the menu that is shown when the user right-clicks on the character or its icon -if exists- on the taskbar. Default value is False.

Balloon:
This is a read-only property that’s in turn contains read-only properties that indicate the characteristics like the back color of the balloon displayed while the character talking. This property contains many properties that most are self-explained from its names.

Left and Top:
Controls the character location on the screen. (It is better using the MoveTo() method instead of these two properties if you want to play an animation while moving the character).

MoveCause:
A read-only property returns a value indicates what caused the character’s last move. This property can return one of 5 values:

0:
The character has not been moved.

1:
The user moved the character.

2:
Your application moved the character.

3:
Another client application moved the character.

4:
The Agent server moved the character to keep it onscreen after a screen resolution change.

Pitch:
A read-only property returns a value indicates the Pitch setting of the TTS.

SoundEffectsOn:
Set this property to True to enable the sound effects, of False to disable it. Default is True.

Speed:
A read-only property indicates the speed of the character. (Also there’re some operations that supports specifying the speed like the MoveTo() method).

VisibilityCause:
A read-only property returns a value indicates the character’s visibility state. This property can return one of 8 values:

0:
The character has not been shown.

1:
User hid the character using the command on the character’s taskbar icon pop-up menu or using speech input..

2:
The user showed the character.

3:
Your application hid the character.

4:
Your application showed the character.

5:
Another client application hid the character.

6:
Another client application showed the character.

7:
The user hid the character using the command on the character’s pop-up menu.

Agent Server is the hard-core component that controls the Microsoft Agent behind the scenes. It controls all the characters loaded by your application and other applications and Websites. It’s the one which servers all orders for every Microsoft Agent character. Worth mentioning, it’s contained in the executable file AgentSvr.exe located on %windir%MSAgent. This file will be launched automatically when the first character loaded, and will shuts down automatically when the last character unloaded. After all, you can think about the Agent Server as the hard-core internal component of the Microsoft Agent.

AxAgent Control Events

An odd behavior of that control is that it defines events that related to the character and most likely to be located in the character object itself. Maybe it’s a feature to be located here because AxAgent controls the birth and death of the characters, but most of the time you will find that annoying.

The most widely-used events are:

BalloonShow and BalloonHide:
Occur when a balloon of a loaded character shown or hid.

ClickEvent and DblClick:
Occur when a loaded character clicked or double-clicked.

DragStart and DragComplete:
Occur when a user tries to drag a loaded character.

ShowEvent and HideEvent:
Occur when a loaded character shown or hide, whether the user who made this or your code. If you want to check what caused the character to be shown check the IAgentCtlCharacterEx.VisibilityCause. As a refresher, read the last section to know about this property.

MoveEvent:
Occurs when a loaded character moved whether the user moved it or it’s moved by your code. See the IAgentCtlCharacterEx.MoveCause in the previous section if you want to know what caused the character to be moved.

RequestStart and RequestComplete:
Occur when a request being restarted or after it completed. Note that many methods accept and many others return IAgentCtlRequest objects that defines requests sent to the server.

Creating a Custom Popup Menu

After you have included your desired agent in your application, are you feeling bad with the default popup menu? If so, then you are in the right place.

Now, let’s get it. Get to the code that loads the agent character (e.g. calls the Characters.Load() method of the agent control object, AxAgentObjects.AxAgent) and just disable the AutoPopupMenu flag/property of the character object, AgentObjects.IAgentCtlCharacterEx. This flag/property determines whether to allow the default popup menu or not.

Next comes the interesting point. When the character is clicked, the ClickEvent event of the agent control (AxAgent) fires. So the next step is to handle this event and to provide your code that brings up your custom context menu. Consider the following code:

Creating a Microsoft Agent controller

When dealing with Microsoft Agent, there’s recommended practices. One of is abstracting the Microsoft Agent implementation code from the UI code for achieving the maintainability and usability goals, and these goals can be achieved through a controller class that encapsulates the functionality that you need and provides easy access to lots of operations using a single call. A good controller class example is like this:

Freeing up Resources

After finishing your work with Microsoft Agent you should first unload the character by calling AxAgent.Characters.Unload() method. Second, because all objects of the RCW (Runtime-Callable Wrapper) assemblies created earlier are just a wrapper around the unmanaged COM objects, they exhaust a great amount of unmanaged resources that can retain in memory for a long time, you strictly should always dispose these object when you have done with it.

Fortunately, AxAgent control inherits indirectly from the IDisposable interface, so you can easily dispose it by calling its Dispose() method. Conversely, most objects (like the character object IAgentCtlCharacterEx) in the RCWs do not implement this interface, so you need to do it the hard way and use the Marshal class to dispose it. Here’s the code that can do that:

The FinalReleaseComObject() simple tries to decrement the references of the COM object to 0. You need to call FinalReleaseComObject() again and again until it returns 0, means that no references remain.

FinalReleaseComObject() tries to decrement the reference count to 0 in one time, ReleaseComObject() tries to decrement it gradually. So ReleaseComObject() needs more calls than FinalReleaseComObject(). and FinalReleaseComObject() is faster than ReleaseComObject().

Decrementing object’s references means disposing it.

You learned how to free-up the resources. But, there’s a question now. At which point should you dispose objects? It depends! The point at which you need to dispose objects depends on your use of objects. First, if you finished working with the AxAgent control and you do not want it any more till the user closes your application, it’s recommended that you dispose it as soon as possible. Otherwise, add the Dispose() call to the protected Dispose(bool) method of the form, so that it disposed when the form closes (honestly, disposes). The following code snippet demonstrates this:

You can step to the Dispose() method from the Members combo box in the top-right of the Code Editor window of the form.

A programming extremist like me (one of the Qaeda squad programming team :P) always disposes his objects even if the application is shutting down. To be honest, when the application shuts down all resources and memory managed by the application will be released immediately, so you actually do not need to release objects while the application shuts down.

On the other hand, for some objects like IAgentCtlCharacterEx you should take the Marshal class approach. But where to use it?

A great location for the FinalReleaseComObject() method (or ReleaseComObject() as well) is in the controller class. You can make your controller class implements the IDisposable interface and add your code inside the Dispose() method. Check the following example:

You should dispose the controller class as soon as you finish using your character, or even with the form disposal along with the disposing of the AxAgent control.

The Last Word

Needless to say that you can take advantage of the Microsoft Agent and TTS technology to produce applications that provide friendly tutor for the user.

You are not end up using existing Microsoft Agent characters, you can create your own characters and animations using the Microsoft Agent Character Editor that’s available for download in the Microsoft Agent SDK download page.

Microsoft Agent Compatibility

Microsoft announced that Windows 7 would not be compatible with Microsoft Agent. However, you still can install the SDK and run your applications correctly.

Where to Go Next?

Good references for Microsoft Agent are:

Microsoft Agent SDK Documentation. Available for download with the SDK, see “Micrososft Agent SDK” section on the top.

Book: Microsoft Agent Software Development Kit.
ISBN: 0-7356-0567-X

Book: Developing for Microsoft Agent.
ISBN: 1-5723-1720-5

A Real-world Application

This lesson does not contain a sample project, it contains a real-world application, Geming PartIt! that’s used for parting large files into small pieces for merging them later, so you can send them via e-mail or storing them on CDs and Floppy Disks. This application uses Microsoft Agent character Merlin to provide a friendly guide for the user through the application.