Introduction

Perceptor is an artificially intelligent guided navigation system for WPF. Perceptor tracks a user's behaviour while he or she
interacts with the user interface. Changes to the DataContext of a host control indicate
user navigation behaviour, and induce the training of a neural network.
Knowledge acquired by the neural network is used to predict the IInputElements
to which a user may intend to navigate. This accelerates interface interaction, improves user efficiency,
and allows for dynamic and evolving business rule creation.

Background

Last year (2008), I was asked to implement some business rules for an ASP.NET application.
Part of this application was designed to allow high volume data entry, and used a tabbed interface
with which staff would navigate and manually validate and amend information.
The rules I implemented were designed to streamline this process.
At the time it struck me that hardwiring the behaviour of a user interface,
based on business procedures, was too rigid. The way people work changes, and the way
an application is used varies from user to user. Moreover, refinement of such rules over time leads
to increased maintenance costs, and to retraining staff to handle new, improved application behaviour.

I envisioned a system where we could let the users define the behaviour by using it.
A system that could learn how to respond by itself.
To this end, this article and the accompanying code are provided as a proof of concept.

A Neural Network Driven Interface

Even though we have at our disposal terrific technologies such as WPF to build dynamic and highly reactive interfaces,
most interfaces, albeit rich, are in themselves not smart; they employ not even a modicum of AI when responding to user interaction.
Perhaps one may liken intelligent interfaces to the flying car; they are both much easier to do in sci-fi, are both
the next step in the evolution of the technology, and both take a lot of refinement to get right.

I want the interface to know what I want, and to learn about me.
But I also want it to do this in a way that doesn't bother me by making poor assumptions,
and that is probably one of the biggest challenges.
If running out of petrol requires a crash landing, then I'd prefer to remain land bound.

We've all seen how artificial neural networks (ANN) can be used to do things such
as facial and optical character recognition. Indeed they work well at pattern recognition
where there exists well defined training data, and it appears that we are able to leverage
the same technology, albeit in a different manner, to recognize user behaviour as well.
There are, however, a number of challenges, such as dealing with temporally progressive training,
because the training data is not predefined; the network is trained as we go.
An advantage of using an ANN is that we are able to provide predictions
for situations that haven't been seen yet.

Perceptor uses a three layered neural network, which becomes associated
with a host ContainerControl and a DataContext type.
In this article we will not be looking at neural networks,
as there are already some very good articles here on CP. I recommend taking a look at
Sacha Barber's series of articles
on the topic if you are new to neural networks. I will mention though that during experimentation it was realised that
a future enhancement might be a Long Short Term Memory (LSTM) implementation.
In this prototype we retrain the neural network repeatedly with all inputs in order to learn progressively.
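To make the idea concrete, the following is a minimal sketch of that retraining strategy: every observed input/output pair is kept, and the whole network is retrained on each new observation. The INeuralNetwork, TrainingPair, and ProgressiveTrainer types here are illustrative stand-ins, not Perceptor's actual classes.

    using System.Collections.Generic;

    // Illustrative stand-in for the three-layer network used by Perceptor.
    public interface INeuralNetwork
    {
        void Train(double[] input, double[] expectedOutput);
        double[] Pulse(double[] input);
    }

    // One observed input/output pair.
    public class TrainingPair
    {
        public double[] Input { get; private set; }
        public double[] ExpectedOutput { get; private set; }

        public TrainingPair(double[] input, double[] expectedOutput)
        {
            Input = input;
            ExpectedOutput = expectedOutput;
        }
    }

    public class ProgressiveTrainer
    {
        readonly List<TrainingPair> history = new List<TrainingPair>();
        readonly INeuralNetwork network;

        public ProgressiveTrainer(INeuralNetwork network)
        {
            this.network = network;
        }

        public void Observe(double[] input, double[] expectedOutput)
        {
            history.Add(new TrainingPair(input, expectedOutput));

            // Retrain with the complete history so far; this is what allows
            // the prototype to learn progressively without predefined data.
            foreach (TrainingPair pair in history)
            {
                network.Train(pair.Input, pair.ExpectedOutput);
            }
        }
    }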

Building a flying car with WPF

Perceptor trains a neural network using the state of the DataContext
of a host control as input, and a list of IInputElement IDs as output.
Prediction data and the serialized neural network are saved locally when offline,
or remotely on the server when online.

Figure: Perceptor system overview.

Perceptor monitors the DataContext of a host control for changes.
By doing this, rather than tracking only the state of the controls, we are able to gather more information
about how the user is affecting the state of the system. We are able to make inferences based on not only user behaviour
but also system behaviour, as the system is capable of modifying the DataContext as a result of an internal or external event.
Put another way, if we were to merely track the controls, we would not be able to associate properties that didn't
have a visual representation in the interface.
By tracking the DataContext we can analyse the structure more deeply, and
we can even enhance how we generate the input for our neural network. We can, in effect, drill down into the DataContext
to improve the granularity and the quality of Perceptor's predictions.

Input for our neural network is generated by the NeuralInputGenerator.
This takes an object exposed by the DataContext property of a control,
and converts it into a double[], which can then be used to train or pulse
our neural network.

Here we examine the provided instance's properties and, applying some
rules based on whether a property is populated and so on, populate the double[].
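A minimal sketch of this follows, assuming a simple rule of 1.0 for populated properties and 0.0 otherwise, with a special case for empty strings; the real NeuralInputGenerator applies richer rules.

    using System;
    using System.Reflection;

    // Sketch of the input-generation idea: reflect over the DataContext's
    // public properties and emit one double per property. The populated/empty
    // rule shown here is an assumption standing in for the real rules.
    public class NeuralInputGenerator
    {
        public double[] GenerateInput(object dataContext)
        {
            PropertyInfo[] properties = dataContext.GetType().GetProperties();
            double[] input = new double[properties.Length];

            for (int i = 0; i < properties.Length; i++)
            {
                object value = properties[i].GetValue(dataContext, null);
                string text = value as string;

                // Treat empty strings as unpopulated; any other non-null
                // value counts as populated.
                bool populated = value != null
                    && (text == null || text.Length > 0);

                input[i] = populated ? 1.0 : 0.0;
            }

            return input;
        }
    }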

The input generated by this method provides us with a fingerprint of our DataContext,
and indeed a discrete representation of the interface model. There is an opportunity
to refine the NeuralInputGenerator, to increase its recognition of known field types,
and even add child object analysis.

Persistence

The ADO.NET Entity Framework is used to access a table of prediction
data associated with a user and a control id. When Perceptor is attached to a host control,
it will attempt to retrieve existing prediction data for the user and the particular host control id.
It does this by first checking whether the host control has the Perceptor.PersistenceProvider
attached property assigned. If so, Perceptor will attempt to use the provider for persistence.
This extensibility point for
persisting prediction data can be utilised by implementing the IPersistPredictionData
interface.
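For illustration, wiring up a provider might look like the following; the SetPersistenceProvider accessor is assumed to follow standard WPF attached-property conventions, and WcfPredictionDataStore is a hypothetical IPersistPredictionData implementation.

    using System.Windows;

    // Hypothetical wiring: attach a custom provider to the host control so
    // that Perceptor uses it for persistence. SetPersistenceProvider is
    // assumed to follow WPF attached-property conventions.
    public static class PerceptorSetup
    {
        public static void Attach(FrameworkElement hostControl)
        {
            Perceptor.SetPersistenceProvider(
                hostControl, new WcfPredictionDataStore());
        }
    }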

When the window of the host control is closing, Perceptor will attempt to save its prediction data.
In the sample application we associate the prediction data with a user id. The following excerpt
from the sample demonstrates how this can be done.
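A sketch of the pattern, assuming a hypothetical Perceptor.SavePredictionData method and a simple user id field:

    using System;
    using System.ComponentModel;
    using System.Windows;

    // Sketch only: save the user's prediction data when the hosting window
    // closes. Perceptor.SavePredictionData and the userId field are assumed
    // stand-ins for the sample's actual members.
    public partial class MainWindow : Window
    {
        // In the sample, this identifies the current user.
        readonly Guid userId = Guid.NewGuid();

        public MainWindow()
        {
            InitializeComponent();
            Closing += OnClosing;
        }

        void OnClosing(object sender, CancelEventArgs e)
        {
            // Persist the serialized network and prediction data for this
            // user so that learning survives between sessions.
            Perceptor.SavePredictionData(this, userId);
        }
    }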

Sample overview

The download includes a sample application, which is meant to demonstrate how Perceptor can be used
to guide the user to input elements. It displays a list of employee names; selecting one
populates the Employee Details tab and Boss panel of the application.

Figure: Opening screen shot of sample application.

Each time a field is modified, causing a modification to the DataContext,
the ANN is pulsed, and a candidate input prediction is taken. If the prediction's
confidence level is above a predefined threshold, the user is presented with the option
to navigate directly to the predicted input control.
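A sketch of this pulse-and-threshold step follows, reusing the illustrative INeuralNetwork and NeuralInputGenerator types from the earlier sketches; the elementIds mapping, the offerNavigation callback, and the 0.7 threshold are likewise assumptions.

    using System;

    // Sketch of the pulse-and-threshold step: on each DataContext change,
    // pulse the network and offer navigation only when the strongest output
    // exceeds a confidence threshold.
    public class PredictionWatcher
    {
        const double ConfidenceThreshold = 0.7; // Assumed value.

        readonly INeuralNetwork network;
        readonly NeuralInputGenerator inputGenerator;
        readonly string[] elementIds;            // Output index -> element id.
        readonly Action<string> offerNavigation; // Presents the suggestion.

        public PredictionWatcher(INeuralNetwork network,
            NeuralInputGenerator inputGenerator,
            string[] elementIds,
            Action<string> offerNavigation)
        {
            this.network = network;
            this.inputGenerator = inputGenerator;
            this.elementIds = elementIds;
            this.offerNavigation = offerNavigation;
        }

        public void OnDataContextChanged(object dataContext)
        {
            double[] output = network.Pulse(
                inputGenerator.GenerateInput(dataContext));

            // Find the output neuron with the highest activation.
            int bestIndex = 0;
            for (int i = 1; i < output.Length; i++)
            {
                if (output[i] > output[bestIndex])
                {
                    bestIndex = i;
                }
            }

            // Only surface a suggestion when the network is confident enough.
            if (output[bestIndex] >= ConfidenceThreshold)
            {
                offerNavigation(elementIds[bestIndex]);
            }
        }
    }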

An overview of Perceptor's learning process is illustrated below.

Learning Phase

Figure: Learning Phase

Once Perceptor has acquired enough knowledge to make confident predictions,
it can be used to navigate to predicted elements.

Predictive Phase

Figure: Predictive Phase

A feature of Perceptor is automatic expansion when the predicted
element happens to reside in an Expander. This expansion occurs
as soon as a confident prediction is detected.
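A minimal sketch of the expansion step, using the standard VisualTreeHelper API to walk up from the predicted element; the surrounding method is illustrative.

    using System.Windows;
    using System.Windows.Controls;
    using System.Windows.Media;

    // Walk up the visual tree from the predicted element and expand any
    // containing Expander, so the element becomes visible.
    static class ExpansionHelper
    {
        public static void ExpandAncestors(DependencyObject predictedElement)
        {
            DependencyObject current = predictedElement;

            while (current != null)
            {
                Expander expander = current as Expander;
                if (expander != null && !expander.IsExpanded)
                {
                    expander.IsExpanded = true;
                }

                current = VisualTreeHelper.GetParent(current);
            }
        }
    }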

In the sample application we can see how a confident prediction of an element
is highlighted.

Figure: Perceptor guides the user to the next predicted element.

Shifting Control Focus

Deterministic focus shifting in WPF can be tricky. When we call Focus() on a UIElement
there is no guarantee that the element will gain focus; that is why the method returns true only if it succeeds.
In Perceptor we use the FocusForcer class to move focus within the user interface.
UIElement.Focus() returns false if any of IsEnabled, IsVisible,
or Focusable is false, and true if focus is shifted.
Yet when called on the same thread that is handling, for example, PreviewLostKeyboardFocus
of the currently focused element ϑ, the call will return false, as ϑ won't be ready to relinquish focus.
Thus we use our FocusForcer and an extension method to perform the change of focus in the background if required.
The following excerpt shows how FocusForcer attempts to focus the specified element.
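The following sketch captures the approach with illustrative member names: try to focus immediately, and if the call fails, queue a retry on the dispatcher so the focus change happens after the current handler completes.

    using System;
    using System.Windows;
    using System.Windows.Threading;

    // Sketch of the FocusForcer idea: if the immediate Focus() call fails
    // (for example because we are inside PreviewLostKeyboardFocus of the
    // currently focused element), retry in the background via the dispatcher.
    public static class FocusForcer
    {
        public static void Focus(UIElement element)
        {
            if (!element.Focus())
            {
                // Retry once the current input processing has finished.
                element.Dispatcher.BeginInvoke(
                    DispatcherPriority.Background,
                    new Action(() => element.Focus()));
            }
        }
    }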

The PersistenceProvider property is not necessary, but it exists so that we can customize how the user's
prediction data is saved between sessions. In the example download we use the window to transport the prediction data
to and from the ILearningUIService WCF service. As it is a hybrid smart client, Perceptor allows
the user to work offline if the service is unavailable, and will fall back on persisting the prediction data
to the user's local file system if the PersistenceProvider is unavailable or raises an exception.
The following excerpt shows the IPersistPredictionData interface.
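An assumed shape of the interface is shown below; the actual member signatures in the download may differ.

    using System;

    // Assumed shape of the IPersistPredictionData extensibility point:
    // a provider loads and saves a user's serialized prediction data,
    // keyed by user and host control id.
    public interface IPersistPredictionData
    {
        byte[] LoadPredictionData(Guid userId, string controlId);
        void SavePredictionData(Guid userId, string controlId, byte[] data);
    }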

NavigateForward is used to change focus to the next predicted UIElement.

NavigateBackward is used to return to the UIElement that previously had focus.
When NavigateForward is performed, the element that currently
has focus is placed on a stack.

ResetLearning is used to recreate the neural network, so that previous learning is forgotten.
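As a quick illustration, these members might be used as follows; how a Perceptor instance is obtained from its host control is an assumption.

    // Hypothetical usage of the navigation members described above.
    var perceptor = Perceptor.GetPerceptor(hostControl);

    perceptor.NavigateForward();  // Jump to the next predicted element.
    perceptor.NavigateBackward(); // Pop the stack; restore previous focus.
    perceptor.ResetLearning();    // Recreate the network; forget all learning.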

Service Channel Management

In order to manage channels efficiently I have implemented a class called ChannelManagerSingleton.
In a previous article
I wrote a little about the Silverlight incarnation, so I won't restate things here. I will, however, mention that since then
I have produced a WPF version (included in the download) with support for duplex services.
Duplex services are cached using the callback instance and service type combination as a unique key.
In this way, we are still able to have centralised management of services, even though a callback instance is involved.
The following excerpt shows the GetDuplexChannel method in full,
and how duplex channels are created and cached.
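The actual excerpt is in the download; below is a plausible reconstruction of the caching scheme, with the cache key and creation logic as assumptions.

    using System;
    using System.Collections.Generic;
    using System.ServiceModel;

    // Sketch of duplex channel caching keyed on the callback instance plus
    // the service contract type, as described above. Not the actual
    // GetDuplexChannel excerpt; key construction and creation logic are a
    // plausible reconstruction.
    public class ChannelManagerSingleton
    {
        public static readonly ChannelManagerSingleton Instance
            = new ChannelManagerSingleton();

        readonly Dictionary<string, object> duplexChannels
            = new Dictionary<string, object>();
        readonly object channelsLock = new object();

        ChannelManagerSingleton() { }

        public TChannel GetDuplexChannel<TChannel>(object callback)
        {
            // Combine the callback's identity with the contract type so the
            // same callback instance always receives the same cached channel.
            string key = typeof(TChannel).FullName + "_" + callback.GetHashCode();

            lock (channelsLock)
            {
                object channel;
                if (!duplexChannels.TryGetValue(key, out channel))
                {
                    var factory = new DuplexChannelFactory<TChannel>(
                        new InstanceContext(callback),
                        "*"); // Assumes a matching endpoint in app.config.
                    channel = factory.CreateChannel();
                    duplexChannels[key] = channel;
                }

                return (TChannel)channel;
            }
        }
    }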

Unit Testing WPF with White

Black-box testing can complement your existing unit tests. One advantage of black-box testing that I quite like
is that we are testing functionality within a real running environment, and interdependencies are also tested.
Another advantage is that tests remain independent of any implementation. For example, during the development
of Perceptor I changed much of the implementation, yet I was able to leave my black-box tests alone.
In the past I have used NUnitForms for black-box testing.
This was my first foray into black-box testing in WPF, and I needed to find another tool because NUnitForms
doesn't support WPF. So I decided to give the White project a try.
White uses UIAutomation,
so it can be used with both Windows Forms and WPF applications.

Getting started with White merely involves referencing the White assemblies
and starting an instance of our application in a unit test, as the following excerpt shows.
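Something along these lines, assuming the White.Core namespaces of the release current at the time; the executable path, window title, and the "blackbox" argument are placeholders for the sample's actual values.

    using System.Diagnostics;
    using NUnit.Framework;
    using White.Core;
    using White.Core.Factory;
    using White.Core.UIItems.WindowItems;

    [TestFixture]
    public class PerceptorTests
    {
        Application application;
        Window window;

        [SetUp]
        public void Start()
        {
            // The extra argument tells Perceptor it is being black-box
            // tested, so it won't try to reach the WCF service.
            var startInfo = new ProcessStartInfo(
                "PerceptorSample.exe", "blackbox");
            application = Application.Launch(startInfo);

            window = application.GetWindow(
                "Perceptor Sample", InitializeOption.NoCache);
        }

        [TearDown]
        public void Stop()
        {
            application.Kill();
        }
    }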

In order to have Perceptor not attempt to use the WCF service during the test,
we use a command-line argument to let it know that it is being black-box tested. Once we start
the application we use White to get a testable representation of the application.

The test method uses the window instance to locate and manipulate UIElements.
Among other things, we are able to set textbox values, click buttons, and switch tabs.
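For example, a test body might manipulate the interface like this; the control names are placeholders for the sample's automation IDs.

    using White.Core.UIItems;
    using White.Core.UIItems.TabItems;
    using White.Core.UIItems.WindowItems;

    // Sketch of a test body manipulating UIElements through White.
    public static class EmployeeTestSteps
    {
        public static void EditEmployee(Window window)
        {
            // Set a textbox value.
            var firstName = window.Get<TextBox>("firstNameTextBox");
            firstName.Text = "Alice";

            // Click a button.
            window.Get<Button>("saveButton").Click();

            // Switch to the Employee Details tab.
            var tab = window.Get<Tab>("mainTab");
            tab.SelectTabPage("Employee Details");
        }
    }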
It appears that some elements are not yet supported, such as the Expander
control. I was using the rather old release version; others may be better off acquiring and building
the source via a Subversion client.

Another nicety of black-box testing is that we don't need to worry about creating mocks.
There are, of course, disadvantages to black-box testing compared to traditional white-box testing.
But there's no reason why we can't use both!

Figure: Test results for unit tests.

Possible Applications

A version of Perceptor could be used in Visual Studio to present the appropriate tool window when a particular designer,
with a particular state, is selected.
Perceptor could prove especially useful in areas such as mobile phone interfaces, where the user's ability
to interact with the interface is inhibited by limited physical input controls.
Likewise, people with certain disabilities, who have a limited capacity to manipulate the user interface, may also benefit.

Perhaps this kind of predictive UI technology could be classified
as a fifth-generation user interface technology (5GUI).
This suggestion is based on the way in which programming language classification,
in particular 5GL, is defined.
The following is an excerpt from the Wikipedia entry.

While fourth-generation
programming languages are designed to build specific programs,
fifth-generation languages are designed to make the computer solve a given problem without the programmer.
This way, the programmer only needs to worry about what problems need to be solved and what conditions
need to be met, without worrying about how to implement a routine or algorithm to solve them.

Over time, Perceptor learns how the user interface should behave, removing the need for programmer intervention.
Thus the classification 5GUI.

Conclusion

In this article we have seen how Perceptor tracks a user's behaviour while he or she
interacts with the user interface, and induces the training of a neural network.
We also saw how Perceptor is able to save its prediction data, either locally or remotely.
Knowledge acquired by the neural network is used to predict the user's navigation behaviour.
This allows for a dynamic and evolving interface not encumbered by rigid, predefined business rules.

Through the application of AI to user interfaces we have a tremendous opportunity to increase
the usability of our software. The burden of hardwiring behaviour directly into our user interfaces can be reduced,
and rules can be dynamic and refined over time.
By combining this adaptability with the visual appeal and richness afforded to us by technologies such as WPF,
we are able to move beyond merely reactive UIs and provide a new level of user experience.

I hope you find this project useful. If so, then I'd appreciate it if you would rate it and/or leave feedback below.
This will help me to make my next article better.

Future Enhancements

Modify the neural network to use Long Short Term Memory
or an alternative progressive recurrent learning strategy.


About the Author

Daniel Vaughan is a Microsoft MVP and co-founder of Outcoder, a Swiss software and consulting company dedicated to creating best-of-breed user experiences and leading-edge back-end solutions, using the Microsoft stack of technologies--in particular WPF, UWP, and the Xamarin tools.

Daniel is the author of Windows Phone 8 Unleashed and Windows Phone 7.5 Unleashed, both published by SAMS.

Daniel is the developer behind several acclaimed mobile apps including Surfy Browser for Android and Windows Phone. Daniel is the creator of a number of popular open-source projects, most notably the Calcium MVVM Toolkit.


This is one of the most interesting and thought-provoking articles I have read in a long time. Thank you!

I was interested to see how you encoded the DataContext into a form which the NN could 'digest'. I was wondering if it could be made even more effective by encoding the contents of each DataContext property, rather than just a flag to indicate whether it is null or not. For example, an age could be transformed into a double; this might influence whether someone working with a data-entry app moves to a section for 'children', for example.

One thing that struck me though is what happens if the application is used in an 'update' rather than 'entry' mode. The usage pattern is likely to be quite different and less linear. Perhaps the mode could be an input to the NN as well?

However, what if the application is more complex, Visual Studio does not have data-entry and update modes. The contextual information that I personally think is vital would be very hard to capture and encode. You can't just throw everything at the NN, you need to capture the context and extract its key features to train the NN effectively.

Do you have any thoughts on this? Do you really think it could be used within Visual Studio (or Word, Excel etc ...)?

Colin Eberhardt wrote:

I was interested to see how you encoded the DataContext into a form which the NN could 'digest'. I was wondering if it could be made even more effective by encoding the contents of each DataContext property, rather than just a flag to indicate whether it is null or not. For example, an age could be transformed into a double; this might influence whether someone working with a data-entry app moves to a section for 'children', for example.

There is a class called NeuralInputGenerator. It is this that translates the data context, or any object, into something that is usable by the neural network. This would be the place to add an extensibility point, and indeed to increase the level of detail captured. We could in fact perform deep analysis of an instance; drill down into child objects etc. An attribute based mechanism could also be a nice enhancement.

Colin Eberhardt wrote:

One thing that struck me though is what happens if the application is used in an 'update' rather than 'entry' mode. The usage pattern is likely to be quite different and less linear. Perhaps the mode could be an input to the NN as well?

Yes, you’re right, and there’s certainly no reason why we can’t feed that information into the network.

Colin Eberhardt wrote:

However, what if the application is more complex, Visual Studio does not have data-entry and update modes. The contextual information that I personally think is vital would be very hard to capture and encode. You can't just throw everything at the NN, you need to capture the context and extract its key features to train the NN effectively.

Perhaps this is related to your first point, and we might employ a specialized NeuralInputGenerator to extract a more effective representation of the view. Since Visual Studio 2010 will employ WPF for the shell, we can imagine that it will perhaps afford us new opportunities to ‘hook in’ a Perceptor-like system. I can imagine a system similar to the one demonstrated in the download revealing auto-hidden tool views, such as the Toolbox, depending on the state of the designer. As designers become increasingly less ‘linear’, with drill-down and nesting of designer items, and linking of complex instances in the workflow designer, a system such as this might enable new designers that were hitherto overly complex to support within the IDE.

I wrote a short theoretical paper three months ago called "Personalization Intelligence" about creating a global AI-based engine that can personalize the entire computer for different users, based on neural networks. I'm an undergraduate student, studying AI only this semester, so mine did not have any practical implementation or any clear working principles. I read through your article now and can understand a bit, but was amazed by how our ideas correlate. My idea is on a bigger scale, although I never really set about learning neural networks to implement it. After reading this, I think I might start working on it. Thanks

Hello, maybe we can do it in Windows too. I worked on an automation testing project for a year. I wrote an application that can hook all Windows events and MSAA events, including some interesting ActiveX controls, reading memory spaces, and it works like Event32.dll, so I have the knowledge. It would be a real advance for accessibility tools. What do you think? This could help a lot of people with disabilities!

Yes, it could be interesting. The key, however, is to be able to infer what the user intends to do based on changes in data rather than only UI changes. The reason for this is that we are able to capture more of the meaning of an action. Using Windows hooks etc. may not get us there. Don’t get me wrong, I’m certain something could be achieved though. But fortunately now, WPF gives us the opportunity to accomplish this because of the ubiquity of data binding, and the ease with which we are able to ‘monitor’ it.

Quite an interesting article, Daniel. Along with standard focus switching, text auto-completion could be added. Your existing solution for focus is great, but it would've been even better if you could incorporate it as a non-intrusive feature. Examples include using Tab for the focus switch (instead of a button press), and, to roll back to the original control, having the user press Ctrl+Space or perform some other minimally tedious action.

I deliberately chose not to use Tab. The reason, and I may be wrong, is that I believe users expect Tab to behave in the predictable manner to which they’re accustomed.

The autocomplete text could be an extended feature. It might be better implemented though as a template in XAML and applied using a style. That way it would be more customizable, and could be readily disabled for each control. As Simon Stevens pointed out in a previous message, autocomplete text can be not what the user wants in some situations. Of course we could use an attached property for that, but again I believe a template would be the way to go with that one.

Thanks for your comment and suggestions, and I’m glad you liked the article.

I remember my college days when, for the final year project, two of my friends came up with a project that predicts the behaviour of a user and detects whether that user is genuine or not.

That is, they had a set of 20-30 behaviours that their Windows service tracked for a few days for each user who logged in.

> How do you open the Start menu - by mouse click, Ctrl+Esc, or the Windows key?
> Do you right-click on the desktop and click Refresh many times?
> How do you open a folder - by double-clicking, or using the keyboard and the Enter key?
> How many shortcuts do you use for general operations?
> Do you make typing mistakes while typing your name?
> Etc...

So if you log in using someone else’s password, the system will log you out with a message that you are not that user. (So if somebody steals your password, don’t worry.)

I recall reading an article a few years ago about authenticating users based on keystroke latency. I guess one obstacle is that the way people use the computer changes. For example, if someone breaks an arm, then they may stop using the mouse and use the keyboard instead.

It is a nice idea though.

Thank you kindly for your message, and I’m happy you liked the article.

Thanks for another great article Daniel. I enjoyed stepping through your code and seeing how you bring it all together. It would be interesting to see how it would work in a real world implementation. Keep up the writing.

One thing I would raise though is this; Personally, I hate "fluid" interfaces. I'll give you two examples.

1) The XP style Start menu, with most recently used programs appearing at the top level. Before XP, I used to use keyboard shortcuts to reach Start menu items. I would hit "Start"->"p"->"f" to open Firefox, for example. XP breaks this because you can no longer predict what the key presses will do, as you may have something in the top-level MRU items list that begins with "p", so your key press won't go to the "Programs" item. You can no longer quickly and predictably navigate the XP Start menu with keys without pausing to check your press of the "p" key has had the desired effect. You have to resort to arrow keys, which adds several key presses. (As a result I used to switch my Start menu back to the classic style.)

2) For exactly the same reason, I dislike the "hide not recently used items" feature of the Word 2002/2003 menu bars. It breaks some repeatable key press patterns by moving things around depending on when they were last used.

Using NNs to predict movement and do things for you is great for some people, but I think it could frequently break an advanced user's expectations, and slow them down by forcing them to always check how the NN has reacted to their actions before moving on. Experienced users work by knowing in advance how the system will react to their actions, so they don't have to pause after each action to check they get the response they expected. A beginner typist, for example, will search for a key, press it, look up to check the response was as expected, then proceed to the next key. An advanced user will type without checking the response, because they just know it will respond correctly. Have you ever noticed that if you are typing and the computer locks up or hangs for a few seconds, you will naturally finish off the word before stopping and waiting for the computer to catch up? This is because you are an advanced user and you are naturally expecting the computer to perform its actions predictably, like it always has done. Imagine how much it would slow you down if Word auto-completed sentences for you sometimes. If you don't believe me, try writing in OpenOffice. It auto-suggests word completions. It sounds very useful, but in reality I never use them because by the time I have consciously realised that it's suggested a word, I've already finished typing it.

There are ways that this could be done without getting in the way of the user, though. (By providing an always-fixed and predictable shortcut key for "Start"->"Programs", for example.)

Anyway, it's not a criticism at all, just something to think about. I know you're just proposing ideas, and I think the article's great, very thought provoking, and certainly something I suspect will slowly start to happen in the future. Sorry, I've gone on a bit; it just got me thinking.

I see Perceptor's role, or that of a system like Perceptor, as more like Google's "Did you mean" suggestion list, where the system doesn't automatically do things for us, but provides us with suggestions and the capability to enact them. Also, refinement of the learning mechanism might serve to improve the experience, so much so that the user would get 'a feeling' for how the system learns, and the learning would truly reflect the behaviour of the user.

Auto-complete for a word processor may turn out to be less useful than in, say, a code editor like Visual Studio, where the user doesn't have every API memorized. Where there is an absence of domain knowledge, it is useful. If the user has a decent grasp of the domain, such as with the English language, then it could be annoying. Perceptor is aimed at those cases where the user may find it time consuming to determine what needs to be done next.

I have to agree. While I found the article excellent, the applicability to high volume data entry, which was mentioned in the background as an impetus, seems like a poor fit. So much of data entry is "heads down", wherein the entrant knows from experience how many times to hit the tab or enter key to get to a particular field. They rarely need to look at the screen if they are good. If they have to pause to consider how the system has adapted to their usage pattern, then their data entry rate will go way down.

It is an excellent article nevertheless. I'm sure there are situations for which this would be a nice fit. I like the notion of things being semi-automatic, as with intellisense, in which the system indicates or suggests what it thinks you might like to do and then makes the option easily accessible, without otherwise getting in the way.

I agree, we wouldn’t want to interfere with the input of data. Indeed, the scenario I was referring to in the article dealt with partially completed data, and was a case of guiding the user to the most applicable ‘section’ of a form, where there was incompleteness.

This is an awesome article Daniel. I do not know how practical it is, but I imagine this is going to get A LOT of interest.

I have just examined and run the code. It works well; it got it right after two property changes, which is very impressive. But I guess the network is only trying to find one output and is only taking a few inputs (I guess as many properties as the DataContext has, in the GenerateInput() method).

I mean underneath the surface this is a standard NN but you have applied it very well and in an interesting way.

I would be curious to see how well it does on a large DataContext object with many properties. Did you try that?

Still, it's all great. Have a 5.

PS: Thanks for the link to my NN articles.

Sacha Barber
