Getting to Know Machine Learning

Machine learning is critical to organizing and making use of the ever-increasing amount of data being generated every day about user preferences and behaviors.

Article No. 773 | December 14, 2011 | by Huyen Tue Dao

Artificial intelligence (AI) is usually relegated to the domain of science fiction, video games, abstract academic work, and the occasional chat bot. However, AI isn’t just about robots or chess games; it’s a tool. AI systems already exist in our devices and websites, and some even have the ability to learn. The idea of machine learning may still seem futuristic, but it is a relevant and important tool for all sorts of technology applications, including UIs and user experiences.

Machine learning is a branch of the AI field that focuses on the creation of algorithms that evolve and change based on experience. To a machine, experience is data; this could be pre-collected data handed to the algorithm (such as a customer’s purchase history) or it could be real-time data such as a robot’s sensor information. A machine learning algorithm processes this data, draws conclusions, and makes inferences, all towards the goal of becoming better at what it does.
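To make "experience is data" concrete, here is a minimal Python sketch that is not drawn from any particular product. A one-weight model predicts an outcome from an input and adjusts its weight slightly after each observed example (stochastic gradient descent on squared error). The data stream, the learning rate, and the function names update and train are hypothetical.

```python
# A toy "learning from experience" model: predict y = w * x, then nudge w
# after each observed (x, y) pair. Data and learning rate are hypothetical.

def update(w, x, y, lr=0.1):
    """One stochastic-gradient step on squared error for a single example."""
    error = y - w * x          # how wrong the current prediction is
    return w + lr * error * x  # move w in the direction that shrinks the error


def train(stream, passes=1, w=0.0, lr=0.1):
    """Consume the stream of experience `passes` times and return the learned weight."""
    for _ in range(passes):
        for x, y in stream:
            w = update(w, x, y, lr)
    return w


# Hypothetical experience: each outcome happens to be twice the input.
experience = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

for passes in (1, 2, 20):
    print(f"after {passes:2d} pass(es): w = {train(experience, passes=passes):.6f}")
```

Each additional pass over the same experience moves w closer to 2, which is what "becoming better at what it does" means for this toy model.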

Many popular devices, applications, and websites already use machine learning in one form or another. A common form is adaptive content: content tailored to each user's needs and habits. Adaptive content is ubiquitous on the Web: recommendations for similar products on e-commerce sites like Amazon, friend suggestions on Facebook, and dynamic ad-serving by Google AdSense are all examples.

Recommendation systems use data mining to make inferences about a user’s interests and to filter out irrelevant content. Filtering is based on both explicit feedback from the user (ratings and reviews, wish lists, etc.) and implicit feedback from user interactions and behaviors (purchase history, time spent viewing certain items, etc.). Collaborative filtering takes this a step further by grouping users who have similar interests and pooling their explicit and implicit feedback to make recommendations. Collaborative filtering is a major component of Amazon’s recommendation system. In fact, the Kindle Fire product page touts the Amazon Silk browser’s use of Amazon’s collaborative filtering and machine learning arsenal as one of its strengths.
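The core idea of user-based collaborative filtering fits in a few lines of Python. This is a minimal sketch assuming a tiny, hypothetical set of explicit star ratings; it is not Amazon’s system. It measures how alike two users are (cosine similarity over the items both have rated) and predicts a user’s rating for an unseen item as a similarity-weighted average of other users’ ratings. The names ratings, similarity, predict, and recommend are illustrative.

```python
import math

# Hypothetical explicit ratings (1-5 stars); users and items are made up.
ratings = {
    "ann": {"book_a": 5, "book_b": 4, "book_c": 1},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 2, "book_d": 5, "book_e": 1},
    "cat": {"book_a": 1, "book_b": 2, "book_c": 5, "book_d": 1, "book_e": 5},
}


def similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for item, or None."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = similarity(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


def recommend(ratings, user):
    """Items the user has not rated yet, highest predicted rating first."""
    candidates = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = [(item, predict(ratings, user, item)) for item in candidates]
    scored = [(item, s) for item, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(recommend(ratings, "ann"))
```

Because ann’s ratings resemble bob’s far more than cat’s, bob’s enthusiasm for book_d carries more weight and book_d is ranked first. Real systems typically use mean-centered or Pearson similarity, implicit-feedback signals, and much larger neighborhoods; raw cosine on star ratings is a simplification.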

Another well-known and widely used recommendation system is Netflix’s. Netflix tracks what its users watch and tries not only to suggest titles but also to predict how each user will rate those suggestions. To improve the machine learning algorithms behind these predictions, Netflix ran the Netflix Prize, an open competition. Netflix provided a sample data set consisting of users, movies, ratings, and the dates of the ratings. The $1 million prize would go to whoever developed the system that most accurately predicted how the same users rated movies in a separate, qualifying set. The winning team, BellKor’s Pragmatic Chaos, incorporated a novel type of implicit feedback they called “frequency.” The team noticed that when users rated many movies on the same day, they were usually rating older movies. Furthermore, users rated movies they had seen long ago differently from ones they had seen recently. These observations helped BellKor improve the accuracy of their winning algorithm by making it better at discovering and describing patterns in a user’s ratings.
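Netflix Prize entries were scored by root-mean-square error (RMSE) between predicted and actual ratings on held-out data. The Python sketch below shows that metric alongside a simple baseline predictor (a global mean plus a per-user and a per-movie bias), a common starting point for rating prediction. It is not BellKor’s algorithm and does not model the “frequency” effect; the users, movies, and ratings are hypothetical.

```python
import math
from collections import defaultdict


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def mean(values):
    return sum(values) / len(values)


def fit_baseline(ratings):
    """Learn a global mean mu plus per-movie and per-user biases from (user, movie, rating) triples."""
    mu = mean([r for _, _, r in ratings])

    # Movie bias: how far each movie's ratings sit above or below the global mean.
    by_movie = defaultdict(list)
    for _, movie, r in ratings:
        by_movie[movie].append(r - mu)
    b_movie = {m: mean(devs) for m, devs in by_movie.items()}

    # User bias: what remains of each user's ratings once the movie bias is removed.
    by_user = defaultdict(list)
    for user, movie, r in ratings:
        by_user[user].append(r - mu - b_movie[movie])
    b_user = {u: mean(devs) for u, devs in by_user.items()}

    def predict(user, movie):
        # Unknown users or movies fall back to a bias of zero.
        return mu + b_user.get(user, 0.0) + b_movie.get(movie, 0.0)

    return mu, predict


# Hypothetical (user, movie, rating) triples.
train_set = [("u1", "m1", 5), ("u1", "m2", 4),
             ("u2", "m1", 4), ("u2", "m2", 3), ("u2", "m3", 2),
             ("u3", "m2", 2), ("u3", "m3", 1)]
test_set = [("u1", "m3", 2), ("u3", "m1", 4)]

mu, predict = fit_baseline(train_set)
print("global-mean RMSE:", rmse([(mu, r) for _, _, r in test_set]))
print("baseline RMSE:   ", rmse([(predict(u, m), r) for u, m, r in test_set]))
```

On this toy data, accounting for the facts that u1 rates generously, u3 rates harshly, and m1 is rated above average cuts the held-out RMSE from 1.0 to 0.25.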

The Netflix Prize is also a good example of supervised learning. In supervised learning, an algorithm is given a training data set containing inputs and their expected outputs, and it infers the relationship between the two. In contrast, unsupervised learning receives no expected outputs; it relies on pattern-finding to infer the structure of the data on its own.
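The difference between the two kinds of learning can be shown with two small Python functions, both minimal sketches on made-up numbers. The supervised one, fit_line, is given inputs paired with expected outputs and learns the line relating them by least squares. The unsupervised one, kmeans_1d, is given only unlabeled values and finds groups in them with k-means clustering, starting from initial centroids supplied by the caller. The function names and data are illustrative.

```python
def fit_line(xs, ys):
    """Supervised: learn slope and intercept from labeled (input, output) pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx


def kmeans_1d(points, centroids, max_iter=100):
    """Unsupervised: group unlabeled numbers around len(centroids) centers."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest current centroid.
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
centers = kmeans_1d([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], centroids=[1.0, 9.0])
print(slope, intercept, centers)
```

fit_line recovers a slope of 2 and an intercept of 1 because it was shown the right answers; kmeans_1d was told only how many groups to look for and found centers at 1.5 and 8.5 on its own.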

Adaptive navigation is another application of machine learning: it changes the user’s paths to information or tasks to make interactions more efficient. Adaptive navigation, though, has had a more troubled adoption than adaptive content. One unsuccessful example is the adaptive menus in Microsoft Word 2000, which hid infrequently used items and pushed frequently used items to the top of menus. While frequently used items became quicker to reach, infrequently used items could be hard to find if users did not know where to look.
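The general idea of an adaptive menu can be sketched as a menu that ranks its items by how often they have been used and shows only the top few. This is a hypothetical Python illustration, not Microsoft’s actual implementation; the class name, item names, and visible_count parameter are assumptions.

```python
from collections import Counter


class AdaptiveMenu:
    """Hypothetical adaptive menu that shows only the most frequently used items."""

    def __init__(self, items, visible_count):
        self.items = list(items)
        self.visible_count = visible_count
        self.uses = Counter()

    def record_use(self, item):
        self.uses[item] += 1

    def _ranked(self):
        # Most-used first; ties keep the original menu order.
        return sorted(self.items, key=lambda it: (-self.uses[it], self.items.index(it)))

    def visible(self):
        return self._ranked()[: self.visible_count]

    def hidden(self):
        # Items the user must expand the menu to find, in original order.
        shown = self.visible()
        return [it for it in self.items if it not in shown]


menu = AdaptiveMenu(["New", "Open", "Save", "Print", "Mail Merge"], visible_count=3)
for item in ["Save", "Open", "Save", "Print", "Open", "Save"]:
    menu.record_use(item)

print("visible:", menu.visible())
print("hidden: ", menu.hidden())
```

The sketch also reproduces the usability problem: New and Mail Merge are now hidden, so a user who rarely needs them has to remember to expand the menu to find them.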

Search engines have become an integral part of the web experience, and how “good” a search engine is depends greatly on how effectively it ranks results. As search engines have progressed, so has search engine optimization (SEO). Search engine rankings have been skewed by “content farms,” sites with low-quality content that use SEO techniques to gain high rankings. In 2011 Google introduced a set of changes to its search algorithm to help separate low-quality from high-quality content. These changes were nicknamed “Panda” after Google engineer Navneet Panda. The details are secret, but information from interviews suggests that Panda depends on training data gathered from human testers who were asked qualitative questions about the quality of site content. The challenge for Google engineers was to build a model that captured the cues and intuition human testers use to distinguish high-quality from low-quality content. SEO experts speculate that Panda also learns from user behavior and usage metrics, ranking sites not just on whether they have good content but also on whether they provide a good experience.
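Google has not published how Panda works, so the following is only a generic illustration of the underlying idea of learning a quality classifier from human judgments. It trains a perceptron, a simple linear classifier, on pages that hypothetical testers labeled as high quality (+1) or low quality (-1). The two features, ad density and share of original text, are invented for the example and are not known Panda signals.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights w and bias b so that sign(w . x + b) matches the human labels."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
                mistakes += 1
        if mistakes == 0:  # every labeled page is now classified correctly
            break
    return w, b


def classify(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


# Hypothetical tester judgments: ((ad_density, original_text_share), label).
labeled_pages = [
    ((0.1, 0.9), 1),
    ((0.2, 0.8), 1),
    ((0.8, 0.2), -1),
    ((0.9, 0.1), -1),
]

w, b = train_perceptron(labeled_pages)
print(classify(w, b, (0.15, 0.85)), classify(w, b, (0.7, 0.3)))
```

The model never sees a rule such as "too many ads means low quality"; it infers a boundary from the testers' labels, which is the sense in which a model can reflect human intuition.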

Another growing area for both user experience and machine learning is natural language processing (NLP), which is human–computer interaction via natural written or spoken language. The applications of NLP include speech recognition, voice command, document search and retrieval, and translation. Historically, the approach to NLP was to manually code language knowledge and rules into an algorithm. This approach has limitations: creation of the rule system is time-consuming and complex, and the approach is not always robust when dealing with novel input.
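The hand-coded approach can be illustrated with a hypothetical voice-command parser in Python: every supported phrasing needs its own rule, and anything the rule author did not anticipate fails. The rules and intent names below are made up for illustration.

```python
import re

# Hand-written rules: each pattern maps exactly one phrasing to one intent.
RULES = [
    (re.compile(r"^call (?P<name>\w+)$"), "call"),
    (re.compile(r"^set an alarm for (?P<time>\d{1,2}(?::\d{2})?\s?(?:am|pm))$"), "alarm"),
]


def parse_command(utterance):
    """Return (intent, slots) for a recognized command, or None if no rule matches."""
    text = utterance.lower().strip()
    for pattern, intent in RULES:
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None  # no rule covers this phrasing


for said in ["Call Mom", "Set an alarm for 7 am", "Ring Mom", "Wake me up at 7 am"]:
    print(said, "->", parse_command(said))
```

Supporting "ring" or "wake me up" means writing yet more rules, which is the maintenance burden that motivates learning these mappings from data instead.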

Machine learning has greatly improved NLP by automating tasks such as building dictionaries, inferring grammar, parsing syntax, analyzing sentiment and emotion, translating from one language to another (machine translation), and understanding and answering questions, among myriad others. Siri, the personal assistant for iOS, is a prominent example of NLP. It is widely speculated that Siri’s voice recognition is powered by NLP technology from Nuance Communications. Apple’s FAQ on Siri states that Siri learns a user’s accent and voice characteristics over time and that it will continue to improve overall as its user base grows. Siri, and voice interfaces in general, have their critics, but thanks to advances in NLP, voice interfaces may one day become as ubiquitous as touch.
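Sentiment analysis, one of the tasks listed above, shows how learning from examples can replace hand-written rules. The sketch below is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing, a classic text-classification technique, that learns which words signal positive or negative sentiment from a handful of labeled sentences. The training sentences and the function names train_nb and classify_nb are hypothetical, and this is not how Siri or any specific product works.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled_texts):
    """Count documents per label and words per label from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocab


def classify_nb(model, text):
    """Return the label with the highest log-probability for text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps a word unseen in this class from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb([
    ("great movie loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring story awful acting", "neg"),
])
print(classify_nb(model, "loved the great acting"), classify_nb(model, "awful boring movie"))
```

No one wrote a rule saying "loved" is positive; the model inferred it from the labeled examples, and more data would refine those estimates.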

As users and consumers, we continue to demand stronger, better, and faster technology. As part of the effort to meet these desires, machine learning has quickly pervaded technology. Some specific applications of machine learning may take off while others fizzle, but we have already grown accustomed to, and even expect, effective recommendation systems, personalized content, and accurate search. With vast amounts of user data available for algorithms to consume, the ever-increasing computing power of our devices, and the persistent demands of consumers, machine learning will become an everyday part of our technology experiences.