Teen’s iOS App Uses Complex Algorithms to Summarize the Web

Image: Ariel Zambelich/Wired.com

Nick D’Aloisio, a 16-year-old iOS developer based in London, England, sounds composed and confident on the phone. He refers to his company, Summly Limited, with a professional-sounding “we” — this despite the fact that he’s basically running a one-man operation.

While he exhibits surprising maturity for a teenager, an audible excitement in his voice betrays his youth, and suggests he’s not some jaded Silicon Valley serial entrepreneur.

D’Aloisio has just released his newest product, Summly. The app uses advanced algorithms to summarize web content into manageable bullet points and keyword listings, which you can then share.

“We don’t take a normal approach to summarization,” he says during our conversation. By “normal,” D’Aloisio is referring to the keyword-based summarization that is commonly used in other products. For example, when you google the phrase “keyword summarization,” you get more than 262 million results.

Summly uses a more abstract method, starting with a special algorithm that extracts text from a web page using HTML processing. The app analyzes the text and regurgitates selected, condensed portions of the article as bullet points. The Summly algorithm accomplishes this using a number of machine learning techniques and “genetic” algorithms, search heuristics that mimic evolution.
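To make the HTML-processing step concrete, here is a minimal sketch of pulling readable text out of a page with Python's standard library. It is not Summly's extraction algorithm, which the article doesn't detail; the names `TextExtractor` and `extract_text` are made up for illustration, and the only assumption is that script and style contents should be ignored.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of script/style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag we want to ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A real extractor would also need to separate the article body from navigation, ads, and comments, which this sketch does not attempt.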

D’Aloisio developed his final algorithm by initially employing a training algorithm: His method looked at human-authored summaries of articles of various types and from various publications. It then used these summaries as models for what Summly should be spitting out, and how it should change its own metrics to better emulate the work of flesh-and-blood information curators.
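The general idea of evolving a scoring function toward human-written summaries can be sketched with a toy genetic algorithm. Everything specific here is an assumption made for illustration: the three sentence features, the tiny `TRAINING` set, the fitness measure, and the function names. The article does not say which features, metrics, or genetic operators Summly uses.

```python
import random

# Hypothetical training data: each article is a list of per-sentence feature
# vectors plus the indices of the sentences a human summarizer picked.
# Assumed features: [position score, title-word overlap, length score].
TRAINING = [
    ([[1.0, 0.2, 0.5], [0.5, 0.9, 0.6], [0.3, 0.1, 0.9], [0.1, 0.8, 0.4]], {1, 3}),
    ([[1.0, 0.7, 0.3], [0.6, 0.1, 0.8], [0.2, 0.9, 0.5], [0.1, 0.2, 0.2]], {0, 2}),
]


def pick_top(weights, sentences, k):
    """Indices of the k sentences with the highest weighted feature score."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def fitness(weights):
    """Count how many human-picked sentences the weights also pick."""
    return sum(
        len(pick_top(weights, sents, len(human)) & human)
        for sents, human in TRAINING
    )


def evolve(generations=30, pop_size=20, seed=0):
    """Evolve a weight vector by selection, uniform crossover, and mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [w + rng.gauss(0, 0.1) for w in child]    # small mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Calling `evolve()` returns the best weight vector found; because the fitter half of each generation is carried over unchanged, the best score in the population never gets worse from one generation to the next.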

Summly also looks at the topics a website covers, so individual pieces of content can be classified as relating to business, tech, sports, and so on. This helps the algorithm more accurately consolidate text.

D’Aloisio believes long lists of hyperlinks that take you straight to content-filled websites were great for Google in the early days of the web, but things have changed. “Hyperlinks aren’t effective anymore. It’s information overload,” he says. He found this particularly true of hyperlinks when he started using the Twitter app six months ago.

“I was trying to evaluate URLs, and found I was clicking in and out a lot, and the data connection was slow,” D’Aloisio says. “I thought there should be a service that lets you assess a website’s content quickly and easily.” And so, the idea for Summly was born.

The Summly app can be used to summarize search content or specific web pages. Image: Christina Bonnington/Wired.com

Of course, Summly has other benefits besides just streamlining how you access web content on your phone. I liken the concept of Summly to CliffsNotes, but for the web. And, indeed, D’Aloisio sees his tool becoming highly useful for kids working on homework, as well as for general web searching.

“I think, fundamentally, there is a real need for this on a mobile device, when you’re short for time,” D’Aloisio says.

When you search for a topic using the app, it compiles results from different search engines, so you’ll notice it doesn’t deliver the same results as a Google search, or even a Bing search. You’ll also notice that typical results like Wikipedia articles and dictionary definitions don’t show up in the listing; the search function generally appears to be limited to actual news articles relating to the subject you type in. However, you can also type in a URL if you have a specific text-heavy web page you’d like summarized.

D’Aloisio says that Summly works best with well-formulated articles that conform to a consistent structure. This lets the algorithm learn what’s important — and where to find that important information — more easily. Tech articles and news articles tend to marry well with Summly’s algorithm, as does the consistently organized content from the New York Times and the BBC. The app doesn’t do quite as well with narrative text written in the third person, but D’Aloisio says that there are no areas that are seriously troublesome to his algorithm.

In fact, because Summly is language-independent, language isn’t a barrier to its functionality. It’s currently optimized for 12 different languages (primarily Latin-based), but will soon expand to Chinese now that Summly has the backing of Hong Kong billionaire investor Li Ka Shing.

In tests performed independently by researchers at MIT, the summaries from D’Aloisio’s patent-pending technologies performed up to 30 percent better than other existing algorithms. D’Aloisio says that to get this number, they took a corpus of past documents and articles and compared the quality of human summaries to Summly’s output. From this, they derived a recall/precision score. That was then tested against other algorithms.
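For readers unfamiliar with recall/precision scoring, the sketch below shows one common way to compute it for an extractive summary: treat the sentences a human chose as the reference and measure the overlap with the sentences the algorithm chose. The sentence-level comparison and the function names are assumptions; the article doesn't specify how the MIT tests defined a match or combined the two numbers.

```python
def precision_recall(selected, reference):
    """Compare algorithm-selected sentence indices with human-selected ones.

    precision: share of selected sentences that the human also chose
    recall:    share of human-chosen sentences that were selected
    """
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, a common single-number summary."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, if an algorithm picks sentences 0, 2, and 5 and the human summary uses sentences 0, 1, 2, and 3, two picks match: precision is 2/3 and recall is 2/4.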

Truth be told, the app isn’t perfect. It will sometimes include dates or minor numerical figures as bullet points, or an expository sentence in the opening paragraph of an article that doesn’t actually contain any rich information. Also, if the content of a site is under 500 characters, Summly won’t provide a summary — because the site content is already pretty concise at that point. Generally, though, the app does a decent job of picking out three to four key points of the page it’s summarizing, and it does so remarkably fast.
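The two behaviors described above, skipping pages under 500 characters and returning a handful of key points, can be sketched as a simple gate around any sentence scorer. The `score` callback is a hypothetical stand-in for whatever ranking Summly actually applies, and the regex-based sentence splitter is a naive assumption, not the app's method.

```python
import re

MIN_CHARS = 500  # threshold reported in the article
MAX_POINTS = 4   # the article mentions three to four key points


def bullet_points(text, score, min_chars=MIN_CHARS, max_points=MAX_POINTS):
    """Return up to max_points top-scoring sentences, or None for short text.

    `score` is any function mapping a sentence to a number (hypothetical).
    """
    if len(text) < min_chars:
        return None  # content is already concise, so no summary is produced

    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Rank sentences by score, then restore the original reading order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_points])]
```

Passing the scorer in as a parameter keeps the length gate and the ranking logic separate, so a learned scoring function could be swapped in without touching the rest.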

What’s next for D’Aloisio and Summly? The teenage developer, who’s been featured in publications like GigaOm, Forbes, and Wired’s App Guide, plans to release a web app version of the iOS app for use on desktop browsers early in the new year. D’Aloisio says he has “other ideas and aspirations,” but for now he’s happy to continue working on and improving Summly.