It’s now just over a month since I joined SmartNews and I am digging into what’s under the hood and the mad science that drives the deceptively simple interface of the SmartNews product.

On the surface, SmartNews is a news aggregator. Our server pulls in URLs from a variety of feeds and custom crawls, but the magic happens when we try to make sense of what we index and refine the 10 million+ stories down to the several hundred most important stories of the day. That’s the technical challenge.

The BHAG is to address the increased polarization of society. The filter bubble that results from getting your news from social networks is caused by the echo chamber effect of a news feed optimized to show you more of what you engage with and less of what you do not. Personalization is excellent for increasing relevance in things like search where you need to narrow results to find what you’re looking for but personalization is dangerously limiting for a news product where a narrowly personalized experience has what Filter Bubble author Eli Pariser called the “negative implications for civic discourse.”

So how do you crawl 10 million URLs daily and figure out which stories are important enough for everyone to know? Enter Machine Learning.

I’m still a newbie to this but am beginning to appreciate the promise of applying machine learning to the problem above. New to machine learning too? Here’s a compelling example of what you can do, illustrated in a recent presentation by Samiur Rahman, an engineer at Mattermark, which uses machine learning to match news to its company profiles.

The word relationship map above was the result of a machine learning algorithm being set loose on a corpus of 100,000 documents overnight. By scanning all the sentences in the documents, looking at which words appeared in those sentences, and noting the frequency and proximity of those words, the algo was able to learn that Japan : sushi as USA : pizza, and that Einstein : scientist as Picasso : painter.
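For the curious, here is a minimal sketch of how that kind of analogy can fall out of simple vector arithmetic. It is not the code from the presentation: the three-dimensional vectors are made up by hand for illustration, and the function names (`cosine`, `analogy`) are my own. Real word vectors are learned from a corpus and have hundreds of dimensions.

```python
import math

# Toy, hand-made "word vectors" (an assumption for illustration only).
# Axis 0 ~ "Japan-ness", axis 1 ~ "USA-ness", axis 2 ~ "food-ness".
VECTORS = {
    "japan": [1.0, 0.0, 0.0],
    "usa": [0.0, 1.0, 0.0],
    "sushi": [1.0, 0.0, 1.0],
    "pizza": [0.0, 1.0, 1.0],
    "painter": [0.0, 0.0, -1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=VECTORS):
    """Answer "a : b as c : ?" by finding the word closest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("japan", "sushi", "usa"))
```

With these toy vectors, "sushi minus Japan plus USA" lands exactly on the vector for pizza. A trained model only gets close, which is where the odd pairings come from.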

Those of you paying close attention will notice that some of the relationships are slightly off – France : tapas? Google : Yahoo? This is the power of the human mind at work. We’re great at pattern matching. Machine learning algorithms are just that, algorithms, and they need continual tuning. Koizumi : Japan? Well, that shows you the limitations of working with a dated corpus of documents.

But take a step back and think about it. In 24 hours, a well-written algorithm can take a blob of text and parse it for meaning and use that to teach itself something about the world in which those documents were created.

Now jump over to SmartNews and understand that our algorithms are processing 10 million news stories each day and figuring out the most important news of the moment. Not only are we looking for what’s important, we’re also determining which section to feature the story in, how prominently, where to cut the headline, and how best to crop the thumbnail photo.

The algorithm is continually being trained and the questions that it kicks back are just as interesting as the choices it makes.

A story about President Obama playing a round of golf. Is it a sports story or is it a political story?

The push and pull between discovery, diversity, and relevance are all inputs into the ever-evolving algorithm. Today I learned about “exploration vs. exploitation”. How do we tell our users the most important stories of the day in a way that covers the bases but also teaches you something new?
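To make “exploration vs. exploitation” concrete for myself, here is a minimal sketch of the textbook epsilon-greedy strategy. To be clear, this is not how SmartNews ranks stories. The function name `choose_story`, the click-rate numbers, and the value of epsilon are all made up for illustration.

```python
import random


def choose_story(click_rates, epsilon, rng):
    """Pick a story index using epsilon-greedy selection.

    With probability epsilon, explore: pick any story at random.
    Otherwise exploit: pick the story with the best estimated click rate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(click_rates))
    return max(range(len(click_rates)), key=lambda i: click_rates[i])


# Hypothetical estimated click rates for three stories.
estimated = [0.10, 0.30, 0.05]
rng = random.Random(0)  # seeded so the run is repeatable
picks = [choose_story(estimated, 0.2, rng) for _ in range(1000)]

# Share of impressions that went to the current favorite (story 1).
print(picks.count(1) / len(picks))
```

Most impressions go to the current favorite, but a slice is always reserved for stories the reader might not have picked on their own, which is roughly the trade-off described above.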

In the documentary Hearts of Darkness: A Filmmaker’s Apocalypse, Francis Ford Coppola’s wife, Eleanor, chronicled the filming and production of her husband’s masterpiece Apocalypse Now. It’s a fascinating film, a meta-commentary on the American entertainment industry as a metaphor for American imperialism and the war in Vietnam. I highly recommend it.

The clip above comes right before the credits start rolling. A weary Francis looks forward to the return of the amateur who practices filmmaking purely for the art. It’s a prescient glimpse of the world of YouTube and Snapchat artists where we find ourselves today, a refreshing show of support for new art forms from a lion of the old.

To me, the great hope is that now these little 8mm video recorders and stuff have come out, and some… just people who normally wouldn’t make movies are going to be making them. And you know, suddenly, one day some little fat girl in Ohio is going to be the new Mozart, you know, and make a beautiful film with her little father’s camera recorder. And for once, the so-called professionalism about movies will be destroyed, forever. And it will really become an art form. That’s my opinion.

Variety reports that the TV sitcom Modern Family is going to film an entire episode featuring the UI of phones, laptops, and tablets as a way to tell a story. The idea came from a short film, Noah, that debuted at the 2013 Toronto Film Festival and won many awards for its innovative commentary on our device-mediated society.

I’ve embedded Noah below (kinda NSFW, remember Chatroulette?). I look forward to Modern Family’s treatment, which will air on ABC February 25th with the title “Connection Lost.”

Like everyone else, I too read through the 17,000-word profile of Apple design chief Jony Ive. It’s extensive and well worth your time if you want to get a sense of the scope of Apple’s vision and how they think about design.

What struck me most was the passage below, which shows you just how much of a lead Apple has when it comes to its intellectual property. It’s not just the idea, it’s not even the physical design of their products, the materials or dimensions. Apple’s design IP extends to how their products are made, down to the speed and force with which the tools cut the metal.

“Years ago, you thought you’d fulfilled your responsibility, as a designer, if you could accurately define the form”—in drawings or a model. Now, Ive said, “our deliverable just begins with form.” The data that Apple now sends to a manufacturer include a tool’s tracking path, speed, and appropriate level of lubricant. Ive noted that the studio’s prototyping expertise creates the theoretical risk of beautiful dumb ideas.

Two perspectives of the modern war correspondent in this age of the personal brand and selfie sticks.

We want our anchors to be both good at reading the news and also pretending to be in the middle of it. That’s why, when the forces of man or Mother Nature whip up chaos, both broadcast and cable news outlets are compelled to ship the whole heaving apparatus to far-flung parts of the globe, with an anchor as the flag bearer.

We want our anchors to be everywhere, to be impossibly famous, globe-trotting, hilarious, down-to-earth, and above all, trustworthy. It’s a job description that no one can match.

The correspondent retelling war stories surely knows that fellow correspondents had faced the same dangers or worse. More important, they knew that the GIs or Marines they were on patrol with or with whom they were sharing an outpost faced these and greater dangers every day. The troops obviously were the story; not the reporter. To brag about one’s own little brush with danger was unseemly; it was simply bad form.

David Carr left us today. He was optimistic about the adaptability of the news media in the modern age and, at the same time, pessimistic about the agility of the institutions that were the keepers, underwriters, and employers of those who practice the craft in its current form. He was conflicted about which way to go, like a man astride two ice floes drifting apart.

Their tiny netbooks and iPhones, which serve as portals to the cloud, contain more informational firepower than entire newsrooms possessed just two decades ago. And they are ginning content from their audiences in the form of social media or finding ways of making ambient information more useful. They are jaded in the way youth requires, but have the confidence that is a gift of their age as well.

Following a month off after my unexpected liberation from Gigaom, I started this week as Director of Media & Technology Partnerships at SmartNews. I feel very fortunate to have discovered this company at a time when I believe I have a lot to offer.

While researching the company, I was delighted to learn they had hired Rich Jaroslovsky. Rich and I crossed paths a few times when I was working at Dow Jones as he was getting wsj.com off the ground. We both have a fascination with technology’s impact on media and I shared his mission to bring The Wall Street Journal online. We had since gone our separate ways but I always admired his love and respect for good journalism as a writer, editor, and business guy.

Rich explained to me that SmartNews thinks of itself as a machine learning company with a news front-end which is right in the nexus of what makes me tick. The co-founders, Ken Suzuki and Kaisei Hamamoto, are super-sharp engineers who see news discovery as an interesting problem to solve and hugely important for society to get right. To give you a sense for how they think, as they look for real estate for their San Francisco office, Ken and Kaisei each created their own interactive maps showing the locations of high tech startups and compared notes to determine that the area of 2nd and Howard was the ideal spot to focus their search.

I made my pitch (excerpted below) and here I am!

—

Two of the hardest challenges for the publishing industry are distribution and advertising. When publishers moved online, they had to reinvent their traditional distribution channels and navigate a new landscape.

Initially it was the portals such as Yahoo and AOL that would curate the best of the web. Advertising was also sold this way, manually curated and matched to broad channels of interest maintained by the portals.

As technology improved, search engines such as Google automated discovery, matching a reader’s interests to a publisher’s content. Advertising was automated and optimized via keyword matching and auction systems to extract maximum value. Distributed widgets allowed publishers to embed advertising into their sites, and a combination of publisher tags and indexing allowed them to take advantage of an ad network’s inventory.

Social media platforms have recently taken over as a source of traffic for publishers and content snippets shared via these networks represent the fastest growing segment of inbound readers for a publisher.

A common thread to success across all these channels is attractive representation of a publisher’s content within each distribution channel. Whether it’s meta-data, SEO, or “social media optimization,” each new distribution channel has spawned a new method of representing your content to the service which is doing the crawling and aggregation.

For a new distribution channel both the crawling and aggregation algorithms are key to successful presentation of content and relevant advertising to the reader.

Technology has enabled effortless distribution of news, so the looming challenge is not so much the distribution of content as its discovery and presentation. Social media burnout is real, and personalization algorithms are still very basic, often pushing more and more similar content to the reader. The result is a “filter bubble” which shows the reader only what they want to see or, worse, what they already know.

Working with publishers to find them new sources of readership, and with readers to teach them something they didn’t know, is an important goal that aligns with my interests. The fact that the team is based in Japan, a country with a strong culture of news readership, is attractive to me as I am a big fan of introducing Japan to the rest of the world.

What if we follow the trend of the “app-ification” of media to the next logical step? What if Snapchat’s Discover feature is just the modern version of network television, where channels control distribution and readers become passive again, replacing their allotted 5 hours of TV with 5 hours of browsing Facebook, Twitter, Snapchat and the rest?

If in five years I’m just watching NFL-endorsed ESPN clips through a syndication deal with a messaging app, and Vice is just an age-skewed Viacom with better audience data, and I’m looking up the same trivia on Genius instead of Wikipedia, and “publications” are just content agencies that solve temporary optimization issues for much larger platforms, what will have been the point of the last twenty years of creating things for the web?

We all know this stuff goes on. Lobbyists ghostwrite letters for elected officials or even draft legal amendments to try to turn things their clients’ way. But it’s not pretty when you see it in broad daylight like this. The latest exhibit is from Comcast, which is using its influence to fabricate support for its proposed merger with Time Warner Cable, a deal that would create essentially the largest ISP monopoly in the nation.

For those of you who forgot what it was like the last time a single communications provider was the only game in town, I present you with Lily Tomlin who, on Saturday Night Live in the late 70s, skewered the then dominant AT&T on a regular basis.

I never got around to writing about the Search and Alerts products I worked on while at Gigaom. Using native WordPress features and extending them just a bit, we were able to build a full-fledged faceted search engine and notification platform at a fraction of what it cost to do when I was at Factiva.

search.gigaom.com pulled in content from across gigaom.com, research.gigaom.com, and events.gigaom.com and presented results in a way that allowed you to filter by tags and explore relationships between the tags applied to the content. Built in was a well-structured taxonomy and basic smarts which would map a keyword to the appropriate tag.
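Here is a rough sketch of the idea behind faceted filtering. This is not the Gigaom code, which was built on WordPress. The sample posts, the tag names, and the `faceted_search` function are all made up to show the shape of the problem: filter by the selected tags, then count the other tags that appear on the matching posts.

```python
from collections import Counter

# Hypothetical sample content; titles and tags are invented for illustration.
POSTS = [
    {"title": "Nest ships a new thermostat", "tags": {"nest", "iot"}},
    {"title": "Google buys Nest", "tags": {"nest", "google", "acquisitions"}},
    {"title": "Google cuts cloud pricing", "tags": {"google", "cloud"}},
]


def faceted_search(posts, selected_tags):
    """Return (hits, facets) for the selected tags.

    hits:   posts that carry every selected tag.
    facets: how often each other tag appears on those hits, which is what
            lets a reader explore relationships between tags.
    """
    selected = set(selected_tags)
    hits = [p for p in posts if selected <= p["tags"]]
    facets = Counter(
        tag for p in hits for tag in p["tags"] if tag not in selected
    )
    return hits, facets


hits, facets = faceted_search(POSTS, ["nest"])
for post in hits:
    print(post["title"])
print(facets.most_common())
```

Filtering on “nest” returns the two Nest stories, and the facet counts hint at the related tags a reader might click on next.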

Gigaom Alerts solves a different problem. While search allows you to look back in time through the archives (which at Gigaom were a significant portion of total traffic), Alerts lets you, in a sense, look forward. One of the problems of a media site is that it is often not a destination. Visits come by way of an app or aggregator, so the challenge is getting your readers to return. Newsletters are one way, but we are experiencing a proliferation of newsletters competing for readers’ attention.

Alerts was built as a way to store a standing query which would deliver a notification if and only if there was new content which matched that query. Results are highly relevant because the alerts are constructed by those who read them. If you explicitly state your interest in “Nest” or “Tony Fadell” then there is a high likelihood that you will click through on a notification of new articles about those topics. Indeed, we did see high engagement from readers who came in via Gigaom Alerts: they stayed on the site longer and read significantly more pages per session than our average readers.

Gigaom Alerts leverages the native WordPress post-taxonomy architecture so that you can scale to a large number of individual alerts without a significant cost.

Each saved alert is a post

The terms for the alert are taxonomy terms on the post

The author of the post is the user to be alerted
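The three points above can be sketched in a few lines. This is an illustration of the data model only, written in Python rather than as the actual WordPress plugin. The `Alert` class, the `users_to_notify` function, and the user names are hypothetical. The matching rule, where an alert fires when all of its terms appear on a new article, is my assumption here and not necessarily how the production system matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A saved alert, modeled like a post."""

    author: str       # the user to be alerted (the post's author)
    terms: frozenset  # the taxonomy terms attached to the alert post


def users_to_notify(alerts, article_terms):
    """Return the sorted users whose alert matches a newly published article.

    Assumed rule: an alert matches when it has at least one term and every
    one of its terms is also a term on the article.
    """
    tags = set(article_terms)
    return sorted({a.author for a in alerts if a.terms and a.terms <= tags})


# Hypothetical saved alerts.
alerts = [
    Alert(author="ann", terms=frozenset({"nest"})),
    Alert(author="bob", terms=frozenset({"nest", "tony-fadell"})),
    Alert(author="cy", terms=frozenset({"apple"})),
]

print(users_to_notify(alerts, ["nest", "google"]))
print(users_to_notify(alerts, ["nest", "tony-fadell"]))
```

Because an alert is just a small record with terms attached, checking a new article against all saved alerts is a term lookup, not a fresh search per user, which is why the approach scales cheaply.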

WordPress VIP kindly archived a talk that Casey Bisson did at one of their meetups which I’ll share here along with a link to the slides.

Hat tip to the folks at Followistic.com who let me know that Casey’s session was posted. If Gigaom Alerts sounds interesting to you, I’d check them out. They have built a plug-in which works much the same and is super-easy to install if you’re running WordPress.