---
title: Predicting Google closures
description: Analyzing predictors of Google abandoning products; predicting future shutdowns
created: 28 Mar 2013
tags: statistics, archiving, predictions, R, survival analysis
status: finished
confidence: likely
importance: 7
...
> Prompted by the shutdown of Google Reader, I ponder the evanescence of online services and wonder what is the risk of them disappearing.
> I collect data on [350 Google products](#sources) launched before March 2013, looking for [variables predictive of mortality](#variables) (web hits, service vs software, commercial vs free, FLOSS, social networking, and internal vs acquired).
> Shutdowns are unevenly distributed across both the calendar year and Google's history.
> I use logistic regression & survival analysis (which can deal with right-censorship) to [model the risk of shutdown over time](#modeling) and examine correlates.
> The logistic regression indicates socialness, acquisitions, and lack of web hits predict being shut down, but the results may not be right.
> The survival analysis finds a median lifespan of 2824 days with a roughly Type III survival curve (high early-life mortality); a Cox regression finds results similar to the logistic regression's - socialness, free, acquisition, and long life predict lower mortality.
> Using the best model, I [make predictions](#predictions) about probability of shutdown of the most risky and least risky services in the next 5 years (up to March 2018).
> (All data & R source code is provided.)
Google has occasionally shut down services I use, and not always with serious warning (many tech companies are like that - here one day and gone the next - though Google is one of the least-worst); this is frustrating and tedious.
Naturally, we are preached at by apologists that Google owes us nothing and if it's a problem then it's all our fault and we should have prophesied the future better (and too bad about the ordinary people who may be screwed over or the unique history^[One sobering example I mention in my [link rot page](/Archiving-URLs): [11% of Arab Spring-related tweets](http://arxiv.org/pdf/1209.3026v1.pdf "'Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?', SalahEldeen & Nelson 2012") were gone within a year. I do not know what the full dimension of the Reader RSS archive loss will be.] or data casually destroyed).
But how can we have any sort of rational expectation if we lack any data or ideas about how long Google will run anything or why or how it chooses to do what it does?
So in the following essay, I try to get an idea of the risk, and hopefully the results are interesting, useful, or both.
# A glance back
> 'This is something that literature has always been very keen on, that technology never gets around to acknowledging. The cold wind moaning through the empty stone box. When are you gonna own up to it? Where are the Dell PCs? This is Austin, Texas. Michael Dell is the biggest tech mogul in central Texas. Why is he not here? Why is he not at least not selling his wares? Where are the dedicated gaming consoles you used to love? Do you remember how important those were? I could spend all day here just reciting the names of the casualties in your line of work. It's always the electronic frontier. Nobody ever goes back to look at the electronic forests that were cut down with chainsaws and tossed into the rivers. And then there's this empty pretense that these innovations make the world "better"...Like: "If we're not making the world better, then why are we doing this at all?" Now, I don't want to claim that this attitude is hypocritical. Because when you say a thing like that at South By: "Oh, we're here to make the world better" - you haven't even *reached* the level of hypocrisy. You're stuck at the level of childish naivete.' --[Bruce Sterling](!Wikipedia), ["Text of SXSW2013 closing remarks"](http://www.wired.com/beyond_the_beyond/2013/04/text-of-sxsw2013-closing-remarks-by-bruce-sterling/)
The shutdown of the popular service [Google Reader](!Wikipedia), announced on [13 March 2013](http://googleblog.blogspot.com/2013/03/a-second-spring-of-cleaning.html), has brought home to many people that some products they rely on exist only at Google's sufferance: it provides the products for reasons that are difficult for outsiders to divine, may have little commitment to a product[^Reader-threats-1][^Reader-threats-2][^Reader-threats-3], may not have its users' best interests at heart, may choose to withdraw the product at any time for any reason[^Reader-popularity] (especially since most of the products are services[^cloud-suckers] & not [FLOSS](!Wikipedia "Free and open-source software") in any way, and may be too tightly coupled with the Google infrastructure[^unportable] to be spun off or sold, so when the CEO turns against it & [no Googlers](http://www.buzzfeed.com/mattlynley/google-reader-died-because-no-one-would-run-it) are willing to waste their careers championing it...), and users have no voice[^voice-utility] - [only exit](!Wikipedia "Exit, Voice, and Loyalty") as an option.
[^unportable]: From Gannes's ["Another Reason Google Reader Died: Increased Concern About Privacy and Compliance"](http://allthingsd.com/20130324/another-reason-google-reader-died-increased-concern-about-privacy-and-compliance/)
> But at the same time, Google Reader was too deeply integrated into Google Apps to spin it off and sell it, like the company did last year with its SketchUp 3-D modeling software.
[mattbarrie](https://news.ycombinator.com/item?id=5373584) on Hacker News:
> I'm here with Alan Noble who runs engineering at Google Australia and ran the Google Reader project until 18 months ago. They looked at open sourcing it but it was too much effort to do so because it's tied to closely to Google infrastructure. Basically it's been culled due to long term declining use.
[^cloud-suckers]: This would not come as news to [Jason](http://ascii.textfiles.com/archives/1717 "FUCK THE CLOUD - January 16, 2009") [Scott](http://ascii.textfiles.com/archives/2229 "Oh Boy, The Cloud - October 5, 2009") of [Archive Team](!Wikipedia), of course, but nevertheless [James Fallows points out](http://www.theatlantic.com/technology/archive/2013/03/finale-for-now-on-googles-self-inflicted-trust-problem/274286/ "Finale for Now on Google's Self-Inflicted Trust Problem") that when a cloud service evaporates, it's simply gone and gives an interesting comparison:
> [Kevin Drum](!Wikipedia), [in _Mother Jones_](http://www.motherjones.com/kevin-drum/2013/03/problem-google-and-cloud "The Problem With Google -- and The Cloud"), on why the inability to rely on Google services is more disruptive than the familiar pre-cloud experience of having favorite programs get orphaned. My example is [Lotus Agenda](!Wikipedia): it has officially been dead for nearly 20 years, but *I can still use it* (if I want, in a DOS session under the VMware Fusion Windows emulator on my Macs. Talk about layered legacy systems!). When a cloud program goes away, as Google Reader has done, it's gone. There is no way you can keep using your own "legacy" copy, as you could with previous orphaned software.
[^voice-utility]: The sheer size & dominance of some Google services have led to comparisons to natural monopolies, such as the _Economist_ column ["Google's Google problem"](http://www.economist.com/blogs/freeexchange/2013/03/utilities). I saw this comparison mocked, but it's worth noting that at least one Googler made the same comparison years before. From [Levy](!Wikipedia "Steven Levy")'s _[In the Plex](!Wikipedia)_ 2011, part 7, section 2:
> While some Googlers felt singled out unfairly for the attention, the more measured among them understood it as a natural consequence of Google's increasing power, especially in regard to distributing and storing massive amounts of information. "It's as if Google took over the water supply for the entire United States", says Mike Jones, who handled some of Google's policy issues. "It's only fair that society slaps us around a little bit to make sure we're doing the right thing."
[^Reader-threats-1]: Google Reader affords examples of this lack of transparency on a key issue - Google's willingness to support Reader (extremely relevant to users, and even more so to the third-party web services and applications which relied on Reader to function); from BuzzFeed's ["Google's Lost Social Network: How Google accidentally built a truly beloved social network, only to steamroll it with Google+. The sad, surprising story of Google Reader"](http://www.buzzfeed.com/robf4/googles-lost-social-network):
> The difficulty was that Reader users, while hyper-engaged with the product, never snowballed into the tens or hundreds of millions. Brian Shih became the product manager for Reader in the fall of 2008. "If Reader were its own startup, it's the kind of company that Google would have bought. Because we were at Google, when you stack it up against some of these products, it's tiny and isn't worth the investment", he said. At one point, Shih remembers, engineers were pulled off Reader to work on OpenSocial, a "half-baked" development platform that never amounted to much. "There was always a political fight internally on keeping people staffed on this little project", he recalled. Someone hung a sign in the Reader offices that said "DAYS SINCE LAST THREAT OF CANCELLATION." The number was almost always zero. At the same time, user growth - while small next to Gmail's hundreds of millions - more than doubled under Shih's tenure. But the "senior types", as Bilotta remembers, "would look at absolute user numbers. They wouldn't look at market saturation. So Reader was constantly on the chopping block."
>
> So when news spread internally of Reader's gelding, it was like Hemingway's line about going broke: "Two ways. Gradually, then suddenly." Shih found out in the spring that Reader's internal sharing functions - the asymmetrical following model, endemic commenting and liking, and its advanced privacy settings - would be superseded by the forthcoming Google+ model. Of course, he was forbidden from breathing a word to users.
[Marco Arment](http://www.marco.org/2013/07/03/lockdown "Lockdown") says "I've heard from multiple sources that it effectively had a staff of zero for years".
[^Reader-threats-2]: Shih further [writes on Quora](https://www.quora.com/Google-Reader-Shut-Down-March-2013/Why-is-Google-killing-Google-Reader):
> Let's be clear that this has nothing to do with revenue vs operating costs. Reader never made money directly (though you could maybe attribute some of Feedburner and AdSense for Feeds usage to it), and it wasn't the goal of the product. Reader has been fighting for approval/survival at Google since long before I was a PM for the product. I'm pretty sure Reader was threatened with de-staffing at least three times before it actually happened. It was often for some reason related to social:
>
> - 2008 - let's pull the team off to build OpenSocial
> - 2009 - let's pull the team off to build Buzz
> - 2010 - let's pull the team off to build Google+
>
> It turns out they decided to kill it anyway in 2010, even though most of the engineers opted against joining G+. Ironically, I think the reason Google always wanted to pull the Reader team off to build these other social products was that the Reader team actually understood social (and tried a lot of experiments over the years that informed the larger social features at the company) [See Reader's friends implementations v1, v2, and v3, comments, privacy controls, and sharing features. Actually wait, you can't see those anymore, since they were all ripped out.]. Reader's social features also evolved very organically in response to users, instead of being designed top-down like some of Google's other efforts [Rob Fishman's Buzzfeed article has good coverage of this: [Google's Lost Social Network](http://www.buzzfeed.com/robf4/googles-lost-social-network)]. I suspect that it survived for some time after being put into maintenance because they believed it could still be a useful source of content into G+. Reader users were always voracious consumers of content, and many of them filtered and shared a great deal of it. But after switching the sharing features over to G+ (the so called "share-pocalypse") along with the redesigned UI, my guess is that usage just started to fall - particularly around sharing. I know that my sharing basically stopped completely once the redesign happened [[Reader redesign: Terrible decision, or worst decision?](http://www.brianshih.com/post/30194495552/reader-redesign-terrible-decision-or-worst-decision) I was a lot angrier then than I am now -- now I'm just sad.]. Though Google did ultimately fix a lot of the UI issues, the sharing (and therefore content going into G+) would never recover. So with dwindling usefulness to G+, (likely) dwindling or flattening usage due to being in maintenance, and Google's big drive to focus in the last couple of years, what choice was there but to kill the product?
[^Reader-threats-3]: The sign story is confirmed by another Googler; ["Google Reader lived on borrowed time: creator Chris Wetherell reflects"](http://gigaom.com/2013/03/13/chris-wetherll-google-reader/):
> "When they replaced sharing with +1 on Google Reader, it was clear that this day was going to come", he said. Wetherell, 43, is amazed that Reader has lasted this long. Even before the project saw the light of the day, Google executives were unsure about the service and it was through sheer perseverance that it squeaked out into the market. At one point, the management team threatened to cancel the project even before it saw the light of the day, if there was a delay. "We had a sign that said, '*days since cancellation*' and it was there from the very beginning", added a very sanguine Wetherell. My translation: Google never really believed in the project. Google Reader started in 2005 at what was really the golden age of RSS, blogging systems and a new content ecosystem. The big kahuna at that time was [Bloglines](!Wikipedia) (acquired by [Ask.com](!Wikipedia)) and Google Reader was an upstart.
[^Reader-popularity]: The official PR release stated that too little usage was the reason Reader was being abandoned. Whether this is the genuine reason has been questioned by third parties, who observe that Reader seems to [drive far more traffic](http://www.buzzfeed.com/jwherrman/google-reader-still-sends-far-more-traffic-than-google) than another service which Google had yet to ax, Google+; that one app had [>2m users who also had Reader accounts](http://allthingsd.com/20130324/another-reason-google-reader-died-increased-concern-about-privacy-and-compliance/); that just one alternative to Reader (Feedly) had in excess of [3 million signups post-announcement](http://blog.feedly.com/2013/04/02/announcing-the-new-feedly-mobile-and-welcoming-3-million-reader-refugees/) (reportedly, up to [4 million](http://www.nytimes.com/2013/05/09/technology/personaltech/three-ways-feedly-outdoes-the-vanishing-google-reader.html?pagewanted=all "Google's Aggregator Gives Way to an Heir")); and the largest of several petitions to Google reached [148k](https://www.change.org/petitions/google-keep-google-reader-running) signatures (less, though, than the [>1m downloads](http://web.archive.org/web/20130122102151/http://www.appbrain.com/app/google-reader/com.google.android.apps.reader) of the Android client). Given that few users will sign up at Feedly specifically, sign a petition, visit the BuzzFeed network, or use the apps in question, it seems likely that Reader had closer to 20m users than 2m users when its closure was announced. An unknown Google engineer has been quoted as [saying in 2010](http://www.quora.com/How-many-users-does-Google-Reader-have/answer/Andreas-Pizsa) Reader had "tens of millions active monthly users". [Xoogler Jenna Bilotta](http://www.forbes.com/sites/alexkantrowitz/2013/07/01/google-reader-founder-i-never-would-have-founded-reader-inside-todays-google/) (left Google [November 2011](http://www.thepinkestblack.com/2011/11/all-good-things.html)) said
> "I think the reason why people are freaking out about Reader is because that Reader did stick," she said, noting the widespread surprise that Google would shut down such a beloved product. "The numbers, at least until I left, were still going up."
The most popular feed on Google Reader in March 2013 had [24.3m subscribers](http://googlesystem.blogspot.com/2013/03/google-reader-data-points.html) (some [pixel-counting](http://googlesystem.blogspot.com/2013/03/google-reader-data-points.html?google_comment_id=z12wjrqpao3qhjgl223zzv1h4s2hx5na504) of [an official user-count graph](http://googlereader.blogspot.com/2010/09/welcome-and-look-back.html) & inference from a [leaked video](http://blogoscoped.com/forum/108194.html) suggests Reader in total may've had >36m users in Jan 2011). Jason Scott in 2009 reminded us that this lack of transparency is completely predictable: "Since the dawn of time, companies have hired people whose entire job is to tell you everything is all right and you can completely trust them and the company is as stable as a rock, and to do so until they, themselves, are fired because the company is out of business."
[Andy Baio](https://twitter.com/waxpancake) (["Never Trust a Corporation to do a Library's Job"](https://medium.com/message/never-trust-a-corporation-to-do-a-librarys-job-f58db4673351)) summarizes Google's track record:
>> "Google's mission is to organize the world's information and make it universally accessible and useful."
>
> For years, Google's mission included the *preservation of the past*. In 2001, Google made their first acquisition, the Deja archives. The largest collection of Usenet archives, Google relaunched it as *Google Groups*, supplemented with archived messages going back to 1981. In 2004, *Google Books* signaled the company's intention to scan every known book, partnering with libraries and developing its own book scanner capable of digitizing 1,000 pages per hour. In 2006, *Google News Archive* launched, with historical news articles dating back 200 years. In 2008, they expanded it to include their own digitization efforts, scanning newspapers that were never online. In the last five years, starting around 2010, the shifting priorities of Google's management left these archival projects in limbo, or abandoned entirely. After a series of redesigns, Google Groups is effectively dead for research purposes. The archives, while still online, have no means of searching by date. Google News Archives are dead, killed off in 2011, now [directing searchers](https://support.google.com/news/answer/1638638) to just use Google. Google Books is still online, but [curtailed their scanning efforts](http://chronicle.com/article/Google-Begins-to-Scale-Back/131109/) in recent years, likely discouraged by a decade of legal wrangling [still in appeal](https://gigaom.com/2014/12/03/in-google-books-appeal-judges-focus-on-profit-and-security/). The [official blog](http://booksearch.blogspot.com/) stopped updating in 2012 and the [Twitter account's](https://twitter.com/googlebooks) been dormant since February 2013. Even Google Search, their flagship product, stopped focusing on the history of the web. 
> In 2011, Google [removed](http://readwrite.com/2011/11/11/google_kills_its_own_timeline_feature) the Timeline view letting users filter search results by date, while a series of [major changes](http://readwrite.com/2011/11/03/armed_with_social_signals_google_moves_back_toward) to their search ranking algorithm increasingly favored freshness over older pages from established sources. (To the [detriment of some](https://medium.com/technology-musings/on-the-future-of-metafilter-941d15ec96f0).)...As it turns out, organizing the world's information isn't always profitable. Projects that preserve the past for the public good aren't really a big profit center. Old Google knew that, but didn't seem to care.
In the case of Reader, although Reader destroyed the original RSS-reader market, some usable alternatives still exist; the consequences are a shrinkage of the RSS audience - as many users inevitably choose not to invest in a new reader, give up entirely, or interpret the shutdown as a deathblow to RSS - and an irreversible loss of Reader's uniquely comprehensive RSS archives going back to 2005.
Although to be fair, I should mention 2 major points in favor of Google:
1. a reason I did and still do use Google services is that, a few lapses like [Website Optimizer](/AB-testing#max-width) aside, they are almost unique in enabling users to back up their data, via the work of the [Google Data Liberation Front](!Wikipedia), and have been far more proactive than many companies in encouraging users to back up data from dead services - for example, by automatically copying Buzz users' data to their Google Drive.
2. Google's practice of undercutting all market incumbents with free services *also* has very large benefits[^benefits], so we shouldn't focus just on [the seen](http://bastiat.org/en/twisatwins.html "'That Which is Seen, and That Which is Not Seen', Frederic Bastiat (1850)").
But nevertheless, every shutdown still hurts its users to some degree, even if we - currently[^survival] - can rule out the most devastating possible shutdowns, like Gmail. It would be interesting to see if shutdowns are to some degree predictable, whether there are any patterns, whether common claims about relevant factors can be confirmed, and what the results might suggest for the future.
[^benefits]: Specifically, this can be seen as reducing [deadweight loss](!Wikipedia): in some of the more successful acquisitions, Google's modus operandi was to take a very expensive or highly premium service and make it completely free while also improving its quality. Analytics, Maps, Earth, and Feedburner all come to mind as services whose predecessors (multiple, in the cases of Maps and Earth) charged money (sometimes a great deal). Such prices create deadweight loss: people who would benefit somewhat, but not by the full amount of the price (plus other costs, like the riskiness of investing time and money in trying it out), simply do not use the service. Google cites figures like billions of users over the years for several of these formerly-premium services, suggesting the gains from reduced deadweight loss are large.
[^survival]: If there is one truth of the tech industry, it's that no giant (except IBM) survives forever. Death rates for all corporations and nonprofits are [very high](/Girl-Scouts-and-good-governance#fn1), but particularly so for tech. [One blogger](http://shkspr.mobi/blog/2013/03/preparing-for-the-collapse-of-digital-civilization/ "Preparing for the Collapse of Digital Civilization") asks a good question:
> As we come to rely more and more on the Internet, it's becoming clear that there is a real threat posed by tying oneself to a 3rd party service. The Internet is famously designed to route around failures caused by a nuclear strike - but it cannot defend against a service being withdrawn or a company going bankrupt. It's tempting to say that multi-billion dollar companies like Apple and Google will never disappear - but a quick look at history shows Nokia, Enron, Amstrad, Sega, and many more which have fallen from great heights until they are mere shells and no longer offer the services which many people once relied on...I like to pose this question to my photography friends - "What would you do if Yahoo! suddenly decided to delete all your Flickr photos?" Some of them have backups - most faint at the thought of all their work vanishing.
# Data
## Sources
### Dead products
> The summer grasses -- \
> the sole remnants of many \
> brave warriors' dreams.
I begin with a list of services/APIs/programs that Google has shut down or abandoned, taken from the _Guardian_ article ["Google Keep? It'll probably be with us until March 2017 - on average: The closure of Google Reader has got early adopters and developers worried that Google services or APIs they adopt will just get shut off. An analysis of 39 shuttered offerings says how long they get"](http://www.guardian.co.uk/technology/2013/mar/22/google-keep-services-closed) by Charles Arthur. Arthur's list seemed relatively complete, but I've added in >300 items he missed based on the [Slate graveyard](http://www.slate.com/articles/technology/map_of_the_week/2013/03/google_reader_joins_graveyard_of_dead_google_products.html), Weber's ["Google Fails 36% Of The Time"](http://thenextweb.com/google/2011/10/17/google-fails/)[^Weber], the [Wikipedia category](!Wikipedia "Category:Google acquisitions")/[list](!Wikipedia "List of mergers and acquisitions by Google") for Google acquisitions, the [Wikipedia category](!Wikipedia "Category:Discontinued Google services")/[list](!Wikipedia "List of Google products#Discontinued products and services") for discontinued products, and finally the official [Google History](https://www.google.com/about/company/history/). (The additions include many shutdowns predating 2010, suggesting that Arthur's list was biased towards recent shutdowns.)
[^Weber]: Weber's conclusion:
> We discovered there's been a total of about 251 independent Google products since 1998 (avoiding add-on features and experiments that merged into other projects), and found that 90, or approximately 36% of them have been canceled. Awesomely, we also collected 8 major flops and 14 major successes, which means that 36% of its high-profile products are failures. That's quite the coincidence! NOTE: We did not manipulate data to come to this conclusion. It was a happy accident.
In an even more happy accident, my dataset of 350 products yields 123 canceled/shutdown entries, or 35%!
In a few cases, the start dates are well-informed guesses (eg. [Google Translate](https://plus.google.com/u/0/103530621949492999968/posts/fqxuM2SBRQ5)), and dates of abandonment/shutdown are even harder to establish, given the lack of attention paid to most products (Joga Bonito); so I infer dates from archived pages on the Internet Archive, news reports, blogs such as [Google Operating System](http://googlesystem.blogspot.com/), the dates of press releases, the shutdowns of closely related services (eReader Play, based on Reader), source code repositories (AngularJS), etc. Some products are listed as discontinued (Google Catalogs) but are still supported, were merged into other software (Spreadsheets, Docs, Writely, News Archive), were sold or given to third parties (Flu Shot Finder, App Inventor, Body), or have ceased active development while the content remains; I do not list those as dead. For acquired software/services that were shut down, I date the start from Google's purchase.
### Live products
> "...He often lying broad awake, and yet / Remaining from the body, and apart / In intellect and power and will, hath heard / Time flowing in the middle of the night, / And all things creeping to a day of doom."^[["The Mystic"](http://www.blackcatpoems.com/t/the_mystic.html), _Poems, Chiefly Lyrical_; [Lord Alfred Tennyson](!Wikipedia "Alfred, Lord Tennyson")]
A major criticism of Arthur's post was that it was fundamentally using the wrong data: from a dataset of only those Google products which have been shut down, you can make statements like "the average dead Google product lived 1459 days", but you cannot infer much about a live product's life expectancy - because you don't know whether it will ever join the dead products. If, for example, only 1% of products ever died, then 1459 days would be a massive underestimate of the average lifespan of all currently living products. With his data, you can only make inferences conditional on a product eventually dying; you cannot make an unconditional inference. Unfortunately, the unconditional question - "will it die?" - is the real question any Google user wants answered!
So drawing on the same sources, I have compiled a second list of *living* products: the ratio of living to dead gives a base rate for how likely a randomly selected Google product is to be canceled within the 1997-2013 window, and with the founding date of each living product, we can also do a simple right-censored [survival analysis](!Wikipedia), which will let us make still better predictions by extracting concrete results like mean time to shutdown. Some items are dead in the most meaningful sense, having been closed to new users (Sync), lost major functionality (FeedBurner, Meebo), degraded severely through neglect (eg. [Google Alerts](/Google-Alerts)), or simply been neglected for a decade or more (Google Groups' Usenet archive) - but since they have not actually died or closed yet, I list them as alive.
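The right-censored setup can be sketched in R with the `survival` package (bundled with standard R distributions). The lifespans below are toy values invented for illustration, not the real dataset; they only show how censoring changes the estimate:

```r
library(survival)  # Surv() & survfit(); a recommended package shipped with R

# Toy lifespans in days: dead products are observed events, while
# still-living products are right-censored at the observation date.
days <- c(1459, 800, 2300, 3100, 2824, 500, 4000, 1200, 3600, 2000)
dead <- c(   1,   1,    0,    0,    1,   1,    0,    1,    0,    0)

# Naive (conditional-on-death) estimate: mean lifespan of dead products only.
mean(days[dead == 1])

# Kaplan-Meier estimate, which also uses the censored living products:
fit <- survfit(Surv(days, dead) ~ 1)
print(fit)    # number of events, plus the estimated median lifespan
summary(fit)  # the full survival curve: P(still alive) at each event time
```

On this toy data, the naive mean is pulled down by ignoring the long-lived survivors - exactly the bias in averaging only dead products.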
## Variables
> To my good friend \
> Would I show, I thought, \
> The plum blossoms, \
> Now lost to sight \
> Amid the falling snow.^[[Yamabe no Akahito](!Wikipedia), [_Man'yōshū_](http://en.wikipedia.org/wiki/Man%27y%C5%8Dsh%C5%AB) [VIII: 1426](http://www.temcauley.staff.shef.ac.uk/waka0088.shtml)]
Simply collecting the data is useful since it allows us to make some estimates like overall death-rates or median lifespan. But maybe we can do better than just base rates and find characteristics which let us crack open the Google black box a tiny bit. So finally, for all products, I have collected several covariates which I thought might help predict longevity:
- `Hits`: the number of Google hits for a service
While the number of Google hits is, at best, a very crude proxy for underlying variables like "popularity" or "number of users" or "profitability", and is clearly biased towards recently released products (there aren't going to be as many hits for, say, "Google Answers" as there would have been if we had searched for it in 2002), it may add some insight.
There do not seem to be any other free quality sources of either historical or contemporary traffic to a product URL/homepage which could be used in the analysis - services like Alexa or Google Ad Planner are commercial, cover domains only, or simply do not cover many of the URLs. (After I finished data collection, it was pointed out to me that while Google's Ad Planner may not be useful, Google's AdWords *does* yield a count of global searches for a particular query that month, which would have worked - though it would only indicate current levels of interest, not historical levels.)
- `Type`: a categorization into "service"/"program"/"thing"/"other"
1. A *service* is anything primarily accessed through a web browser or API or the Internet; so Gmail or a browser loading fonts from a Google server, but not a Gmail notification program one runs on one's computer or a FLOSS font available for download & distribution.
2. A *program* is anything which is an application, plugin, library, framework, or all of these combined; some are very small (Authenticator) and some are very large (Android). This does include programs which require Internet connections or Google APIs as well as programs for which the source code has not been released, so things in the program category are not immune to shutdown and may be useful only as long as Google supports them.
3. A *thing* is anything which is primarily a physical object. A cellphone running Android or a Chromebook would be an example.
In retrospect, I probably should have excluded this category entirely: there's no reason to expect cellphones to follow the same lifecycle as a service or program, it leads to even worse classification problems (when does an Android cellphone 'die'? should one even be looking at individual cellphones or laptops rather than entire product lines?), there tend to be many iterations of a product and they're all hard to research, etc.
4. *Other* is the catch-all category for things which don't quite seem to fit. Where does a Google think-tank, charity, conference, or venture capital fund fit in? They certainly aren't software, but they don't seem to be quite services either.
- `Profit`: whether Google *directly* makes money off a product
This is a tricky one. Google excuses many of its products by saying that anything which increases Internet usage benefits Google, and by this logic, every single one of its services could potentially increase profit; but this is a little stretched, the truth is very hard for an outsider to judge, and one would expect that products without direct monetization are more likely to be killed.
Generally, I classify as for profit any Google product directly relating to producing/displaying advertising, paid subscriptions, fees, or purchases (AdWords, Gmail, Blogger, Search, shopping engines, surveys); but many do not seem to have any form of monetization related to them (Alerts, Office, Drive, Gears, Reader[^Reader-monetization]). Some services like Voice charge (for international calls) but the amounts are minor enough that one might wonder if classifying them as for profit is really right. While it might make sense to define every feature added to, say, Google Search (eg. Personalized Search, or Search History) as being 'for profit' since Search lucratively displays ads, I have chosen to classify these secondary features as being not for profit.
- `FLOSS`: whether the source code was released or Google otherwise made it possible for third parties to continue the service or maintain the application.
> In the long run, the utility of all non-Free software approaches zero. All non-Free software is a dead end.^[[Mark Pilgrim](!Wikipedia), ["Freedom 0"](http://web.archive.org/web/20110726001925/http://diveintomark.org/archives/2004/05/14/freedom-0); ironically, Pilgrim (hired by Google in 2007) seems to be responsible for at least one of the entries being marked dead, Google's Doctype tech encyclopedia, since it disappeared around the time of his "infosuicide" and has not been resurrected - it was only *partially* FLOSS.]
Android, AngularJS, and Chrome are all examples of software products where Google losing interest would not be fatal; services spun off to third parties would also count. Many of the codebases rely on a proprietary Google API or service (especially the mobile applications), which means that this variable is not as meaningful and laudable as one might expect, so in the minority of cases where this variable is relevant, I code `Dead` & `Ended` as related to whether & when Google abandoned it, regardless of whether it was then picked up by third parties or not. (Example: App Inventor for Android is listed as dying in December 2011, though it was then half a year later handed over to MIT, who has supported it since.) It's important to not naively believe that simply because source code is available, Google support doesn't matter.
- `Acquisition`: whether it was related to a purchase of a company or licensing, or internally developed.
This is useful for investigating the so-called ["Google black hole"](http://www.slate.com/articles/technology/technology/2008/08/the_google_black_hole.single.html "The Google Black Hole: Sergey and Larry just bought my company. Uh oh."): Google has bought many startups (DoubleClick, Dodgeball, Android, Picasa), or technologies/data licensed (SYSTRAN for Translate, Twitter data for Real-Time Search), but it's claimed many stagnate & wither (Jaiku, JotSpot, Dodgeball, [Zagat](http://www.businessinsider.com/google-zagat-story-2013-6 "MISERY AT GOOGLE: You'd Never Expect NSFW Graffiti Like This On Google's Bathroom Walls")). So we'll include this. If a closely related product is developed and released after purchase, like a mobile application, I do not class it as an acquisition; just products that were in existence when the company was purchased. I do not include products that Google dropped immediately on purchase (Apture, fflick, Sparrow, Reqwireless, PeakStream, Wavii) or where products based on them have not been released (BumpTop).
[^Reader-monetization]: Some have justified Reader's shutdown as simply a rational act, since Reader was not bringing in any money and Google is not a charity. The truth seems to be related more to Google's lack of interest since the start - it's hard to see how Google could monetize Gmail but not also Reader - and this reading is confirmed by two involved Googlers (from "Google Reader lived on borrowed time: creator Chris Wetherell reflects"):
> I wonder, did the company (Google) and the ecosystem at large misread the tea leaves? Did the world at large see an RSS/reader market when in reality the actual market opportunity was in data and sentiment analysis? [Chris] Wetherell agreed. "The reader market never went past the experimental phase and none was iterating on the business model," he said. "Monetization abilities were never tried."
>
> "There was so much data we had and so much information about the affinity readers had with certain content that we always felt there was monetization opportunity," he said. Dick Costolo (currently CEO of Twitter), who worked for Google at the time (having sold Google his company, Feedburner), came up with many monetization ideas but they fell on deaf ears. Costolo, of course is working hard to mine those affinity-and-context connections for Twitter, and is succeeding. What Costolo understood, Google and its mandarins totally missed, as noted in this [November 2011 blog post](http://massless.org/?p=174 "Dreams, discernment, and Google Reader") by Chris who wrote:
>
>> ***Reader exhibits the best unpaid representation I've yet seen of a consumer's relationship to a content producer***. You pay for HBO? That's a strong signal. Consuming free stuff? Reader's model was a dream. Even better than Netflix. You get affinity (which has clear monetary value) for free, and a tracked pattern of behavior for the act of iterating over differently sourced items - and a mechanism for distributing that quickly to an ostensible audience which didn't include social guilt or gameification - along with an extensible, scalable platform available via commonly used web technologies - all of which would be an amazing opportunity for the right product visionary. ***Reader is (was?) for information junkies; not just tech nerds***. This market totally exists and is weirdly under-served (and is possibly affluent).
Overall, from just the PR perspective, Google probably would have been better off switching Reader to a subscription model and then eventually killing it while claiming the fees weren't covering the costs. Offhand, 3 examples of Google adding or increasing fees come to mind: the Maps API, Talk international calls (apparently free initially), and App Engine fees; the API price increase was eventually rescinded as far as I know, and no one remembers the latter two (not even App Engine devs).
### Hits
Ideally we would have Google hits from the day before a product was officially killed, but the past is, alas, no longer accessible to us, and we only have hits from searches I conducted 1-5 April 2013. There are three main problems with the Google hits metric:
1. the Web keeps growing, so 1 million hits in 2000 are not equivalent to 1 million hits in 2013
2. services which are not killed live longer and can rack up more hits
3. and the longer ago a product's hits came into existence, the more likely the pages behind those hits are to have disappeared themselves.
We can partially compensate by looking at hits averaged over lifespan; 100k hits mean much less for something that lived a decade than for something that lived just 6 months. What about the growth objection? We can estimate the size of Google's index at any period and interpret the current hits as a fraction of the index when the service died (example: suppose Answers has 1 million hits, died in 2006, and in 2006 the index held 1 billion URLs; then we'd turn our 1m hit figure into 1/1000 or 0.001); this gives us our "deflated hits". To deflate, we first estimate the size of the index over time by fitting an exponential to the rare public reports and third-party estimates of the Google index's size. The data points with the best linear fit:
![Estimating Google WWW index size over time](/images/google/www-index-model.png)
It fits reasonably well. (A sigmoid might fit better, but maybe not, given the large disagreements towards the end.) With this we can then average over days as well, giving us 4 hit variables to use: raw, averaged, deflated, and averaged-deflated. We'll look closer at the hit variables later.
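The deflation can be sketched in a few lines of R; the index-size data points below are invented for illustration (the real fit uses the public reports & estimates plotted above):

~~~{.R}
# Sketch of the deflation: fit an exponential (linear in log-space) to
# estimates of the Google index's size over time, then express a product's
# current hit count as a fraction of the estimated index size at its death.
# These index-size data points are made up for illustration:
index <- data.frame(year = c(1998, 2002, 2006, 2010, 2013),
                    size = c(2.5e7, 3e9, 2.5e10, 4e10, 5e10))
exponentialFit <- lm(log(size) ~ year, data=index)
indexSize <- function(y) { exp(predict(exponentialFit, data.frame(year=y))) }

# eg. a hypothetical product with 1m hits today which died in 2006:
deflatedHits <- 1e6 / indexSize(2006)
~~~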
## Processing
If a product has not ended, the end-date is defined as 01 April 2013 (which is when I stopped compiling products); then the total lifetime is simply the end-date minus the start-date. The final CSV is available at [`2013-google.csv`](/docs/statistics/2013-google.csv). (I welcome corrections from Googlers or Xooglers about any variables like launch or shutdown dates or products directly raising revenue.)
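In R, the censoring & lifetime computation amounts to the following (sketch; the two example products' dates are hypothetical):

~~~{.R}
# Sketch: products still alive get the compilation date as their end-date,
# and lifetime in days is simply the difference of the two dates.
google <- data.frame(Started = as.Date(c("2005-06-28", "2010-02-09")),
                     Ended   = as.Date(c(NA, "2012-04-30")))
google$Ended[is.na(google$Ended)] <- as.Date("2013-04-01")
google$Days <- as.numeric(google$Ended - google$Started)
~~~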
# Analysis
> I spur my horse past ruins \
> Ruins move a traveler's heart \
> the old parapets high and low \
> the ancient graves great and small \
> the shuddering shadow of a tumbleweed \
> the steady sound of giant trees. \
> But what I lament are the common bones \
> unnamed in the records of Immortals.^[[Han-Shan](!Wikipedia "Hanshan (poet)"); #18 in [_The Collected Songs of Cold Mountain_](http://www.amazon.com/Collected-Mountain-Mandarin-Chinese-English/dp/1556591403/), Red Pine 2000, ISBN 1-55659-140-3]
## Descriptive
Loading up our hard-won data and looking at an R summary (for full source code reproducing all graphs and analyses below, see the [appendix](#source-code); I welcome statistical corrections or elaborations if accompanied by equally reproducible R source code), we can see we have a lot of data to look at:
~~~{.R}
Dead Started Ended Hits Type
Mode :logical Min. :1997-09-15 Min. :2005-03-16 Min. :2.04e+03 other : 14
FALSE:227 1st Qu.:2006-06-09 1st Qu.:2012-04-27 1st Qu.:1.55e+05 program: 92
TRUE :123 Median :2008-10-18 Median :2013-04-01 Median :6.50e+05 service:234
Mean :2008-05-27 Mean :2012-07-16 Mean :5.23e+07 thing : 10
3rd Qu.:2010-05-28 3rd Qu.:2013-04-01 3rd Qu.:4.16e+06
Max. :2013-03-20 Max. :2013-11-01 Max. :3.86e+09
Profit FLOSS Acquisition Social Days AvgHits
Mode :logical Mode :logical Mode :logical Mode :logical Min. : 1 Min. : 1
FALSE:227 FALSE:300 FALSE:287 FALSE:305 1st Qu.: 746 1st Qu.: 104
TRUE :123 TRUE :50 TRUE :63 TRUE :45 Median :1340 Median : 466
Mean :1511 Mean : 29870
3rd Qu.:2112 3rd Qu.: 2980
Max. :5677 Max. :3611940
DeflatedHits AvgDeflatedHits EarlyGoogle RelativeRisk LinearPredictor
Min. :0.0000 Min. :-36.57 Mode :logical Min. : 0.021 Min. :-3.848
1st Qu.:0.0000 1st Qu.: -0.84 FALSE:317 1st Qu.: 0.597 1st Qu.:-0.517
Median :0.0000 Median : -0.54 TRUE :33 Median : 1.262 Median : 0.233
Mean :0.0073 Mean : -0.95 Mean : 1.578 Mean : 0.000
3rd Qu.:0.0001 3rd Qu.: -0.37 3rd Qu.: 2.100 3rd Qu.: 0.742
Max. :0.7669 Max. : 0.00 Max. :12.556 Max. : 2.530
ExpectedEvents FiveYearSurvival
Min. :0.0008 Min. :0.0002
1st Qu.:0.1280 1st Qu.:0.1699
Median :0.2408 Median :0.3417
Mean :0.3518 Mean :0.3952
3rd Qu.:0.4580 3rd Qu.:0.5839
Max. :2.0456 Max. :1.3443
~~~
### Shutdowns over time
> `Google Reader`: "Who is it in the blogs that calls on me? / I hear a tongue shriller than all the YouTubes / Cry 'Reader!' Speak, Reader is turn'd to hear."
>
> `Dataset`: "Beware the ides of [March](http://googleblog.blogspot.com/2013/03/a-second-spring-of-cleaning.html "'A second spring of cleaning', 13 March 2013")."^[[_Google Reader_](!Wikipedia "Julius Caesar (play)") Act 1, scene 2, 15-19; with apologies.]
An interesting aspect of the shutdowns is they are unevenly distributed by month as we can see with a chi-squared test (_p_=0.014) and graphically, with a major spike in September and then March/April[^by-months-vacation]:
![Shutdowns binned by month of year, revealing peaks in September, March, and April](/images/google/shutdownsbymonth.png)
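The month test itself is a one-liner on the table of shutdowns per calendar month; a sketch with made-up counts mimicking the observed spikes (the real table comes from the `Ended` dates of the dead products):

~~~{.R}
# Sketch of the chi-squared test: given counts of shutdowns per calendar
# month, test against the null of a uniform distribution over months.
# These counts are invented to mimic the Sep & Mar/Apr spikes:
shutdownsByMonth <- c(Jan=5, Feb=6, Mar=20, Apr=15, May=8, Jun=6,
                      Jul=7, Aug=9, Sep=30, Oct=6, Nov=7, Dec=4)
chisq.test(shutdownsByMonth)   # spikes like these yield a small p-value
~~~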
[^by-months-vacation]: Xoogler [Rachel Kroll](http://rachelbythebay.com/) [on this spike](https://news.ycombinator.com/item?id=5653934):
> I have some thoughts about the spikes on the death dates.
>
> September: all of the interns go back to school. These people who exist on the fringes of the system manage to get a lot of work done, possibly because they are free of most of the overhead facing real employees. Once they leave, it's up to the FTEs [Full Time Employee] to own whatever was created, and that doesn't always work. I wish I could have kept some of them and swapped them for some of the full-timers.
>
> March/April: Annual bonus time? That's what it used to be, at least, and I say this as someone who quit in May, and that was no accident. Same thing: people leave, and that dooms whatever they left.
As befits a company which has grown enormously since 1997, we can see other imbalances over time: eg. Google launched very few products from 1997-2004, and many more from 2005 onward:
![Starts binned by year](/images/google/startsbyyear.png)
We can plot lifetime against shut-down to get a clearer picture:
![All products scatter-plotted date of opening vs lifespan](/images/google/openedvslifespan.png)
That clumpiness around 2009 is suspicious. To emphasize this bulge of shutdowns in late 2011-2012, we can plot the histogram of dead products by year and also a kernel density:
![Shutdown density binned by year](/images/google/shutdownsbyyear.png)
![Equivalent kernel density (default bandwidth)](/images/google/shutdownsbyyear-kernel.png)
The kernel density brings out an aspect of shutdowns we might have missed before: there seems to be an absence of recent shutdowns. There are 4 shutdowns scheduled for 2013, but the last is scheduled for November, suggesting that we have seen the last of the 2013 casualties and that any future shutdowns may be in 2014.
What explains such graphs over time?
One candidate is the 4 April 2011 accession of Larry Page to CEO, replacing Eric Schmidt who had been hired to provide "adult supervision" for pre-IPO Google.
He respected Steve Jobs greatly (he and Brin suggested, before meeting Schmidt, that their CEO be Jobs).
[Isaacson's _Steve Jobs_](http://www.amazon.com/Steve-Jobs-Walter-Isaacson/dp/1442369051) records that before his death, Jobs had strongly advised Page to "focus", and asked "What are the five products you want to focus on?", saying "Get rid of the rest, because they're dragging you down."
And on [14 July 2011](https://plus.google.com/+LarryPage/posts/dRtqKJCbpZ7) Page posted:
> ...Greater focus has also been another big feature for me this quarter -- more wood behind fewer arrows. Last month, for example, we announced that we will be closing Google Health and Google PowerMeter. We've also done substantial internal work simplifying and streamlining our product lines. While much of that work has not yet become visible externally, I am very happy with our progress here. Focus and prioritization are crucial given our amazing opportunities.
While some have [tried to disagree](http://thenextweb.com/google/2013/01/12/larry-page-did-well-to-ignore-steve-jobs/ "Larry Page ignored Steve Jobs's deathbed advice, and Google is doing great"), it's hard not to conclude that indeed, a wall of shutdowns followed in late 2011 and 2012.
But this sounds very much like a one-time purge: if one has a new focus on focus, then one may not start up as many services as before, and the services which one does start should be more likely to survive.
## Modeling
### Logistic regression
A first step in predicting when a product will be shut down is predicting whether it will be shut down at all. Since we're predicting a binary outcome (a product living or dying), we can use the usual tool: an ordinary [logistic regression](!Wikipedia). Our first look uses the main variables plus the total hits:
~~~{.R}
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.3968 1.0680 2.24 0.025
Typeprogram 0.9248 0.8181 1.13 0.258
Typeservice 1.2261 0.7894 1.55 0.120
Typething 0.8805 1.1617 0.76 0.448
ProfitTRUE -0.3857 0.2952 -1.31 0.191
FLOSSTRUE -0.1777 0.3791 -0.47 0.639
AcquisitionTRUE 0.4955 0.3434 1.44 0.149
SocialTRUE 0.7866 0.3888 2.02 0.043
log(Hits) -0.3089 0.0567 -5.45 5.1e-08
~~~
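The table above comes from a `glm` call along these lines (sketch; the synthetic data frame here merely stands in for the real columns of `2013-google.csv`):

~~~{.R}
# Sketch of the first logistic regression: regress shutdown status on the
# main variables plus log total hits. Synthetic stand-in data:
set.seed(1)
n <- 350
google <- data.frame(
    Dead        = sample(c(TRUE,FALSE), n, replace=TRUE, prob=c(0.35,0.65)),
    Type        = factor(sample(c("other","program","service","thing"), n, replace=TRUE)),
    Profit      = sample(c(TRUE,FALSE), n, replace=TRUE),
    FLOSS       = sample(c(TRUE,FALSE), n, replace=TRUE),
    Acquisition = sample(c(TRUE,FALSE), n, replace=TRUE),
    Social      = sample(c(TRUE,FALSE), n, replace=TRUE),
    Hits        = rlnorm(n, meanlog=13, sdlog=3))
model <- glm(Dead ~ Type + Profit + FLOSS + Acquisition + Social + log(Hits),
             family=binomial, data=google)
summary(model)
~~~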
In [log odds](!Wikipedia), >0 increases the chance of an event (shutdown) and <0 decreases it. So looking at the coefficients, we can venture some interpretations:
- Google has a history of screwing up its social products and then killing them
This is interesting for confirming the general belief that Google has handled its social properties badly in the past, but I'm not sure how useful this is for predicting the future: since Larry Page became obsessed with social in 2009, we might expect anything to do with "social" to be either merged into Google+ or otherwise kept on life support longer than it would have been before.
- Google is deprecating software products in favor of web services
A lot of Google's efforts with Firefox and then Chromium were aimed at improving web browsers as a platform for delivering applications. As efforts like HTML5 mature, there is less incentive for Google to release and support standalone software.
- But apparently not its FLOSS software
This seems due to a number of its software releases being picked up by third parties (Wave, Etherpad, Refine), designed to be integrated into existing communities (Summer of Code projects), or apparently serving a *strategic* role (Android, Chromium, Dart, Go, Closure Tools, VP Codecs) which we could summarize as 'building up a browser replacement for operating systems'. (Why? ["Commoditize your complements."](http://www.joelonsoftware.com/articles/StrategyLetterV.html))
- things which charge or show advertising are more likely to survive
We expect this, but it's good to have confirmation (if nothing else, it partially validates the data).
- Popularity as measured by Google hits seems to matter
...Or does it? This variable seems particularly treacherous and susceptible to reverse-causation issues (does lack of hits diagnose failure, or does failing cause lack of hits when I later searched?)
#### Use of hits data
Is our popularity metric - or any of the 4 - trustworthy? All this data has been collected after the fact, sometimes many years; what if the data have been contaminated by the fact that something shutdown? For example, by a burst of publicity about an obscure service shutting down? (Ironically, this page is contributing to the inflation of hits for any dead service mentioned.) Are we just seeing [information "leakage"](http://www.cs.umb.edu/~ding/history/470_670_fall_2011/papers/cs670_Tran_PreferredPaper_LeakingInDataMining.pdf "'Leakage in Data Mining: Formulation, Detection, and Avoidance', Kaufman et al 2011")? Leakage can be subtle, as I [learned for myself](#leakage) doing this analysis.
Investigating further, hits by themselves do matter:
~~~{.R}
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.4052 0.7302 4.66 3.1e-06
log(Hits) -0.3000 0.0549 -5.46 4.7e-08
~~~
Average hits (hits divided by the product's lifetime) turns out to be even more important:
~~~{.R}
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.297 1.586 -1.45 0.147
log(Hits) 0.511 0.209 2.44 0.015
log(AvgHits) -0.852 0.217 -3.93 8.3e-05
~~~
This is more than a little strange; that the higher the average hits, the less likely a product is to be killed makes perfect sense - but then surely the higher the total hits, the less likely as well? But no. The mystery deepens as we bring in the third hit metric we developed:
~~~{.R}
Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.589 11.955 -1.81 0.0709
log(Hits) 2.054 0.980 2.10 0.0362
log(AvgHits) -1.921 0.708 -2.71 0.0067
log(DeflatedHits) -0.456 0.277 -1.64 0.1001
~~~
And sure enough, if we run all 4 hit variables, 3 of them turn out to be statistically-significant and large:
~~~{.R}
Estimate Std. Error z value Pr(>|z|)
(Intercept) -24.6898 12.4696 -1.98 0.0477
log(Hits) 2.2908 1.0203 2.25 0.0248
log(AvgHits) -2.0943 0.7405 -2.83 0.0047
log(DeflatedHits) -0.5383 0.2914 -1.85 0.0647
AvgDeflatedHits -0.0651 0.0605 -1.08 0.2819
~~~
It's not that the hit variables are somehow summarizing or proxying for the others, because if we toss in all the non-hits predictors and penalize parameters based on adding complexity without increasing fit, we still wind up with the 3 hit variables:
~~~{.R}
Estimate Std. Error z value Pr(>|z|)
(Intercept) -23.341 12.034 -1.94 0.0524
AcquisitionTRUE 0.631 0.350 1.80 0.0712
SocialTRUE 0.907 0.394 2.30 0.0213
log(Hits) 2.204 0.985 2.24 0.0252
log(AvgHits) -2.068 0.713 -2.90 0.0037
log(DeflatedHits) -0.492 0.280 -1.75 0.0793
...
AIC: 396.9
~~~
Most of the predictors were removed as not helping a lot; 3 of the 4 hit variables survived (but not the one that was both averaged & deflated, suggesting it wasn't adding much in combination), and two of the better predictors from earlier also survived: whether something was an acquisition and whether it was social.
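The complexity-penalized selection can be reproduced with `step()`, which does backward elimination by AIC (sketch on synthetic stand-in data; on pure noise most terms will simply be dropped):

~~~{.R}
# Sketch of the AIC-penalized winnowing: fit the full logistic model with
# the non-hits predictors plus the hit variables, then let step() drop any
# term whose removal lowers AIC. Synthetic stand-in data:
set.seed(1)
n <- 350
google <- data.frame(
    Dead         = sample(c(TRUE,FALSE), n, replace=TRUE),
    Profit       = sample(c(TRUE,FALSE), n, replace=TRUE),
    Acquisition  = sample(c(TRUE,FALSE), n, replace=TRUE),
    Social       = sample(c(TRUE,FALSE), n, replace=TRUE),
    Hits         = rlnorm(n, meanlog=13, sdlog=3),
    AvgHits      = rlnorm(n, meanlog=6,  sdlog=2),
    DeflatedHits = rlnorm(n, meanlog=-9, sdlog=2))
full    <- glm(Dead ~ Profit + Acquisition + Social + log(Hits) +
               log(AvgHits) + log(DeflatedHits), family=binomial, data=google)
reduced <- step(full, trace=0)   # backward elimination by AIC
summary(reduced)
~~~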
The original hits variable has the wrong sign, as expected of data leakage; now the average and deflated hits have the predicted sign (the higher the hit count, the lower the risk of death), but this doesn't put to rest my concerns: the average hits has the right sign, yes, but now the effect size seems way too high - we reject the hits with a log-odds of +2.1 as contaminated and a correlation almost 4 times larger than one of the known-good correlations (being an acquisition), but the average hits is -2 & almost as big a log odds! The only variable which seems trustworthy is the deflated hits: it has the right sign and is a more plausible 5x smaller. I'll use just the deflated hits variable (although I will keep in mind that I'm still not sure it is free from data leakage).
### Survival curve
The logistic regression helped winnow down the variables but is limited to the binary outcome of shutdown or not; it can't use the potentially very important variable of how many days a product has survived for the simple reason that *of course* mortality will increase with time! ("But this long run is a misleading guide to current affairs. In the long run we are all dead.")
For looking at survival over time, [survival analysis](!Wikipedia) might be a useful elaboration. Not being previously familiar with the area, I drew on Wikipedia, [Fox & Weisberg's appendix](http://socserv.mcmaster.ca/jfox/Books/Companion/appendix/Appendix-Cox-Regression.pdf), [Bewick et al 2004](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065034/ "Statistics review 12: Survival analysis"), [Zhou's tutorial](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.551.3600&rep=rep1&type=pdf "Use Software R to do Survival Analysis and Simulation. A tutorial"), and Hosmer & Lemeshow's [_Applied Survival Analysis_](http://www.amazon.com/Applied-Survival-Analysis-Regression-Probability/dp/0471754994/) for the following results using the `survival` library (see also [CRAN Task View: Survival Analysis](http://cran.r-project.org/web/views/Survival.html), and the taxonomy of survival analysis methods in [Wang et al 2017](https://arxiv.org/abs/1708.04649 "Machine Learning for Survival Analysis: A Survey")). Any errors are mine.
The initial characterization gives us an optimistic median of 2824 days (note that this is higher than Arthur's mean of 1459 days because it addressed the conditionality issue discussed earlier by including products which were never canceled, and I made a stronger effort to collect pre-2009 products), but the lower bound is not tight and too little of the sample has died to get an upper bound:
~~~{.R}
records n.max n.start events median 0.95LCL 0.95UCL
350 350 350 123 2824 2095 NA
~~~
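The characterization above is a Kaplan-Meier fit, which in the `survival` library takes two lines (sketch on synthetic stand-in lifetimes; alive products are right-censored):

~~~{.R}
library(survival)
# Sketch of the overall fit: Surv() pairs each lifetime in days with its
# event indicator (TRUE = shut down, FALSE = right-censored at 2013-04-01).
set.seed(1)
google <- data.frame(Days = rexp(350, rate=1/1500),
                     Dead = sample(c(TRUE,FALSE), 350, replace=TRUE))
km <- survfit(Surv(Days, Dead) ~ 1, data=google)
print(km)   # reports n, number of events, and median survival with 95% CI
~~~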
Our overall [Kaplan-Meier](!Wikipedia "Kaplan-Meier estimator") [survivorship curve](!Wikipedia) looks a bit interesting:
![Shutdown cumulative probability as a function of time](/images/google/overall-survivorship-curve.png)
If there were constant mortality of products at each day after their launch, we would expect a "type II" curve where it looks like a straight line, and if the hazard [increased with age like with humans](http://lesswrong.com/lw/5qm/living_forever_is_hard_or_the_gompertz_curve/ "Living Forever is Hard, or, The Gompertz Curve") we would see a "type I" graph in which the curve nose-dives; but in fact it looks like there's a sort of "leveling off" of deaths, suggesting a "type III" curve; per Wikipedia:
> ...the greatest mortality is experienced early on in life, with relatively low rates of death for those surviving this bottleneck. This type of curve is characteristic of species that produce a large number of offspring (see [r/K selection theory](!Wikipedia)).
Very nifty: the survivorship curve is consistent with tech industry or startup philosophies of doing lots of things, iterating fast, and throwing things at the wall to see what sticks. (More pleasingly, it suggests that my dataset is not biased against the inclusion of short-lived products: if I had been failing to find a lot of short-lived products, then we would expect to see the true survivorship curve distorted into something of a type II or type I curve and not a type III curve where a lot of products are early deaths; so if there were a data collection bias against short-lived products, then the true survivorship curve must be even more extremely type III.)
However, it looks like the mortality only starts decreasing around 2000 days, so any product that far out must have been founded around or before 2005, which is when we previously noted that Google started pumping out a lot of products and may also have changed its shutdown-related behaviors; this could violate a basic assumption of Kaplan-Meier, that the underlying survival function isn't itself changing over time.
Our next step is to fit a Cox [proportional hazards model](!Wikipedia) to our covariates:
~~~{.R}
...
n= 350, number of events= 123
coef exp(coef) se(coef) z Pr(>|z|)
AcquisitionTRUE 0.130 1.139 0.257 0.51 0.613
FLOSSTRUE 0.141 1.151 0.293 0.48 0.630
ProfitTRUE -0.180 0.836 0.231 -0.78 0.438
SocialTRUE 0.664 1.943 0.262 2.53 0.011
Typeprogram 0.957 2.603 0.747 1.28 0.200
Typeservice 1.291 3.638 0.725 1.78 0.075
Typething 1.682 5.378 1.023 1.64 0.100
log(DeflatedHits) -0.288 0.749 0.036 -8.01 1.2e-15
exp(coef) exp(-coef) lower .95 upper .95
AcquisitionTRUE 1.139 0.878 0.688 1.884
FLOSSTRUE 1.151 0.868 0.648 2.045
ProfitTRUE 0.836 1.197 0.531 1.315
SocialTRUE 1.943 0.515 1.163 3.247
Typeprogram 2.603 0.384 0.602 11.247
Typeservice 3.637 0.275 0.878 15.064
Typething 5.377 0.186 0.724 39.955
log(DeflatedHits) 0.749 1.334 0.698 0.804
Concordance= 0.726 (se = 0.028 )
Rsquare= 0.227 (max possible= 0.974 )
Likelihood ratio test= 90.1 on 8 df, p=4.44e-16
Wald test = 79.5 on 8 df, p=6.22e-14
Score (logrank) test = 83.5 on 8 df, p=9.77e-15
~~~
And then we can also test whether any of the covariates are suspicious; in general they seem to be fine:
~~~{.R}
rho chisq p
AcquisitionTRUE -0.0252 0.0805 0.777
FLOSSTRUE 0.0168 0.0370 0.848
ProfitTRUE -0.0694 0.6290 0.428
SocialTRUE 0.0279 0.0882 0.767
Typeprogram 0.0857 0.9429 0.332
Typeservice 0.0936 1.1433 0.285
Typething 0.0613 0.4697 0.493
log(DeflatedHits) -0.0450 0.2610 0.609
GLOBAL NA 2.5358 0.960
~~~
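Both of the tables above come from two calls in the `survival` library (sketch on synthetic stand-in data, with only a subset of the covariates):

~~~{.R}
library(survival)
# Sketch of the Cox regression and the proportional-hazards check:
# coxph() estimates the covariate effects, and cox.zph() tests each
# covariate for violation of the proportionality assumption.
set.seed(1)
n <- 350
google <- data.frame(
    Days         = rexp(n, rate=1/1500),
    Dead         = sample(c(TRUE,FALSE), n, replace=TRUE),
    Acquisition  = sample(c(TRUE,FALSE), n, replace=TRUE),
    Social       = sample(c(TRUE,FALSE), n, replace=TRUE),
    DeflatedHits = rlnorm(n, meanlog=-9, sdlog=2))
cox <- coxph(Surv(Days, Dead) ~ Acquisition + Social + log(DeflatedHits),
             data=google)
summary(cox)   # coefficients & hazard ratios (exp(coef))
cox.zph(cox)   # per-covariate test of the proportional-hazards assumption
~~~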
My suspicion lingers, though, so I threw in another covariate (`EarlyGoogle`): whether a product was released before or after 2005. Does this add predictive value over and above simply knowing that a product is really old, and does the regression still pass the proportional-hazards assumption check? Apparently yes to both:
~~~{.R}
coef exp(coef) se(coef) z Pr(>|z|)
AcquisitionTRUE 0.1674 1.1823 0.2553 0.66 0.512
FLOSSTRUE 0.1034 1.1090 0.2922 0.35 0.723
ProfitTRUE -0.1949 0.8230 0.2318 -0.84 0.401
SocialTRUE 0.6541 1.9233 0.2601 2.51 0.012
Typeprogram 0.8195 2.2694 0.7472 1.10 0.273
Typeservice 1.1619 3.1960 0.7262 1.60 0.110
Typething 1.6200 5.0529 1.0234 1.58 0.113
log(DeflatedHits) -0.2645 0.7676 0.0375 -7.06 1.7e-12
EarlyGoogleTRUE -1.0061 0.3656 0.5279 -1.91 0.057
...
Concordance= 0.728 (se = 0.028 )
Rsquare= 0.237 (max possible= 0.974 )
Likelihood ratio test= 94.7 on 9 df, p=2.22e-16
Wald test = 76.7 on 9 df, p=7.2e-13
Score (logrank) test = 83.8 on 9 df, p=2.85e-14
~~~
~~~{.R}
rho chisq p
...
EarlyGoogleTRUE -0.05167 0.51424 0.473
GLOBAL NA 2.52587 0.980
~~~
As predicted, the pre-2005 variable does indeed correlate with less chance of being shut down, is the third-largest predictor, and almost reaches the arbitrary^[0.057; but as the old [criticism of NHST](http://lesswrong.com/lw/g13/against_nhst/) goes, "surely God loves the 0.057 almost as much as the 0.050".] threshold of statistical-significance - but it doesn't trigger the assumption tester, so we'll keep using the Cox model.
Now let's interpret the model. The covariates tell us that to reduce the risk of shutdown, you want to:
1. Not be an acquisition
2. Not be FLOSS
3. Be directly making money
4. Not be related to social networking
5. Have lots of Google hits relative to lifetime
6. Have been launched early in Google's lifetime
This all makes sense to me. I find particularly interesting the profit and social effects, but the odds are a little hard to understand intuitively; if being social increases the odds of shutdown by 1.9233 and not being directly profitable increases the odds by 1.215, what do those *look* like? We can graph pairs of survivorship curves, splitting the full dataset (omitting the confidence intervals for legibility, although they do overlap), to get a grasp of what these numbers mean:
![All products over time, split by `Profit` variable](/images/google/profit-survivorship-curve.png)
![All products over time, split by `Social` variable](/images/google/social-survivorship-curve.png)
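Splits like the two above are just conditioned Kaplan-Meier fits (sketch on synthetic stand-in data):

~~~{.R}
library(survival)
# Sketch: stratify the Kaplan-Meier fit on a covariate and overplot the
# resulting pair of survivorship curves. Synthetic stand-in data:
set.seed(1)
google <- data.frame(Days   = rexp(350, rate=1/1500),
                     Dead   = sample(c(TRUE,FALSE), 350, replace=TRUE),
                     Profit = sample(c(TRUE,FALSE), 350, replace=TRUE))
byProfit <- survfit(Surv(Days, Dead) ~ Profit, data=google)
plot(byProfit, lty=1:2, xlab="Days", ylab="Fraction surviving")
legend("topright", legend=c("not for profit", "for profit"), lty=1:2)
~~~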
### Random forests
Because I can, I was curious how [random forests](!Wikipedia) ([Breiman 2001](/docs/ai/2001-breiman.pdf "Random Forests")) might stack up against the logistic regression and against a base-rate predictor (predicting that nothing is ever shut down, since ~65% of the products are still alive).
With [`randomForest`](http://cran.r-project.org/web/packages/randomForest/index.html), I trained a random forest as a classifier, yielding reasonable looking error rates:
~~~{.R}
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 31.71%
Confusion matrix:
FALSE TRUE class.error
FALSE 216 11 0.04846
TRUE 100 23 0.81301
~~~
To compare the random forest accuracy with the logistic model's accuracy, I interpreted the logistic estimate of shutdown odds >1 as predicting shutdown and <1 as predicting not shutdown; I then compared the full sets of predictions with the actual shutdown status. (This is not a [proper scoring rule](!Wikipedia) like those I employed in grading forecasts of the [2012 American elections](/2012-election-predictions), but this should be an intuitively understandable way of grading models' predictions.)
The base-rate predictor got 65% right by definition, the logistic managed to score 68% correct ([bootstrap](!Wikipedia "Bootstrapping (statistics)")^[Specifically: building a logistic model on a bootstrap sample and then testing accuracy against full Google dataset.] 95% CI: 66-72%), and the random forest similarly got 68% (67-78%). These rates are not quite as bad as they may seem: I excluded the lifetime length (`Days`) from the logistic and random forests because unless one is handling it specially with survival analysis, [it leaks information](#leakage); so there's predictive power being left on the table. A fairer comparison would use lifetimes.
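The bootstrap estimates were computed by refitting on a resample and scoring the refit model against the full dataset; in outline (sketch; synthetic stand-in data and a reduced predictor set):

~~~{.R}
# Sketch of the bootstrap accuracy estimate: resample the data with
# replacement, refit the logistic model, binarize its predicted shutdown
# probabilities at 0.5, score against the full dataset, and repeat.
set.seed(1)
n <- 350
google <- data.frame(Dead   = sample(c(TRUE,FALSE), n, replace=TRUE, prob=c(0.35,0.65)),
                     Social = sample(c(TRUE,FALSE), n, replace=TRUE),
                     Hits   = rlnorm(n, meanlog=13, sdlog=3))
accuracies <- replicate(200, {
    resample  <- google[sample(n, replace=TRUE), ]
    fit       <- glm(Dead ~ Social + log(Hits), family=binomial, data=resample)
    predicted <- predict(fit, newdata=google, type="response") > 0.5
    mean(predicted == google$Dead) })
quantile(accuracies, c(0.025, 0.975))   # the bootstrap 95% CI
~~~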
#### Random survival forests
The next step is to take into account lifetime length & estimated survival curves. We can do that using ["Random survival forests"](http://arxiv.org/pdf/0811.1645 "Ishwaran et al 2008") (see also ["Mogensen et al 2012"](http://www.jstatsoft.org/v50/i11/paper "Evaluating Random Forests for Survival Analysis: Using Prediction Error Curves")), implemented in [`randomForestSRC`](http://cran.r-project.org/web/packages/randomForestSRC/index.html) (successor to Ishwaran's original library [`randomSurvivalForest`](http://cran.r-project.org/web/packages/randomSurvivalForest/index.html)). This initially seems very promising:
~~~{.R}
Sample size: 350
Number of deaths: 122
Number of trees: 1000
Minimum terminal node size: 3
Average no. of terminal nodes: 61.05
No. of variables tried at each split: 3
Total no. of variables: 7
Analysis: Random Forests [S]RC
Family: surv
Splitting rule: logrank *random*
Number of random split points: 1
Estimate of error rate: 35.37%
~~~
and even gives us a cute plot of how accuracy varies with how big the forest is (looks like we don't need to tweak it) and how important each variable is as a predictor:
![Visual comparison of the average usefulness of each variable to decision trees](/images/google/rsf-importance.png)
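For reference, a fit along these lines comes from a single `rfsrc` call; the sketch below substitutes the package's bundled `veteran` lung-cancer dataset to stay self-contained, and the covariate names shown for the Google data in the comment are my assumptions, not the exact formula:

~~~{.R}
library(randomForestSRC)
library(survival)
# Self-contained stand-in data bundled with the package:
data(veteran, package="randomForestSRC")
# nsplit=1 corresponds to the "logrank *random*" splitting rule with 1 random
# split point printed above; importance=TRUE enables the variable-importance plot:
srf <- rfsrc(Surv(time, status) ~ ., data=veteran,
             ntree=1000, nsplit=1, importance=TRUE)
print(srf) # sample size, number of deaths, OOB error rate, etc.
plot(srf)  # error rate by forest size & variable importance
# For the Google data, the call would be analogous (covariate names assumed):
# rfsrc(Surv(Days, Dead) ~ Hits + Type + Profit + FLOSS + Acquisition + Social,
#       data=google, ntree=1000, nsplit=1, importance=TRUE)
~~~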
Estimating the accuracy of this random survival forest the same way as previously, we're happy to see 78% of its predictions correct. Building a predictor based on the Cox model, we get a lower (but still better than the non-survival models) 72% accuracy.
How do these models perform when we check their robustness via the bootstrap? Not so great. The random survival forest collapses to 57-64% (95% CI on 200 replicates), while the Cox model falls only to 68-73%. This suggests to me that something is going wrong with the random survival forest model (overfitting? programming error?), and there's no real reason to switch to the more complex random forests, so here too we'll stick with the ordinary Cox model.
## Predictions
Before making explicit predictions of the future, let's look at the [relative risks](!Wikipedia) for products which haven't been shut down. What does the Cox model consider the 10 products most at risk and most likely to be shut down?
It lists (in order of decreasing risk):
1. Schemer
2. Boutiques
3. Magnifier
4. Hotpot
5. Page Speed Online API
6. WhatsonWhen
7. Unofficial Guides
8. WDYL search engine
9. Cloud Messaging
10. Correlate
These all seem like reasonable products to single out (as much as I love Correlate for making it [easier than ever to demonstrate "correlation≠causation"](http://slatestarcodex.com/2013/02/16/google-correlate-does-not-imply-google-causation/), I'm surprised it or Boutiques still exist), except for Cloud Messaging, which seems to be a key part of a lot of Android. And likewise, the list of the 10 *least* risky (in order of increasing risk):
1. Search
2. Translate
3. AdWords
4. Picasa
5. Groups
6. Image Search
7. News
8. Books
9. Toolbar
10. AdSense
One can't imagine flagship products like Search or Books ever being shut down, so this list is good as far as it goes; I am skeptical about the actual unriskiness of Picasa and Toolbar given their general neglect and old-fashionedness, though I understand why the model favors them (both are pre-2005, proprietary, many hits, and advertising-supported). But let's get more specific; looking at still alive services, what predictions do we make about the odds of a selected batch surviving the next, say, 5 years? We can derive a survival curve for each member of the batch adjusted for each subject's covariates (and they visibly differ from each other):
![Estimated curves for 15 interesting products (AdSense, Scholar, Voice, etc)](/images/google/15-predicted-survivorship-curves.png)
But these are the curves for hypothetical populations all like the specific product in question, starting from Day 0. Can we extract specific estimates assuming the product has survived to today (as by definition these live services have)? Yes, though it turns out to require a pretty gruesome hack to extract such conditional predictions from the survival curves; anyway, I derive the following 5-year estimates and, as commentary, register my own best guesses as well (I'm [not too bad](/Prediction-markets) at making predictions):
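The hack amounts to computing *conditional* survival: the probability of surviving to time t2 given survival to t1 is just S(t2)/S(t1), read off a covariate-adjusted survival curve. A minimal sketch, using the `survival` package's bundled `lung` dataset in place of the Google data and a hypothetical `conditionalSurvival` helper:

~~~{.R}
library(survival)
# `lung` stands in for the Google data; covariates & times are illustrative.
cox <- coxph(Surv(time, status) ~ age + sex, data=lung)
# Survival curve adjusted for one hypothetical subject's covariates:
sf <- survfit(cox, newdata=data.frame(age=60, sex=1))

# P(T > t2 | T > t1) = S(t2) / S(t1):
conditionalSurvival <- function(sf, t1, t2) {
    s <- summary(sf, times=c(t1, t2))$surv
    s[2] / s[1]
}
conditionalSurvival(sf, 365, 2*365) # chance of reaching year 2, given year 1 survived
~~~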
Product         5-year survival   Personal guess   Relative risk vs average (lower=better)   Survived (March 2018)
-------------   ---------------   --------------   ---------------------------------------   ---------------------
AdSense         100%              [99%][]          0.07                                      Yes
Blogger         100%              [80%][]          0.32                                      Yes
Gmail           96%               [99%][GMAIL]     0.08                                      Yes
Search          96%               [100%][]         0.05                                      Yes
Translate       92%               [95%][T]         0.78                                      Yes
Scholar         92%               [85%][S]         0.10                                      Yes
Alerts          89%               [70%][]          0.21                                      Yes
Google+         79%               [85%][]          0.36                                      Yes^[But note that "sunsetting" of "consumer Google+" was announced in October 2018.]
Analytics       76%               [97%][]          0.24                                      Yes
Chrome          70%               [95%][C]         0.24                                      Yes
Calendar        66%               [95%][]          0.36                                      Yes
Docs            63%               [95%][DOCS]      0.39                                      Yes
Voice[^v]       44%               [50%][]          0.78                                      Yes
FeedBurner      43%               [35%][]          0.66                                      Yes
Project Glass   37%               [50%][Glass]     0.10                                      No

[99%]: http://predictionbook.com/predictions/17897
[70%]: http://predictionbook.com/predictions/17898
[97%]: http://predictionbook.com/predictions/17899
[80%]: http://predictionbook.com/predictions/17900
[95%]: http://predictionbook.com/predictions/17901
[DOCS]: http://predictionbook.com/predictions/17902
[35%]: http://predictionbook.com/predictions/17903
[GMAIL]: http://predictionbook.com/predictions/17904
[85%]: http://predictionbook.com/predictions/17905
[S]: http://predictionbook.com/predictions/17906
[50%]: http://predictionbook.com/predictions/17907
[Glass]: http://predictionbook.com/predictions/17911
[100%]: http://predictionbook.com/predictions/17912
[T]: http://predictionbook.com/predictions/17913
[C]: https://predictionbook.com/predictions/17916
[^v]: I include Voice even though I don't use it or otherwise find it interesting (my criteria for the other 10) because [speculation](http://www.wired.com/gadgetlab/2013/04/google-voice-future-uncertain/ "Will Google Hang Up on Voice?") has been rife and because a prediction on its future [was requested](http://lesswrong.com/r/discussion/lw/h3w/open_thread_april_115_2013/8p2q).
One immediately spots that some of the model's estimates seem questionable in the light of our greater knowledge of Google.
I am more pessimistic about the [much-neglected Alerts](/Google-Alerts). And I think it's absurd to give any serious credence to Analytics or Calendar or Docs being at risk (Analytics is a key part of the advertising infrastructure, and Calendar a sine qua non of any business software suite - much less the core of said suite, Docs!). The Glass estimate is also interesting: I don't know if I agree with the model, given how famous Glass is and how much Google is pushing it - could its future really be so chancy? On the other hand, many tech fads have come and gone without a trace, hardware is always tricky, and the more intimate a gadget the more design matters (Glass seems like the sort of thing Apple could make a blockbuster, but can Google?); Glass has already received a [hefty helping of criticism](http://www.businessinsider.com/nobody-really-likes-google-glass-2013-5), and in particular the man most experienced with such HUDs, [Steve Mann](!Wikipedia), [has criticized Glass](http://spectrum.ieee.org/geek-life/profiles/steve-mann-my-augmediated-life "Steve Mann: My 'Augmediated' Life: What I've learned from 35 years of wearing computerized eyewear") as being "much less ambitious" than the state of the art, worrying that "Google and certain other companies are neglecting some important lessons. Their design decisions could make it hard for many folks to use these systems. Worse, poorly configured products might even damage some people's eyesight and set the movement back years. My concern comes from direct experience."
But some estimates are more forgivable - Google *does* have a bad track record with social media, so some level of skepticism about Google+ seems warranted (and indeed, in October 2018 Google [quietly announced public Google+ would be shut down & henceforth be only an enterprise product](https://blog.google/technology/safety-security/project-strobe/ "Project Strobe: Protecting your data, improving our third-party APIs, and sunsetting consumer Google+")) - and on FeedBurner or Voice, I agree with the model that their future is cloudy. The extreme optimism about Blogger is interesting, since before I began this project, I thought it was slowly dying and would inevitably be shut down in a few years; but as I researched the timelines for various Google products, I noticed that Blogger seems to be favored in some ways: it got exclusive access to a few otherwise-shut-down services (eg. Scribe & Friend Connect); it was ground zero for Google's Dynamic Views skin redesign, which was applied globally; and Google is still heavily using Blogger for all its official announcements even into the Google+ era.
Overall, these are pretty sane-sounding estimates.
# Followups
> Show me the person who doesn't die - \
> death remains impartial. \
> I recall a towering man \
> who is now a pile of dust. \
> The World Below knows no dawn \
> though plants enjoy another spring; \
> those visiting this sorrowful place \
> the pine wind slays with grief.^[Han-Shan, #50]
It seems like it might be worthwhile to continue compiling a database and do a followup analysis in 5 years (2018), by which point we can judge how my predictions stacked up against the model, and also because ~100 products may have been shut down (going by the >30 casualties of 2011 and 2012) and the survival curve & covariate estimates rendered that much sharper. So to compile updates, I've:
- set up 2 Google Alerts searches:
- `google ("shut down" OR "shutdown" OR "shutting" OR "closing" OR "killing" OR "abandoning" OR "leaving")`
- `google (launch OR release OR announce)`
- and subscribed to the aforementioned Google Operating System blog
These sources yielded ~64 candidates over the following year before I shut down additions 4 June 2014.
# See also
- [Archiving URLs](/Archiving-URLs)
- [survival analysis of _MoR_ readers](/hpmor#survival-analysis)
- [Wikipedia and Knol](/Wikipedia-and-Knol)
# External links
- ["Google Memorial, le petit musée des projets Google abandonnés"](http://www.lemonde.fr/pixels/visuel/2015/03/06/google-memorial-le-petit-musee-des-projets-google-abandonnes_4588392_4408996.html)
- [Archive Team](!Wikipedia) ([ArchiveTeam Warrior](http://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior): [Reader](http://www.archiveteam.org/index.php?title=Google_Reader))
- Comments:
- [Hacker News](https://news.ycombinator.com/item?id=5653748)
- [Metafilter](http://www.metafilter.com/127712/In-a-few-cases-the-start-dates-are-wellinformed-guesses)
- Article coverage:
- [_Ars Technica_](http://arstechnica.com/business/2013/05/google-services-survive-if-they-make-money-arent-social/ "Google services survive if they make money, aren't social: Statistical analysis of Google products finds a shutdown rate of 35 percent") ([comments](http://arstechnica.com/business/2013/05/google-services-survive-if-they-make-money-arent-social/?comments=1))
- ["'Interesting' Software Follow-Up: Scrivener, Google's Orphans: A new, free guide to an exceptional program "](http://www.theatlantic.com/technology/archive/2013/05/interesting-software-follow-up-scrivener-googles-orphans/275563/) (James Fallows, _Atlantic_)
- _Forbes_:
- ["Is This One Emotion Driving The Google Stock Price Up?"](http://www.forbes.com/sites/haydnshaughnessy/2013/05/08/what-is-driving-the-google-stock-price-up/)
- ["Google Glass - Is It Meant To Last?"](http://www.forbes.com/sites/haydnshaughnessy/2013/05/09/google-glass-has-only-a-37-chance-of-going-five-years-lessons/)
- [_Boy Genius Report_](http://bgr.com/2013/05/07/google-services-shut-down-study/ "Statistical analysis finds Google shuts down 35% of its services")
- Acquisition background:
- ["What Happened to Yahoo"](http://paulgraham.com/yahoo.html), Paul Graham
- 37signals' ["Exit Interview" series](http://www.google.com/search?q=%22Exit+Interview%22&sitesearch=37signals.com/svn/posts/):
1. ["Exit interview: Jaiku's Jyri Engeström"](http://37signals.com/svn/posts/2883-exit-interview-jaikus-jyri-engestrm)
2. ["Exit Interview: Founders look back at acquisitions by Google, AOL, Microsoft, and more"](http://37signals.com/svn/posts/2942-exit-interview-founders-look-back-at-acquisitions-by-google-aol-microsoft-and-more)
3. ["Exit Interview: Ask Jeeves' acquisition of Bloglines"](http://37signals.com/svn/posts/2806-exit-interview-ask-jeeves-acquisition-of-bloglines)
4. ["What happens after Yahoo acquires you"](http://37signals.com/svn/posts/2777-what-happens-after-yahoo-acquires-you)
# Appendix
## Source code
Run as `R --slave --file=google.r`:
~~~{.R}
set.seed(7777) # for reproducible numbers
library(survival)
library(randomForest)
library(boot)
library(randomForestSRC)
library(prodlim) # for 'sindex' call
library(rms)
# Generate Google corpus model for use in main analysis
# Load the data, fit, and plot:
index 1) == google$Dead) / nrow(google))
cat("\nRandom forest's correct predictions:\n")
print(sum((as.logical(predict(rf))) == google$Dead) / nrow(google))
cat("\nBegin bootstrap test of predictive accuracy...\n")
cat("\nGet a subsample, train logistic regression on it, test accuracy on original Google data:\n")
logisticPredictionAccuracy 1) == google$Dead) / nrow(google))
}
lbs t, x$time)]
if (is.null(p)) { coxProbability(d, (t-1)) } else { if (is.na(p)) p
~~~

<!-- '... And this is why you want to discontinue products and services your engineers can't be motivated to maintain. Amazing. This should scare anyone who has ever left an old side project running; I could see a lot of companies doing a product/service portfolio review based on this as a case study.' https://news.ycombinator.com/item?id=7571942 no kidding -->