Tuesday, January 16, 2007

This is one of the most fascinating case studies I’ve ever read in the IT literature. Free-of-charge social networks are among the most fragile creations in the web universe. They live and die by their ability to stoke the “network effect” of snowballing invitations among people within diverse social circles. If the service shows any chronic degradation in performance and reliability, users will abandon it freely and speedily.

The article shows in painful detail how MySpace.com, without any fixed strategy, has continually evolved its distributed access, application, processing, storage, hosting, and management infrastructure to keep pace with surging membership, traffic, content, and expectations. It breaks the architectural evolution of MySpace.com into “membership milestones” (500,000 users, 1 million, 3 million, 9 million, 26 million, and so on) and shows how the service broke and was quickly fixed to avoid strangling the golden goose they had birthed.

What I found most fascinating about this case study is the following statement, in which a rival (Friendster) partly attributes MySpace.com’s runaway success to MySpace.com’s superior performance (and Friendster’s concurrent growing pains): “MySpace was launched in 2003, just as Friendster started having trouble keeping pace with its own runaway growth. In a recent interview with Fortune magazine, Friendster president Kent Lindstrom admitted his service stumbled at just the wrong time, taking 20 to 30 seconds to deliver a page when MySpace was doing it in 2 or 3 seconds.”

Once MySpace.com started to explode, they continually ran into bottlenecks in data access performance that threatened to derail them as well. The case study lays out the peril to MySpace in stark terms: “MySpace has tens of millions of people posting messages and comments or tweaking their profiles on a regular basis—some of them visiting repeatedly throughout the day. That makes the technical requirements for supporting MySpace much different than, say, for a news Web site, where most content is created by a relatively small team of editors and passively consumed by Web site visitors. In that case, the content management database can be optimized for read-only requests, since additions and updates to the database content are relatively rare. A news site might allow reader comments, but on MySpace user-contributed content is the primary content. As a result, it has a higher percentage of database interactions that are recording or updating information rather than just retrieving it…..Every profile page view on MySpace has to be created dynamically—that is, stitched together from database lookups. In fact, because each profile page includes links to those of the user's friends, the Web site software has to pull together information from multiple tables in multiple databases on multiple servers. The database workload can be mitigated somewhat by caching data in memory, but this scheme has to account for constant changes to the underlying data.”
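To make that last point about caching concrete, here is a minimal sketch of a read-through cache that drops an entry whenever the underlying profile changes. It is not MySpace.com's implementation, which the article does not describe at this level. The names ProfileStore and CachedProfiles are hypothetical, and a plain dictionary stands in for the real database.

```python
class ProfileStore:
    """Hypothetical stand-in for the profile database (an assumption)."""

    def __init__(self):
        self._rows = {}
        self.reads = 0  # counts how often the "database" is actually hit

    def load(self, user_id):
        self.reads += 1
        return self._rows.get(user_id)

    def save(self, user_id, profile):
        self._rows[user_id] = profile


class CachedProfiles:
    """Read-through cache that invalidates an entry on every write."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, user_id):
        # Serve from memory when possible; otherwise fall back to the store.
        if user_id in self._cache:
            return self._cache[user_id]
        profile = self._store.load(user_id)
        if profile is not None:
            self._cache[user_id] = profile
        return profile

    def update(self, user_id, profile):
        # Write to the store first, then drop the stale cached copy so the
        # next read picks up the new data.
        self._store.save(user_id, profile)
        self._cache.pop(user_id, None)
```

On a write-heavy site, every update evicts a cached entry, so the hit rate is lower than it would be on a read-mostly news site. That is the trade-off the quoted passage is pointing at.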

Since the beginning, MySpace.com has operated in ad-hoc fire-fighting mode, evolving its architecture to oil whatever new squeaks presented themselves. In reading this article, I scribbled down notes on the convoluted saga of ad-hoc fixes. Here (reading like the “and then, and then, and then” run-on ramblings of toddlers trying to make sense of an apparently pointless plot) are my notes on what they’ve done to keep heads above water:

§first: single database server, with two access/web servers

§then: handle access/usage growth by throwing more web servers at the problem

§then: divide database loads among single master database and two access databases that have replicated copies of data posted to master, plus more database servers and bigger hard disks

§then: vertical partitioning of separate databases among various functions of the MySpace.com service

§then: a storage area network with pool of disk storage devices tied together by a high-speed specialized network

§then: every database was given its own copy of the users table

§then: distributed computing architecture treating the website as a single app, with one user table, split into chunks of 1 million accounts, with those chunks in separate mirrored instances of SQL Server, with webserver/access server redirecting logins to the applicable database servers

§then: rewrite app in faster, more efficient computing environment (ASP.NET), with re-examination of every function for streamlining opportunities

§then: continually redistributing data across the SAN to reduce I/O imbalances, but it was a manual process

§then: virtualized storage architecture where the entire SAN is treated as one big pool of storage capacity, without requiring that specific disks be dedicated to serving specific applications

§then: caching tier

§then: faster version of database server running on 64-bit hardware with more memory access/less memory bottleneck

§then: turn off distributed denial of service protection in order to goose performance further (introducing risk)

§then: implement backup data centers/SANs tied to different power grids

§and always: in any fix, impossible to do thorough load/performance testing on each new architectural fix/stopgap, simply resigning themselves to fixing new problems ad-hoc as they spring up
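One step in the notes above, splitting the user table into chunks of 1 million accounts and redirecting logins to the right database server, can be sketched in a few lines. This is only an illustration under my own assumptions: that account IDs are sequential integers starting at 1, and that chunks map to servers in order. The article does not spell out the actual routing logic, and the function names and host names here are hypothetical.

```python
from typing import Sequence

ACCOUNTS_PER_CHUNK = 1_000_000  # chunk size taken from the notes above


def shard_index(account_id: int) -> int:
    """Return the zero-based chunk that holds this account.

    Assumes sequential integer account IDs starting at 1.
    """
    if account_id < 1:
        raise ValueError("account IDs are assumed to start at 1")
    return (account_id - 1) // ACCOUNTS_PER_CHUNK


def shard_host(account_id: int, hosts: Sequence[str]) -> str:
    """Pick the database host for an account from an ordered host list."""
    index = shard_index(account_id)
    if index >= len(hosts):
        raise LookupError(f"no database server configured for chunk {index}")
    return hosts[index]


# Hypothetical host names, for illustration only.
hosts = ["db0", "db1", "db2"]
print(shard_host(2_500_000, hosts))
```

With this layout, adding capacity for the next million accounts means appending one more host to the list, which is roughly the appeal of range partitioning. The cost is that newer, more active accounts tend to pile onto the newest servers.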

Of course, MySpace.com’s teenage customers don’t care, and don’t want to care, about any of this. The service continues to grow smartly, which may be attributed, among many factors, to the fact that it has yet to cross a “MySpace sucks, let’s leave” threshold. As the article states over and over, MySpace.com continues to experience significant performance and reliability problems, but they’ve never been showstoppers.

How long would it take for a social networking site to slow down and/or crash before it gets abandoned by its users? Are these sites so “group-sticky” that participants will tolerate poor performance for long periods? Are users’ performance/reliability expectations on these services lower than for standard corporate and e-commerce websites?

Or could the life or death of a social networking service, or of any online channel/forum, be driven more by the zeitgeist: fad, fashion, weariness, exhaustion, restlessness, new cool alternatives? Ten years ago, my 9-year-old son created a Digimon website. He’s in college now, and I don’t snoop into his doings, but I suspect that both of my kids have MySpace.com pages (no, I have better things to do than spy on them). Ten years from now, they and their peers will probably have long abandoned whatever social networking services they’re currently using.

They’ll write it off as youthful experimentation. To the extent they’ll still be participating in any online community that resembles today’s social networking services, it’ll be an act of nostalgia, more than anything. From a technical standpoint, it’ll probably be lightning-fast and scalable as can be, the fruit of lessons learned in the ‘00s by MySpace.com and other pioneers in this world of hyper-federated data service layers. And it will probably be far more navigation-friendly, as today’s chaotic MySpace homepage designs (which resemble the overcrowded Web-site home page designs that even big corporations were using in the ‘90s) settle into more consistent, pleasing patterns that everybody accepts without question.

But at that point the messy fun of the ungoverned social-networking frontier will be a distant, and slightly quaint, cultural memory. Like hippie communes in the '60s.

Wednesday, January 03, 2007

This is, of course, inane in the extreme. Zogby should be ashamed of themselves for phrasing the question in this ridiculous manner, and for pretending that the collective responses are worthy of serious consideration. While we’re on the topic, where are the “next” instances of the 6 billion-plus unique humans on the planet going to come from? Cloning seems the only truly effective approach, but I digress.

But I suppose that institutionalized idiocy has its own weird logic. What this misbegotten market research shows is that Mr. William Gates III has truly passed from the realm of limited mortals into the cultural hagiosphere. Does the following phrase sound like Christian second-coming imagery to you? “’The next Bill Gates has already been born, and time will tell what country is providing the environment of innovation, entrepreneurism and opportunity to enable him or her to flourish with the next great idea,’ said 463 partner Tom Galvin.” Or perhaps the current mortal Bill is an avatar of some timeless Hindu deity. But once again, I digress.

Clearly, the man who co-founded Microsoft has long since passed far beyond, say, Rockefeller, Carnegie, or JP Morgan in the pantheon of capitalist earthly gods. How can I make that assertion? Ask children to name one rich person known principally for his/her riches (as opposed to whatever celebrity career, be it athlete, actor, or musician, furnished that wealth). Even in their day, I suspect, those ancient robber barons were known to few children. Even the kids who frequented the Carnegie-funded libraries (such as the one in my father’s Wisconsin hometown) probably didn’t realize that a munificent human was putting books in their hot little hands.

But today’s preschoolers, everywhere, know quite well that some well-endowed individual named Bill Gates is behind all things software, including the Internet (yes, I’m assuming that few tots care about the distinctions between Microsoft and other software companies). Maybe that lowest-common-denominator overattribution will diminish over time as Gates (perhaps) withdraws from active participation in the high-tech industries, but maybe not.

The world community demands an “inventor” of this created software universe. “Bill Gates” is a serviceable “father” to it all. “He” is two easy syllables that almost everybody everywhere can say with ease and be immediately understood.

Better than G. Presper Eckert.

But in some sense this annoying opinion survey may be onto something. If Bill Gates is an avatar of some timeless presence—in Hindu terms, the “creator”—then maybe the “next Bill Gates” refers to the next manifest avatar of some counter-personage in that pantheon. The “destroyer”? Who will come along to destroy or dismantle Gates’ software industry legacy? From a financial standpoint, Gates and his wife are self-dismantling through their foundation.

From an industry standpoint, open source, software as a service, virtualization, etc., are dismantling Microsoft’s created order, steadily, erosively. “The Zogby/463 Internet Attitudes poll found that practically half of all Americans (49 percent) believe that the next great technology leader will come from either China or Japan. Twenty-one percent believe that ‘next Bill Gates’ will come from the United States while 13 percent believe he or she will come from India.”

So, given that half the world’s people are from Asia, chances of the “next Bill Gates” (the “destroyer” but also the “creator” of the next softworld order) coming from that region are a coin flip.

James Kobielus
