This paper is behind a registration wall (you can't do anything on the MySQL site without filling out a form of some kind), but it's a short, decent introduction to using MySQL for a good-sized website.

Eventually every database system hits its limits. Especially on the Internet, where you have millions of users who can theoretically access your database simultaneously, eventually your IO system will be a bottleneck. [A] promising but more complex solution with nearly no scale-out limits is application partitioning. If and when you get into the top-1000 rank on Alexa [1], you have to think about such solutions.
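
To make "application partitioning" concrete, here is a minimal sketch of the idea: the application, not the database, decides which database server holds a given user's rows. The shard names and the `shard_for` helper are hypothetical and do not come from the paper; a real deployment would map each name to a database host and connection.

```python
import hashlib
from typing import List

# Hypothetical shard names used only for illustration.
SHARDS: List[str] = ["db0", "db1", "db2", "db3"]


def shard_for(user_id: int, shards: List[str] = SHARDS) -> str:
    """Return the shard that holds all rows for this user.

    A stable hash is used instead of Python's built-in hash() so that every
    application server computes the same answer for the same user.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for uid in (1, 2, 3, 1001):
        print(uid, "->", shard_for(uid))
```

Modulo placement like this is the simplest option, but changing the number of shards moves most users to a different shard, so schemes that use a lookup table or a hashing method designed for growth are often preferred once the shard count is expected to change.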

If the clustered file system, clustered storage system, and storage virtualization movement is new to you, then this is a good intro paper. It's both a vendor puff piece and informative, so it might be worth your time.

A Quick Hit of What's Inside

Clustered storage architectures have the ability to pull together two or more storage devices to behave as a single entity. Clustered storage can be broken down into three types:
* 2-way simple failover clustering
* Namespace aggregation
* Clustered storage with a distributed file system (DFS)

Replication Under Scalable Hashing:
A Family of Algorithms for Scalable Decentralized Data Distribution
Typical algorithms for decentralized data distribution
work best in a system that is fully built before it is first used;
adding or removing components results in either extensive
reorganization of data or load imbalance in the system.
We have developed a family of decentralized algorithms,
RUSH (Replication Under Scalable Hashing), that
maps replicated objects to a scalable collection of storage
servers or disks. RUSH algorithms distribute objects to
servers according to user-specified server weighting. While
all RUSH variants support addition of servers to the system,
different variants have different characteristics with
respect to lookup time in petabyte-scale systems, performance
with mirroring (as opposed to redundancy codes),
and storage server removal. All RUSH variants redistribute
as few objects as possible when new servers are
added or existing servers are removed, and all variants
guarantee that no two replicas of a particular object are
ever placed on the same server. Because there is no central
directory, clients can compute data locations in parallel,
allowing thousands of clients to access objects on thousands
of servers simultaneously.
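
The abstract does not spell out the RUSH algorithms themselves, so the sketch below does not implement RUSH. It uses a different, well-known technique, weighted rendezvous (highest-random-weight) hashing, to illustrate the same kinds of properties: placement follows server weights, replicas of an object land on distinct servers, adding a server only moves objects onto the new server, and any client can compute locations without a central directory. The function names and server names are assumptions made for illustration.

```python
import hashlib
import math
from typing import Dict, List


def _unit_hash(key: str, server: str) -> float:
    """Hash (key, server) to a float strictly between 0 and 1."""
    digest = hashlib.sha256(f"{server}:{key}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (value + 0.5) / 2**52


def place(key: str, weights: Dict[str, float], replicas: int = 2) -> List[str]:
    """Return `replicas` distinct servers for `key`, favoring heavier servers.

    Each server gets an independent score for the key; the top scorers win.
    The score -w / ln(h) makes a server's share proportional to its weight w.
    """
    if replicas > len(weights):
        raise ValueError("need at least as many servers as replicas")

    def score(server: str) -> float:
        return -weights[server] / math.log(_unit_hash(key, server))

    return sorted(weights, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    servers = {"s1": 1.0, "s2": 1.0, "s3": 2.0}  # hypothetical weights
    print(place("object-17", servers))
    servers["s4"] = 1.0  # grow the cluster; existing scores are unchanged
    print(place("object-17", servers))
```

Because every server's score for a key is independent of the other servers, adding a server can only insert the new server into an object's replica list; it never shuffles objects between existing servers. One trade-off is that each lookup scores every server, so lookup cost grows linearly with cluster size, which is one of the costs the abstract says the RUSH variants differ on.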

Web Analytics: An Hour A Day is the first book by an in-the-trenches practitioner of web analytics. It provides a unique insider’s perspective of the challenges and opportunities that web analytics presents to each person who touches the Web in your organization. Rather than spamming you with metrics and definitions, Web Analytics: An Hour A Day will enhance your mindset and teach you how to fish for yourself.
Avinash Kaushik is an expert in web analytics and author of the top-rated blog Occam’s Razor (http://www.kaushik.net/avinash). In this book, he goes beyond web analytics concepts and definitions to provide a step-by-step guide to implementing a successful web analytics strategy. His revolutionary approach to web analytics challenges prevalent thinking about the field and guides readers to a solution that will provide truly informed and actionable insights.

Building, scaling, and optimizing the next generation of web applications. Learn the tricks of the trade so you can build and architect applications that scale quickly--without all the high-priced headaches and service-level agreements associated with enterprise app servers and proprietary programming and database products. Culled from the experience of the Flickr.com lead developer, Building Scalable Web Sites offers techniques for creating fast sites that your visitors will find a pleasure to use.
Creating popular sites requires much more than fast hardware with lots of memory and hard drive space. It requires thinking about how to grow over time, how to make the same resources accessible to audiences with different expectations, and how to have a team of developers work on a site without creating new problems for visitors and for each other.
* Presenting information to visitors from all over the world
* Integrating email with your web applications
* Planning hardware purchases and hosting options to have as much as you need without breaking your wallet
* Partitioning and distributing databases to support large datasets and simultaneous transactions
* Monitoring your applications to find and clear bottlenecks
* Providing services APIs and using services from other providers to increase your site's reach and capabilities
Whether you're starting a small web site with hopes of growing big or you already have a large system that needs maintenance, you'll find Building

Author of Web Analytics: An Hour A Day. Has a fresh and practical take on unlocking the power of web research and web analytics to create truly data-driven organizations and gain a strategic competitive advantage.

The Isilon IQ family of clustered storage systems was designed from the ground up to meet the needs of data-intensive enterprises and high-performance computing environments. By combining Isilon's OneFS® operating system software with the latest advances in industry-standard hardware, Isilon delivers modular, pay-as-you-grow, enterprise-class clustered storage systems. OneFS, with TrueScale™ technology, powers the industry's first and only storage system that enables linear or independent scaling of performance and capacity. This new flexible and tunable system, featuring a robust suite of clustered storage software applications, provides customers with an "out of the box" solution that is fully optimized for the widest range of applications and workflow needs.
* Scales from 4 TB to 1 PB
* Throughput of up to 10 GB per second
* Linear scaling
* Easy to manage

Lustre® is a scalable, secure, robust, highly-available cluster file system. It is designed, developed and maintained by Cluster File Systems, Inc.
The central goal is the development of a next-generation cluster file system which can serve clusters with tens of thousands of nodes, provide petabytes of storage, and move hundreds of GB/sec with state-of-the-art security and management infrastructure.
Lustre runs on many of the largest Linux clusters in the world, and is included by CFS's partners as a core component of their cluster offering (examples include HP StorageWorks SFS, and the Cray XT3 and XD1 supercomputers). Today's users have also demonstrated that Lustre scales down as well as it scales up, and runs in production on clusters as small as 4 and as large as 25,000 nodes.
The latest version of Lustre is always available from Cluster File Systems, Inc. Public Open Source releases of Lustre are available under the GNU General Public License. These releases are found here, and are used in production supercomputing environments worldwide.