About the Author

Fourat Zourai works as a Software Engineer for TriTUX.com, which is located in Tunis, Tunisia. For some years, he worked as a Lead Developer for an Internet Service Provider before joining TriTUX.com to focus more on designing secure and high-load architectures dedicated to web content serving and system integration. His areas of expertise and interest include: Architecture & Concept, Security, High-load, Technical consulting, Linux and Open Source, Web technologies like PHP, XHTML, XML, Javascript, Ajax, and CSS. In his spare time, he enjoys scuba diving and maintains a blog at http://tuxmining.blogspot.com/

Using Partitioning for Data Warehousing at TriTUX.com

Overview

TriTUX is based in Tunis, Tunisia. Founded in 2006 as Systems Analysis and Weblication Development, TriTUX is providing high-quality professional services for achieving flexible, creative and scalable systems. TriTUX engineers is a team of jack-of-all-trades, possessing intimate knowledge in networking, eXtreme programming, GNU/Linux and scalable solutions.

Purpose of our application

We've been using partitioning at TriTUX.com since March 2008 and we're currently using MySQL 5.1.24.

Our application is used by a Telecom Operator, to calculate and track bonus points given to subscribers for incoming calls. The application is built on a strongly partitioned MySQL database to handle 60 million calls (INSERTs) per day and to deliver a daily per-subscriber call performance-meter.

How partitioning meets our needs

We realized that a simple SELECT for SUM() on a table with billions of rows would take an eternity. By partitioning our tables, we can now perform these querys in less than 10 seconds! We use both RANGE (by month/day) and HASH partitioning schemes. We also utilize MySQL triggers to perform updates on the partitioned summary tables.

Choosing and tuning the hardware was a difficult task too, sizing memory, CPU frequency and hard disks quality were the most time-consuming issues, along side with tuning MySQL's configurations to get the most out of it.

Our environment

Our development platform is Ubuntu Linux. We develop in Java and PHP, using the Talend Open Studio ETLs development environment.

Our production system consists of one NAS connected to 6 HP Blade 64-bit multicore servers that run Red Hat Linux. Each of them is equipped with 32GB of RAM. After performing some benchmarking, we decided to use the InnoDB storage engine on top of an XFS file system. InnoDB was chosen because of its advantages to MyISAM when it comes to concurrency and data integrity. Backups are performed by taking LVM snapshots of the file systems.

Here are some relevant metrics about the size and type of our application:

60 million INSERTS per day (number of calls per day)

6 million subscribers

The database is growing ~4GB per day

We're using InnoDB with LVM snapshotting for hot backup

We did some software tuning to get the most of a 64bits multicores architecture such as using massive multithreaded software and get the most of the huge RAM space.

Why MySQL?

At TriTUX.com, we believe in the power of mature open source projects, so rejecting commercial databases from the beginning was a matter of love addiction to other alternatives. In our opinion, PostgreSQL lacks a bit when it comes to partitioning (in version 8.3) and it does not provide some of the advanced partitioning options that are available with MySQL 5.1.

What we also do with the new features in MySQL

In addition to the application above, we also utilize MySQL in several other projects we are working on.

We're continuously working on improving our SOA middleware, called PodBridge which is based on a MySQL server (and can be configured to use PostgreSQL too), the event scheduler and partitioning were features which mostly helped us in designing robust and reliable software.

Replication is our key in succeeding real time synchronization between 3 distant sites for our ERP, called Alvanet and dedicated to automotive production companys. Replicating databases lets our customer have real time access to what's happening on the other side of the planet, in another site of the company without depending on bandwidth and internet connection quality, and especially without deploying a centralized platform for that purpose.

Replication and partitioning were also our keys to success in deploying our mobile banking software, called MobiBank.