Saving Yourself with Data Replication

Data can be the currency, intellectual property, and lifeblood of many a company. One technique for making sure that your data is readily available is data replication. It is not quite the same as data backup, but it can be equally important.

Recently there has been a transition from physical products being the most critical aspect of many companies’ businesses to data being the key driver. This transformation started some time ago and has steadily progressed. While one can argue over the subtleties of whether a company actually makes a physical product or not, it is fairly clear that for almost all companies data has become, if not the key to their success, then very close to it.

An even more subtle, but perhaps more important, result of this transformation is the impact that these original “non-physical-product” companies have had on other companies. For example, the creation of spreadsheet applications has allowed companies to better manage themselves even if they make a physical product. In essence, these companies have become the tool-makers in the age of data. Using these non-physical tools, virtually all companies have become more efficient, more transparent, and better managed. Along the way, they had to create their own data, perhaps shifting the focus from the physical product as the key to the company to the data as the star of the show. I think one can safely argue that virtually all companies today rely on data to operate. The degree to which they rely on data varies, but for many, not having access to the data is equivalent to having the raw materials for their products taken away, or having factories shut down (a condition that seems to happen just about weekly in France these days).

Consequently, data has become the lifeblood of just about every company. Without it, operation comes to a screeching halt with the requisite injury to the company that results from any massive deceleration. Therefore, availability of data and access to data are extraordinarily important for just about everyone, including companies.

There are many techniques for ensuring that data doesn’t disappear and is accessible in a timely manner. Techniques such as backups, off-site copies, disaster recovery sites, and replication are all used to make sure that data is available and “safe” at all times. The simplest concept for ensuring that data is available is to have multiple copies of the data in case something happens to the original copy. This can be accomplished in a number of ways, but the fundamental goal is to ensure that a mistake, accident, disaster, or other occurrence does not cause the complete loss of data.

In an effort to stave off disaster, let’s investigate data replication. While the terms “replication” and “backup” are sometimes used interchangeably, we’ll see that they are, in fact, very different from each other, but they can be used together and often are.

Replication

Typically the word replication, in a data context, is used to mean the process of sharing data between resources (storage) to make sure that they are consistent. In essence, it’s making sure that a copy of the data on one storage pool is mirrored on a secondary storage pool. Many times this means redundant storage resources but this isn’t always the case depending upon your definition of redundancy.

However, replication is different from a backup. A backup can keep some historical information about the data, allowing you to get to a previous version of the data (such as an earlier version of a document or of an application). Replication of the data means that the copy is an exact duplicate, or as close as possible, of the current data. Consequently, no historical information is kept. Simply put, backups keep records of past versions and replication is just a mirror of the current state of the data.

Therefore replication is not a replacement for a backup, but it can be used as a complement to backups. Replication allows immediate restoration of the data as it was when the primary storage went off-line. This can happen by using fail-over storage or by taking the secondary storage pool and using it as the primary storage pool (depending upon how the storage and servers are configured, this could involve rebooting the servers).

In contrast, restoring the current state of data from backups can take a great deal of time. Moreover, restoring from a backup will only restore the data to the point at which the backup was taken. This means that data created between the last backup and when the primary storage went off-line is lost. But backups can be very useful in restoring previous versions of data or restoring data that has been erased. The classic example is a user who just did “rm -rf” in their /home directory. Replication can’t restore any of the missing data since the recursive remove also removed the data from the secondary storage pool, but a backup can at least restore a version of the data from the time when the backup was created.
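To make the “rm -rf” example concrete, here is a minimal Python sketch. It is a toy model, not how any real replication or backup product works: the `ReplicatedStore` class, the `backups` list, and the file path are all hypothetical names made up for illustration. It only shows that a mirrored delete wipes the replica as well, while an older backup copy survives.

```python
import copy


class ReplicatedStore:
    """Toy primary store whose every change is mirrored to a replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, path, data):
        self.primary[path] = data
        self.replica[path] = data      # the write is mirrored

    def delete(self, path):
        self.primary.pop(path, None)
        self.replica.pop(path, None)   # the delete is mirrored too


store = ReplicatedStore()
backups = []  # hypothetical backup history: one full copy per backup run

store.write("/home/user/report.txt", "draft 1")
backups.append(copy.deepcopy(store.primary))   # the backup is taken here

store.write("/home/user/report.txt", "draft 2")
store.delete("/home/user/report.txt")          # the accidental removal

print(store.replica.get("/home/user/report.txt"))   # None
print(backups[-1].get("/home/user/report.txt"))     # draft 1
```

Note that the backup brings back “draft 1”, not “draft 2”: everything written after the backup was taken is gone, which is exactly the gap described above.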

A common desire in using replication is to keep a copy of the data at a remote site. Exactly what “remote” constitutes depends upon your situation and requirements but the concept is that if the primary storage pool is lost due to an accident or a disaster such as a fire or a tornado, the second storage pool is in a different location and can be used in place of the primary storage. Consequently, people will sometimes refer to replication as a “disaster recovery” mechanism.

For non-database data storage there are typically two approaches to replication – (1) real-time replication and (2) point-in-time replication. The first option means that a write operation happens on the primary storage and also happens at the same time, or very shortly thereafter, on the replicated storage. The second option means that something like a snapshot is made on the primary storage and then replicated to the secondary storage. The point-in-time replication means that the secondary storage is not necessarily up to date and again you have a “gap” in data states on the storage pools similar to a backup. But this gap is typically fairly small.
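The “gap” in point-in-time replication can be illustrated with a small Python sketch. This is only a toy under stated assumptions: `SnapshotReplicator`, `replicate_snapshot`, and `gap` are hypothetical names, plain dictionaries stand in for the storage pools, and a real system would ship snapshots over a network on a schedule.

```python
_MISSING = object()  # sentinel for "key not present in this pool"


class SnapshotReplicator:
    """Toy point-in-time replication between two in-memory pools."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, key, value):
        # Writes land on the primary only.
        self.primary[key] = value

    def replicate_snapshot(self):
        # Freeze the current state of the primary and ship it over.
        snapshot = dict(self.primary)
        self.secondary = snapshot

    def gap(self):
        # Keys whose contents differ between the two pools.
        keys = self.primary.keys() | self.secondary.keys()
        return sorted(
            k for k in keys
            if self.primary.get(k, _MISSING) != self.secondary.get(k, _MISSING)
        )


pools = SnapshotReplicator()
pools.write("a", 1)
pools.replicate_snapshot()
pools.write("b", 2)          # written after the last snapshot

print(pools.gap())           # ['b']
pools.replicate_snapshot()
print(pools.gap())           # []
```

The shorter the interval between snapshots, the smaller the gap, at the cost of more replication traffic.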

The rest of this article will focus on real-time replication. The phrase “real-time” has a special meaning in IT, but I will be using it more loosely here: the replication happens as the data is written, or very shortly afterward, without any strict timing guarantee. There are two techniques for real-time replication: synchronous and asynchronous.

In the case of synchronous replication, a write on the primary storage also takes place on the secondary storage at the same time. Both writes, the one on the primary storage and the one on the secondary storage, must complete for the “write” to complete. So if the write on either the primary or the secondary storage is slow, it blocks the completion of the write operation for the application. This can have an enormous impact on performance, which is why synchronous replication is normally done over short distances (to reduce latency and improve overall performance) and over very reliable networks. Since the replication happens synchronously, there is no difference in the data between the two pools of storage. Typically, synchronous replication is used when zero difference in the data between the two storage pools is desired (or required).
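A rough Python sketch of the synchronous case follows. It assumes a made-up `StoragePool` class whose `latency_s` parameter stands in for disk and network delay, and a hypothetical `synchronous_write` helper; real replication happens at the block or file system level, not in application code like this. The point is simply that the caller waits for the slower of the two pools.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class StoragePool:
    """Toy storage pool; latency_s stands in for disk plus network delay."""

    def __init__(self, latency_s):
        self.latency_s = latency_s
        self.blocks = {}

    def write(self, key, data):
        time.sleep(self.latency_s)     # simulate a slow device or link
        self.blocks[key] = data


def synchronous_write(primary, secondary, key, data):
    """Return only after BOTH pools have stored the data."""
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(pool.write, key, data)
                   for pool in (primary, secondary)]
        for future in futures:
            future.result()            # blocks; re-raises any write error


primary = StoragePool(latency_s=0.001)    # fast local storage
secondary = StoragePool(latency_s=0.05)   # slower remote storage

start = time.perf_counter()
synchronous_write(primary, secondary, "block-0", b"hello")
elapsed = time.perf_counter() - start

print(primary.blocks == secondary.blocks)   # True
print(f"write took {elapsed:.3f} s")        # dominated by the slower pool
```

Even though the primary is fast, every write costs at least as much as the secondary's latency, which is why distance matters so much here.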

Asynchronous replication is more common because it relaxes the requirement that the writes on both the primary and the secondary complete before the application’s write is reported as successful. Asynchronous replication allows the write operation to complete on the primary storage so that the application can continue. The data is then copied, or replicated, from the primary storage to the secondary storage some time later. This means that at any instant in time the data on the primary and the secondary storage may not be exactly the same. However, the lag between them is usually fairly small, depending upon a number of factors such as the amount of data to be replicated, the network between the two storage pools, and the replication mechanism.
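The asynchronous case can be sketched in Python with a queue and a background thread. As before, this is a toy under stated assumptions: `AsyncReplicatedStore` and `wait_until_synced` are hypothetical names, and dictionaries stand in for the storage pools. The write returns as soon as the primary has the data, and the secondary catches up later.

```python
import queue
import threading


class AsyncReplicatedStore:
    """Toy asynchronous replication: acknowledge first, replicate later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = queue.Queue()   # writes not yet on the secondary
        worker = threading.Thread(target=self._replicate, daemon=True)
        worker.start()

    def write(self, key, data):
        self.primary[key] = data
        self._pending.put((key, data))
        # Return right away; the secondary may lag behind at this point.

    def _replicate(self):
        # Background loop that drains pending writes to the secondary.
        while True:
            key, data = self._pending.get()
            self.secondary[key] = data
            self._pending.task_done()

    def wait_until_synced(self):
        # Block until every queued write has reached the secondary.
        self._pending.join()


store = AsyncReplicatedStore()
for i in range(100):
    store.write(f"block-{i}", i)

store.wait_until_synced()
print(store.primary == store.secondary)   # True
```

Before `wait_until_synced()` returns, the two pools are allowed to differ. If the primary were lost at that moment, whatever was still in the queue would be lost with it.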

Because it tolerates slower networks and longer distances between storage pools, asynchronous replication is the type you are most likely to encounter. Let’s examine two common options in Linux for asynchronous replication: rsync and DRBD.
