maandag 27 oktober 2014

SQL Server : A tutorial for building a replication between 2 machines (Part I)

Introduction

Replication is a set of technologies for copying and distributing database objects and data from one database to another database. After the intial configuration a synchronization process is running to maintain consistency.The distributor is kept in sync synchroneously with the publisher.

This blogpost is the first blogpost in a series of blogposts about replication on 2 machines. This series of blogpost is about a more complex configuration of replication than the blogpost I've written earlier.

Initial data set

The initial dataset is the dataset that is initially captured on the publisher and stored on the subscriber. This is done by the snapshot agent. It is also possible to carry this out through a backup or SSIS.

Changed dataset

The log reader agent is running at the distributor. The log reader identifies INSERT, UPDATE, and DELETE statements or other changes marked for replication in the transaction log. The agent will then copy transactions in batches to the distribution database on the distributor. The distribution database is a store-and-forward queue and the final step is storing the changed data set on the subscriber.

The agents

The transactional replication of SQL Server is implemented with a Snapshot Agent, Log Reader Agent, Queue Reader Agent and a Distribution Agent:

The Snapshot Agent makes snapshot files based on the schema and data from the publisher of the databaseobjects and data. The Snapshot Agent places a shared lock on all tables ready for publishing while the BCP files are generated. This ensures transactional consistency of the data, but it ensures that other requests that attempt to write simultaneously are blocked. Depending on the size of the table this may be substantial. This is only the first time. During normal operations there are much finer-grained and very short-locks required.

The log reader Agent monitors the transaction log of each database which is configured for replication.. The Log Reader agent is responsible for copying the transactional log records in the distribution database. This is done according to the ACID principles. The LogReader connects the Publisher, and looking through the log files for records marked with the replication flag and copies the information in the distribution database. Transactions can be as long as the LogReader records when tranaction can't be removed from the log file due to problems. Growth of the log file can be the result. A possible cause may lie in the distributor that is out. This is viewable in the "log_reuse_wait_desc column column in the sys.database.

The distribution agent copies the initial snapshot files from the snapshot folder and the transactions in the distribution database to the subscriber. The distribution agent is responsible for the transfer of the distribution database to the subscription database. The distribution Agent connects with the distributor and reads the stored changes. Then, the changes will be processed in jacket order.

The Queue read agent is used together with transactional replication with updating. The agent runs on the distributor and move changes from the subscriber back to the publisher.

Transactional replication is not performed by SQL Server database engine but with windows programs that connect to the servers involved in the replication process. The agents are implemented by the SQL Server Jobs.

Important choices

For replication, there are several possible important choices, I've gathered so far: