Good design factors in the overall lifecycle of the database. Databases grow over time, and it is critical to have a strategy to deal with this. For every piece of data you need to determine how long you are going to keep it. Once you determine the lifespawn of the data you will have to put in place a means of removing it. It is a sad fact that this is often neglected until a problem arises that demands it be dealt with.

+

Good design factors in the overall lifecycle of the database. Databases grow over time, and it is critical to have a strategy to deal with this. For every piece of data you need to determine how long you are going to keep it. Once you determine the lifespan of the data you will have to put in place a means of removing it. It is a sad fact that this is often neglected until a problem arises that demands it be dealt with.

There are two basic types of data that we talk about when we are talking about retention: temporary and semipermanent.

There are two basic types of data that we talk about when we are talking about retention: temporary and semipermanent.

Line 6:

Line 6:

==Temporary Data==

==Temporary Data==

-

Any data which is only necessary for a short period of time is considered temporary data. These tables will typically have a large number of inserts that are only valid for a narrow window such as session tables. Query performance will degrade over time if you fail to prune this table regulaly. It is not uncommon for the developer to write logic which will invalidate a session after a specific period of time, but neglect to delete it. This is a mistake seasoned developers have made, so be sure to take it seriously.

+

Any data which is only necessary for a short period of time is considered temporary data. These tables will typically have a large number of inserts that are only valid for a narrow window such as session tables. Query performance will degrade over time if you fail to prune this table regularly. It is not uncommon for the developer to write logic which will invalidate a session after a specific period of time, but neglect to delete it. This is a mistake seasoned developers have made, so be sure to take it seriously.

===Strategies for Managing Temporary Data===

===Strategies for Managing Temporary Data===

Line 21:

Line 21:

===Retention Policy===

===Retention Policy===

-

The first step is to understand the retention policies that could apply to this data. If you have a corporate retention policy that applies than you simply need to follow it. Some policies will mandate specific retention durations, while others will mandate destruction. Lacking clear external direction for this you will need to make a judgement call. However you establish the policy it is safe to assume it will change in the future, so allow this to be adjusted. At this point you will also need to assess if you need to archive old data moving it out of the primary database to a secondary location.

+

The first step is to understand the retention policies that could apply to this data. If you have a corporate retention policy that applies than you simply need to follow it. Some policies will mandate specific retention durations, while others will mandate destruction. Lacking clear external direction for this you will need to make a judgment call. However you establish the policy it is safe to assume it will change in the future, so allow this to be adjusted. At this point you will also need to assess if you need to archive old data moving it out of the primary database to a secondary location.

Even data that seems like it should be permanent should have a strategy for removal. If you have a database that tracks your customers you obviously don't want to expire current customers. You do however want to purge out former customers at some point.

Even data that seems like it should be permanent should have a strategy for removal. If you have a database that tracks your customers you obviously don't want to expire current customers. You do however want to purge out former customers at some point.

Line 31:

Line 31:

====Deleting old Records====

====Deleting old Records====

-

Deleting records from a database is a DML operation that can be quite time consuming. Very large deletes can block your database on reads and prevent your application from operating normally.

+

Deleting records from a database is a DML operation that can be quite time consuming. Very large deletes can block your database on reads and prevent your application from operating normally. There are a couple simple techniques you can do to minimize this impact however.

+

* Only perform the deletes during off-peak hours

+

* Put a limit on the delete call and call it multiple times

+

Those two approaches can minimize the negative impact of the large deletes. It is worth noting that you must properly index your time fields or your delete options will be painfully slow.

====Partition====

====Partition====

-

====

+

If your database supports partitioning you can manage it via DDL operations that are near instantaneous. If you partitioned your database by months you could apply your retention policy by adding and removing months at a time. This action is very rapid consuming very little disk i/o. Partitions are an elegant solution as long as the version of database you are running supports them.

+

+

If your database does not support partitioning your application can still be written to work in a similar fashion. For example if your application creates tables named log_yyyy_mm and manages adding and dropping those tables as needed. This requires substantially more work in the application code to accomplish so it clearly is preferred to use a database that supports partitions.

Current revision

Good design factors in the overall lifecycle of the database. Databases grow over time, and it is critical to have a strategy to deal with this. For every piece of data you need to determine how long you are going to keep it. Once you determine the lifespan of the data you will have to put in place a means of removing it. It is a sad fact that this is often neglected until a problem arises that demands it be dealt with.

There are two basic types of data that we talk about when we are talking about retention: temporary and semipermanent.

Temporary Data

Any data which is only necessary for a short period of time is considered temporary data. These tables will typically have a large number of inserts that are only valid for a narrow window such as session tables. Query performance will degrade over time if you fail to prune this table regularly. It is not uncommon for the developer to write logic which will invalidate a session after a specific period of time, but neglect to delete it. This is a mistake seasoned developers have made, so be sure to take it seriously.

Strategies for Managing Temporary Data

Delete all expired entries

Delete all entries that are older than one hour

These deletes can either be done from a separate utility, or embedded inside of the application.

Semipermanent Data

Any data which you intend to keep for a long period of time but not permanently is considered semipermanent data. You will need to determine how long you will be keeping the data and then choose a strategy for managing it.

Retention Policy

The first step is to understand the retention policies that could apply to this data. If you have a corporate retention policy that applies than you simply need to follow it. Some policies will mandate specific retention durations, while others will mandate destruction. Lacking clear external direction for this you will need to make a judgment call. However you establish the policy it is safe to assume it will change in the future, so allow this to be adjusted. At this point you will also need to assess if you need to archive old data moving it out of the primary database to a secondary location.

Even data that seems like it should be permanent should have a strategy for removal. If you have a database that tracks your customers you obviously don't want to expire current customers. You do however want to purge out former customers at some point.

Strategies for Managing Semipermanent Data

Small data sets can be managed easily by simply deleting based on the age of the record. Larger data sets should have greater thought put into the management strategy.

Deleting old Records

Deleting records from a database is a DML operation that can be quite time consuming. Very large deletes can block your database on reads and prevent your application from operating normally. There are a couple simple techniques you can do to minimize this impact however.

Only perform the deletes during off-peak hours

Put a limit on the delete call and call it multiple times

Those two approaches can minimize the negative impact of the large deletes. It is worth noting that you must properly index your time fields or your delete options will be painfully slow.

Partition

If your database supports partitioning you can manage it via DDL operations that are near instantaneous. If you partitioned your database by months you could apply your retention policy by adding and removing months at a time. This action is very rapid consuming very little disk i/o. Partitions are an elegant solution as long as the version of database you are running supports them.

If your database does not support partitioning your application can still be written to work in a similar fashion. For example if your application creates tables named log_yyyy_mm and manages adding and dropping those tables as needed. This requires substantially more work in the application code to accomplish so it clearly is preferred to use a database that supports partitions.