SQL SERVER – Soft Delete – IsDelete Column – Your Opinion

Just a day ago, I was reading the blog post of Michale J Swart. If you are a regular reader of this blog, I am sure you will be familiar with him. He is a very interesting blogger for sure. He recently wrote an article about Ten Things I hate to See in T-SQL; it was really fun, but the thing which caught my eyes was the subject of isDeleted Column. First of all, let me say that I totally agree with his view point. Let me re-produce what Michale exactly suggests.

“Deleted records aren’t deleted. Look, they’re right there!”

You sometimes see this. Records that are marked as deleted with a flag. It’s sometimes needed to support an undelete feature but it still bugs me.

It’s also confusing when looking at procedures called s_OBJECT_Delete(). Without looking at the definition, does this procedure delete the record, or just mark it as deleted?”

I, in fact, encountered a similar experience very soon. I had gone for performance tuning consultation and I was reviewing a large table. I spotted one table called Orders. Naturally, the size of the table was in millions of the rows. I thought – it is fine, a table can have that many rows. Then I checked another table called customers and it had under thousand records. The question which came to my mind was how come thousand customers ordered millions of items. I asked local DBA coordinator the same question. He said oh, we just cleaned the customer table but the orders table is yet to clean – consider it as a smaller table.

I asked him how can I consider the orders table as a small table when it is too large. He said, well we have most of the column marked as isDeleted.

I just do not like this kind of design at all. I am firm believer of the architecture where only necessary data should be in single table and the useless data should be moved to an archived table. Instead of following the isDeleted column, I suggest the usage of two different tables: one with orders and another with deleted orders. In that case, you will have to maintain both the table, but in reality, it is very easy to maintain. When you write UPDATE statement to the isDeleted column, write INSERT INTO another table and DELETE it from original table. If the situation is of rollback, write another INSERT INTO and DELETE in reverse order. If you are worried about a failed transaction, wrap this code in TRANSACTION.

What are the advantages of the smaller table verses larger table in above described situations?

A smaller table is easy to maintain

Index Rebuild operations are much faster

Moving the archive data to another filegroup will reduce the load of primary filegroup (considering that all filegroups are on different system) – this will also speed up the backup as well.

Statistics will be frequently updated due to smaller size and this will be less resource intensive.

Size of the index will be smaller

Performance of the table will improve with a smaller table size.

I think we can count many other advantages. Again, this is totally my opinion and I would like to ask you question, do you have this kind of soft delete mechanism in your database? If yes, what is the usage of it and how do you handle this. This is very interesting question to me; so I may publish the result on blog post as well.

I’ve seen that sort of thing in our data warehouse, where they’re combining data from multiple clients, and therefore want to use the same structure to serve all. To “modify” the structure, they’ve added an isDeleted column which allows for a specific client to act as though some of the data is not present, without causing issues with the other clients.

However, I think this is misleading. If you know what you’re doing, then as you’ve pointed out, you just fix the design. If you don’t know what you’re doing (end user situation), then, well, you don’t know what you’re doing, and you’re not likely to be savvy enough to filter out the isDeleted column anyway. The end user will say the table has millions of rows.

The situations where I have seen this done for a good reason is in cases where there are mass deletes. Perhaps one needs to delete 25% of a very large table that also requires cascade deletes to even larger tables.

Marking the 25% as “to be deleted”, then processing those deletions in batches or during down times allows the workload to be moved to a time when it has the least negative impact on business processes.

Another use case would be where there is a business process that needs a Undelete function within a short timeframe.

But setting a deleted flag for one-off delete instead of a real delete when the business means delete is just not cricket.

I have to agree and disagree at the same time. While table size issues can be a constraint, flagging rows as deleted has far more advantages.

I use flags frequently because end users seem to rarely consider the ramifications before deleting a record.

When you use archive tables you then need your application to recognize that there are two different tables that store the same information. And it makes code and db maintenance more than twice as difficult.

This doesn’t even go into how do you deal with transition tables and dependent rows in other tables. Would you need to create a archive table for every table in your database? How would you write your code to take this into account? How would you maintain it?

As you mentioned in specific scenario, IsDeleted doesnt make sense in applications like order processing where there will certainly be cases like data archival/data getting invalidated/obselete over a period of time.

But, In my personal exprience, this column will be definitely of use when you are defining metadata of an application. For instance, I have worked on an ERP application where in, users are asked to setup/configure their metadata for the first time when the application is deployed, and then that can be revisited over a period of time. In that, one of the customer has configured specific part of metadata and has removed an option in that by selecting DELETE option. Later after almost couple of months, he came back and asked that he wanted to revoke the option which he has deleted. As we have implemented the soft delete, and also have UI where he can make that change, we were able to do that on a button click and he was amazed to see that getting activated so fast.

Critical part of the above process is, if there are any referential integrity exists across the data model, this could have been next to impossible unless we go with a crud work around by having a series of DML operations.

So my point is IsDeleted kind of columns are very handy in tables which are used for metadata setup.

In CRM application it’s quite normal that you have IsDeleted or IsDisabled column. Let’s take an example.

Let’s say for instance some of our employee decides to switch jobs. The employee has done numerous customer calls, deals etc. which are all stored in the CRM database. The employee has long list of contacts to customer like who to call and when and some personal info about the customer.

Now the employee leaves the job. What do we do regarding the CMR? We can’t just go and delete the employee. We’d have to delete all the related records also. If we delete all related records we loose those contact infos, sales infos, all the employee’s history would be gone. We can’t do that because we need that info for two reason, a) we need to be able to transfer that info to some other employee b) raporting.

So instead of deleting we have IsDisabled field in the Employee table which tells that “this row in this table is not valid anymore but all the records referencing this row are valid”.

Now we haven’t had situation where there would be too much data in the table to lower the performance. I think there would be, at least, two options in that case:

– Create Employee_History table where all the employee rows, which has been disabled are moved
– Use table partitioning and put all disables employee rows to a different partition than enabled ones (and use IsDisabled as partition index)

i am regular reader of your blog,and may be this would be my first comment on your blog…i hope i will keep doing it…:)

BTW..thxs for the post as in my scenario i have done the same this,but in my case i need to have for types of status in my DB i.e IsActive,IsSuspended,IsLocked,IsPending but all for i am maintaining one flag only i.e 0,1,2,3 likewise i have enum at front end…so what you say..for these design also i should go for four more table and what if one more status comes??

I agree with the concept of moving deleted orders to a DeletedOrders table to avoid the isDeleted column, however how do you deal with situation where the Orders table serves is a parent in foreign key relationships?

Eg, you have an Orders table, an a child OrderLog table, OrderWorkflowItems, OrderLedgerItems, etc?

If each of these tables has a foreign key relationship to the Orders table then you can’t move records from the Orders table to DeletedOrders table unless your child tables also contain “deletedOrderID” type columns and the OrderID on these tables becomes nullable OR you haver a DeletedOrderLog table, a DeletedOrderWorkflowItems table, DeletedOrderLedgerItems table, etc.

In my opinion, either of these options is messier than having an OrdersIsDeleted column.

I’m hoping you can enlighten me with a 3rd solution.

Dave

PS – instead of OrderIsDeleted as a bit column, I usually use an OrderDeletedDate column then have a second process that purges deleted order records a month after the deleted date. – This handles the case where the user hits the wrong button.

I never use the isDeleted mechanism….it gradually increases the table length….i always prefer to use the two-table mechanism that u have specified sir….I agree with u as u said a smaller table is always easy to maintain and i believe inner joins work more efficiently in that case…

It depends very much on the nature of the table and your data. I use ‘IsDeleted’ columns from time to time, typically only when the proportion of the soft-deleted rows is quite small in comparison to the table size. In those cases, the overhead of checking the Deleted column is lower than maintaining a whole other Archived/Deleted table.

In other scenarios, when the proportion of Archived rows is much higher, I will use a separate table for the reasons you mention.

The isDelete column is not as evil as it sounds. I have implemented several different solutions, depending on the system metrics and on many other factors.

Case 1: in a busy system, where the peak hours are between 12pm and 1 am, and the rest of the time is idle, I would not do anything else but mark records with a flag isDeleted during the peak times and then I will have a batch job running at night to move the records marked for deletion and physically delete them from the work table.

Think about it. The last thing you want in your busy system is to delete physical data during peak times (think about threading and parallelism, IO consumption and cache; what about fragmentation? ).

Case 2: 100% busy system at all times. Here it gets even trickier, because the above mentioned problems of bulk deletion are almost impossible at any time. Here is where I would engage SSDs and a smart mechanism to “outdate” data by duplicating it to an archive spindle drive and then removing it with conjunction with the application’s performance metrics.

Bottom line: we cannot simply disregard a technique just because it seems poor at first sight. If a technique exists and it has been in use for a while, then there must be something useful about it.
I do agree with Michael Swart, though: in his example, the biggest problem is that the procedure is named incorrectly. s_OBJECT_Delete() is a poor naming convention in regards to the next DBA which comes to look at the system. However, this does not mean yet that the implementation logic is wrong…

Oops…this is something that we just introduced! Ours was a legacy system and earlier, we only had the hard-delete concept. You delete it from the UI, it’s gone forever. However, we have now started giving soft-delete concept for new enhancements. The use of an archive table was debated I am aware, but it was later eliminated because items are not “Deleted”, we now call them “retired” – they can be reintroduced. However, this is an interesting line of thought and definitely something that one should spend a great deal of time on when working on the design.

Dave, at a previous employer we would add not both a deleted date field and modified date field to essentially every table. The tables rarely grew to large sizes so we did not encounter any serious performance problems. The reasoning for the fields given was recoverability. My translation is that the company had been burned before by bad code or careless end users. For highly critical data an “undo” capability may be worth the overhead which you detailed in your post. Also, partitioning could help with performance. The thing I most disliked about the practice was the requirement to add a where clause to every SQL statement to ensure the “deleted” row was always ignored. On more than one occasion a reporting discrepancy was caused because this filter was missing. Thank you for the great content and I look forward to your follow-up post.

There is inherent problem with the backup table you propose. In the example you mentioned, it seamed a very simple situation and the history table would aply nicely. But in some more complex scenarios, you have one row that is referenced by a couple of table, and those table have more references in other table and so on. Some situations in the previous company that I worked at, had about 10 table directly or indirectly associated with one row that had to be deleted. Should I make a complex procedure that moves the row tree of data to a bunch of backup tables?
What I do in this situation is to just flag the rows that have to be removed, and index the flag column (to have good performance accessing the active rows) and create a view that only shows the active row. I use the view in all situations accept when I explicitly have to access the history rows or the flag column.

We do have this mechanism in my company’s primary database, that was why this post caught my attention. As a matter of fact, there is an IsDeleted column in almost every table! So yes, nothing ever gets deleted in the database when it is deleted from the application.

So far we have done nothing as the database size is still small (<100gb) but I suspect we will have to deal with this soon…

Is this not an ideal situation for implementing a partitioned table? I don’t recall the details of how Microsoft has implemented this concept, but…

…if membership in a partition be determined by column value… If a column called Status be defined, with certain values to be in the “active” partition, and other values to determine membership in the “archive” partition, then would the appropriate change of status value cause the record to be transferred automatically to the archive partition?

I must confess that I have designed a database with an IsDeleted column. My question is: how would deleting a record from the Orders table and inserting it into an ArchivedOrders table work if another table has a foreign key reference on the OrderID column of the Orders table? Assume that OrderID is an autoincrement column.

I guess what is appropriate depends strongly on application requirements and what ‘delete’ means to a developer, user and a DBA can be 3 completely different things. For a user it means data does not appear when he looks at it, for a developer it is gone from his query results and for a DBA it is about where it is gone physically. There may also be reporting situations particularly in high compliance companies that need ‘deleted’ data, businesses that have lot of turnover such as call centers have lot of careless employees who do accidental deletes.In some situations it may be appropriate to keep it in same table with a flag, some situations to archive it to a different table or in some situations to dump it all completely. It might be appropriate for the team to get together and decide what is appropriate for the business and implement correctly.

As a DBA i definitely prefer a flag to restoring a huge databse for someone who did a careless delete!!

Our table structures contain a DeleteDate field in MOST tables.. for our large CRM-like SAAS database, it is crucial that we do NOT physically delete records, as this will destroy our billing history for our customers.

Each DeleteDate field is a SmallDateTime, defaulted to 6/1/2079 (the max value on a smalldatetime is roughly at 6/6/2079 at 11:59AM) and we filter all records in our queries as having a DeleteDate > GetUtcDate() With a small, simple index on the deletedate… we can accurately and quickly filter these records out.

As someone else had suggested, once our database grows large enough, we have already planned to partition these records away from the bulk of the table data.

With more than several million records in some tables using this structure, we are still not seeing any significant performance hit.

In certain systems the records cannot be physically deleted due to the business rules which govern them. We design tables to achieve like a Isdeleted kind of functionality. So i feel it depends on business needs where in in some situations soft delete kind of methods have to be employed.

I think the reason for isDeleted column is simple. You have orders linked to a price list table, then after new year they create a new price list and delete the old one (since it is no longer used). They go home happy and come back the next day…

Suddenly all of last years orders have “disappeared” from the system, mass panic ensues everywhere.

The orders table query is no longer able to join orders with an Id in the price list table, so those orders are not returned by the query. Since a system will have many changes and additions over its lifetime it is hard to know what records you really can remove from the table (archive) and which ones have to be their because they are part of a historic relationship (i.e. you have to know the changing application design to make that call).

So you fix that kind of problem forever by telling the developers to never delete a record just mark it as deleted, and even novice developers can follow those instructions.

Basically “Delete” to the user really means “Hide” and isDelete solves this problem. It would be possible to automatically archive “mistakes” by identifying records with no relationship to any other table.

Simon, wouldn’t that situation make more sense to have an effective/termdate on the price? Once you have more than one cycle of a new price, you’d surely want to know which old price belonged to which timeframe? Otherwise you’ll end up with multiple prices with isDeleted=TRUE, and couldn’t tell if you raised or lowered the price from the original.

I completely agree with you on the effective dating of the price list. I would normalize the price to a separate table such that there is NO price in the products table.. and the price is stored with a foreign key to the Product, a start and end date range.

However, I do have this SAME scenario… and my design is to copy the price extended for a product into my OrderLineItem table at the time of checkout. In this way, if the Price changes in the product table, the orders are not effected.

For the idea of an extra “deleted” table to match each data table, you could use a trigger on each of the tables to automatically insert the deleted record into its corresponding “deleted” table, instead of having to worry about putting that logic into your stored procedures. This could also be implemented without the application even knowing about it.

This seems to work for cascaded deletes as well, as long as the foreign key “deleted” tables don’t try to maintain referential integrity with their primary key “deleted” table (as the cascade operation deletes rows in the foreign key tables first, meaning that the foreign key records would be inserted into their “deleted” tables before their associated primary key record was inserted into its “deleted” table).

“Un-deleting” would of course then be the problem, in which you would definitely need some application logic to perform that task.

Implementing the ‘deleted’ column and having to remember to include the ‘where deleted = 0′ in each query is still much easier and less time consuming to implement than an extra table for everything in your database (and having to maintain those tables if your schema changes). A few unit tests along with some simple test data could verify that your queries are correct as well (which you should most likely implement anyway). Not to mention, it is very easy to un-delete records, even if they have many foreign key relationships.

But if you’re having a performance problem and need data out of a large table, perhaps some triggers are the way to go instead of messing around with your application code, and having to possibly write many statements in many different procedures. You should probably write a few “un-delete” procedures for when the need for that arises, though.

While your post was interesting reading, this sounds a lot like jumping to the conclusion of “optimising too early”.

I am looking at it from a developers view only (not a DBA or SYSADMIN’s) and keeping things in a single table is a simpler thing to write and manage (from a code level) and leads to faster development time. If it becomes an issue long term performance wise, THEN move the data into a second table – not simply because it is a DBA’s prerogative to have everything “perfect”.

Too often as people in IT progress in their trade do they appear to build castles in the sky without real world reasoning to back it up. Developers fall into this trap as well with over-architecting. I believe that solutions should be aimed at solving the problems they are designed too.

This is why DBA’s hate us developers. We shouldn’t be saying we should only optimize once it’s become a problem, we should be avoiding problems to begin with. Just because it’s easier on one end of the process doesn’t mean it’s the right thing to do.

I think it all depends on the business process and the type of application you find yourself in. But from a users point of view, delete means i dont need it anymore or dont want to see it again, DBA always want to be able to recover records but at whose expense e.g if i delete one or two mails from my box why does yahoo want to keep it for me, but from a sales application if he/she deletes a sales record, it is important for the business owner to know where deleted records go to and why they where deleted.

But moving the deleted records to another table is a whole waste of time and energy for developers to code with that mentality.

in summary you business process determines alot and your foresight as a solution provider is very needful in that effect.

I just found this article and I think it’s a great discussion. It was really helpful to me.

I only want to make a small contribution to the thread: in the case you decided to use the IsDeleted, isActive or IsWhatever flag, to avoid having to add the isDeleted=0 in all your queries one solution that we found is to create views exposing only active fields.

So you may have a view vwActiveOrders (select * from orders where isDeleted = 0) which you can use in your queries, to join it to other tables, etc without having to remember to add isDeleted = 0.

We have implemented this soft delete mechanism in our data warehouse. We are getting few invalid records from source which should not be visible to the end users.

But we have one more component which makes referene of all the records in our datawarehouse. Hence we should not hard delete these invalid records. Hence we have introduced a flag which specifies the validity of the flag.

am developing one application for vehicle tracking system. one service connected with thousands of vehicles . each vehicle sending data in 10 seconds delay. am storing values in one table called msglog and events stored in another table msgevnts. how can we improve the performance of the database ?. every month am getting more than millions of data in to that tables so please advice me

also in the case of history you want the user to be able to mark a record as no longer in user but still show the previous history for what happens .

we also have procedures in place which run hard deletes (with tombstone tables to keep history) at set periods of time to help clean up the database.

this to me is the best approach – mostly because we do have constraints on the tables and deleting first hand would fail as we dont use cascades (cascading deletes is just lazy if you ask me) you want to be fully warned if a delete would cause relationships to be invalid so thats what we do – attempt a delete with a reference then the system warns you to delete with it first before continuing with the delete.

Its more work to build the system like this but worth it in the long run.

also in the case of history you want the user to be able to mark a record as no longer in user but still show the previous history for what happens .

we also have procedures in place which run hard deletes (with tombstone tables to keep history) at set periods of time to help clean up the database.

this to me is the best approach – mostly because we do have constraints on the tables and deleting first hand would fail as we dont use cascades (cascading deletes is just lazy if you ask me) you want to be fully warned if a delete would cause relationships to be invalid so thats what we do – attempt a delete with a reference then the system warns you to delete with it first before continuing with the delete.

Its more work to build the system like this but worth it in the long run.

This is the very first comment I have on your blog, I have been reading your blog from pas few months and I found it knowledgeable and interesting.

Now about the soft delete We had Isdeleted column as flag for deleted record in past but We introduced terminatedate concept for soft delete in our database 2 year ago and It helps us a lot in keeping the audit with dates. We used “create date” and “terminate date” as column in tables and we also had “create userid” and “terminate userid” to identify who created and terminated the record. With every update we terminate the previous record and create new record with changes. We easily keep and move the data with the help of partitioning and with scheduled jobs.

Community Initiatives

About Pinal Dave

Pinal Dave is a Pluralsight Developer Evangelist. He has authored 11 SQL Server database books, 17 Pluralsight courses and have written over 3200 articles on the database technology on his blog at a http://blog.sqlauthority.com. Along with 11+ years of hands on experience he holds a Masters of Science degree and a number of certifications, including MCTS, MCDBA and MCAD (.NET). His past work experiences include Technology Evangelist at Microsoft and Sr. Consultant at SolidQ. Follow @pinaldave
Send Author Pinal Dave
an email at pinal@sqlauthority.com

Email Subscription

Enter your email address to subscribe to this blog and receive notifications of new posts by email.