It's an idea I've heard repeated in a handful of places. Some more or less acknowledging that once trying to solve a problem purely in SQL exceeds a certain level of complexity you should indeed be handling it in code.

The logic behind the idea is that for the large majority of cases, the database engine will do a better job at finding the most efficient way of completing your task than you could in code. Especially when it comes to things like making the results conditional on operations performed on the data. Arguably with modern engines effectively JIT'ing + caching the compiled version of your query it'd make sense on the surface.

The question is whether or not leveraging your database engine in this way is inherently bad design practice (and why). The lines become blurred further when all the logic exists inside the database and you're just hitting it via an ORM.

This is one of those sayings that has to be taken thoughtfully. It's whipped out whenever one finds another engineer doing 'select * from table' and then combing through the result set instead of using a where clause and specifying columns. But if you take it too far, you end up with a different mess.
–
Michael KohneOct 23 '12 at 17:08

116

Starting a phrase with "never" or "always" is almost always a recipe for a bad design.
–
vszOct 23 '12 at 18:49

Could you can expand the question to also ask people to weigh in on scalability especially distributing application across servers?
–
Kinjal DixitOct 23 '12 at 19:05

22

While its certainly possible to try to do too much in SQL, I can honestly say that in 30 years of development and consulting, I have never seen an actual serious case of it (a few minor ones). On the other hand, I have seen literally hundreds of serious cases of developers trying to do to much in "code" that they should have been doing in SQL. And I still do see them. Frequently ...
–
RBarryYoungOct 23 '12 at 21:14

4

This question is two in one - I think it should be split. 1) How much should be done in SQL? 2) How much should be done in the DBMS? Stored procedures fall in the middle. I've seen entire applications coded in stored procedures.
–
reinierpostOct 24 '12 at 12:36

Doing these things instead of relying in SQL or the RDBMS leads to writing tons of code with no added value, meaning more code to debug and maintain. And it dangerously assumes the database will only be accessed via the application.

+10000000000 for pointing out that it dangerously assumes everything will only happen through the application.
–
HLGEMOct 23 '12 at 17:09

9

@skynorth It leads to bad database design. Down the line you end up with a database which can only be accessed meaningfully by that application due to all the post-processing it does.
–
SirexOct 23 '12 at 20:42

13

@skynorth If you rely on the code to make sure your keys maintain integrity, then you are removing a fundamental principle of RDBMS from the DB. That makes no sense, because then every application that accesses the DB will have to make sure to precisely replicate that functionality. Why not just let the DB handle that, since that's what it's designed for. The DB can prevent duplicate keys natively, for example.
–
Buttle ButkusOct 23 '12 at 22:18

@skynorth: tl;dr: The rules that keep your data consistent should be implemented in the database. i.e. for 99% of applications ever written, the data (and therefore the database) lives looooooooooong after your application is dead and gone. I've seen this many, many times down the years (Hey, we need to deploy a version on Windows/iPhone/Android/whatever-the-new-thing-is, because {insert old platform here} is dying, we'll host or Oracle database here and create a new UI there). There's no reason for this trend to stop today, or any time soon.
–
Binary WorrierOct 24 '12 at 9:53

Use the tool most appropriate for the job. For data integrity, this is often the database. For advanced business rules, this is a rule-based system like JBoss Drools. For data visualisation, this would be a reporting framework. etc.

If you have any performance issues, you should then afterwards look whether any data can be cached, or whether an implementation in the database would be quicker. In general, the cost of buying extra servers or extra cloud power will be far lower than the added maintenance cost and the impact of extra bugs.

I think it would be poor design to not use the database for the things it is meant for. I have never seen any database where the rules were enforced outside the database that had good data. And I have looked at hundreds of databases.

So things that must be done in a database:

Auditing (application only auditing will not track all changes to the
database and thus is worthless).

Data ingerity constrainsts including default values, foreign key
constraints and rules which must always be applied to all data. All
data is not always changed or inserted through an application, there
are one-time data fixes especially of large data sets that are no
practical to do one record at a time (please update these 100,000
records that got mismarked as status 1 when they should be 2 due to
an application code bug or please update all records from client A to
client B because company B bought company A) and data imports and
other applications which might touch the same database.

JOINS and where clause filtering (to reduce the number of records
sent across the network)

One of the things people don't seem to realize is that doing all of your processing on the SQL server is not necessarily good, regardless of the effects on code quality.

For instance, if you need to grab some data and then compute something from the data and then store that data in the database. There are two choices:

Grab the data into your application, compute within your application, and then send the data back to the database

Craft a stored procedure or similar to grab the data, compute across it, and then store it all from a single call to SQL server.

You may think that the second solution is always the fastest, but this is definitely not true. I'm ignoring even if SQL is a bad fit for the problem(ie regex and string manipulation). Let's pretend you have SQL CLR or something similar to have a powerful language in the database even. If it takes 1 second to make a round trip and get the data and 1 second to store it, and then 10 seconds to do the computation across it. You're doing it wrong if you're doing it all in the database.

Sure, you shave off 2 seconds. However, had you rather waste 100% of (at least) one CPU core on your database server for 10 seconds, or had you rather waste that time on your web server?

Web servers are easy to scale up, databases on the other hand are extremely expensive, especially SQL databases. Most of the time, web servers are "stateless" as well and can be added and removed at whim with no additional configuration to anything but the load balancer.

So, think not just about shaving 2 seconds off of an operation, but also think about scalability. Why waste an expensive resource like database server resources when you can use the much cheaper web server resources with a relatively small performance impact

you're also forgetting network trips - you cannot scale horizontally by adding servers without some efficiency hit. So reducing the data load by adding a where clause is obvious - but the other sql operations will not necessarily reduce performance. Your point is correct in general though, but not to the point where you treat the DB as a dumb datastore. The most scalable app I've ever worked on used stored procedures for every data call (except 2 complex queries). A 3rd solution is the best - "stored proc to grab just the necessary data", no sure if you meant that as 'compute' or not.
–
gbjbaanbNov 11 '12 at 14:54

Never putting set operations in your code base would mean absolutely everything done in LINQ to collections (select, sum, where, single) should be done in SQL and not in your app, this would put a LOT of business logic into your database.
–
Jimmy HoffaOct 23 '12 at 16:10

4

The things that you describe, are not a client code. It is a Business layer, where you may have your own manipulation logic. However performing this logic on 1M+ records is going to hit you back.
–
YusubovOct 23 '12 at 16:19

@JimmyHoffa: That's not true, sometimes you generate transient information that need to be processed with the data you already have on app memory. Linq works wonders on that.
–
Fabricio AraujoOct 24 '12 at 16:57

@FabricioAraujo I'm aware of why linq is great, but this answer states to Never do set based operations in app code, if you Never did set operations in app code you would never use linq because that's the entire purpose of linq. I'm making the point that Never doing set operations in app code is a bad rule to follow
–
Jimmy HoffaOct 24 '12 at 16:59

@JimmyHoffa: No, the rule says "never do in app what the RDBMS can do well for you" . And I'm talking about transient information - not information persisted on database. I worked on systems where, to fullfill business rules, I needed to do processing on code. I remember a Business rule that I had, after do heavy processing on DB, do additional processing on that data to generate an (very important) report. I which I could use linq on that (it was done on the now-defunct Delphi.Net). In other words, linq can be used even following that rule.
–
Fabricio AraujoOct 24 '12 at 17:14

As mentioned before, the goal is to send to and receive as little as possible from the database because the round trips are very costly time-wise. Sending SQL statments over and over again is a waste of time especially in more complex queries.

Using stored procedures in the database allows developers to interact with the database like an API, without worrying about the complex schema on the back. It also reduce the data sent to the server since only the name and a few parameters are sent. In this scenario, most of the bussines logic can still be in the code but not in the form of SQL. The code would essentially prepare what is to be sent or requested from the database.

The idiom is more to do with keeping the business rules, to do with the data, together with the relations (the data and structure and relationships.) It's not a one-stop-shop for every problem but it helps to avoid things like manually maintained record counters, manually maintained relationship integrity etc, if these things are available at the database level. So if someone else comes along and extends the programs or writes another program that interacts with the database, they won't have to figure out how to maintain database integrity from previous code. The case of a manually maintained record counter is particularly pertinent when someone else wants to author a new program to interact with the same database. Even if the newly created program has exactly the right code for the counter, the original program and the new one running at approximately the same time are likely to corrupt it. There's even code out there that retrieves records and checks conditions before writing a new or updated record (in code or as separate queries), when if possible this can often be achieved right in the insert or update statement. Data corruption can again result. The database engine guarantees atomicity; an update or insert query with conditions is guaranteed to affect only the records meeting the conditions and no external query can change the data half way through our update. There's many other circumstances where code is used when the database engine would better serve. It's all about data integrity and not about performance.

So it's actually a good design idiom or rule of thumb. No amount of performance is going to help in a system with corrupt data.

As a rule, your DB has more information to work with than your application, and can do common data operations more efficiently. Your database maintains indices, for example, while your application would have to index the search results on the fly. So all else being equal, your overall workload can be decreased by pushing the work to the database rather than the application.

But as your product scales, it typically becomes easier to scale your app than to scale your db. In large installations, is not uncommon to see application servers outnumber database servers by a factor of 10 to 1 or more. Adding more application servers is often a simple matter of cloning an existing server onto new hardware. Adding new database servers, on the other hand, is dramatically more difficult in most cases.

So at this point, the mantra becomes protect the database. It turns out that by caching the database results in memcached or by queueing updates in a application-side log, or by fetching the data once and calculating your statistics in your app, you can dramatically reduce your database workload, saving you from having to resort to an even more complicated and fragile DB cluster configuration.

@user1598390 Indeed: Hardware is Cheap, Programmers are Expensive. Money can solve software complexity. Money spent on programmers. But note that we're not talking about clean code versus speghetti. We're talking about performing work on the app side versus the DB side. Software complexity is only marginally related, since both options can follow good design principles. A better question is: "which design costs more?".
–
tylerlOct 23 '12 at 18:41

Once you have a code base that is humongous and full of fat, most of it doing non-business stuff, the only thing you can do is the mother of all re-engineerings, which costs more than hardware and involves too much uncertainty, besides you will always know where to find good hardware, but good programmers is a different story... meanwhile your competitors are using their time to improve, adapt to change and make clients happy.
–
user61852Oct 23 '12 at 18:51

+1 for being the only person to mention scaling in your answer.
–
MattOct 23 '12 at 20:16

Hardware was cheap, not any longer - in the datacenter, electricity and hardware amounts to 88% of the running cost (cited by Microsoft) so spending more on programmers to write efficient code is very cost effective, and will be until we get unlimited and cheap fusion power.
–
gbjbaanbNov 11 '12 at 14:48

"Premature optimization is the root of all evil (most of it, anyway) in computer programming" - Donald Knuth

The database is exactly that; the data layer of your application. Its job is to provide your application with the data asked for, and store the data given to it. Your application is the place to put code that actually works with the data; displaying it, validating it, etc.

While the sentiment in the title line is admirable, and accurate to a point (the nitty-gritty of filtering, projecting, grouping etc should in the overwhelming number of cases be left to the DB), a definition of "well" might be in order. The tasks that SQL Server can execute with a high level of performance are many, but the tasks that you can demonstrate that SQL Server does correctly in an isolated, repeatable manner are very few. SQL Management Studio is a great database IDE (especially given the other options I've worked with like TOAD), but it has its limitations, first among them being that pretty much anything you use it to do (or any procedural code you execute in the DB underneath) is by definition a "side effect" (altering state lying outside the domain of your process's memory space). In addition, procedural code within SQL Server is only just now, with the latest IDEs and tools, able to be measured the way managed code can using coverage metrics and path analysis (so you can demonstrate that this particular if statement is encountered by tests X, Y, and Z, and test X is designed to make the condition true and execute that half while Y and Z execute the "else". That, in turn, assumes you have a test that can set the database up with a particular starting state, execute the database procedural code through some action, and assert the expected results.

All of this is much more difficult and involved than the solution provided by most data access layers; assume the data layer (and, for that matter, the DAL) know how to do their job when given the correct input, and then test that your code provides correct input. By keeping procedural code like SPs and triggers out of the DB and instead doing those types of things in application code, said application code is much easier to exercise.

Wait, wait, what? How did you get from correctness proofs to tests, which can prove that bugs exist but can never prove that code is correct?
–
Mason WheelerOct 23 '12 at 17:30

1

a stored procedure is not procedural code. A SP is a pre-computed SQL query stored and run inside the DB. It is not application code.
–
gbjbaanbNov 11 '12 at 14:56

If the SP is limited to a SQL query, then you're right. If it's T-SQL or PL/SQL including conditional breaks, loops, cursors and/or other non-query logic, you're wrong. And a LOT of SPs, functions and triggers in DBs all over cyberspace have these extra elements.
–
KeithSNov 12 '12 at 2:09

Generally I agree that the code should control the business logic and the DB should be a logic free hash. But here are some counter points:

Primary, foreign key, and required (not null) constraints could be enforced by code. Constraints are business logic. Should they be left out of the database since they duplicate what code can do?

Do other parties outside of your control touch the database? If so having constraints enforced close to the data is nice. Access could be restricted to a web-service which implements logic, but this assumes you were there "first" and have the power to enforce the use of the service on the other parties.

Does your ORM perform a separate insert/update for each object? If yes, then you will have severe performance problems when batch processing large data sets. Set operations is the way to go. An ORM will have trouble accurately modeling all the possible joined sets which you could perform operations on.

Do you consider a "layer" to be a physical split by servers, or a logical split? Running logic on any server could theoretically still fall under it's logical layer. You might organize the split by compiling into different DLL's rather than splitting servers exclusively. This can dramatically increase response time (but sacrificing througput) while maintaining separation of concerns. A split DLL could later be moved to other servers without a new build to increase throughput (at the cost of response time).

I haven't downvoted, but any database sepcialist will tell you that considering the database a logic free hash is a very poor idea. It causes data integrity problems or performance problems or both.
–
HLGEMOct 23 '12 at 17:12

1

@HLGEM. The answer describes reasons to keep logic in the database or sitting on the DB server. Still doesn't explain it.
–
mike30Oct 23 '12 at 17:45

They may not have gotten to the counterpoints as I did which is why I didn't downvote.
–
HLGEMOct 23 '12 at 17:50

I like to look at it as SQL should only deal with the data itself. The business rules that decide what the query may look like can happen in code. The regex or validation of the informaiton should be done in code. SQL should be left to just join your table, query your data, insert clean data, etc.

What gets passed into SQL should be clean data and SQL should not really need to know anything more than it needs to store it, update it, delete it or retrieve something. I have seen way too many developers want to throw their business logic and coding in SQL because they think of the data as their business. Decouple your logic from your data and you will find your code gets cleaner and easier to manage.

Why would you run a regex or validation on data that's already in the database? Constraints should prevent bad data from ever getting there, and the use of regex probably means you need more useful columns..
–
Brendan LongOct 23 '12 at 23:13

I was not saying that I would use regex or validation on data that was coming from the database. I guess I should have clarified that was for data going to the database. My point there was that the data should be cleaned and validated before it gets to the DAL.
–
Stanley Glass JrOct 24 '12 at 12:24

Never do in code what you can get the SQL server to do well for you (emphasis is mine)

The key to the answer is you need to look for SQL doing something well, as opposed to simply doing something, for you. SQL is an amazingly powerful language. Coupled with built-in functions, it can potentially do a lot of things. However, the fact that you can do something in SQL should not be an excuse for actually doing it in SQL.

My specific criteria to make a decision is to look at the amount of data that you get back and the number of round-trips: if you can cut the amount of data by shipping a task to the server, without increasing the number of round-trips, then the task belongs on the server; if the amount of data remains the same or increases without a simultaneous drop in the number of round-trips, the task belongs in your code.

Consider these examples:

You store a birth date, and you need to calculate the age for a group of users. You can have SQL server do the subtraction, or you can do it in your code. The number of round-trips stays the same, and the amount of data sent back to you goes up. Therefore, a code-based solution wins

You store a birth date, and you need to find users of ages between 20 and 30. You can load all users back on the client, do the subtraction to find the age, and then do the filtering, but shipping the logic to SQL Server would reduce the amount of data without requiring a additional round-trips; therefore, SQL-based solution wins.

When I worked somewhere the business logic became amorphous with the SQL, we had no trouble with multiple round trips; we just used multiple result sets in a single round trip, so that rule kinda breaks down there, though the spirit of the rule is pretty good in aiming for the golden mean
–
Jimmy HoffaOct 23 '12 at 16:18

1

+1 this is a fantastic answer because it gives concrete examples to support both directions.
–
BrandonJun 28 '13 at 13:37

I would rephrase that to "Never do in code what SQL Server can do for you well".

Things like string manipulation, regex work and such I would not do in SQL Server (barring SQL CLR).

The above tends to talk about things like - joins, set operations and queries. The intention behind it is to delegate much of the heavy lifting to SQL Server (at things it is good at) and reduce the amount of IO as much as possible (so let SQL do the joins and filter down with a WHERE clause, returning a much smaller data set than otherwise).

If everything SQL would do better than app code were put into the SQL layer, there's a lot of business logic which would end up in the database, for better or worse. I've seen this and yes, performance was stellar. But luckily the dev team all knew app development and SQL extremely well because the border between the two became very amorphous. I would not suggest this as a starting point but rather an ending point after the system becomes enormously popular and performance degrades over time.
–
Jimmy HoffaOct 23 '12 at 15:52

@JimmyHoffa: certainly. Dealing with resultsets more than the enough to be visualized certainly fit better on the SQL Engine rather than application code. It's an architectural choice decide what fits where - and sometimes is not easy to detect.
–
Fabricio AraujoOct 23 '12 at 16:40

19

@NathanLong I don't know why so many people still think you can't keep your SQL in source control. At first we just had all our stored procedures/table scripts/etc necessary to create the database from scratch in source control, then later used visual studio database projects. It worked fine without the projects and better with them. SQL as with every other changeable thing necessary for creating your system should be under version control! Deployment can be done with redgate diff tools for most RDBMS if you keep your create scripts under version control, don't maintain diff scripts use tools
–
Jimmy HoffaOct 23 '12 at 17:28

2

If your SQL has support for REGEX operations and string manipulation, doing them in SQL can be a good choice.
–
kevin clineOct 23 '12 at 18:29