I'm now doing a lot of SQL development at my new job, whereas before I was doing object-oriented desktop app work. I keep running across very large scripts (thousands of lines) and wanting to refactor them in some way. I'm coming to see that SQL is a different sort of beast and that these big scripts are probably fine for the most part, but while explaining this to me, people are also insisting that the whole idea of refactoring is bad. They claim that compilers like the .NET compiler are actually burdened by refactored code, and that a big wall of code is more efficient and better designed than code written for reuse, readability, and scalability.

The other argument is that OO compilers are almost dangerously inefficient, lack efficient memory management, and execute too many CPU instructions compared to older, "simpler" compilers and compared to SQL.

Are these valid complaints? Even if some compiler, like a C compiler, is modestly more "efficient" (whatever that means at this high a level, without seeing code), would you want to write applications in C over C# or Java? Is comparing an OO compiler to a SQL compiler/optimizer even valid?

I could write a mammoth answer about optimizations, compilers, language features, abstractions, and their relationships to one another. Part of it would be about how certain language features inhibit certain optimizations. Some of it would even sound vaguely like the stuff these guys have been spouting. But given the other things you cite, those similarities would be mere coincidences, not hints of deep insight on their part. I think it's far more likely that these guys are simply horribly confused (but unaware of it, sadly) about everything related to this topic.
– delnan Nov 1 '12 at 14:30

It's unclear what kind of experience they have outside of SQL. I'm pretty sure they've never written OO stuff, though.
– Brad Nov 1 '12 at 14:41

Refactoring is a bad thing, but for a totally different reason.
– SK-logic Nov 1 '12 at 14:51

The notion that a compiler could be "burdened" by refactored code must mean it would collapse if presented with code that was written so it wouldn't need refactoring in the first place.
– Blrfl Nov 1 '12 at 14:51

I'm not sure how to convince them that pretty much any modern compiler is pretty darn efficient, or how to get them to practice better coding. I'm very new to all this (this workplace and programming), so I don't have a lot of clout. At one point I even printed out the assembly generated by C# to demonstrate that short-circuit logic (&&/||) is more efficient than evaluating everything in a conditional (using &/|). I think that fell on deaf ears.
– Brad Nov 1 '12 at 14:53
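For what it's worth, the short-circuit point is easy to demonstrate without disassembly. Here is a hedged sketch in Python (standing in for the C# the comment describes), where `and` short-circuits like C#'s `&&` while `&` evaluates both operands; the `expensive_check` name and call-recording list are invented for illustration:

```python
calls = []

def expensive_check(result):
    calls.append(result)      # record that this operand was evaluated
    return result

# Short-circuit `and`: the second operand is skipped once the first decides.
_ = expensive_check(False) and expensive_check(True)
assert calls == [False]       # only the first operand ran

calls.clear()
# Bitwise `&`: both operands are always evaluated.
_ = expensive_check(False) & expensive_check(True)
assert calls == [False, True]  # both operands ran
```

If the second operand really is expensive (a function call, a query), the short-circuit form does strictly less work whenever the first operand settles the result.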

5 Answers

Refactoring is bad -- that's like getting a few scratches on the paint job of your car and insisting you must buy a new car.

Big wall of code is better -- hardly ever. The biggest benefit of refactoring is usually that it helps you remove both redundant code and redundant operations (i.e., it makes things faster).

OO compilers are not inefficient. Sure, you can write faster code in C in some cases, but a 1% performance gain isn't going to matter if you can't see that your algorithm is O(n^2) instead of O(n).
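To make the O(n^2)-versus-O(n) point concrete, here is a small illustrative sketch (the function names and the duplicate-detection task are invented, not from the answer): both functions give the same answer, but the second does linear work by using a hash set.

```python
def has_duplicates_quadratic(items):
    # O(n^2): compare every pair of elements
    return any(items[i] == items[j]
               for i in range(len(items))
               for j in range(i + 1, len(items)))

def has_duplicates_linear(items):
    # O(n): a hash set makes each membership check O(1) expected
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

sample = [1, 2, 3, 2]
assert has_duplicates_quadratic(sample) == has_duplicates_linear(sample) == True
```

On a million elements the quadratic version does on the order of half a trillion comparisons; no constant-factor compiler win recovers that.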

SQL is not a "fast" language. Doing things in SQL other than what it is designed for is a bad idea.

Basically, you write C over C#/Java when you have to. Good cases are when dealing with low-level details (drivers, embedded systems), or when you really know you need every bit of speed you can get from a piece of code. Comparing an OO compiler to a SQL compiler doesn't make sense at all. Sure, you can write complex (non-data-oriented) algorithms in SQL, but they're going to be slow, have horrible code quality, and generally suck, because SQL was NOT designed for this. It was designed to get your data from a set of tables and transfer it to a more competent language.

Why SQL isn't fast:

For one, it's not actually a fair comparison: SQL by itself isn't Turing-complete. I'm assuming you're talking about T-SQL or similar.

It enforces heavy structures for simple data types. What do you do if your algorithm needs the equivalent of a hash table or dictionary? Temporary tables are quite heavy, particularly on older versions of SQL Server.
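A small sketch of that point, using SQLite from Python as a stand-in RDBMS (the word-counting task and all names are invented for illustration): what the host language gives you in one line as a dict, SQL approximates with a temp table plus per-row DML.

```python
import sqlite3

words = ["a", "b", "a"]

# Host language: an in-memory hash table, no DDL required.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# SQL: a temporary table standing in for the hash table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE counts (word TEXT PRIMARY KEY, n INTEGER)")
for word in words:
    cur = conn.execute("UPDATE counts SET n = n + 1 WHERE word = ?", (word,))
    if cur.rowcount == 0:   # no existing row: insert instead
        conn.execute("INSERT INTO counts VALUES (?, 1)", (word,))

temp = dict(conn.execute("SELECT word, n FROM counts").fetchall())
assert counts == temp == {"a": 2, "b": 1}
```

Both produce the same result, but the table route pays for parsing, constraint checking, and storage machinery on every row touched.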

SQL is simply designed in a way that prevents certain optimizations, because it's built for fetching and updating data.

A good comparison: SQL Server versus C# for the very simple operation of getting the length of a string. Admittedly, this is not even a true comparison, since it uses the SQL-CLR bridge, which of course has overhead, but you get the point: pure SQL is over 30 times slower than calling a C# function from SQL for this operation.

I don't think you will see bigger trade-offs among reuse, readability, and scalability than in an RDBMS. Refactoring databases is not the same as refactoring other code bases.

Reuse - rarely will you work with a programming language where turning a block of code into a method causes a noticeable performance hit, even inside a long-running loop. It's the opposite in SQL: each RDBMS has its own quirks, and reusing functions (put one in a SELECT statement and you have yourself a loop) and views can drastically slow down performance, so they must be used with care.
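The "put a function in a SELECT and you have yourself a loop" point can be made visible with SQLite from Python; this is an illustrative sketch (the table, function name, and call counter are invented), showing that a scalar user-defined function is invoked once per row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

calls = {"count": 0}

def doubled(n):
    calls["count"] += 1       # one call for every row the engine feeds us
    return n * 2

conn.create_function("doubled", 1, doubled)
rows = conn.execute("SELECT doubled(n) FROM t").fetchall()

assert len(rows) == 1000
assert calls["count"] == 1000   # the "reusable" function ran 1000 times
```

The inline expression `n * 2` would have been evaluated inside the engine; wrapping it in a function turned every row into a cross-boundary call, which is exactly the hidden loop the answer warns about.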

Readability - let's assume the SQL is consistently formatted, or at least close enough. Seeing lengthy stored procs that created all kinds of temp tables and performed data updates and inserts used to bother me, because I wasn't used to it. If you want to snatch a knot in a DBA's sphincter, ask to change a table name in production because its purpose is not clear.

Scalability - you will not see many places where reuse and readability (cosmetic reasons excluded) hinder scalability/performance like they do in SQL. Along with improved code you'll need indexing, file management, and possibly clustering, but fixing code that runs poorly is a good place to start.

This doesn't mean you can't gain any benefits from database refactoring. If you can learn how to do it in a way that improves performance, it's not such a hard sell.

Thousand-line SQL scripts sound like someone is using cursors (almost certainly bad). If they are not using cursors, then refactoring should be approached very carefully: SQL is different, and things that would be improvements in other languages can be real performance killers. DRY is much less of a concern.
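For the cursor point, here is a hedged sketch using SQLite from Python as a stand-in (the `accounts` table and the interest-rate workload are invented): the first version imitates a cursor by fetching every row and issuing one UPDATE per row; the second does the same work in one set-based statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, 100.0) for i in range(5000)])

# Cursor-style: one round trip and one statement per row.
for (row_id,) in conn.execute("SELECT id FROM accounts").fetchall():
    conn.execute("UPDATE accounts SET balance = balance * 1.01 WHERE id = ?",
                 (row_id,))

# Set-based: one statement; the engine does the iteration internally.
conn.execute("UPDATE accounts SET balance = balance * 1.01")

total = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE balance > 100").fetchone()[0]
assert total == 5000   # both passes applied: every balance grew
```

Both approaches produce identical data, but the cursor version parses, plans, and executes 5,000 statements where the set-based version executes one, which is why row-by-row processing is usually the first thing to refactor away in a big script.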

As for a big wall of code in .NET: outside of loop unrolling, and ignoring the occasional bit of unnecessary and excessive abstraction, that's just crazy. The possible performance improvement of a single thousand-line function over twenty 50-line functions is entirely negligible.

Do not compare anything to do with the best way to write SQL with what you learned in object-oriented programming. It is a different beast entirely.

In database work we are mainly concerned with the code accomplishing three things:

Preserving data integrity

Database performance

Security of the data

All code should be evaluated based on these three factors above everything else.

Database people are inherently conservative about refactoring because, done incorrectly, it can drastically affect data integrity or cause huge performance slowdowns on production machines. There are ways to refactor successfully, but a system for it needs to be set up, and you have to have buy-in from the DBAs. I can suggest a book to give you ammunition to help get a refactoring system in place:

Do you think having SQL databases around as long as they have been, along with their design theory, helps with getting it right early on? I'm mainly referring to tables and relationship structures, which are difficult to change later on.
– JeffO Nov 1 '12 at 17:59

@JeffO I think having an experienced database designer attached to the project in the early stages is the best thing. All the databases I have seen where the database specialists didn't come in until the inevitable performance problems were showing in prod were badly designed. If you have a project dependent on data, the database designer should be the second person you hire, after the initial application developer.
– HLGEM Nov 1 '12 at 18:42

There are a couple things going on here, so it's back to basics time. As a rule of thumb, when you are implementing you want to make things:

Complete

Correct

Optimized

Optimizing before making things complete or correct will have a negative impact on schedule and, likely, quality. If you are being told to do this without concrete technical reasons and examples, you are dealing with superstition, not science. As for the refactoring issue: refactoring is a tool of optimization; you refactor to optimize. Objecting to it just means the developers have a poor understanding of what refactoring is.

The steps of optimization are:

Profile

Analyze

Optimize

Repeat

Anyone maintaining a large database who is not operating by those rules either is not maintaining a large database in working order, or has inherited a good system that they haven't messed up yet.
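The profile-analyze-optimize-repeat loop above can be sketched in miniature; this is an illustrative example in Python (the membership-test workload and function names are invented), where you measure first, change the code, then measure again to confirm the change was a win:

```python
import timeit

def slow_membership(needle, haystack_list):
    return needle in haystack_list      # O(n) linear scan of a list

def fast_membership(needle, haystack_set):
    return needle in haystack_set       # O(1) expected hash lookup

data_list = list(range(100_000))
data_set = set(data_list)

# Profile: measure the current code on a representative input.
before = timeit.timeit(lambda: slow_membership(99_999, data_list), number=200)

# Analyze + optimize: the scan is the hot spot; switch to a set.
after = timeit.timeit(lambda: fast_membership(99_999, data_set), number=200)

# Repeat: keep only the changes the measurements actually justify.
assert after < before
```

The point is not this particular micro-benchmark but the discipline: every optimization claim, including "a big wall of code is faster," should come with a before and an after number.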

It sounds like you have fallen into a culture that is a little superstitious about why it is successful and how the system works. Possibly they inherited a lot of code and lost the architect? Or there is one "genius" who keeps everything working through force of will? Either way, you can't fight religion on a large scale. Pick small things and do what you know is best. Be rigorous, provide data, and be successful. That will win you converts over time. At least, as long as it works, they won't mess with your mojo. ;)