What are the advantages and disadvantages of externalising SQL queries in Java (or similar general-purpose object-oriented programming language. Externalizing means storing all SQL in a separate non-java file or files. Several good examples are here.

Perhaps send this question over to programmers stack exchange? This question applies equally to C# and Python and a host of other programming languages, though the availability of a good ORM library is a factor.
–
LeonidSep 5 '12 at 16:06

1

The quality of this question needs to be improved dramatically no matter on what site it ends up being answered. See How to Ask for guidance.
–
codesparkleSep 5 '12 at 17:50

5 Answers
5

Pro:

Isolates changes to your code if you ever have to change databases, or could provide a mechanism for supporting multiple SQL dialects, optimizations, or databases with the same Java code as @Morons said

SQL-only changes can be made after deployment.

Separation of programmer duties and DBA duties as @JVXR said.

A really talented and hardworking DBA should be able to write SQL and manually load database fields into Java objects with better performance (faster run-time) than an ORM tool. The same was said of compilers vs. assembly language for years, and you don't see us writing much assembly any more. Where it matters, this is a pro, but I would say it does not often matter.

Con:

How do you meaningfully manage your database sessions if you've hidden the way your code interacts with the database in a separate non-Java file? Seems it would make this more difficult.

The point of ORM (Object-Relational-Mapping) tools like Hibernate is to let you work with Java objects and query abstractions and let the ORM tool do the queries for you. Querying a table and loading each column into the proper field of your Java object is a lot of programming work. Maintaining that code is lots of work too. ORM tools do nice things like provide proxy objects that eliminate unnecessary queries, while letting you work with database data as type-safe Java objects - without you writing any of the code to do that. If you then start writing lots of manual SQL queries, you are defeating the purpose of ORM. Some SQL almost always ends up being written and that could go in an external (non-Java) file. But usually the less, the better. It's a question of what you want to spend your time on. If you use hardly any SQL, it doesn't matter so much where you put it.

In most applications, the "real work" is carried out in the database queries and updates and the code closest to those queries and updates. Hiding those queries in another file seems to obscure it - encouraging the programmer to be ignorant of the database. That could be good division of labor, or even a useful abstraction (such as what ORM tools provide) or it could be an unnecessary hurdle to understanding. Good naming of the SQL routines could mitigate this issue somewhat.

Increased separation of Java and Database code could be a barrier to refactoring. Instead of searching for the field, you need to find the queries that use it, then find where each query is used. An extra step.

Thoughts

I've been moving in the opposite direction - I generate my ORM Java objects from the database after every database change and I have it also generate SQL and HQL (Hibernate Query Language) tokens for each column and each table. In every SQL query or HQL symbolic query, I append the token instead of the field name. This way I can search for where each field is used in the application, either as a Java object field, in a HQL query, or in a SQL query.

Here is a typesafe Hibernate-style Criteria query example, querying un-deleted records from the "thing" table from a single company that have an owner:

In-Java SQL Example

If I had to drop down to SQL, e.g. to query thing IDs, it would look like the following. Please note that since symbol is a final String, the concatenation is done by the compiler, not at runtime, even in Java, even with the + operator:

I'm not showing or think much about how the external SQL file query gets made available to Java. Presumably a tool like WEB4J would take care of that, but you need to specify the SQL file or files somehow and your tool makes those queries available, I'm imagining as a String in this example, but it could be a prepared statement or some other abstraction.

Compare this to the Pure-Java ORM Example or the Criteria Query Example - this is a ton of code, spread across four files! If you need the performance and the control, then invest the energy. But when ORM can save me from this, I think I'll let it.

Debugging

I assume here that you have a tool that reliably generates your Java database objects for you.

With the Pure-Java example, there is no debugging of SQL or queries unless performance is an issue. You could still perform the wrong tests (e.g. forget to exclude ownerless things), but it's pretty straight-forward - keeps you focused on what you are doing, thus minimizing this possiblity. You can ask your ORM to spit out the actual queries in your log file if you need to tune them or see that you query the thing table 100 separate times or something like that, but this code results in a single query.

It's possible to pass the wrong type of object or nulls to the Criteria query. Also, since it's wordy, it takes longer to see what you are actually doing and bugs are easier to creep into your criteria.

With In-Java SQL, you have all the issues above, plus getting the SQL syntax right. I admit my symbols make this a little more complicated, but I feel the refactoring plus is bigger than the readability minus.

With the External SQL example, you have all of the error possibilities mentioned above, plus all the extra code from loading the results into the object. Plus the multiple files that can get out of synch. What if you rename, delete, or add a field - will you get a compilation error? Will you find all the places you need to change? Or do you have to find it in testing.

I'd have to say that from a debugging point of view, ORM with pure Java is a clear win, with additional debugging every step you move away from there.

Conclusion

Using ORM and symbols has saved me countless hours of refactoring difficulties and lets me make database changes with ease because when something breaks, it won't compile. When code gets big, refactoring is the only way to manage complexity, and it gets harder as it gets bigger, so above a certain size of code base, ease of refactoring becomes a primary goal.

ORM also eliminates opportunities for bugs and the performance cost for this is usually minimal. You can still drop-down to SQL as necessary, but I think having less SQL in your application is a better goal than putting SQL in separate files.

I totally agree to your point. but considering the aspect that, in the future we decide to change the database. We could do it by just changing the queries in the external file.
–
Shamis ShukoorSep 6 '12 at 13:58

True, if making that change had no side effects.
–
GlenPetersonSep 6 '12 at 15:08

will there be considerable amount of lag (on an average) due to externalising the sqls (the I/O) ?
–
Shamis ShukoorSep 10 '12 at 14:33

I see no reason for significant lag. I was concerned more with the possibility of introducing a bug when the program expected results formatted one way and the changed query now suddenly sometimes returns null when it never did before, or false instead of null or b'0' instead of false or has other changes. All my concerns can be summed up by saying that the program and SQL are generally designed and developed together and that separating them discourages the programmer/DBA from thinking about the side effects a change in one arena might cause in the other.
–
GlenPetersonSep 12 '12 at 3:42

Those are cons for work separation per layer. Not necessarily cons of externalizing sql. A developer could put the sql in files or storedProcs without passing the sql work to someone else. The ability to facilitate work separation is a pro, but the negatives of work separation is not a con for external sql.
–
mike30Sep 5 '12 at 20:40

"the application will beat all other ORM frameworks hand down in performance" [citation needed]
–
TMNSep 6 '12 at 12:08

Having queries as stored procedures in the database is preferable because:

1> They are pre-compiled, hence will probably run faster.
2> You can update them independent of the code base. It is usually easier to run a sql script than it is to reinstall an application, particularly in a managed corporate style environment.
3> Easier to debug/test/performance tune - you can run the query easily free of the application.
4> Allows the database structure to be changed without needing to update the application.
5> Allows implementation on different database engines with one code base for the application, with the negative of multiple code bases for each DB. The syntax used for different db's tends to be different enough that some tweaking will be needed unless you use the simplest of SQL (particularly with stored procedures).

The main con I can see is that the sql is not hidden in the compiled code, so is easier for an external party to read and reverse engineer the application which may or may not be important for a commercial application.

the problem with stored procedures is that we have to maintain that in multiple environments, this makes it difficult to keep its integrity. Also the fact that multiple versions would be accessing and implementing different procedures for different versions would be a lot of trouble
–
Shamis ShukoorSep 6 '12 at 14:02