Wednesday, February 20, 2008

Application Developers vs Database Developers

It started innocently enough with this article. I sent it out to about 20 colleagues.

The best line from the article: "Jerry: 'Yeah, databases cause lots of headaches. They crash all the time, corrupt data, etc. Using text files is better.'"

One of my more recently arrived colleagues (I'll call him Mr. M) replied to everyone with this statement:

"Kind of funny actually, databases are less and less important at the large investment banks, where they basically load everything up into a data grid across a several hundred node cluster. Writing to the db is way too slow."

This started a day-long exchange of emails. What follows is the entire thread (up until my last post tonight).

Me:"I would just argue that they don’t necessarily know how to write to databases. I would however love to see benchmarking done on both methods. Would be an interesting test..."

Mr. M:"Well, my understanding is they just can’t scale out the db enough. Even something like Oracle RAC won’t work. And outside of the military, these are probably the top 1% of programmers in the world building this stuff."

Me:"A benchmark would be the only way I would believe it.

If you said the top 1% of database developers tried it and failed, I would be more likely to agree."

Mr. M:"Do you need a benchmark before you would believe in-memory retrieval is faster than disk retrieval? Essentially, this is what we’re talking about.

'If you said the top 1% of database developers tried it and failed, I would be more likely to agree. My experience is that application developers != database developers. Different type of thinking involved.'

Why? It’s an issue to do with application performance not simply database performance. Database concerns are a subset of application concerns, essentially a specialization, requiring less encompassing knowledge. ;)

From the article you linked to (http://www.watersonline.com/public/showPage.html?page=432587)

"Better data management is the answer, says Lewis Foti, manager of high-performance computing and grid at The Royal Bank of Scotland (RBS) global banking and markets. "For very large compute arrays, the key issue is data starvation and saturation. This problem requires data grids with high bandwidth and scalable, parallel access,...Banks are learning that data management in a distributed grid environment is very different from online transaction processing. "With so many data sources, distribution channels, demands for aggregation and analytics, surges in data volumes and complex dynamics between the flows, we need to manage 'data in motion' and give up the notion that data is somehow stored. It's dynamic, not static," says Michael Di Stefano, vice president and architect for financial services at GemStone Systems...There is even some debate over how small a unit of work can be put on today's grids. Di Stefano at GemStone, for example, says, "One client has gone from 200 trades per second in a program trading application to more than 6,000 trades per second. This shows what the technology can do."

Me:"Good points. If it is in-memory it would be faster. I have not had the pleasure to work on such a system.

I do disagree with the database concerns being a subset of application concerns. The data drives the app. We’re probably getting religious at this point (or am I)."

Mr. M:"'The data drives the app.'

Exactly, but who’s to say where the data comes from or in what format? My application data may reside completely in xml files, or maybe I get it from some third party web services a la the en vogue “mashup.” Heck, I may not even need to worry about a database anymore…. http://www.amazon.com/gp/browse.html?node=16427261 The database is only one particular concern of the overall application. And it’s the application that matters. Data is useless if it just sits on a disk somewhere. It’s the ways in which the application lets the users view and manipulate the data that adds value to the business.

Yep, definitely a different type of thinking between application developers and database developers."

Mr. M:"Exactly. Businesses are not static, nor are the markets they compete in. Changing applications are a function of changing business processes and changing markets.

'data stays the same.'

Nonsense. Otherwise UPDATE would not be an SQL reserved word. If you mean database technology stays the same, well, I’m more inclined to agree with that.

'Think Green Screens, EJBs, Ruby...what’s next?'

Whatever comes along to let the business more effectively respond to current market realities. Application platforms have evolved much faster than database platforms have. They’ve had to: their sphere of operation is much broader than that of databases, so this is only natural. Databases in the internet era function in essentially the same role they did in the era of dumb terminals. Clearly application platforms have evolved orders of magnitude more. Hence the statement, database concerns are a subset of application concerns.

Here’s a simple test... if I take some business application and I’m forced to throw away one or the other, either the database or the appl- wait a second, it doesn’t even make sense to finish it, does it? The business can live without the database. I could do all kinds of things with the data, I could stick it anywhere. The business can’t live without the application though. Another way to look at it is, what do the business users look at, test, approve, and use? The database? Of course not, they look at the application. They couldn’t care less whether the data sits on disk in an RDBMS, xml, or flat files."

Me:"We obviously violently disagree.

Without the database (and I use database and data interchangeably), the business could no longer function. The app is meaningless. How would you contact your customer? You couldn’t find it.

'Exactly. Businesses are not static, nor are the markets they compete in. Changing applications are a function of changing business processes and changing markets.'

Poorly designed applications…that is all."

A Feisty Colleague:"Using data and database interchangeably is incorrect. A database is a mechanism for data storage. XML data sets and flat files are mechanisms for data storage, too. So is a file cabinet, because the data doesn’t have to be electronic, it could be … gasp! … on paper, and the application to use that data would be hands for holding the paper and a pencil to update and add data to the page."

Me:"No, it isn’t. I take into account xml files, flat files, web services (but not paper, unless it’s scanned) and all that. It would be consumed by the database and then accessed by the application via SQL."

Mr. M:"Implicit acknowledgment that the disk I/O operations that come with traditional database access simply can’t match the performance of in-memory data access (a point you were previously unconvinced of but now seem perfectly willing to accept once you see it’s got Oracle’s imprimatur on it).

Of course, why any application developer would want to program against an SQL interface if they weren’t forced to is beyond me. It is orthogonal to the programming model of most application platform languages.

Surely Oracle recognize this fact too or they wouldn’t be buying Tangosol and other data grid technologies. Of course, most of those products are far more technically advanced than TimesTen or anything Oracle has in that space.

Incidentally, it’s illustrative to note that Coherence and other products like it were for the most part designed and built by application programmers. The development of all these products is pretty much driven by the needs of the large investment banks on Wall Street. These trading applications simply had too many concurrent transactions to use an RDBMS (a problem quite a number of public domains now share, most famously google.com, nope, no RDBMS there, yet miraculously there is still data). The database just simply would not scale to such a degree. So the application developers, by necessity, came up with an alternate solution that did work, a fully transactional cache of data replicated across a cluster with node numbers in the thousands, and no relational model whatsoever to speak of. A perfect example of how database concerns are only one, sometimes small, concern amongst many that application developers must be aware of and ready to solve."

Me:"Like you said initially, the top 1%.

Many of us will never touch a system like this.

I will certainly concede that it is faster (still would love to see benchmarking though), but that still leaves 99% of the applications out there that do not require that kind of performance."

Me (again):"And don’t forget, I use data and database interchangeably. Applications are nothing without the data right?

As to the object/relational impedance mismatch...well, more people that don’t know how to work in sets. Looping is what they understand. I understand the application side more than you seem to give me credit for.

I’m not saying applications aren’t important, they are. Data (databases) and applications go hand in hand. If the application went away though, they could still access their data via SELECT statements (yes, via an application client tool), however painful that may be. Applications make retrieving data that much easier for our users.

If anyone wants to unsubscribe from this mailing list, just let us know. This is fun for me (I’m guessing Mr. M too)."

Needless to say it was a fun day. It didn't get [too] personal. More than anything I'm happy to have an equally passionate colleague.
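
For anyone who hasn't run into the sets-versus-looping distinction that came up in the thread, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, its columns, and the function names are all made up for illustration; none of it comes from the systems discussed in the emails, and it says nothing about how fast either approach is.

```python
import sqlite3


def make_db():
    """Build a small in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts (region, balance) VALUES (?, ?)",
        [("east", 100), ("west", 200), ("east", 300)],
    )
    return conn


def add_bonus_looping(conn, region, bonus):
    """Row-by-row style: fetch the rows, then issue one UPDATE per row."""
    rows = conn.execute(
        "SELECT id, balance FROM accounts WHERE region = ?", (region,)
    ).fetchall()
    for account_id, balance in rows:
        conn.execute(
            "UPDATE accounts SET balance = ? WHERE id = ?",
            (balance + bonus, account_id),
        )


def add_bonus_set_based(conn, region, bonus):
    """Set-based style: one UPDATE describes the whole change."""
    conn.execute(
        "UPDATE accounts SET balance = balance + ? WHERE region = ?",
        (bonus, region),
    )


def balances(conn):
    """Return all balances ordered by id, for comparing the two styles."""
    return [row[0] for row in conn.execute("SELECT balance FROM accounts ORDER BY id")]
```

Both functions leave the table in the same state. The set-based one says what should change in a single statement and leaves the row iteration to the database engine.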

9 comments:

Google does use MySQL for their AdWords program. I would say the way data is used is a natural evolution, but the database will be around for a long time. Data just keeps growing and growing, and so does the need to store it. Databases themselves evolve. Look at the functionality added in Oracle 8i, 9i... SQL Server 2000 to 2005, now going on 2008. Look at the vertical / columnar databases. There is just a plethora of technology out there. I wouldn't say Oracle is dead by any means, just adapting.

Now if they can just fix their licensing and pricing model we'll be all set.

An interesting discussion. The conclusion that I came to when we originally built Coherence is that the database is never going away, so the data grid becomes even more a necessity in enabling effective scale-out in front of that master system of record.

BTW - the "Lewis Foti" quote that you referenced is very appropriate, as RBS standardized on Oracle Coherence ;-)