I recently started learning about ORM through Terry's book Information Modeling and Relational Databases (2e). It's a great resource and really satisfies my craving for sound logical structure in database modeling. On to the question:

In a small start-up business I work with, we are creating an online presence that, in my opinion, needs a very solid relational database. It tracks users, contributions of users, conversations between them, data facts, sources referenced by those facts, etc. I feel that ORM is a very capable vehicle for this. At the moment I'm leaning toward implementing it logically with PostgreSQL.

The owner of the company, who funds all the development, is always looking for the next big thing on the internet to use in his plan for the company. Basically, if Facebook or Google does it, then so should we. He recently came across an article promoting the use of noSQL design. I'm not 100% familiar with it, but I do know that many basic tenets of relational design are not present (such as concurrency control, ACID compliance, etc.). He hasn't decided to go in that direction yet, but I know he'll want to talk about it, especially since several of the main social networking and microblogging sites have migrated to noSQL-type architectures (Cassandra, for instance). I'm hoping someone here can give me some ammunition to help keep this in the "relational" modeling category.

As a side note, I do understand that if the site and user base grow as large as we hope, that growth would necessitate a conversion to another architecture. So my question is two-fold:

What are the advantages for a small to medium sized web company managing data online for several end users to design and maintain their databases in ORM and relational-type architecture?

If we do have to migrate later, have there been any successful attempts to adapt ORM to some of these more disconnected cluster database systems? If so, when would it be wise to switch over? Do the gains in speed and scalability really compensate for the loss of non-redundancy and possible data contamination?

I apologize for the long-winded post, and appreciate any viewpoints or professional opinions on the matter. Thank you.

Sorry it has taken so long to get back to you. I have only just seen your post! (Grovel)

NoSQL is new to me, but having just looked at Wikipedia, I see that it is some kind of movement against the relational model.

The relational model is based on sound mathematical principles (as is ORM). So I infer that the NoSQL folks do not like following mathematical principles.

Deductive logic only works when there is a well-structured set of well-defined axioms and propositions. So I infer that the NoSQL folks don't believe in well-defined axioms and propositions.

So my initial conclusion is that NoSQL seems to be more like a religious movement than a serious attempt to solve problems of complexity management.

Regarding your two questions:

1: I'm convinced that there is a good market opportunity; however, the "advantages" can only be measured in terms of your business model and your marketing model.

2: Sorry, I don't understand this question at all:

ORM is widely used for data modeling and database management. ORM can be used to manage various kinds of physical implementation (e.g. clusters). But migrate later from what to what? ORM is for conceptual modeling. Scalability is a physical matter.

Non-redundancy: If you have redundant data then queries will not give the "right answer". I'm not sure what you mean by "data contamination".

No apology necessary. Thank you for answering my questions. You may feel that you didn't answer them, but I think you did. "Complexity Management" is a good way to phrase the logic and the rules necessary to encapsulate a business model. In my first question, I was essentially asking for confirmation that ORM is valuable to a domain where logic and relationships dictate data flow. I now know that I wasn't crazy in choosing this method.

My second question is aimed at being prepared for the future. Say, for instance, that I use the ORM method and then implement it in MySQL. That works fine for a small website, even some medium sites. But say we reached the exhaustion point of the MySQL database server, and wanted to move all of our data into a format that could easily scale across X number of servers. At least two paths lie ahead: go for a larger corporate RDB like Oracle, or go to a non-RDB (noSQL) solution. The latter claim to be much faster, more scalable, more easily managed, and so on. Plus many big-name sites have moved their infrastructure from MySQL to noSQL. This is what I meant by migration of my MySQL data to a noSQL DB system (to support the greater load of users).

I was just wondering if someone in a similar position, who wanted to go from an RDB to a non-RDB system, had tried to reconcile all the rules and logic they had set up with ORM into their new DB server. Because if people are going from MySQL to noSQL, it's possible, right? [I'm being sarcastic here.] I know there will be a loss in functionality. But the question is how bad is the loss? Can things be moved around and adapted at the logical level so that the majority of the conceptual model stays intact?

I then ask "when would it be wise to switch over?", which I realize isn't a question that can be answered easily. That takes more research on my part. So, don't worry about that one.

Redundancy is possible in these noSQL implementations (as far as I know), which can be a problem. "Data contamination" is my unsuccessful attempt to describe what happens when a user inputs data that doesn't conform to the static rules of the domain. I imagine that there are some verification procedures and various constraints you could try, but it wouldn't be the same level of control. I was just looking for a pro/con list if anyone happened to have one.
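To make the "level of control" concrete, here is a minimal sketch of the kind of static rule a relational engine enforces for you. It uses Python's built-in `sqlite3` module purely so the example is self-contained; the thread discusses PostgreSQL and MySQL, and the table names, column names, and the 1-to-5 rating rule are all hypothetical, not part of anyone's actual schema.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE app_user (
    user_id INTEGER PRIMARY KEY,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE contribution (
    contribution_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES app_user(user_id),
    rating  INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5)  -- hypothetical domain rule
);
""")

with conn:  # commit the seed row
    conn.execute("INSERT INTO app_user (user_id, email) VALUES (1, 'a@example.com')")


def try_insert(user_id, rating):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO contribution (user_id, rating) VALUES (?, ?)",
                (user_id, rating),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, `try_insert(1, 4)` is accepted, while `try_insert(1, 9)` (rating out of range) and `try_insert(99, 3)` (no such user) are rejected by the database itself. In a store without declarative constraints, equivalent checks would have to live in application code on every write path.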

For a start, if you envisage migrating to a "big" RDBMS then I suggest that you start with a big RDBMS and save yourself the hassle of hitting the limits of a "small" RDBMS.

I use SQL Server. You could use the free SQL Server Express to get most of the "big" RDBMS goodies at the beginning and know there is a clear migration path to its "paid for" big brother. However, the developer version of SQL Server 2008 costs very little.

All that stuff about scalability and "redundancy for performance" belongs in the physical layer. So I'm wondering if there is a mix-up here between layers. You can fiddle around with the physical layer as long as you maintain a meaningful link to a validated conceptual layer.

A relational database is just a deductive logic machine. So if you use a database where deductive logic no longer works, then you cannot guarantee that the "answers" that you get from queries, updates and reads will be true. Any so-called "database" (or DBMS) that does not allow you to maintain a single version of the truth is worse than useless. You might as well not waste your time with such a database and just guess the answers. Guesses are cheap and your guesses are likely to be right as often as are the results you get from a non-deductive logic DB.

Terry discusses "controlled redundancy" and "safe redundancy" on pages 297 and 761-767 of the BBB. The basic message is that you must start with a well-designed data model before looking at the physical layer and using "redundancy for performance".
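As a rough illustration of the idea (not taken from the book), one common way to keep a redundant value under control is to let the DBMS maintain it. The sketch below, again using Python's built-in `sqlite3` module, stores a derived `message_count` for faster reads and keeps it in step with the base table through triggers. The schema and all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversation (
    conversation_id INTEGER PRIMARY KEY,
    message_count   INTEGER NOT NULL DEFAULT 0  -- redundant: derivable from message
);
CREATE TABLE message (
    message_id      INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversation(conversation_id),
    body            TEXT NOT NULL
);
-- The triggers are what make the redundancy "controlled": the stored
-- count cannot drift away from the rows it summarises.
CREATE TRIGGER message_added AFTER INSERT ON message
BEGIN
    UPDATE conversation SET message_count = message_count + 1
    WHERE conversation_id = NEW.conversation_id;
END;
CREATE TRIGGER message_removed AFTER DELETE ON message
BEGIN
    UPDATE conversation SET message_count = message_count - 1
    WHERE conversation_id = OLD.conversation_id;
END;
""")

with conn:
    conn.execute("INSERT INTO conversation (conversation_id) VALUES (1)")


def add_message(conversation_id, body):
    """Insert a message and return its generated id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO message (conversation_id, body) VALUES (?, ?)",
            (conversation_id, body),
        )
    return cur.lastrowid


def delete_message(message_id):
    with conn:
        conn.execute("DELETE FROM message WHERE message_id = ?", (message_id,))


def counts(conversation_id):
    """Return (stored redundant count, count derived from the base table)."""
    stored = conn.execute(
        "SELECT message_count FROM conversation WHERE conversation_id = ?",
        (conversation_id,),
    ).fetchone()[0]
    derived = conn.execute(
        "SELECT COUNT(*) FROM message WHERE conversation_id = ?",
        (conversation_id,),
    ).fetchone()[0]
    return stored, derived
```

After any mix of `add_message` and `delete_message` calls, the two values returned by `counts` should agree, which is the property that separates controlled redundancy from the uncontrolled kind.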

Anyway, this is nothing new. When I was at IBM in the 1970s, we used to sell a second machine for reporting. Thus the customer had one machine optimised for transaction performance and another optimised for reporting performance. Plus ça change!

If you have not read Ted Codd's paper (1970), I strongly recommend that you read it. Just google "a relational model of data for large data banks"

Finally, I can't resist quoting Josh Billings (1818-1895):

"It ain't what a man don't know that makes him a fool, but what he knows that ain't so."

Edit: I had typed out a long-winded explanation of several things only to realize I hadn't really addressed your two questions.

1. Personally, I have witnessed countless projects struggle, run over budget, and often fail entirely because they got the conceptual model wrong. This discovery is usually made late in the project lifecycle, and it is a tragic thing. ORM is an approach that, I believe, goes to great lengths to let precise logical modeling be expressed in semantic terms that domain experts are able to validate. This reason alone is an advantage for a company of any size to adopt ORM and the CSDP.

2. I think analysis of the data structures and the usage requirements, both in the near term and in the future, has to drive the decisions. You most likely should not use the data storage approaches of Amazon or Google in systems for Electronic Medical Records, and vice versa. This is a huge simplification of the topic, but hopefully a concise one.

There's a really useful and thoughtful article that describes the drivers behind NoSQL databases (and the reasons why they aren't usually the right answer) here. Contrary to Ken's dismissal, there are situations where a rigorous DBMS (usually SQL) is not only not required, but not even feasible.

However, for every organisation that has this problem, there are ten thousand more who wish to have it (that is, who wish to become large enough to have it), and so think they must apply a NoSQL solution in preparation for fame and fortune, even though it really isn't appropriate and is in fact more likely to ensure they never need it.

When you add to this desire to have a scalability problem, a hatred of SQL (arising mostly from the poor standard of tools for programming databases - well-justified in my view), and the amazingly inferior performance of the query optimiser in MySQL (the most widespread free SQL engine), you begin to get a feeling for why there is such a ground-swell towards NoSQL systems.

There is no ACID hardware in the world that can handle the transaction rates and data volumes of Facebook, Twitter, Google, etc. Read the article, and do some research, looking for the actual techniques these folk have applied. There are plenty of good articles about them without me having to write one. They aren't idiots; in fact, there are some exceptionally competent database professionals amongst the implementors. You should not discount what they have done as being ill-educated.

ACID refers to a set of properties that are recommended for a DBMS. It has nothing to do with "hardware". REF: http://en.wikipedia.org/wiki/ACID

Clifford Heath:

transaction rates and data volumes of Facebook, Twitter, Google

To the best of my knowledge, these applications are not "transaction" systems in the traditional database sense. For example, I bet Walmart would have a hard job running its logistics systems with the technology used by the applications you mention. REF: http://en.wikipedia.org/wiki/Database_transaction

Clifford Heath:

Read the article

I read the article - and all the comments. Some of the contributors seem to be failing to differentiate between matters that are in the conceptual and theoretical domains and matters that are in the domain of physical implementation. The relational model is separate from SQL, which in turn is separate from a DBMS implementation and from the hardware on which the DBMS runs. REF: http://c2.com/cgi/wiki?RelationalModel and http://en.wikipedia.org/wiki/Relational_model

Picky, picky. There is no hardware that can support ACID behaviour in these scenarios.

Ken Evans:

Clifford Heath:

transaction rates and data volumes of Facebook, Twitter, Google

To the best of my knowledge, these applications are not "transaction" systems

You're right, and you're wrong. They relax ACID requirements slightly (so are not "transactional"), yet still produce the desired behaviour with the desired reliability. They preserve the vast majority (five nines at least) of all transactions; otherwise folk would shriek about lost updates. However, it's likely their businesses would not fail even with a failure rate significantly higher than the one they actually have.

Even Walmart could afford to not get paid for 0.001% of its transactions, so yes, even Walmart could operate with the loss of ACID semantics. Its transaction volume is, however, many orders of magnitude lower than that of the online services, and is within the reach of ACID DBMSs. I wouldn't recommend NoSQL to Walmart anyhow; it takes a lot of expertise to get it right, whereas the ACID guarantees of a traditional DBMS are much easier to use.

Ken Evans:

Clifford Heath:

Read the article

I read the article - and all the comments. Some of the contributors seem to be failing to differentiate between matters that are in the conceptual and theoretical domains and matters that are in the domain of physical implementation.

Yes, there's a good deal of ignorance and blurring of boundaries, including beliefs such as I described in my first post in this thread.

However, although Twitter, Facebook et al could theoretically implement greater reliability using an ACID-compliant DBMS running on some hypothetical computer that's orders of magnitude larger and faster than any available, that's a pointless thing to note. They have the problem they have, and they solve it the way they do. Because they're successful, others want to emulate them, so apply the same techniques inappropriately.

I might point out that for applications that contain terabytes of data, which need to be able to implement schema changes in seconds or minutes not hours or days, a departure from the relational model is often justified. Many non-1NF objects may be stored having different internal schema versions, and as long as the access code can read all live format versions, there's no need to rewrite all objects (and blow the time limit) when a new version is deployed. Such applications have needs which fall halfway between full relational/transaction support and a basic file system, and the NoSQL DBMS are suitable candidates. ACID semantics can still be implemented, but need help from the application code. It's a lot of work, yet a lot of modern web applications fall into this class, hence the enthusiasm for NoSQL. It's not necessarily evil, despite ignorant people doing it inappropriately.
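The "access code can read all live format versions" technique might look something like the following minimal sketch. It assumes stored objects are JSON documents carrying a `schema_version` field; that field, the profile fields, and the function name are all hypothetical choices made for illustration, not any particular product's format.

```python
import json


def read_profile(raw):
    """Decode a stored profile of any live schema version into the current shape.

    Hypothetical versions:
      v1: {"name": "Ada Lovelace", "email": ...}  (no schema_version field)
      v2: {"schema_version": 2, "first_name": ..., "last_name": ..., "email": ...}
    """
    doc = json.loads(raw)
    version = doc.get("schema_version", 1)  # assume unversioned objects are v1

    if version == 1:
        # v1 kept the full name in one string; split it on the first space.
        first, _, last = doc["name"].partition(" ")
        return {"first_name": first, "last_name": last, "email": doc["email"]}
    if version == 2:
        return {
            "first_name": doc["first_name"],
            "last_name": doc["last_name"],
            "email": doc["email"],
        }
    raise ValueError(f"unsupported schema_version: {version}")
```

Old objects are left untouched on disk and are upgraded only when read (or lazily rewritten on their next update), which is why a new version can be deployed without a bulk rewrite. The cost is that every rule a relational schema would state once now has to be upheld by code like this.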