Highly Dynamic DB Entities, I have a worry.

I'm currently putting together a software package for my company and it's highly dynamic. We have about 5 or 6 entities that will have perhaps 4 attributes that we know about ahead of time. The rest will be created by the user. To make matters even more complex, not all entities of a specific type will have the exact same dynamic attributes. They'll *will* have a set of "expected" attributes that can be specified by the user, but we still must allow these expected attributes to not exist because data entry may happen over a weeks time.

The solution I've come up with is to create a values table that holds the dynamic values, along with the entity type and id.

I'm strongly leaning towards just making the entities 100% dynamic, even for the few fields we know will always be there. There are a few requirements that will be much simpler as a result of this approach. It will also allow us to keep a history w/o taking a snapshot of the entire entity every single time a data value changes.

My biggest worry is that we're basically implementing a DB engine on top of a DB engine (only for these specific entities, everything else works as per usual). We're not keeping a lot of "meta-data", but we are keeping some such as type information, which increases my misgivings even more.

So I really have 2 questions.

1. Does anyone have an approach that doesn't feel like it's devolving into a DB on top of a DB?

2. Has anyone taken such an approach before, and if so, what where your experiences with it? Would you do it again, given the same set of circumstances?

I really interested in hearing from people who have seen these types of system in use over a period of several years.

worried about the future
Tuesday, September 30, 2008

Deleting …Approving …

I should add, dynamically updating the DB itself isn't an option, unfortunately.

worried about the future
Tuesday, September 30, 2008

Deleting …Approving …

You are in for a world of pain...

These meta-driven databases always fail. Spend some time actually defining the requirements and building a database with a proper, normalized schema.

There are many reasons. Here are a few:1. This is usually done because the customer doesn't know what they want, so just make the system ultimately flexible. (warning)2. No constraints. So this leads t a raft of optimizer problems.3. No relationships between the entities. No referential integrity.4. Performance usually is sub-optimal.

jim
Tuesday, September 30, 2008

Deleting …Approving …

I agree with some of the other responses. This kind of design is almost always a good example of over design.

Does the client really need that requirement or do they just need the ability to add a few custom fields here and there? Often a developer will hear a requirement and their brain will explode with ideas, many of which are just total overkill.

Bummer, because that's how most CRM systems I've seen do it. Usually, they have a core table (ex: Contacts), and you can customize it by adding a new table to store your custom fields. You make one join to pull in your custom fields, and you're all set. Separating the core table from the customizations allow you to upgrade the core system with a minimum impact on your customizations.

I've given thought to the uber-meta-dynamic database to end all databases, but it sure seems like trouble to me.

Documents are stored as XML fragments and you query using XPath/XQuery.

I know Microsoft SQL Server, Oracle 10g and Oracle/Berkley DB support XML driven DB. There may be others.

Just a tought.

My CRM application has a switch that says is it design mode, if so it takes the XML meta data that defines the application and generates/updates the database schema (columns, foreign keys etc). This is far the best way of getting a high performance system.

This question gets asked here regularly and is often shot down as "impossible", "poor performance", etc, etc.

The fact is that it is possible to do this successfully. I know this, because I have done it.

My client had a terrible history with their IT department taking months to do even simple changes to previous applications. With this in mind, they insisted that for a new project they should be able to make changes to any screen, report, table, etc themselves, without having to wait months for their in-house developers to do it.

This was an interesting challenge, and I wasn't even sure it was possible, but I accepted the assignment, as it was interesting.

It turns out that it is possible and not only that, but works really well.

I created a framework upon which the business users built up their own applications from scratch. They started with an HR database, which was just a pilot, then went on to create a complete multi-line insurance underwriting, claims handling, broker/client management and reporting system, with over 2,000 screens.

The interesting thing is that they created their system on their own, much quicker, and at far less cost, than it could have been done in the traditional way, even accounting for my up-front costs.

It has been such a success that I am attempting to turn it into a product that can be the basis of a new µISV. I believe that if it can be presented in the right way, there must be a market for it. We shall see...

In the meantime, I continue to believe in the concept of dynamic applications, as I have been there, done that, got the t-shirt.

Sadly, many people still think it is impossible, but you shouldn't let that stop you from trying.

Scorpio Dragon
Wednesday, October 01, 2008

Deleting …Approving …

It's entirely possible to build a system like this - if you're mainly building a data capture application. However, if your customers want to attach business logic to said data, development pace will become painfully slow. In my experience, customers *always* want to run their data through business logic...

As you can't rely on strongly typed objects/collections, you would have to treat everything like a dictionary. Most of your testing has to happen at run-time - there would be little in the way of compile time checking.

Does locking the DB schema also restrict database views? If you decide on using key/value tables, you might be able to dynamically create views that pivot the data (SQL Server 2005 supports PIVOTs).

@Vijay:Yep, business logic is important (obviously), so I had to allow that to be defined by the users as part of my implementation.

Scorpio Dragon
Wednesday, October 01, 2008

Deleting …Approving …

Perhaps you can just give them Access, or something like that, and say, "Here, use this: you can use it to define new tables, add new fields, write your own business logic, etc."

Christopher Wells
Wednesday, October 01, 2008

Deleting …Approving …

Thanks for all the responses everyone.

Just to clarify, the relationships are known ahead of time, as are the actual entities. I'm not proposing that the user's be able to create their own entities, or the relationships between said entities. All FK's, et al, would be known ahead of time. We have a few fields that we know will be there such as creation date, last edited date, etc, and a few that are relatively standard for the industry. Outside of that, one of our business requirements is to allow the user to attach their own attributes/values to these entities, and allow them to filter out the data via our UI based upon these custom attributes/values. Thing is, if we have a single method of accessing our attributes it simplifies things(IMO).

From the wiki link posted:

"The circumstances where you would need to go beyond standard row-modeling to EAV are listed below:

[snip]

Certain ("hybrid*) classes have some attributes that are non-sparse (present in all or most instances), while other attributes are highly variable and sparse. The latter are suitable for EAV modeling."

I think this most accurately reflects our business needs, except our data isn't necessarily sparse in the way they mean.

Do you guys think this may be a valid decision given the above constraints, or would you still try to find a way to accomplish this in another manner?

worried about the future
Wednesday, October 01, 2008

Deleting …Approving …

It sounds like your requirements are much simpler than mine, as you already know which entities you need and the relationships between them.

It sounds like it would be quite simple to attach "name-value" tables to each of your entities and link on PKs.

It is easy to create queries/stored procedures to turn your data back into a row/column grid (Pivot command in SQL Server, etc).

Or, you can grab the data in two goes, from the same SP, and then stitch it back together in your application.

Either option is easy to do and should give good performance.

Scorpio Dragon
Wednesday, October 01, 2008

Deleting …Approving …

I wouldn't use EVA for entities that I already understand the requirements for. I would definitely use standard tables for all known entities that your application uses and will be basing its business logic off of. The ability to add additional attributes to existing entities of a system is a pretty common requirement. Using a simple key/value table for such values is the easiest way to do this. However, the big issue is whether or not your business logic has any knowledge of these values and what to do with them. Other than basic validation for type, length, etc., your business logic will not be able to utilize them for decision making without custom programming. And that's where most of these systems fall flat on their face.

dood mcdoogle
Wednesday, October 01, 2008

Deleting …Approving …

For the most part it is going to be key/value pairs, except the values don't have to be completely flat (they can be a list of values).

This affects two areas of the application.

1.We're creating an interface so the user can filter the data in ways they need to (grab all of entity X with attribute color = red, etc). Underneath we're generating the SQL necessary for the filter. Having the values in mulitiple areas complicates things, although not too terribly.

2.The creation of customized reports that can have these attributes. Basically, run report X over entity Y with id Z, merge the data together, and present it to the user.

I'm considering requiring all values that can be used for the filters and fields as being dynamic. This simplifies things in that everything is retrieved in a single place, and we don't have to track whether or not a field is static or dynamic, etc, but, as I said earlier, it strikes me as reimplementing the DB on top of the DB. Since I've never built a system quite like this, I decided to post. As much as I'd like to believe I'm prescient, experience indicates otherwise :(

worried about the future
Wednesday, October 01, 2008

Deleting …Approving …

I should also add that of the values are typed, and they all exist in a single table. One of our requirements is that some entities can have values that shadow the values of other parent entities. We handle this by marking the value as owned or shadowed (R/W or just R), and keeping it in a single table makes the updates simpler.

worried about the future
Wednesday, October 01, 2008

Deleting …Approving …

We did this exact thing a few years ago, because we have one system that gets run by 20+ customers and they each have different ideas about what details they should capture from people opening accounts with them.

We did both a dynamic (name-value pair type thing) and a dynamic UI (controls and their positions defined in the database). We also had to deal with legacy tables where the data wasn't, and couldn't be made, dynamic.

At the end of the day, I think the dynamic UI was a correct move, and the dynamic data was a mistake. It is just too awkward and confusing, especially when you need to implement a myriad of arcane business rules (maybe the big success story cited above didn't have this aspect). The lack of proper typing and the fact you can never rely on attribute X existing for each person are just two of the many irritations this causes.

If I could do it over, I think I would define "normal" static tables that include every field anyone ever asked for, and just use the dynamic UI to selectively hide the things specific customers don't care about.

Ok, yes, it's "possible" to hand over a pile of developer tools and let the end user write their own applications. Whether or not the customer actually wants that degree of flexibility and the associated learning curve is another question.

Wednesday, October 01, 2008

Deleting …Approving …

>Has anyone taken such an approach before, and if so, what where your experiences with it? Several times.

>Would you do it again, given the same set of circumstances?Sure. It was the right tool for the problem at hand. Again, I repeat myself: if you don't know what you're doing, then you are in for a world of hurt.

One sample that I *didn't* design, but it turned out to be an elegant design did indeed end up on the daily wtf:http://thedailywtf.com/Articles/RealXML__0x2e__0x2e__0x2e__really!.aspxThis was the "xml" that got pushed around. It automated paperwork for real estate transactions (the paperwork behind buying or refinancing a house in the US). The fields all ended up having a text attribute, as well as the type (usually the type was part of the form field's name, so CM1_AdjstmntNeg_CHK was a checkbox, and anything ending with _$TWC was "currency, use commas for thousands separators" and _$ was "currency").

While the data was effectively stored in a single giant EAV table, it took about 30 other tables around it to support the state machine, ACL and the relationships between which fields were on which versions of which forms. The primary key for the big table had about 6-7 columns. This app was bringing in a couple million dollars per month in revenue for the parent company - a bank.

One sample that I *did* design was a small CRM app to handle car dealerships when I worked for a place that serviced dealerships. Just about all dealerships have a new car, used car, service, body shop and parts departments. Some are in the same building, some are blocks apart. Some are departments of the same company, some are separate legal corps with separate billing. In many dealerships, fleet sales is "whoever is lucky enough to take the call" where others have a separate department. More details at: http://discuss.joelonsoftware.com/default.asp?biz.5.332716.11

Peter
Thursday, October 02, 2008

Deleting …Approving …

Greg: 'At the end of the day, I think the dynamic UI was a correct move, and the dynamic data was a mistake. It is just too awkward and confusing, especially when you need to implement a myriad of arcane business rules (maybe the big success story cited above didn't have this aspect).'

Yeah, insurance has lots of "arcane business rules", so I had to find a way to support pretty much arbitrary rules at any point.

I agree with you though. The dynamic UI was trivial to implement (I had one .aspx page that rendered everything, based on screen definitions in the database). The business rules part was much harder.

I didn't find the lack of proper typing to be such an issue, as I had meta-data, including default values for missing data.

blank: 'Scorpio Dragon is the mind behind "Microsoft Access" ? Wow, that's amazing! Ok, yes, it's "possible" to hand over a pile of developer tools and let the end user write their own applications. Whether or not the customer actually wants that degree of flexibility and the associated learning curve is another question.'

I wish I did invent Microsoft Access. That'd be a great claim to fame, so thanks for the compliment ;-)

You sound kind of bitter in your post, and you seem to have mis-interpreted what I said and extrapolated your own conclusions with no regard for the facts.

Could it be perhaps that you are the kind of "developer" that my client was trying to cut out of the game in the first place?

How many users are there? Do these users get to invent new attributes for the 5 or 6 entities, or do they get to invent new entities as well?

Do these users understand the difference in the consequences of inventing a new attribute and inventing a new entity? I've used the word "inventing". Are the users inventing new data, or are they discovering new data?

Do the users coordinate with each other, to prevent the incidence of synonyms, where two users come up with different names for attributes (or entities) that turn out to be the same? Do they coordinate to prevent the incidence of homonyms, where two users use the same name to describe different attributes (or entities)?

Is there an expectation that some reporting or extracting system is going to be able to make coherent sense out of the data in its entirety, or does each user have his/her own logical database, that just happens to be physically stored in the same container as the databases of the other users?

The answers to these questions will help discern whether you are on the road to success or disaster. The questions may seem philosophical, but they have very practical consequences.

Walter Mitty
Friday, October 03, 2008

Deleting …Approving …

Oops. You had already responded to my first question about users inventing entities. Sorry for responding before carefully reading the entire topic.

The rest of my response is still worth considering.

Walter Mitty
Friday, October 03, 2008

Deleting …Approving …

I want to thank everyone that responded. Your input helped me form a more solid opinion, so here's an update.

After reading Greg's post it pretty much solidified my fears and drove home the point everyone else was making. I approached the powers that be with my misgivings, as well as informing them that I have yet to speak with a developer who didn't regret building a system similar to what we're doing now (well aside from Scorpio, but I believe his requirements/goals where vastly different than ours).

Unfortunately the suggestion to do away with the dynamicism was shot down. The belief is that we'll be using this software to move into other parts of the industry, and we need the flexibility in the interim until we become more familiar with said parts. We actually don't need the dynamicism for what we're doing now, mostly for "the future".

What I'm going to do is put together a memo that I'll be emailing to the relevant parties detailing the dangers of our approach and what we need to do in order to avoid the problems (mostly constant vigilance of turning common dynamic values into tried and true columns. Outside of that, it's out of my hands. I've made my professional recommendation and they chose to ignore it. Let the cards fall where they may :)

worried about the future
Monday, October 06, 2008

Deleting …Approving …

I hope I'm not beating a dead horse.

If you get shot down, and if you build a database that's dynamic in the sense you have been outlining here, then maybe what you can do is this:

Build a second database that trails behind the dynamic database. Don't add new attributes to the second database until additions made by the users have been subjected to formal data analysis. Use formal analaysis to, among other things, guard against synonyms and homonyms.

In your queries, avoid the construct "select *". This way, you won't be vulnerable to queries being broken by added attributes.

Write and maintain application SW that copies recently added data from the dynamic database to this second one. This includes any new attributes that have been through the formal analysis process.

Maintain SW that generates extracts and reports, so as to include new attributes as needed.

This is a big task. Whether it pays off as much as it costs remains to be seen. But it may be a godsend, when the dynamism of the dynamic database causes it to become incoherent.

Walter Mitty
Tuesday, October 07, 2008

Deleting …Approving …

...we have use a similar solution with several customers. We found this:

* Tables must have some basic fields like "primary key"* Others dynamic fields are complementary* These fields MUST BE stored as strings ( value varchar(*) ), for easy manipulation* An optional field or catalog may indicate the intended type of the field