I have a strong feeling that database design and normalisation often take a back seat when dealing with spatial data.
With software costing a fortune and database tables of over 100 fields, I have to ask:
Are there good reasons to put considerations other than normalisation first when designing a spatial database?

I guess people will ask for examples, but I cannot give any here, so my question is perhaps aimed more at those who believe that 100 fields is no problem and easier to maintain than a properly normalized design.
What are the arguments?

Edit:

OK, I have got two answers pointing in the same direction as my own belief. But why is this so often not followed in practice? I will give some of my own thoughts that maybe someone can confirm or refute.

Is it the software that forces people to work in a special way?

Or is it just that the software hides the database behind the desktop GIS?

One theory is that in education and in organizations the database side is kept separate from the GIS side: the people running tools on geometries are different from the people who know the database. Could that be the case?

If so:

Is there a gap in the education of GIS professionals when it comes to understanding the relational database concept?

In the case of ArcGIS, a normalized database with referential integrity is hard to accomplish, as you are limited to only the database features exposed to you and supported by ArcGIS. This is very frustrating as a relational database guy... playing a game of telephone, with ArcSDE in the middle.
–
nw1, Feb 4 '11 at 18:23

6 Answers

I feel that spatial databases should be treated no differently to traditional databases. They are essentially doing the same thing, storing large amounts of data for fast retrieval. As an example, in PostgreSQL / PostGIS, the geometry is just another datatype. Just like text, or integer. Same in SQL Server 2008. Same in Oracle. If the "spatial" part is just another field type in the database, then is it really that different from the original database? Does this mean we should throw out all the rules of traditional database design?
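To make the "just another datatype" point concrete, here is a minimal sketch (table and column names are invented, and WKT text stands in for the native geometry type a real spatial database like PostGIS would use): the geometry column sits alongside text and integer columns, and ordinary SQL applies to the whole row.

```python
import sqlite3

# Sketch only: sqlite3 has no spatial type, so geometry is stored as WKT
# text here -- but it behaves like any other column in the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities ("
    "  id   INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  pop  INTEGER,"
    "  geom TEXT)"  # a real spatial database uses a native geometry type here
)
conn.execute(
    "INSERT INTO cities VALUES (1, 'Stockholm', 975000, 'POINT(18.07 59.33)')"
)
# An ordinary attribute filter returns the geometry like any other field:
row = conn.execute("SELECT name, geom FROM cities WHERE pop > 500000").fetchone()
print(row)
```

In PostGIS or SQL Server 2008 the only difference is that the geometry column gets a real type, spatial functions, and a spatial index; the relational rules around it are unchanged.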

Obviously normalization can be taken too far, just as with traditional databases, so it's a trade off to find the best design that suits your needs.

If you are planning on creating a highly de-normalized structure with tables of 100 columns, then you have to ask yourself what is likely to change in the future. With a vast increase in rows, is this also going to affect query performance? Is it going to affect maintainability in the future?

What's wrong with creating a normalized structure and using views to expose all the data to the database client, be it GIS or any other client?
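The view approach can be sketched as follows, assuming hypothetical parcel/owner tables and using stdlib sqlite3 with WKT text in place of a real geometry type: storage stays normalized with a foreign key, while the GIS client connects to a flat view.

```python
import sqlite3

# Sketch, with invented names: normalized tables plus a flat view for clients.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (
        owner_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE parcels (
        parcel_id INTEGER PRIMARY KEY,
        owner_id  INTEGER NOT NULL REFERENCES owners(owner_id),
        geom      TEXT    -- WKT placeholder for a native geometry type
    );
    -- The view is what the desktop GIS (or any other client) reads:
    CREATE VIEW parcels_flat AS
        SELECT p.parcel_id, o.name AS owner_name, p.geom
        FROM parcels p
        JOIN owners o ON o.owner_id = p.owner_id;

    INSERT INTO owners  VALUES (1, 'Alice');
    INSERT INTO parcels VALUES (10, 1, 'POLYGON((0 0,1 0,1 1,0 1,0 0))');
    INSERT INTO parcels VALUES (11, 1, 'POLYGON((1 0,2 0,2 1,1 1,1 0))');
""")
rows = conn.execute(
    "SELECT parcel_id, owner_name FROM parcels_flat ORDER BY parcel_id"
).fetchall()
print(rows)
```

If an owner's name changes, it is updated in one row of `owners` and every parcel in the view reflects it, which is exactly the maintainability argument for normalisation.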

If the software you are using on top of the database is forcing you to use highly de-normalized structures, then this is a different argument. You are constrained by the software and not the database, so you are not free to choose the best database design.

So the short answer, in my opinion, is that database design is just as important with spatial databases as with traditional databases.

I see this a lot. I feel that it stems from the fact that traditionally GIS people come from surveying backgrounds, and don't have a background/understanding of databases. I am seeing this change though, as more and more organisations move the GIS infrastructure into the IT fold.

This is my feeling too, but I hope the explanation is more like Paul's discussion: that it is in some way a deliberate choice. That would make more sense of the GIS business, with all its fancy words, models and techniques, than finding out that the database at the bottom was misused out of ignorance.
–
Nicklas Avén, Nov 14 '10 at 21:33


Sorry, "misused" is the wrong word. If it is deliberate and done for a good reason, it is not misuse.
–
Nicklas Avén, Nov 14 '10 at 21:47

The previously high cost of ArcSDE, and the lack of a spatial datatype in SQL Server (until 2008) and in Oracle (until version 10), meant many organisations had little choice but to store data in shapefiles (and tenderers did the same to keep bid costs down).

The introduction of native spatial types in SQL Server meant almost instantly that ArcSDE went from being a huge investment to being included for free in ArcGIS, and led to the "bringing into the fold" of spatial data in organisations.

Organisations using ArcGIS and SQL Server previously had three choices:

Store spatial data in shapefiles / personal GDBs, and link to the rest of the organisational data in databases (or export these attributes to DBFs)

Pay the then considerable licence cost of ArcSDE to store spatial data in the corporate database

Switch GIS vendors and store spatial data in a single database but in a format only accessible by the new GIS software

Once SQL Server had a native spatial type, most vendors used it instead of their proprietary formats, meaning spatial data could suddenly be accessed by other applications. ESRI had to either reduce the cost of ArcSDE (which they did by integrating it into ArcGIS) and/or allow spatial data to be stored in the native database format.

In addition, queries performed in ArcIMS on shapefiles meant the associated DBFs had to include all required fields, with the resulting duplication, as there was no option to create spatial views or to easily link features with a back-end database.

Organisational Reasons

I agree with others that until spatial data recently became a native database type, it had long been ignored or kept separate by database administrators in organisations, becoming the responsibility of a GIS manager. The concepts of database design, normalisation, replication, security, and SQL views require an often very different, specialised skillset and cannot easily be learned as you go along.

Cost Reasons

Explaining in a tender the need for a large amount of time and effort to be spent on a data model, and on cleaning and importing data into that model, is often impossible. Often the project purchasers come from an analytical view of GIS and overlook the importance of structured data.

I understand and agree with most of what you write. But saying the SDE part is given for free after the renaming to ArcGIS Server, isn't that like saying: if you buy the beautiful colour of this car for 100,000 dollars, you will get the rest of the car for free? I don't know ArcGIS that well, but what is ArcGIS Server without the SDE part? And I have never heard anyone say that ArcGIS Server is cheap. I don't really see how SQL Server spatial types have affected ArcGIS. But since the Arc products are so widely spread, I agree that the Arc road has a big impact on how people think of their spatial data.
–
Nicklas Avén, Nov 15 '10 at 10:37

Before ArcGIS Server, ArcSDE used to be completely separate from ArcMap and ArcIMS and had to be bought and licensed separately. As ArcSDE was the only way to store spatial data in SQL Server (or Oracle at the time), it meant spatial data was stored elsewhere.
–
geographika, Nov 15 '10 at 11:16

OK, so ArcIMS packaged with SDE is the new concept. ArcMap still needs separate licenses, per user or floating, right? Off topic, but I am a little curious.
–
Nicklas Avén, Nov 15 '10 at 11:32

Isn't ArcGIS Server a large amount of money? As far as I know you cannot use the SQL Server format or the PostGIS format (without zigGIS) in ArcMap without SDE, sorry, ArcGIS Server, in between.
–
Nicklas Avén, Nov 15 '10 at 13:20

By 100-column-tables, I assume you mean the kinds of outputs you get from building "master coverage" overlays of multiple inputs. Yes, these are artifacts of the Arc/INFO workflow. But, in defence, you can also think of them as being deliberately de-normalized tables for OLAP. Since they are being used largely for query processing, not for data update, the de-normalized form makes some sense. Like a star schema, but without the, er, points. OK, weak tea, but still I think there's something there.
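This "deliberately de-normalized for read-only analysis" reading can be sketched in miniature (all table and column names here are hypothetical, using stdlib sqlite3): small normalized inputs are joined once into a flat "master" table, which then answers analytical queries without any joins, at the cost of repeated attribute values.

```python
import sqlite3

# Sketch of a master-coverage-style table: built once from normalized
# inputs, then queried read-only like a pre-joined OLAP table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE landuse (code INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE soil    (code INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE units (                    -- the overlay polygons
        unit_id      INTEGER PRIMARY KEY,
        landuse_code INTEGER REFERENCES landuse(code),
        soil_code    INTEGER REFERENCES soil(code),
        area_ha      REAL
    );
    INSERT INTO landuse VALUES (1, 'forest'), (2, 'urban');
    INSERT INTO soil    VALUES (1, 'clay');
    INSERT INTO units   VALUES (1, 1, 1, 12.5), (2, 1, 1, 7.5), (3, 2, 1, 3.0);

    -- Materialize the de-normalized "master" table: every attribute of
    -- every input layer is repeated per polygon, trading redundancy for
    -- simple, join-free analytical queries.
    CREATE TABLE master AS
        SELECT u.unit_id, l.label AS landuse, s.label AS soil, u.area_ha
        FROM units u
        JOIN landuse l ON l.code = u.landuse_code
        JOIN soil    s ON s.code = u.soil_code;
""")
totals = conn.execute(
    "SELECT landuse, SUM(area_ha) FROM master GROUP BY landuse ORDER BY landuse"
).fetchall()
print(totals)
```

As long as the master table is rebuilt rather than updated in place, the redundancy is harmless, which is exactly why the pattern is defensible for query processing but not for data maintenance.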

Yes, Paul. I knew there would be some explanation out there including words I don't really understand :-). Very interesting that there is a deliberate history behind this. Great!
–
Nicklas Avén, Nov 14 '10 at 21:21