If the data in the source system is poorly structured and not normalized in any way, what is the most typical way to handle this? Is it common to normalize the data as it moves through the staging area before pushing it out into a dimensional data mart? We're pulling data from one of our own websites, stored in SQL Server, but it isn't really structured in any way. Would the right approach be to normalize it first?

I wouldn't make any effort to normalize data for a Data Mart application - I prefer the Star Schema design which is heavily de-normalized.
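To make the contrast concrete, here's a minimal star-schema sketch using sqlite3 (all table and column names are hypothetical, invented for illustration): one fact table joined to flat, de-normalized dimension tables, with attributes like month name or page section stored in-row rather than normalized out into their own tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,  -- e.g. 20240115
    full_date   TEXT,
    month_name  TEXT,                 -- attributes kept flat, not normalized out
    quarter     TEXT,
    year        INTEGER
);
CREATE TABLE dim_page (
    page_key    INTEGER PRIMARY KEY,
    url         TEXT,
    section     TEXT                  -- hierarchy stored in-row, no snowflaking
);
CREATE TABLE fact_page_view (
    date_key    INTEGER REFERENCES dim_date(date_key),
    page_key    INTEGER REFERENCES dim_page(page_key),
    view_count  INTEGER
);
""")
conn.execute("INSERT INTO dim_date VALUES (20240115, '2024-01-15', 'January', 'Q1', 2024)")
conn.execute("INSERT INTO dim_page VALUES (1, '/home', 'root')")
conn.execute("INSERT INTO fact_page_view VALUES (20240115, 1, 42)")

# Typical star-schema query: fact table joined to its dimensions.
row = conn.execute("""
    SELECT d.month_name, p.section, SUM(f.view_count)
    FROM fact_page_view f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_page p ON p.page_key = f.page_key
    GROUP BY d.month_name, p.section
""").fetchone()
print(row)  # ('January', 'root', 42)
```

The redundancy (repeating month names, storing the section on every page row) is deliberate: it keeps reporting queries to simple one-hop joins.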

I also prefer the "aggressive load" strategy, where source system data is accepted as valid and correct until proven otherwise. In reality, it's typically the most heavily scrutinized data available, flawed or not.

I'd see examples of "poor structure" as an opportunity for the Data Mart application to add value: you can clean data, add hierarchies, apply banding, and so on.
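As one sketch of what "banding" means in practice (the metric and band boundaries here are hypothetical), a load routine can map a raw numeric value from the source into a labelled range that reporting users actually want:

```python
def band_session_duration(seconds):
    """Assign a raw session duration (in seconds) to a reporting band."""
    # Upper bounds are exclusive; boundaries chosen purely for illustration.
    bands = [(60, "under 1 min"), (300, "1-5 min"), (1800, "5-30 min")]
    for upper, label in bands:
        if seconds < upper:
            return label
    return "over 30 min"

print(band_session_duration(45))   # under 1 min
print(band_session_duration(600))  # 5-30 min
```

The banded label would typically be stored as a dimension attribute, so reports can group on it without re-deriving the ranges.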

I wouldn't normalize it either. Normalizing requires two extensive sets of ETL logic: one to extract and normalize into an ETL staging area (ODS), and another to extract and load into the dimensional warehouse. I try to minimize the amount of transformation logic needed to stage the data, usually just adding dates so I can keep a few weeks of data on hand in case I need to reload anything into the warehouse.
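That minimal-staging approach can be sketched like this (table names, columns, and the retention window are all assumptions for illustration): source rows land in staging unchanged except for an added load date, and loads older than a few weeks are purged rather than kept forever.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 21  # keep roughly three weeks of staged loads

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE stg_page_view (
    load_date TEXT,  -- added at load time; the only transformation applied
    raw_url   TEXT,  -- source columns copied through as-is
    raw_user  TEXT
)
""")

def stage(rows, load_date):
    # Accept source rows as-is, tagging each with the load date.
    conn.executemany(
        "INSERT INTO stg_page_view VALUES (?, ?, ?)",
        [(load_date.isoformat(), url, user) for url, user in rows],
    )

def purge(today):
    # Drop loads that have aged out of the retention window.
    cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
    conn.execute("DELETE FROM stg_page_view WHERE load_date < ?", (cutoff,))

today = date(2024, 1, 31)
stage([("/home", "alice")], today - timedelta(days=30))  # stale load
stage([("/about", "bob")], today)                        # fresh load
purge(today)
count = conn.execute("SELECT COUNT(*) FROM stg_page_view").fetchone()[0]
print(count)  # 1: the fresh load survives, the stale one is purged
```

Keeping those dated loads around means a warehouse reload is just a re-read of staging, with no second trip to the source system.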