Can't really say if it is good or bad, just different. When we started a data warehouse project at a previous employer we set up the Staging and ODS databases separately. We could have just as easily put them in the same database, using schemas to keep things separate.

If you go that route, one thing I would also make sure of is that the staging and data warehouse tables are in different filegroups. That allows you to place those filegroups on different spindles on the server (or SAN) so that the two workloads aren't competing for the same disks.
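For illustration, a rough sketch of how that separation might be set up — every database, filegroup, schema, and file path name here is made up, and the Staging schema is assumed to already exist:

```sql
-- Add a dedicated filegroup for staging tables and put its data file
-- on a different spindle/LUN than the warehouse data files.
ALTER DATABASE DW ADD FILEGROUP StagingFG;

ALTER DATABASE DW ADD FILE
(
    NAME = 'DW_Staging01',
    FILENAME = 'E:\SQLData\DW_Staging01.ndf',  -- separate disk from the DW files
    SIZE = 10GB
)
TO FILEGROUP StagingFG;

-- Pin staging tables to that filegroup explicitly.
CREATE TABLE Staging.CustomerLoad
(
    CustomerID   int          NOT NULL,
    CustomerName varchar(100) NOT NULL
) ON StagingFG;
```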

What I often see are separate databases on the same instance, where the staging database is in the simple recovery model and the data warehouse is in the full recovery model. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in staging tables. Different schemas in the same database do not give you the separation necessary to run different recovery models.

The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from staging into the data warehouse: compare a simple cross-database INSERT...SELECT with a secondary push job using SSIS or some other means. Having your staging database on a separate instance is a tradeoff between flexibility in scaling your environment and ease of loading your data warehouse.
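As a minimal sketch of the same-instance pattern, a load proc living in the staging database might look like this (all database, schema, table, and column names are hypothetical):

```sql
-- Runs in the staging database; pushes prepared rows into the warehouse
-- with a three-part-name INSERT...SELECT. This only works when both
-- databases live on the same instance.
CREATE PROCEDURE dbo.LoadDimCustomer
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO DW.dbo.DimCustomer (CustomerKey, CustomerName, LoadDate)
    SELECT CustomerKey, CustomerName, GETDATE()
    FROM dbo.CustomerPrepared;   -- already massaged in staging
END;
```

Move staging to another instance and this simple call has to become a linked-server query, an SSIS package, or some other push mechanism.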

__________________________________________________________________________________________________There are no special teachers of virtue, because virtue is taught by the whole community. --Plato

opc.three (3/23/2012) wrote:

What I often see are separate databases on the same instance, where the staging database is in simple recovery mode and the data warehouse is in full recovery mode. Procs parked in the staging database use cross-database calls to load the data into the data warehouse database after it has been massaged and prepared in tables in the staging database. Different schemas in the same database do not give you the separation necessary to run different recovery modes.

The big barrier to moving the staging database onto a different instance altogether is the additional overhead of getting that data from the staging database into the data warehouse. Consider a simple cross-database INSERT...SELECT versus a secondary push job using SSIS or some other means. It's a tradeoff when having your staging database on a separate instance between flexibility in terms of scaling your environment and ease of loading your data warehouse.

I understand the staging database using the simple recovery model, but I'm curious why the data warehouse is in the full recovery model. Is it due to the size and amount of data loaded each time, such that t-log backups are considered the better option?

The previous company I worked at that was building a data warehouse had theirs using the simple recovery model. A full backup was run after the nightly load process completed.
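That pattern — simple recovery plus a full backup as the last step of the nightly load job — could be scripted roughly like this (database names and backup path are hypothetical):

```sql
-- Staging: simple recovery, no point-in-time recovery needed.
ALTER DATABASE Staging SET RECOVERY SIMPLE;

-- Warehouse also in simple recovery, protected by a full backup
-- taken immediately after the nightly load finishes.
ALTER DATABASE DW SET RECOVERY SIMPLE;

BACKUP DATABASE DW
    TO DISK = 'G:\Backups\DW_PostLoad.bak'
    WITH INIT, CHECKSUM;
```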

Lynn Pettis (3/23/2012) wrote:

I understand the Staging database using the simple recovery model, but curious why the data warehouse is using the full recovery model? Is it due to size and amount of data loaded each time that t-log backups are considered better?

Previous company I worked at that was building a data warehouse had theirs using the simple recovery model. A full backup was run after the nightly load process was completed.

Generically, a staging database where tables are wiped and reloaded with each new incoming feed is volatile, i.e. there is little to no value in recovering a previous copy of it. Whereas recovering a DW quickly to minimize downtime for data clients can be important.

In the first case that comes to mind, running the DW in FULL was purely a DR position. Standard operating procedure was for all mission-critical databases, for which the DW database qualified, to be mirrored. As you know, mirroring requires the database to be in the FULL recovery model 100% of the time. Staging was classified as volatile, so the DR plan for that DB was to restore the last full backup to the secondary server and resume the job schedule. With mirroring, the DW could be brought online by simply failing over to the secondary mirror. Overkill? Maybe, but the shop had a very "one size fits all" mentality.

I could see value in log shipping a DW though, or in being able to restore to a PIT for purposes of isolating a production issue. Are those worth the overhead of running a DW in FULL? Not sure, depends on the shop and their SLAs I guess.
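For what it's worth, a point-in-time restore only works with log backups, which means FULL (or BULK_LOGGED) recovery; a rough sketch of isolating a production issue that way, with hypothetical database, logical file, and backup names and a made-up timestamp:

```sql
-- Restore the last full backup to a scratch database without recovering,
-- then roll the log forward to just before the suspected issue.
RESTORE DATABASE DW_Investigate
    FROM DISK = 'G:\Backups\DW_Full.bak'
    WITH MOVE 'DW'     TO 'E:\SQLData\DW_Investigate.mdf',
         MOVE 'DW_log' TO 'F:\SQLLogs\DW_Investigate.ldf',
         NORECOVERY;

RESTORE LOG DW_Investigate
    FROM DISK = 'G:\Backups\DW_Log.trn'
    WITH STOPAT = '2012-03-23T06:45:00',
         RECOVERY;
```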


opc.three (3/23/2012) wrote:

Generically, a staging database where tables are wiped and reloaded with each new incoming feed is volatile, i.e. there is little to no value in recovering a previous copy of it. Whereas recovering a DW quickly to minimize downtime for data clients can be important.

In the first case that comes to mind running the DW in FULL was purely a DR position. Standard operating procedure was for all mission-critical databases, for which the DW database qualified, to be mirrored. As you know mirroring requires the database to be in FULL recovery mode 100% of the time. Staging was classified as volatile so the DR plan for that DB was to restore the last full backup to the secondary server and resume the job schedule. With mirroring, the DW could be brought online by simply failing over to the secondary mirror. Overkill? Maybe, but the shop had a very "one size fits all" mentality.

I could see value in log shipping a DW though, or in being able to restore to a PIT for purposes of isolating a production issue. Are those worth the overhead of running a DW in FULL? Not sure, depends on the shop and their SLAs I guess.

As soon as I saw DB Mirroring, it makes sense. With that I would also assume that t-log backups are run periodically or after each load to keep the t-log from filling the disk.
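E.g., a log backup step scheduled after each load, something like this (database name and path are made up):

```sql
-- Frequent log backups keep the t-log from growing unbounded while the
-- DW stays in FULL recovery for mirroring.
BACKUP LOG DW
    TO DISK = 'G:\Backups\DW_Log.trn'
    WITH CHECKSUM;
```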

Lynn Pettis (3/23/2012) wrote:

As soon as I saw DB Mirroring, it makes sense. With that I would also assume that t-log backups are run periodically or after each load to keep the t-log from filling the disk.

I cannot recall specifically, but it was something like every 15 minutes during heavy load times, and every few hours during the times when the DB was being used for reporting. Some lighter data feeds were loaded throughout the day as well.


Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are given. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.