The amazing adventures of Doug Hughes

Archive for August, 2007

This post will be the first in a short series of blog posts showing an overview of Data Warehousing with a slant on how we as ColdFusion developers may encounter it. In these articles we will use a simple database example.

The first major stepping stone in understanding Data Warehousing is to grasp the concepts and differences between the two overall database categories. The type most of us are used to dealing with is the On Line Transactional Processing (OLTP) category. I go into greater detail below. The other major category is On Line Analytical Processing (OLAP), in this design we are flattening out the data and the examples below should help to clarify this.

Accessing data to produce required reports is a crucial need in most organizations; in this example we will take data contained in an OLTP format and create two notional OLAP modeled tables from the OLTP format of the data. The OLAP format is how we might hold that same data if our goal was to produce an efficient version of the same data for reporting on large numbers of records. This need for large reports with lots of data can sometimes conflict with the needs of storing data in the first place as the database structures needed to accept, store and easily manipulate data efficiently can be markedly different from the structures needed to produce effective reports from that same data. This is clearly illustrated if we look at the differences between two basic kinds of database categories:

ON-LINE TRANSACTIONAL PROCESSING (OLTP)

This is what we would characterize as the ongoing day-to-day functional copy of the database. It is where data is added and updated but never overwritten or deleted. The main needs of the OLTP operational database being easily controlled insertion and updating of data with efficient access to data manipulation and viewing mechanisms. Typically only single record or small record-sets should be manipulated in a single operation in an OLTP designed database. The main thrust here is to avoid having the same data in different tables. This basic tenet of Relational Database modeling is known as normalizing data.

Here is an example of a simple OLTP normalized database. (By the way I use Visio to create this and it is a bit quirky in accurately assigning Foreign Keys (FK)). The main thing to note is the number of tables and the relationships. This is really a simple database with in essence, people, companies and related items. In order to pull meaningful reports we have to join tables at the query level. In larger databases that are not well designed this can be a very challenging and inefficient process.

ON-LINE ANALYTICAL PROCESSING (OLAP)

OLAP is a broad term that also encompasses data warehousing. In this model data is stored in a format which enables the efficient creation of data mining/reports. OLAP design should accommodate reporting on very large record sets with little degradation in operational efficiency. The overall term used to describe taking data structures in an OLTP format and holding the same data in an OLAP format is Dimensional Modeling It is the primary building block of Data Warehousing and we will go into greater detail on what that is and how it is used in a later blog post. Data in an OLAP database should always be read-only.

Here is an example of how some of the data shown above might be stored in an OLAP format for the purpose of producing large reports of a particular extract of the OLTP data format above. Here we created a dimensional model of two types of user reports. One for details of where users live the other of where they work, note that the same data is now in a de-normalized state. The OLTP and OLAP data could be kept synchronized by using Transactional Replication. It is important to note that all day-to-day manipulation of the data is carried out in the OLTP format database; the OLAP version being a replicated, read-only copy.

We have two tables here that contain data from the OLTP database but in this case it is in a much more flattened state. In this case we could and probably would have lots of repeated data particularly in the ReportPeopleWork table. In an OLAP model repetition is not necessarily a bad thing.

In this first article we have demonstrated some differences between OLTP and OLAP database types.

Firstly all transactions on the data are carried out in the OLTP, transactional, copy of the database. The normalized structure is best for ensuring that little or no data is repeated. Basic user information is contained in the People table, the home and work address for the user is maintained in the Location table and the home address work address for each person is determined by the Locationtype. These different tables are related via Link tables. All adding and updating of data is done in the OLTP database.

Secondly data which needs to be used in large recordsets in a read-only format, such as for reporting, is best held in an OLAP format. Here each table can be thought of in the same way as we might think of an object in OO programming paradigms, with each row in the table being a single complete object. There still may be reasons for linking tables but in the OLAP model this should be kept to a bare minimum. An excessive need for linking tables in the OLAP database probably denotes a structural weakness in how the OLAP database was modeled.

I think the greatest version of ColdFusion ever was 1.54; why because I cut my teeth on that, how many other dbmlers are out there? The original lingua- franca of ColdFusion was .dbml of course, now I am showing my age in blog years!

It is an incredible privilege to be working with Dough Hughes, Critter, Jeff, Scott et al at Alagad. Collectively they have contributed so much to the CF community that I am part of, now hopefully its my turn to contribute. I will be putting together articles more related to that grubby-dirty engine room, it is where I belong after all. The network, web servers, the database servers, the directory servers, the JVM and if Im lucky enough they might even let me crawl up to the CF Server Admin! Its amazing that there are so few meaty articles about these things and how CF-JRun relate to them.

I’m happy to announce that Alagad has recently had yet another talented and experienced member join our ranks. Our newest employee is Mike Brunt.

Mike is a server and systems guru. He used to be a support technician for Allaire and Macromedia. After that, he helped found Webapper. When it comes to things like JVM tuning, advanced server configuration, application bottleneck troubleshooting and a lot more, I’ve never met anyone who comes close to Mike’s capabilities.

For example, he recently helped me with a client who wanted a very, very, complex IIS and ColdFusion configuration. In particular, they had four servers, two IIS web servers and two ColdFusion servers. The IIS servers were load balanced (I think) and the two ColdFusion servers were distributed (meaning not on the same servers as IIS), clustered, made use of session replication and sticky sessions. This was to meet the client’s specific needs for this configuration.

On the surface this sounds simple, but it’s possibly the most complex setup you can have for ColdFusion. It took many, many, hours to work this out. In the end it required a specific version of JVMs, ColdFusion, Jrun and more installed in the correct order! But Mike worked it out and the client has exactly what they needed working perfectly.

Beyond that, Mike recently (before he started with me) did some work for another client which he described in an email to me:

“Well I thought I would share what I had to do tonight for an old client I have. They have a small CF Server farm of three CFMX 7 Enterprise boxes running an eCommerce app with three CF instances each. So they wanted me to install SeeFusion on each and wrap the DSN with the SeeFusion JDBC wrapper but with no noticeable loss of service. This sometimes requires two restarts of CF, one for the initial SeeFusion install and another for the JDBC wrapper to take effect and tonight was no exception. Just to spice things up they have two levels of monitoring one is a heartbeat from the Cisco Load Balancers, the other an ISAPI monitoring system called TEALEAF and they asked me to complete this exercise with neither system picking up any service outage. I was successful and this is not braggery and I apologize if it sounds so. I am just glad to be here.”

Beyond that, I’ve worked with Mike to track down memory leaks and performance problems in applications. I’ve used him to eek out every iota of performance from applications.

So, suffice it to say, if you need something complex done with your servers or need to figure out how to help your applications run better, Mike (via Alagad!) is your man.

Beyond the server tuning and tweaking, Mike will be writing a long series of blog articles to teach you how to do these things. We’re also going to be putting together some training classes and aggressively growing this wing of Alagad.

I decided tonight to upgrade my development environment to run on the latest and greatest ColdFusion 8. After diligently backing up all the settings from my various ColdFusion instances and uninstalling all my various versions of ColdFusion I started installing CF 8.

Unfortunately, changing the CFIDE installation directory seems to cause problems in ColdFusion 8. After the installation was completed the installer directed me to complete the installation by pulling up the ColdFusion administrator. This failed with the error: Could not find the ColdFusion Component or Interface CFIDE.adminapi.security

Googling this returned only one message on the Adobe forums which didn’t address the problem. Luckily I knew about the #coldfusion channel on the Dalnet IRC network. Much to my surprise, Scott Pinkston was there with the answer.

Apparently the ColdFusion 8 installer has a bug where if you change the CFIDE directory from its default it won’t set the correct path in a rather important ColdFusion configuration file.

To fix the issue you need to find the neo-runtime.xml file in your installation. Mine was located under C:JRun4servers{servername}cfusion-earcfusion-warWEB-INFcfusionlib. I don’t know where this would be in the stand-alone server.

Anyhow, edit that file in a text editor (back it up first!) and look for this node:

<var name='/CFIDE'>
<string>c:InetpubCFIDE</string>
</var>

That’s what mine was set to by default, which was incorrect. I had to change the path in the &lt;string&gt; tags to the correct path.

After making that change I restarted ColdFusion and I was able to reach the administration interface and complete the installation.