Wednesday Nov 16, 2011

The Business Case for Big Data Part 1

What's the Big Deal

Okay, so a new buzz word is emerging. It's gone beyond just a buzzword now, and I think it is going to change the landscape of retail, financial services, healthcare....everything. Let me spend a moment to talk about what i'm going to talk about.

Massive amounts of data are being collected every second, more than ever imaginable, and the size of this data is more than can be practically managed by today’s current strategies and technologies. There is a revolution at hand centering on this groundswell of data and it will change how we execute our businesses through greater efficiencies, new revenue discovery and even enable innovation. It is the revolution of Big Data. This is more than just a new buzzword is being tossed around technology circles.This blog series for Big Data will explain this new wave of technology and provide a roadmap for businesses to take advantage of this growing trend.

Cases for Big Data

There is a growing list of use cases for big data. We naturally think of Marketing as the low hanging fruit. Many projects look to analyze twitter feeds to find new ways to do marketing. I think of a great example from a TED speech that I recently saw on data visualization from Facebook from my masters studies at University of Virginia. We can see when the most likely time for breaks-ups occurs by looking at status changes and updates on users Walls. This is the intersection of Big Data, Analytics and traditional structured data.
Ted Video

Marketers can use this to sell more stuff.

I really like the following piece on looking at twitter feeds to measure mood. The following company was bought by a hedge fund. They could predict how the S&P was going to do within three days at an 85% accuracy. Link to the article Here we see a convergence of predictive analytics and Big Data.
So, we'll look at a lot of these business cases and start talking about what this means for the business. It's more than just finding ways to use Hadoop + NoSql and we'll talk about that too.

Monday Nov 07, 2011

During Oracle Open world I did an interview along with Tom and Marc Hebeert ( http://estuate.com/blog) about our latest book on Oracle Information Integration, Consolidation and Migration.
Part 1 was just posted on OTN last week. Take a listen:

How would you sum up the Oracle Information, Integration, Migration and Consolidation book?

"At it's core this book is about how you architect an organization with disparate data sources to play well together. We understand that every shop is not a pure Oracle, IBM or Microsoft house and because of this we need to help organizations build systems, interfaces and processes to create information out of data so that businesses can react. We do feel that Oracle has best of breed tools and products to get this done and we show you when and how to migrate, integrate and consolidate."

What made you want to write this book?

"Every organisation, large and small, are faced with the challenge of creating that single version of the truth. I have been to so many customers that can see the end game, but don't know the best path to get there. My intent with this book was to allow people to see a path to completion by providing hands-on examples. The book doesn't cover every scenario, but does a good job of explaining many concrete examples. The reader will come away with a good model for how to get integration work done and what tools are available to accomplish this goal."

How would you describe your writing style?

"People often ask me when I have the time to write - I fly a lot and I find that the time on the plane or in a hotel room is conducive to getting words in my head on paper. I have four kids at home, so writing there can be a bit of a challenge! It means you have to write fast and furious, but once you get into the groove, the words flow"

What do you see for the future of this Oracle technology area?

"Data integration and consolidation will continue to be a huge project focus for customers in the foreseeable future. There are continued economic pressures to reduce costs yet provide robust services. The next, and growing trend is Big Data, which cannot be accomplished if organizations can't get a handle on integration and migration. So, this will continue to be a core issue for years to come"

Tuesday Aug 30, 2011

Why consider information integration?

The useful life of pre-relational mainframe database management system engines is coming to an end because of a diminishing application and skills base, and increasing costs.—Gartner Group

During the last 30 years, many companies have deployed mission critical applications running various aspects of their business on the legacy systems. Most of these environments have been built around a proprietary database management system running on the mainframe. According to Gartner Group, the installed base of mainframe, Sybase, and some open source databases has been shrinking. There is vendor sponsored market research that shows mainframe database management systems are growing, which, according to Gartner, is due primarily to increased prices from the vendors, currency conversions, and mainframe CPU replacements.

Over the last few years, many companies have been migrating mission critical applications off the mainframe onto open standard Relational Database Management Systems (RDBMS) such as Oracle for the following reasons:

Reducing skill base: Students and new entrants to the job market are being trained on RDBMS like Oracle and not on the legacy database management systems. Legacy personnel are retiring, and those that are not are moving into expensive consulting positions to arbitrage the demand.

Lack of flexibility to meet business requirements: The world of business is constantly changing and new business requirements like compliance and outsourcing require application changes. Changing the behavior, structure, access, interface or size of old databases is very hard and often not possible, limiting the ability of the IT department to meet the needs of the business. Most applications on the aging platforms are 10 to 30 years old and are long past their original usable lifetime.

Lack of Independent Software Vendor (ISV)applications: With most ISVs focusing on the larger market, it is very difficult to find applications, infrastructure, and tools for legacy platforms. This requires every application to be custom coded on the closed environment by scarce in-house experts or by expensive outside consultants.

Total Cost of Ownership (TCO): As the user base for proprietary systems decreases, hardware, spare parts, and vendor support costs have been increasing. Adding to this are the high costs of changing legacy applications, paid either as consulting fees for a replacement for diminishing numbers of mainframe trained experts or increased salaries for existing personnel. All leading to a very high TCO which doesn't even take into account the opportunity cost to the business of having inflexible systems.

Business challenges in data integration and migration

Once the decision has been taken to migrate away from a legacy environment, the primary business challenge is business continuity. Since many of these applications are mission critical, running various aspects of the business, the migration strategy has to ensure continuity to the new application—and in the event of failure, rollback to the mainframe application. This approach requires data in the existing application to be synchronized with data on the new application.

Making the challenge of data migration more complicated is the fact that legacy applications tend to be interdependent, but the need from a risk mitigation standpoint is to move applications one at a time. A follow-on challenge is prioritizing the order in which applications are to be moved off the mainframe, and ensuring that the order meets both the business needs and minimizes the risk in the migration process.

Once a specific application is being migrated, the next challenge is to decide which business processes will be migrated to the new application. Many companies have business processes that are present, because that's the way their systems work. When migrating an application off the mainframe, many business processes do not need to migrate. Even among the business processes that need to be migrated, some of these business processes will need to be moved as-is and some of them will have to be changed. Many companies utilize the opportunity afforded by a migration to redo the business processes they have had to live with for many years.

Data is the foundation of the modernization process. You can move the application, business logic, and work flow, but without a clean migration of the data the business requirements will not be met. A clean data migration involves:

Data that is organized in a usable format by all modern tools

Data that is optimized for an Oracle database

Data that is easy to maintain

Technical challenges of information integration

The technical challenges with any information integration all stem from the fact that the application accesses heterogeneous data (VSAM, IMS, IDMS, ADABAS, DB2, MSSQL, and so on) that can even be in a non-relational hierarchical format. Some of the technical problems include:

The flexible file definition feature used in COBOL applications in the existing system will have data files with multi-record formats and multi-record types in the same dataset—neither of which exist in RDBMS. Looping data structure and substructure or relative offset record organization such as a linked list, which are difficult to map into a relational table.

Data and referential integrity is managed by the Oracle database engine. However, legacy applications already have this integrity built in. One question is whether to use Oracle to handle this integrity and remove the logic from the application.

Finally, creating an Oracle schema to maximize performance, which includes mapping non-oracle keys to Oracle primary and secondary keys; especially when legacy data is organized in order of key value which can affect the performance on an Oracle RDBMS. There are also differences in how some engines process transactions, rollbacks, and record locking.

General approaches to information integration and migration

There are several technical approaches to consider when doing any kind of integration or migration activity. In this section, we will look at a methodology or approach for both data integration and data migration.

Data integration
Clearly, given this range of requirements, there are a variety of different integration strategies, including the following:

Consolidated: A consolidated data integration solution moves all data into a single database and manages it in a central location. There are some considerations that need to be known regarding the differences between non- Oracle and Oracle mechanics. Transaction processing is an example. Some engines use implicit commits and some manage character sets differently than Oracle does, this has an impact on sort order.

Federated: A federated data integration solution leaves data in the individual data source where it is normally maintained and updated, and simply consolidates it on the fly as needed. In this case, multiple data sources will appear to be integrated into a single virtual database, masking the number and different kinds of databases behind the consolidated view. These solutions can work bidirectionally.

Shared: A shared data integration solution actually moves data and events from one or more source databases to a consolidated resource, or queue, created to serve one or more new applications. Data can be maintained and exchanged using technologies such as replication, message queuing, transportable table spaces, and FTP.

So, we have talked about some of the factors and challenges for integration in a heterogeneous world. In the next post, we will look into ways of doing this using things like Oracle Golden Gate, and ODI as well as some migration techniques.