Category Archives: Data Synchronization

For those hoping to push through a hard-hitting analytics effort that will serve as a beacon of light within an otherwise calcified organization, there’s probably a lot of work cut out for you. Evolving into an organization that fully grasps the power and opportunities of data analytics requires cultural change, and this is a challenge organizations have only begin to grasp.

“Sitting down with pizza and coffee could get you around can get around most of the technical challenges,” explained Sam Ransbotham, Ph.D, associate professor Boston College, at a recent panel webcast hosted by MIT Sloan Management Review, “but the cultural problems are much larger.”

That’s one of the key takeaways from a the panel, in which Ransbotham was joined by Tuck Rickards, head of digital transformation practice at Russell Reynolds Associates, a digital recruiting firm, and Denis Arnaud, senior data scientist Amadeus Travel Intelligence. The panel, which examined the impact of corporate culture on data analytics, was led by Michael Fitzgerald, contributing editor at MIT Sloan Management Review.

The path to becoming an analytics-driven company is a journey that requires transformation across most or all departments, the panelists agreed. “It’s fundamentally different to be a data-driven decision company than kind of a gut-feel decision-making company,” said Rickards. “Acquiring this capability to do things differently usually requires a massive culture shift.”

That’s because the cultural aspects of the organization – “the values, the behaviors, the decision making norms and the outcomes go hand in hand with data analytics,” said Ransbotham. “It doesn’t do any good to have a whole bunch of data processes if your company doesn’t have the culture to act on them and do something with them.” Rickards adds that bringing this all together requires an agile, open source mindset, with frequent, open communication across the organization.

So how does one go about building and promoting a culture that is conducive to getting the maximum benefit from data analytics? The most important piece is being about people who ate aware and skilled in analytics – both from within the enterprise and from outside, the panelists urged. Ransbotham points out that it may seem daunting, but it’s not. “This is not some gee-whizz thing,” he said. “We have to get rid of this mindset that these things are impossible. Everybody who has figured it out has figured it out somehow. We’re a lot more able to pick up on these things that we think — the technology is getting easier, it doesn’t require quite as much as it used to.”

The key to evolving corporate culture to becoming more analytics-driven is to identify or recruit enlightened and skilled individuals who can provide the vision and build a collaborative environment. “The most challenging part is looking for someone who can see the business more broadly, and can interface with the various business functions –ideally, someone who can manage change and transformation throughout the organization,” Rickards said.

Arnaud described how his organization – an online travel service — went about building an espirit de corps between data analytics staff and business staff to ensure the success of their company’s analytics efforts. “Every month all the teams would do a hands-on workshop, together in some place in Europe [Amadeus is headquartered in Madrid, Spain].” For example, a workshop may focus on a market analysis for a specific customer, and the participants would explore the entire end-to-end process for working with the customer, “from the data collection all the way through to data acquisition through data crunching and so on. The one knowing the data analysis techniques would explain them, and the one knowing the business would explain that, and so on.” As a result of these monthly workshops, business and analytics teams members have found it “much easier to collaborate,” he added.

Web-oriented companies such as Amadeus – or Amazon and eBay for that matter — may be paving the way with analytics-driven operations, but companies in most other industries are not at this stage yet, both Rickards and Ransbotham point out. The more advanced web companies have built “an end-to-end supply chain, wrapped around customer interaction,” said Rickards. “If you think of most traditional businesses, financial services or automotive or healthcare are a million miles away from that. It starts with having analytic capabilities, but it’s a real journey to take that capability across the company.”

The analytics-driven business of the near future – regardless of industry – will likely to be staffed with roles not seen as of yet today. “If you are looking to re-architect the business, you may be imagining roles that you don’t have in the company today,” said Rickards. Along with the need for chief analytics officers, data scientists, and data analysts, there will be many new roles created. “If you are on the analytics side of this, you can be in an analytics group or a marketing group, with more of a CRM or customer insights title. Yu can be in a planning or business functions. In a similar way on the technology side, there are people very focused on architecture and security.”

Ultimately, the demand will be for leaders and professionals who understand both the business and technology sides of the opportunity, Rickards continued. Ultimately, he added, “you can have good people building a platform, and you can have good data scientists. But you better have someone on the top of that organization knowing the business purpose.’

I live in a small town in Maine. Between my town and the surrounding three towns, there are seven Main Streets and three Annis Roads or Lanes (and don’t get me started on the number of Moose Trails). If your insurance company wants to market to or communicate with someone in my town or one of the surrounding towns, how can you ensure that the address that you are sending material to is correct? What is the cost if material is sent to an incorrect or outdated address? What is the cost to your insurance company if a provider sends the bill out to the wrong ?

How much is poor address quality costing your business? It doesn’t just impact marketing where inaccurate address data translates into missed opportunity – it also means significant waste in materials, labor, time and postage . Bills may be delivered late or returned with sender unknown, meaning additional handling times, possible repackaging, additional postage costs (Address Correction Penalties) and the risk of customer service issues. When mail or packages don’t arrive, pressure on your customer support team can increase and your company’s reputation can be negatively impacted. Bills and payments may arrive late or not at all directly impacting your cash flow. The cost of bad address data causes inefficiencies and raises costs across your entire organization.

The best method for handling address correction is through a validation and correction process:

What are Incorrect Addresses Costing your Company? There are Steps to Follow

When trying to standardize member or provider information one of the first places to look is address data. If you can determine that John Q Smith that lives at 134 Main St in Northport, Maine 04843 is the same John Q Smith that lives at 134 Maine Street in Lincolnville, Maine 04849, you have provided a link between two members that are probably considered distinct in your systems. Once you can validate that there is no 134 Main St in Northport according to the postal service, and then can validate that 04849 is a valid zip code for Lincolnville – you can then standardize your address format to something along the lines of: 134 MAIN ST LINCOLNVILLE,ME 04849. Now you have a consistent layout for all of your addresses that follows postal service standards. Each member now has a consistent address which is going to make the next step of creating a golden record for each member that much simpler.

Think about your current method of managing addresses. Likely, there are several different systems that capture addresses with different standards for what data is allowed into each field – and quite possibly these independent applications are not checking or validating against country postal standards. By improving the quality of address data, you are one step closer to creating high quality data that can provide the up-to-the minute accurate reporting your organization needs to succeed.

To deliver flawless field service, companies often require integration across multiple applications for various work processes. A good example is automatically ordering and shipping parts through an ERP system to arrive ahead of a timely field service visit. Informatica has partnered with ServiceMax, the leading field service automation solution, and subsequently joined the new ServiceMax Marketplace to offer customers integration solutions for many ERP and Cloud applications frequently involved in ServiceMax deployments. Comprised of Cloud Integration Templates built on Informatica Cloud for frequent customer integration “patterns”, these solutions will speed and cost contain the ServiceMax implementation cycle and help customers realize the full potential of their field service initiatives.

Existing members of the ServiceMax Community can see a demo or take advantage of a free 30-day trial that provides full capabilities of Informatica Cloud Integration for ServiceMax with prebuilt connectors to hundreds of 3rd party systems including SAP, Oracle, Salesforce, Netsuite and Workday, powered by the Informatica Vibe virtual data machine for near-universal access to cloud and on-premise data. The Informatica Cloud Integration for Servicemax solution:

Accelerates ERP integration through prebuilt Cloud templates focused on key work processes and the objects on common between systems as much as 85%

Have you noticed something different this winter season that most people are cheery about? I’ll give you a hint. It’s not the great sales going on at your local shopping mall but something that helps you get to the mall allot more affordable then last year. It’s the extremely low gas prices across the globe, fueled by over-supply of oil vs. demand contributed from a boom in Geo-politics and boom in shale oil production in N. America and abroad. Like any other commodity, it’s impossible to predict where oil prices are headed however, one thing is sure that Oil and Gas companies will need timely and quality data as firms are investing in new technologies to become more agile, innovative, efficient, and competitive as reported by a recent IDC Energy Insights Predictions report for 2015.

The report predicts:

80% of the top O&G companies will reengineer processes and systems to optimize logistics, hedge risk and efficiently and safely deliver crude, LNG, and refined products by the end of 2017.

Over the next 3 years, 40% of O&G majors and all software divisions of oilfield services (OFS) will co-innovate on domain specific technical projects with IT professional service firms.

The CEO will expect immediate and accurate information about top Shale Plays to be available by the end of 2015 to improve asset value by 30%.

By 2016, 70% percent of O&G companies will have invested in programs to evolve the IT environment to a third platform driven architecture to support agility and readily adapt to change.

With continued labor shortages and over 1/3 of the O&G workforce under 45 in three years, O&G companies will turn to IT to meet productivity goals.

By the end of 2017, 100% of the top 25 O&G companies will apply modeling and simulation tools and services to optimize oil field development programs and 25% will require these tools.

Spending on connectivity related technologies will increase by 30% between 2014 and 2016, as O&G companies demand vendors provide the right balance of connectivity for a more complex set of data sources.

In 2015, mergers, acquisitions and divestitures, plus new integrated capabilities, will drive 40% of O&G companies to re-evaluate their current deployments of ERP and hydrocarbon accounting.

With a business case built on predictive analytics and optimization in drilling, production and asset integrity, 50% of O&G companies will have advanced analytics capabilities in place by 2016.

With pressures on capital efficiency, by 2015, 25% of the Top 25 O&G companies will apply integrated planning and information to large capital projects, speeding up delivery and reducing over-budget risks by 30%.

Realizing value from these investments will also require Oil and Gas firms to modernize and improve their data management infrastructure and technologies to deliver great data whether to fuel actionable insights from Big Data technology to facilitating post-merger application consolidation and integration activities. Great data is only achievable by Great Design supported by capable solutions designed to help access and deliver timely, trusted, and secure data to need it most.

Lack of proper data management investments and competences have long plagued the oil and gas sector with “less-than acceptable” data and higher operating costs. According to the “Upstream Data and Information Management Survey” conducted by Wipro Technologies, 56% of those surveyed felt that business users spent more than ¼ or more of their time on low value activities caused by existing data issues (e.g. accessing, cleansing, preparing data) for “high value” activities (e.g. analysis, planning, decision making). The same survey showed the biggest data management issues were timely access to required data and data quality issues from source systems.

So what can Oil and Gas CIO’s and Enterprise Architects do to prepare for the future? Here are some tips for consideration:

Look to migrate and automate legacy hand coded data transformation processes by adopting tools that can help streamline the development, testing, deployment, and maintenance of these complex tasks that help developers build, maintain, and monitor data transformation rules once and deploy them across the enterprise.

Simplify how data is distributed across systems with more modern architectures and solutions and avoid the cost and complexities of point to point integrations

Deal with and manage data quality upstream at the source and throughout the data life cycle vs. having end users fix unforeseen data quality errors manually.

Create a centralized source of shared business reference and master data that can manage a consistent record across heterogeneous systems such as well asset/material information (wellhead, field, pump, valve, etc.), employee data (drill/reservoir engineer, technician), location data (often geo-spatial), and accounting data (for financial roll-ups of cost, production data).

Establish standards and repeatable best practices by adopting an Integration Competency Center frame work to support the integration and sharing of data between operational and analytical systems.

In summary, low oil prices have a direct and positive impact to consumers especially during the winter season and holidays and I personally hope they continue for the unforeseeable future given that prices were double just a year ago. Unfortunately, no one can predict future energy prices however one thing is for sure, the demand for great data by Oil and Gas companies will continue to grow. As such, CIO’s and Enterprise Architects will need to consider and recognize the importance of improving their data management capabilities and technologies to ensure success in 2015. How ready are you?

Every two years, the typical company doubles the amount of data they store. However, this Data is inherently “dumb.” Acquiring more of it only seems to compound its lack of intellect.

When revitalizing your business, I won’t ask to look at your data – not even a little bit. Instead, we look at the process of how you use the data. What I want to know is this:

How much of your day-to-day operations are driven by your data?

The Case for Smart Data

I recently learned that 7-Eleven Japan has pushed decision-making down to the store level – in fact, to the level of clerks. Store clerks decide what goes on the shelves in their individual 7-Eleven stores. These clerks push incredible inventory turns. Some 70% of the products on the shelves are new to stores each year. As a result, this chain has been the most profitable Japanese retailer for 30 years running.

Click to enlarge

Instead of just reading the data and making wild guesses on why something works and why something doesn’t, these clerks acquired the skill of looking at the quantitative and the qualitative and connected dots. Data told them what people are talking about, how it’s related to their product and how much weight it carried. You can achieve this as well. To do so, you must introduce a culture that emphasizes discipline around processes. A disciplined process culture uses:

A template approach to data with common processes, reuse of components, and a single face presented to customers

Employees who consistently follow standard procedures

If you cannot develop such company-wide consistency, you will not gain benefits of ERP or CRM systems.

Make data available to the masses. Like at 7-Eleven Japan, don’t centralize the data decision-making process. Instead, push it out to the ranks. By putting these cultures and practices into play, businesses can use data to run smarter.

I have been re-reading Enterprise Architecture as Strategy from the MIT Center for Information Systems Research (CISR).* One concept that they talk about that jumped out at me was the idea of a “Foundation for Execution.” Everybody is working to drive new business initiatives, to digitize their businesses, and to thrive in an era of increased technology disruption and competition. The ideas around a Foundation for Execution in the book are a highly practical and useful framework to deal with these problems.

This got me thinking: What is the biggest bottleneck in the delivery of business value today? I know I look at things from a data perspective, but data is the biggest bottleneck. Consider this prediction from Gartner:

“Gartner predicts organizations will spend one-third more on app integration in 2016 than they did in 2013. What’s more, by 2018, more than half the cost of implementing new large systems will be spent on integration. “

When we talk about application integration, we’re talking about moving data, synchronizing data, cleansing, data, transforming data, testing data. The question for architects and senior management is this: Do you have the Data Foundation for Execution you need to drive the business results you require to compete? The answer, unfortunately, for most companies is; No.

All too often data management is an add-on to larger application-based projects. The result is unconnected and non-interoperable islands of data across the organization. That simply is not going to work in the coming competitive environment. Here are a couple of quick examples:

Many companies are looking to compete on their use of analytics. That requires collecting, managing, and analyzing data from multiple internal and external sources.

Many companies are focusing on a better customer experience to drive their business. This again requires data from many internal sources, plus social, mobile and location-based data to be effective.

When I talk to architects about the business risks of not having a shared data architecture, and common tools and practices for enterprise data management, they “get” the problem. So why aren’t they addressing it? The issue is that they find that they are only funded to do the project they are working on and are dealing with very demanding timeframe requirements. They have no funding or mandate to solve the larger enterprise data management problem, which is getting more complex and brittle with each new un-connected project or initiative that is added to the pile.

Studies such as “The Data Directive” by The Economist show that organizations that actively manage their data are more successful. But, if that is the desired future state, how do you get there?

Changing an organization to look at data as the fuel that drives strategy takes hard work and leadership. It also takes a strong enterprise data architecture vision and strategy. For fresh thinking on the subject of building a data foundation for execution, see “Think Data-First to Drive Business Value” from Informatica.

* By the way, Informatica is proud to announce that we are now a sponsor of the MIT Center for Information Systems Research.

This post was written by guest author Justin Passofaro, Principal Data Management Consultant at SSG, a consulting practice focused on innovative ways to leverage data for better business decisions.

Configuring your Oracle environment for using PowerExchange CDC can be challenging, but there are some best practices you can follow that will greatly simplify the process. There are two major factors to consider when approaching this: latency requirements for your data and the ability to restart your environment.

Data Latency Requirements

The first factor that will effect latency of your data is the location of your PowerExchange CDC installation. From a best practice perspective, it is optimal to install the PowerExchange Listener on the source database server as this eliminates the need to pass data across the network and will provide the smallest amount of latency from source to target.

The volume of data that PowerExchange CDC has to process can also have a significant impact on performance. There are several items in addition to the changed data that can effect performance, including, but are not limited to, Oracle catalog dumps, Oracle workload monitor customizations and other non-Oracle tools that use the redo logs. You should conduct a review of all the processes that access Oracle redo logs, and make an effort to minimize them in terms both volume and frequency. For example, you could monitor the redo log switches and the creation of archived log files to see how busy the source database is. The size of your production archive logs and knowing how often they are being created will provide the information necessary to properly configure PowerExchange CDC.

Environment Restart Ability

When certain changes are made to the source database environment, the PowerExchange CDC process will need to be stopped and restarted. The amount of time this restart takes should be considered whenever this needs to occur. PowerExchange CDC must be restarted when any of the following changes occur:

– A change is made to the schema or a table that is part of the CDC process

– An existing Capture Registration is changed

– A change is made to the PowerExchange configuration files

– An Oracle patch is applied

– An Operating System patch or upgrade is applied

– A PowerExchange version upgrade or service pack is applied

If using the CDC with LogMiner, a copy of the Oracle catalog must be placed on the archive log in order to function properly. The frequency of these copies is site-specific and will have an impact on the amount of time it will take to restart the CDC process.

Once your PowerExchange CDC process is in production, any changes to the environment must have extensive impact analysis performed to ensure the integrity of the data and the transactions remains intact upon restart. Understanding the configurable parameters in the PowerExchange configuration files that will assist restart performance is of the utmost importance.

Even with the challenges presented when configuring PowerExchange CDC for Oracle, there are trusted and proven methods that can significantly improve your ability to complete this process and have real time or near real time access to your data. At SSG, we’re committed to always utilizing best practice methodology with our PowerExchange Baseline Deployments. In addition, we provide in-depth knowledge transfer to set end users up with a solid foundation for optimizing PowerExchange functionality.

I know I normally don’t write about topics like this, but this topic just kept coming up for some reason in the past few customer visits I had, so I thought I would share what I recently learned about mainframe connectivity with you.

Ah yes, the Old Mainframe. It just won’t go away. Which means there is still valuable data sitting in it. And that leads to a question that I have been asked about repeatedly in the past few weeks, about why an organization should use a tool like Informatica PowerExchange to extract data from a mainframe when you can also do it with a script that extracts the data as a flat file.

So below, thanks to Phil Line, Informatica’s Product Manager for Mainframe connectivity, are the top ten reasons to use PowerExchange over hand coding a flat file extraction.

1) Data will be “fresh” as of the time the data is needed – not already old based on when the extraction was run.

2) Any data extracted directly from files will be as the file held it, any additional processes needed to run in order to extract/transfer data to LUW could potentially alter the original formats.

3) The consuming application can get the data when it needs it; there wouldn’t be any scheduling issues between creating the extract file and then being able to use it.

4) There is less work to do if PowerExchange reads the data directly from the mainframe, data type processing as well as potential code page issues are all handled by PowerExchange.

5) Unlike any files created with ftp type processes, where problems could cut short the expected data transfer, PowerExchange/PowerCenter provide log messages so as to ensure that all data has been processed.

6) The consumer has the capacity only to select the data that is needed for the consumer application, use of filtering can reduce the amount of data being transferred as well as any potential security aspects.

7) Any data access of mainframe based data can be secured according to the security tools in place on the mainframe; PowerExchange is fully compliant to RACF, ACF2 & Top-Secret security products.

8) Using Informatica’s PowerExchange, along with Informatica consuming tools (PowerCenter, Mercury etc.) provides a much simpler and cleaner architecture. The simpler the architecture the easier it is to find problems as well as audit the processes that are touching the data.

9) PowerExchange generally can help avoid the normal bottlenecks associated to getting data off of the mainframe, programmers are not needed to create the extract processes, new schedules don’t need to be created to ensure that the extracts run, in the event of changes being necessary they can be controlled by the Business group consuming the data.

10) Helps control mainframe data extraction processes that are still being run but from which no one uses the generated data as the original system that requested the data has now become obsolete.

This fable has nothing to do with Big Data, but instead deals with an Overabundance of Food and how to better digest it to make it useful.

And it all started when this SEO copywriter from IT Corporation walked into a bar, pub, grill, restaurant, liquor establishment, and noticed 2 large crowded tables. After what seemed like an endless loop, an SQL programmer sauntered in and contemplated the table problem. “Mind if I join you?”, he said? Since the tables were partially occupied and there were no virtual tables available, the host looked on the patio of the restaurant at 2 open tables. “Shall I do an outside join instead?” asked the programmer? The host considered their schema and assigned 2 seats to the space.

The writer told the programmer to look at the menu, bill of fare, blackboard – there were so many choices but not enough real nutrition. “Hmmm, I’m hungry for the right combination of food, grub, chow, to help me train for a triathlon” he said. With that contextual information, they thought about foregoing the menu items and instead getting in the all-you-can-eat buffer line. But there was too much food available and despite its appealing looks in its neat rows and columns, it seemed to be mostly empty calories. They both realized they had no idea what important elements were in the food, but came to the conclusion that this restaurant had a “Big Food” problem.

They scoped it out for a moment and then the writer did an about face, reversal, change in direction and the SQL programmer did a commit and quick pivot toward the buffer line where they did a batch insert of all of the food, even the BLOBS of spaghetti, mash potatoes and jello. There was far too much and it was far too rich for their tastes and needs, but they binged and consumed it all. You should have seen all the empty dishes at the end – they even caused a stack overflow. Because it was a batch binge, their digestive tracts didn’t know how to process all of the food, so they got a stomach ache from “big food” ingestion – and it nearly caused a core dump – in which case the restaurant host would have assigned his most dedicated servers to perform a thorough cleansing and scrubbing. There was no way to do a rollback at this point.

It was clear they needed relief. The programmer did an ad hoc query to JSON, their Server who they thought was Active, for a response about why they were having such “big food” indigestion, and did they have packets of relief available. No response. Then they asked again. There was still no response. So the programmer said to the writer, “Gee, the Quality Of Service here is terrible!”

Just then, the programmer remembered a remedy he had heard about previously and so he spoke up. “Oh, it’s very easy just <SELECT>Vibe.Data.Stream from INFORMATICA where REAL-TIME is NOT NULL.”

Informatica’s Vibe Data Stream enables streaming food collection for real-time Big food analytics, operational intelligence, and traditional enterprise food warehousing from a variety of distributed food sources at high scale and low latency. It enables the right food ingested at the right time when nutrition is needed without any need for binge or batch ingestion.

And so they all lived happily ever after and all was good in the IT Corporation once again.

When the average person hears of cloning, my bet is that they think of the controversy and ethical issues surrounding cloning, such as the cloning of Dolly the sheep, or the possible cloning of humans by a mad geneticist in a rogue nation state. I would also put money down that when an Informatica blog reader thinks of cloning they think of “The Matrix” or “Star Wars” (that dreadful episode II Attack of the Clones). I did. Unfortunately.

But my pragmatic expectation is that when Informatica customers think of cloning, they also think of Data Cloning software. Data Cloning software clones terabytes of database data into a host of other databases, data warehouses, analytical appliances, and Big Data stores such as Hadoop. And just for hoots and hollers, you should know that almost half of all Data Integration efforts involve replication, be it snapshot or real-time, according to TDWI survey data. Survey also says… replication is the second most popular — or second most used — data integration tool, behind ETL.

Cloning should be easy and very natural. It’s an important part of life (at your job). However, we can all admit that it is also a process that causes many a headache and ruins many a relationship.

Do your company’s cloning tools work with non-standard types? Know that Informatica cloning tools can reproduce Oracle data to just about anything on 2 tuples (or more). We do non-discriminatory duplication, so it’s no wonder we especially fancy cloning the Oracle! (a thousand apologies for the bad “Matrix” pun)

Just remember that data clones are an important and natural component of business continuity, and the use cases span both operational and analytic applications. So if you’re not cloning your Oracle data safely and securely with the quality results that you need and deserve, it’s high time that you get some better tools.

Send in the Clones

With that in mind, if you haven’t tried to clone before, for a limited time, Informatica is making Fast Clone database cloning trial software product available for a free download. Click here to get it now.