With the risk of having the comment area on this blog filled up with SQL statements I will follow the track and tone from the last post called Create Table Homo_Sapiens.

In the last post some challenges around modelling people in databases was discussed with focus on uniqueness. Now we will have a look at the same challenges with companies – the other big part of party master data.

Companies often act in the same role as individual people in business processes – not at least in the role as a customer. Companies also behave as persons in a lot of ways like being born (establish), change name, relocate, marry (mergers and acquisitions), divorce (split) and decease (dissolve).

All over the world a lot of people spend the days entering and updating the data held on business partners in numerous databases. The world wide sum of B2B connections between a customer and a vendor each entering and maintaining the data about the other resembles (though less aggressive) the grains on a chessboard story:

Last time I checked the D&B WorldBase held more the 150 millions companies. Some are dissolved and fortunately? everyone doesn’t do business with everyone – but as said, the sum of B2B connections is huge and the work in entering and maintaining the master data seems overwhelming.

If we look at one single company and how it may be represented differently in databases around only taking basic data as name and address into account, there will be lots of variations. Even in the same table the same real world company often occupies several rows spelled differently.

One of the most effective methods for avoiding duplicates of company master data is plugging into a business directory. By using an external sourced company ID as a key in your master data you are able to hold a golden record of that real world entity. As a bonus you are offered updates and access to a lot of additional data you would never be able to collect yourself.

9 thoughts on “Select Company_ID from External_Source where possible”

I’m not sure about other countries, but in the UK, if a registered company has been dissolved, you are free to re-use the company name. Perhaps both the dissolved, and the new company ordered office furniture from ‘Office Supplies PLC’. As you stated, please ensure a unique key is derived for each company record.

Another DQ point also worth pointing out, which I found in a rejection table within the ETL process at a UK Retail Bank – If you have customers with ‘,’s in their business name, ensure that your ETL processes are not set up to process as comma-delimited.

Business directory integration most often has to start with that the existing customers/vendors are matched against the business directory in order to assign every row possible with the ID in the business directory, e.g. the Duns_Number.

Name is as Phil remarks of course not a very useful joining element and as Stuart says very often the name and address is not spelled exactly the same in the customer/vendor table to be matched and the business directory.

So theoretically you will have to compare every row in the customer/vendor table with every row in the business directory for fuzzy similarity comparison which should be done a number of times that equals Jim’s cartesian product. Therefore you also have to introduce fuzzy candidate selection – and as a person having built match solutions with the D&B WorldBase, I could talk hours about that subject.

Next step could be that you introduce a function where any new customer/vendor is picked from the business directory that may be stored on premise as a copy or accessed remote (in the cloud). But I have also participated in implementations were a daily production of new party records are automatically matched with the business directory using a webservice.

The D&B WorldBase is of course a very obvious choice if you have international data as the database has a consistent layout for all countries and have world wide business family trees.

Eurocontactpool covers a range of European countries.

For national data I think every country have a range of providers each having some pros and cons.

Most business directories are based on public sector company registrations which have very varying completeness and standardisation between countries – that fact is also seen in the D&B WorldBase.

As u rightly depict ‘golden record of that real world entity’ is what today’s organizations are struggling to achieve and thats truly a business challenge. If D & B or similar other information providers hold a universal database of clean records (i know am too ambitious!!!)it would make life simple for us 🙂

Good post ..
.. there is a few more dimensions that make this issue even more complex, for example:
– companies have different locations (e.g. branch offices) with different addresses, phone numbers, contact persons etc.
– companies are “related” to another .. e.g. company A owns company B or (still more complex) company D owns 50% of company E.
All of this is relevant when evaluating “creditworthiness” but might not even be public knowledge. I’m pretty sure that this information is not available from D&B or anywhere else.
So the next question then is .. what’s the business case for “make vs. buy” of the information? I’m still trying to figure that out …
Thorsten

The points you are making is exactly what I mean with that such business directories are dependent on public sector data in different countries.

For example for Denmark uniqueness is pretty good in public sector data (except for governmental bodies), multiple business addresses are covered and even partly ownership is available.

But surely you will always find some internal registrations that are correct but not found in a business directory. I made a post on that subject a while ago called The art of Business Directory Matching.