Entity Resolution and Record Linkage: AnyConnection

AnyConnection is a customizable approach to solving Entity Resolution and Master Data Management problems. Entity Resolution is the process of consolidating several similar but non-identical records into one “golden” record, or clustering records into a family for aggregation and identification purposes. It is often needed to cleanse and consolidate vendor and customer databases that contain disparate and non-standard data.

Harnessing the power of SAS software, AnyConnection cleanses name, address, and number fields prior to matching so that the most accurate match can occur. Using a mix of proprietary and time-tested algorithms, our solution is guaranteed to be the best fit for your Entity Resolution needs.

AnyConnection is offered as a service instead of a canned product. Here is why: many large Entity Resolution and Master Data Management packages cost over $1 million, and generally a consultant is needed to install, configure, and run the software. We firmly believe that Entity Resolution and Master Data Management are typically not daily production tasks and that most entity resolution processes have to be highly customized. Entity Resolution, because of its complexities, lends itself to a service much more readily than to Commercial Off-The-Shelf (COTS) software.

Uses

Matching disparate databases and tables – Joining tables that often do not have a common identifier.

Fraud Detection – Linking records together that have no common elements but are related to each other indirectly via another record or piece of data.

AnyConnection is very useful for fraud detection because it allows seemingly disparate records to be joined and related, even if they are related only via another record.

For example, suppose you have the following four records in your database.

Record # | Name          | Address                          | Bank Acct # | Phone #
A        | John Roberts  | 101 S. Main St., Fairfax, VA     | 10122346    | 703-356-1101
B        | Jon Roberts   | 12235 Rezdec Circle, Fairfax, VA | 10122345    | 703-356-1101
C        | Mohamad Habib | 789 Wheeler Way, Tulsa, OK       | 10122345    | 954-227-0050
D        | Haraj Touzec  | 2245-A Beach Rd, Miami, FL       | 09773394    | 954-227-0050

With the human eye, you may be able to tell that all four of the above records are related:

Records A and B are related via similar name and same phone number.

Records B and C are related via same bank account number.

Records C and D are related via same phone number.

With AnyConnection, all of the above four records would be systematically joined together in one cluster for further analysis and investigation.

This allows record A to be related to record D even though they do not relate to each other directly on any piece of data. This kind of record linkage is incredibly useful for identifying collusion, fraud, fictitious entities, duplicate records, and related entities.
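The chain of links in this example can be sketched in a few lines of Python. This is only an illustration, not AnyConnection's actual implementation: it links records on exact matches of bank account and phone number (the fuzzy name match is left out), and the function name `cluster_by_shared_values` and the union-find approach are assumptions made for the sketch.

```python
from collections import defaultdict

# The four example records (address omitted because no link uses it).
records = {
    "A": {"name": "John Roberts", "bank": "10122346", "phone": "703-356-1101"},
    "B": {"name": "Jon Roberts", "bank": "10122345", "phone": "703-356-1101"},
    "C": {"name": "Mohamad Habib", "bank": "10122345", "phone": "954-227-0050"},
    "D": {"name": "Haraj Touzec", "bank": "09773394", "phone": "954-227-0050"},
}

def cluster_by_shared_values(records, fields):
    """Group records that share an exact value in any of the given fields,
    either directly or through a chain of other records (union-find)."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> first record id holding that value
    for rid, rec in records.items():
        for field in fields:
            key = (field, rec[field])
            if key in first_seen:
                union(first_seen[key], rid)
            else:
                first_seen[key] = rid

    clusters = defaultdict(list)
    for rid in records:
        clusters[find(rid)].append(rid)
    return sorted(sorted(members) for members in clusters.values())

clusters = cluster_by_shared_values(records, ["bank", "phone"])
print(clusters)  # [['A', 'B', 'C', 'D']]
```

Linking on phone number alone would split the data into two clusters (A with B, C with D); it is the shared bank account between B and C that bridges them.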

Powerful Linkage Mechanisms

With AnyConnection, you can link any record with any other record even if they are related by 20+ degrees of separation. As in the example above, there is nothing directly relating record A, “John Roberts”, to record D, “Haraj Touzec”, but the two are related via three degrees of separation. The software can identify tight-knit clusters that are very similar, or looser, larger clusters. The software can match on only one field, such as name, or on many fields (name, address, bank routing number, phone number, SSN, employee id, etc.).

You might be able to utilize software to determine that record A is related to record B via phone number, but rarely does software have the capability to relate record A to record D when there is absolutely no piece of data in common.
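To make "degrees of separation" concrete, the sketch below counts the links on the shortest path between two records with a breadth-first search. The link list is taken from the example (A-B by phone, B-C by bank account, C-D by phone); the function name `degrees_of_separation` is hypothetical, and nothing here is claimed to mirror how AnyConnection computes it.

```python
from collections import deque

# Direct links from the example: A-B (phone), B-C (bank account), C-D (phone).
links = [("A", "B"), ("B", "C"), ("C", "D")]

def degrees_of_separation(links, start, goal):
    """Return the number of links on the shortest path from start to goal,
    or None if the two records are not connected at all."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(degrees_of_separation(links, "A", "D"))  # 3
```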

In essence, AnyConnection creates clusters based upon one or many fields. Each field is matched utilizing proprietary fuzzy-matching techniques, so that records with similar names, similar addresses, and other pieces of data, can be linked together.
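The proprietary matching techniques are not described here, so the following sketch substitutes a generic stand-in: the similarity ratio from `difflib.SequenceMatcher` in Python's standard library. The helper names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def is_fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two field values as the same entity if they are similar enough."""
    return similarity(a, b) >= threshold

print(is_fuzzy_match("John Roberts", "Jon Roberts"))   # True
print(is_fuzzy_match("John Roberts", "Haraj Touzec"))  # False
```

Raising the threshold makes matching stricter; lowering it admits more spelling variation at the cost of more false links.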

Cluster analysis, using the farthest neighbor technique, links records together that are linked by at least one common record in between. It analyzes pairs of records and determines whether each pair is the same or similar. The clustering procedure then resolves the pairs into larger clusters, with related records fitting into one and only one cluster (hard clustering).
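The two-stage idea (score pairs first, then resolve pairs into non-overlapping clusters) can be sketched as follows. Note the assumptions: the sketch merges pairs with plain connected components as a stand-in and does not reproduce the farthest neighbor technique named above, the similarity scores are invented for illustration, and `resolve_pairs` is a hypothetical name.

```python
def resolve_pairs(record_ids, scored_pairs, threshold):
    """Keep pairs scoring at or above the threshold, then merge them into
    non-overlapping clusters so each record lands in exactly one cluster."""
    neighbors = {rid: set() for rid in record_ids}
    for a, b, score in scored_pairs:
        if score >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    clusters, assigned = [], set()
    for rid in record_ids:
        if rid in assigned:
            continue
        # Depth-first walk collects every record reachable from rid.
        stack, members = [rid], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbors[node] - members)
        assigned |= members
        clusters.append(sorted(members))
    return clusters

# Hypothetical pair scores, for illustration only.
scored_pairs = [("A", "B", 0.95), ("B", "C", 0.70), ("C", "D", 0.90)]
ids = ["A", "B", "C", "D"]

tight = resolve_pairs(ids, scored_pairs, threshold=0.85)
loose = resolve_pairs(ids, scored_pairs, threshold=0.60)
print(tight)  # [['A', 'B'], ['C', 'D']]
print(loose)  # [['A', 'B', 'C', 'D']]
```

The threshold is what separates tight-knit clusters from looser, larger ones: the weaker B-C pair only joins the two groups when the threshold is lowered.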

The pairs are related via a variety of proprietary fuzzy-matching algorithms.

What Can AnyConnection Do For You?

Fraud Detection

With AnyConnection, you can identify potential fraud across several industries. For example, in Accounts Payable fraud, perpetrators often create fictitious vendor numbers using a piece of their own personally identifiable information, such as SSN, phone number, bank number, or some variation of their name. Using this software, you can pinpoint the related records and proactively identify fraud. As another example, you can use AnyConnection to identify importers/exporters who share the same bank account number or who have shipped to the same bad actor. This is very useful in the intelligence field for identifying money laundering schemes and terrorist-financing networks.

De-Duplication

With AnyConnection software, you can eliminate duplicate records in your database. Duplicate records can be a major problem, especially when they lead to duplicate mailings, duplicate payments, and other outcomes that can seriously impact your profit. Data integrity is critical to any good business, so eliminating duplicates is of paramount importance. Using our proprietary name and address-matching algorithms, we can eliminate unnecessary duplicate records that are bogging down your system.
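As a rough illustration of the simplest form of de-duplication, the sketch below builds a comparison key from name and address (lowercased, punctuation removed) and keeps the first row for each key. It is an assumption-laden stand-in, not the proprietary algorithms mentioned above: it catches only formatting differences, the rows are made up for the example, and `match_key` and `deduplicate` are hypothetical names.

```python
import re

def match_key(name: str, address: str) -> tuple:
    """Build a comparison key: lowercase, drop punctuation, collapse whitespace."""
    def clean(text: str) -> str:
        text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
        return " ".join(text.split())
    return clean(name), clean(address)

def deduplicate(rows):
    """Keep the first row seen for each (name, address) key."""
    seen, unique = set(), []
    for row in rows:
        key = match_key(row["name"], row["address"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Illustrative mailing-list rows; the second differs from the first only in formatting.
rows = [
    {"name": "John Roberts", "address": "101 S. Main St., Fairfax, VA"},
    {"name": "JOHN ROBERTS", "address": "101 S Main St, Fairfax VA"},
    {"name": "Mohamad Habib", "address": "789 Wheeler Way, Tulsa, OK"},
]
unique_rows = deduplicate(rows)
print(len(unique_rows))  # 2
```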

Entity Resolution

Entity resolution, also known as Record Linkage, is the process of merging and purging records so that related records are grouped in one “family”. An “Entity” can be a person’s name, a company name, an address, a city name, a description in a financial transaction, or a phrase. Basically, an “Entity” can be anything that has text in it that you would like to aggregate upon. This is useful in any line of business. For example, Customs and Border Protection (CBP) might want to know if a package is going to an urban or rural area. But if the city name is spelled 1,000 different ways, how can you even begin to aggregate without a standardized city name? This was a real situation I encountered, where there were over 8,000 spellings of “Buenos Aires” because of a multitude of mail drop precinct numbers in the city name. I had to find a way to resolve the 8,000 “Buenos Aires” names into one family, so that CBP could identify whether a package was going to an urban or rural area. This was another case where Entity Resolution solved a relevant problem, this time in homeland security.

With AnyConnection, you can link all Entities together via our matching software, and then cluster the entities together into one master entity name. Take the following example:

Record # | Name
A        | Hewlett Packard
B        | HP, Inc.
C        | Hewlett-Packard Corp
D        | HP Corporation
E        | Hewlet Pakkard
F        | Hewwlet Packerd

All of the above spellings are obviously HP to the human eye. But how do we tell the system they are all HP so that we can link HP to, or unlink HP from, the shipping activity we are seeing? There is no drop-down box on shipping export declaration documents, so the variations on a name's spelling truly can number 1,000+. In fact, we encountered a situation where the number of variations on the city name “Buenos Aires” was over 8,000. This was definitely a case where AnyConnection could cluster the city names together.
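One way to picture resolving the six spellings to a single master name is sketched below. It is a simplified stand-in, not AnyConnection's matching software: the suffix and alias tables are assumed lookups (abbreviations such as "HP" are not recoverable by string similarity alone), the similarity measure is `difflib.SequenceMatcher` from Python's standard library, and the 0.8 threshold and function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Assumed lookup tables; a real project would maintain much larger ones.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}
ALIASES = {"hp": "hewlett packard"}

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and corporate suffixes, expand known aliases."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    text = " ".join(w for w in words if w not in SUFFIXES)
    return ALIASES.get(text, text)

def resolve(name, masters, threshold=0.8):
    """Return the best-matching master name, or None if nothing is close enough."""
    cleaned = normalize(name)
    best, best_score = None, 0.0
    for master in masters:
        score = SequenceMatcher(None, cleaned, normalize(master)).ratio()
        if score > best_score:
            best, best_score = master, score
    return best if best_score >= threshold else None

names = ["Hewlett Packard", "HP, Inc.", "Hewlett-Packard Corp",
         "HP Corporation", "Hewlet Pakkard", "Hewwlet Packerd"]
masters = ["Hewlett Packard"]

resolved = {name: resolve(name, masters) for name in names}
print(set(resolved.values()))  # {'Hewlett Packard'}
```

Normalization handles punctuation and suffixes, the alias table handles abbreviations, and the similarity threshold absorbs misspellings such as "Hewlet Pakkard".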

Entity resolution is also widely used for vendor file cleanup and maintenance.

Data Reconciliation

With AnyConnection, you have the capability to match data from an unlimited number of disparate sources. For example, suppose you have names and addresses in one file, but only addresses in another file, with no key to link the two databases together. This is a common problem with data reconciliation efforts. The two databases can be linked together using AnyConnection’s proprietary address-matching module. And, because the software is powered by SAS, it is robust enough to handle millions of records.
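The two-file scenario can be sketched as a join on a standardized address. This is a minimal illustration under stated assumptions, not the proprietary address-matching module: the abbreviation table is a tiny assumed sample, the join requires an exact match after standardization (a fuzzy comparison could be layered on top), and the file contents, the `amount` column, and the function names are invented for the example.

```python
import re

# Assumed abbreviation table; postal standards define many more entries.
ABBREVIATIONS = {
    "street": "st", "road": "rd", "avenue": "ave", "circle": "cir",
    "south": "s", "north": "n", "east": "e", "west": "w",
}

def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and abbreviate common address words."""
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def link_by_address(named_rows, address_rows):
    """Attach names from the first file to rows of the second file
    whose standardized address is identical; unmatched rows get None."""
    index = {normalize_address(r["address"]): r["name"] for r in named_rows}
    return [
        {**row, "name": index.get(normalize_address(row["address"]))}
        for row in address_rows
    ]

# File 1 has names and addresses; file 2 has addresses only (illustrative values).
file1 = [{"name": "John Roberts", "address": "101 S. Main St., Fairfax, VA"}]
file2 = [
    {"address": "101 South Main Street, Fairfax VA", "amount": 250},
    {"address": "789 Wheeler Way, Tulsa, OK", "amount": 90},
]

linked = link_by_address(file1, file2)
for row in linked:
    print(row["address"], "->", row["name"])
```

Here the first row of file 2 picks up the name "John Roberts" even though the two files spell the address differently, while the second row stays unmatched.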