Relationship Mining

Data mining six degrees of separation

By: Hari Mailvaganam

Figure 1. Overview of the Sphere of Influence of Relationship Mining Software

I (Hari) recently read an article in the Wall Street Journal (registration required) about the introduction of Relationship Mining software in companies. USA Today has a similar article. These software applications help companies ‘mine’ workers’ external personal relationships for business prospects.

The goal of these applications is to scan the company’s repositories of contact information, such as address books, electronic calendars, e-mail correspondence and instant message contact lists. After scanning the contact information, the software builds maps of all the relationships found in the repositories.
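As a rough illustration of what such a relationship map could look like, the sketch below treats each repository as a list of contact names per employee, builds an undirected graph, and uses a breadth-first search to find the shortest chain of introductions. The function names, the sample data and the name "Jane Smith" are my own assumptions for illustration, not taken from any commercial product.

```python
from collections import defaultdict, deque

def build_relationship_map(repositories):
    """Turn {employee: [contact names]} into an undirected adjacency map."""
    graph = defaultdict(set)
    for employee, contacts in repositories.items():
        for contact in contacts:
            graph[employee].add(contact)
            graph[contact].add(employee)
    return graph

def shortest_chain(graph, start, target):
    """Breadth-first search for the shortest chain of introductions."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # sorted() only makes the traversal order deterministic
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no chain of introductions exists

# Hypothetical sample data
repositories = {
    "Employee A": ["Rob Dobson"],
    "Employee B": ["Employee A", "Jane Smith"],
}
graph = build_relationship_map(repositories)
chain = shortest_chain(graph, "Employee B", "Rob Dobson")
print(chain)
```

Here Employee B has no direct listing for Rob Dobson, but the map shows that a colleague does, so the search returns a three-person chain.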

There is no doubt about the potential usefulness of such a product. Most business transactions are initiated through personal introductions and relationships developed over time. Sales cycles can be shortened if a strong relationship with a business prospect is found.

After reading the WSJ article, I started to think about the technical viability of relationship mining software and to evaluate how, if at all, it involves data mining.

Most of the challenges in making relationship mining succeed in an organization involve cooperation from employees, who must ensure that contact information is stored in the correct format and synchronized frequently with portable devices.

Most organizations have a myriad of repositories that can be difficult to access. These proprietary databases may not follow open standards such as ODBC or OLE-DB. If this is the case, the relationship mining software must have the proper data extraction modules in place. Data extraction can be run as a scheduled process or triggered manually when a search is conducted.

Is there data mining involved?

My first impression was that most of the commercial relationship mining software currently available does not run any of the better-known data mining algorithms. Still, the old adage of data mining should not be forgotten: data mining is the discovery of patterns in data. Data patterns are central to discovering relationships and judging how relevant they are.

With a little tweaking, classification and clustering algorithms should be suitable for relationship mining. To test the idea, I created a version using the Microsoft Clustering algorithm provided by SQL Server 2000. The first part of the process was extracting the contact information from Microsoft Outlook and building a data mining model. I exported the contact information from Outlook to a text file of comma-separated values and ran a script to import the data into a SQL Server 2000 database.
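The import step can be sketched as follows. This is not the script I ran: it uses Python's built-in sqlite3 module as a stand-in for SQL Server 2000, and the column headers, table layout, e-mail address and phone number are assumptions made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical export; a real Outlook CSV has many more columns.
SAMPLE_CSV = """Name,Company,Job Title,E-mail,Mobile Phone
Rob Dobson,Microsoft,Engineer,rob@example.com,
Steve Ballmer,Microsoft,CEO,,555-0100
"""

def import_contacts(csv_text, owner, conn):
    """Load one employee's exported contacts into a 'contacts' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "owner TEXT, name TEXT, company TEXT, "
        "title TEXT, email TEXT, mobile TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (
            owner,
            row["Name"].strip(),
            row["Company"].strip(),
            row["Job Title"].strip(),
            row["E-mail"].strip(),
            row["Mobile Phone"].strip(),
        )
        for row in reader
    ]
    # Parameterized inserts avoid quoting problems in names and titles
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
count = import_contacts(SAMPLE_CSV, "Employee A", conn)
```

Recording the owner of each listing matters later on, because the same person may appear in several employees' address books with different levels of detail.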

Once the data was imported into SQL Server 2000, it was fairly straightforward to build the data mining model. Running clustering passes over the contact information proved feasible once the data mining model was created. For a commercial product, a data visualization layer would sit above the clustering results.
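To give a feel for what a clustering pass does with contact data, here is a minimal k-means sketch in plain Python. It is not the Microsoft Clustering algorithm, whose internals I am not reproducing here; the feature choice (number of filled fields, whether a cell phone is present, whether an instant messenger handle is present) and the sample vectors are assumptions for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Tiny k-means; returns one cluster label per input point."""
    # Deterministic start: take k evenly spaced points as initial centroids
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Hypothetical features: (filled fields, has cell phone, has IM handle)
contacts = [(6, 1, 1), (5, 1, 0), (1, 0, 0), (2, 0, 0)]
labels = kmeans(contacts, k=2)
print(labels)
```

On this toy data the richly filled listings fall into one cluster and the sparse ones into another, which is the kind of grouping a visualization layer could then present.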

Aside from the data extraction and cleansing process, the other labor-intensive task in relationship mining is setting up context relevancy. In some scenarios, a contact who is the CEO of a business prospect may be more valuable than a contact who is a Data Administrator. However, if the relationship miner intends to sell storage area networks, the Data Administrator contact may be more valuable.
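One simple way to express context relevancy is a lookup table of weights per job title for each thing being sold. The sketch below assumes such a table; the product categories and the weight values are invented for the example and would have to be tuned by hand in practice.

```python
# Hypothetical weights: how useful a contact's role is for a given offering
ROLE_WEIGHTS = {
    "enterprise software": {"CEO": 1.0, "Data Administrator": 0.4},
    "storage area network": {"CEO": 0.5, "Data Administrator": 1.0},
}

def relevance(title, selling, default=0.1):
    """Return the weight of a job title in the context of what is being sold."""
    return ROLE_WEIGHTS.get(selling, {}).get(title, default)

def rank_by_context(contacts, selling):
    """Sort (name, title) pairs so the most relevant role comes first."""
    return sorted(contacts, key=lambda c: relevance(c[1], selling), reverse=True)

contacts = [("Contact X", "CEO"), ("Contact Y", "Data Administrator")]
san_ranking = rank_by_context(contacts, "storage area network")
software_ranking = rank_by_context(contacts, "enterprise software")
```

The same two contacts swap places depending on the context, which is why this step is labor intensive: someone has to decide on the weights for every scenario.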

There are a number of challenges in setting up how the results are ranked. They can be illustrated with an example search for “Steve Ballmer, CEO of Microsoft” in the relationship mining software.

Importance is given to the Steve Ballmer listing that has all the contact information filled in. A cell phone number or an instant messenger buddy list entry is given special weight, as it signifies a close relationship.

Is the Steve Ballmer who is listed as a PTA contact the same person as the CEO of Microsoft? The coincidence is too strong to discard. A personal contact without a business affiliation can signify a truly close relationship.

A Steve Ballmer listing with only a business affiliation and no contact information is given lower priority.

Alternative suggestions to Steve Ballmer are listed. Rob Dobson, an employee of Microsoft, is listed as a contact. Can Rob Dobson lead to Steve Ballmer? This is an indirect contact that may prove to be of value.

The Steve Ballmer listing without a business affiliation or contact information is given the lowest priority. Even so, the employee who entered it (Employee B, say) may simply have neglected to fill in the contact information and may still have a strong connection to the CEO of Microsoft.
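The priorities in the example above can be sketched as a scoring function. The field names, the point values and the sample listings (including the e-mail addresses and phone numbers) are assumptions chosen only to reproduce the ordering described; a real product would need to calibrate them.

```python
from dataclasses import dataclass

@dataclass
class Listing:
    owner: str               # employee whose repository holds the listing
    name: str
    company: str = ""
    email: str = ""
    office_phone: str = ""
    cell_phone: str = ""
    im_handle: str = ""
    personal_note: str = ""  # e.g. "PTA" for a non-business context

def score(listing):
    """Higher score means a likely closer relationship (weights are guesses)."""
    fields = [listing.email, listing.office_phone,
              listing.cell_phone, listing.im_handle]
    filled = sum(1 for f in fields if f)
    points = filled                      # one point per filled contact field
    if listing.cell_phone:
        points += 3                      # cell phone suggests closeness
    if listing.im_handle:
        points += 3                      # so does an IM buddy list entry
    if listing.personal_note and not listing.company:
        points += 4                      # personal contact, no business tie
    if listing.company and filled == 0:
        points += 1                      # affiliation only: low, but not lowest
    return points

listings = [
    Listing("Employee B", "Steve Ballmer"),
    Listing("Employee D", "Steve Ballmer", company="Microsoft"),
    Listing("Employee C", "Steve Ballmer", email="sb@example.com",
            personal_note="PTA"),
    Listing("Employee A", "Steve Ballmer", company="Microsoft",
            email="sb@example.com", office_phone="555-0101",
            cell_phone="555-0100", im_handle="sballmer"),
]
ranked = sorted(listings, key=score, reverse=True)
ranked_owners = [item.owner for item in ranked]
```

Note that Employee B's bare listing still appears in the results rather than being dropped, since a missing field says more about how the address book was kept than about the relationship itself.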

Relationship mining can be a useful tool not only in sales and marketing but also in law enforcement and scientific research. However, there is a threshold of organization size below which relationship mining would not be cost effective.

In an organization of, say, 15 to 20 employees, it would be more efficient for the marketing manager to e-mail employees and ask whether they have a contact at the business prospect he is working on.