Big Data Fakers: 5 Warning Signs

Data falsification at research institutions to make results look better is nothing new. Here's what it can teach us about misuse of big data in business.


Data fabrication and falsification pose a major problem in academic research, especially for projects funded by government agencies. Individuals and institutions caught cheating face large fines, and researchers can be barred from federal funding. How often the same thing happens in the amorphous world of big data is difficult to assess, but it is worth examining, given the embarrassments in academia and the likelihood that the motivations to cheat are universal.

Universities are increasingly aware of the problem, and their compliance offices are taking aggressive steps to show funding agencies that they are addressing it proactively.

At the University of Central Florida, in response to a request from senior management for a seminar on data fabrication and falsification, I developed a two-hour module addressing scientific misconduct and compensatory measures. Graduate students are required to attend the seminar to be officially admitted to Ph.D. candidacy.

Although I suspect InformationWeek readers are better informed than our Ph.D. students about data fabrication and falsification scandals, I thought I'd share some of the preliminary conclusions from my seminar on academia's dealings with data misconduct.

Here are some of the most egregious cases I came across. We'll start with five all-star perps, researchers who have made an embarrassing name for themselves by falsifying or misrepresenting data. Then we'll move on to five types of big-data people or scenarios that should make you suspicious enough to do some additional digging.

1. Eric "Massage Muscles not Data" Poehlman.

This University of Vermont kinesiologist was the first researcher to earn a federal prison term -- 366 days -- for extensive data fabrication. When the data did not support his hypothesis, he altered the data until they did. Credit goes to his graduate student and technician, Walter DeNino, who had the courage and fortitude to question the honesty of his supervisor's analyses. Poehlman cited the need to keep his lab funded as his motivation for tampering with the data.

2. Yoshitaka "Retracto" Fujii.

Fujii, an anesthesiologist at Toho University, likely holds the all-time record for retractions: an expert panel found 172 of his papers to be bogus, and they are now in various stages of retraction. The panel concluded that 126 of his randomized controlled trials (double blind, no less) "were totally fabricated." Some of his co-authors did not even know they were co-authors, because he had forged their signatures.

3. Dipak "Sommelier" Das.

Das, a researcher at the Cardiovascular Research Center at the University of Connecticut, avoided detection for many years because his central message, that a glass of red wine a day is good for your health, was so comforting. Who wanted to overturn that result? He was eventually caught doctoring Western blots, images produced by a laboratory technique for identifying proteins. Das tried unsuccessfully to shift the blame to his students, one of whom admitted that he had altered a figure the way Das wanted.

4. Diederik "Media Dude" Stapel.

This Tilburg University researcher studied topics of great public interest, such as bias and stereotyping, which led to numerous interviews with the mainstream media about his findings. Unfortunately, because he was the sole custodian of his data, much of which he simply made up in his office, it took years for his falsehoods to be discovered.

5. Eric "Not So" Smart.

For at least 10 years, Smart falsified data in grant proposals and publications in his fields, cardiovascular disease and diabetes. Western blots were again a key problem area, and he also reported results on genetically engineered "knockout" mice that did not exist. Some of these publications garnered more than 100 citations, and he brought $8 million in funding to the University of Kentucky. Smart resigned from the university and reportedly now works as a science teacher in the Lexington area.

These are just five bad actors; there may be many other world-class data fabricators and manipulators we don't know about. The Department of Health and Human Services maintains a list that currently names 43 individuals with active administrative actions against them: a data-falsification wall of shame, if you will. Publicizing the guilty parties, their offenses, and the corresponding penalties stands in stark contrast to the old days, when data fraud cases were handled internally and quietly, and ineffectively.
