How Machine Learning Could Detect Medicare Fraud

Machine learning could become a new weapon in the fight against Medicare fraud.

Machine learning could become a useful tool for detecting Medicare fraud, according to a new study, potentially reclaiming anywhere from $19 billion to $65 billion lost to fraud each year.

Researchers from Florida Atlantic University’s College of Engineering and Computer Science recently published the world’s first study using Medicare Part B data, machine learning and advanced analytics to automate fraud detection. They tested six different machine learners on balanced and imbalanced data sets, ultimately finding the RF100 random forest algorithm to be most effective at identifying possible instances of fraud. They also found that imbalanced data sets are preferable to balanced data sets when scanning for fraud.

“There are so many intricacies involved in determining what is fraud and what is not fraud, such as clerical error,” Richard A. Bauder, senior author and a Ph.D. student at the school, said. “Our goal is to enable machine learners to cull through all of this data and flag anything suspicious. Then we can alert investigators and auditors, who will only have to focus on 50 cases instead of 500 cases or more.”

In the study, Bauder and colleagues examined Medicare Part B data from 2012 to 2015, which held 37 million cases, for instances such as patient abuse, neglect and billing for medical services that never occurred. The team narrowed the data set to 3.7 million cases, a number that would still represent a challenge for human investigators who are typically charged with pinpointing Medicare fraud.

The authors used the National Provider Identifier — a unique ID number issued by the government to healthcare providers — to match fraud labels to Medicare Part B data, which comprised provider details, payment and charge information, procedure codes, total procedures performed and medical specialty.
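Conceptually, this matching step is a join on the NPI. Here is a minimal Python sketch of that idea. The records, field names and the set of flagged NPIs are all invented for illustration, and the study's actual schema and label source are not described in this article.

```python
# Hypothetical Part B style records keyed by NPI; every value is invented.
claims = [
    {"npi": "1000000001", "specialty": "Cardiology", "procedures": 120, "payment": 15400.0},
    {"npi": "1000000002", "specialty": "Neurology", "procedures": 45, "payment": 8300.0},
    {"npi": "1000000003", "specialty": "Cardiology", "procedures": 310, "payment": 97000.0},
]

# Hypothetical set of NPIs that carry a fraud label in a separate source.
fraud_npis = {"1000000003"}


def label_claims(claims, fraud_npis):
    """Return copies of the claim records with a boolean 'fraud' label added."""
    return [{**row, "fraud": row["npi"] in fraud_npis} for row in claims]


labeled = label_claims(claims, fraud_npis)
for row in labeled:
    print(row["npi"], row["fraud"])
```

Because the NPI is unique to each provider, a simple membership check is enough here. A real pipeline would also have to handle missing or malformed identifiers.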

When researchers matched the NPI to the Medicare data, they flagged potentially fraudulent providers in a separate database. How?

“If we can predict a physician’s specialty accurately based on our statistical analyses, then we could potentially find unusual physician behaviors and flag these as possible fraud for further investigation,” Taghi M. Khoshgoftaar, Ph.D., co-author and a professor at the school, said.

So, if a cardiologist were incorrectly labeled a neurologist, that could be a sign of fraud.
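The idea can be sketched in a few lines of Python. This example uses a nearest-centroid classifier as a simple stand-in, not the learners the researchers tested, and the two features (share of billing for cardiac versus neurological procedures) and all provider data are hypothetical.

```python
import math
from collections import defaultdict


def centroids(rows):
    """Average the feature vectors of each declared specialty."""
    sums, counts = {}, defaultdict(int)
    for specialty, features in rows:
        if specialty not in sums:
            sums[specialty] = [0.0] * len(features)
        sums[specialty] = [s + f for s, f in zip(sums[specialty], features)]
        counts[specialty] += 1
    return {sp: [s / counts[sp] for s in total] for sp, total in sums.items()}


def predict(features, cents):
    """Return the specialty whose centroid is closest to the feature vector."""
    return min(cents, key=lambda sp: math.dist(features, cents[sp]))


# Hypothetical training data: (declared specialty, [cardiac share, neuro share]).
training = [
    ("Cardiology", [0.9, 0.1]),
    ("Cardiology", [0.8, 0.2]),
    ("Neurology", [0.1, 0.9]),
    ("Neurology", [0.2, 0.8]),
]
cents = centroids(training)

# Hypothetical providers to screen: name -> (declared specialty, features).
providers = {
    "A": ("Cardiology", [0.85, 0.15]),
    "B": ("Cardiology", [0.15, 0.85]),
}

# Flag providers whose billing pattern looks like a different specialty.
flagged = [
    name
    for name, (declared, features) in providers.items()
    if predict(features, cents) != declared
]
print(flagged)
```

Provider B is declared a cardiologist but bills like a neurologist, so it is flagged. A mismatch like this is only a prompt for human review, not proof of fraud.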

Still, the data set itself remained a challenge. The small number of fraudulent providers and the large number of above-board providers made the data set imbalanced, which can fool machine learners. So, using random undersampling, the researchers whittled down the set to 12,000 cases, with seven class distributions ranging from severely imbalanced to balanced.
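Random undersampling shrinks the majority class until the data set reaches a chosen class ratio. The Python sketch below shows one common way to do that. It assumes all fraud cases are kept and only non-fraud cases are dropped, which the article does not confirm, and the function name, parameters and data are hypothetical.

```python
import random


def undersample(rows, normal_pct, seed=0):
    """Keep every fraud row and randomly drop non-fraud rows to reach normal_pct."""
    if not 0 < normal_pct < 100:
        raise ValueError("normal_pct must be between 1 and 99")
    fraud = [r for r in rows if r["fraud"]]
    normal = [r for r in rows if not r["fraud"]]
    # Non-fraud rows needed so they make up normal_pct percent of the result.
    target = len(fraud) * normal_pct // (100 - normal_pct)
    if target > len(normal):
        raise ValueError("not enough non-fraud rows for this ratio")
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return fraud + rng.sample(normal, target)


# Invented data: 10,000 providers, 20 of them labeled as fraud.
rows = [{"id": i, "fraud": i < 20} for i in range(10_000)]

sample_90_10 = undersample(rows, normal_pct=90)  # severely imbalanced
sample_50_50 = undersample(rows, normal_pct=50)  # balanced
print(len(sample_90_10), len(sample_50_50))  # prints "200 40"
```

Repeating the call with different values of `normal_pct` yields a series of class distributions like the ones the study compared.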

From there, they unleashed their learners and reached their results regarding random forest and class distribution.

Surprisingly, researchers found that keeping the data set 90 percent normal and 10 percent fraudulent was the “sweet spot” for machine-learning algorithms tasked with identifying Medicare fraud. They thought the ratio would need to include more fraudulent providers for the learners to be effective.

A dean at the college of engineering said these machine-learning detection tools could become a “game changer” for Medicare fraud detection.
