In his latest Big Dig, Butte (along with his colleagues) has combed through mountains of electronically available data to identify molecular idiosyncrasies linking specific diseases to easily observed traits that at first glance wouldn’t be thought to have any such connection. The results, written up in a study published in Science Translational Medicine, may allow relatively non-invasive predictions of impending disorders.

For example, who would think that magnesium levels in the blood might be an early-warning marker for gastric cancer? Or that platelet counts in a blood sample would predict a coming diagnosis of alcohol dependency? Or that a high PSA reading, typically associated with potential prostate cancer, would turn out to be predictive of lung cancer? Or that a high red-blood-cell count might presage the development of acute lymphoblastic leukemia?

Answer: No one. That’s the beauty of Big Data. You find out stuff you were never specifically looking for in the first place. It just pops out at you in the form of a high, if initially inexplicable, statistical correlation.

But by cross-referencing voluminous genetic data implicating particular gene variants in particular diseases with equally voluminous data associating the same gene variants with other, easily measured traits typically considered harmless, Butte and his associates were able to pick out a number of such connections, which they then explored further by accessing anonymized electronic medical records from Stanford Hospital and Clinics, Columbia University, and Mount Sinai School of Medicine. “We indeed found that some of these interesting genetic-based predictions actually held up,” Butte told me.
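For the programmers in the audience, the cross-referencing step boils down to a join on shared gene variants. What follows is only a minimal sketch of that idea, not the method used in the study: the function name and every identifier in the sample data (variant_A, disease_X, trait_1 and so on) are made-up placeholders, and the real analysis would involve statistical testing that this toy example leaves out entirely.

```python
from collections import defaultdict


def link_traits_to_diseases(variant_disease, variant_trait):
    """Pair each trait with every disease that shares at least one gene variant.

    Both arguments are iterables of (variant, label) pairs. The return value
    maps (trait, disease) to the set of variants the two have in common.
    """
    # Index the disease associations by variant for quick lookup.
    diseases_by_variant = defaultdict(set)
    for variant, disease in variant_disease:
        diseases_by_variant[variant].add(disease)

    # For each trait association, collect the diseases tied to the same variant.
    shared = defaultdict(set)
    for variant, trait in variant_trait:
        for disease in diseases_by_variant.get(variant, ()):
            shared[(trait, disease)].add(variant)
    return dict(shared)


# Hypothetical placeholder data, not taken from the study.
variant_disease = [("variant_A", "disease_X"), ("variant_B", "disease_Y")]
variant_trait = [("variant_A", "trait_1"), ("variant_C", "trait_2")]

candidates = link_traits_to_diseases(variant_disease, variant_trait)
for (trait, disease), variants in candidates.items():
    print(trait, "may be linked to", disease, "via", sorted(variants))
```

Any pair that comes out of a join like this is only a candidate. As in the study, it would still need to be checked against real patient records before anyone could call it predictive.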

Because checking blood levels of one or another substance is far simpler and less invasive than doing a biopsy, and because altered levels of the substance may appear well before observable disease symptoms, this approach may lead to earlier, more inclusive and less expensive diagnostic procedures.

Butte is one of the speakers at Stanford’s upcoming Big Data in Biomedicine conference. Registration for the May 21-24 event is open on the conference website.