Devoted to the topic of data specification (including data organization, data description, data retrieval and data sharing) in the life sciences and in medicine.

Tuesday, February 12, 2008

Medical autocoding with Perl

In yesterday's post, I showed a short, simple Ruby script that provides quick and accurate autocoding of medical free text. I also provided a web site where you could inspect 20,000 PubMed abstract titles and the extracted/coded terms produced by the Ruby autocoder.

Today, I'm providing a web site with the equivalent Perl medical autocoder, along with the public-domain output file of 20,000 autocoded PubMed abstracts. Surprisingly (to me), the Perl code executed at about the same speed as the Ruby code. Both autocoders would run significantly faster if they used the doublet method, which I left out here because I wanted to demonstrate the shortest possible scripts. The Perl code is contained on the web page.
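For readers who want a feel for what a short autocoder does, here is a minimal sketch in Python. It is not the Perl or Ruby script from the web pages. It assumes the simplest dictionary-lookup approach: break a title into words, build every short run of consecutive words, and look each run up in a nomenclature. The `NOMENCLATURE` table, its `X000n` codes, and the `autocode` function are all hypothetical and made up for illustration. The doublet method is not implemented here.

```python
import re

# Hypothetical mini-nomenclature: lowercase term -> made-up code.
# A real autocoder would load these pairs from a nomenclature file.
NOMENCLATURE = {
    "adenocarcinoma": "X0001",
    "adenocarcinoma of lung": "X0002",
    "lung": "X0003",
    "renal cell carcinoma": "X0004",
}

# Longest term, in words; no need to test phrases longer than this.
MAX_WORDS = max(len(term.split()) for term in NOMENCLATURE)


def autocode(text):
    """Return (term, code) pairs for every nomenclature term found in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    for start in range(len(words)):
        # Try every phrase that begins at this word, shortest first.
        for length in range(1, MAX_WORDS + 1):
            if start + length > len(words):
                break
            phrase = " ".join(words[start:start + length])
            if phrase in NOMENCLATURE:
                matches.append((phrase, NOMENCLATURE[phrase]))
    return matches


if __name__ == "__main__":
    title = "Adenocarcinoma of lung presenting with renal cell carcinoma"
    for term, code in autocode(title):
        print(f"{term}\t{code}")
```

Because each candidate phrase costs one hash lookup, a sketch like this stays short at the expense of testing many phrases that cannot possibly match, which is the kind of wasted work a pre-filtering step would avoid.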

- Jules Berman

About Me

Jules Berman received two baccalaureate degrees from MIT, in Mathematics and in Earth and Planetary Sciences. He received a Ph.D. from Temple University and an M.D. from the University of Miami. He received post-doctoral training at NIH and residency training at the George Washington University Medical Center. He is board certified in anatomic pathology and in cytopathology. He served as Chief of Anatomic Pathology, Surgical Pathology and Cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and the Johns Hopkins Medical Institutions. In 1998, he became a Medical Officer at the U.S. National Cancer Institute and served as the Program Director for Pathology Informatics in the Institute's Cancer Diagnosis Program. In 2006, Jules Berman was President of the Association for Pathology Informatics. In 2011, he received the Lifetime Achievement Award from the Association for Pathology Informatics. Today, Jules Berman is a freelance writer. He has first-authored more than 100 articles and 13 book titles in science and medicine.