Regenstrief’s nDepth is an artificial intelligence-powered text analytics technology. It was developed within the Indiana Health Information Exchange, the largest and oldest HIE in the country. Regenstrief fine-tuned nDepth through extensive, repeated use, searching more than 230 million text records from more than 17 million patients. The goal of the partnership between Regenstrief and Health Catalyst is to speed improvements in patient care by unlocking the unstructured data within electronic health records. Health Catalyst will incorporate nDepth into its data analytics platform, which is used by health systems that together serve 85 million patients across the country.

In addition, clinicians are contributing their knowledge to build and curate clinical domain expertise and phenotype libraries that augment the platform. Another worthy contributor is Memorial Hospital at Gulfport, which was a co-development partner and the first to implement the Health Catalyst/nDepth system.

Based in Indianapolis, the Regenstrief Institute was founded in 1969 with a mission: to facilitate the use of technology to improve patient care. Launched in 2008, Health Catalyst is much younger but holds a similar purpose: to improve healthcare with data analysis and information-sharing technologies. That enterprise is based in Salt Lake City.

Healthcare is one of the industries that people imagine can be revolutionized by new technology. Digital electronic medical records, faster and more accurate diagnostic tools, and the ability of doctors to digest piles of data in minutes are some of the newest and best advances in medicine. Despite all of these improvements, healthcare still lags behind other fields in transforming big data into actionable, usable information. Inside Big Data’s article “How NLP Can Help Healthcare ‘Catchup’” discusses how natural language processing can help the healthcare industry make more effective use of its resources.

According to the article, healthcare lags behind other fields because most of its data is unstructured:

This large realm of unstructured data includes qualitative information that contributes indispensable context in many different reports in the EHR, such as outside lab results, radiology images, pathology reports, patient feedback and other clinical reports. When combined with claims data this mix of data provides the raw material for healthcare payers and health systems to perform analytics. Outside the clinical setting, patient-reported outcomes can be hugely valuable, especially for life science companies seeking to understand the long-term efficacy and safety of therapeutic products across a wide population.

Natural language processing relies on linguistic algorithms to identify key meanings in unstructured data. Once meaning is assigned to unstructured data, that data can be fed into machine learning algorithms. Bitext’s computational linguistics platform does the same with its sentiment analysis algorithm. Healthcare information is never black and white like data in other industries. While unstructured data differs from patient to patient, there are similarities, and NLP helps machine learning tools learn how to quantify what was once unquantifiable.
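To make the idea of "assigning meaning" concrete, here is a minimal Python sketch of how free-text notes might be turned into fixed-length numeric vectors that a machine learning model can consume. It is not Bitext's, Regenstrief's, or any vendor's method. The concept lexicon, the negation cues, and the function names `extract_concepts` and `to_vector` are hypothetical; the negation check is a crude rule loosely in the spirit of NegEx-style cue matching.

```python
import re
from typing import Dict, List

# Hypothetical lexicon mapping surface phrases to normalized concepts.
# A production system would draw on a curated clinical vocabulary instead.
LEXICON: Dict[str, str] = {
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "chest pain": "chest_pain",
    "fever": "fever",
}

# Crude negation cues (an assumption, not a validated rule set).
NEGATION = re.compile(r"\b(no|denies|without)\b")


def extract_concepts(note: str) -> Dict[str, int]:
    """Map one free-text note to concept flags: 1 = affirmed, -1 = negated."""
    features: Dict[str, int] = {}
    # Split into rough sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]", note.lower()):
        for phrase, concept in LEXICON.items():
            match = re.search(r"\b" + re.escape(phrase) + r"\b", sentence)
            if match is None:
                continue
            # Treat the concept as negated if a cue appears earlier in the sentence.
            negated = NEGATION.search(sentence[: match.start()]) is not None
            # A later mention of the same concept overwrites an earlier one.
            features[concept] = -1 if negated else 1
    return features


def to_vector(features: Dict[str, int], vocabulary: List[str]) -> List[int]:
    """Fixed-order numeric vector; 0 means the concept was not mentioned."""
    return [features.get(concept, 0) for concept in vocabulary]


if __name__ == "__main__":
    vocab = sorted(set(LEXICON.values()))  # ['chest_pain', 'dyspnea', 'fever']
    notes = [
        "Patient reports chest pain and SOB. No fever.",
        "Denies chest pain; fever since Monday.",
    ]
    for note in notes:
        print(to_vector(extract_concepts(note), vocab))
```

The two sample notes print `[1, 1, -1]` and `[-1, 0, 1]`: different wording ("SOB" versus "shortness of breath") collapses into one column, which is the kind of similarity across patients the paragraph describes. The negation rule is deliberately simplistic; a cue anywhere earlier in the sentence negates the concept, so "no fever but chest pain" would wrongly negate chest pain.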

Synthesys 4, version 4 of the Digital Reasoning platform released on Tuesday (June 21), is based on proprietary analytics tools that apply deep learning neural network techniques across text, audio and images. It also incorporates behavioral analytics based on anomaly detection techniques.

The upgrade also reflects the company’s push into user and ‘entity’ behavior analytics, a technique that applies machine learning to security tasks such as tracking suspicious activity on enterprise networks and detecting ransomware attacks. ‘We are especially excited to expand into the area of entity behavior analytics, combining the analysis of structured and unstructured data into a person-centric, prioritized profile that can be used to predict employees at risk for insider threats,’ Bill DiPietro, Digital Reasoning’s vice president of product management, noted in a statement.

The platform has added Spanish and Chinese, both with syntactic parsing, to its supported languages. It also now supports Elasticsearch, part of the push to leverage unstructured data in real time. The company emphasizes the software’s ability to learn from context, as well as enhanced tools for working with reports.

Digital Reasoning was founded in 2000, and makes its primary home in Nashville, Tennessee, with offices in Washington, DC, and London. The booming company is also hiring, especially in the Nashville area.

I heard an interesting idea the other day. Most parents think their toddler is a genius when he or she figures out how to use a tablet, but did you ever consider that the real genius is the person who designed the tablet’s interface? Soon, software developers may be able to think their newest cognitive system is the next Einstein or Edison, says Computer World in the article “Machines Will Learn Just Like A Child, Says IBM CEO.”

IBM CEO Virginia Rometty said that technology has reached the point where machines are close to reasoning. Current cognitive systems are capable of understanding unstructured data such as images, videos, songs, and more.

“‘When I say reason it’s like you and I, if there is an issue or question, they take in all the information that they know, they stack up a set of hypotheses, they run it against all that data to decide, what do I have the most confidence in,’ Rometty said. The machine ‘can prove why I do or don’t believe something, and if I have high confidence in an answer, I can show you the ranking of what my answers are and then I learn.’”

Cognitive systems learn more as they are fed more data. Demand is growing for “smarter” machines that can process more data and handle the routines that make it useful.

The best news about machines gaining the learning capabilities of a human child is that they will not replace an actual human being, but rather augment our knowledge and current technology.

“Organizations have to study both structured and unstructured data to arrive at meaningful business decisions…. Not only do they have to analyze information provided by consumers and other organizations, information collected from devices must be scrutinized. This must be done not only to ensure that the organization is on top of any network security threats, but to also ensure the proper functioning of embedded devices.

“While sifting through vast amounts of information can look like a lot of work, there are rewards. By reading large, disparate sets of unstructured data, one can identify connections from unrelated data sources and find patterns. What makes this method of analysis extremely effective is that it enables the discovery of trends; traditional methods only work with what is already quantifiable, while looking through unstructured data can cause revelations.”

The nine steps presented in the article begin at the beginning (“make sense of the disparate data sources”) and end at the logical destination (“obtain insight from the analysis and visualize it”). See the article for the steps in between and their descriptions. A few highlights include designating the technology stack for data processing and storage, creating a “term frequency matrix” to understand word patterns and flow, and performing an ontology evaluation.
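The article does not say how its term frequency matrix is built. The following is a minimal Python sketch of one common form, a document-term matrix of raw counts, under stated assumptions: lowercase tokenization on letters, digits and apostrophes, with no stop-word removal, stemming, or weighting such as TF-IDF. The function name `term_frequency_matrix` and the sample documents are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def term_frequency_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Build a document-term matrix: matrix[i][j] counts vocabulary[j] in docs[i]."""
    # Assumed tokenization: lowercase, keep runs of letters, digits and apostrophes.
    tokenized = [re.findall(r"[a-z0-9']+", doc.lower()) for doc in docs]
    # A sorted vocabulary gives every row the same, reproducible column order.
    vocabulary = sorted({tok for tokens in tokenized for tok in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


if __name__ == "__main__":
    # Hypothetical snippets standing in for emails and call transcripts.
    docs = [
        "Refund request via email",
        "Email about refund delay",
        "Phone call about delay",
    ]
    vocab, matrix = term_frequency_matrix(docs)
    print(vocab)
    for row in matrix:
        print(row)
    # Column sums show which terms recur across the whole collection.
    print(dict(zip(vocab, (sum(col) for col in zip(*matrix)))))
```

Each row is a document and each column a term, so terms such as "refund" and "delay", which recur across documents, show up as columns with several nonzero entries. Rows like these are the usual input to the later statistical modeling steps.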

Writer Salil Godika concludes with a reminder that new types of information call for new approaches, including revised skillsets for data scientists. The ability to easily blend and analyze information from disparate sources in a variety of formats remains the ultimate data-analysis goal.

The Datamation article “Big Data: 9 Steps to Extract Insight from Unstructured Data” explores the process of analyzing all of the data organizations collect from phone calls, emails and social media. The article contends that this data contains insights into patterns and connections important to the company. The suggested starting point is deciding which data needs to be analyzed, based on relevance. At this point, the reason for the analysis and what will be done with the information should be clear. Once the technology stack is planned, the information should be kept in a data lake. The article explains:

“Traditionally, an organization obtained or generated information, sanitized it and stored it away… Anything useful that was discarded in the initial data load was lost as a result… However, with the advent of Big Data, it has come into common practice to do the opposite. With a data lake, information is stored in its native format until it is actually deemed useful and needed for a specific purpose, preserving metadata or anything else that might assist in the analysis.”

The article continues with steps 5-9, which include preparing the data for storage, saving useful information, ontology evaluation, statistical modeling and, finally, gaining insights from the analysis. While the breakdown of the process is interesting, the number of steps might seem overwhelming to companies that are in a hurry or lack technical depth.


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.