Category Archives: Data Analytics


A new IDC report is recognizing Sinequa for our Cognitive Search & Analytics platform around critical technologies, including machine learning and advanced natural language processing. This Vendor Spotlight looks at how Sinequa leverages artificial intelligence and cognitive computing-based analytics to meet the needs of companies that are looking to address complex problems with easy-to-use, powerful solutions featuring simplified interfaces.

According to the report’s author David Schubmehl, Research Director for IDC’s Cognitive/Artificial Intelligent Systems and Content Analytics research, “The capabilities being offered by cognitive knowledge discovery systems, such as Sinequa, provide many opportunities for enterprises to innovate and advance their organization using approaches that were either not possible or not easily implemented several years ago. Within many enterprises, these opportunities are limited only by the imagination and creativity of those seeking to improve their business and information handling processes.”

The report states that Sinequa’s software provides organizations with real-time, relevant results from unstructured and structured internal data, and that we are developing our Cognitive Search & Analytics platform on an extensive foundation of unstructured information access technologies that include advanced natural language processing capabilities in 21 different languages.

Schubmehl adds: “While Sinequa has offered a flexible information collection, access, and analysis architecture for many years, it has now built capabilities around cognitive technologies, such as machine learning, advanced natural language processing, improved relevance, and better decision support while offering strong user and data interaction capabilities.”

The advancement of natural language processing and the increased maturity of machine learning are creating substantial demand for cognitive search and analytics solutions. At the same time, the growth of unstructured data and the pressure to improve worker productivity make it even more critical to find the right information at the right time. This report highlights the fact that Sinequa’s platform meets this demand and that, by combining our solution with human ingenuity, we can produce the best possible search and analytics results.

As the data-driven age gives way to an information-driven economy where context is critical to surfacing useful insights from data, taking in relevance feedback from users, especially expert users, will play a major role in driving the benefits. This article explains the concept of a relevance feedback model and why you should care.

What is a Relevance Feedback Model?

Assume you ask a person or a system to provide you with information on a certain topic. There may be many facets to this topic, and you may get information from a whole range of different aspects. If you are working with that person or that system on a permanent basis, you may want to tell them that only certain aspects and hence certain kinds of information are relevant to you – in the hope of getting only the more relevant answers from them the next time you ask. You give the person or system “relevance feedback”.

Now let’s concentrate on a system, a cognitive information retrieval system or a “system of insight”. In that context, a relevance feedback model (RFM) is the capability of the system to take your relevance feedback and “internalize” it in order to tune the results of your future queries to what is most relevant.

The system performs and automates this task by adjusting weights attributed to certain terms and their equivalents (i.e. terms with the same or similar meanings) within the data it processes.
Imagine you asked “what do we have on MRO”, and you got information back on maintenance, repair and operations, but you told the system that you are only interested in anything pertaining to “Mars Reconnaissance Orbiter”. The next time you ask, you will get information only pertaining to the latter and possibly on related topics like Mars landing craft, automated robots for planetary exploration, etc.
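
To make the idea of adjusting term weights concrete, here is a minimal sketch based on the classic Rocchio relevance feedback formula from the information retrieval literature. It is an illustration only, not a description of how Sinequa implements its RFM; the function name, the parameter values (alpha, beta, gamma) and the toy term weights are all assumptions made for this example.

```python
from collections import Counter


def rocchio_update(query, relevant_docs, non_relevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Return re-weighted query terms after one round of user feedback.

    The query and every document are dicts mapping term -> weight.
    alpha, beta and gamma are illustrative values, not tuned settings.
    """
    updated = Counter()
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Pull the query towards documents the user marked as relevant.
    for doc in relevant_docs:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant_docs)
    # Push the query away from documents the user marked as not relevant.
    for doc in non_relevant_docs:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant_docs)
    # Terms whose weight drops to zero or below are removed from the query.
    return {term: w for term, w in updated.items() if w > 0}


# The "MRO" example: the user wants the Mars orbiter, not maintenance.
query = {"mro": 1.0}
liked = [{"mro": 1, "mars": 2, "orbiter": 1}]
disliked = [{"mro": 1, "maintenance": 2, "repair": 1}]
new_query = rocchio_update(query, liked, disliked)
```

After this update, the query carries positive weight on "mars" and "orbiter", while "maintenance" and "repair" have been dropped, so the next search leans towards the Mars Reconnaissance Orbiter.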

For one person and one query, that seems rather simple. But now imagine that you have tens of thousands of colleagues and thousands of topics to cover. That is when the RFM benefits from machine learning algorithms that detect not only the preferences of each person but also those of groups of people with similar interests, similarities between documents, and so on, in order to spread user relevance feedback to other documents, queries and people on an ongoing basis and in an automated way.
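
One simple way to picture how feedback might spread between people with similar interests is sketched below. It compares hypothetical interest profiles with cosine similarity and shares a boost only with users who are similar enough. The profiles, the threshold and the function names are assumptions for illustration; a production system would learn such groupings with far richer machine learning models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight profiles."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate(feedback_user, doc_id, boost, profiles, min_similarity=0.5):
    """Share one user's boost for doc_id with sufficiently similar users.

    Returns a dict mapping (user, doc_id) -> boost scaled by similarity.
    """
    source = profiles[feedback_user]
    shared = {}
    for user, profile in profiles.items():
        if user == feedback_user:
            continue
        similarity = cosine(source, profile)
        if similarity >= min_similarity:
            shared[(user, doc_id)] = boost * similarity
    return shared


# Hypothetical interest profiles built from past queries and clicks.
profiles = {
    "alice": {"mars": 3, "orbiter": 2},
    "bob": {"mars": 2, "orbiter": 2, "rover": 1},
    "carol": {"maintenance": 4, "repair": 1},
}
shared = propagate("alice", "mro-mission-report", 1.0, profiles)
```

Here Alice's feedback reaches Bob, whose interests overlap with hers, but not Carol, who works on maintenance topics.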

Why use a Relevance Feedback Model?

A key benefit of a relevance feedback model is to enable users, in particular expert users, to affect relevance appropriate to their environment without the IT department having to implement rules for relevance according to specific user groups. It allows administrators to decide by configuration which specific users within the organization will contribute as well as the exact factor of relevancy improvement.

The relevance feedback model can also go a long way towards improving the human-machine interaction. As the relevance of certain content increases significantly due to relevance feedback, the user experience starts to feel much more “conversational” (i.e., offering one to three suggestions as “answers” to a query) than a traditional search interface offering a list of documents in response to a query.

The RFM provides a way to discover from everyone’s experience the information that best answers the question. Take the real-world case of a customer service representative (CSR) seeking an answer to a customer’s product question using the product name or code. In this case, the CSR will obtain a diverse set of documents including parts catalogs, how-to information, product specifications, packaging information, marketing material, etc. All of this information is relevant but only some of it may help the CSR answer the customer’s question.

Thanks to the RFM, the CSR would immediately see information she has already viewed when she searched for similar things in the past, because the RFM takes into account the user’s “click actions” and applies a tiny relevance boost accordingly. Perhaps even more powerfully, the RFM will also modify the order of the results by observing (over time) what information other CSRs spend time discovering, even when they dive deeply into the results list for relevant information. Organizations striving to take full advantage of the RFM will configure it so that the experts’ interactions with the system provide bigger boosts for important content and even ban inaccurate information from appearing in results lists.
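
The behavior described above (small boosts from clicks, bigger boosts from experts, and the option to ban content) can be sketched in a few lines. This is a toy model under stated assumptions: the class name, the boost values and the document identifiers are all hypothetical and do not reflect Sinequa's actual configuration options.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    click_boost: float = 0.1    # hypothetical boost per ordinary click
    expert_factor: float = 5.0  # hypothetical multiplier for expert clicks
    boosts: dict = field(default_factory=dict)
    banned: set = field(default_factory=set)

    def record_click(self, doc_id, is_expert=False):
        """Add a small boost for a click, scaled up for expert users."""
        factor = self.expert_factor if is_expert else 1.0
        self.boosts[doc_id] = self.boosts.get(doc_id, 0.0) + self.click_boost * factor

    def ban(self, doc_id):
        """Exclude a document from all future result lists."""
        self.banned.add(doc_id)

    def rerank(self, results):
        """results: list of (doc_id, base_score); returns re-ordered doc ids."""
        kept = [(doc_id, score + self.boosts.get(doc_id, 0.0))
                for doc_id, score in results if doc_id not in self.banned]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        return [doc_id for doc_id, _ in kept]


store = FeedbackStore()
store.record_click("spec-sheet")              # ordinary CSR click
store.record_click("how-to", is_expert=True)  # expert click counts more
store.ban("outdated-flyer")                   # expert flags inaccurate content

results = [("outdated-flyer", 0.9), ("spec-sheet", 0.5),
           ("how-to", 0.4), ("catalog", 0.45)]
ranking = store.rerank(results)
```

In this example the expert-endorsed how-to document moves to the top of the list, the clicked spec sheet overtakes the catalog, and the banned flyer disappears even though its base score was the highest.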

As you can see from the example above, the RFM provides a collaborative way to modify search result order. It is neither a tagging nor a classification approach, both of which can be done at indexing time (extracting metadata from the source, entity extraction with Natural Language Processing) or afterwards (classification through ML algorithms such as clustering, similarity computation, and so forth). The RFM arguably represents a smarter approach by directly incorporating human decisions when presenting information that will best address a user’s query.

As information-driven organizations strive for ever higher degrees of accuracy for end users seeking knowledge, the ability to leverage relevance feedback from users, especially expert users, automatically at scale becomes increasingly mission-critical for optimal business performance.

Forrester, one of the leading analyst firms, defines Cognitive Search in a recent report¹ as: The new generation of enterprise search that employs AI technologies such as natural language processing and machine learning to ingest, understand, organize, and query digital content from multiple data sources. Here is a shorter version, easy to memorize: Cognitive Search = Search + NLP + AI/ML
Of course, “search” in this equation is not the old keyword search but high-performance search integrating different kinds of analytics. Natural Language Processing (NLP) is not just statistical treatment of languages but comprises deep linguistic and semantic analysis. And AI is not just “sprinkled” on an old search framework but part of an integrated, scalable, end-to-end architecture.

AI Needs Data, Lots of Data
For AI and ML algorithms to work well, they need to be fed with as much data as you can get. A cognitive search platform must access the vast majority of an enterprise’s data sources: internal and external data of all types, on premises and in the cloud. Hence the system must be highly scalable.

Continuous Enrichment
Cognitive Search uses NLP and machine learning to accumulate knowledge about structured and unstructured data and about user preferences and behavior. That is how users get ever more relevant information in their work context. To accumulate knowledge, a cognitive search platform needs a repository for this knowledge. We call that a “Logical Data Warehouse” (LDW).

The Strength of Combination
To produce the best possible results, the different analytical methods must be combined, not just executed in isolation of each other. For example, machine learning algorithms deliver much better results much faster if they work on textual data for which linguistic and semantic analyses have already extracted concepts and relationships between concepts.

PHARMA CONNECTION
Sinequa took part for the 4th consecutive year in the Bio IT World Conference & Expo on May 23-25 in Boston. We were delighted to meet with our Biopharma and Life Science customers and partners at the show and to share innovative use cases of our solution for the Pharma industry via live demos.

“OPEN” LIVE DEMOS

The Bio IT World conference is always a great venue for us to showcase our platform and present how leading biopharma organizations leverage our Cognitive Search & Analytics platform. This year, the attendees were very interested to see how Sinequa combines advanced Search, NLP and Machine Learning capabilities to extract relevant insight from vast structured and unstructured data silos.

In our joint talk, our customer Alexion shared a testimonial on the implementation of Sinequa for their content analysis project. The presentation highlighted the technology and approaches they used with advanced data visualizations that help explain information sources. ICYMI – please feel free to get your copy here.

UNLIMITED THEATER PRESENTATIONS

Once again, we were very pleased to see the strong interest of many biopharma professionals in the Sinequa insight platform. Our team gave more than a hundred presentations and live demos in the Sinequa Theater Area, where they explained a wide range of use cases, including R&D Enterprise Search, Clinical Trial Data Discovery & Exploration, and Key Opinion Leaders & Subject Matter Experts. We hope you enjoyed the conference as much as we did and that you could see how our Cognitive Search & Analytics platform enables leading pharmaceutical organizations to drive innovation, accelerate research and shorten drug Time-to-Market. We are already getting excited for next year’s edition! See you all in spring 2018!

Forces of global competition, narrow margins, higher product development costs, and tenuous exclusivity holds drive organizations to push innovation, seek cost cutting strategies, and go-to-market as quickly as possible. Demands change frequently while regulatory and compliance standards become even more stringent. Organizations must keep up, and the pressure on research and development (R&D) never stops. R&D is the critical driver within the organization, whether within a large aircraft manufacturer or a leading automobile company looking to develop cutting edge products and services or a pharmaceutical company accelerating time-to-market for new drugs or a CPG company reinventing waning products. R&D thrives on information: customer information, expert information, product information, scientific information, market information, and competitive information.

To be at the forefront of innovation, R&D departments need complete visibility into both new and historical information across the entire enterprise as well as access to research from external public and premium information services. This is no simple task in today’s world where we are inundated with data — more data, more opportunities and more challenges. As a result, many companies depend on Cognitive Search and Analytics (CS&A) solutions to harness insightful, high-quality information and fuel innovation within their product and solution portfolios.

THE PRESSURE ON R&D

As organizations strive to create value, enhance customer experiences, and differentiate themselves from their competition, they have placed demands on their R&D departments to:

Accelerate delivery of innovative products to market

Optimize and manage available resources and knowledge while leveraging intellectual property

Devise methods to reduce product development costs and eliminate re-work

To meet these demands, R&D depends on complex scientific and engineering content that contains implicit conceptual relationships that can and should be semantically linked to simplify access to the knowledge embedded in that content.

HOW COGNITIVE SEARCH AND ANALYTICS HELPS

Cognitive Search and Analytics solutions amplify the expertise of R&D departments by surfacing insights from data across the enterprise, irrespective of location and format. From a single, secure access point, these solutions enable R&D professionals to unlock relevant and timely product research that helps make informed decisions. In addition, these capabilities are not limited to internal information; users can quickly access information from external Web sites and other applications, deriving relevant information and seamlessly integrating with internal enterprise information.

Cognitive Search and Analytics solutions enable enterprises to maximize the value of their intellectual property. Powerful search relevance and navigation capabilities enable researchers to find valuable pieces of past research and even parallel work going on without each group knowing about the other — eliminating duplicate work, reducing time spent in trials and shortening development cycles. These solutions allow employees to tag, bookmark and comment on documents, enabling collaboration and making teams more innovative, efficient and productive. Surfacing this existing knowledge enables workers to leverage the past work of distant or former researchers to benefit future research. Dynamically delivering relevant information, surfacing knowledge and enabling collaboration can decrease R&D costs significantly. Because R&D departments need to comply with a myriad of complex regulations, they need to be aware of relevant regulations without having to sift through the myriads themselves. This visibility enables R&D to stay abreast of regulatory mandates and efficiently manage compliance. Organizations can also leverage these solutions to send alerts to employees when there are new policy and compliance changes so that relevant R&D stakeholders are immediately notified.

Managing and maintaining product specifications is a critical function within R&D. Cognitive Search and Analytics solutions can access virtually any data source and expose changes when information is deleted or becomes outdated. These solutions can alert workers when any new information is created that impacts their specific process in the development cycle. These solutions also track and respect the access permissions accorded by each target application; only those with the correct privileges can access restricted information. Cognitive Search & Analytics solutions give researchers clear insight into product requirements and enable them to collaboratively develop safer, higher quality products that meet regulatory requirements.
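
Respecting source permissions at query time is often called security trimming. The sketch below shows the basic idea under simplifying assumptions: each indexed document carries a set of allowed groups copied from its source application, and results are filtered against the groups of the person searching. The field names and group names are hypothetical.

```python
def visible_results(results, user_groups):
    """Keep only documents whose access list shares a group with the user."""
    return [doc for doc in results if doc["allowed_groups"] & user_groups]


# Hypothetical result list; allowed_groups mirrors the source system's ACL.
results = [
    {"id": "spec-v2", "allowed_groups": {"rd", "qa"}},
    {"id": "trial-raw-data", "allowed_groups": {"clinical"}},
    {"id": "style-guide", "allowed_groups": {"everyone"}},
]

ids = [doc["id"] for doc in visible_results(results, {"rd", "everyone"})]
```

An R&D user in this example sees the specification and the style guide, while the restricted clinical data never appears in the result list.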

RAPID RETRIEVAL OF RELEVANT INFORMATION MAKES THE DIFFERENCE

Extracting relevant information from vast and complex data volumes is a challenge that requires a sophisticated and scalable solution. The Sinequa Cognitive Search and Analytics platform handles all structured and unstructured data sources and uses Natural Language Processing (NLP), statistical analysis and Machine Learning (ML) to create an enriched “Logical Data Warehouse” (LDW). You can think of it as a repository of information about data and about relationships between data, people, concepts, etc. This LDW is optimized for performance in delivering rapid responses to users’ information needs. Users can ask questions in their native language or ask that relevant information be “pushed” to them in a timely fashion when it emerges. More than 150 connectors ready for use “out of the box” make the process of connecting multiple data sources fast and seamless. Company and industry-specific dictionaries and ontologies can be easily integrated, putting domain-specific knowledge “under the hood” of the Sinequa platform, making it an intelligent partner for anyone in search of relevant information.

With Sinequa, researchers, designers and engineers have immediate access to all the information needed to work productively.

The advanced semantic capabilities within Sinequa’s platform provide strong relevance in 21 different languages to assist organizations with even the most geographically and linguistically diverse workforce.

REAL-WORLD EXAMPLE: AMPLIFYING BIOPHARMA EXPERTISE

Consider one of Sinequa’s biopharma customers, a research-intensive organization dealing with a vast number of highly technical documents, produced both in-house and externally. The information in these documents varies according to the field of its origin (e.g. medical, pharmaceutical, biological, chemical, biochemical, genetic, etc.) and may deal with diseases, genes, drugs/active agents, and mechanisms of action. A lot of the information is textual, but there is also structured information, like molecular structures, formulae, curves, diagrams, etc. The volume of this information is on the order of 500 million documents and billions of database records.

Now consider the more than 10,000 R&D experts within the organization trying to leverage this information daily. They need to be able to ask topical questions, find relevant people and documents, and explore the vast information landscape to discover knowledge. The Sinequa platform supports this by plowing through the hundreds of millions of documents and equally large amounts of structured data, analyzing the data, analyzing the natural language user queries, and classifying results by category in real time. Once the data has been tamed and enriched, it is presented to the user via a simple, intuitive interface with faceted navigation aids that allow the user to filter results further based on structural attributes that are either explicit or were intelligently derived by the system. The interfaces, also referred to as search-based applications (SBAs), are configured to expose functionality that is very specific to an R&D expert, aligning the solution with the goals of the user.
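
Faceted navigation boils down to two operations: counting how many results fall under each value of an attribute, and narrowing the result set when the user picks a value. Here is a minimal sketch with made-up documents and attribute names; it says nothing about how the Sinequa platform computes facets internally.

```python
from collections import Counter


def facet_counts(docs, facet):
    """Count how many documents carry each value of the given attribute."""
    return Counter(doc[facet] for doc in docs if facet in doc)


def filter_by(docs, **selected):
    """Keep documents matching every selected facet value."""
    return [doc for doc in docs
            if all(doc.get(key) == value for key, value in selected.items())]


# Hypothetical search results with two facets: field and type.
docs = [
    {"id": 1, "field": "genetic", "type": "paper"},
    {"id": 2, "field": "chemical", "type": "patent"},
    {"id": 3, "field": "genetic", "type": "patent"},
]

counts = facet_counts(docs, "field")
genetic_patents = filter_by(docs, field="genetic", type="patent")
```

The counts are what a user would see next to each facet value in the interface, and selecting "genetic" and "patent" narrows the list down to a single document.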

The Sinequa solution has proven to be very valuable to the customer in question, putting both internal and external research-related information that scientists need for research, development, and decision making into a single virtual repository with advanced navigation and retrieval capabilities. It has also proved to be very beneficial to teams of research and development contributors by allowing experts around the world to collaborate more easily through a single research application. Features such as navigation by topic across multiple repositories, de-duplication of similar documents, and improved research capabilities have all made knowledge workers more efficient and innovative.
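
As an illustration of what de-duplication of similar documents can look like, the sketch below compares documents by the overlap of their three-word shingles (Jaccard similarity) and keeps only the first of each group of near-duplicates. This is a textbook technique chosen for the example, with an arbitrary threshold and invented sample texts; the source does not say which method the platform uses.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences that occur in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Share of shingles that two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def deduplicate(docs, threshold=0.6):
    """Keep a document only if it is not too similar to one already kept."""
    kept = []
    for doc_id, text in docs:
        signature = shingles(text)
        if all(jaccard(signature, other) < threshold for _, other in kept):
            kept.append((doc_id, signature))
    return [doc_id for doc_id, _ in kept]


docs = [
    ("a", "the compound inhibits the target enzyme in early trials"),
    ("b", "the compound inhibits the target enzyme in early clinical trials"),
    ("c", "quarterly maintenance schedule for the assembly line"),
]
unique = deduplicate(docs)
```

Documents "a" and "b" differ by a single word, so only "a" is kept, while the unrelated document "c" stays in the list. Pairwise comparison like this does not scale to hundreds of millions of documents; large systems typically rely on hashing schemes to find candidate pairs first.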

CONCLUSION

Sinequa’s Cognitive Search & Analytics platform leverages relevant customer and market information to give R&D organizations insight and the ability to react quickly to demands. Teams utilize this platform to collaborate and share information. Sinequa effectively eliminates data silos and delivers relevant information from data to users in their business context, such that they can make better decisions, drive innovation, reduce risk, and be more efficient, which in turn enables forward-thinking R&D departments that thrive on continuous product improvements and introductions to amplify the collective expertise of the organization.