Top challenges for real-time data discovery…and how we solve them

Analysts set out to reach “Eureka!” moments through big data discovery. They endeavor to use data to iteratively search for relationships and patterns that lead to new insights, new questions and different paths of inquiry. In short, they embark on a mission of discovery using data.

Analysts who attempt big data discovery with traditional data analytics processing will inevitably run into significant challenges. You see, traditional analytics processing is all about orderly data. Know what information you’ll need in advance. Line it all up in rows and columns. Build a schema; predict the sources; plan the reports; control the ad-hoc queries.

Discovery, on the other hand, is “messy.” Always has been. That’s why the big “Eureka!” moments from history are mostly accidents. Purposeful discovery done today with data is no different in this regard. It’s answering one question to come up with another, going down one path of inquiry to end up on a new one, and postulating a new theory after validating — or not — the idea that came before.

That’s why data discovery is hard: traditional data analytics constrain it. And that’s why we built the Urika-GD™ data discovery appliance, a solution that pairs an understanding of what data discovery really is with the hardware and software combination needed to deliver it.

Here’s how the Urika-GD appliance addresses the top three challenges of big data discovery.

Discovery cannot know all the data relationships in advance. That’s the essence of discovery – to find them out as you go, as you add more, and more diverse, sources of data to your analytics engine, and as you perform pattern-matching queries that surface previously unknown linkages in the data.

The Urika-GD appliance solves this with a schema-free, in-memory graph analytics database. You can add structured, semistructured and unstructured data as you ingest it. The system’s powerful graph processing engine will bring your growing set of data relationships to the surface in response to your queries.
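To make “schema-free” and “pattern-matching queries” concrete, here is a minimal pure-Python sketch of the idea, assuming data is stored as subject–predicate–object triples (the RDF model the appliance’s software stack is built on). It is not the Urika-GD engine or its API: the `TripleStore` class, its methods and the sample records are hypothetical. A new source is added simply by adding triples, with no table definitions, and a query is a list of triple patterns whose `?variables` are joined across patterns.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


class TripleStore:
    """Hypothetical in-memory, schema-free store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        # No schema to declare: any predicate can appear at any time.
        self.triples.add((s, p, o))

    def add_record(self, subject: str, record: Dict[str, object]) -> None:
        # Flatten a structured row or a semi-structured dict into triples;
        # list values become one triple per element.
        for key, value in record.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                self.add(subject, key, str(v))

    def match(self, patterns: List[Triple]) -> List[Binding]:
        """Join triple patterns in order; terms starting with '?' are variables."""
        bindings: List[Binding] = [{}]
        for pattern in patterns:
            next_bindings: List[Binding] = []
            for binding in bindings:
                for triple in self.triples:
                    extended = _unify(pattern, triple, binding)
                    if extended is not None:
                        next_bindings.append(extended)
            bindings = next_bindings
        return bindings


def _unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    # Extend `binding` so that `pattern` matches `triple`, or return None.
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


store = TripleStore()
# A structured row, e.g. from a CRM table (hypothetical data).
store.add_record("cust:1", {"name": "Ada", "city": "Boston"})
# A semi-structured, JSON-like record from another source, with a list field.
store.add_record("order:7", {"placedBy": "cust:1", "item": ["sku:42", "sku:99"]})

# Which items were ordered by customers in Boston?
rows = store.match([
    ("?c", "city", "Boston"),
    ("?o", "placedBy", "?c"),
    ("?o", "item", "?i"),
])
print(sorted(r["?i"] for r in rows))  # ['sku:42', 'sku:99']
```

The second record was never described to the store in advance, yet the query links it to the first through the shared `cust:1` value. A production engine would index the triples and run the joins in parallel instead of scanning every triple for every pattern; this sketch only illustrates the data model and the join semantics.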

Discovery wants to ask questions followed by more questions. It is an iterative process that depends on real-time responses to explore data relationships and patterns. These types of database queries are by definition ad hoc: crafted on the fly, with no consideration for being “well-behaved” within a schema.

The Urika-GD appliance has a special hardware accelerator tuned to optimize its large, shared memory and massively multithreaded architecture. Results from queries are returned in real time; performance remains predictable even as the data model grows, freeing the inquisitor to seamlessly follow breadcrumbs to “Eureka!”

Discovery doesn’t access data in a predictable way. Its access patterns are irregular, so you can’t know in advance what to prefetch or cache. Yet that can’t be the end of the story: finding all those relationships and exposing all those patterns within massive amounts of constantly changing data demands a great deal of data access, and real-time response is still required.

The Urika-GD system meets this demand by keeping the data model entirely in memory, and that model can scale up to 512 TB if needed. But it doesn’t stop there. Finding all the linkages in that data requires a lot of fetching and processing. That’s why the Urika-GD appliance can scale up to 8,192 graph accelerator processors, each running 128 independent threads of work at the same time.
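Multiplying the two maximums quoted above gives the number of threads that can be in flight at once, assuming a fully configured system in which every processor runs its full complement of 128 threads:

```latex
8{,}192\ \text{processors} \times 128\ \tfrac{\text{threads}}{\text{processor}}
  = 2^{13} \times 2^{7} = 2^{20} = 1{,}048{,}576\ \text{threads}
```

That is roughly a million independent streams of work, each able to issue its own memory fetch, which is how the design keeps busy when access patterns are too irregular for prefetching or caching to help.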

We can’t change the essence of discovery to fit an analytics engine and still call it discovery. We get it. That’s why we built the Urika-GD real-time platform for big data discovery.

Purpose-built: Cray does data discovery better than anyone

Cray has packaged the power of real-time data discovery into an enterprise-ready big data analytics appliance. The Urika-GD™ data discovery appliance ingests data from multiple, arbitrary sources and optimizes discovery across them, allowing your analysts to quickly load data, test hypotheses and reconfigure datasets without needing a schema or having to craft complex, cumbersome SQL queries.

The Urika-GD appliance’s schema-free architecture, large shared memory and highly scalable I/O enable the fusion of diverse datasets without the analyst having to first lay out the data or predict the relationships, or even know all the questions to ask upfront. It’s truly an engine that delivers on the promise of big data to accelerate finding linkages between, and gaining new insights from, disparate sources of information and data.

Cray has delivered the massive analytic processing power required by scientists and engineers for more than 40 years. We’ve harnessed those decades of analytic processing know-how to provide organizations of all kinds with the Urika data discovery platform. The Urika-GD appliance finds the unknown linkages in data that set you on your way to new insights…new discoveries…maybe even amazing breakthroughs.

Easy to deploy into your existing analytical infrastructure

We get it. That’s why we built our Urika-GD™ data discovery solution as an appliance. We’ve already built the discovery engine that’s right for big data. We’ve already architected the large, high-performance memory, extreme processing power and massive multithreading required to surface data relationships and patterns from massive amounts of data drawn from diverse sources.

We’ve already integrated our powerful purpose-built hardware with a software stack that knows how to get every cycle out of it for the real-time performance demanded by big data discovery. Our graph analytics database is tuned specifically to fully utilize the massive multithreading and highly parallel processing capabilities of our discovery engine.

We know you’ve got to integrate data from many different sources into our discovery platform, and that you will likely need to consider offloading processing from other analytic solutions. That’s why our software stack is standards-based, adhering to the W3C specifications for RDF and SPARQL used in graph analytics processing.
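Because RDF is a W3C standard, records from unrelated systems can be expressed in one common form before loading. The sketch below, in plain Python, converts records from two differently shaped sources into N-Triples, the W3C line-based RDF serialization (one `subject predicate object .` statement per line). It assumes N-Triples as the loading format, and the base IRI, field names and records are all hypothetical; it is an illustration, not a Cray tool.

```python
from typing import Dict, Iterable, List

# Hypothetical base IRI, for illustration only.
BASE = "http://example.org/"


def iri(local: str) -> str:
    # Wrap a local name as an absolute IRI in angle brackets.
    return f"<{BASE}{local}>"


def literal(value: object) -> str:
    # Escape backslash, double quote, LF and CR inside a quoted string literal.
    # Backslash must be escaped first so later escapes are not doubled.
    text = str(value)
    text = (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))
    return f'"{text}"'


def to_ntriples(subject: str, record: Dict[str, object],
                links: Iterable[str] = ()) -> List[str]:
    """Turn one record into N-Triples lines.

    Keys listed in `links` hold identifiers of other resources and become IRIs;
    all other values become plain string literals.
    """
    link_keys = set(links)
    lines = []
    for key, value in record.items():
        obj = iri(str(value)) if key in link_keys else literal(value)
        lines.append(f"{iri(subject)} {iri(key)} {obj} .")
    return lines


# Two sources with different shapes: a CRM row and a support-ticket entry.
crm_row = {"name": 'Ada "A." Lovelace', "city": "Boston"}
ticket = {"openedBy": "cust/1", "text": "Login fails\nafter reset"}

doc = to_ntriples("cust/1", crm_row) + to_ntriples("ticket/9", ticket, links={"openedBy"})
print("\n".join(doc))
```

Once both sources are in this form, a SPARQL basic graph pattern can join the CRM facts and the ticket facts on the shared `cust/1` IRI, even though neither source was designed with the other in mind.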

All of this adds up to a complete package in which the hardware, software and management tools for big data discovery are already integrated, tested and optimized. You get an easy-to-deploy, purpose-built appliance that delivers rapid time to value and a single point of support.

Your analysts can be up and running within hours of your Urika-GD appliance installation.

(Matt Aslett, May 23, 2013) Having recently completed its first year of operation, Cray's YarcData graph database-appliance business has now established its place in the database landscape, providing a platform for the discovery of relationships between data as part of exploratory analysis approaches that promise to encourage businesses to ask new questions, and to reveal new business insights.