Monthly Archives: June 2014

An IT industry analyst article published by SearchDataCenter.

Graph databases play six degrees of separation to find real connections. See how IT teams can put this database approach to work for the business.

Is there a benefit to understanding how your users, suppliers or employees relate to and influence one another? It’s hard to imagine that there is a business that couldn’t benefit from more detailed insight and analysis, let alone prediction, of its significant relationships.

If you have ever drawn dots on a whiteboard and then connected them, you can appreciate that thinking in terms of nodes and links naturally echoes many real-world scenarios. Many of today’s hottest data analysis opportunities for optimization or identifying fraud are represented as a linked web.

Analyzing sets of nodes and the relationships between them is known as graph theory.

Specialized graph databases are a small but fast-growing part of the so-called NoSQL (not only structured query language) database movement. Graph databases are designed to help model and explore a web or graph of relationships in a natural and more productive way than through the traditional relational database approach.

In a graph database, for example, a common query might be to find all the related objects, based on a certain pattern of relationship, between two and six links away. If the same problem were force-fit into normalized tables in an SQL relational model, the translated query would become quite complex and require tens or even hundreds of nested full table joins.
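The "two to six links away" query above amounts to a bounded breadth-first traversal. Here is a minimal sketch in Python, using a hypothetical toy graph (the names and adjacency list are invented for illustration, not taken from any particular product):

```python
from collections import deque

# A toy social graph as an adjacency list (hypothetical data).
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave":  ["bob", "frank"],
    "erin":  ["carol"],
    "frank": ["dave", "grace"],
    "grace": ["frank"],
}

def nodes_within(graph, start, min_hops=2, max_hops=6):
    """Breadth-first search: return every node whose shortest path
    from `start` is between min_hops and max_hops links, inclusive."""
    seen = {start: 0}          # node -> shortest distance found
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = seen[node]
        if depth == max_hops:
            continue            # don't expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return {n for n, d in seen.items() if min_hops <= d <= max_hops}

print(nodes_within(graph, "alice"))  # nodes 2-6 hops from alice
```

A native graph database executes this kind of traversal by chasing pointers between adjacent records, which is why it avoids the join explosion a relational engine would face on the same question.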

In a relational database query, every required join causes a performance hit. For graph problems of any size, an SQL approach will be demonstrably slower, more complex, more prone to error and definitely not as scalable.

Graph databases don’t require a predefined schema; nodes and links can have attributes edited or assigned to them at any time. If a new relationship type is discovered, it can be added to the database dynamically, extending what’s modeled in the database.
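To make the schema-free idea concrete, here is a minimal property-graph sketch in plain Python (the class, node IDs and relationship types are invented for illustration; real graph databases expose richer APIs and persistence):

```python
# Minimal property-graph sketch: nodes and links carry free-form
# attribute dictionaries, so new properties or relationship types
# can be added at any time without a schema migration.
class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> attribute dict
        self.links = {}   # (src, rel_type, dst) -> attribute dict

    def add_node(self, node_id, **attrs):
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_link(self, src, rel_type, dst, **attrs):
        self.links.setdefault((src, rel_type, dst), {}).update(attrs)

g = PropertyGraph()
g.add_node("acme", kind="supplier")
g.add_node("widgetco", kind="customer")
g.add_link("acme", "SUPPLIES", "widgetco", since=2012)

# A newly discovered relationship type is just another link --
# nothing about the existing data has to change:
g.add_link("widgetco", "REFERS", "acme", confidence=0.8)
```

Contrast this with a relational model, where a new relationship type would typically mean a new join table and a schema change rolled out across environments.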

In production, IT should be aware of differences in how graph databases scale, how they use memory and how they ingest (and index) data loads.

An IT industry analyst article published by SearchStorage.

IT departments can benefit from their storage vendors eavesdropping on their arrays.

Tired of big data stories? Unfortunately, they’re not likely to stop anytime soon, especially from vendors hoping to re-ignite your Capex spending. But there is some real truth to the growing data explosion, and it pays to consider how that will likely cause incredible changes to our nice, safe, well-understood current storage offerings. For some it might feel like watching a train wreck unfold in slow motion, but for others it just might be an exciting big wave to surf. Either way, one of the biggest contributors to data growth will be the so-called Internet of Things.

Basically, the Internet of Things just means that clever new data sensors (and, in many cases, remote controls) will be added to more and more devices that we interact with every day, turning almost everything we touch into data sources. The prime example is probably your smartphone, which is capable of reporting your location, orientation, usage, movement, and even social and behavioral patterns. If you’re engineering-minded, you can order a cheap Raspberry Pi computer and instrument any given object in your house today, from metering active energy devices to tracking passive items for security to monitoring environmental conditions. You can capture your goldfish’s swimming activity or count how many times someone opened the refrigerator door before dinner.

One challenge is that this highly measured world will create data at an astonishing rate, even if not all the data is or ever will be interesting or valuable. We might resist this in our own homes, but the Internet of Things trend breached the data center walls long ago. Most active IT components and devices have some built-in instrumentation and logging already, and we’ll see additional sensors added to the rest of our gear. Naturally, if there are instrumented IT components and devices, someone (probably us) will want to collect and keep the data around for eventual analysis, just in case.

Adding to the potential data overload, there’s an emerging big data science principle that says the more data history one has the better. Since we don’t necessarily know today all the questions we might want to ask of our data in the future, it’s best to retain all the detailed history perpetually. That way we always have the flexibility to answer any new questions we might ever think of, and as a bonus gain visibility over an ever-larger data set as time goes by…
