Do you think you’re working with “Big Data”, or is it “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance, in other words “near real-time”. We’re not talking about batch analytics, but about more interactive, iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small Data, with ad hoc analytics in either case.

Ad Hoc Analytics: Small Data versus Big Data

Data Volume
Small Data: Megabytes to gigabytes
Big Data: Terabytes (1-100 TB)

Data Velocity
Small Data: Update in near real-time (seconds)
Big Data: Update in real-time (milliseconds)

Data Variety
Small Data: 1-6 structured data sources
Big Data: 6+ structured AND 6+ unstructured data sources

Data Models
Small Data: Aggregations with tens of tables
Big Data: Aggregations with hundreds to thousands of tables

Business Functions
Small Data: One line of business (e.g. sales)
Big Data: Several lines of business, up to a 360-degree view

Business Intelligence

Small Data: Queries are simple and cover basic transactional summaries and reports. Response times are in seconds across a handful of business analysts.

Example: retrieve a customer’s profile and summarize their overall standing based on current market values for all assets.

This is representative of the work performed when a business asks the question “What is my customer worth today?”

The transaction is read-only. Questions vary based on what the business analyst needs to know interactively.
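To make the “What is my customer worth today?” question concrete, here is a minimal sketch in Python using an in-memory SQLite database. The schema (customers, holdings, prices), the sample rows, and the function names are all hypothetical and exist only for illustration; a real system would have its own tables and pricing feed.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE holdings (customer_id INTEGER, asset TEXT, quantity REAL);
        CREATE TABLE prices (asset TEXT PRIMARY KEY, market_value REAL);
        INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO holdings VALUES (1, 'ACME', 10), (1, 'GLOBEX', 5), (2, 'ACME', 2);
        INSERT INTO prices VALUES ('ACME', 20.0), ('GLOBEX', 100.0);
        """
    )
    return conn


def customer_worth(conn, customer_id):
    """Read-only query: value all of a customer's assets at current prices.

    Returns (name, total_value), or None if the customer does not exist.
    """
    return conn.execute(
        """
        SELECT c.name, COALESCE(SUM(h.quantity * p.market_value), 0.0)
        FROM customers AS c
        LEFT JOIN holdings AS h ON h.customer_id = c.id
        LEFT JOIN prices AS p ON p.asset = h.asset
        WHERE c.id = ?
        GROUP BY c.id, c.name
        """,
        (customer_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(customer_worth(conn, 1))
```

The query only reads data and touches a handful of tables, which is the shape of work described for the Small Data case.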

Big Data: Queries can be as complex as with batch analytics, but are generally still read-only and processed against aggregates. Queries span business functions. Response times are in seconds across large numbers of business analysts.

Example: retrieve a customer profile and summarize activities across all customer touch points, calculating “Life-Time-Value” based on past and current activities.

This is representative of the work performed when a business asks the question “Who are my most profitable customers?”

Questions vary based on what the business analyst needs to know interactively.
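The “Who are my most profitable customers?” question can be sketched the same way. The example below is only an illustration in Python: it assumes activity has already been aggregated per customer and touch point, and it assumes a deliberately simple definition of lifetime value (total revenue minus total cost across all touch points). The post does not define a formula, so both the data and the calculation are hypothetical.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows: (customer, touch_point, revenue, cost)
ACTIVITY = [
    ("alice", "web_store", 500.0, 50.0),
    ("alice", "call_center", 0.0, 30.0),
    ("bob", "web_store", 200.0, 20.0),
    ("bob", "retail", 300.0, 100.0),
    ("carol", "retail", 1000.0, 200.0),
]


def lifetime_value(rows):
    """Sum (revenue - cost) per customer across every touch point.

    This simple net-revenue definition is an assumption for illustration.
    """
    totals = defaultdict(float)
    for customer, _touch_point, revenue, cost in rows:
        totals[customer] += revenue - cost
    return dict(totals)


def most_profitable(rows, top_n=2):
    """Rank customers by lifetime value, highest first."""
    ltv = lifetime_value(rows)
    return sorted(ltv.items(), key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(most_profitable(ACTIVITY))
```

At Big Data scale the same aggregation would run over far more sources and tables, but the query stays read-only and works against aggregates, as described above.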

About Jim Kaskade

Jim currently leads Janrain, the category creator of Consumer Identity & Access Management (CIAM). We believe that your identity is the most important thing you own, and that your identity should not only be easy to use, but it should be safe to use when accessing your digital world.