In the Q&A, there was a question and comment from Lyn Robison of Gartner on advising customers to use the right data model for the right problem/application.
We completely agree: we'd like customers to use the right model for the right problem. With Informix, you can use either the standard relational model or a flexible schema with JSON for your application development. Obviously, this can create data silos if your API is tied to a specific data model. We believe the data model should not restrict data access. Hence the hybrid access.

So, how does this hybrid access work in Informix NoSQL?

Informix provides both SQL and NoSQL semantics. You can define a regular relational schema, or simply use the MongoDB API and let Informix create the database, collections and documents just like MongoDB does. Let's look at the implementation details. Some of these are already covered in our detailed deep dive at: slidesha.re/1gEGXW6

Data representation:

SQL: SQL (relational) data is stored in regular tables with rows and columns. The logical schema is distinct from the physical schema.

NoSQL: A flexible schema means no upfront definition of tables, rows, columns or their types; each row can carry arbitrary values. To achieve this, NoSQL databases like MongoDB store data in JSON (actually in its binary form, called BSON). A JSON document is a series of key-value pairs. You can nest key-value pairs within other key-value pairs to form hierarchical structures or arrays. This is generally referred to as a document structure.
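For example, a single document might look like the following (an illustrative document I made up, showing both nesting and an array):

{
  "name": "Jane",
  "address": { "city": "San Jose", "zip": "95141" },
  "orders": [
    { "item": "disk", "qty": 2 },
    { "item": "cpu",  "qty": 1 }
  ]
}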

For each NoSQL collection, we create a table with a BSON data type. BSON is Binary JSON (http://bsonspec.org/); JSON is JavaScript Object Notation (http://www.json.org/). All the MongoDB APIs exchange information with the server using BSON. When the client sends data as BSON, it's stored as-is in BSON.
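To make that concrete, here's a minimal sketch of the kind of table involved; the layout and names are illustrative, not the exact schema Informix generates for a collection:

-- Illustrative sketch only: roughly what backs a collection named
-- "customers". The real table has its own housekeeping columns; the
-- point is that each MongoDB document lands as-is in a BSON column.
CREATE TABLE customers (
    id   BIGSERIAL,   -- illustrative surrogate key
    data BSON         -- one document per row, stored as sent
);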

NoSQL pays more attention to application flexibility and agile development than to storage efficiency, so for now, the additional space for the key in each key-value pair is an accepted cost. Eventually, all databases will look at making JSON/BSON storage more efficient. Within Informix, you can use compression to get space savings.

The moment you have this kind of access from the MongoDB API, you can exploit relational database features like transactions, views, joins, grouping, OLAP window functions, stored procedures, and so on.

In this case, if a JSON query references a non-existent column, you'll get an error. The intent is not simply to extend the existing relational schema, but to make existing enterprise data available to new APIs seamlessly.

Currently, you'll have to use expressions and dotted notation to extract the specific key-value pairs:

SELECT bson_value_int(jc1.data, 'x'),
       bson_value_lvarchar(jc1.data, 'y'),
       bson_value_int(jc1.data, 'z'),
       bson_value_int(jc2.data, 'c1'),
       bson_value_lvarchar(jc2.data, 'c2')
FROM w jc1, v jc2
WHERE bson_value_int(jc1.data, 'x') = bson_value_int(jc2.data, 'c1');

You can also create views on top of these to make access much simpler for application developers:

create view vwjc(jc1x, jc1y, jc1z, jc2c1, jc2c2) as
SELECT bson_value_int(jc1.data, 'x'),
       bson_value_lvarchar(jc1.data, 'y'),
       bson_value_int(jc1.data, 'z'),
       bson_value_int(jc2.data, 'c1'),
       bson_value_lvarchar(jc2.data, 'c2')
FROM w jc1, v jc2
WHERE bson_value_int(jc1.data, 'x') = bson_value_int(jc2.data, 'c1');
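With the view in place, an application can treat the JSON data as plain relational columns; a hypothetical query against it:

-- The bson_value_* extraction is hidden inside the view definition,
-- so the application just sees ordinary columns.
SELECT jc1x, jc2c2
FROM vwjc
WHERE jc1z > 100;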

Summary:

You can model using relational or NoSQL concepts within the same database and access the data from either SQL or the MongoDB API, without replicating the data or running ETL. Since you have just one copy of the data, you'll always be accessing a consistent copy.

Applications generally look to a NoSQL database for one of the following reasons:

1. Flexible schema, or schema on read: the data and its types cannot be predetermined. In Informix NoSQL, this is accomplished via the JSON and BSON types, added natively into the database, just like LVARCHAR. Not only are the column names (here, the keys of the key-value pairs) unknown up front, their types are unknown as well. And it's not enough to be able to store such data; you need to index and query it too (see the index sketch after this list).

2. Scaling out for increased data capacity, lower latency and better performance.

3. New types of data modelling, like graphs.
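As a sketch of the indexing point in item 1, here's a functional index built on the same bson_value_int() expression used in the queries above; the index name is made up, and creating indexes through the MongoDB API (ensureIndex()) is the other route, so check the Informix documentation for the recommended approach:

-- Sketch: index the key 'x' inside the BSON column of table w so that
-- predicates on bson_value_int(data, 'x') can use the index instead of
-- scanning every document.
CREATE INDEX ix_w_x ON w (bson_value_int(data, 'x'));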

While NoSQL generally means "Not Only SQL", if you attend any of the NoSQL conferences and meetups, people more often mean "No SQL". Recently, Michael Stonebraker made the point on SE Radio (http://www.se-radio.net/2013/12/episode-199-michael-stonebraker/) that the most popular NoSQL databases, like MongoDB and Cassandra, have query languages very similar to SQL -- they're "Not Yet SQL". And while Hadoop went through a phase of custom query languages, the majority of vendors now provide a SQL-like interface to Hadoop.

In the case of Informix, we started with SQL, then added a flexible schema with native JSON support, a MongoDB-compatible query language (in fact, Informix NoSQL clients use the standard MongoDB drivers), and scale-out via range- and hash-based sharding.

Once we added that, the question was: how would enterprises with terabytes of data in relational form, and hundreds of applications running on that data, exploit the innovations in the API and the technology advantages?

When it comes to data and its infrastructure, evolution seems to work much better than revolution.

Replicating the data from one form to another just won't work in most cases, due to cost and consistency issues. So we've added support for the MongoDB APIs to access SQL (relational) data and features (joins, stored procedures, views), and enabled SQL to access JSON data.

Tomorrow, Tuesday Dec 17th at 11:30 EST, John Miller will be talking about these features. Register now.

Title: Query Acceleration for business using Informix Warehouse Accelerator

Description:

The Informix Warehouse Accelerator is a breakthrough technology that changes how businesses will view and deploy analytics. Not only does it promise several orders of magnitude difference in performance for business queries, it eliminates the need for the query tuning, data management, and cube-building activities normally associated with enterprises seeking acceptable performance in analytics. Together with the Informix Dynamic Server, it offers a single environment where traditional OLTP and current OLAP processing can be combined in a cost-effective manner. This IBM Redbooks publication provides a comprehensive understanding of how to configure and deploy this hybrid environment. It includes data mart design, data mart loading and incremental refresh, and proper execution of queries. It also includes discussion of a BI tool, for example Cognos, to be used in conjunction with IWA.

Audience: DBAs, system support staff, and users who are interested in the products.

--- Automated SQL tracing + identify I/O-bound slow queries as candidates for IWA, based on a workload (SQL file) given by the customer, or based on SQL tracing during ad hoc or OLAP/BI/DSS processing (does not exist)

10.3. Quick Start guide

-- Create a virtual image, or make a real/physical environment ready for the IWA PoC

--- Download a ready-to-use TPC-E or TPC-DS demo database with step-by-step instructions + ready-to-run queries

--- Steps to install and set up an on-site or virtual environment, from the Informix Virtual Appliance + trial edition of IWA -- quick start guide

10.4. Deploy a technical PoC and deliver results

-- Benchmark templates, ready to use, to show performance results and deliver a report with the results

-- FAQ on how to rewrite different queries (e.g., correlated sub-queries, unsupported expressions, etc.; identify which queries are not supported or would require a rewrite)

Edward Tufte will discuss seeing, reasoning, and producing in high science and high art. His current project “The Thinking Eye” suggests self-aware strategies for improving the interface where the real world meets the human eye-brain system.

Experience things first hand -- or rather, "first eyes": with your own eyes, not photographs, not movies. That's when you see data collection bias and sampling bias. Don't just depend on aggregated reports.

Your thinking starts at your retina. That data is transferred to the brain at 20 MB/sec. Since there is a ton of data coming into the brain, the brain continuously tries to ignore and conclude! Taking in data and concluding too quickly is one of humanity's big sins.

When you see things, put all of your focus on what you're trying to see and experience. His summons: shut up and look. Talking and listening take a lot of processing power. Next time you're in a museum, be silent. Look -- again and again. Practice intense seeing.

Look at real things -- as much as possible -- and not their representations.
Tip: when you search for anything on Google, use image search instead of the default document search. You see the work instead of words.
Tufte showed the results for Feynman diagrams (and a couple of other things I don't remember).
Feynman diagrams: http://bit.ly/10b0SPX

I tried some... It's a lot more stimulating to see the results. This will change all the assumptions about search engine optimization:

While showing the results of the image search, he pointed out that the "space" between the images takes up about 24% of the available area! The professor showed a re-rendered page -- more images, or bigger images, in the same space. Use space wisely -- show more, show bigger.

...

When you write or organize, focus on verbs not nouns.

Design mimics hierarchy -- see the internet proposal doc by Tim Berners-Lee.
The internet flattened organizations and society. Now you can send a message to, and get a message from, anyone!
....

Big data has been very useful and successful in aiding the development of the natural sciences over the last 400 years -- physics, chemistry, astronomy, cosmology, geology, biology and more. Those trying to use big data for social science should have some modesty. Just having a bigger pile won't help. Focus on excellence -- elegance takes care of herself. Long-horizon seeing is hard, really hard. The only way to progress is by double-blind studies -- collecting data after the fact will lead to wrong conclusions, aka stylized facts!

It took 35 years to prove conclusively that HRT causes cancer. For PSA, the NNT (Number Needed to Treat) is 48; i.e., you have to test 48 patients to improve the health of one person.

Once you deploy and load the data mart onto Informix Warehouse Accelerator, all your dynamic queries will get accelerated. The phrase "dynamic queries" needs a little bit of explanation. Dynamic queries include the following:

Straightforward, so far. The unsupported statements are the static statements within SPL stored procedures. These statements are prepared once, and the same plan is reused by Informix for subsequent invocations of the procedure/statement. Only when there is a change of table schema or permissions, or newer statistics, will Informix recompile the statements referencing the affected tables.

These static statements are not evaluated for IWA acceleration and always execute locally on Informix.
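To illustrate the distinction, here's a hypothetical SPL procedure (the sales table and amount column are made up). The SELECT inside it is a static statement: its plan is built when the procedure is compiled and reused on every call, so it always runs locally. The identical SELECT prepared and sent by an application at run time is a dynamic statement and is evaluated for IWA offload:

-- Hypothetical procedure; 'sales' and 'amount' are illustrative names.
CREATE PROCEDURE total_sales()
RETURNING DECIMAL;
    DEFINE t DECIMAL;

    -- Static statement: compiled with the procedure, never offloaded to IWA.
    SELECT SUM(amount) INTO t FROM sales;

    RETURN t;
END PROCEDURE;

-- The same statement issued directly by the application is dynamic and
-- is evaluated for acceleration:
-- SELECT SUM(amount) FROM sales;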

Summary, before we look at the examples:

All forms of dynamic statements are evaluated for acceleration and are accelerated when possible.

And, here is the query plan for the dynamic statement within this stored procedure.

The query statistics section of this plan shows only one iterator (dwa), indicating that the query was sent over to IWA and the results were received through the dwa iterator. On 11.70, this iterator is shown as "remote", but it does the same thing. There is one problem (bug) here: we're not printing the SQL sent over to IWA. When you execute the query directly (outside of the procedure), the whole plan, including the SQL, gets printed. The fix for this is coming soon.
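If you want to verify this yourself, the standard explain facility shows which iterator handled each statement; a minimal sketch, reusing the hypothetical procedure above (the plan is written to the sqexplain.out file):

SET EXPLAIN ON;
EXECUTE PROCEDURE total_sales();  -- static SELECT: plan shows local iterators only
SELECT SUM(amount) FROM sales;    -- dynamic: plan shows the dwa/remote iterator when offloaded
SET EXPLAIN OFF;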

At IIUG, on Monday and Tuesday, we'll have two sessions on What's New in IBM Informix 12.10?, giving an overview of the 12.10 features as well as pointing to other sessions in various domains. In the remote chance that you cannot attend that talk :-), here is the IIUG session grid with my recommendations for Informix 12.10 warehouse features.

A couple of years ago, at IOD, Nestle's CIO said enterprises are moving from a "reporting model" to a "tooling model".

What does that mean? Executives are no longer satisfied with simply using canned reports and basing decisions on them. They need to drill down and interact with the data to get the right data to make the right decision in real time. While this trend has been going on for some years, it took incredible investment from the CIO and the organization to get there.

Pronto has a complete ERP solution built on the Informix server, with Cognos BI tightly integrated into the same solution. From asset management to analytics, distribution to data intelligence, financials to forecasting -- it's all in there.

With this solution, not only do they run the business, they help analysts have a conversation with the real data in real time.

This helps SMBs and enterprises "find their moment"!

At the IOD Conference in Las Vegas, on Wednesday, Oct 24th, 2:30 to 3:30 PM, Chad Gates, Senior Development Manager at Pronto, will be talking about this solution and how they've integrated the stack. It should be very interesting.

Alan Turing's centenary celebration conference over the last couple of days in San Francisco was a textbook conference. The folks who invented modern computer science and wrote its textbooks were all there -- Kahan to Knuth, Cerf to Wirth, Ken Thompson and many, many more. The lectures were webcast and will hopefully be available for viewing later on. On Friday I was at the conference and listened to all the talks. It was a pretty amazing gathering, with 32 Turing award winners and talks and panel sessions by many of them. Unfortunately, I had to miss Saturday's events. Photos from this event are at: http://on.fb.me/LyRqih and http://bit.ly/Tu100p

Turing and other tidbits I learned on Friday:

1. It's estimated that Turing and his team's effort in decoding German codes shortened World War II by at least 6 months, maybe up to 2 years.

2. He was a long-distance runner. The loneliness of long-distance running must have given him ample time to think. He would have liked the ultra-running crowd!

3. Raj Reddy is the only Turing award winner of Indian origin. He has been teaching at CMU longer than I've been living. His work on speech recognition and AI is used by systems like Siri. He's now working on an intelligent system that can analyze the world's complete information and give instant expertise... the next generation of Watson?

4. The classic Turing test for AI: if you can exchange messages with a machine and can't distinguish it from a human, you have artificial intelligence. Jim Gray revised this to: if you can't distinguish the computer's vision, hearing and talking from a human being's, you have artificial intelligence.

5. The takeaway from listening to Fernando Corbató and Ken Thompson, who worked on MULTICS and Unix: be daring, and use Moore's law as your friend.

-- Fernando said that during MULTICS development, they enforced a rule: every developer should pre-announce what they're building!

6. It seems there is a lot of interest in creating algorithms and systems to do this. Raj said the release of Siri is an important step toward humans having a talking automated assistant. A second observation: recent "intelligent" systems -- Deep Blue, Siri, Watson, Google Translate -- use brute force rather than deductive reasoning from the base input. Maybe that's the way to solve these problems. The observation is rather interesting. This year's Turing award winner, Judea Pearl, has advocated a statistical approach to AI. I'll revisit this in a later blog.

7. Yesterday's HOW becomes today's WHAT. Going back to the personal productivity example: until recently, we wanted to-do lists, calendars and instant messages within the phone. Siri has redefined this as a WHAT problem -- you want to keep an updated to-do list, keep your appointments and communicate quickly. Finally, the systems folks have to move from HOW to WHAT. Again, another topic worth thinking and writing more about.

8. Lambda calculus played an important role in the development of computer science. Turing's universal machine got more acceptance because of its simplicity and because it was developed from first principles.

9. William Kahan talked about trying to test software to avoid software errors, and how you cannot guarantee perfect software through testing. But systems can be designed to anticipate classes of issues and to handle them. I'll revisit this in a separate blog.

10. Go back and read the original papers -- not rewritten books or papers. Original papers tend to build the ideas from first principles and are usually better written.

On that note, go watch the webcasts themselves on the ACM site (hopefully they'll be available soon), and read Turing's papers... A couple of important ones:

1. On Computable Numbers, with an Application to the Entscheidungsproblem.


With Informix 11.70.FC5, you can design, deploy and use Informix Warehouse Accelerator (IWA) from any of the Informix high availability server types -- primary, HDR secondary, SDS, RSS. It's simple to configure and use. The slide show below quickly gets you up to speed. Of course, the Info Center has further information: http://ibm.co/LiGAQ5

The few slides below give you a straightforward explanation of the two main use cases.
FYI: you can download the PowerPoint version, with animation, at: http://slidesha.re/KnfTYK
I've added commentary for each slide below.

Slide 1: This is the IWA architecture and the steps to deploy and use IWA up to 11.70.FC4. IWA takes a snapshot of the data from Informix and runs queries on that data. The transactions and loads happening on Informix won't immediately change the data on IWA. To do that, you have to refresh the data using either the studio or the command line tool. When you refresh in this way, you reload the entire data set, so the time for a complete refresh is directly correlated with the size of your data set.

Informix 11.70.FC5 alleviates this issue by allowing you to refresh ONLY the changed partitions of the fact table.

Slide 2: The Sales table is the fact table in this mart. After you've set up the data mart in IWA, your load jobs will load data into one of the partitions or add new partitions to the fact table. In this case, you simply do the following (as shown in the sketch below):

1. Execute the dropPartMart() procedure for each previously existing partition that was modified.
2. Execute the loadPartMart() procedure for each modified and new partition.

You do have to know, or keep track of, which partitions in your fact table were modified since the last refresh and which new partitions were added.
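In SQL, the refresh of one modified partition looks roughly like this; the accelerator, mart, table and partition names are all invented, and the argument lists are simplified, so check the IWA documentation for the exact signatures:

-- Illustrative only: drop and reload one modified fact-table partition.
EXECUTE PROCEDURE dropPartMart('myacc', 'salesmart', 'sales', 'part_w07');
EXECUTE PROCEDURE loadPartMart('myacc', 'salesmart', 'sales', 'part_w07');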

Slides 3, 4, 5: Time-cyclic data management enables you to limit the amount of data you keep in your data marts. If your business requires a 3-week window, then at the start of every week you roll off (detach) the oldest week and attach the data for the latest week.

Slide 6: This slide is better seen with animation in PowerPoint. In step 2, execute dropPartMart() to drop the IWA data for the partition you're about to detach, and then detach the partition. After you attach the new partition, issue loadPartMart() to refresh the data to IWA (the whole sequence is sketched below).
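Putting the steps together, a weekly roll-off might look like this sketch; the names are invented, the dropPartMart()/loadPartMart() argument lists are simplified as before, and the exact ALTER FRAGMENT clauses depend on how the fact table is fragmented:

-- 1. Drop the IWA data for the partition about to be detached.
EXECUTE PROCEDURE dropPartMart('myacc', 'salesmart', 'sales', 'week_01');

-- 2. Detach the oldest week from the fact table into its own table.
ALTER FRAGMENT ON TABLE sales DETACH PARTITION week_01 sales_week_01;

-- 3. Attach the table holding the new week's data
--    (ALTER FRAGMENT ... ATTACH; the clause depends on the fragmentation scheme).

-- 4. Refresh IWA with the newly attached partition.
EXECUTE PROCEDURE loadPartMart('myacc', 'salesmart', 'sales', 'week_04');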

When you have large data marts, partition-based refresh enables you to refresh the data quickly and resume your analysis. Because you can refresh the data quickly, you can choose to refresh it often, enabling analysis with the latest data.

Lester Knutsen did extensive testing of IWA before the release and gave us very valuable input. Now Lester and his team have created a new benchmark + demo.

We look forward to seeing you at the webcast.

Announcement from Lester Knutsen:

We are hosting a webcast to demo the Informix Warehouse Accelerator on February 28th at 2:00 PM EST. I have been using it for over a year and I am continually shocked at how fast it is. In one set of benchmarks, it ran 9 hours of queries in 14 minutes. Please join Mike Walker, Art Kagel, and me for a webcast that will demonstrate our current benchmarks with this exciting new database technology. We will demonstrate ad-hoc queries on a bookstore database with 250 million customers and over 400 million records in the fact table.

Informix Warehouse Accelerator Demo Webcast

1. Benchmarks - Fast Performance Demo
2. How to Set Up the Accelerator
3. How the Accelerator Works
4. Smart Mart Demo - How to Automatically Build a Data Mart