Getting Started:
Querying with NodeJS

Node.js Version Setup

For the Node.js version, please download the source from GitHub by either
cloning the source code
repository or downloading the current zip of the master
branch.
The code for this chapter is in nodejs/Ch02-Schemas-and-Indexes. Be
sure to run npm install in this directory prior to running node
./app.js to run the code.

A Quick Note on Querying and Schemas

Schemas? Yes, we said that correctly: S-C-H-E-M-A-S. It’s not a dirty
word. Even in a key/value store, you will still have a logical database
schema of how all the data relates to other data. This can be as simple
as using the same key across multiple buckets for different types of
data to having fields in your data that are related by name. These
querying methods will introduce you to some ways of laying out your data
in Riak, along with how to query it back.

Denormalization

If you’re coming from a relational database, the easiest way to get your
application’s feet wet with NoSQL is to denormalize your data into
related chunks. For example, with a customer database, you might have
separate tables for customers, addresses, preferences, etc. In Riak,
you can denormalize all that associated data into a single object and
store it into a Customer bucket. You can keep pulling in associated
data until you hit one of the big denormalization walls:

Size Limits (objects greater than 1MB)

Shared/Referential Data (data that the object doesn’t “own”)

Differences in Access Patterns (objects that get read/written once vs.
often)

At one of these points we will have to split the model.

Same Keys, Different Buckets

The simplest way to split up data would be to use the same identity key
across different buckets. A good example of this would be a Customer
object, an Order object, and an OrderSummaries object that keeps
rolled up info about orders such as total, etc. Let’s put some data into
Riak so we can play with it.

While individual Customer and Order objects don’t change much (or
shouldn’t change), the “Order Summary” object will likely change often.
It will do double duty by acting as an index for all a customer’s
orders, and also holding some relevant data such as the order total,
etc. If we showed this information in our application often, it’s only
one extra request to get all the info.

If you’re coming from an SQL world, Secondary Indexes (2i) are a lot
like SQL indexes. They are a way to quickly look up objects based on a
secondary key, without scanning through the whole dataset. This makes it
very easy to find groups of related data by values, or even ranges of
values. To properly show this off, we will now add some more data to our
application, and add some secondary index entries at the same time.

As you may have noticed, ordinary key/value data is opaque to 2i, so we
have to add entries to the indexes at the application level. Now let’s
find all of Jane Appleseed’s processed orders, we’ll look up the orders
by searching the SalespersonId integer index for Jane’s id of 9000.

Jane processed orders 1 and 3. We used an “integer” index to reference
Jane’s ID, next let’s use a “binary” index. Now, let’s say that the VP
of Sales wants to know how many orders came in during October 2013. In
this case, we can exploit 2i’s range queries. Let’s search the
OrderDate binary index for entries between 2013-10-01 and
2013-10-31.