Credits: This tutorial is based on the original
Datomic Seattle tutorial (there is also a new tutorial),
and some text passages have been quoted as-is or slightly modified to describe
how Molecule works.

After setting up the database and populating it with data, we can
start making queries. We make queries by building “molecules”, which are
chains of attributes put together with the builder pattern. We can imagine
this as a 3-dimensional data structure of atoms bound together in various
patterns to build molecules…

Molecule builder pattern

The first thing you do with Molecule is to define the namespaces and attributes
of your domain in a schema definition trait:

The name field defines an attribute of type String with cardinality one. Adding the
fulltextSearch option tells Datomic that we want to be able to run fulltext
searches on the values of this attribute.

After defining the schema like this, we run sbt compile and Molecule will generate some
boilerplate traits that allow us to build molecules of our attributes:

val nameUrls = m(Community.name.url).get

Since the m method is implicit, we can usually just write:

val nameUrls = Community.name.url.get

If you look at the generated namespace code, you’ll see that it is a
little more complex behind the scenes. That’s because we want our IDE
to be able to infer the type of each attribute. If we, for instance, had
an age attribute of type Int, the IDE could infer the return type of
calling the get method on a molecule: a List of name/age tuples of
type List[(String, Int)]:

val nameAges: List[(String, Int)] = Community.name.age.get

Molecule lets us omit the values of an attribute from the result set
by adding an underscore to the attribute name:

val names: List[String] = Community.name.age_.get

This is handy when we want to find only entities that have an age, but
don’t need the age value itself returned.
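To make the underscore semantics concrete, here is a plain-Scala model with no Molecule or Datomic involved. It assumes a hypothetical Community case class in which a missing fact is represented as None. Community.name.age corresponds to requiring and returning both values. Community.name.age_ still requires an age but returns only the name.

```scala
object TacitSketch {
  // Hypothetical entity: None means no fact was asserted for that attribute
  final case class Community(name: Option[String], age: Option[Int])

  val communities: List[Community] = List(
    Community(Some("Alki"), Some(12)),
    Community(Some("Ballard"), None), // no age asserted
    Community(Some("Belltown"), Some(7))
  )

  // Like Community.name.age: both attributes required, both returned
  val nameAges: List[(String, Int)] =
    communities.collect { case Community(Some(n), Some(a)) => (n, a) }

  // Like Community.name.age_: age must exist, but its value is dropped
  val names: List[String] =
    communities.collect { case Community(Some(n), Some(_)) => n }

  def main(args: Array[String]): Unit = {
    println(nameAges) // List((Alki,12), (Belltown,7))
    println(names)    // List(Alki, Belltown)
  }
}
```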

To find communities, we can build a molecule that matches entities with a Community name:

val communities = m(Community.name)

With this molecule at hand we can get the community names:

communities.get === // List of community names...

Or we can check how many communities we have:

communities.get.size === 150

If we want the entity ids of our communities, we can add the generic attribute e to our molecule.
We might not be interested in the names, but we want to make sure that we find entities that have a name,
so we add the name attribute with an underscore (to omit it from the result set):

After defining a molecule like Community.name, we can call the get
method on it to retrieve the values that match it. When there’s only one
attribute defined in the molecule, we get back a list of that attribute’s
values.

Notice that we get different communities. The order of returned values is
not guaranteed, so the first 3 values can vary, as we see here, even though
the molecules/queries are similar.

In most of our examples we supply static data like “twitter”, but even
though our molecules are created at compile time, we can also supply
data as variables, as we would with user input from forms.
So we could just as well write the following and get the same result.

The sample data model includes three main entity types (communities,
neighborhoods and districts) that are related to each other through references.
Molecule lets you traverse those references by going from one namespace
to the next. Let’s find communities in the north-eastern region:

When you apply constant values to molecules, the values become part of
the resulting query string, which is cached by Datomic. If the string
content keeps varying, the cache is not effective. To take advantage of
query caching, it is recommended to make parameterized queries that can
be cached once and reused with varying input parameters.
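Here is a rough model of the caching argument. A cache keyed on query text gets one entry per distinct constant. A parameterized query shares a single entry across all inputs. The query strings below are simplified, made-up Datalog-like text, not the exact queries Molecule generates. The mutable map is a stand-in for Datomic's internal cache.

```scala
import scala.collection.mutable

object QueryCacheSketch {
  // Constant applied to the molecule: the value is baked into the query text (simplified, made up)
  def constantQuery(tpe: String): String =
    s"[:find ?name :where [?e :community/name ?name] [?e :community/type $tpe]]"

  // Input molecule: one query text, the value is bound at run time via ?type
  val inputQuery: String =
    "[:find ?name :in $ ?type :where [?e :community/name ?name] [?e :community/type ?type]]"

  // Runs the queries through a fresh cache keyed by query text and reports its size
  def cacheEntriesAfter(queries: List[String]): Int = {
    val cache = mutable.Map.empty[String, String]
    queries.foreach(q => cache.getOrElseUpdate(q, s"parsed: $q"))
    cache.size
  }

  val inputs: List[String] = List("twitter", "facebook_page", "blog")
  val constantEntries: Int = cacheEntriesAfter(inputs.map(constantQuery)) // one entry per value
  val inputEntries: Int = cacheEntriesAfter(inputs.map(_ => inputQuery))  // one shared entry

  def main(args: Array[String]): Unit =
    println(s"constant: $constantEntries, input: $inputEntries") // constant: 3, input: 1
}
```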

Single input value for an attribute

Instead of applying the constant value “twitter” to a molecule, as in
Community.type("twitter"), we can use the ? input placeholder to create
an “input molecule” that waits for an input value.

The order of arguments in the logical AND expression corresponds
to the order of the input placeholders in the input molecule, so
“email_list” corresponds to type_(?) and “community” corresponds to
orgtype_(?).

Arguments in expressions are also type-checked against the expected
types of the corresponding attributes. If we were to pass the expression
“email_list and 42”, our IDE would flag that the orgtype attribute
doesn’t expect an Int as the second argument. This helps us avoid
populating our database with unexpected data.

Or we can pass a list of arguments. IMPORTANT: note how the semantics
change here. With the single-input molecule above, a list of values had
OR semantics. When we have multiple inputs, the values applied to the
different input placeholders are combined with AND semantics!
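The difference between the two input semantics can be sketched as plain Scala predicates. This illustrates the logic only. It assumes a simplified, hypothetical Community with type and orgtype as strings (type is named tpe because type is a Scala keyword). It is not how Molecule builds or runs the query.

```scala
object InputSemanticsSketch {
  final case class Community(name: String, tpe: String, orgtype: String)

  val communities: List[Community] = List(
    Community("A", "email_list", "community"),
    Community("B", "twitter", "personal"),
    Community("C", "email_list", "personal"),
    Community("D", "website", "community")
  )

  // One placeholder, e.g. type_(?): a list of values means v1 OR v2 OR ...
  def singleInput(types: List[String]): List[String] =
    communities.filter(c => types.contains(c.tpe)).map(_.name)

  // Two placeholders, e.g. type_(?).orgtype_(?): the values in a tuple are ANDed,
  // and several tuples are ORed: (t1 AND o1) OR (t2 AND o2) ...
  def twoInputs(pairs: List[(String, String)]): List[String] =
    communities
      .filter(c => pairs.exists { case (t, o) => c.tpe == t && c.orgtype == o })
      .map(_.name)

  def main(args: Array[String]): Unit = {
    println(singleInput(List("email_list", "twitter")))   // List(A, B, C)
    println(twoInputs(List(("email_list", "community")))) // List(A)
  }
}
```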

Datomic lets you invoke functions in queries. Molecule uses this to
apply comparison operations on attribute values. Here we can, for
instance, find communities whose names come before “C” in
alphabetical order.
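As a rough illustration of what a “before C” constraint selects, the sketch below filters sample names with Scala's String ordering. It assumes the query comparison behaves like Java/Scala compareTo. That ordering is lexicographic and case-sensitive, so “Capitol Hill” is excluded because “C” is a prefix of it. A lowercase name is also excluded because lowercase letters sort after every uppercase letter.

```scala
object ComparisonSketch {
  // Sample values only
  val names: List[String] = List("Alki", "Ballard", "Capitol Hill", "ballard blogs")

  // Plain lexicographic String ordering, as assumed for a "name < C" constraint
  val beforeC: List[String] = names.filter(_ < "C")

  def main(args: Array[String]): Unit =
    println(beforeC) // List(Alki, Ballard)
}
```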

Datomic supports fulltext searching. When you define a String
attribute, you can indicate whether it should be indexed for
fulltext search. For instance, Community name and category have
the fulltextSearch option defined in the Seattle schema. Let’s find
communities with “Wallingford” in the name.

Fulltext search on many-cardinality attributes

The category attribute can have several values, so when we do a
fulltext search on its values, we get back the set of its values
that match our search term. We can also combine fulltext search with other
constraints. Here we look for website communities with a category
containing the word “food”:
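Datomic's fulltext search is backed by a real text index with its own tokenization rules. The sketch below only approximates it with a naive, case-insensitive whole-word match. It shows how a many-cardinality attribute yields the subset of matching values, and how the search combines with a type constraint. The Community class and the data are hypothetical.

```scala
object FulltextSketch {
  final case class Community(name: String, tpe: String, category: Set[String])

  val communities: List[Community] = List(
    Community("Alki Food Co-op", "website", Set("food", "community")),
    Community("Wallingford Blog", "blog", Set("food news", "events")),
    Community("Ballard Eats", "website", Set("restaurants", "food reviews")),
    Community("Fremont News", "website", Set("news"))
  )

  // Naive stand-in for fulltext matching: some whitespace-separated word equals the search word
  def matches(text: String, word: String): Boolean =
    text.toLowerCase.split("\\s+").contains(word.toLowerCase)

  // Single-value attribute: names containing "Wallingford"
  val wallingford: List[String] =
    communities.filter(c => matches(c.name, "Wallingford")).map(_.name)

  // Many-cardinality attribute plus another constraint: website communities
  // with the subset of their categories that mention "food"
  val websiteFood: List[(String, Set[String])] =
    communities
      .filter(_.tpe == "website")
      .map(c => (c.name, c.category.filter(matches(_, "food"))))
      .filter(_._2.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(wallingford)
    println(websiteFood)
  }
}
```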

Datomic rules are named groups of Datomic clauses that can be
plugged into Datomic queries. As a Molecule user you don’t need to
know about rules since Molecule automatically translates your logic
to Datomic rules.

We can, for instance, find social media communities with a
logical OR expression:

Note how this syntax for the ((a OR b) AND (c OR d)) expression
is different from the syntax we had earlier in the section
“Multiple tuples of input values for multiple attributes” where
we had a ((a AND b) OR (c AND d)) expression.
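A single concrete entity shows that the two shapes are not equivalent. In this plain-Scala sketch (hypothetical data, type named tpe), a twitter community with orgtype “personal” fails ((a AND b) OR (c AND d)). The same community satisfies ((a OR b) AND (c OR d)).

```scala
object BooleanShapesSketch {
  final case class Community(tpe: String, orgtype: String)

  // ((a AND b) OR (c AND d)): tuples of input values for two attributes
  def andOfPairs(c: Community): Boolean =
    (c.tpe == "twitter" && c.orgtype == "community") ||
      (c.tpe == "facebook_page" && c.orgtype == "personal")

  // ((a OR b) AND (c OR d)): an OR expression on each attribute
  def orPerAttribute(c: Community): Boolean =
    (c.tpe == "twitter" || c.tpe == "facebook_page") &&
      (c.orgtype == "community" || c.orgtype == "personal")

  // This combination separates the two shapes
  val probe: Community = Community("twitter", "personal")

  def main(args: Array[String]): Unit =
    println(s"${andOfPairs(probe)} ${orPerAttribute(probe)}") // false true
}
```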

All of the query results shown in the previous two sections were
based on the initial seed data we loaded into our database. The
data hasn’t changed since then. In this section we’ll load some
more data, and explain how to work with database values from
different moments in time.

Time is built in

One of the key concepts in Datomic is that new facts don’t replace
old facts. Instead, by default, the system keeps track of all the
facts, forever. This makes it possible to look at the database as
it was at a certain point in time, or at the changes since a certain
point in time.

When you submit a transaction to a database, Datomic keeps track
of the entities, attributes and values you add or retract. It also
keeps track of the transaction itself. Transactions are entities
in their own right, and you can write queries to find them.
The system associates one attribute with each transaction entity,
Db.txInstant, which records the time the transaction was processed.

Molecule has a Db namespace with a txInstant attribute that
we can use to query for the transaction instants (represented as
java.util.Date instances) that have been recorded. We’ve only
executed two transactions, but the system executed a few earlier
as part of its bootstrapping process. We know, though, that
our two are the most recent. The code below uses a Db.txInstant
molecule to retrieve transaction times, sort them into reverse
chronological order, and store the most recent two as dataTxDate
and schemaTxDate, when we added our data and our schema, respectively.
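The sorting and picking step can be sketched in plain Scala. The list of java.util.Date values is a hypothetical stand-in for what a Db.txInstant query would return, and the timestamps are made up. The Molecule query itself is not shown here.

```scala
import java.util.Date

object TxDatesSketch {
  // Made-up transaction instants in no particular order
  val txInstants: List[Date] = List(
    new Date(5000L), // schema transaction
    new Date(1000L), // bootstrap
    new Date(9000L), // seed data transaction
    new Date(3000L), // bootstrap
    new Date(2000L)  // bootstrap
  )

  // Reverse chronological order: most recent first
  val newestFirst: List[Date] = txInstants.sortWith((a, b) => a.after(b))

  // The two most recent transactions are our data and schema transactions
  val (dataTxDate, schemaTxDate) = newestFirst match {
    case data :: schema :: _ => (data, schema)
    case _                   => sys.error("expected at least two transactions")
  }

  def main(args: Array[String]): Unit =
    println(s"data: ${dataTxDate.getTime}, schema: ${schemaTxDate.getTime}") // data: 9000, schema: 5000
}
```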

Revisiting the past - getAsOf(PastDate)

Once we have the relevant transaction times, we can look at the
database as of that point in time. To do this, we call the molecule
method getAsOf, passing in the Date we’re interested in. getAsOf
runs the molecule against a database value that is
“rewound” back to the requested date.

An example will help make this clear. The code below gets the
value of the database as of our schema transaction. Then it
runs our very first query, which retrieves entities representing
communities, and prints the size of the results. Because we’re
using a database value from before we ran the transaction to
load seed data, the size is 0.

If we do the same thing using the date of our seed data
transaction, the query returns 150 results, because as of
that moment, the seed data is there.

communities.getAsOf(dataTxDate).size === 150

Changes since a date - getSince(compareDate)

The getAsOf method allows us to look at a database value
containing data changes up to a specific point in time.
There is another method getSince that allows us to look at
a database value containing data changes since a specific
point in time.

The code below gets the value of the database since our
schema transaction and counts the number of communities.
Because we’re using a database value containing changes
made since we ran the transaction to load our schema -
including the changes made when we loaded our seed data -
the size is 150.

communities.getSince(schemaTxDate).size === 150

If we do the same thing using the date of our seed data
transaction, the query returns 0 results, because we haven’t
added any communities since that time.

communities.getSince(dataTxDate).size === 0

While we passed specific transaction dates to getAsOf
and getSince, you can pass any date. The system finds the
closest relevant transaction and uses that as the basis for
filtering.
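The filtering behind getAsOf and getSince can be modeled with a plain list of facts. Each fact is stamped with the time of the transaction that asserted it. This is a simplified sketch with assertions only, no retractions, and numeric timestamps instead of java.util.Date. It is not Datomic's implementation. Here asOf is inclusive of the given time and since is exclusive, and any time value works because filtering is just a comparison against transaction times.

```scala
object TimeFilterSketch {
  // A fact asserted in the transaction at time tx (simplified datom)
  final case class Fact(entity: Long, attr: String, value: String, tx: Long)

  val schemaTx = 100L
  val dataTx   = 200L

  // One schema fact (not a community) plus three seed communities
  val log: List[Fact] =
    Fact(1L, ":db/ident", ":community/name", schemaTx) ::
      (1L to 3L).toList.map(e => Fact(10L + e, ":community/name", s"community $e", dataTx))

  // As of t: facts from transactions at or before t
  def asOf(t: Long): List[Fact] = log.filter(_.tx <= t)

  // Since t: facts from transactions strictly after t
  def since(t: Long): List[Fact] = log.filter(_.tx > t)

  def communityCount(facts: List[Fact]): Int = facts.count(_.attr == ":community/name")

  def main(args: Array[String]): Unit = {
    println(communityCount(asOf(schemaTx)))  // 0
    println(communityCount(asOf(dataTx)))    // 3
    println(communityCount(since(schemaTx))) // 3
    println(communityCount(since(dataTx)))   // 0
  }
}
```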

Keeping track of data over time is a very powerful feature.
However, there may be some data you don’t want to keep old
versions of. You can control whether old versions are kept
on a per-attribute basis by adding noHistory to your
attribute definition when you create your schema. If you
choose not to keep history for a given attribute and you
look at a database as of a time before the most recent change
to a given entity’s value for that attribute, you will not
find any value for it.

Imagining the future - getWith(TestTxData)

Revisiting the past is a very powerful feature. It’s also
possible to imagine the future. The getAsOf and getSince
methods work by removing data from the current database value.
You can also add data to a database value, using the
Molecule method getWith.
The result is a database value that’s been modified without
submitting a transaction and changing the data stored
in the system. The modified database value can be used to
execute queries, allowing you to perform “what if”
calculations before committing to data changes.

When a getWith(TestTxData) database value goes out of scope,
it is simply garbage collected, so there is no state to tear
down, as is common with conventional database mockups.

We can explore this feature using a second seed data file
provided with the sample application,
“samples/seattle/seattle-data1.edn”. The code below reads
it into a list.

Once we have this new data transaction, we can build a
database value that includes it. To do that, we simply
get the current database value (or one as of or since a
point in time) and call getWith, passing in the
transaction data. getWith returns a molecule based on
the new value of the database after the new data is added.
If we execute our community counting query against it,
we get 258 results.

// test db
communities.getWith(newDataTx).get.size === 258

The actual data hasn’t changed yet, so if we query the
current database value, we still get 150 results. We won’t
see a change in the current database value until we submit
the new transaction. After that, querying the current
database value returns 258 results. Finally, if we get
another database value containing data since our first
seed data transaction ran and query for communities, we
get 108 results, the number added by the new data transaction.
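The key property of getWith is that the original database value stays untouched, and that follows directly from immutable values. The sketch below models a database value as an immutable list of facts with a hypothetical withTx method. Nothing is written anywhere, so a speculative value can simply be dropped. The small counts stand in for 150, 258 and 108, which follow the same arithmetic (258 - 150 = 108).

```scala
object WithSketch {
  final case class Fact(entity: Long, attr: String, value: String)

  // An immutable database value, modeled as a list of facts
  final case class Db(facts: List[Fact]) {
    // Like getWith: returns a new value with extra facts; this value is unchanged
    def withTx(txData: List[Fact]): Db = Db(facts ++ txData)
    def communityCount: Int = facts.count(_.attr == ":community/name")
  }

  val current: Db =
    Db((1L to 3L).toList.map(e => Fact(e, ":community/name", s"community $e")))

  val newDataTx: List[Fact] =
    (4L to 5L).toList.map(e => Fact(e, ":community/name", s"community $e"))

  // Speculative "what if" value; no transaction is submitted
  val speculative: Db = current.withTx(newDataTx)

  def main(args: Array[String]): Unit =
    println(s"current: ${current.communityCount}, with: ${speculative.communityCount}") // current: 3, with: 5
}
```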

Note how we can add values for referenced namespaces and multiple values for
cardinality-many attributes like category - all in one go!

Apart from the new Community entity, two more entities are also added. Since
neither “myNeighborhood” nor “myDistrict” exists, they are created so that
we can reference them.

In Datomic there is no requirement that we add a “complete”
set of namespace attributes to create an entity. For instance, we could add
a community only with Community.name("My community").save.

“Insert-molecule” + matching values

A more efficient way to add larger sets of data is to define an
“Insert-Molecule” that models the data structure we
want to insert into the database. Note how we call the insert method
to define it as an Insert-Molecule:

This approach gives us a clean way of populating a database
where we can supply raw data from any source easily as long
as we can format it as a list of tuples of values each matching
our template molecule.

In an SQL table we would “insert a null value” for such a column.
But with Molecule/Datomic we simply don’t assert any orgtype
value for that entity at all! In other words, there is no orgtype
fact to be asserted.
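The idea that a missing value simply produces no fact can be modeled in plain Scala. This sketch assumes rows are typed tuples and that an unknown orgtype is represented as None. That representation is an illustrative choice and not necessarily Molecule's input format. Each row becomes a set of facts for a fresh, made-up entity id.

```scala
object InsertSketch {
  final case class Fact(entity: Long, attr: String, value: String)

  // Rows for a hypothetical name/url/orgtype template
  val rows: List[(String, String, Option[String])] = List(
    ("Alki Community", "http://alki.example", Some("community")),
    ("Ballard Blog", "http://ballard.example", None) // orgtype unknown
  )

  // Each row becomes facts for a new entity; None yields no orgtype fact at all
  def toFacts(rows: List[(String, String, Option[String])]): List[Fact] =
    rows.zipWithIndex.flatMap { case ((name, url, orgtype), i) =>
      val e = 100L + i
      List(Fact(e, ":community/name", name), Fact(e, ":community/url", url)) ++
        orgtype.map(o => Fact(e, ":community/orgtype", o)).toList
    }

  val facts: List[Fact] = toFacts(rows)

  def main(args: Array[String]): Unit = facts.foreach(println)
}
```

Because rows is typed as List[(String, String, Option[String])], putting an Int in any position is rejected at compile time, which mirrors the type safety of the insert molecule.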

Type safety

In this example we have only inserted text strings. But all input
is type-checked against the selected attributes of the molecule, which
makes the insert operation type safe. We even infer the expected type,
so our IDE will complain if it finds, for instance, an Integer
somewhere in our input data:

To update data with Molecule, we first need the id of the entity
that we want to update.

val belltown = Community.e.name_("belltown").get.head

Then we can “replace” some attributes:

Community(belltown).name("belltown 2").url("url 2").update

What really happens is not a mutation of data, since no
data is ever deleted or overwritten in Datomic. Instead, the old/current value is
retracted and the new fact for the attribute is asserted. The new fact will
turn up when queried for. But if we go back in time, we can see
the previous value at that point in time. Many updates could
have been performed over time, and all previous values are stored.
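A minimal model of this “retract, then assert” behavior uses an append-only log of datoms. Replaying the log up to a given transaction yields the value in effect at that time. This is a simplified sketch with hypothetical names and numeric transaction ids. It is not Datomic's storage model, but it shows why the old name is still visible as of the earlier transaction.

```scala
object UpdateSketch {
  // added = true for an assertion, false for a retraction
  final case class Datom(e: Long, attr: String, value: String, tx: Long, added: Boolean)

  // Replays the log up to transaction `asOf` to get the values in effect then
  def currentValues(log: List[Datom], e: Long, attr: String, asOf: Long = Long.MaxValue): Set[String] =
    log
      .filter(d => d.e == e && d.attr == attr && d.tx <= asOf)
      .foldLeft(Set.empty[String])((acc, d) => if (d.added) acc + d.value else acc - d.value)

  // Update: retract whatever is current, then assert the new value in transaction tx
  def updateAttr(log: List[Datom], e: Long, attr: String, newValue: String, tx: Long): List[Datom] = {
    val retractions = currentValues(log, e, attr).toList.map(v => Datom(e, attr, v, tx, added = false))
    log ++ retractions ++ List(Datom(e, attr, newValue, tx, added = true))
  }

  val belltown = 42L
  val initial: List[Datom] = List(Datom(belltown, ":community/name", "belltown", 1L, added = true))
  val updated: List[Datom] = updateAttr(initial, belltown, ":community/name", "belltown 2", 2L)

  def main(args: Array[String]): Unit = {
    println(currentValues(updated, belltown, ":community/name"))            // Set(belltown 2)
    println(currentValues(updated, belltown, ":community/name", asOf = 1L)) // Set(belltown)
  }
}
```

Nothing is removed from the log, so the history stays queryable.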

Updating cardinality-many attributes

When updating cardinality-many attributes, we need to specify which
of the values we want to update:

Community(belltown).category("news" -> "Cool news").update

This syntax causes Molecule to first retract the old value “news”
and then assert/add the new value “Cool news”. Note that if the
before-value doesn’t exist, the new value will still be inserted,
so you may want to check the current value by querying for it first.
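The effect on a cardinality-many value set can be sketched with Scala's immutable Set. This is a conceptual model only: the old value is removed, which is a no-op if it is absent, and the new value is added regardless. That is why a missing before-value still results in the new value being inserted. The replaceAll helper applies several old -> new pairs in one go. Both function names are hypothetical.

```scala
object CardManySketch {
  // Retract oldValue (no-op if absent), then assert newValue
  def replace(current: Set[String], oldValue: String, newValue: String): Set[String] =
    current - oldValue + newValue

  // Several old -> new pairs applied in sequence
  def replaceAll(current: Set[String], pairs: (String, String)*): Set[String] =
    pairs.foldLeft(current) { case (acc, (o, n)) => replace(acc, o, n) }

  val categories: Set[String] = Set("news", "events")
  val updated: Set[String] = replace(categories, "news", "Cool news")
  val oldMissing: Set[String] = replace(categories, "sports", "Cool sports") // "sports" absent

  def main(args: Array[String]): Unit = {
    println(updated)
    println(oldMissing)
  }
}
```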

We can even update several values of a cardinality-many attribute in one go:

Retract values

When you update a molecule, you can apply an empty value apply()
or simply () after an attribute name to retract (“delete”) the
attribute’s value(s). We can mix updates and retractions:

Community(belltown).name("belltown 3").url().category().update

name gets the new value “belltown 3” and both the url and
category attributes have their values retracted.

There are a couple of important things to know about retracting data.
The first is that Datomic expects to know the value of the attribute being
retracted. When applying the empty value, Molecule therefore internally first
looks up the current value in order to be able to retract it.
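Why a lookup is needed can be sketched in plain Scala. Each retraction must name a concrete value, so “retract whatever is there” first reads the current values and then emits one retraction per value. The Map-based database and the Retraction type are hypothetical simplifications for illustration.

```scala
object RetractSketch {
  final case class Retraction(e: Long, attr: String, value: String)

  // Look up the current values, then emit one retraction per value (sorted for a stable order)
  def retractAll(current: Map[(Long, String), Set[String]], e: Long, attr: String): List[Retraction] =
    current.getOrElse((e, attr), Set.empty[String]).toList.sorted.map(v => Retraction(e, attr, v))

  val belltown = 42L
  val db: Map[(Long, String), Set[String]] = Map(
    (belltown, ":community/url") -> Set("url 2"),
    (belltown, ":community/category") -> Set("Cool news", "events")
  )

  def main(args: Array[String]): Unit =
    retractAll(db, belltown, ":community/category").foreach(println)
}
```

If the attribute has no value, the lookup is empty and no retraction is produced.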

The other thing to know is that, because we can access database
values as they existed at specific points in time, we can retrieve
retracted data by looking in the past. In other words, the data
is not gone as in a mutable database. If we want data to really be gone after we
retract it, we have to disable history for the specific attribute,
as described in Database setup.