=pod
=head1 NAME
KiokuDB::Tutorial - Getting started with L<KiokuDB>
=head1 INSTALLATION
The easiest way to install L<KiokuDB> along with a number of backends is
L<Task::KiokuDB>.
L<KiokuDB> depends on L<Moose> and a few other modules out of the box, but no
specific storage module.
L<KiokuDB> is a frontend to several backends, much like L<DBI> uses DBDs to
connect to actual databases.
For development and testing you can use the L<KiokuDB::Backend::Hash> backend,
which is an in memory store, but for production use L<KiokuDB::Backend::BDB> or
L<KiokuDB::Backend::DBI> is recommended.
See below for instructions on getting L<KiokuDB::Backend::BDB> installed.
=head1 CREATING A DIRECTORY HANDLE
A KiokuDB directory is the main object through which all work is done.
The simplest directory that is ready for use can be created like this:
    my $dir = KiokuDB->new(
        backend => KiokuDB::Backend::Hash->new
    );
We will revisit other, more interesting backend configurations later in this
document, but for now this will do.
You can also use DSN strings to connect to the various backends:
    KiokuDB->connect("hash");
    KiokuDB->connect("dbi:SQLite:dbname=foo", create => 1);
    KiokuDB->connect("bdb:dir=foo", create => 1);
You can also use a configuration file:
    KiokuDB->connect("/path/to/my_db.yml");
Which is just a YAML file:
    ---
    # these are basically the arguments for 'new'
    backend:
      class: KiokuDB::Backend::DBI
      dsn: dbi:SQLite:dbname=/tmp/test.db
      create: 1
=head1 USING THE DBI BACKEND
During this tutorial we will be using the DBI backend for two reasons. The
first is L<DBI>'s ubiquity. The second is the possibility of easily looking
behind the scenes, to more clearly demonstrate what L<KiokuDB> is doing.
That said, the examples will work with all backends exactly the same.
First we create the directory handle, C<$dir>:
    my $dir = KiokuDB->connect(
        "dbi:SQLite:dbname=kiokudb_tutorial.db",
        create => 1, # this causes the tables to be created
    );
Note that if you are connecting with a username and password you need to
specify these as named arguments:
    my $dir = KiokuDB->connect(
        $dsn,
        user     => $user,
        password => $password,
    );
=head1 INSERTING OBJECTS
Let's start by defining a simple class using L<Moose>:
    package Person;
    use Moose;
    has name => (
        isa => "Str",
        is  => "rw",
    );
We can instantiate it:
    my $obj = Person->new( name => "Homer Simpson" );
and insert the object into the database as follows:
    my $scope = $dir->new_scope;
    my $homer_id = $dir->store($obj);
This is a very trivial use of L<KiokuDB>, but it illustrates a few important
things.
First, no schema is necessary. L<KiokuDB> uses L<Moose> to introspect your
object without needing to predefine anything like tables.
Second, every object in the database has an ID. If you don't choose an ID for
an object, L<KiokuDB> will assign a UUID instead.
This ID is like a primary key in a relational database.
You can also specify an ID instead of letting one be generated:
    $dir->store( homer => $obj );
Third, all L<KiokuDB> operations need to be performed within a B<scope>. The
scope is not really doing anything important in this simple example, but
becomes necessary when cycles and weak references are in use. We will look into
that in more detail later.
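The three points above can be tried without any database setup. The following
is a minimal sketch, assuming L<KiokuDB> and L<Moose> are installed; it uses
the in memory C<hash> DSN shown earlier, and the ID C<marge> is only an
illustration.

```perl
use strict;
use warnings;
use KiokuDB;

{
    package Person;
    use Moose;

    has name => ( isa => "Str", is => "rw" );
}

# in memory backend, nothing is written to disk
my $dir = KiokuDB->connect("hash");

# all KiokuDB operations happen inside a scope
my $scope = $dir->new_scope;

# no ID given: KiokuDB generates one and returns it
my $homer_id = $dir->store( Person->new( name => "Homer Simpson" ) );
print "generated ID: $homer_id\n";

# ID given explicitly: the object is stored under the key "marge"
$dir->store( marge => Person->new( name => "Marge Simpson" ) );
print "stored under 'marge': ", $dir->lookup("marge")->name, "\n";
```

Note that no table or schema definition appears anywhere in the script.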
=head1 LOADING OBJECTS
So now that Homer has been inserted into the database, we can fetch him out of
there using the ID we got from C<store>.
    my $homer = $dir->lookup($homer_id);
Assuming that C<$scope> and C<$obj> are still in scope, C<$homer> and C<$obj>
will actually be the same object:
    # this is true:
    refaddr($homer) == refaddr($obj)
This is because L<KiokuDB> tracks which objects are "live" in the
B<live object set> (L<KiokuDB::LiveObjects>).
If the object wasn't already in memory then L<KiokuDB> would have fetched it
from the backend instead.
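Both cases described above, the live instance and the fetch from the backend,
can be observed in one short script. This is a sketch under the same
assumptions as the rest of the tutorial (L<KiokuDB> and L<Moose> installed),
using the in memory C<hash> backend so that it leaves no files behind.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use KiokuDB;

{
    package Person;
    use Moose;

    has name => ( isa => "Str", is => "rw" );
}

my $dir = KiokuDB->connect("hash");

my ( $homer_id, $same );

{
    my $scope = $dir->new_scope;
    my $obj   = Person->new( name => "Homer Simpson" );
    $homer_id = $dir->store($obj);

    # $obj is still live, so lookup hands back the very same instance
    my $homer = $dir->lookup($homer_id);
    $same = refaddr($homer) == refaddr($obj);
    print $same ? "same instance\n" : "different instance\n";
}

# $obj and $scope are gone now, so this lookup has to inflate the object
# from the backend
my $name;
{
    my $scope = $dir->new_scope;
    $name = $dir->lookup($homer_id)->name;
    print "reloaded: $name\n";
}
```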
=head1 WHAT WAS STORED
Let's peek into the database:
    % sqlite3 kiokudb_tutorial.db
    SQLite version 3.4.0
    Enter ".help" for instructions
    sqlite>
The database schema has two tables, C<entries> and C<gin_index>:
    sqlite> .tables
    entries    gin_index
C<gin_index> is used for more complex queries, and we'll get back to it at the
end of the tutorial.
For now let's just have a closer look at C<entries>:
    sqlite> .schema entries
    CREATE TABLE entries (
      id varchar NOT NULL,
      data blob NOT NULL,
      class varchar,
      root boolean NOT NULL,
      tied char(1),
      PRIMARY KEY (id)
    );
The main columns are C<id> and C<data>. In L<KiokuDB> every object has an ID,
which serves as a primary key, and a BLOB of data associated with it.
Since the default serializer for the DBI backend is
L<KiokuDB::Backend::Serialize::JSON>, we can examine the data directly.
First let's set C<sqlite>'s output mode to C<line>. This is easier to read with
large columns:
    sqlite> .mode line
And select the data from the table:
    sqlite> select id, data from entries;
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     data = {"__CLASS__":"Person","data":{"name":"Homer Simpson"},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}
As you can see the C<name> attribute is stored under the C<data> key inside the
blob, as is the object's class.
The C<data> column contains all of the data necessary to recreate the object.
All the other columns are only for searches. Later on you'll also see how to
create user defined columns.
When using L<KiokuDB::Backend::BDB> the on-disk format is just a hash of C<id>
to C<data> with no additional columns.
=head1 OBJECT RELATIONSHIPS
Let's extend the C<Person> class to hold some more interesting data than just a
C<name>:
    package Person;
    has spouse => (
        isa      => "Person",
        is       => "rw",
        weak_ref => 1,
    );
This new C<spouse> attribute will hold a reference to another person object.
Let's first create and insert another object:
    my $marge_id = $dir->store(
        Person->new( name => "Marge Simpson" ),
    );
Now that we have both objects in the database, let's link them together:
    {
        my $scope = $dir->new_scope;
        my ( $marge, $homer ) = $dir->lookup( $marge_id, $homer_id );
        $marge->spouse($homer);
        $homer->spouse($marge);
        $dir->store( $marge, $homer );
    }
Now we have created a persistent B<object graph>, that is several objects which
point to each other.
The reason C<spouse> had the C<weak_ref> option was so that this circular
structure will not leak.
When the objects are updated in the database, L<KiokuDB> sees that their
C<spouse> attribute contains references, and this relationship will be encoded
using their unique ID in storage.
To load the graph, we can do something like this:
    {
        my $scope = $dir->new_scope;
        my $homer = $dir->lookup($homer_id);
        print $homer->spouse->name; # Marge Simpson
    }
    {
        my $scope = $dir->new_scope;
        my $marge = $dir->lookup($marge_id);
        print $marge->spouse->name; # Homer Simpson
        refaddr($marge) == refaddr($marge->spouse->spouse); # true
    }
When L<KiokuDB> is loading the initial object, all the objects the object
depends on will also be loaded. The C<spouse> attribute contains a
reference to another object (by ID), and this link is resolved at inflation
time.
=head2 The purpose of C<new_scope>
This is where C<new_scope> becomes important. As objects are inflated from the
database, they are pushed onto the live object scope, in order to increase
their reference count.
If this was not done, by the time C<$homer> was returned from C<lookup> his
C<spouse> attribute would have been cleared because there is no other reference
to Marge.
This demonstrates why:
    sub get_homer {
        my $homer = Person->new( name => "Homer Simpson" );
        my $marge = Person->new( name => "Marge Simpson" );
        $homer->spouse($marge);
        $marge->spouse($homer);
        return $homer;
        # at this point $homer and $marge go out of scope
        # $homer has a refcount of 1 because it's the return value
        # $marge has a refcount of 0, and gets destroyed
        # the weak reference in $homer->spouse is cleared
    }
    my $homer = get_homer();
    $homer->spouse; # this returns undef
By using this idiom:
    {
        my $scope = $dir->new_scope;
        # do all KiokuDB work in here
    }
you are ensuring that the objects live at least as long as is necessary.
In a web application context you usually create one new scope per request. In
fact, L<Catalyst::Model::KiokuDB> does this automatically.
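The effect of a scope can be reproduced with nothing but core Perl. The sketch
below is not how L<KiokuDB> implements scopes; it only illustrates the idea
that holding one extra strong reference per object keeps weak links alive. The
C<make_couple> helper and the C<@scope> array are hypothetical names made up
for this example.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical helper: builds two hashes that point at each other through
# weak references. If an array reference is passed, both hashes are also
# pushed onto it, which plays the role of the scope.
sub make_couple {
    my ($keep) = @_;

    my $homer = { name => "Homer Simpson" };
    my $marge = { name => "Marge Simpson" };

    $homer->{spouse} = $marge;
    $marge->{spouse} = $homer;
    weaken( $homer->{spouse} );
    weaken( $marge->{spouse} );

    push @$keep, $homer, $marge if $keep;

    return $homer;
}

# without a scope nothing holds on to Marge once the sub returns
my $lonely = make_couple();
print "no scope:      ", ( defined $lonely->{spouse} ? "alive" : "cleared" ), "\n";

# with a scope both objects have a strong reference
my @scope;
my $homer = make_couple( \@scope );
my $during = defined $homer->{spouse};
print "inside scope:  ", ( $during ? "alive" : "cleared" ), "\n";

# emptying the scope releases Marge, and the weak link is cleared again
@scope = ();
print "scope emptied: ", ( defined $homer->{spouse} ? "alive" : "cleared" ), "\n";
```

The same reasoning explains why the scope object should be kept in a lexical
variable for as long as the loaded objects are in use.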
=head1 REFERENCES IN THE DATABASE
Now that we have an object graph in the database let's have another look at
what's inside.
    sqlite> select id, data from entries;
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     data = {"__CLASS__":"Person","data":{"name":"Homer Simpson","spouse":{"$ref":"05A8D61C-6139-4F51-A748-101010CC8B02.data"}},"id":"201C5B55-E759-492F-8F20-A529C7C02C8B","root":true}
       id = 05A8D61C-6139-4F51-A748-101010CC8B02
     data = {"__CLASS__":"Person","data":{"name":"Marge Simpson","spouse":{"$ref":"201C5B55-E759-492F-8F20-A529C7C02C8B.data"}},"id":"05A8D61C-6139-4F51-A748-101010CC8B02","root":true}
You'll notice the C<spouse> field has a JSON object with a C<$ref> field inside
it holding the UUID of the target object.
When data is loaded L<KiokuDB> queues up references to unloaded objects and
then loads them in order to materialize the memory resident object graph.
If you're curious about why the data is represented this way, this format is
called C<JSPON>, or JavaScript Persistent Object Notation
(L<http://www.jspon.org/>). When using L<KiokuDB::Backend::Serialize::Storable>
the L<KiokuDB::Entry> and L<KiokuDB::Reference> objects are serialized with
their Storable hooks instead.
=head1 OBJECT SETS
More complex relationships (not necessarily 1 to 1) are usually easy to model
with L<Set::Object>.
Let's extend the C<Person> class to add such a relationship:
    package Person;
    has children => (
        does => "KiokuDB::Set",
        is   => "rw",
    );
L<KiokuDB::Set> objects are L<KiokuDB> specific wrappers for L<Set::Object>.
    my @kids = map { Person->new( name => $_ ) } qw(maggie lisa bart);
    use KiokuDB::Util qw(set);
    my $set = set(@kids);
    $homer->children($set);
    $dir->store($homer);
The C<set> convenience function creates a new L<KiokuDB::Set::Transient>
object. A transient set is one which started its life in memory space (as
opposed to a set that was loaded from the database).
The C<weak_set> convenience function also exists, creating a transient set with
L<Set::Object::Weak> used internally to help avoid circular structures (for
instance if setting a C<parents> attribute in our example).
The set object behaves pretty much like a normal L<Set::Object>:
    my @kids = $dir->lookup($homer_id)->children->members;
The main difference is that sets coming from the database are deferred by
default, that is the objects in C<@kids> are not loaded until they are actually
needed.
This allows large object graphs to exist in the database, while only being
partially loaded, without breaking the encapsulation of user objects. This
behavior is implemented in L<KiokuDB::Set::Deferred> and
L<KiokuDB::Set::Loaded>.
This set object is optimized to make most operations defer loading. For
instance, if you intersect two deferred sets, only the members of the
intersection set will need to be loaded.
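A complete round trip through a set might look like the following. It is a
sketch that assumes L<KiokuDB> and L<Moose> are installed and uses the in
memory C<hash> backend; the second block starts a fresh scope so that the set
really is loaded from the backend rather than taken from live objects.

```perl
use strict;
use warnings;
use KiokuDB;
use KiokuDB::Util qw(set);

{
    package Person;
    use Moose;

    has name     => ( isa  => "Str",          is => "rw" );
    has children => ( does => "KiokuDB::Set", is => "rw" );
}

my $dir = KiokuDB->connect("hash");

my $homer_id;
{
    my $scope = $dir->new_scope;

    my @kids  = map { Person->new( name => $_ ) } qw(maggie lisa bart);
    my $homer = Person->new(
        name     => "Homer Simpson",
        children => set(@kids),    # a transient set, created in memory
    );

    # storing Homer also stores every object reachable through the set
    $homer_id = $dir->store($homer);
}

my @names;
{
    my $scope = $dir->new_scope;

    # the set comes back deferred; calling members() loads the children
    my $children = $dir->lookup($homer_id)->children;
    @names = sort map { $_->name } $children->members;

    print join( ", ", @names ), "\n";
}
```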
=head1 THE TYPEMAP
Storing an object with L<KiokuDB> involves passing it to L<KiokuDB::Collapser>,
the object that "flattens" objects into L<KiokuDB::Entry> objects before the
entries are inserted into the backend.
The collapser uses a L<KiokuDB::TypeMap> object that tells it how objects of
each type should be collapsed.
During retrieval of objects the same typemap is used to reinflate objects back
into working objects.
Trying to store an object that is not in the typemap is an error. The reason
behind this is that it doesn't make sense to store every type of object (for
instance C<DBI> handles need a socket, and objects based on XS modules hold an
internal pointer as an integer, whose address won't be valid the next time it's
loaded), and even though the majority of objects are safe to serialize, even a
small bit of unreported fragility is usually enough to create large, hard to
debug problems.
An exception to this rule is L<Moose> based objects, because they have
sufficient meta information available through L<Moose>'s powerful reflection
support in order to be safely serialized.
Additionally, the standard backends provide a default typemap for common
objects (L<DateTime>, L<Path::Class>, etc), which by default is merged with any
custom typemap you pass to L<KiokuDB>.
So, in order to actually get L<KiokuDB> to store things like L<Class::Accessor>
based objects, you can do something like this:
    KiokuDB->new(
        backend       => $backend,
        allow_classes => [qw(My::Object)],
    );
Which is shorthand for:
    my $dir = KiokuDB->new(
        backend => $backend,
        typemap => KiokuDB::TypeMap->new(
            entries => {
                "My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
            },
        ),
    );
L<KiokuDB::TypeMap::Entry::Naive> is a type map entry that performs naive
collapsing of the object, by simply walking it recursively.
When the collapser encounters an object it will ask
L<KiokuDB::TypeMap::Resolver> for a collapsing routine based on the class of
the object.
This lookup is typically performed by C<ref $object>, not using inheritance,
because a typemap entry that is safe to use with a superclass isn't necessarily
safe to use with a subclass. If you B<do> want inherited entries, specify
C<isa_entries>:
    KiokuDB::TypeMap->new(
        isa_entries => {
            "My::Object" => KiokuDB::TypeMap::Entry::Naive->new,
        },
    );
If no normal (C<ref> keyed) entry is found for an object, the isa entries are
searched for a superclass of that object. Subclass entries are tried before
superclass entries. The result of this lookup is cached, so it only happens
once per class.
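The lookup order described above can be pictured with a small, self contained
model. This is not the resolver's actual code; the C<resolve> function, the
three hashes and the string values standing in for typemap entries are all
hypothetical, and only core Perl is used.

```perl
use strict;
use warnings;
use mro;

# two toy classes, one inheriting from the other
{ package My::Object;   sub new { bless {}, shift } }
{ package My::Subclass; our @ISA = ("My::Object"); }

# stand-ins for typemap entries, keyed by class name
my %entries     = ( "My::Object" => "exact entry" );
my %isa_entries = ( "My::Object" => "inherited entry" );
my %cache;

# Hypothetical model of the lookup: exact class first, then the isa entries
# in method resolution order (the class itself, then its superclasses).
sub resolve {
    my ($class) = @_;

    return $cache{$class} if exists $cache{$class};

    my $found = $entries{$class};

    unless ( defined $found ) {
        for my $candidate ( @{ mro::get_linear_isa($class) } ) {
            next unless exists $isa_entries{$candidate};
            $found = $isa_entries{$candidate};
            last;
        }
    }

    # remember the answer, including "nothing found"
    return $cache{$class} = $found;
}

for my $class (qw(My::Object My::Subclass Unrelated)) {
    my $entry = resolve($class);
    printf "%-12s => %s\n", $class, defined $entry ? $entry : "(no entry)";
}
```

In this model C<My::Subclass> only resolves because of the isa entry; with the
C<ref> keyed entry alone it would be rejected.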
=head2 Typemap Entries
If you need custom serialization, you can specify hooks to collapse and expand
your object:
    KiokuDB::TypeMap::Entry::Callback->new(
        collapse => sub {
            my $object = shift;
            ...
            return @some_args;
        },
        expand => sub {
            my ( $class, @some_args ) = @_;
            ...
            return $object;
        },
    );
These hooks are called as methods on the object to be collapsed.
For instance the L<Path::Class> related typemap ISA entry is:
    'Path::Class::Entity' => KiokuDB::TypeMap::Entry::Callback->new(
        intrinsic => 1,
        collapse  => "stringify",
        expand    => "new",
    );
The C<intrinsic> flag is discussed in the next section.
Another option for typemap entries is L<KiokuDB::TypeMap::Entry::Passthrough>,
which is appropriate when you know the backend's serialization can handle that
data type natively.
For example, this applies if your object has a L<Storable> hook which you know
is appropriate (e.g. contains no sub objects that need to be collapsible) and
your backend uses L<KiokuDB::Backend::Serialize::Storable>. L<DateTime> is an
example of a class with such Storable hooks:
    'DateTime' => KiokuDB::TypeMap::Entry::Passthrough->new( intrinsic => 1 )
=head2 Intrinsic vs. First Class
In L<KiokuDB> every object is normally assigned an ID, and if the object is
shared by several objects this relationship will be preserved.
However, for some objects this is not the desired behavior. These are objects
that represent values, like L<DateTime>, L<Path::Class> entries, L<URI>
objects, etc.
L<KiokuDB> can be asked to collapse such objects B<intrinsically>, that is
instead of creating a new L<KiokuDB::Entry> with its own ID for the object, the
object gets collapsed directly into its parent's structures.
This means that shared references that are collapsed intrinsically will be
loaded back from the database as two distinct copies, so updates to one will
not affect the other.
For instance, when we run the following code:
    use Path::Class;
    my $path = file(qw(path to foo));
    $obj_1->file($path);
    $obj_2->file($path);
    $dir->store( $obj_1, $obj_2 );
While the following is true when the data is being inserted, it will no longer
be true when C<$obj_1> and C<$obj_2> are loaded from the database:
    refaddr($obj_1->file) == refaddr($obj_2->file)
This is because C<$obj_1> and C<$obj_2> each got their own copy of C<$path>.
This behavior is usually more appropriate for objects that aren't mutated, but
are instead cloned and replaced, and for which creating a first class entry in
the backend with its own ID is undesired.
=head2 The Default Typemap
Each backend comes with a default typemap, with some built in entries for
common CPAN modules' objects. L<KiokuDB::TypeMap::Default> contains more
details.
=head1 SIMPLE SEARCHES
Most backends support an inefficient but convenient simple search, which scans
the entries and matches fields.
If you want to make use of this API we suggest using L<KiokuDB::Backend::DBI>,
since there simple searching is implemented using an SQL where clause, which is
much more efficient (you do have to set up the column manually though).
Calling the C<search> method with a hash reference as the only argument invokes
the simple search functionality, returning a L<Data::Stream::Bulk> with the
results:
    my $stream = $dir->search({ name => "Homer Simpson" });
    while ( my $block = $stream->next ) {
        foreach my $object ( @$block ) {
            # $object->name eq "Homer Simpson"
        }
    }
The exact API is intentionally still underdefined. In the future it will be
compatible with L<DBIx::Class> 0.09's syntax.
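For a quick experiment the simple search can be tried against the in memory
backend. The sketch below assumes that the C<hash> backend in your installed
version of L<KiokuDB> supports the linear simple search; if it does not, the
C<search> call will throw an error instead of returning a stream.

```perl
use strict;
use warnings;
use KiokuDB;

{
    package Person;
    use Moose;

    has name => ( isa => "Str", is => "rw" );
}

my $dir   = KiokuDB->connect("hash");
my $scope = $dir->new_scope;

$dir->store( map { Person->new( name => $_ ) } "Homer Simpson", "Marge Simpson" );

# a hash reference as the only argument requests a simple search; all()
# loads the whole result set, which is fine for a handful of objects
my @found = $dir->search( { name => "Homer Simpson" } )->all;

print scalar(@found), " object(s) found\n";
print "  ", $_->name, "\n" for @found;
```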
=head2 DBI SEARCH COLUMNS
In order to make use of the simple search API we need to configure columns for
our DBI backend.
Let's create a 'name' column to search by:
    my $dir = KiokuDB->connect(
        "dbi:SQLite:dbname=foo",
        columns => [
            # specify extra columns for the 'entries' table
            # in the same format you pass to DBIC's add_columns
            name => {
                data_type   => "varchar",
                is_nullable => 1, # probably important
            },
        ],
    );
You can either alter the schema manually, or use C<kioku dump> to back up your
data, delete the database, connect with C<< create => 1 >> and then use
C<kioku load>.
To populate this column we'll need to load Homer and update him:
    {
        my $s = $dir->new_scope;
        $dir->update( $dir->lookup( $homer_id ) );
    }
And this is what it looks like in the database:
       id = 201C5B55-E759-492F-8F20-A529C7C02C8B
     name = Homer Simpson
=head1 GETTING STARTED WITH BDB
The most mature backend for L<KiokuDB> is L<KiokuDB::Backend::BDB>. It performs
very well, and supports many features, like transactions and L<Search::GIN>
integration to provide customized indexing of your objects.
L<KiokuDB::Backend::DBI> is newer and not as tested, but also supports
transactions and L<Search::GIN> based queries. It performs quite well too, but
isn't as fast as L<KiokuDB::Backend::BDB>.
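=head2 Installing L<KiokuDB::Backend::BDB>
L<KiokuDB::Backend::BDB> needs the L<BerkeleyDB> module, and a recent version
of Berkeley DB itself, which is distributed by Oracle.
BerkeleyDB (the library) normally installs into a versioned directory under
C</usr/local> (for example C</usr/local/BerkeleyDB.4.7>), while L<BerkeleyDB>
(the module) looks for it in C</usr/local/BerkeleyDB>, so adding a symbolic
link should make installation easy.
Once you have L<BerkeleyDB> installed, L<KiokuDB::Backend::BDB> should install
without problem and you can use it with L<KiokuDB>.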
=head2 Using L<KiokuDB::Backend::BDB>
To use the BDB backend we must first create the storage. To do this the
C<create> flag must be passed:
    my $backend = KiokuDB::Backend::BDB->new(
        manager => {
            home   => Path::Class::Dir->new(qw(path to storage)),
            create => 1,
        },
    );
The BDB backend uses L<BerkeleyDB::Manager> to do a lot of the L<BerkeleyDB>
gruntwork. The L<BerkeleyDB::Manager> object will be instantiated using the
arguments provided in the C<manager> attribute.
Now that the storage is created we can make use of this backend, much like before:
    my $dir = KiokuDB->new( backend => $backend );
Subsequent opens will not require the C<create> argument to be true, but it
doesn't hurt.
This C<connect> call is equivalent to the above:
    my $dir = KiokuDB->connect( "bdb:dir=path/to/storage", create => 1 );
=head1 TRANSACTIONS
Some backends (ones which do the L<KiokuDB::Backend::Role::TXN> role) can be
used with transactions.
If you are familiar with L<DBIx::Class> this should be very familiar:
    $dir->txn_do(sub {
        $dir->store($obj);
    });
This will create a backend level transaction, and all changes to the
database are committed if the block was executed cleanly.
If any error occurred the transaction will be rolled back, and the changes will
not be visible to subsequent reads.
Note that L<KiokuDB> does B<not> touch live instances, so if you do something
like
    $dir->txn_do(sub {
        my $scope = $dir->new_scope;
        $obj->name("Dancing Hippy");
        $dir->store($obj);
        die "an error";
    });
the C<name> attribute is B<not> rolled back, it is simply the C<store>
operation that gets reverted.
Transactions will nest properly, and with most backends they generally increase
write performance as well.
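Because live instances are left alone, code that wants the in memory object to
match the database after a failure has to restore it by hand. The following is
one possible way to do that, not a feature of L<KiokuDB>: the C<$old_name>
variable and the C<eval> wrapper are choices made for this sketch. It uses the
in memory C<hash> backend; whether the C<store> is really rolled back depends
on the transaction support of the backend in use.

```perl
use strict;
use warnings;
use KiokuDB;

{
    package Person;
    use Moose;

    has name => ( isa => "Str", is => "rw" );
}

my $dir   = KiokuDB->connect("hash");
my $scope = $dir->new_scope;

my $obj = Person->new( name => "Homer Simpson" );
$dir->store($obj);

# remember the state we may need to go back to
my $old_name = $obj->name;

my $ok = eval {
    $dir->txn_do(sub {
        $obj->name("Dancing Hippy");
        $dir->store($obj);
        die "an error\n";
    });
    1;
};

unless ($ok) {
    warn "transaction failed: $@";

    # the live object still carries the new value at this point,
    # so put the old one back manually
    $obj->name($old_name);
}

print $obj->name, "\n";
```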
=head1 QUERIES
L<KiokuDB::Backend::BDB::GIN> is a subclass of L<KiokuDB::Backend::BDB> that
provides L<Search::GIN> integration.
L<Search::GIN> is a framework to index and query objects, inspired by Postgres'
internal GIN api. GIN stands for Generalized Inverted Indexes.
Using L<Search::GIN> arbitrary search keys can be indexed for your objects, and
these objects can then be looked up using queries.
For instance, one of the pre canned searches L<Search::GIN> supports out of the
box is class indexing. Let's use L<Search::GIN::Extract::Callback> to do custom
indexing of our objects:
    my $dir = KiokuDB->new(
        backend => KiokuDB::Backend::BDB::GIN->new(
            extract => Search::GIN::Extract::Callback->new(
                extract => sub {
                    my ( $obj, $extractor, @args ) = @_;
                    if ( $obj->isa("Person") ) {
                        return {
                            # must match the value used in the query
                            type => "person",
                            name => $obj->name,
                        };
                    }
                    return;
                },
            ),
        ),
    );
    $dir->store( @random_objects );
To look up the objects, we use a manual key lookup query:
    my $query = Search::GIN::Query::Manual->new(
        values => {
            type => "person",
        },
    );
    my $stream = $dir->search($query);
The result is a L<Data::Stream::Bulk> object that represents the search
results. It can be iterated as follows:
    while ( my $block = $stream->next ) {
        foreach my $person ( @$block ) {
            print "found a person: ", $person->name;
        }
    }
Or even more simply, if you don't mind loading the whole resultset into memory:
    my @people = $stream->all;
L<Search::GIN> is very much in its infancy, and is very under documented.
However it does work for simple searches such as this and contains pre canned
solutions like L<Search::GIN::Extract::Class>.
In short, it works today, but watch this space for new developments.