Substance D provides application content indexing and querying via a catalog.
A catalog is an object named catalog which lives in a service named
catalogs within your application's resource tree. A catalog has a number
of indexes, each of which keeps a certain kind of information about your
content.

    # in a module named blog.__init__

    from pyramid.config import Configurator

    def main(global_config, **settings):
        config = Configurator()
        config.include('substanced')
        config.scan('blog.catalogs')
        # .. and so on ...

Once you've done this, you can add the catalog to the database in any bit
of code that has access to the database, for example in an event handler
that runs when the root object is created for the first time.

Once a new catalog has been added to the database, each time a new
catalogable object is added to the site, its attributes will be indexed by
each catalog in its lineage that "cares about" the object. The object will
always be indexed in the "system" catalog. To make sure it's cataloged in
custom catalogs, you'll need to do some work. To index the object in a custom
application index, you will need to create an index view for your content
using substanced.catalog.indexview, and scan the resulting index
view using pyramid.config.Configurator.scan():

An index view class should accept a single argument (conventionally named
resource) in its constructor and have one or more methods named after
potential index names. When it comes time to index your content, Substance D
creates an instance of your index view class and calls one or more of its
methods: those matching the attr passed in to add_indexview. Each method
receives a default value, which it should return if it is unable to compute
a value for the content object.

Once this is done, whenever an object is added to the system, a value (the
result of the freaky(default) method of the catalog view) will be indexed in the
freaky field index.

You can use the context parameter to indexview to tell the system that
this particular index view should only be executed when the class of the
resource (or any of its interfaces) matches the value of the context:

You can use the indexview_defaults class decorator to save typing in each
indexview declaration. Keyword argument names supplied to
indexview_defaults will be used if the indexview does not supply the
same keyword:
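A sketch of how the context parameter and indexview_defaults might be combined is shown here. It assumes that indexview accepts a catalog_name keyword and that both decorators are importable from substanced.catalog. FreakyContent is a hypothetical content class, and the fallback decorators exist only so the snippet runs when Substance D is not installed.

```python
try:
    from substanced.catalog import indexview, indexview_defaults
except ImportError:
    # Fallback no-op decorators so the sketch runs without Substance D.
    def indexview(**settings):
        return lambda method: method

    def indexview_defaults(**settings):
        return lambda cls: cls


class FreakyContent(object):
    """Hypothetical content class used as the ``context`` filter."""
    freaky = True


# catalog_name is supplied once here instead of in every indexview call.
@indexview_defaults(catalog_name='mycatalog')
class MyCatalogViews(object):
    def __init__(self, resource):
        self.resource = resource

    # Only meant to run for resources whose class matches FreakyContent.
    @indexview(context=FreakyContent)
    def freaky(self, default):
        return getattr(self.resource, 'freaky', default)

    # No context given, so this view is not restricted by resource class.
    @indexview()
    def funky(self, default):
        return getattr(self.resource, 'funky', default)
```

The decorators only record configuration, so the methods remain ordinary methods that can be called directly, for example in a unit test.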

    from substanced.util import find_catalog

    catalog = find_catalog(resource, 'system')
    name = catalog['name']
    path = catalog['path']
    # find me all the objects that exist under /somepath with the name 'somename'
    q = name.eq('somename') & path.eq('/somepath')
    resultset = q.execute()
    for contentob in resultset:
        print(contentob)

The calls to name.eq() and path.eq() above each return a query object.
Those two queries are ANDed together into a single query via the
& operator between them (there is also the | operator to OR queries
together, but we don't use it above). Parentheses can be used to group
query expressions and control the order in which they are combined.
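Grouping matters because Python evaluates & before |. The toy Query class below is not the real query implementation; it is a hypothetical stand-in that only records how expressions were combined, which makes the effect of parentheses visible.

```python
class Query(object):
    """Toy query node; it only records how expressions were combined."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Query('(%s AND %s)' % (self.text, other.text))

    def __or__(self, other):
        return Query('(%s OR %s)' % (self.text, other.text))


name = Query("name == 'somename'")
path = Query("path == '/somepath'")
other = Query("path == '/otherpath'")

# Python evaluates & before |, so this is path | (other & name).
ungrouped = path | other & name

# Parentheses force the OR to be evaluated first.
grouped = (path | other) & name
```

The two expressions differ only in their parentheses, yet ungrouped matches everything under /somepath regardless of name, while grouped requires the name to match in both locations.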

Different indexes have different query methods, but most support the eq
method. Other methods that are often supported by indexes: noteq, ge,
le, gt, any, notany, all, notall, inrange,
notinrange. The AllowedIndex supports
an additional allows() method.

    catalog = find_catalog(resource, 'system')
    name = catalog['name']
    path = catalog['path']
    # find me all the objects that exist under /somepath with the name 'somename'
    q = name.eq('somename') & path.eq('/somepath')
    resultset = q.execute()
    newresultset = resultset.sort(name)

Note

If you don't call sort on the hypatia.util.ResultSet you get back,
the results will not be sorted in any particular order.

In many cases, you might only have one custom attribute that you need
indexed, while the system catalog has everything else you need. You
thus need an efficient way to combine results from two catalogs,
before executing the query:

    system_catalog = find_catalog(resource, 'system')
    my_catalog = find_catalog(resource, 'mycatalog')
    path = system_catalog['path']
    funky = my_catalog['funky']
    # find me all funky objects that exist under /somepath
    q = funky.eq(True) & path.eq('/somepath')
    resultset = q.execute()
    newresultset = resultset.sort(system_catalog['name'])

The Substance D system catalog at
substanced.catalog.system.SystemCatalogFactory contains a number of
default indexes, including an allowed index. Its job is to index security
information to allow security-aware results in queries. This index allows us
to filter queries to the system catalog based on whether the principal issuing
the request has a permission on the matching resource.

The objectmap keeps track of ACLs in a cache to make catalog security
functionality work. Note that for the object map's cached version of ACLs to
be correct, you will need to set ACLs in a way that helps keep track of all the
contracts. For this, the helper function substanced.util.set_acl() can
be used. For example, the site root at substanced.root.Root finishes
with:

As a lesson learned from previous cataloging experience,
Substance D natively supports deferred indexing. As an example,
in many systems the text indexing can be done after the change to the
object is committed in the web request's transaction. Doing so has a
number of performance benefits: the user's request is processed more
quickly, the work to extract text from a Word file can be performed
later, and there is less chance of a conflict error.

As such, the
substanced.catalog.system.SystemCatalogFactory, by default,
has indexes that aren't updated immediately when a resource is
changed. For example:

    # name is MODE_ATCOMMIT for next-request folder contents consistency
    name = Field()

    text = Text(action_mode=MODE_DEFERRED)

    content_type = Field()

The Field indexes use the default of MODE_ATCOMMIT. The Text index
overrides the default and sets action_mode to MODE_DEFERRED.

substanced.interfaces.MODE_DEFERRED means that the indexing action
should be performed by an external indexing processor (e.g.
drain_catalog_indexing) if one is active at the successful end of the
current transaction. If no indexing processor is available at that point,
this mode is treated the same as MODE_ATCOMMIT.

This calls sd_drain_indexing which is a console script that
Substance D automatically creates in your bin directory. Indexing
messages are logged with standard Python logging to the file that you
name. You can view these messages with the supervisorctl command
tail indexer. For example, here is the output from
sd_drain_indexing when changing a simple Document content type:

Above we set the default mode used by an index when Substance D indexes
a resource automatically. Perhaps in an evolve script, you'd like to
override the default mode for that index and reindex immediately.

The index_resource method of an index accepts an action_mode flag that
overrides the configured mode for that index, for that call only. It does
not permanently change the index's configured default mode. The same
applies to reindex_resource and unindex_resource. You can also grab the
catalog itself and reindex with a mode that overrides the default modes
of all its indexes.
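The per-call override can be illustrated with a stand-in object. RecordingIndex below is hypothetical and only records calls; it is not a Substance D index, and the mode constants are plain strings standing in for the ones in substanced.interfaces. The point it demonstrates is that passing action_mode affects a single call and leaves the configured mode untouched.

```python
# Plain-string stand-ins for the constants in substanced.interfaces.
MODE_IMMEDIATE = 'immediate'
MODE_DEFERRED = 'deferred'


class RecordingIndex(object):
    """Hypothetical stand-in for an index; it only records what was asked."""

    def __init__(self, action_mode):
        self.action_mode = action_mode  # configured default mode
        self.calls = []

    def reindex_resource(self, resource, action_mode=None):
        # A mode passed by the caller wins for this call only.
        effective = self.action_mode if action_mode is None else action_mode
        self.calls.append((resource, effective))


text_index = RecordingIndex(action_mode=MODE_DEFERRED)

# E.g. in an evolve script: force this one reindex to happen immediately.
text_index.reindex_resource('doc1', action_mode=MODE_IMMEDIATE)

# A later call without the flag falls back to the configured mode.
text_index.reindex_resource('doc2')
```

With a real index or catalog the call shape would be the same: pass the resource and the desired action_mode to reindex_resource.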

If you add substanced.catalogs.autosync=true within your application's
.ini file, all catalog indexes will be resynchronized with their catalog
factory definitions at application startup time. Indices which were added to
the catalog factory since the last startup time will be added to each catalog
which uses the index factory. Likewise, indices which were removed will be
removed from each catalog, and indices which were modified will be modified
according to the catalog factory. Having this setting in your .ini file is
like pressing the Update indexes button on the Manage tab of each of
your catalogs. The SUBSTANCED_CATALOGS_AUTOSYNC environment variable can
also be used to turn this behavior on. For example: export SUBSTANCED_CATALOGS_AUTOSYNC=true

If you add substanced.catalogs.autoreindex=true within your application's
.ini file, all catalogs that were changed as the result of an auto-sync
will automatically be reindexed. Having this setting in your .ini file is
like pressing the Reindex catalog button on the Manage tab of each
catalog which was changed as the result of hitting Update indexes. The
SUBSTANCED_CATALOGS_AUTOREINDEX environment variable can also be used to
turn this behavior on. For example: export SUBSTANCED_CATALOGS_AUTOREINDEX=true

There may be times when you'd like to defer all catalog indexing operations,
such as during a bulk load of data from a script. Normally, only indexes
marked with MODE_DEFERRED use deferred indexing, and actions associated
with those indexes are even then only actually deferred if an index processor
is active.

You can force Substance D to defer all catalog indexing using the
substanced.catalogs.force_deferred flag in your application's .ini
file. When this flag is used, all catalog indexing operations will be added to
the indexer's queue, even those indexes marked as MODE_IMMEDIATE or
MODE_ATCOMMIT. Deferral will also happen whether or not the indexer is
running, unlike during normal operations.
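The rules described so far (the per-index mode, the fallback when no indexing processor is active, and the force_deferred flag) can be summarized as a small decision function. This is a model of the documented behavior, not Substance D's own code: effective_mode is a hypothetical name and the constants are plain-string stand-ins.

```python
# Plain-string stand-ins for the constants in substanced.interfaces.
MODE_IMMEDIATE = 'immediate'
MODE_ATCOMMIT = 'atcommit'
MODE_DEFERRED = 'deferred'


def effective_mode(configured_mode, processor_active, force_deferred=False):
    """Return the mode an indexing action would effectively use."""
    if force_deferred:
        # Everything is queued, whether or not the indexer is running.
        return MODE_DEFERRED
    if configured_mode == MODE_DEFERRED and not processor_active:
        # No processor to hand the work to: behave like MODE_ATCOMMIT.
        return MODE_ATCOMMIT
    return configured_mode


normal = effective_mode(MODE_DEFERRED, processor_active=True)
fallback = effective_mode(MODE_DEFERRED, processor_active=False)
forced = effective_mode(MODE_IMMEDIATE, processor_active=False,
                        force_deferred=True)
```

Under normal operation a deferred index falls back to commit-time indexing when no processor is active, whereas with the flag set even an immediate index is queued.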

When you use this flag, you can stop the indexer process, do your bulk load,
and start the indexer again when it's convenient to have all the content
indexing done in the background.

The SUBSTANCED_CATALOGS_FORCE_DEFERRED environment variable can also be
used to turn this behavior on. For example: export SUBSTANCED_CATALOGS_FORCE_DEFERRED=true