Blog entries

I aim to move all our applications to CubicWeb/Pyramid,
so I wonder what will be the best way to deliver them. For now, we have a
setup made of Apache + Varnish + CubicWeb/Twisted. In some
applications we have two CubicWeb instances with naive load balancing
managed by Varnish.

When moving to cubicweb-pyramid, there are several options.
By default, a cubicweb-pyramid instance started via the cubicweb-ctl
pyramid command runs a waitress WSGI HTTP server.
I read it is common to deliver WSGI applications with nginx + uwsgi,
but I wanted to play with mongrel2 (which I
already tested with CubicWeb a while ago), and to give the
circus + chaussette stack a try.

I have not been able to use uwsgi as the WSGI backend for
mongrel2, since the required uwsgi plugin is not provided by the uwsgi Debian
packages. I've used wsgid instead (sadly, the project appears to be dead).

The tricky config option there is lazy-apps, which must be set;
otherwise the worker processes are forked after the CubicWeb
application is loaded, which the latter does not support. If you omit it, only one
worker will get the requests.
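For reference, a uwsgi configuration along those lines might look like the sketch below; the module name and socket address are hypothetical, only lazy-apps matters here:

```ini
[uwsgi]
; hypothetical WSGI entry point exposing the CubicWeb application
module = mycwapp_wsgi:application
socket = 127.0.0.1:8001
processes = 4
; fork the workers first, then load the application in each of them:
; CubicWeb does not support being forked after it is loaded
lazy-apps = true
```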

As expected, the legacy (and still default) twisted-based server is
the least efficient method to serve a cubicweb application.

When comparing results with only one CubicWeb worker, the
pyramid+waitress solution that comes with cubicweb-pyramid is the
most efficient, while the mongrel2 + wsgid and circus + chaussette
solutions perform mostly on par with one another. Surprisingly, the uwsgi
solution is significantly less efficient, and in particular some requests
take significantly longer than with the other solutions (even the legacy
twisted-based server).

The price for activating several workers is small (around 3%) but
significant when only one client is requesting the application. It is
still unclear why.

When there are several clients requesting the application, it's no
surprise that solutions with 4 workers behave significantly better (we
are still far from a linear response however, roughly 2x the throughput for
4x the horsepower; maybe the hardware is the main reason for this
unexpected non-linear response).

I am quite surprised that uwsgi behaved significantly worse than the 2
other scalable solutions.

Mongrel2 is still very efficient, but sadly the wsgid server I've
used for these tests has not been developed for 2 years, and the uwsgi
plugin for mongrel2 is not yet available on Debian.

On the other side, I am very pleasantly surprised by circus +
chaussette. Circus also comes with some nice features, like a web
dashboard that allows adding or removing workers dynamically:

The pyramid-based web server proposed by Christophe and used for his unlish website is still under test and evaluation at Logilab. Some features (implemented in cubes) required to deploy pyramid-cubicweb for most of the applications used at Logilab are still missing, especially cubicweb-signedrequest.

In order to make it possible to implement authentication cubes like cubicweb-signedrequest, pyramid-cubicweb requires some modifications. These have been developed and are about to be published, along with a new version of signedrequest that provides Pyramid compatibility.

There are still some dependencies that lack a proper Debian package, but that should be done in the next few weeks.

In order to properly identify pyramid-related code in a cube, it has been proposed that this code should go in modules of the cube named pviews and pconfig (note that most cubes won't require any pyramid-specific code). The includeme function should however live in the cube's main package (in the __init__.py file).
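As a minimal sketch of that layout (the cube name mycube is hypothetical), the entry point could look like:

```python
# mycube/__init__.py -- the cube's main package
def includeme(config):
    """Pyramid entry point for the cube.

    Delegates to the pviews and pconfig modules, which hold all the
    pyramid-specific code of the cube.
    """
    config.include('.pconfig')
    config.include('.pviews')
```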

There have been some discussions about the fact that, for now, a pyramid-cubicweb instance requires an anonymous user/access, which can also be a problem for some applications.

Christophe pointed out that the directory/files layout of cubicweb and the cubes does not follow Python's current de facto standards, which makes cubicweb hard to use in the context of a virtualenv/pip based installation. CWEP004 discusses some aspects of this problem.

The decision has been taken to move toward a Cubicweb ecosystem that is more pip-friendly. This will be done step by step, starting with the dependencies (packages currently living in the logilab "namespace").

Then we will investigate the feasibility of migrating the layout of Cubicweb itself.

A heavy refactoring is under way that concerns data import in CubicWeb. The main goal is to design a single API to be used by the various cubes that accelerate the insertion of data (dataio, massiveimport, fastimport, etc) as well as the internal CWSource and its data feeds.

For details, see the thread on the mailing-list and the patches arriving in the review pipeline.

This version was released a few days ago. It has not been deployed on production systems yet.

Its main features are:

virtual relations: a new ComputedRelation class can be used in
schema.py; its rule attribute is an RQL snippet that defines the new
relation.

computed attributes: an attribute can now be defined with a formula
argument (also an RQL snippet); it will be read-only, and updated
automatically.

Both of these features are described in CWEP-002, and the updated
"Data model" chapter of the CubicWeb book.

cubicweb-ctl plugins can use the cubicweb.utils.admincnx function
to get a Connection object from an instance name.

new 'tornado' wsgi backend

session cookies have the HttpOnly flag, so they're no longer exposed to
javascript

rich text fields can be formatted as markdown

the edit controller detects concurrent editions, and raises a ValidationError
if an entity was modified between form generation and submission

cubicweb can use a postgresql "schema" (namespace) for its tables

cubicweb-ctl configure can be used to set values of the admin user
credentials in the sources configuration file

For details, read the list of tickets for CubicWeb 3.20.0.

We would have loved to integrate the pyramid cube in this release, but the Debian packaging effort needed by the pyramid stack is quite big, and only acceptable (at a decent price) if we target jessie only.

We are expecting to be able to use squareui/bootstrap as "rendering engine" for our forge applications (like http://www.cubicweb.org and http://www.logilab.org) as soon as possible. However, to achieve this goal, there are still too many "visual bugs", some of which may require a discussion.

Among others:

put the ctxtoolbar component in the <nav> div

each box component should have an icon (what API for this?)

we cannot easily make the left column of the main template responsive-aware (it requires changing the HTML flow), so it's probably best to take inspiration from things like http://wrapbootstrap.com/preview/WB0N89JMK

facet boxes are a mess, there is no simple solution to have a "smart layout"

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb development effort. The previous roadmap meeting was in September 2014.

Here is the report about the November 6th, 2014 meeting.
Christophe de Vienne (Unlish) joined us to express their concerns and discuss the future of CubicWeb. Dimitri Papadopoulos (CEA) could not come.

This version is still under development but should be released very soon now (expected next week). Its main feature is the inclusion of CWEP-002 (computed attributes and relations), along with many small improvement patches.

For details, read the list of tickets for CubicWeb 3.20.0.

We would have loved to integrate the pyramid cube in this release, but the Debian packaging effort needed by the pyramid stack is quite big, and only acceptable (at a decent price) if we target jessie only.

The datafeed API is essentially built around two things: a CWSource
entity and a parser, which is a kind of AppObject.

The CWSource entity defines a list of URLs from which to fetch data to
be imported in the current CubicWeb instance; it is linked to a parser
through its __regid__. So something like the following should be enough
to create a usable datafeed source [1].
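For instance, from a cubicweb-ctl shell session on the instance (the source name, parser id and url below are placeholders):

```python
create_entity('CWSource', name=u'myfeed', type=u'datafeed',
              parser=u'myfeed-parser', url=u'http://example.org/feed.rss')
commit()
```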

The parser is usually a subclass of DataFeedParser (from
cubicweb.server.sources.datafeed). It should at least implement the two
methods process and before_entity_copy. To make it easier, there
are specialized parsers such as DataFeedXMLParser that already define
process so that subclasses only have to implement the
process_item method.
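A custom parser thus has roughly the following shape; method signatures here are approximate and should be checked against your CubicWeb version:

```python
from cubicweb.server.sources.datafeed import DataFeedParser

class MyFeedParser(DataFeedParser):
    # must match the parser regid referenced by the CWSource entity
    __regid__ = 'myfeed-parser'

    def process(self, url, raise_on_error=False):
        """Fetch data found at `url` and import it item by item."""

    def before_entity_copy(self, entity, sourceparams):
        """Complete a just-created entity from the sourceparams."""
```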

Before going into further details about the actual implementation of a
DataFeedParser, it's worth having in mind a few details about the
datafeed parsing and import process. This involves various players
from the CubicWeb server, namely: a DataFeedSource (from
cubicweb.server.sources.datafeed), the Repository and the
DataFeedParser.

Everything starts from the Repository, which loops over its
sources and pulls data from each of them (this is done using a
looping task which is set up upon repository startup). In the case of
datafeed sources, Repository sources are instances of the
aforementioned DataFeedSource class [2].

The DataFeedSource selects the appropriate parser from the
registry and loops on each uri defined in the respective
CWSource entity by calling the parser's process method with
that uri as argument (methods pull_data and process_urls
of DataFeedSource).

If the result of the parsing step is successful, the
DataFeedSource will call the parser's handle_deletion method,
with the URI of the previously imported entities.

Then, the import log is formatted and the transaction committed. The
DataFeedSource and DataFeedParser are connected to an
import_log which feeds the CubicWeb instance with a CWDataImport
per data pull. This usually contains the number of created and updated
entities along with any error/warning message logged by the parser. All
this is visible in a table from the CWSource primary view.

So now, you might wonder what actually happens during the parser's process
method call. This method takes a URL from which to fetch data and processes
each piece of data further (using a process_item method for instance). For
each data item:

the repository is queried to retrieve or create an entity in the
system source: this is done using the extid2entity method;

this extid2entity method essentially needs two pieces of
information:

a so-called extid, which uniquely identifies an item in the
distant source

any other information needed to create or update the corresponding
entity in the system source (this will later be referred to as the
sourceparams)

then, given the (new or existing) entity returned by
extid2entity, the parser can perform further postprocessing (for
instance, updating any relation on this entity).

In step 1 above, the parser method extid2entity in turn calls the
repository method extid2eid with the current source and the extid
value. If an entry in the entities table matches the specified
extid, the corresponding eid (identifier in the system source) is
returned. Otherwise, a new eid is created. It's worth noting that at this
point the created entity (in case the entity is to be created) is not complete
with respect to the data model. In order for the entity to be
completed, the source method before_entity_insertion is called. This is
where the aforementioned sourceparams are used. More specifically, on the
parser side the before_entity_copy method is called: it usually just
updates (using entity.cw_set() for instance) the fetched entity with any
relevant information.

Now we'll go through a concrete example to illustrate all those fairly
abstract concepts and implement a datafeed parser which can be used to
import news feeds. Our parser will create entities of type FeedArticle,
whose minimal data model would be:
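Something like the following (attribute names here are illustrative):

```python
from yams.buildobjs import EntityType, String

class FeedArticle(EntityType):
    title = String(required=True, fulltextindexed=True)
    uri = String(required=True, unique=True)
    content = String(fulltextindexed=True)
```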

Here we'll reuse the DataFeedXMLParser, not because
we have XML data to parse, but because its interface fits well with our
purpose, namely: it ships an item-based processing (a process_item method)
and it relies on a parse method to fetch raw data. The underlying
parsing of the news feed resources will be handled by feedparser.

Then the process_item method takes an individual item (i.e. an entry
of the result obtained from feedparser in our case). It essentially
defines an extid, here the uri of the feed entry (a good candidate
for unicity), and calls extid2entity with that extid, the entity
type to be created / retrieved and any additional data useful for entity
completion passed as keyword arguments. (The process_feed method
call just transforms the results obtained from feedparser into a dict
suitable for entity creation following the data model described above.)

The before_entity_copy method is called before the entity is
actually created (or updated) in order to give the parser a chance to
complete it with any other attribute that could be set from source data
(namely feedparser data in our case).
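Putting it all together, such a parser could be sketched as follows; the class name, regid and attribute names are illustrative, see the news aggregator cube for a real implementation:

```python
import feedparser
from cubicweb.server.sources.datafeed import DataFeedXMLParser

class FeedArticleParser(DataFeedXMLParser):
    __regid__ = 'feed-article-parser'

    def parse(self, url):
        """Let feedparser fetch and parse the feed, return its entries."""
        feed = feedparser.parse(url)
        return [(entry,) for entry in feed.entries]

    def process_item(self, entry):
        # the entry uri uniquely identifies it in the distant source,
        # so use it as extid; the raw entry goes into sourceparams
        self.extid2entity(entry.link.encode('utf-8'), 'FeedArticle',
                          entry=entry)

    def before_entity_copy(self, entity, sourceparams):
        # complete the fetched entity with data from the feed entry
        entry = sourceparams['entry']
        entity.cw_set(title=entry.get('title'),
                      uri=entry.get('link'),
                      content=entry.get('summary'))
```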

And that is essentially all that's needed for a simple parser. Further details
can be found in the news aggregator cube. More sophisticated parsers may
use other concepts not described here, such as source mappings.

Testing a datafeed parser often involves pulling data from the corresponding
datafeed source. Here is a minimal test snippet that illustrates how to
retrieve the datafeed source from a CWSource entity and to pull data
from it.
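Something along these lines, assuming a CWSource named myfeed (the sources_by_uri and pull_data names are written from memory and worth double-checking):

```python
from cubicweb.devtools.testlib import CubicWebTC

class FeedParserTC(CubicWebTC):

    def test_pull(self):
        with self.admin_access.repo_cnx() as cnx:
            source = cnx.find('CWSource', name=u'myfeed').one()
            # retrieve the DataFeedSource backing the CWSource entity
            dfsource = self.repo.sources_by_uri[source.name]
            stats = dfsource.pull_data(cnx, force=True)
            self.assertTrue(stats['created'])
```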

The resulting stats is a dictionary containing the eids of entities created and
updated during the pull. In addition, all created entities should have
the cw_source relation set to the corresponding CWSource entity.

The mapping between CWSource entities' type (e.g. "datafeed")
and DataFeedSource object is quite unusual as it does not rely on
the vreg but uses a specific sources registry (defined in
cubicweb.server.SOURCE_TYPES).

This version is under development. It will try to reduce as much as possible the
stock of patches in the state "reviewed", "awaiting review" and "in
progress". If you have had something in the works that has not been accepted
yet, please ready it for 3.20 and get it merged.

It should still include the work done for CWEP-002 (computed attributes and
relations).

Once the work done for Pyramid has been tested, it will become the default
runner and a lot of things will be dropped: twisted, dead code, ui and core code
that would be better cast into cubes, etc.

Logilab and Christophe will try to make CubicWeb more
pip/virtualenv-friendly. This may involve changing the source layout to include
a sub-directory, but the impact on existing developers may be too big, so it
could be delayed to CubicWeb 4.0.

Christophe has made good progress on getting CubicWeb to work with Pyramid and
he intends to put it into production real soon now. There is a Pyramid extension
named pyramid_cubicweb and a CubicWeb cube named
cubicweb-pyramid. Both work with CubicWeb
3.19. Christophe demonstrated using the debug toolbar,
authenticating users with Authomatic and starting multiple workers with
uWSGI.

This post considers the issue of building an edition form of a CubicWeb entity
with dependencies on its fields. It's a quite common issue that needs to be
handled client-side, based on user interaction.

The main entity of interest is Citizen which has two relation
definitions towards Country and City. Then, a City is bound
to a Country through the in_country relation definition.
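A sketch of that schema (cardinalities here are assumptions):

```python
from yams.buildobjs import EntityType, String, SubjectRelation
from cubicweb.schema import RQLConstraint

class Country(EntityType):
    name = String(required=True)

class City(EntityType):
    name = String(required=True)
    in_country = SubjectRelation('Country', cardinality='1*')

class Citizen(EntityType):
    name = String(required=True)
    country = SubjectRelation('Country', cardinality='1*')
    # a citizen's city must belong to his country
    city = SubjectRelation(
        'City', cardinality='?*',
        constraints=[RQLConstraint('S country C, O in_country C')])
```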

In the automatic edition form of Citizen entities, we would like to
restrict the choices of cities depending on the selected Country, to be
determined from the value of the country field. (In other words, we'd like
the constraint on the city relation defined above to be fulfilled during form
rendering, not just validation.) Typically, in the image below, cities not in
Italy should not be available in the city select widget:

The issue will be solved by a little customization of the automatic entity form,
some uicfg rules and a bit of Javascript. In the following, the country
field will be referred to as the master field and the city field as
the dependent field.

The first thing (reading from the bottom of the file) is that we've
added a choices function on the city relation of the Citizen
automatic entity form via uicfg. This function, city_choice,
essentially generates the HTML content of the field value by grouping
available cities by country through the addition of some
optgroup tags.
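A sketch of such a vocabulary function and its uicfg registration; the exact option format expected by the form field, and the uicfg call, may differ from what is shown here:

```python
from logilab.mtconverter import xml_escape
from cubicweb.web.views import uicfg

def city_choices(form, field, **kwargs):
    """Generate the city <select> content, with one <optgroup> per
    country carrying a country_<eid> class used by the javascript."""
    rset = form._cw.execute(
        'Any X, XN, C WHERE X is City, X name XN, X in_country C')
    by_country = {}
    for city_eid, city_name, country_eid in rset:
        by_country.setdefault(country_eid, []).append((city_name, city_eid))
    options = []
    for country_eid, cities in by_country.items():
        options.append(u'<optgroup class="country_%s">' % country_eid)
        for name, eid in sorted(cities):
            options.append(u'<option value="%s">%s</option>'
                           % (eid, xml_escape(name)))
        options.append(u'</optgroup>')
    return options

uicfg.autoform_field_kwargs.tag_subject_of(
    ('Citizen', 'city', '*'), {'choices': city_choices, 'sort': False})
```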

Then, we've overridden the automatic entity form for Citizen entity
type by essentially calling a piece of Javascript code fed with the DOM
ids of the master and dependent fields. Fields are retrieved by their
name (field_by_name method) and respective id using the dom_id
method.

It consists of two functions. The initDependentFormField function is called
during form rendering and essentially binds the second function,
updateDependentFormField, to the change event of the master select field.
The latter "update" function retrieves the dependent select field, hides all
optgroup nodes (i.e. the whole content of the select widget) and then only
shows the dependent options that match the selected master option, identified
by a custom country_<eid> class set by the vocabulary function above.

This version was published at the end of April and has now been tested on our internal servers. It includes support for Cross
Origin Resource Sharing (CORS) and a heavy
refactoring that modifies sessions and sources to lay the path for CubicWeb 4.

This version is under development. It will try to reduce as much as possible the
stock of patches in the state "reviewed", "awaiting review" and "in
progress". If you have had something in the works that has not been accepted
yet, please ready it for 3.20 and get it merged.

It should still include the work done for CWEP-002 (computed attributes and
relations).

The new logo is now published in the 3.19 line. David showed us his experimentation that modernizes a forge's ui with a bit of CSS. There is still a bit of pressure on the bootstrap side though, as it still relies on heavy monkey-patching in the cubicweb-bootstrap cube.

Also, Dimitri expressed his concerns about the lack of a proper data import API. We should soon have some feedback from Aurélien's cubicweb-fastimport experimentation, which may be an answer to Dimitri's needs. In the end, we somewhat agreed that there were different needs (e.g. massive-no-consistency import vs not-so-big-but-still-safe), that cubicweb.dataimport was an attempt to answer them all, and that cubicweb-dataio and cubicweb-fastimport were more specific responses. We may reasonably hope that an API will emerge.

On his way to persistent sessions, Aurélien made huge progress toward the silencing of warnings in the 3.19 tests. dbapi has been removed, and ClientConnection / Connection have been merged. We decided to take some time to think about recurring task management, as it is related to other tricky topics (application / instance configuration) and not directly related to persistent sessions.

Last but not least, Christophe demonstrated that CubicWeb can basically live with Pyramid. This experimentation will be pursued, as it sounds very promising to get the good parts of the two frameworks.

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb
development effort. Here is the report about the May 15th, 2014 meeting. The
previous report posted to the blog was the march 2014 roadmap.

This version is under development. It will try to reduce as much as possible the
stock of patches in the state "reviewed", "awaiting review" and "in
progress". If you have had something in the works that has not been accepted
yet, please ready it for 3.20 and get it merged.

It should also include the work done for CWEP-002 (computed attributes and
relations) and the merging of Connection and ClientConnection if it happens to
be simple enough to get done quickly (in case the removal of dbapi would really
help, this merging will wait for 3.21).

The anonymous property of Session and Connection is now computed from the
related user login. If it matches the anonymous-user in the config, the
connection is anonymous. Beware that the anonymous-user config is web
specific. Therefore, no session may be anonymous in a repository-only setup.

A new explicit Connection object replaces Session as the main repository entry
point. A Connection holds all the necessary methods to be used server-side
(execute, commit, rollback, call_service, entity_from_eid,
etc...). One obtains a new Connection object using session.new_cnx().
Connection objects need to have an explicit begin and end. Use them as a context
manager to never miss an end:

with session.new_cnx() as cnx:
    cnx.execute('INSERT Elephant E, E name "Babar"')
    cnx.commit()
    cnx.execute('INSERT Elephant E, E name "Celeste"')
    cnx.commit()
# Once you get out of the "with" clause, the connection is closed.

Using the same Connection object in multiple threads will give you access to the
same Transaction. However, Connection objects are not thread safe (use them
from several threads at your own risk).

repository.internal_session is deprecated in favor of
repository.internal_cnx. Note that internal connections are now safe by default,
i.e. the integrity hooks are enabled.

A new API has been introduced to replace the dbapi. It is called repoapi.

There are three relevant functions for now:

repoapi.get_repository returns a Repository object either from a
URI when used as repoapi.get_repository(uri) or from a config
when used as repoapi.get_repository(config=config).

repoapi.connect(repo, login, **credentials) returns a ClientConnection
associated with the user identified by the credentials. The
ClientConnection is associated with its own Session that is closed
when the ClientConnection is closed. A ClientConnection is a
Connection-like object to be used client side.

repoapi.anonymous_cnx(repo) returns a ClientConnection associated
with the anonymous user if described in the config.

On the client/web side, the Request is now using a repoapi.ClientConnection
instead of a dbapi.Connection. The ClientConnection has multiple backward
compatible methods to make it look like a dbapi.Cursor and dbapi.Connection.

Sessions used on the web side are now the same as the ones used on the server
side. Some backward compatibility methods have been installed on the server
side Session to ease the transition.

The authentication stack has been altered to use the repoapi instead of
the dbapi. Cubes adding new elements to this stack are likely to break.

All current methods and attributes used to access the repo on CubicWebTC are
deprecated. You may now use a RepoAccess object. A RepoAccess object is
linked to a new Session for a specified user. It is able to create
Connection, ClientConnection and web side requests linked to this
session:

access = self.new_access('babar') # create a new RepoAccess for user babar
with access.repo_cnx() as cnx:
    # some work with server side cnx
    cnx.execute(...)
    cnx.commit()
    cnx.execute(...)
    cnx.commit()
with access.client_cnx() as cnx:
    # some work with client side cnx
    cnx.execute(...)
    cnx.commit()
with access.web_request(elephant='babar') as req:
    # some work with web request
    elephant_name = req.form['elephant']
    req.execute(...)
    req.cnx.commit()

By default testcase.admin_access contains a RepoAccess object for the
default admin session.

RepositorySessionManager.postlogin is now called with two arguments,
request and session. And this now happens before the session is linked to the
request.

SessionManager and AuthenticationManager now take a repo object at
initialization time instead of a vreg.

The async argument of _cw.call_service has been dropped. All calls are
now synchronous. The zmq notification bus looks like a good replacement for
most async use cases.

repo.stats() is now deprecated. The same information is available through
a service (_cw.call_service('repo_stats')).

repo.gc_stats() is now deprecated. The same information is available through
a service (_cw.call_service('repo_gc_stats')).

repo.register_user() is now deprecated. The functionality is now
available through a service (_cw.call_service('register_user')).

request.set_session no longer takes an optional user argument.

CubicWebTC no longer has repo and cnx as class attributes. They are now
standard instance attributes. The set_cnx and _init_repo class methods
become instance methods.

set_cnxset and free_cnxset are deprecated. The database connection
acquisition and release cycle is now more transparent.

The implementation of cascading deletion when deleting composite
entities has changed. This comes with a semantic change: merely deleting
a composite relation no longer entails the deletion of the
component side of the relation.

_cw.user_callback and _cw.user_rql_callback are deprecated. Users
are encouraged to write an actual controller (e.g. using ajaxfunc)
instead of storing a closure in the session data.

A new entity.cw_linkable_rql method provides the rql to fetch all entities
that are already or may be related to the current entity using the given
relation.

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb
development effort. Here is the report about the Mar 7th, 2014 meeting. The
previous report posted to the blog was the january 2014 roadmap.

This version will try to reduce as much as possible the stock of patches in the
state "reviewed", "awaiting review" and "in progress". If you have had something
in the works that has not been accepted yet, please ready it for 3.20 and get it
merged.

It should also include the work done for CWEP-002 (computed attributes and
relations) and CWEP-003 (adding a FROM clause to RQL).

managing dates can be tricky when users reside in different timezones, and it
is important to keep UTC in mind (unicode/str is a good analogy);
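To illustrate the analogy with a bit of stdlib Python (nothing CubicWeb-specific here):

```python
from datetime import datetime, timezone

# A naive datetime is like a str of unknown encoding: you cannot
# safely interpret it.  An aware datetime in UTC is the canonical
# internal form, like unicode text.
naive = datetime(2014, 5, 15, 12, 0)        # ambiguous
utc = naive.replace(tzinfo=timezone.utc)    # store and compute in UTC
local = utc.astimezone()                    # convert only at display time
```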

for the transitive closures that are often needed when implementing access
control policies with __permissions__, Postgresql can go a long way with queries
like "WITH ... (SELECT UNION ALL SELECT RETURNING *) UPDATE USING ...";

the fastest way to load tabular data that does not need too much
pre-processing is to create a temporary table in memory, then COPY-FROM the
data into that table, then index it, then write the transform and load step
in SQL (maybe with PL/Python);

when executing more than 10 updates in a row, it is better to write into a
temporary table in memory, then update the actual tables with UPDATE USING
(let's check if the psycopg driver does that when executemany is called);

reaching 10e8 rows in a table is, at the time of this writing, the stage when
you should start monitoring your db seriously and start considering
replication, partitioning and sharding.

full-text search is much better in Postgresql than the general public thinks
it is and recent developments made it orders of magnitude faster than tools
like Lucene or Solr and ElasticSearch;

when dealing with complex queries (searching graphs maybe), an option to
consider is to implement a specific data type, use it in a materialized view
and use GIN or GiST indexes over it;

for large scientific data sets, it could be interesting to link the numpy
library into Postgresql and turn numpy arrays into a new data type;

Oh, and one last thing: the object-oriented tables of Postgresql are not
such a great idea, unless you have a use case that fits them perfectly and
does not hit their limitations (CubicWeb's is_instance_of does not seem to be
one of these).

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb
development effort. Here is the report about the Jan 9th, 2014 meeting. The
previous report posted to the blog was the november 2013 roadmap.

This version includes a heavy refactoring that modifies sessions and
sources to lay the path for CubicWeb 4. It is currently the default development
head in the repository and is expected to be released before the end of january.

This version will try to reduce as much as possible the stock of patches in the
state "reviewed", "awaiting review" and "in progress". If you have had something
in the works that has not been accepted yet, please ready it for 3.20 and get it
merged.

The current trend is to develop more and more new features in dedicated cubes
rather than to add more code to the core of CubicWeb. If you thought CubicWeb
development was slowing down, you were mistaken: the cubes are ramping up.

resourcepicker
provides a modal window to insert links to images and files into structured
text.

rqlcontroller allows using
the INSERT, DELETE and SET keywords when sending RQL queries over
HTTP. It returns JSON. Get used to it and you may forget about asking for
specific web services in your apps, for it is a generic web service.

imagesearch is an
image gallery with facets. You may use it as a demo of a visual search tool.

introduce an add permission on attributes, to be interpreted at
entity creation time only, allowing the implementation of complex
update rules that don't block entity creation (before that, the
update attribute permission was interpreted at both entity creation and
update time) (see #2965518)

the primary view display controller (uicfg) now has a
set_fields_order method similar to the one available for forms

new method ResultSet.one(col=0) to retrieve a single entity and enforce the
result has only one row (see #3352314)

new method RequestSessionBase.find to look for entities
(see #3361290)

the embedded jQuery copy has been updated to version 1.10.2, and jQuery UI to
version 1.10.3.

initial support for wsgi for the debug mode, available through the new
wsgi cubicweb-ctl command, which can use either python's builtin
wsgi server or the werkzeug module if present.

a rql-table directive is now available in ReST fields

cubicweb-ctl upgrade can now generate the static data resource directory
directly, without a manual call to gen-static-datadir.

not really an API change, but the entity write permission checks are now
systematically deferred to an operation, instead of a) trying in a
hook and b) if it failed, retrying later in an operation

The default value storage for attributes is no longer String, but
Bytes. This opens the road to storing arbitrary python objects, e.g.
numpy arrays, and fixes a bug where default values whose truth value
was False were not properly migrated.

symmetric relations are no longer handled by an RQL rewrite but
by hooks (from the activeintegrity category); this
may have consequences for applications that do low-level database
manipulations or at times disable (some) hooks.

unique together constraints (multi-column unicity constraints)
get a name attribute that maps the CubicWeb constraint entities to the
corresponding backend index.

BreadCrumbEntityVComponent's open_breadcrumbs method now includes
the first breadcrumbs separator

entities can be compared for equality and hashed

the on_fire_transition predicate accepts a sequence of possible
transition names

the GROUP_CONCAT rql aggregate function no longer repeats duplicate
values, on the sqlite and postgresql backends

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb
development effort. Here is the report about the Nov 8th, 2013 meeting. The
previous report posted to the blog was the september 2013 roadmap.

This version was supposed to be released in september or october, but is stalled
at the integration stage. All open tickets were moved to 3.19 and existing
patches that are not ready to be merged will be more aggressively delayed to
3.19. The goal is to release 3.18 as soon as possible.

An Apache front end might be useful, as Apache provides standard log files, monitoring or authentication. In our case, we have Apache authenticate users before they are cleared to access our CubicWeb application. Still, we would like user accounts to be managed within a CubicWeb instance, avoiding separate sets of identifiers, one for Apache and the other for CubicWeb.

We have to address two issues:

have Apache authenticate users against accounts in the CubicWeb database,

have CubicWeb trust the authentication performed by Apache, so that users are not asked for their credentials a second time.

A possible solution would be to access the identifiers associated to a CubicWeb account at the SQL level, directly from the SQL database underneath a CubicWeb instance. The login password can be found in the cw_login and cw_upassword columns of the cw_cwuser table. The benefit is that we can use existing Apache modules for authentication against SQL databases, typically mod_authn_dbd. On the other hand this is highly dependent on the underlying SQL database.

Instead we have chosen an alternate solution, directly accessing the CubicWeb repository. Since we need Python to access the repository, our sysadmins have deployed mod_python on our Apache server.

We wrote a Python authentication module that accesses the repository using ZMQ, so ZMQ needs to be enabled. To enable ZMQ, uncomment and complete the following line in all-in-one.conf:
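The exact option name may vary across CubicWeb versions and should be checked against your own all-in-one.conf; as an assumption, the line to uncomment looks something like this (the zmqpickle-tcp:// scheme is the one CubicWeb uses for its pickled ZMQ communications):

```
# address the repository listens on for ZMQ connections
# (option name assumed -- check your version's all-in-one.conf)
zmq-repository-address=zmqpickle-tcp://localhost:8181
```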

This cube gets CubicWeb to trust the x-remote-user header sent by the Apache front end. CubicWeb bypasses its own authentication mechanism. Users are directly logged into CubicWeb as the user with a login identical to the Apache login.

The Python authentication module is deployed as /usr/lib/python2.7/dist-packages/cubicwebhandler/handler.py where cubicwebhandler is the attribute associated to PythonAuthenHandler in the Apache configuration.

Cubicweb and the Brainomics project were presented last week at the CrEDIBLE workshop (October 2-4, 2013, Sophia-Antipolis) on "Federating distributed and heterogeneous biomedical data and knowledge".
We would like to thank the organizers for this nice opportunity to show the features of CubicWeb and Brainomics in the context of biomedical data.

A short presentation of SHI3LD, which defines data access based on
conditions expressed as ASK requests. The other part was a state of
the art of open data licenses, and the (poor) availability of licenses
expressed in RDF. Future work seems to be an interesting combination of
both SHI3LD and RDF-based licenses for data access.

MIDAS, open-source software for sharing medical data. This project could be an interesting source of inspiration for the file sharing part of CubicWeb, even if the (in my opinion really complicated) case of large file downloads is not addressed for now.

Federated queries based on FedX - the optimization techniques based on source selection & exclusive groups seem a good approach for avoiding large data transfers and finding some (sub-)optimal ways to join the different data sources. This should be taken into account in future work on the "FROM" clause in CubicWeb.

Some people seem confused about the RQL to SQL translation. It relies on simple translation logic implemented in the rql2sql file. This is only an implementation trick, not so different from the one used in RDBMS-based triplestores that have to convert SPARQL into SQL.

RQL inference: there is no magic behind the RQL inference process. As opposed to triplestores, which store RDF triples carrying their own schema and thus cannot easily know the full data model without looking at all the triples, RQL relies on a relational database with a fixed (at a given moment) data model, allowing inference and simple checks. In particular, in this example, we want all the cities of Île de France with more than 100 000 inhabitants, which is expressed in RQL:
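The query was elided here; it might read like the following RQL sketch (the region, name and population identifiers are taken from the surrounding text, and the exact syntax should be checked against the RQL documentation):

```
Any X WHERE X region R, R name "Île de France", X population P HAVING P > 100000
```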

Besides the fact that RQL is less verbose than SPARQL (syntax matters), the simplicity of RQL relies on the fact that it can automatically infer (similarly to SPARQL) that if X is related to Y by the region relation and has a population attribute, it must be a city. If city and district both have the region relation and a population attribute, RQL inference fetches them both transparently; otherwise one can be specific by using the is relation:
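With the is relation made explicit, the same sketch becomes (again an assumed rendering, not a verified query):

```
Any X WHERE X is City, X region R, R name "Île de France", X population P HAVING P > 100000
```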

The Logilab team holds a roadmap meeting every two months to plan its CubicWeb
development effort. Here is the report about the Sept 6th, 2013 meeting. The
previous report posted to the blog was the february 2013 roadmap.

This version is now stable and maintained (release 3.17.7 is upcoming). It added
a couple of features and focused on putting CW on a diet by extracting some
functionalities provided by the core into external cubes: sioc, embed,
massmailing, geocoding, etc.

This version is now frozen and will be published as soon as all the patches are
tested and merged. Since we have a lot of work for clients until the end of the
year at Logilab, the community should feel free to help (as usual) if it wants
this version to be released rather sooner than later.

This version will remove the ldapuser source that is replaced by ldapfeed,
implement Cross Origin Resource Sharing, drop some very old compatibility code,
deprecate the old version of the multi-source system and provide various other
features and bugfixes.

Since Orbui changes the organization of the default user interface on screen, it
was decided to share the low-level bootstrap related views that could be shared
and build a SquareUI cube that would conform to the design choices of the default UI.

Our current plan is to extract as much as possible to cubes. We started CubicWeb
many years ago with the Python motto "batteries included", but have since
realized that having too much in the core contributes to making CubicWeb
difficult to learn.

Since we would very much like the community to grow, we are now aiming for
something more balanced, like Mercurial does. The core is designed such that
most features can be developed as an extension. Once they are stable, popular
extensions can be moved to the main library that is distributed with the core,
and be activated with a switch in the configuration file.

Several cubes are under active development: oauth, signedrequest, dataio, etc.

Once the legacyui cube is extracted (in version 3.17), it will be possible to move
forward swiftly with squareui. Due to its other duties, the core CW team cannot be
expected to develop squareui. Interested people will be in charge, and ideally the
squareui cube could be released when cubicweb 3.17 is published.

What's new in CubicWeb 3.16?

Add a new dataimport store (SQLGenObjectStore). This store enables a fast
import of data (entity creation, link creation) in CubicWeb, by directly
flushing information in SQL. This may only be used with PostgreSQL, as it
requires the 'COPY FROM' command.

Dropped 'pyro-ns-host', 'pyro-instance-id', 'pyro-ns-group' from the client side
configuration, in favor of 'repository-uri'. NO MIGRATION IS DONE,
supposing there is no web-only configuration in the wild.

Stop discovering the connection method through repo_method class attribute
of the configuration, varying according to the configuration class. This is
a first step on the way to a simpler configuration handling.

DB-API related changes:

Stop indicating the connection method using ConnectionProperties.

Drop _cnxtype attribute from Connection and cnxtype from
Session. The former is replaced by an is_repo_in_memory property
and the latter is totally useless.

Turn repo_connect into _repo_connect to mark it as a private function.

Deprecate in_memory_cnx which becomes useless, use _repo_connect instead
if necessary.

the "tcp://" uri scheme used for ZMQ
communications (in a way reminiscent of Pyro) is now named
"zmqpickle-tcp://", so as to make room for future zmq-based lightweight
communications (without python objects pickling).

The RQL search bar now has some auto-completion support, meaning
relation types or entity types can be suggested while typing. It is
an awesome improvement over the current behaviour!

The action box associated with table views (from tableview.py)
has been transformed into a nice-looking series of small tabs; it
means that the possible actions are immediately visible and need not
be discovered by clicking on an almost invisible icon on the upper
right.

The uicfg module has moved to web/views/ and ui configuration
objects are now selectable. This will reduce the amount of
subclassing and whole methods replacement usually needed to
customize the ui behaviour in many cases.

For two days, on dec 13th/14th 2012, ten hackers gathered at Logilab to improve the user interface of CubicWeb. This hackathon was initiated by
Crealibre. About a year ago, they started the Orbui project, a new user interface for CubicWeb based on the Bootstrap HTML/CSS framework.

Several projects at Logilab and Crealibre proved that Orbui was heading in the right direction, but that it had to fight with the default user interface of Cubicweb. Orbui makes different design/ergonomic choices and needs different HTML/CSS structure and Javascript components.

Sylvain published a roadmap back in may with a section titled "on the road to Bootstrap". After more than half a day of heated debate on the first day, it was decided to follow the direction he pointed to. We started extracting the default user interface from CubicWeb and turning it into a set of cubes:

cubicweb-legacyui: css, views and templates extracted from CubicWeb 3.16, so as to provide full backward compatibility

cubicweb-bootstrap: empty cube with only bootstrap version 2.2.2 in data/

cubicweb-squareui: bootstrapified version of legacyui (slightly altered to benefit from the bootstrap css without breaking backward compatibility too hard)

At the end of the sprint, one could add_cube('squareui') on an existing application and keep it usable... and get "some kind of responsiveness" for free, thus proving that we were on the right track.

A lot of work is still ahead of us, but we have moved a few steps towards the goal of making it easier to implement different UIs on top of CubicWeb 3.17.

For the curious, here is what the skeleton of legacyui.views.maintemplate (aka cw.web.views.maintemplate) looks like:

This blog post describes the main points of the alignment process between the
French National Library's Géo repository of data, and the data extracted from
Geonames.

Alignment is the process of finding similar entities in different
repositories. The Géo repository contains a lot of locations, and the
goal is to find those locations in the Geonames repository, so as to be
able to say that a location in *Géo* is the same as one in *Geonames*.
For that purpose, Logilab developed a library, called Nazca,
to build those links.

To process the alignment between Géo and Geonames, we divided the Géo
repository into two groups:

A group gathering the Géo data having information about longitude and
latitude.

Another, gathering the data having no information about longitude and
latitude.

Thanks to the Kdtree, we can quickly find the geographically nearest
neighbours. During this fourth step, we loop over the nearest neighbours and
assign each a grade according to the similarity between its name and
the name of the location we are looking for, using the Levenshtein distance.
The alignment is made with the best graded one.
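The grading step described above can be sketched in a few lines of plain Python; the textbook Levenshtein implementation and the sample names below are illustrative stand-ins, not Nazca's actual code:

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(name, neighbour_names):
    # Among the geographic neighbours, keep the one whose name is
    # closest (smallest edit distance) to the target name.
    return min(neighbour_names, key=lambda n: levenshtein(name, n))

print(best_match("Vincennes", ["Vicennes", "Valence", "Vannes"]))
```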

Let's have a look at the data having no information on longitude and
latitude. The steps are more or less the same as before, except that we cannot
find neighbours using a Kdtree. So, we use another method to find locations
whose names have a quite high level of similarity. This method, called
MinHashing, has been shown to be quite relevant for this purpose.
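The MinHashing idea can be illustrated with a toy sketch: names become sets of character 3-grams, and names whose minimum hash value agrees for enough hash functions are grouped as candidates. The gram size, seeds and threshold below are arbitrary choices for the example, not Nazca's parameters:

```python
import hashlib

def ngrams(name, n=3):
    # Represent a name as its set of character 3-grams.
    name = name.lower()
    return {name[i:i + n] for i in range(len(name) - n + 1)}

def minhash_signature(name, seeds=(1, 7, 13, 31)):
    # One min-hash component per seeded hash function (md5 keeps the
    # hashes deterministic across runs).
    grams = ngrams(name)
    def h(seed, gram):
        return int(hashlib.md5(f"{seed}:{gram}".encode()).hexdigest(), 16)
    return tuple(min(h(seed, g) for g in grams) for seed in seeds)

def likely_same(a, b, threshold=2):
    # Names whose signatures agree on enough components are candidate pairs.
    matches = sum(x == y for x, y in
                  zip(minhash_signature(a), minhash_signature(b)))
    return matches >= threshold
```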

To minimise the amount of mistakes, we try to gather locations according to
their country, knowing the country is often written in the location's
preferred_label. This pre-treatment helps us filter out the cities having
the same name but located in different countries. For instance, there is
Paris in France, there is
Paris in the United States, and there
is Paris in Canada. So the
alignment is made country by country.

One problem we met is the language used to describe the location. Indeed, the
similarity grade is given according to the distance between the names, and one
can notice that Londres and London, for instance, do not have the same
spelling despite representing the same location.

In order to improve the results a little bit, we had a closer look at the 10.7%
of the first group that were not aligned. The language problem mentioned before
was pretty clear. So we decided to use the following definition: two locations
are identical if they are geographically very close. Using this definition, we
get rid of the name and focus on the longitude and the latitude only.
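The purely geographic criterion can be sketched as follows; the haversine formula is the standard great-circle distance, but the 5 km threshold is an illustrative choice, not the one used in the study:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in km.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def geographically_identical(p1, p2, threshold_km=5.0):
    # Two locations count as the same when they are very close on the globe.
    return haversine_km(*p1, *p2) < threshold_km
```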

To estimate the exactness of the results, we picked 50 randomly chosen locations
and checked them manually. And the results are pretty good! 98% are
correct (49/50). That's how, based on a purely geographical approach, we can
increase the coverage rate of the results (from 89.3% to 99.6%).

A few people from Logilab attended the dotjs conference in Paris last week. The conference wasn't exactly what we expected, we were hoping for more technical talks. Nevertheless, some of the things we saw were quite interesting. Some of them could be relevant to CubicWeb.

Building your URLs in cubicweb

In cubicweb, you often have to build URLs that redirect the current view to
a specific entity view or allow the execution of a given action. Moreover, you
often also want to fall back to the previous view once the specific action or
edition is done, or to redirect to another entity's specific view.

To do so, cubicweb provides you with a set of powerful tools; however, as
there is often more than one way to do it, this blog entry is here to help
you choose the preferred way.

build_url is accessible in any context, so for instance in the rendering of a
given entity view you can call self._cw.build_url to build your URLs easily,
which is the most common case. In class methods (for instance, when declaring the
rendering methods of an EntityTableView), you can access it through the context
of the instantiated appobject, which is usually given as argument,
e.g. entity._cw.build_url. For test purposes you can also call
session.build_url in cubicweb shells.

build_url basically takes an optional first argument, the path, relative to the
base url of the site, plus arbitrary named arguments that will be encoded as
url parameters. Unless you wish to direct to a custom controller, or to match
a rewritten url, you don't have to specify the path.

Extra parameters given to build_url will vary according to your needs; however,
the most common arguments understood by default cubicweb views are the following:

vid: the built view __regid__;

rql: the RQL query used to retrieve data on which the view should be
applied;

eid: the identifier of an entity, which you should use instead of rql
when the view applies to a single entity (most often);

__message: an information message to display inside the view;

__linkto: in the case of an entity creation url, allows setting some
specific relations between both entities;
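As a rough mental model, build_url behaves like the stand-in below; the base url and the implementation are illustrative only, not CubicWeb's actual code:

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/"  # stands in for the instance base url

def build_url(path="view", **params):
    # Optional path relative to the base url, plus named arguments
    # encoded as url query parameters.
    qs = urlencode(sorted(params.items()))
    return BASE_URL + path + ("?" + qs if qs else "")

print(build_url(vid="primary", rql="Any X WHERE X is Person"))
```

Calling it with a controller id as path, e.g. build_url("my_controller_id", arg1="v1"), produces the controller-style URLs discussed later on.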

Generally, an entity has two important methods that retrieve its absolute or
relative urls:

entity.rest_path() will return something like <type>/<eid> where
<type> corresponds to the entity type and <eid> the entity eid;

entity.absolute_url() will return the full url of the entity
http://<baseurl>/<type>/<eid>. In case you want to access a specific view
of the entity, just pass the vid='myviewid' argument. You can give
arbitrary arguments to this method that will be encoded as url parameters.

Passing the rql to the build_url method requires a proper RQL
expression. To do so, there is a convenience method, printable_rql(),
accessible on the rset resulting from RQL queries. This allows applying a view
to the same result set as the one currently processed, simply using rql =
self.cw_rset.printable_rql().

There are several ways to get the URL of the current view, the canonical one
being to use self._cw.relative_path(includeparams=True), which will return the
path of the current view relative to the base url of the site (otherwise use
self._cw.url(), including parameters or not according to the value given as
includeparams).

You can also retrieve values given to individual parameters using self._cw.form, eg:

self._cw.form.get('vid', '') will return only the view id;

self._cw.form.get('rql', '') will return only the RQL;

self._cw.form.get('__redirectvid', '') will return the redirection
view if defined;

self._cw.form.get('__redirectpath', '') will return the redirection
path if defined.

This case often appears when you want to create a link to a startup view or a
controller. In the first case, you simply build your URL like this:

self._cw.build_url('view', vid='my_view_id')

The latter case appears when you want to call a controller directly without
having to define a form in your view. This can happen for instance when you
want to create a URL that will set a relation between 2 objects and do not need
any confirmation for that. The URL construction is done like this:

self._cw.build_url('my_controller_id', arg1=value1, arg2=value2, ...)

Any extra arguments passed to the build_url method will be available in the
controller as key/value pairs of the self._cw.form dictionary. This is
especially useful when you want to define some kind of hidden attributes
but there is no form to put them into.

And, last but not least, a convenient way to get the root URL of the instance:

There are other ways to create a link to registered actions than using
build_url, mostly by accessing them via the registry vreg.

For instance, the action registry effectively holds all possible actions in a
given context: a specific action can be selected using the select_or_none()
method, or even using the possible_action() method which will return a list of
categorized actions. The url of the action is then available as
action.url(). For contextual components (e.g. boxes), you can even directly
get a link to the selected action(s) using the self.action_link(this_action)
method.

If the action corresponds to the creation of a new entity, there is an even
faster and more elegant way to do it, using the schema of your cube:

Let's suppose you're working on a social network project where you have
to develop friend-of-a-friend (FOAF) relationships between persons.
For that purpose, we use the cubicweb-person cube and create in our
schema relations between persons like X in_contact_with Y:

We will also assume that a given Person corresponds to a unique CWUser through
the relation is_user.

Although it is not obvious, we would like any connected person to be able to
choose to disconnect from another person at any time. For that, we will
create a table view that will display the list of connected users, with a
custom column giving the ability to "disconnect" from the person.

Before disconnecting from this particular person, we would also like to have
a confirmation form.

To display the list of connected persons to the current person, but also to add
custom columns that do not refer specifically to attributes of a given entity,
the best choice is to use EntityTableView (see here for more information):

By default, the column attribute contains a list of displayable attributes
of the entity. If one element of the list does not correspond to an attribute,
which is the case for 'remove' here, it has to have a rendering function
defined in the column_renderers dictionary.

However, when the column header refers to a related entity attribute, we can
easily use the rendering function RelatedEntityColRenderer, as is the
case for the email and phone display.

As for the 'remove' column, we render a clickable image in the
cell_remove method. Here we have chosen an icon from famfamsilk that
is put in our data/ directory, but feel free to choose a predefined
icon in the cubicweb shared data directory.

The redirection URL associated with each image has to be a link to a specific
action allowing the user to remove the selected person from their contacts.
It is built using the self._cw.build_url() convenience function. The
redirection view, 'suppress_contact_view', will be defined later on. The
eid argument passed refers to the id of the contact person the user wants
to remove.

The above view has to be called with a given rset corresponding to the
list of known contacts for the connected user. In our case, we have defined
a StartupView for contact management, in whose call function we
have added the following piece of code:

Inside the cell_call() method of this view, we will have to render a form
which aims at displaying both buttons (confirm deletion or cancel deletion).
This form will be described later on.

The Person contact to remove is easily retrieved thanks to cw_rset. The
Person corresponding to the connected user is also retrieved here thanks to the
is_user relation. To make both of them available in the form, we add them at
the instantiation of the form using the convenience function
add_hidden(key, val).

The deletion form mentioned previously is only here to hold both buttons for
confirming or cancelling the deletion. Both buttons are declared thanks
to the form_buttons attribute of the form, which is instantiated from
forms.FieldsForm:

Specifying a given domid ensures that your form will have a specific DOM
identifier, so the controller defined in the action method will be called
without any ambiguity. The form_renderer_id is specified here so as to avoid
additional display of information that doesn't make sense here.

The custom controller is instantiated from the Controller class in
cubicweb.web.controller. The declaration of the controller should have the
same domid as the calling form, as mentioned previously. The related
actions are described in the publish() method of the controller:

The user action is retrieved by testing whether '__action_<action>', where
<action> refers to the cwaction in the button declaration, is present in the
form keys. In the case of a cancellation, we simply redirect to the contact
management view with a message specifying that the deletion has been cancelled.
In the case of a deletion confirmation, the Person ids of both the connected
user and the contact to remove are retrieved from the form's hidden arguments.

The deletion is performed using an RQL request on the relation
in_contact_with. We also redirect to the contact management view,
this time with another message confirming the deletion of the contact link.
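Stripped of the CubicWeb machinery, the publish() logic described above amounts to something like the sketch below; the form keys, return values and RQL text are illustrative (real code would run the RQL through self._cw.execute and use the redirect helpers):

```python
def publish(form):
    # Which button was pressed is encoded as an '__action_<action>' key.
    if '__action_cancel' in form:
        return ('redirect', 'contacts_view', 'deletion cancelled')
    if '__action_delete' in form:
        # Both Person eids were stored as hidden form arguments.
        rql = 'DELETE U in_contact_with C WHERE U eid %(u)s, C eid %(c)s'
        args = {'u': form['user_eid'], 'c': form['contact_eid']}
        return ('execute', rql, args, 'contact deleted')

print(publish({'__action_cancel': '1'}))
```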

We have been playing along with political data for a while, using CubicWeb
to store and query various sets of open data
(e.g. NosDeputes, data.gouv.fr),
and testing different visualization tools.
In particular, we have extended our prototype of News Analysis (see the
presentation we made last year at Euroscipy),
in order to use these political datasets as reference for the named
entities extraction part.
Last week's conference "The Law Factory" at Sciences Po was a really nice opportunity to meet people with similar interests in
opendata for political sciences, and to find out which questions we should be asking our data!
Check out our presentation and a few screencasts (no sound):

Among the different things that we have seen, we want to emphasize:

Law is Code (http://gitorious.org/law-is-code/) - This project by the Regards Citoyens team aims at analysing laws and amendments, by extracting information from the French National Assembly website, and by pushing the contributions of members of parliament to a given law into a git repository. If we can find the time, we'll turn that into a mercurial repository and integrate it into our above application using cubicweb-vcsfile.

Both national websites (Assemblée Nationale, Sénat) do not allow (yet...) getting data any other way than by parsing the sites. However, it seems that the people involved are aware of the issues of opendata, and this may change in the next months. In particular, the Sénat uses two databases (Basile and Ameli), and opening them to the public could be really interesting.

Check out ITCparliement, which provides tools to analyse and share data from many different parliaments.

On Saturday, at La Cantine Numérique, the discussions focused on the
possibilities to share tools, and on possible collaborations. I think that this is the crucial point: how can people share tools and use them in an efficient way, without being IT experts?

Along these lines, we are thinking about some evolutions of CubicWeb that could fulfill (part of) these requirements:

easier installation, especially on Windows, and easier Postgresql configuration. This could perhaps be achieved by providing some graphical interface for creating/managing the instances and the databases.

a graphical tool for schema construction. Even if the construction of a data model in CubicWeb is quite simple, and relies on the straightforward Python syntax, it could be interesting to expose a graphical tool for adding/removing/modifying entities in the schema, as well as some attributes or relations.

easier ways to import data. This point is not trivial, and we don't want to develop a specific language for defining import rules, which could cover 80% of the cases but would be painful to extend to the 20% exotic cases. We would rather develop some helpers to ease the building of import scripts in Python, and upload some CubicWeb instances already filled with open databases.

This demo site allows you to deeply explore the data, with different visualisations, and complex queries. Again, comments are welcome, especially if you want to retrieve some information but you don't know how to! This demo site will probably evolve in the next weeks, and we will use it to test different cubes that we have been building.

PS: We are sorry we cannot open the prototype of the news aggregator for now, as there are still
licensing issues concerning the reusability of the different news sources we get articles from.

Add ZMQ server, based on the cutting edge ZMQ socket
library. This allows accessing distant instances, in a way similar to Pyro.

Publish/subscribe mechanism using ZMQ for communication among cubicweb
instances. The new zmq-address-sub and zmq-address-pub configuration variables
define where this communication occurs. As of this release this mechanism is
used for entity cache invalidation.

Improved WSGI support. While there are still some caveats, most of the code
which was twisted only is now generic and allows related functionalities to work
with a WSGI front-end.

Full undo/transaction support: undo of modifications has finally been
implemented, and the configuration simplified (basically you activate it or not
on an instance basis).

Controlling HTTP status code returns is now much easier:

WebRequest now has a status_out attribute to control the response status;

The base registry implementation has been moved to a new
logilab.common.registry module (see #1916014). This includes code from:

cubicweb.vreg (everything that was in there)

cw.appobject (base selectors and all).

In the process, some renaming was done:

the top level registry is now RegistryStore (was VRegistry), but that
should not impact CubicWeb client code;

former selector functions are now known as "predicates", though you still use
predicates to build an object's selector;

for consistency, the objectify_selector decorator has hence been renamed to
objectify_predicate;

on the CubicWeb side, the selectors module has been renamed to
predicates.

The debugging refactoring dropped the need for the lltrace decorator. There
should be full backward compatibility with proper deprecation warnings. Notice
that the yes predicate and objectify_predicate decorator, as well as the
traced_selection function, should now be imported from the
logilab.common.registry module.

All login forms are now submitted to <app_root>/login. Redirection to requested
page is now handled by the login controller (it was previously handled by the
session manager).

Publisher.publish has been renamed to Publisher.handle_request. This
method now contains a generic version of the logic previously handled by
Twisted. Controller.publish is not affected.

New 'zmqrql' source type, similar to 'pyrorql' but using ømq instead of Pyro.

A new registry called 'services' has appeared, where you can register
server-side cubicweb.server.Service child classes. Their call method can be
invoked from a web-side AppObject instance using the new self._cw.call_service
method, or from a server-side one using self.session.call_service. This is a new
way to call server-side methods, much cleaner than monkey-patching the
Repository class, which becomes a deprecated way to perform similar tasks.

a new ajaxfunction registry now hosts all remote functions (i.e. functions
callable through the asyncRemoteExec JS api). A convenience ajaxfunc
decorator will let you expose your python functions easily without all the
standard appobject boilerplate. Backwards compatibility is preserved.

the 'json' controller is now deprecated in favor of the 'ajax' one.

WebRequest.build_url can now take a __secure__ argument. When True, cubicweb
tries to generate an https url.

This is a fairly technical post talking about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to go from here to there. Draft, really. But that will eventually turn into a nice roadmap hopefully.

Some parts of cubicweb are sometimes too hairy for different reasons (some good,
most bad). This contributes to the difficulty of getting started quickly. The
goal of CubicWeb 4.0 should be to make things simpler:

Fix some bad old design.

Stop reinventing the wheel and use widely used libraries from the Python web
world. This extends to benefitting from state-of-the-art libraries to build nice
and flexible UIs such as Bootstrap, on top of the jQuery foundations (which could
become as prominent in CubicWeb as the Python standard library; the development
team should get ready for it).

If there is a best way to do something, just do it and refrain from providing configurability and options.

Then we should probably move the default UI into some cubes (i.e. the content of
cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this
should also have the benefit of making clearer that this is the default way to
build an (automatic) UI in CubicWeb, but one may use other, more usual,
strategies (such as using a template language).

This is a first draft that will need some adjustments. Some of the listed
modules should be split (e.g. actions, boxes) and their content moved to
different core cubes. Also some modules in the cubicweb.web package may be moved
to the relevant cube.

Each cube should provide an interface so that one could replace it with another
one. For instance, move from the default coreviews and corelayout cube to
bootstrap based ones. This should allow a nice migration path from the current UI
to a Bootstrap based UI. Bootstrap should probably be introduced bottom-up: start
using it for tables, lists, etc. then go up until the layout defined in the main
template. The Orbui experience should greatly help us by pointing at hot spots
that will have to be tackled, as well as by providing a nice code base from which
we should start.

Regarding the current implementation, contextual components
are a powerful way to build a "pluggable" UI, but we should probably add an
intermediate layer that would make more obvious / explicit:

what the available components are

what the available slots are

which component should go in which slot when possible

Also at some point, we should take care to separate a view's logic from its HTML
generation: our experience with client work shows that a common need is to use
the logic but produce different HTML. Though we should wait for more use of
Bootstrap and the related HTML simplification to see if the power of CSS doesn't
somewhat fulfill that need.

The Werkzeug framework sounds like a good candidate to use as a library that
would replace/simplify the request, httpcache, session, authentication (maybe
more) modules as well as the wsgi package. It sounds like the right candidate for
the following reasons:

I have to say I'm somewhat impatient to find some time to give
werkzeug.routing a try. IMO, used well, it may introduce a structural change
that would make things much easier to understand and configure properly.
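As a taste of what werkzeug.routing offers (Map, Rule, bind and match are the real Werkzeug API; the URL rules themselves are made up for the example):

```python
from werkzeug.routing import Map, Rule

# Declarative URL map: each rule binds a URL pattern to an endpoint
# name; converters such as <int:eid> parse and type-check arguments.
url_map = Map([
    Rule('/', endpoint='index'),
    Rule('/ticket/<int:eid>', endpoint='ticket'),
])

# Bind the map to a host to get a matcher, then resolve a path
# to an (endpoint, arguments) pair.
urls = url_map.bind('example.com')
endpoint, args = urls.match('/ticket/42')
```

The appeal is that the whole URL space is declared in one place, instead of being implied by selectors scattered over the code base.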

The current looping task / repository thread mechanism is used for various
sorts of things and has several problems:

tasks don't behave similarly in a multi-instance configuration (some should
be executed by a single instance, some by a subset); the tasks system was
originally written in a single-instance context, and as of today this is
(sometimes) handled using configuration options (that have to be properly set
in each instance's configuration file);

the tasks system is a repository-only API, but we also need web-side tasks;

there is probably some abuse of the system that may lead to unnecessary
resource usage.

Analyzing a sample http://www.logilab.org/ instance, below are the running
looping tasks by category. Tasks that have to run on each web instance:

clean_sessions automatically closes unused repository sessions. Notice that
cw.etwist.server also records a twisted task to clean web sessions. Some
changes are imminent here; they will be addressed in the upcoming refactoring
of sessions (which is becoming more and more necessary to move forward on
several points listed here).

regular_preview_dir_cleanup (preview cube) cleans up files in the
preview filesystem directory. It could be executed by only some of the web
instances, provided that the preview directory is shared.

cleanup_plans (narval cube) deletes Plan entities older than a delay specified
in the configuration. If 'plan-cleanup-delay' is set to an empty value, the
task isn't started.

refresh_local_repo_caches (vcsfile cube) pulls or clones vcs repository caches
if the Repository entity asks to import_revision_content (hence web instances
should have an up-to-date cache to display file contents) or if the
'repository-import' configuration option is set to 'yes'; it also imports vcs
repository content as entities if the 'repository-import' option is set and
the repository comes from the system source.

Some deeper thinking is needed here so we can improve things. That includes
thinking about:

Remember: the more cubicweb-independent the tasks are, the better. Though we
still want an 'all-integrated' approach, i.e. not relying on external
configuration of Unix-specific tools such as cron. We should also see whether
a hard dependency on Celery or a similar tool could be avoided, and, if not,
whether it should be considered a problem (for devops).
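One direction this thinking could take (a purely hypothetical API, nothing like this exists in CubicWeb yet) is to declare tasks with an explicit execution scope, so whatever scheduler ends up running them, home-grown or Celery-backed, knows where each one may run:

```python
# Hypothetical task declaration API: nothing like this exists in
# CubicWeb today. The point is to make the execution scope explicit
# instead of burying it in per-instance configuration options.

TASKS = []


def looping_task(interval, scope='one-instance'):
    """Register a periodic task. scope is 'one-instance' (run by a
    single elected instance) or 'every-instance' (run everywhere)."""
    assert scope in ('one-instance', 'every-instance')

    def decorator(func):
        TASKS.append({'func': func, 'interval': interval, 'scope': scope})
        return func
    return decorator


@looping_task(interval=3600, scope='every-instance')
def clean_sessions():
    pass  # e.g. close unused repository sessions


@looping_task(interval=86400, scope='one-instance')
def cleanup_plans():
    pass  # e.g. delete old Plan entities
```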

First, we should drop the different behaviour depending on the presence of a
'.hg' directory in cubicweb's source directory. It currently changes the
location where cubicweb's external resources (js, css, images, gettext
catalogs) are searched for. Speaking of implementation:

shared_dir returns the cubicweb.web package path instead of the path to the
shared cube;

i18n_lib_dir returns the cubicweb/i18n directory path instead of the path to
the shared/i18n cube.

Moving web-related objects as proposed in the Bootstrap section would solve the
problem for the content of web/data and most of i18n (though some messages
will remain and additional efforts will be needed there). By going further this
way, we may also clean up some schema code by moving cubicweb/schemas and
cubicweb/misc/migration to a cube (though only a small benefit is to be expected
here).

We should also have fewer environment variables... Let's see what we have today:

CW_INSTANCES_DIR, where to look for instances configuration

CW_INSTANCES_DATA_DIR, where to look for instances persistent data files

CW_RUNTIME_DIR, where to look for instances run-time data files

CW_MODE, set to 'system' or 'user', predefines the above environment variables differently

CW_CUBES_PATH, additional directories where to look for cubes

CW_CUBES_DIR, location of the system 'cubes' directory

CW_INSTALL_PREFIX, installation prefix, from which we can compute path to 'etc', 'var', 'share', etc.

I would propose the following changes:

CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to
~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;
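The proposed resolution rule for CW_INSTANCES_PATH could look like this (an illustrative sketch of the rule above, not current CubicWeb code):

```python
import os
import os.path as osp


def instances_path(environ=None):
    """Return the list of directories searched for instance
    configurations, per the proposed CW_INSTANCES_PATH rule:
    the variable if set (os.pathsep-separated), else ~/etc/cubicweb.d
    when that directory exists, else /etc/cubicweb.d."""
    environ = os.environ if environ is None else environ
    value = environ.get('CW_INSTANCES_PATH')
    if value:
        return value.split(os.pathsep)
    userdir = osp.expanduser('~/etc/cubicweb.d')
    if osp.isdir(userdir):
        return [userdir]
    return ['/etc/cubicweb.d']
```

Turning the variable into a search path (rather than a single directory) mirrors what CW_CUBES_PATH already does for cubes.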

Remember the following goal: migration of legacy code should go smoothly. In a perfect world, every application should be able to run with CubicWeb 4.0 until the backward-compatibility code is removed (which will probably only happen in a later major release).

Please provide feedback:

do you think the choices proposed above are good or bad? Why?

do you know some additional libraries that should be investigated?

do you have other changes in mind that could/should be done in cw 4.0?