<h2>Thoughts on CubicWeb 4.0</h2>
<p>Posted on CubicWeb's Forge: <a class="reference" href="http://www.cubicweb.org/blogentry/2356431">http://www.cubicweb.org/blogentry/2356431</a></p>
<p>This is a fairly technical post about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to get from here to there. A draft, really, but hopefully it will eventually turn into a nice roadmap.</p>
<div class="section" id="the-great-simplification">
<h3><a>The great simplification</a></h3>
<p>Some parts of CubicWeb are too hairy, for various reasons (some good,
most bad). This makes it harder to get started quickly. The goal of CubicWeb 4.0 should be to make things simpler:</p>
<ul class="simple">
<li>Fix some bad old design.</li>
<li>Stop reinventing the wheel and use libraries that are widely used in the Python web
world. This extends to benefiting from state-of-the-art libraries for building nice
and flexible UIs, such as Bootstrap, on top of the jQuery foundations (which could
become as prominent in CubicWeb as the Python standard library; the development team should get
ready for it).</li>
<li>If there is a best way to do something, just do it that way and refrain from providing configurability and options.</li>
</ul>
</div>
<div class="section" id="on-the-road-to-bootstrap">
<h3><a>On the road to Bootstrap</a></h3>
<p>First, a few simple things could be done to simplify the UI code:</p>
<ul class="simple">
<li>drop XHTML support: always return the text/html content type, stop bothering
with this stillborn stuff and use HTML5;</li>
<li>move out everything that does not belong in the framework: calendar?, embedding,
igeocodable, isioc, massmailing, owl?, rdf?, timeline, timetable?, treeview?,
vcard, wdoc?, xbel, xmlrss?</li>
</ul>
<p>Then we should probably move the default UI into some cubes (i.e. the content of
cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this
should also make it clearer that this is the default way to
build an (automatic) UI in CubicWeb, while one may use other, more usual,
strategies (such as a template language).</p>
<p>At first glance, we should start with the following core cubes:</p>
<ul class="simple">
<li><cite>corelayout</cite>, the default interface layout and generic components. Modules to
move there: application (not an appobject yet), basetemplates, error,
boxes, basecomponents, facets, ibreadcrumbs, navigation, undohistory.</li>
<li><cite>coreviews</cite>, the default generic views and forms. Modules to move there:
actions, ajaxedit, baseviews, autoform, dotgraphview, editcontroller,
editforms, editviews, forms, formrenderers, primary, json, pyviews, tableview,
reledit, tabs.</li>
<li><cite>corebackoffice</cite>, the concrete views for the default back-office that let you
handle users, sources, debugging, etc. through the web. Modules to move
there: cwuser, debug, bookmark, cwproperties, cwsources, emailaddress,
management, schema, startup, workflow.</li>
<li><cite>coreservices</cite>, the various services not directly related to displaying
things. Modules to move there: ajaxcontroller, apacherewrite,
authentication, basecontrollers, csvexport, idownloadable, magicsearch,
sessions, sparql, staticcontrollers, urlpublishing, urlrewrite.</li>
</ul>
<p>This is a first draft that will need some adjustments. Some of the listed
modules should be split (e.g. actions, boxes) and their content moved to
different core cubes. Also, some modules in the <cite>cubicweb.web</cite> package may be moved
to the relevant cube.</p>
<p>Each cube should provide an interface so that one could replace it with another
one: for instance, move from the default <cite>coreviews</cite> and <cite>corelayout</cite> cubes to
Bootstrap-based ones. This should allow a nice migration path from the current UI
to a Bootstrap-based UI. Bootstrap should probably be introduced bottom-up: start
using it for tables, lists, etc., then go up to the layout defined in the main
template. The <a class="reference" href="http://orbui.com/">Orbui</a> experience should greatly help us by pointing at hot spots
that will have to be tackled, as well as by providing a nice code base to
start from.</p>
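<p>To make the idea of a replaceable cube more concrete, here is a minimal Python sketch in which application code depends only on an interface, so that a Bootstrap-based layout can be swapped in for the default one. The <cite>Layout</cite> protocol and the <cite>DefaultLayout</cite>, <cite>BootstrapLayout</cite> and <cite>render_page</cite> names are hypothetical illustrations, not existing CubicWeb APIs.</p>

```python
from html import escape
from typing import Protocol


class Layout(Protocol):
    """Hypothetical interface that any layout cube would implement."""

    def wrap_page(self, title: str, body: str) -> str:
        ...


class DefaultLayout:
    """Stand-in for the current default layout (corelayout)."""

    def wrap_page(self, title: str, body: str) -> str:
        return f"<!DOCTYPE html><title>{escape(title)}</title><body>{body}</body>"


class BootstrapLayout:
    """Stand-in for a Bootstrap-based replacement cube."""

    def wrap_page(self, title: str, body: str) -> str:
        return (
            f"<!DOCTYPE html><title>{escape(title)}</title>"
            f'<body><div class="container">{body}</div></body>'
        )


def render_page(layout: Layout, title: str, body: str) -> str:
    # Only the Layout interface is used here, so either cube can be plugged in.
    return layout.wrap_page(title, body)


if __name__ == "__main__":
    for layout in (DefaultLayout(), BootstrapLayout()):
        print(render_page(layout, "Home", "<p>Hello</p>"))
```

<p>Using a structural <cite>Protocol</cite> rather than a base class means a third-party cube does not even need to import the default one to be a valid replacement.</p>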
<p>Regarding the current implementation, contextual components
are a powerful way to build a "pluggable" UI, but we should probably add an
intermediate layer that would make more obvious / explicit:</p>
<ul class="simple">
<li>what the available components are</li>
<li>what the available slots are</li>
<li>which component should go in which slot when possible</li>
</ul>
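<p>Such an intermediate layer could be as simple as a registry that knows the declared slots, the registered components and their placement. The sketch below is only an assumption of what it might look like; <cite>SlotRegistry</cite> and its methods are hypothetical and unrelated to CubicWeb's actual registry API.</p>

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


class SlotRegistry:
    """Hypothetical layer listing slots, components and their placement."""

    def __init__(self) -> None:
        self.slots: Set[str] = set()                        # available slots
        self.components: Dict[str, Callable[[], str]] = {}  # available components
        self.placement: Dict[str, List[str]] = defaultdict(list)  # slot -> components

    def declare_slot(self, slot: str) -> None:
        self.slots.add(slot)

    def register(self, name: str, render: Callable[[], str],
                 slot: Optional[str] = None) -> None:
        # A component may come with a default slot...
        self.components[name] = render
        if slot is not None:
            self.place(name, slot)

    def place(self, name: str, slot: str) -> None:
        # ...or be placed explicitly later, e.g. by the application.
        if slot not in self.slots:
            raise KeyError(f"unknown slot: {slot}")
        if name not in self.components:
            raise KeyError(f"unknown component: {name}")
        self.placement[slot].append(name)

    def render_slot(self, slot: str) -> str:
        # Components are rendered in placement order.
        return "".join(self.components[name]() for name in self.placement.get(slot, []))


if __name__ == "__main__":
    registry = SlotRegistry()
    registry.declare_slot("header")
    registry.register("logo", lambda: "<img alt='logo'>", slot="header")
    registry.register("search", lambda: "<form></form>")
    registry.place("search", "header")
    print(sorted(registry.slots), sorted(registry.components))
    print(registry.render_slot("header"))
```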
<p>At some point, we should also separate views' logic from HTML
generation: our experience with client projects shows that a common need is to reuse
the logic but produce different HTML. However, we should wait for wider use of
Bootstrap and the related HTML simplification to see whether the power of CSS
partly fulfills that need.</p>
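<p>As a hedged illustration of that separation, the view logic below returns plain data and two interchangeable renderers turn it into HTML. The function names and the shape of the user dictionaries are made up for the example.</p>

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List


@dataclass
class UserListing:
    """Presentation-independent result of the view logic."""
    title: str
    logins: List[str]


def user_listing_logic(users: List[Dict]) -> UserListing:
    # Pure logic: filter and sort, no HTML involved.
    active = sorted(u["login"] for u in users if u.get("active"))
    return UserListing(title=f"{len(active)} active users", logins=active)


def render_default(listing: UserListing) -> str:
    items = "".join(f"<li>{escape(login)}</li>" for login in listing.logins)
    return f"<h1>{escape(listing.title)}</h1><ul>{items}</ul>"


def render_bootstrap(listing: UserListing) -> str:
    # Same data, different markup (Bootstrap list-group classes).
    items = "".join(
        f'<li class="list-group-item">{escape(login)}</li>' for login in listing.logins
    )
    return f'<h1>{escape(listing.title)}</h1><ul class="list-group">{items}</ul>'
```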
</div>
<div class="section" id="on-the-road-to-wsgi-and-related">
<h3><a>On the road to WSGI and related</a></h3>
<p>For the record, regarding WSGI:</p>
<ul class="simple">
<li><a class="reference" href="http://mongrel2.org/">http://mongrel2.org/</a></li>
<li><a class="reference" href="http://projects.unbit.it/uwsgi/">http://projects.unbit.it/uwsgi/</a></li>
<li><a class="reference" href="http://wiki.nginx.org/NgxWSGIModule">http://wiki.nginx.org/NgxWSGIModule</a></li>
</ul>
<p>At some point, the whole <cite>cw.etwist</cite> package should be dropped in favor of <cite>cw.wsgi</cite>.</p>
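<p>For reference, the contract a WSGI entry point such as <cite>cw.wsgi</cite> has to honour fits in a single callable. This sketch uses only the standard library's <cite>wsgiref</cite>; the application body is a placeholder, not CubicWeb code, and the same callable could be hosted by any WSGI server (uWSGI, for instance).</p>

```python
from html import escape
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """Minimal WSGI application: a callable taking environ and start_response."""
    path = environ.get("PATH_INFO", "/")
    body = f"<!DOCTYPE html><p>You asked for {escape(path)}</p>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The response body is an iterable of bytes.
    return [body]


if __name__ == "__main__":
    # Standard-library development server; production setups would use a
    # dedicated WSGI server instead.
    with make_server("localhost", 8080, application) as server:
        server.serve_forever()
```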
<div class="section" id="werkzeug">
<h4><a>Werkzeug</a></h4>
<p><a class="reference" href="http://werkzeug.pocoo.org/">http://werkzeug.pocoo.org/</a></p>
<p>Werkzeug sounds like a good candidate for a library that
would replace or simplify the request, httpcache, session and authentication modules (maybe
more), as well as the wsgi package. It sounds like the right candidate for
the following reasons:</p>
<ul class="simple">
<li>it's a non-intrusive WSGI library, not a web framework,</li>
<li>it's used by fairly popular frameworks (<a class="reference" href="http://www.openerp.com/community">openerp</a>, <a class="reference" href="http://flask.pocoo.org/">flask</a>),</li>
<li>I'm +1 on A. Ronacher's idea of a common request implementation for Python web
frameworks; let's experiment with it and promote it.</li>
</ul>
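<p>A minimal sketch of what delegating request parsing to Werkzeug looks like, assuming Werkzeug is installed: <cite>Request.application</cite> turns a function taking a request into a WSGI application, and Werkzeug's test client exercises it without a server. The greeting logic is only a placeholder.</p>

```python
from html import escape

from werkzeug.test import Client
from werkzeug.wrappers import Request, Response


@Request.application
def application(request):
    # Werkzeug's Request parses the WSGI environ (query string, form data,
    # cookies, headers), the kind of work a request module would delegate to it.
    name = request.args.get("name", "world")
    return Response(f"<p>Hello {escape(name)}</p>", mimetype="text/html")


if __name__ == "__main__":
    # The test client drives the WSGI application without starting a server.
    client = Client(application, Response)
    response = client.get("/", query_string={"name": "CubicWeb"})
    print(response.status_code, response.get_data(as_text=True))
```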
</div>
<div class="section" id="route-url-handling">
<h4><a>Route (URL handling)</a></h4>
<p>Investigate URL routing modules as a replacement for urlpublishing, urlrewrite and
apacherewrite.</p>
<p>Candidates are:</p>
<ul class="simple">
<li><cite>werkzeug.routing</cite>, which has noticeable pros: celebrated by A. Martelli,
provided by a library already on the wishlist, handles URL routing <em>AND</em> generation.</li>
<li><cite>routes</cite> (<a class="reference" href="http://routes.readthedocs.org/en/latest/">http://routes.readthedocs.org/en/latest/</a>), pros: used by Pylons,
features conditional matching based on domain, cookies, HTTP method... and
sub-domain support.</li>
<li><cite>selector</cite> (<a class="reference" href="http://lukearno.com/projects/selector/">http://lukearno.com/projects/selector/</a>)</li>
</ul>
<p>I have to say I'm somewhat impatient to find some time to give
<cite>werkzeug.routing</cite> a try. IMO, used well, it may introduce a structural change that
would make things much easier to understand and configure properly.</p>
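<p>A first experiment could look like the sketch below, assuming Werkzeug is installed. It shows both directions mentioned above: <cite>match</cite> maps a URL to an endpoint and arguments, and <cite>build</cite> generates the URL back from them. The URL scheme and endpoint names are hypothetical.</p>

```python
from werkzeug.routing import Map, Rule

# Hypothetical URL scheme; endpoints would map to controllers or views.
url_map = Map([
    Rule("/", endpoint="index"),
    Rule("/blogentry/<int:eid>", endpoint="blogentry"),
    Rule("/<etype>", endpoint="entity_list"),
])

# Binding the map to a host gives an adapter that can match and build URLs.
urls = url_map.bind("www.cubicweb.org")

if __name__ == "__main__":
    # Routing: URL -> (endpoint, arguments), with the int converter applied.
    print(urls.match("/blogentry/2356431"))
    # Generation: (endpoint, arguments) -> URL.
    print(urls.build("blogentry", {"eid": 2356431}))
```

<p>Having generation come from the same rules as matching is what would let one table replace the separate publishing and rewriting modules.</p>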
</div>
</div>
<div class="section" id="on-the-road-to-proper-tasks-management">
<h3><a>On the road to proper tasks management</a></h3>
<p>The current looping task / repo thread mechanism is used for various sorts of
things and has several problems:</p>
<ul class="simple">
<li>tasks don't all behave the same way in a multi-instance configuration (some should
be executed in a single instance, some in a subset); the task system was
originally written for a single-instance context; as of today this is (sometimes)
handled using configuration options (which have to be properly set in each
instance's configuration file);</li>
<li>tasks are a repository-only API, but we also need web-side tasks;</li>
<li>there is probably some abuse of the system that may lead to unnecessary
resource usage.</li>
</ul>
<p>Analyzing a sample <a class="reference" href="http://www.logilab.org/">http://www.logilab.org/</a> instance, below are the running looping
tasks by category. Tasks that have to run on each web instance:</p>
<ul class="simple">
<li><cite>clean_sessions</cite>, automatically closes unused repository sessions. Note that
<cite>cw.etwist.server</cite> also registers a Twisted task to clean web sessions. Changes
on this are imminent; they will be addressed in the upcoming session refactoring (which will
become more and more necessary to move forward on several points listed here).</li>
<li><cite>regular_preview_dir_cleanup</cite> (<cite>preview</cite> cube), cleans up files in the
preview filesystem directory. Could be executed by one (or some) of the web
instances, provided that the preview directory is shared.</li>
</ul>
<p>Tasks that should run on a single instance:</p>
<ul class="simple">
<li><cite>update_feeds</cite>, updates copy-based sources (e.g. datafeed, ldapfeed). Controlled
by the 'synchronize' source configuration (a persistent source attribute that may be
overridden per instance using <cite>CWSourceHostConfig</cite> entities).</li>
<li><cite>expire_dataimports</cite>, deletes <cite>CWDataImport</cite> entities older than an amount of
time specified in the 'logs-lifetime' configuration option. <strong>Not controlled
yet</strong>.</li>
<li><cite>cleanup_auth_cookies</cite> (<em>rememberme</em> cube), deletes <cite>CWAuthCookie</cite> entities
whose lifetime has expired. <strong>Not controlled yet</strong>.</li>
<li><cite>cleaning_revocation_key</cite> (<em>forgotpwd</em> cube), deletes <cite>Fpasswd</cite> entities with
a past <cite>revocation_date</cite>. <strong>Not controlled yet</strong>.</li>
<li><cite>cleanup_plans</cite> (<em>narval</em> cube), deletes <cite>Plan</cite> entities older than an
amount of time specified in the configuration. If 'plan-cleanup-delay' is set
to an empty value, the task isn't started.</li>
<li><cite>refresh_local_repo_caches</cite> (<em>vcsfile</em> cube), pulls or clones VCS repository
caches if the <cite>Repository</cite> entity asks to import_revision_content (hence web
instances should have an up-to-date cache to display file contents) or if the
'repository-import' configuration option is set to 'yes'; imports VCS repository
content as entities if the 'repository-import' configuration option is set and the
repository comes from the system source.</li>
</ul>
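<p>One possible way to make the "each instance" versus "single instance" distinction explicit is to declare it on each task rather than in per-instance configuration files. In this sketch, <cite>Scope</cite>, <cite>LoopingTaskSpec</cite>, <cite>tasks_to_start</cite>, <cite>start_looping</cite> and the <cite>is_primary</cite> flag are hypothetical; how the primary instance would be chosen (configuration, the zmq bus, a database lock...) is deliberately left open.</p>

```python
import threading
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Scope(Enum):
    EACH_INSTANCE = "each-instance"      # e.g. clean_sessions
    SINGLE_INSTANCE = "single-instance"  # e.g. update_feeds, expire_dataimports


@dataclass
class LoopingTaskSpec:
    name: str
    interval: float            # seconds between two runs
    func: Callable[[], None]
    scope: Scope


def tasks_to_start(specs: List[LoopingTaskSpec], is_primary: bool) -> List[LoopingTaskSpec]:
    """Select the tasks this instance should run (hypothetical policy)."""
    return [s for s in specs if s.scope is Scope.EACH_INSTANCE or is_primary]


def start_looping(spec: LoopingTaskSpec, stop: threading.Event) -> threading.Thread:
    """Run spec.func every spec.interval seconds until `stop` is set."""
    def loop() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(spec.interval):
            spec.func()

    thread = threading.Thread(target=loop, name=spec.name, daemon=True)
    thread.start()
    return thread
```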
<p>Some deeper thinking is needed here so we can improve things. That includes
thinking about:</p>
<ul class="simple">
<li>the inter-instance message bus based on zmq, introduced in 3.15;</li>
<li>the Celery project (<a class="reference" href="http://celeryproject.org/">http://celeryproject.org/</a>), an asynchronous task queue,
widely used and written in Python.</li>
</ul>
<p>Remember: the more CubicWeb-independent the tasks are, the better. We still want an
'all-integrated' approach though, e.g. not relying on external configuration of Unix-specific
tools such as cron. We should also see whether a hard dependency on Celery or
a similar tool could be avoided, and if not, whether it should be considered a
problem (for devops).</p>
</div>
<div class="section" id="on-the-road-to-an-easier-configuration">
<h3><a>On the road to an easier configuration</a></h3>
<p>First, we should drop the behaviour that differs depending on the presence of a '.hg' directory in
cubicweb's directory. It currently changes the location where cubicweb's external
resources (js, css, images, gettext catalogs) are searched for. In terms of
implementation, in a Mercurial checkout (i.e. when '.hg' is present):</p>
<ul class="simple">
<li><cite>shared_dir</cite> returns the <cite>cubicweb.web</cite> package path instead of the path to the
<cite>shared</cite> cube,</li>
<li><cite>i18n_lib_dir</cite> returns the <cite>cubicweb/i18n</cite> directory path instead of the path to the
<cite>shared/i18n</cite> cube,</li>
<li><cite>migration_scripts_dir</cite> returns the <cite>cubicweb/misc/migration</cite> directory path
instead of <cite>share/cubicweb/migration</cite>.</li>
</ul>
<p>Moving web-related objects as proposed in the Bootstrap section would resolve the
problem for the content of <cite>web/data</cite> and most of <cite>i18n</cite> (though some messages
will remain and additional effort will be needed here). Going further in this
direction, we may also clean up some schema code by moving <cite>cubicweb/schemas</cite> and
<cite>cubicweb/misc/migration</cite> to a cube (though only a small benefit is to be expected
here).</p>
<p>We should also have fewer environment variables... Let's see what we have today:</p>
<ul class="simple">
<li>CW_INSTANCES_DIR, where to look for instance configurations</li>
<li>CW_INSTANCES_DATA_DIR, where to look for instances' persistent data files</li>
<li>CW_RUNTIME_DIR, where to look for instances' run-time data files</li>
<li>CW_MODE, which, set to 'system' or 'user', predefines the above environment variables differently</li>
<li>CW_CUBES_PATH, additional directories where to look for cubes</li>
<li>CW_CUBES_DIR, location of the system 'cubes' directory</li>
<li>CW_INSTALL_PREFIX, installation prefix, from which we can compute paths to 'etc', 'var', 'share', etc.</li>
</ul>
<p>I would propose the following changes:</p>
<ul class="simple">
<li>CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to
~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;</li>
<li>CW_INSTANCES_DATA_DIR and CW_RUNTIME_DIR are replaced by configuration file
options, with smart values generated at instance creation time;</li>
<li>the above change should make CW_MODE useless;</li>
<li>CW_CUBES_DIR is to be dropped, CW_CUBES_PATH should be enough;</li>
<li>regarding CW_INSTALL_PREFIX, I lack experience with installations other than
Mercurial checkouts or Debian packages and don't know whether it can be avoided.</li>
</ul>
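<p>The proposed default for CW_INSTANCES_PATH could be resolved as sketched below. Treating the variable as an <cite>os.pathsep</cite>-separated list (like <cite>PATH</cite>) is an assumption suggested by its name, and <cite>instances_path</cite> is a hypothetical helper.</p>

```python
import os
from typing import List, Mapping, Optional


def instances_path(environ: Mapping[str, str], home: Optional[str] = None) -> List[str]:
    """Directories where instance configurations are looked up.

    Hypothetical semantics: an os.pathsep-separated list when set; otherwise
    ~/etc/cubicweb.d if it exists, else /etc/cubicweb.d (Unix platforms).
    """
    value = environ.get("CW_INSTANCES_PATH")
    if value:
        # Ignore empty entries such as a trailing separator.
        return [p for p in value.split(os.pathsep) if p]
    home = home if home is not None else os.path.expanduser("~")
    user_dir = os.path.join(home, "etc", "cubicweb.d")
    if os.path.isdir(user_dir):
        return [user_dir]
    return ["/etc/cubicweb.d"]


if __name__ == "__main__":
    print(instances_path(os.environ))
```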
<p>Last but not least (for the moment), the 'web' / 'repo' / 'all-in-one'
configurations, and the fact that the associated configuration file differs
for each of them, stink. Ideas to stop doing this:</p>
<ul class="simple">
<li>one configuration file per instance, with all options provided by the installed
parts of the framework used by the application;</li>
<li>activate 'services' (or not): web server, repository, zmq server, pyro
server. The default services to start are stored in the configuration file.</li>
</ul>
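<p>As a sketch of the single-file idea, assuming an INI-style file read with the standard <cite>configparser</cite> module: each part of the framework owns a section, and a <cite>services</cite> option in a <cite>[main]</cite> section lists what to start by default. The section and option names, as well as <cite>services_to_start</cite>, are hypothetical.</p>

```python
import configparser

# Hypothetical single configuration file for one instance.
EXAMPLE_CONFIG = """
[main]
services = web, repository

[web]
port = 8080

[zmq]
address = tcp://127.0.0.1:5555
"""

KNOWN_SERVICES = ("web", "repository", "zmq", "pyro")


def services_to_start(text, override=None):
    """Return the services to start: the file's default unless overridden."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if override is not None:
        wanted = list(override)
    else:
        raw = parser.get("main", "services", fallback="")
        wanted = [s.strip() for s in raw.split(",") if s.strip()]
    unknown = [s for s in wanted if s not in KNOWN_SERVICES]
    if unknown:
        raise ValueError(f"unknown services: {', '.join(unknown)}")
    return wanted
```

<p>The same file then works for a web-only, repository-only or all-in-one process; only the list of activated services changes, e.g. from the command line.</p>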
<p>There is probably more that can be done here (fewer configuration options?), but
that would already be a great step forward.</p>
</div>
<div class="section" id="on-the-road-to">
<h3><a>On the road to...</a></h3>
<p>The following projects should be investigated to see if we could benefit from them:</p>
<ul class="simple">
<li>Paste (<a class="reference" href="http://pythonpaste.org/">http://pythonpaste.org/</a>, Configuration and all)</li>
<li>Beaker (<a class="reference" href="http://beaker.readthedocs.org/en/latest/index.html">http://beaker.readthedocs.org/en/latest/index.html</a>, more session / cache handling than what Werkzeug provides?)</li>
<li>Pyramid's debug toolbar
(<a class="reference" href="http://docs.pylonsproject.org/projects/pyramid_debugtoolbar/en/latest/">http://docs.pylonsproject.org/projects/pyramid_debugtoolbar/en/latest/</a>). See
also <a class="reference" href="http://firelogger.binaryage.com/#python">http://firelogger.binaryage.com/#python</a>. Note that Werkzeug also comes with an
integrated JS console.</li>
<li>zc.buildout (Deployment)</li>
</ul>
</div>
<div class="section" id="discussion">
<h3><a>Discussion</a></h3>
<p>Remember the following goal: the migration of legacy code should go smoothly. In a perfect world, every application should be able to run with CubicWeb 4.0 until the backwards compatibility code is removed (and CubicWeb 4.0 will probably be released as 4.0 at that time).</p>
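<p>Backwards compatibility code of this kind often boils down to keeping old names working while warning their users. Here is a minimal sketch using only the standard <cite>warnings</cite> module; <cite>deprecated</cite>, <cite>old_api</cite> and <cite>new_api</cite> are hypothetical names.</p>

```python
import functools
import warnings


def deprecated(reason):
    """Decorator keeping a legacy callable working while warning its users."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this wrapper.
            warnings.warn(f"{func.__name__} is deprecated: {reason}",
                          DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def new_api(x):
    return x * 2


@deprecated("use new_api() instead")
def old_api(x):
    # Legacy entry point, kept until backwards compatibility code is removed.
    return new_api(x)
```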
<p>Please provide feedback:</p>
<ul class="simple">
<li>do you think the choices proposed above are good or bad? Why?</li>
<li>do you know of additional libraries that should be investigated?</li>
<li>do you have other changes in mind that could/should be done in cw 4.0?</li>
</ul>
</div>
<p>Sylvain Thenault, 2012-05-21T15:04-01:00</p>