1. Install the bookmarklet from the CMS page. You only have to do this once. [Note: the page will prompt for an ASF committer login. If you don't have an ASF login, use the name anonymous and leave the password blank.]

2. Navigate to the page you wish to edit (on the live site, not in the CMS).

3. Click the bookmarklet. There will be a short pause while the CMS is initialised for you.

4. Click on Edit. (To skip this step, hack the bookmarklet to add an 'action=edit' parameter to its query string.)

5. The page editor should then be displayed.

6. Click Submit to save your edit to the workarea.

7. Click Commit to save the updated file to SVN and trigger a staged build. (To skip this step, check the "Quick Commit" checkbox in the Edit form.)

8. The results should appear shortly on the staging site. (You may have to force the page to refresh in order to see the updated content.)

9. Once you are happy with the updated page, click on Publish Site to deploy.

This section describes the current conditions of the ASF website publishing
system and its deficiencies. It also discusses options the Infrastructure
Team considered in addressing these problems with an eye towards our future
needs.

The existing publishing system at Apache has evolved from the days when
the organization's hardware consisted of a single machine. Websites have
always been limited to a combination of static content and CGI scripts,
so as not to overtax a machine simultaneously responsible for
delivering (circa 2000-2003) over 1M hits and serving committers as our
CVS master host.

The organization has since grown to encompass about three full cabinets'
worth of hardware, including a pair of machines dedicated mainly to serving
www.apache.org and project websites. These machines, eos and aurora, are
some of our most expensive equipment and are located in two different
datacenters to provide redundancy and failover capabilities. The
current traffic load is roughly 20M hits a day for those machines.

However, the publishing system involves running hourly find jobs on
people.apache.org and pushing that content out to eos and aurora with
rsync. With roughly 300GB of content to scan, it is no longer
possible to do this with a single find job, so we now run them in
parallel: one find job per website. This puts an incredible load on
people.apache.org's ZFS array, as there are roughly 100 sites to scan.
As good as ZFS is, the filesystem will not be able to keep up with this
load as the organization continues to promote new top-level projects.

Several years ago, during the wiki craze at Apache, the Infrastructure
Team was tasked with setting up a Confluence installation for our projects
to use. Apache member Pier Fumagalli developed and offered the autoexport
plugin as a way to provide Confluence-backed project websites, and it was
quickly adopted by several projects. The process involves rsyncing the
autoexported pages from the machine hosting Confluence over to
people.apache.org, where the standard publication system described above
pushes those pages out to eos and aurora to be served live.

Over time we began to experience chronic problems with this particular
setup. First, different projects often wanted to use different, and
occasionally conflicting, plugins for their sites. Second, plugins would
often break during Confluence upgrades. The biggest offender was in fact
the autoexport plugin, with its reliance on Confluence internals. Virtually
every upgrade was guaranteed to break it, and after a while Pier and other
Java developers at Apache lost interest in supporting it. We asked
around for people to support it, and were even willing to compensate folks
for their time, but there were no takers. Confluence-backed websites were
fully dependent on the autoexport plugin to have any chance of working,
and the organization was caught between a rock and a hard place in deciding
when it was possible to upgrade Confluence.

The other main problem with this configuration is that it makes URL
deletions a nightmare. The autoexport plugin doesn't support URL
deletions, and rsync carries that shortcoming through to the live sites,
so deleted pages are never removed.

Currently Apache's Confluence installation is hosted on thor, a Sun
T5220 Sparc. It is by far our beefiest machine, with 8 cores and 8
threads per core, and yet our Confluence service is dog slow. Our
installation is simply out-scaling the software, and keeping it performing
acceptably will require even more significant equipment investments going
forward.

Anakia was a great tool 10 years ago. It competes with XSLT as a
technology for dealing with raw XML content. Many projects still rely on
Anakia to generate their webpages, but most of the web has moved on. It's
time the ASF caught up with the times.

While Apache is still primarily a place for software developers to
collaborate, some of the people who provide support for our press
and legal efforts need to be able to contribute to www.apache.org.
Expecting them to deal with tools like Anakia to roll their own builds
of XML-based content is a non-starter.

Obviously, with hourly crons pushing content out to our webservers, there
can be delays as long as 2 hours between the time someone commits a change
and logs onto people.apache.org to svn up the website, and the time it
actually gets synced to the live site. That has been the status quo at
Apache for several years, and it simply isn't good enough any longer.

While there is a zoo of available open source CMSes to choose from, only
a handful of them actually support exporting static content. Even fewer
offer support for staging. Apache's project websites aren't
like Twitter: they don't have rapidly changing content that needs to be
updated and delivered in real time. The sites are meant to provide
stable resources for the public to gain necessary information about the
software we develop.

While CQ5 is not an open source offering, Roy T. Fielding pursued an
installation of it for the organization's use. Roy demoed the featureset
at ApacheCon US 2009, and the members of the Infrastructure Team who saw
it were thoroughly impressed. It seemingly met all of our core
requirements.

However, conditions changed for Roy in 2010, and he simply lost any
free time he could have put toward this effort. We had to eliminate this
option going forward, but we thank Roy and Day for their time and
consideration.

Lenya had most of the features we were looking for, but it was ultimately
rejected as insufficiently flexible for use as a foundation-wide CMS. The
Infrastructure Team's preferred strategy was to allow projects the
flexibility of deploying per-project site build technologies, limited only
by the software installed on the build host.

In September 2010 Philip Gollucci, VP Infrastructure, gave the green light
to a custom-built CMS for the ASF, to be developed primarily by one of the
contracted System Administrators. After collecting feedback on the goals
and requirements from several interested parties, the development work
was undertaken with a goal of completing the work in 60 days or less,
just in time for ApacheCon 2010 NA. Fortunately the goals were kept simple
enough that the actual development time spanned only about 30 days.

The software follows the Unix development mantra of separate executables
for independent activities. The key separation was to keep content
presentation independent of content editing, using the addressability of
the web to stitch things together. The main advantage of this approach is
that it imposes relatively few constraints on the content generation
software: different projects may adopt different tools to build their
websites, without any of the conflicts inherent in single-process plugin
architectures like Confluence's.

While Dotiac::DTL, a perl port of django's templating library, was chosen
for use with www.apache.org, projects are not required to adopt it. Any
templating system that runs on FreeBSD may be used, provided the necessary
(perl) glue code is written to make the system compatible with the CMS's
build system.
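
As a rough illustration only (this is not the actual www.apache.org glue,
and the template path and variables below are invented), rendering a page
with Dotiac::DTL from perl looks something like this:

    # a minimal sketch; the template file and variables are made up
    use strict;
    use warnings;
    use Dotiac::DTL;

    # load a django-style template and render it with page data
    my $template = Dotiac::DTL->new("templates/page.html");
    print $template->string({ title => "About", content => "<p>Hello</p>" });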

The CMS relies on buildbot to provide
automated builds and checkins of a project's staging site. Such builds are
triggered instantly on commits to the project's site source material and
are an essential component of the system.

The build system executes builds in parallel, so it is quite fast, even for
a full site build.

Markdown was chosen as the format for the www.apache.org source content.
Editing the source in the CMS's webgui relies on the wmd-editor to
provide a WYSIWYM look and feel.

Although it is strongly recommended that projects migrating to the CMS
adopt markdown, it is not a hard requirement. In fact, the codemirror
editor is also provided as an option for those who prefer to store their
source content in raw html.

The CMS's overall design was influenced heavily by
django's architecture. From the build
system to the preferred template system to the webgui, the influences are
clear and obvious to anyone familiar with django.

Instead of developing versioning support and a notification scheme for a
database-driven CMS, Apache's subversion infrastructure was chosen as the
central data store for everything. The fact that the web interface to the
CMS interacts with the subversion repository in a LAN environment,
combined with the lightning-fast SSDs that serve as l2arc cache for the
underlying FreeBSD ZFS filesystem, eliminates virtually all subversion
network/disk latency. Subversion continues to scale past 1M commits,
delivering high performance to Apache developers as well as to our
internal programs that rely on it.

The mod_perl-based webgui is under 3500 LOC and takes full advantage of
the httpd module API. Being an in-process application, it is respectably
fast and will scale well even on the limited hardware (a FreeBSD jail)
that it runs on.

The application embraces the REST architectural style, making
appropriate use of cookies solely to enhance the user experience.
It is also LDAP-enabled rather than yet another auth silo to deal with,
so your svn committer credentials will instantly grant access to the site.

It was also designed for humans already familiar with the featureset
of the svn command-line tool, taking cues from the Emacs svn.el module.
However, it is accessible even to those without any familiarity with
svn: a simple javascript bookmarklet allows users to go from a live
webpage to a WYSIWYM editor session in 2 clicks. Submitting, committing,
and publishing those changes is just as simple and straightforward. You
may access the CMS anonymously if you are not currently an Apache
committer.

Because the webgui revolves around providing users with a temporary
server-side working copy, the URLs it generates are not meant to be
bookmarked and must not be shared with others. The fulcrum for sharing
changes is the staging site, and the "commits are easy and cheap" concept
is built into the webgui.

However, the URL for publishing a website may be considered an appropriate
basis for a basic web service client app. Since the site is based in
subversion, developers may check out the site and commit directly from
their workstations instead of going through the webgui, so it may be
convenient for project members to have a simple site publication script.
This choice is entirely up to each project, and a reference implementation
is available at http://s.apache.org/cms-cli. Virtually every resource on
the site may be served as application/json simply by adding as_json=1
to the query string, or by setting application/json as preferable to
text/html in the "Accept" request header.
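
For example, a perl client could request JSON like so (a hedged sketch:
the resource URL below is a placeholder, not a real CMS endpoint):

    # sketch: fetch a CMS resource as JSON; the URL is a placeholder
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new;
    my $url = "https://cms.example.org/some/resource";
    # either set the Accept header...
    my $res = $ua->get($url, Accept => "application/json");
    # ...or append as_json=1 to the query string: $ua->get("$url?as_json=1");
    print $res->decoded_content if $res->is_success;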

In order to scale effectively to multi-gigabyte websites, the webgui
relies on zfs clones to create per-user working copies. The alternative
would be to physically copy working copy trees (with, say, rsync or
cp -R), but such copies are O(N) in the size of the tree, whereas a zfs
clone (essentially a copy-on-write version of the original) is O(1).
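
In zfs terms the trick is a snapshot plus a clone, both constant-time
operations. A hedged sketch (the dataset names are invented, and this is
not the webgui's actual code):

    # sketch: create a copy-on-write working copy for one user
    use strict;
    use warnings;

    my ($user, $site) = ("jdoe", "www");        # hypothetical values
    my $origin = "tank/cms/$site";              # pristine checkout dataset
    my $snap   = "$origin\@wc-$user";           # O(1) snapshot
    my $clone  = "tank/cms/wc-$user-$site";     # O(1) copy-on-write clone

    system("zfs", "snapshot", $snap) == 0
        or die "snapshot failed: $?";
    system("zfs", "clone", $snap, $clone) == 0
        or die "clone failed: $?";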

Svnpubsub was developed by Paul Querna to provide an infrastructure for
distributing change notifications to our frontline webservers (eos and
aurora). The CMS uses this system to convert site publication requests
into live publications, and it will eventually supplant the existing
find + rsync architecture for site publication. It is a key component of
Apache's infrastructure and will continue to be promoted going forward,
even for projects that elect not to use the CMS.

Despite the above remarks, there is still room for supporting the
generation of "dynamic" content, in the same fashion that Planet
Apache works. Namely, buildbot may be set up to run periodic builds of
select urls that carry dynamic content, and to subsequently publish the
results of those builds. While it is possible to run these jobs more
frequently than once an hour, it is not recommended because of the email
notification traffic they generate.

Since the CMS relies on separate sections of svn for original content
and staging versus publication, it is possible to configure more relaxed
ACLs for content authors than for those with publication rights. The
Infrastructure Team recommends that the content on www.apache.org be
editable by the full committership, while publication remains restricted
to members, committers with apsite karma, and members of the Infrastructure
Team.

The source content MUST have a unique file extension for each
generated file; that is, you cannot generate foo.pdf and foo.html from
the same source file living in the same directory. You must disambiguate
the paths to these resources using copies or svn externals (symlinks are
not supported, sorry).

There is a further restriction in that the webgui and build system treat
foo.page/ directories as attachment directories. This convention
prevents any files contained therein from being built; instead they may be
treated as content components (eg html snippets and images) for an
individual webpage.
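
For example, a layout along these lines (file names invented) keeps a
page's images and snippets next to the page without having them built as
standalone pages:

    content/
        foo.mdtext            # built into foo.html
        foo.page/             # attachment directory for foo.html
            diagram.png       # served as-is, never built
            sidebar.html      # usable as an html snippet component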

Moreover, the source files MUST be UTF-8 encoded; no exceptions.

Content source files with .mdtext or .md extensions are typically expected to
contain optional RFC-compliant (mail or http) headers at the top of the file, or YAML
headers as is customary in comparable, modern static site generation tools.
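
A source file might therefore begin like this (the header name and body
shown are only illustrative):

    Title: Sample Page

    # Sample Page

    Regular markdown body text goes here.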

The build system is under 2000 LOC and relies on lib/path.pm to provide a
specially formatted @patterns array that gives the build system hints
about which view to run for a given source file. The patterns are checked
in order, and if none of them match, the source file is simply copied over
to the build tree. Each element of the @patterns array is an arrayref
consisting of 3 items: the pattern to test, the name of the view function
to call, and a hashref of named parameters to pass (by value) to the view
function. The patterns are tested against files based on their location
rooted within the content/ subdirectory.
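
A hedged sketch of what such a lib/path.pm might contain (the regex, view
name, and template below are all illustrative, not taken from the actual
www.apache.org configuration):

    # sketch of lib/path.pm: [pattern, view name, named params] triples
    package path;

    use strict;
    use warnings;

    our @patterns = (
        # render .mdtext sources with a "single_narrative" view (made-up name)
        [ qr!\.mdtext$!, single_narrative => { template => "page.html" } ],
        # files matching no pattern are simply copied to the build tree
    );

    1;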

lib/path.pm may also provide a hash %dependencies mapping paths to array
refs. Each key names a file which will also be rebuilt whenever one of
the files listed in the corresponding value has changed. (This is
typically used for sitemaps.) The filenames in both the keys and the
values are rooted in the content/ subdirectory. The dependency
calculation is transitive.
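
Continuing the sketch above, a %dependencies entry for a sitemap might
look like this (paths invented):

    # rebuild sitemap.mdtext's output whenever either listed file changes
    our %dependencies = (
        "/sitemap.mdtext" => [ "/index.mdtext", "/foundation/index.mdtext" ],
    );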

The build system also requires the view functions in lib/view.pm to
return 2 values, the first being the generated content, and the second
being the new file extension.
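
So a minimal view, under the assumption (not confirmed here) that the
matched path arrives as a named "path" parameter, might look like:

    # sketch of a lib/view.pm function returning (content, extension)
    package view;

    use strict;
    use warnings;

    sub hello {
        my %args = @_;                       # named params from @patterns
        my $html = "<p>built from $args{path}</p>";
        return ($html, 'html');              # content, new file extension
    }

    1;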

The build system will always take the local path to trunk/ as
the current working directory for the build (branches are currently
unsupported).

Changes to either the templates/ or lib/ subdirs will trigger a full site
build.

A detailed walkthrough is available for folks working on site design.
Note that the typical ASF::View-based views now support template
preprocessing of source content by passing a preprocess => 1 argument to
the configured view in path.pm.
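
In @patterns terms that might look like the following entry (again, the
names are illustrative):

    # enable template preprocessing of the source content for this view
    [ qr!\.mdtext$!, single_narrative =>
        { template => "page.html", preprocess => 1 } ],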

With the introduction of svn 1.7+ working copies, it becomes possible to
plug in a wide variety of build systems functionally similar to the
standard perl system described above: think maven, ant, forrest, etc. If
this interests you, please discuss the matter further on the
infrastructure@ mailing list. It is not unfair to describe this CMS as
simply a CI tool with a basic web browser interface.

After going live with www.apache.org, the next project we would like to
tackle is the incubator website. It too is based on Anakia, but thanks
to Sam Ruby there is an XSLT file available to help automate the
conversion from xdoc to markdown sources. We would like to complete this
migration by March 1, 2011.

After migrating the incubator site, we will branch out and approach any
Apache project still using Anakia about converting to the CMS. This will
of course be a project decision, but we hope the advantages of migration
will be clear and well appreciated by PMC members. We hope to complete
this process during the summer of 2011. Update: see ant adoption
for new options for projects still stuck on Anakia.

The next long-term project to tackle is the eventual phaseout of
Confluence-backed websites. This will be an extensive project requiring
the development of content conversion tools, but the clock is ticking on
how long we can continue to run Confluence without any support for the
autoexport plugin. Update: see confluence adoption for new options for
projects still stuck on Confluence.

The final long-term objective is to completely eliminate people.apache.org
as the publication hub for Apache websites. Security considerations alone
make this a worthwhile goal, and to make this happen we would like to
mandate the adoption of at least svnpubsub for all projects by the end of
2012.