On Wed, Jan 15, 2014 at 12:22 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> I've recently been having discussions about how to handle the
> repository configuration for various bits of CouchDB post-merge. The
> work that Benoit has been doing on the rcouch merge branch has also
> touched on this topic.
>
> The background for those unfamiliar is that the standard operating
> procedure for Erlang is to have a single Erlang application per
> repository and then rely on rebar to fetch each dependency.
> Traditionally in CouchDB land we've always just included the source to
> all applications in a single monolithic repository and periodically
> reimport changes from upstream dependencies.
>
> Recently rcouch changed from the monolithic repository to use external
> repositories for some dependencies. Originally BigCouch used an
> even more federated scheme that had each Erlang application in an
> external repository (and the core couch Erlang application was in the
> root repository). When Bob Newson and I did the initial hacking on the
> BigCouch merge we pulled those external dependencies into the root
> repository reverting back to the large monolithic approach.
>
> After trying to deal with the merge and contemplating how various
> Erlang release things might work it's become fairly apparent that the
> monolithic approach is a bit constrictive. For instance, part of
> rebar's versioning abilities lets you tag repositories to generate
> versions rather than manually updating versions in source files.
> Another thing I've found on other projects is that having each
> application in a separate repository requires developers to think in
> a bit more detail about the public internal interfaces used
> throughout the system. We've done some work to this end already with
> separating source directories but forcing commits to multiple
> repositories shoots up a big red flag that maybe there's a high level
> of coupling between two bits of code.
>
> Another benefit of the multiple repository setup is that it could
> lend itself to being integrated with the proposed
> plugin system. It'd be fairly trivial to have a script that went and
> fetched plugins that aren't developed at Apache (as a ./configure time
> switch type of thing). Having a system like this would also allow
> groups focused on particular bits of development to not have to
> concern themselves with the unrelated parts of the system.
>
> Given all that, I'd like to propose that we move to having a
> repository for each application/dependency that we use to build
> CouchDB. Each repository would be hosted on ASF infra and mirrored to
> GitHub as expected. This means that we could have the root repository
> be a simple repo that contains packaging/release/build stuff that
> would enable lots of the ideas offered on configurable types of
> release generation. I've included an initial list of repositories at
> the end of this email. It's basically just the apps that have been
> split out in either rcouch or bigcouch plus a few other bits from
> CouchDB master.
>
> I would also point out that even though our main repo would need to
> fetch other dependencies from the internet to build the final output,
> we fully intend that our release tarballs would *not* have this
> requirement. I.e., when we cut a release, part of the process the
> release manager (RM) runs would be to pull all of those dependencies
> before creating a tarball that would be wholly self-contained. Given an
> apache-couchdb-x.y.z.tar.gz release file, there won't be a requirement
> to have access to the ASF git repos.
>
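
As a rough illustration of the release step described above, here is a minimal Python sketch. It assumes the dependencies have already been fetched into the checkout; the function name, the archive prefix and the decision to drop `.git` directories are assumptions made for this example, not the actual release process.

```python
import tarfile
from pathlib import Path


def build_release_tarball(source_root, output_path, prefix):
    """Pack source_root (dependencies already fetched) into a .tar.gz.

    Every member is stored under `prefix/`. Version-control metadata is
    skipped, so unpacking the archive needs no access to any git repo.
    """

    def skip_vcs(tarinfo):
        # A tarfile filter that returns None excludes that member.
        if ".git" in Path(tarinfo.name).parts:
            return None
        return tarinfo

    with tarfile.open(output_path, "w:gz") as tar:
        tar.add(str(source_root), arcname=prefix, filter=skip_vcs)
    return output_path
```

The output path should live outside `source_root`, otherwise the archive would try to include itself.
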
> I'm not entirely sure how controversial this is for anyone. For the
> most part the reactions I remember hearing were more concerned with
> whether the infrastructure team would allow us to use this sort of
> configuration. I looked yesterday and asked and apparently it's
> something we can request but as always we'll want to verify again if
> we have consensus to move in this direction.
>
> Anyone have comments or flames? Right now I'm just interested in
> feeling out what sort of (lack of?) consensus there is on such a
> change. If there's general consensus I'd think we'd do a vote in a
> couple weeks and if that passes then start on down this road for the
> two merge projects and then it would become part of master once those
> land (as opposed to doing this to master and then attempting to merge
> rcouch/bigcouch onto that somehow).
>
>
> This is a quick pass at listing what extra repositories I'd have
> created. Some of these applications only exist in the bigcouch and/or
> rcouch branches so that's where the unfamiliar application names are
> from. I'd also point out that the documentation and fauxton things are
> just on a whim in that we could decouple that development from the
> Erlang development. I can see arguments for and against those. I'm much
> less concerned about that aspect than the Erlang parts that are directly
> affected by rebar/Erlang conventions.
>
> chttpd
> config
> couch
> couch_collate
> couch_dbupdates
> couch_httpd
> couch_index
> couch_mrview
> couch_plugins
> couch_replicator
> documentation
> ddoc_cache
> ets_lru
> fabric
> fauxton
> ibrowse
> jiffy
> mem3
> mochiweb
> oauth
> rebar
> rexi
> snappy
> twig
>
I also contemplated this and I am generally +1 on it. And definitely
+1 to mirroring them on the Apache git if possible. I have a couple of
comments though.
Initially I also had everything separated into its own source repository.
A year ago I merged the CouchDB Erlang applications back into one core repo
and put all the dependencies in the refuge repository, or in the refuge CDN
for the SpiderMonkey and ICU sources.
I merged the CouchDB Erlang applications back into one core repo because
they were a little too dependent on each other, especially couch_httpd,
couch_index and couch_mrview. These applications cannot yet stand on their
own.
IMO, if we split everything into its own app, then we should make sure
that couch_httpd can be used without couch_index and couch_mrview (which
means that "all_docs" is available in couch_httpd). Also, we should be able
to just launch couch without any of the above, and probably without
needing an ini file. The couch_query_server module is an interesting case.
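
One way to check this kind of decoupling is to look at the transitive dependencies of each application. The Python sketch below assumes the dependency lists have been collected into a plain mapping (for example from each application's declared dependencies). The mapping shown is hypothetical and does not reflect the real dependency graph.

```python
def transitive_deps(app, deps):
    """Return every application reachable from `app` in the `deps` mapping."""
    seen = set()
    stack = list(deps.get(app, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen


def unwanted_deps(app, deps, forbidden):
    """Return the forbidden applications that `app` still depends on."""
    return transitive_deps(app, deps) & set(forbidden)


# Hypothetical dependency declarations, for illustration only.
EXAMPLE_DEPS = {
    "couch": [],
    "couch_index": ["couch"],
    "couch_mrview": ["couch", "couch_index"],
    "couch_httpd": ["couch", "couch_mrview"],
}

# Prints ['couch_index', 'couch_mrview'] for the example mapping above.
print(sorted(unwanted_deps("couch_httpd", EXAMPLE_DEPS,
                           ["couch_index", "couch_mrview"])))
```

An empty result for couch_httpd would mean it can be built and started without the indexing applications.
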
bigcouch is also introducing `ddoc_cache`, and I am not sure why it is
provided as a standalone app. Does it mean it can eventually be replaced by
another application? Why not simply have it in the couch application?
Does it need to be updated separately?
Also, all our base applications should be namespaced correctly so
they can be unambiguously identified as Erlang modules: "config" is too
generic, "ddoc_cache" too. Others are probably OK.
There are probably other things that we could provide as apps:
- couch_daemon
- couch_js
- couch_external
- couch_stats
- couch_compaction_daemon
- couch_httpd_proxy
Anyway, again, I'm +1 for this move. I really think it's a good idea.
- benoit