Last weekend I was in Birmingham for State of the Map, to learn how we could get more involved with OSM in a number of projects we have down the line.

Although I’m a casual mapper and I did know some things about OSM and its core technologies, this was my first in-depth immersion into that world. Note also that during the conference I followed a specific path through the many choices we had, so do not expect me to write a complete summary of the conference, nor a hands-on guide on “How Mapnik stylesheets were ported to CartoCSS” (I enjoyed that talk a lot, by the way!). I’ll focus on the community side of the conference.

Other than that, OSM is struggling with growth. For me, there is a subtle line which connects Alyssa Wright’s “Changing the Ratio of OSM communities”, Richard Fairhurst’s “You are not the crowd”, the tools built by the Mapbox guys for the next generation of contributors, the world-class documentation the HOT team is creating and the multiple talks on gamification during the conference: they’re all talking about how OSM should grow, be it the social side of it (how could we engage new contributors?) or the technical one (what tools do we need for people to find it easy to work with OSM?). That is a challenge, but a challenge that most of the communities I know would like to have.

As an outsider, I got the impression that OSM is like a teenager that still has to define itself in some aspects. And my belief is that if it manages to do so smoothly, it will have an even brighter future ahead.

(Geo) Database evolution while developing

http://nosolosoftware.com/geo-database-evolution-while-developing/
Sun, 11 Nov 2012 18:10:23 +0000

Over the last year, I followed with interest the different approaches to evolving the design of a database that were being discussed within the PostgreSQL community. What follows is my take on the matter: how this year I developed a project with an intensely evolving DB design using an agile approach.

The context

My requirements for this project were twofold:

An evolving DB design: at the beginning of the project I didn’t know what the DB design was going to be. I had set out to use some advanced data modeling techniques which I had never used in production (dynamic segmentation and linear referencing with PostgreSQL/PostGIS) and needed an approach which supported my evolving understanding of the domain.

Intense collaboration with analysts: the project needed some intense data-processing work to polish and create the data for the application. I knew this was going to be an iterative process where developers and analysts would collaborate to define and clarify the model we needed.

My approach

So, in the process of improving and automating my delivery pipeline, I set some rules for the project:

DB management through SQL and version control: the database was created from DDL scripts and data was stored as CSV (if alphanumeric) or SQL (generated from Shapefiles to store geographical information).

Application and database evolve together: so their code should too, which in practice means I put the app and DB directories/projects under the same git repo.

Test-driven development: I needed to break the problem into small chunks I could deal with, while my understanding of the domain improved. Besides, when refactoring the DB (schemas, triggers, functions, etc) -which happened frequently- I needed to know all the pieces were working OK. I decided to use pgTAP for that.
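To make the first rule more concrete, here is a minimal sketch of how a rebuild could be ordered: DDL scripts first, then the CSV data, then the SQL generated from Shapefiles. The function name, the file names and the convention that a CSV file is named after the table it fills are assumptions made for illustration; the actual scripts of the project were shell scripts and are not shown in this post.

```python
from pathlib import PurePosixPath


def rebuild_plan(dbname, ddl_files, csv_files, geo_sql_files):
    """Return psql command lines, in execution order, to rebuild a database.

    Hypothetical convention (not from the original project): every CSV file
    is named after the table it fills, e.g. data/roads.csv -> table "roads".
    """
    base = ["psql", "-d", dbname, "-v", "ON_ERROR_STOP=1"]
    plan = []
    # 1. Structure first: DDL scripts, applied in file-name order.
    for ddl in sorted(ddl_files):
        plan.append(base + ["-f", ddl])
    # 2. Alphanumeric data kept as CSV, loaded with psql's client-side \copy.
    for csv_file in sorted(csv_files):
        table = PurePosixPath(csv_file).stem
        copy = f"\\copy {table} FROM '{csv_file}' WITH (FORMAT csv, HEADER)"
        plan.append(base + ["-c", copy])
    # 3. Geographic data kept as SQL generated from Shapefiles.
    for geo in sorted(geo_sql_files):
        plan.append(base + ["-f", geo])
    return plan
```

Each entry of the returned list could then be handed to `subprocess.run(cmd, check=True)`, so the same plan serves a laptop rebuild and a deployment.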

And how did it turn out?

The pipeline worked smoothly: both the analysts and developers were working in their comfort zone with the proper tools; desktop GIS applications for the former, command line and SQL for the latter.

git provides an excellent mechanism for versioning text, so I had powerful tools at hand for versioning SQL structure and data (diff, cherry-pick, interactive rebases, etc). Besides, seeing where the data was varying (name and type of fields, their values, etc) allowed us to discover some bugs and problems early.

Database and application evolving at the same pace. By tagging the versions we can build in seconds the binaries needed for any version of the application with the proper DB.

Tests at DB level are a life-saver. pgTAP allowed me to refactor the database with no risk and a lot of confidence in what I was doing. I had all kinds of tests: checking that a trigger is fired when an UPDATE happens, that a function is working, data integrity and model validation after the initial restore, etc.

Same process for deploying to development, staging and production environments, which resulted in fewer errors and no panic moments.

Having the data in the repo and regenerating the DB from scratch was very comfy and quick (less than a minute on my laptop for the whole DB: 100 MB of raw SQL), with similar numbers when deploying to staging over the wire. On a daily basis I only had to regenerate specific schemas of the DB, so the wait was a matter of seconds.

At the tools level, I was reluctant to introduce new tools and steps I didn’t know very well in such a tight schedule, so I decided to stick to the basic and spartan (git repo, shell scripts, pgTAP and SQL), then iterate and grow a solution for our specific case. Although I missed some refactoring tools, it turned out to be a good approach and now I’m in a good position to know the tradeoffs of the process, which in future projects will help me to choose a specialized tool, if necessary.
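As an illustration of the kind of database-level test mentioned above (checking that a trigger fires when an UPDATE happens), here is a small self-contained sketch. It is not pgTAP and not the project's code: it uses Python's built-in sqlite3 module as a stand-in so it can run anywhere, and the tables, the trigger and the road names are invented for the example.

```python
import sqlite3


def audit_rows_after_update():
    """Create a tiny schema with an audit trigger, run an UPDATE and
    return the rows the trigger wrote. All names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE road (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE road_audit (road_id INTEGER, old_name TEXT, new_name TEXT);
        CREATE TRIGGER road_name_audit AFTER UPDATE OF name ON road
        BEGIN
            INSERT INTO road_audit VALUES (OLD.id, OLD.name, NEW.name);
        END;
        INSERT INTO road (id, name) VALUES (1, 'Old road');
        """
    )
    # The statement under test: it should make the trigger fire exactly once.
    conn.execute("UPDATE road SET name = 'New road' WHERE id = 1")
    rows = conn.execute(
        "SELECT road_id, old_name, new_name FROM road_audit"
    ).fetchall()
    conn.close()
    return rows
```

The test itself is then a single assertion on the audit rows. With a suite of checks like this one, a refactoring of the schema can be validated right after every rebuild.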

Analysis of free software communities: coda

http://nosolosoftware.com/analysis-free-software-communities-coda/
Mon, 26 Mar 2012 15:20:55 +0000

As you can see in my last posts (I, II, III, IV and V), I finally managed to translate the paper we released last year at the V Jornadas de SIG Libre (please, pardon my English!). It took me a year, and having my wisdom teeth removed, to find the time.

Our intention (Fran’s and mine) when this paper first popped out of our heads was to foster debate on the best practices around a free software project. While at CartoLab, we presented the idea to Alberto; he encouraged us to work on it and gave us the time and resources needed; in the later stages he also helped to polish the trends and conclusions. I’m deeply grateful for all his patience and empathy.

I’m very proud of the work we have done: the first study of this kind in the GIS arena, and somehow a picture of 10 years of FOSS4G software development (for the desktop side). I hope the study is worth the effort and it continues to create debates on how to better work together.

Images: on the left, contributions of top 3 developers along the project history; on the right, evolution of developers participating during 2010.

Data: trunk from project repositories during the period 1999-2010.

Is there something we could extrapolate from the data?

This indicator gives us some sense of how the leadership changed and how the knowledge transfer was done in every project. The paper elaborates a bit more on turnover and the integration of new blood in each project (highly correlated with this indicator), with statistics for the top 10 developers.
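For readers who want to reproduce this kind of indicator on their own repository, the share of the top developers can be computed from a plain list of commit authors. The sketch below is only an approximation of the idea, not the scripts used for the paper; the function name and the input format (one author name per commit) are assumptions.

```python
from collections import Counter


def top_n_share(commit_authors, n=3):
    """Fraction of all commits made by the n most active authors.

    commit_authors: iterable with one author name per commit (hypothetical
    input; it could come from the repository log of the trunk).
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total
```

Running it once per year, instead of over the whole history, would give the evolution of the top 3 share over time.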

All that will give us some insights on every project:

GRASS

The charts and data depict how a new generation took over the leadership from 2005 onwards. The process seems to have happened in a very organic way -in the sense that people grew their skills at a steady pace for a long time- and also deep to the roots: only 4 of the top 10 people continue collaborating with the project.

The data also shows how the top 3 represent half of the work in the project, which suggests that several developers are highly involved with no one having too much influence (actually, the top contributor during 2010 accounts for 40% of the work).

gvSIG

The charts and data depict a highly distributed team with a high rate of turnover. The top 3 is responsible for less than half of the contributions, and the top 10 for around 60%. The change of leadership happened very quickly around 2007 and only 2 of the top 10 contributors kept working in 2010.

Besides, the top10 shows a homogeneous involvement in terms of number of contributions, which may reflect that all of them had a similar role and impact in the development of gvSIG.

QGIS

The charts and data depict a project dependent on its top 3, with a contribution-friendly culture. The top 3 accounts for a high share of total contributions, but it seems they have integrated new blood well, as 9 of the 10 most active developers working on QGIS started in different years and continue to be involved.

Top10 people have different ratios of involvement, ranging from 6% to 50%, which may reflect the heterogeneity of its core developer base (from volunteers to full-time developers).

Images: on the left, number of changes to the codebase (commits) aggregated by hour of day. On the right, number of commits grouped by day.

Data: trunk from project repositories during the period 1999-2010.

Is there something we could extrapolate from the data?

This indicator is intended to give us some information on the patterns of behavior of contributors. Specifically, we can track what a typical week looks like for the core developers in every project: the timeline shows when the integration happened and doesn’t reflect the time at which the work was done; so it’s telling us the history of the people with commit permissions, whom we know as the leaders.
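A rough sketch of how such an aggregation can be computed is shown below. It assumes the commit timestamps are available as timezone-aware datetime objects (an assumption about the input, not a description of the tooling used for the paper) and normalizes them to UTC before counting.

```python
from collections import Counter
from datetime import timezone


def commits_by_hour_and_weekday(timestamps):
    """Count commits per hour of day and per weekday, both in UTC.

    timestamps: iterable of timezone-aware datetime objects (hypothetical
    input). Weekdays are numbered 0 (Monday) to 6 (Sunday).
    """
    by_hour = Counter()
    by_weekday = Counter()
    for moment in timestamps:
        utc = moment.astimezone(timezone.utc)
        by_hour[utc.hour] += 1
        by_weekday[utc.weekday()] += 1
    return by_hour, by_weekday
```

Plotting `by_hour` for hours 0 to 23 gives the left-hand chart, and `by_weekday` the right-hand one.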

Let’s try to extract some information from there:

GRASS

Internationalization: the hourly chart shows a Gaussian bell centered on 15h GMT, which in most European countries would be after lunch, while it is morning in the Americas. That could reflect that both continents account for the vast majority of core committers. Nevertheless, the work is relatively well distributed across different time zones.

Volunteers: the daily chart shows a slight drop in work during the weekend, likely due to hired developers or people who make contributions mostly within their working hours. Nevertheless, there is still a high rate of contributions being integrated during the weekend, which may be a sign of a well-established volunteer base of core developers.

gvSIG

Internationalization: almost all the integration happens during a working week from Monday to Friday, within an hourly range from 09:00 to 20:00 GMT. That is strongly correlated with the opening hours of a typical shop in Spain and reflects how the application was built in that period: led by a public body which contracted development to Spanish firms.

Volunteers: it seems that volunteer work in the core was close to none, which reflects the original nature of the project in that period.

QGIS

Internationalization: the hourly chart shows a nearly flat rate of contributions, which is a strong sign of a leadership highly distributed around the world. It’s even difficult to suggest which zones would be the prominent ones in terms of developers.

Volunteers: the daily chart reflects steady work throughout the week, with no signs of falling during the weekend, which may be related to a strong base of volunteer core committers.

Images: on the left, the number of changes to the codebase (commits) aggregated by year. On the right, the number of developers with at least 1 commit that year.

Data: trunk from project repositories during the period 1999-2010.

Is there something we could extrapolate from the data?

Certainly, not the number of features developed or bugs fixed. It is even barely possible to compare activity between projects, as there is high variability in terms of changesets: some people could send several little changesets and others just 1 big change, some projects could have a special policy which affects the results (e.g. making one commit that formats the code according to the style rules and another with the changes), etc. Some people could even argue that the language they are written in affects the number of changes (GRASS is written in C, gvSIG in Java and QGIS in C++) due to the libraries available or the semantics of each language. So, is it possible to find out something? Well, in my opinion, we can trace at least the following:

the internal evolution of a project.

how a project is doing in terms of adding new blood.
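Both series in the charts (commits per year, and developers with at least 1 commit that year) can be derived from the same input. The following is a minimal sketch under the assumption that the repository log has already been reduced to (year, author) pairs; the function name and that input format are illustrative, not the scripts used for the paper.

```python
from collections import defaultdict


def yearly_activity(commits):
    """Map each year to (number of commits, number of distinct authors).

    commits: iterable of (year, author) pairs, one per commit
    (hypothetical input format).
    """
    commit_count = defaultdict(int)
    authors = defaultdict(set)
    for year, author in commits:
        commit_count[year] += 1
        authors[year].add(author)
    return {
        year: (commit_count[year], len(authors[year]))
        for year in sorted(commit_count)
    }
```

Counting distinct authors per year, rather than cumulatively, is what makes the second series sensitive to people leaving as well as to people joining.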

So, let’s again do the exercise of finding out what’s happening here:

GRASS

The curve of activity in the project draws attention: growth by periods (2001-2004 and 2005-2007) with local maxima in 2004 and 2007. Our hypothesis was that it was due to the way the project works: the developers here make changes both in the trunk and in the branch of the product to release (be it 6.4 or 6.5) at the same time, with a lot of changesets moved between the trunk and the branches (so doing heavy backporting). In a recent conversation with Markus Neteler, he explained to me in more detail how they work and I guess the rhythm we see in the graphics is due to that.

In terms of number of developers, GRASS showed continuous growth until 2008; since then, the number of regular developers has stabilized.

gvSIG

gvSIG shows an incredibly high period of activity during 2006-2008 (4500 changesets per year and more than 30 people involved!). To understand the Gaussian bell of activity, one needs to know the background of the project: gvSIG development has been led by contract, which means that all activities (planning, development, testing, etc) were driven by the needs of the client who paid for it. Only recently have these processes been opened to a broader community (firms and volunteers collaborating in the project within the gvSIG association). So, it makes sense that the beginnings saw less activity (heavy planning phases) and that afterwards they managed to bring together so many people in such a short period of time.

But in 2010 it suffered a sudden stop in development (only 233 changes to the codebase were made, compared with a pace of 4500 changes per year during previous years). This decrease in activity is highly correlated with the number of developers involved. It’s hard to say why it happened: could it be because efforts were directed to gvSIG 2.0 development? Could it be due to the reorganization of the project and the creation of the gvSIG association? Well, little can be said in this respect with the data available; further research is required to determine that.

QGIS

Steady growth both in terms of contributions and contributors. The years 2004 and 2008 mark two peaks of activity and of people participating in the development. Our preliminary hypothesis was that it was due to the release of the first stable version and the release of 1.0, as well as becoming an official OSGeo project. Gary Sherman has confirmed that in a recent post (history of QGIS committers) and an interview (part1 and part2). Besides, he pointed out that in 2007 the project added Python support for plugin development, which possibly was one of the reasons for the growth in 2008 and afterwards.

An interesting finding is that every 4 years the project has doubled the number of developers involved, with a slower but steady growth in activity.

Well, I hope these graphics have helped us to better understand the project activity and the manpower every project is able to gather around it. The next posts in the series will focus on the developers involved and the culture surrounding them. Looking forward to your feedback!

Find below the statistics for mailing list activity in GRASS, gvSIG and QGIS during the period 2008-2010. The first one shows data from the general user mailing lists for each project. Take into account that the data for gvSIG aggregates both the international and Spanish mailing lists, for the reasons stated here.

The next one shows the same data (number of people writing and number of messages by month) for the developer mailing lists.
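The two figures plotted per month (messages and people writing) can be obtained from a list archive once each message is reduced to its date and sender. The sketch below assumes ISO dates (YYYY-MM-DD) and sender addresses as input, which is an illustrative format and not necessarily the one used to build the charts.

```python
from collections import defaultdict


def monthly_list_activity(messages):
    """Map each month ("YYYY-MM") to (number of messages, number of senders).

    messages: iterable of (iso_date, sender) pairs, e.g.
    ("2008-01-15", "someone@example.org") (hypothetical input format).
    """
    message_count = defaultdict(int)
    senders = defaultdict(set)
    for iso_date, sender in messages:
        month = iso_date[:7]
        message_count[month] += 1
        # Normalize the address so the same person is not counted twice.
        senders[month].add(sender.strip().lower())
    return {
        month: (message_count[month], len(senders[month]))
        for month in sorted(message_count)
    }
```

Merging two lists, as was done for gvSIG, would simply mean concatenating their messages before calling the function.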

Is there something we could extrapolate from the data?

Well, certainly not the user base. The data shyly hints at the trends, not the real user base. The model we adopted to study the projects reflects just a part of the community -which is arguably the engine of the project- so don’t take the data as the number of users of each project. For sure, each one of our favorite projects has more users than those participating in (these) mailing lists!

Anyway, here is some food for thought:

GRASS: it smoothly decreases in terms of number of messages as well as people writing, which happens among both users and developers. The trend is not clear though.

gvSIG: the data shows a steadily increasing number of users participating in the mailing lists. On the other hand, although it is the project with the most people subscribed to the developer mailing list, it shows the least activity of the three projects (in terms of # of messages in developer lists): few technical conversations seemed to happen through the mailing lists during that period.

QGIS: according to the data, there is clear growth in the community. In the period under study (3 years) the number of users and developers participating in mailing lists has doubled!

Little more can be said; I hope the graphics are self-explanatory enough! Looking forward to your feedback.