Process

This is the first public release of such data in Slovenia and is the result of six months of intensive work on data reconciliation, methodology and, finally, visualisation.

I helped mostly with data reconciliation, so I can speak about the tools we used. The basic tool was Google Spreadsheet, which we used as a shared database that everyone could contribute to and that helped us keep the data in sync. It also allowed for basic visualisations based on pivot tables. It worked mostly OK, and the ability to write scripts for it helped a lot. Finally, the data was moved into Semantic MediaWiki and visualised using d3.js.

Lessons learned

Google Spreadsheets don’t scale. Once you reach about 1000 rows with 30 columns, it becomes almost unusably slow.

This dataset is complex enough that it would benefit from automatic checks: automated reimporting into a real database and basic reports, such as a list of unique institutions and basic pivot tables. This would help catch the encoding and whitespace issues that Spreadsheet doesn’t handle.
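
To make the idea of a "unique institution" report concrete, here is a minimal sketch in Python of the kind of check I have in mind. It is not something we actually ran. The column name `institution`, the sample rows and the helper names `normalise` and `find_name_variants` are all made up for illustration, and it assumes the spreadsheet has been exported as CSV. It groups spellings that differ only in whitespace, letter case or Unicode composition, which are the kinds of differences that are easy to miss by eye.

```python
import csv
import io
import unicodedata
from collections import defaultdict


def normalise(name: str) -> str:
    """Build a comparison key: Unicode NFC, collapsed whitespace, case-folded."""
    name = unicodedata.normalize("NFC", name)
    return " ".join(name.split()).casefold()


def find_name_variants(rows, column="institution"):
    """Group raw spellings by their normalised key.

    Returns only the keys that were written in more than one way.
    """
    variants = defaultdict(set)
    for row in rows:
        raw = row[column]
        variants[normalise(raw)].add(raw)
    return {key: sorted(raws) for key, raws in variants.items() if len(raws) > 1}


# Hypothetical sample standing in for a CSV export of the spreadsheet.
SAMPLE_CSV = (
    "institution,amount\n"
    "Ministry of Finance,100\n"
    "Ministry  of Finance ,250\n"
    "ministry of finance,50\n"
    "University of Ljubljana,75\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = find_name_variants(rows)

for key, spellings in report.items():
    # repr() makes stray spaces visible in the output.
    print(key, "->", [repr(s) for s in spellings])
```

Run after every export, a report like this would show which institutions were entered under more than one spelling, so they can be fixed in the spreadsheet itself.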

Google Spreadsheet has really good tools for pivot tables, but they’re a pain to manage if the data ranges change. This can probably be automated further, but I haven’t yet figured out how.
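
One possible way around the range problem, which I have not tried on this dataset, is to compute the pivot outside the spreadsheet from whatever rows the export contains, so there is no fixed range to keep up to date. The sketch below is plain Python. The field names `institution`, `year` and `amount`, the sample rows and the function `pivot_sum` are hypothetical.

```python
from collections import defaultdict


def pivot_sum(rows, index, columns, value):
    """Sum `value` for every (index, columns) pair over all rows given."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += float(row[value])
    # Convert to plain dicts so the result is easy to print or serialise.
    return {key: dict(cells) for key, cells in table.items()}


# Hypothetical rows, shaped like the output of csv.DictReader.
rows = [
    {"institution": "A", "year": "2011", "amount": "100"},
    {"institution": "A", "year": "2012", "amount": "150"},
    {"institution": "B", "year": "2011", "amount": "40"},
    {"institution": "A", "year": "2011", "amount": "25"},
]

pivot = pivot_sum(rows, index="institution", columns="year", value="amount")

for institution, per_year in sorted(pivot.items()):
    print(institution, per_year)
```

Because the function simply iterates over every row it is handed, adding rows to the source data needs no change to the report.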