This workshop, organised by Transparent Chennai at The Institute of Financial Management and Research, Chennai, grew out of the earlier open data camp events that Transparent Chennai organised in Bangalore and Hyderabad. At those events, many attendees were excited by the potential of data and the open data movement, but did not have the skills or technical background to work with it effectively.
It was felt that there was a much larger community of activists, researchers, and non-profits who could benefit from learning to use the kinds of tools presented at the camps. This event was therefore planned differently from a data camp: it focused on training activists, researchers, and students to work with data, with participants learning about open data, data visualisation, spatial data, and the practical issues that come up when working with data in various forms.

The workshop thus aimed to help participants:

Understand various formats of data, diverse possibilities of data visualisation and effective tools for doing so, with a special focus on web-based tools

Understand how to think through projects involving collection, processing and visualisation of data

While data is increasingly important in academia as well as in industry, the two worlds do not intersect all that often. DSDT is a monthly forum for sharing ideas about data across disciplines and industries. Each DSDT meeting will consist of two talks on a common theme, pairing a data scientist with a data technologist, along with time for discussion. From the second session onward, we will also have a tutorial and hacking session after the talks, where we will learn to understand and analyse data sets relevant to that meeting’s theme. The schedule for the first meeting on the 18th at NIAS is given below.

The Rajasthan rainfall data was scraped as part of a Scrapathon held in Bangalore on 21st July 2011. Initially I used ScraperWiki, but the huge amount of data made it time out all the time 🙂 so I wrote a simple Python script to do it instead.

The data is in the SQLite file data.sqlite, in a table called rainfall, which has 6,61,459 rows.
Columns: DISTRICT, STATION, YEAR, MONTH, DAY, RAIN_FALL, PAGE_ID
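As a sketch of how you might query the table with Python's built-in sqlite3 module: the schema below matches the columns listed above, but the sample rows, values, and the district name are made up purely for illustration (the real data is in data.sqlite).

```python
import sqlite3

# Illustrative sketch: the real data lives in data.sqlite; here we build a
# tiny in-memory table with the same schema so the query runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rainfall (DISTRICT TEXT, STATION TEXT, YEAR INTEGER, "
    "MONTH INTEGER, DAY INTEGER, RAIN_FALL REAL, PAGE_ID INTEGER)"
)
# Made-up sample rows; the real table has 6,61,459 of these.
conn.executemany(
    "INSERT INTO rainfall VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("AJMER", "AJMER", 2010, 7, 1, 12.5, 1),
        ("AJMER", "AJMER", 2010, 7, 2, 0.0, 1),
        ("AJMER", "AJMER", 2011, 7, 1, 8.0, 2),
    ],
)

# Total recorded rainfall per year for one district.
rows = conn.execute(
    "SELECT YEAR, SUM(RAIN_FALL) FROM rainfall "
    "WHERE DISTRICT = ? GROUP BY YEAR ORDER BY YEAR",
    ("AJMER",),
).fetchall()
print(rows)  # [(2010, 12.5), (2011, 8.0)]
conn.close()
```

To run the same query against the real file, replace ":memory:" with "data.sqlite" and drop the CREATE/INSERT statements.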

PAGE_ID refers to the ID in the webpages table, which lists the web pages from which these data were scraped; it will help in case you want to cross-check. The rest of the columns are self-explanatory. I have signed the SQLite database using my GPG keys, and the signatures are in the file data_gpg_signature.sig
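A minimal sketch of checking the signature, assuming data_gpg_signature.sig is a detached signature over data.sqlite and that you have already imported the signer's public key into your keyring:

```shell
# Verify the detached GPG signature over the database file.
# Prints "Good signature" on success; fails if the file was altered.
gpg --verify data_gpg_signature.sig data.sqlite
```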