Category Archives: Data


I co-teach Online Journalism for level three students with Bernie Russell and this week, Tony Hirst from the OU came to Lincoln to give his now annual data-driven journalism class. Bernie and I prep the students a few weeks beforehand and then Tony rolls in and packs as much into the class as he can, leaving me and Bernie to pick up the pieces.
We’re grateful for it.

If you’re a student struggling with the Wikipedia/Pipes/Google Maps exercise, here’s a working example that you can clone and work backwards through to understand how it works. It’s basically slide 6 of the presentation above.

UPDATE: What follows is broken because of changes to the Wikipedia source page structure, changes to Yahoo Pipes and changes to Google Docs. Trying to keep it working is a pain, so it will have to stay broken for now.

Note how I’ve fetched the CSV into Yahoo Pipes, defined the data I’m interested in, renamed two key attributes, renamed the title attribute to be ‘population’ and then used the location builder in a loop block to determine the geo-locations. Once that’s done, it runs in the Pipe like this:

Is this displaying correctly? I’ve found that embeds directly from Yahoo Pipes can be a bit flaky.

However, if you right click on the KML link and paste the KML link into the search box of Google maps, then you should see something like this:
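Because the Pipe itself no longer runs, the steps described above (fetch the CSV, rename the attributes, geocode each row in a loop, output KML) can be sketched in Python. This is only a minimal sketch under stated assumptions: the column names `City` and `Pop`, the field names `place` and `population`, and the `geocode` callable are all hypothetical stand-ins, and nothing here reproduces what Yahoo Pipes or its location builder actually did internally.

```python
import csv
import io
from xml.sax.saxutils import escape


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rename_fields(row, mapping):
    """Return a copy of the row with keys renamed according to mapping."""
    return {mapping.get(key, key): value for key, value in row.items()}


def to_kml(rows, geocode):
    """Build a KML document with one Placemark per row that can be geocoded.

    geocode is any callable taking a place name and returning a
    (latitude, longitude) tuple, or None when the place is unknown.
    It stands in for the location builder step (an assumption).
    """
    placemarks = []
    for row in rows:
        coords = geocode(row["place"])
        if coords is None:
            continue  # skip rows that cannot be placed on the map
        lat, lon = coords
        # KML writes coordinates as longitude,latitude
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(row['place'])}</name>"
            f"<description>Population: {escape(row['population'])}</description>"
            f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
            "</Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        + "".join(placemarks)
        + "</Document></kml>"
    )


if __name__ == "__main__":
    # Made-up sample data and an approximate, hard-coded coordinate lookup
    sample = "City,Pop\nLincoln,1000\n"
    lookup = {"Lincoln": (53.23, -0.54)}
    rows = [
        rename_fields(row, {"City": "place", "Pop": "population"})
        for row in rows_from_csv(sample)
    ]
    print(to_kml(rows, lookup.get))
```

Passing the geocoder in as a function keeps the sketch independent of any particular geocoding service; a real lookup would replace the hard-coded dictionary.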

I’m sure I’m not the only person who’s playing around on a Friday afternoon with a new script for Gmail that provides statistics about your email habits. It doesn’t include spam, calendar invites or my chat history. I’m quite pleased that I send two-thirds less email than I receive, although it’s a shame that 80% of the emails I receive are not directly for me. Not that I want them to be, but it suggests I get CC’d into a lot of mail. I’m also pleased to see that about 80% of email I receive gets answered within one day. I hate it hanging around and usually have fewer than a handful of emails sitting in my Inbox at any one time. All my email, both work and personal, comes to this single account.

As I mentioned a few weeks ago, while attending Dev8D, I surveyed developers working in or for universities. Here are the results. Click on the images below to view them full size. The data can be downloaded (minus email addresses and institutional affiliation).

What does the survey tell us? Well, only 35 of the roughly 250 people who attended the conference responded. I also posted the link on Twitter, so it was open to abuse (it certainly wasn’t under controlled conditions!), but looking through the data, I don’t think it was spammed.
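To put those numbers in context, here is a rough sketch of the response rate and an approximate 95% margin of error. It assumes a simple random sample of the attendees, which this survey was not, so the margin is best read as an optimistic lower bound on the uncertainty. The function names are illustrative only.

```python
import math


def response_rate(responses, population):
    """Fraction of the population that responded."""
    return responses / population


def margin_of_error(responses, population, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Uses the worst-case proportion of 0.5 by default and applies a
    finite population correction, since 35 is a sizeable share of 250.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / responses)
    correction = math.sqrt((population - responses) / (population - 1))
    return z * standard_error * correction


if __name__ == "__main__":
    print(f"Response rate: {response_rate(35, 250):.0%}")
    print(f"Margin of error: +/- {margin_of_error(35, 250):.1%}")
```

Under those assumptions, a figure such as "two-thirds of respondents" could be off by roughly 15 percentage points in either direction for conference attendees as a whole.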

The last question shows that about two-thirds of respondents are keen to remain working in the sector and just under half of respondents are not looking for promotion. I expected that to be higher, given that a similar number have only worked in HE for 0-5 years, but maybe they’re entering at a level where promotion is less important to them. About a quarter of people said it was their first proper job. Other people are entering the sector from both public and private organisations in equal measure. A large majority of respondents are or have been in line management positions. Just under a third of developers can see themselves moving into management positions, away from day-to-day development, while a similar number aren’t sure.

In terms of how long they have been writing code, there was an even spread across the range of years and a corresponding response to whether people consider themselves novices, experienced or expert. Two thirds of respondents studied programming at university, but a larger number consider themselves self-taught. The two responses are not exclusive of course. The majority of people prefer web development and the choice of programming languages reflects that, too. There’s lots of use of source control applications, about half of people are using formal development frameworks and fewer people are using Continuous Integration.

Two thirds of people said that they work autonomously, are proud of the work they do, and get on with their colleagues, which is nice to hear. However, only a third of people think they are paid pretty well and just under a half said that they enjoy their responsibilities.

About two thirds of respondents feel that their work forces them to learn new things all the time, while others only learn new things occasionally or on side projects. The majority of people learn from figuring it out on their own, but many people also learn from web articles, forums, books and colleagues. Training opportunities also seem to be available and, not surprisingly given we were at Dev8D, about half of respondents are encouraged to go to conferences and workshops. Of course, time and money keep people from attending such events, but more worryingly, there’s evidence that at some institutions, it’s ‘not the done thing’.

From my own work, I was interested to see that there’s little culture of involving students in the work of developing services for HEIs, with two-thirds of people saying they never or rarely employ students.

There’s more detail in the numbers, so do have a look for yourself. For me, this was a useful first attempt to get a sense of the motivation, opportunities, interests and challenges for hackers working in universities. I intend to follow it up with a more formal and controlled survey, as well as observation of teams across the country. If you’d like to invite me to observe and interview you and your team, please do let me know.

Comments about the survey and the results are welcome below, too. Thanks.

Just a heads up to say that we’ll be advertising for a Web Developer to work on Orbital, our JISC-funded ‘Managing Research Data’ project. The post, starting in March/April, will be a 12-month, full-time, grade 5 (c.£21K) position.

The Web Developer (‘you’) will be working in the Centre for Educational Research and Development, alongside Nick Jackson, Lead Developer on Orbital, and will also benefit from being in a team that includes staff in central ICT services and the Library. Orbital builds on and extends previous work we’ve been doing over the last couple of years, so if you’re interested, you should read through our projects pages.

If we were to summarise our technologies and interests I guess they would be #agile, #opensource, #opendata, #LAMP, #php, #codeigniter, #mongoDB, #OAuth, #APIs, #HTML5, #CSS3, #github and moving towards #RDF and #LinkedData.

Just seeing these hashtags listed together should cause your heart to beat with excitement.

When we advertise in January, you’ll see that the job spec is actually a pretty standard affair. What I want to emphasise here is how interesting and fun the job will be.

The key section in the Job Description is what you’d be working on with Nick:

Development and implementation of a set of web services, which re-use and develop our previous, JISC-funded work as well as other initiatives (e.g. SWORD and DataCite DOIs).

Documented source code will be made available under an open source license by the end of the project.

Development and implementation of mechanisms for managing and transferring data, including the use of MongoDB, OAuth, read/write RESTful APIs, SWORD2 interoperability, and integration with the administrative functions of EPrints.

That actually summarises a lot of work.

I’m managing the project and try to run things with as little hierarchy as possible within a university environment. You’ll always know the project priorities and will be trusted to self-organise and deliver on time, working to two-week iterations and, roughly, monthly releases. I regularly reflect on how we work and our overall working environment. For Orbital, I favour the Crystal Clear agile methodology, as does Nick. You’ll be encouraged to reflect on this with us, too.

We work hard, and not always 9-5pm, but we work at a pace that is sustainable over a long period of time. We take our work seriously but, in the spirit of hacking, are always looking for ways to have fun, too. We recognise that we’re fortunate to be working in a diverse and intellectually stimulating academic environment, but are user/product focused at the end of the day. You’ll be working directly with our users, who are Researchers in the School of Engineering and Siemens, and staff in the Library and ICT. You’ll need to be showing them refreshed, working software every couple of weeks and iteratively improving Orbital, based on their feedback and requirements. There may also be times when you’ll be asked to talk publicly about your work and you’ll be encouraged to blog about it every so often, too. I expect the project to produce one or two conference/journal papers, and you’ll be named as a contributor and can take as active a role in that as you like.

I hope this sounds like an interesting job. At £21K, I recognise that it will probably attract younger developers looking to gain experience, though of course, we welcome applications from anyone whatever your age. By the time the post starts, we’ll have set up a decent dev/staging/production environment, hosted in the cloud, and relying on Github and Jenkins to keep things versioned, integrated and tested. Nick will have been developing Orbital for a couple of months or more and laid the groundwork for someone to start coding quickly in a supportive environment.

If you’re thinking of applying and don’t live in Lincoln, you’ll be pleased to know that it’s a decent small city, and a relatively cheap place to live. The campus is modern and sits by a marina in the middle of the city. You can walk to work. I love the place. Oh, and you can choose your own hardware for development, within reason. Most of us use Macs, but whatever suits you. I’ll ask the successful candidate what they prefer when we offer them the job.

If, after reading around the project website, you’ve got any questions about the post, please do get in touch. Thanks.

Recently, I posted on the LNCD blog about our work on data.lincoln.ac.uk. You might find it interesting.

One of the outcomes of our recent ‘proper’ projects is data.lincoln.ac.uk. This is simply a site that documents the data we are warehousing in our MongoDB datastore (called ‘Nucleus’), and the programmatic methods by which we (and the public) can access that data. Most of the data is licensed for public use, but where appropriate (e.g. personal data), a secure access token must be requested. Currently, outside of our own projects, the only people needing/wanting secure access tokens are some third year computer science students who are using data.lincoln.ac.uk as the basis for their dissertation projects and require access to their own personal event data.
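The split between public data and token-protected data can be illustrated with a short sketch. Everything specific here is an assumption made for illustration: the base URL, the resource names and the use of an OAuth 2.0 style `Bearer` header are hypothetical and are not taken from the data.lincoln.ac.uk documentation. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL, not the real Nucleus endpoint
BASE_URL = "https://api.example.org/v1"


def build_request(resource, params=None, access_token=None):
    """Build a GET request for a resource, attaching a token only if given.

    Public resources are requested anonymously; protected resources
    (e.g. personal data) carry the token in an Authorization header.
    """
    query = f"?{urlencode(params)}" if params else ""
    request = Request(f"{BASE_URL}/{resource}{query}")
    request.add_header("Accept", "application/json")
    if access_token:
        request.add_header("Authorization", f"Bearer {access_token}")
    return request


if __name__ == "__main__":
    public = build_request("locations", {"limit": 5})
    private = build_request("events", access_token="TOKEN-GOES-HERE")
    print(public.full_url, public.get_header("Authorization"))
    print(private.full_url, private.get_header("Authorization"))
```

Keeping the token optional means the same client code serves both cases, which mirrors the idea of data being open whenever possible and authenticated only where necessary.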

Our approach to publishing open data at the University of Lincoln has been to do so in a way that was immediately useful to the work we were doing…

Tony Hirst recently blogged about the Open Data scene in UK HE, mentioning Lincoln as one of the few universities that are currently contributing HEI-related #opendata to the web. Sooner or later, I’ll write a more reflective post, but here I just wanted to document the current situation (that I’m aware of) at Lincoln. There are two groups that take an interest in furthering open data at Lincoln: LiSC, led by Prof. Shaun Lawson, and LNCD, the new cross-university group I co-ordinate which consolidates a lot of the previous and current work listed below. (For a broader overview of recent work, see this post).

Derek Foster in LiSC recently released energy data from our main campus buildings, updated every two hours on Pachube. I was just speaking to Nick and Alex and I think they plan to pull this data into our Nucleus datastore, combine it with the campus location-based work we’ve done and generate dynamic heat maps (assuming Derek isn’t already working on something similar?).

JISCPress, a 2009/10 project we worked on that didn’t release any data but developed a prototype WordPress platform that atomises documents for publication and comment on the web and spits out lots of data in open formats. It also uses OpenCalais, Triplify and can push RDF Linked Data to the Talis Platform. JISC now use it to publish documents for comment.

Total Recal, a JISC-funded project we completed recently and will roll out across the university this September. As well as providing a fairly comprehensive and flexible calendaring service at the university, it allowed us to work on our space-time data and develop a number of APIs on top of…

Nucleus, the epicentre of our open data efforts. This is a data store, using MongoDB, which aggregates data from a number of disparate university databases and makes that data available over secure APIs. Through a lot of hard work over the last year, Alex and Nick have compiled the single largest data store that we have at the university. Currently, it offers APIs to university events, calendars, locations and people. We’ll also be adding APIs to over 250,000 CC0 licensed bibliographic records held in Nucleus, too (see Jerome below). It also uses the OAuth-based authentication that Alex has developed.

Linking You is a JISC-funded project we delivered last week to JISC, which looked at our use of URIs, undertook a comparative study of 40 HEI websites (more to come), proposed a high-level data model for use by the HEI sector and made some recommendations for further work. What we’ve learned on this project will have a lasting effect on the way we present our data and on our wider advocacy of open data to the university sector. I really hope that our recommendations will lead us to more discussion and collaboration with people interested in opening university data.

lncn.eu, a URL shortener that Alex and Nick developed in their spare time for a while and that has since been formally adopted by the university. Naturally, lncn.eu has an API and can be used (e.g. by Jerome) as a proxy for other services, collecting real-time analytics.

Jerome is a current JISC-funded project that will release over 250,000 bibliographic records under a CC0 license. The data is stored in Nucleus and documented APIs will be available by the end of July. This is a very cool project managed by Paul Stainthorp in the Library (who’s also a member of LNCD).
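The idea of a URL shortener acting as a proxy that collects analytics, as described for lncn.eu in the list above, can be sketched in a few lines. This is an illustrative in-memory toy under stated assumptions, not how lncn.eu is implemented: the base-36 codes, the `Shortener` class and its method names are all hypothetical.

```python
import string
from collections import Counter

# Base-36 alphabet for short codes (an arbitrary choice for this sketch)
ALPHABET = string.digits + string.ascii_lowercase


def encode(number):
    """Convert a non-negative integer into a short base-36 code."""
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    """In-memory shortener that counts every resolution of a code."""

    def __init__(self):
        self.targets = {}        # code -> original URL
        self.clicks = Counter()  # code -> number of times resolved

    def shorten(self, url):
        """Store the URL under the next sequential code and return the code."""
        code = encode(len(self.targets) + 1)
        self.targets[code] = url
        return code

    def resolve(self, code):
        """Return the original URL and record the hit for analytics."""
        url = self.targets[code]
        self.clicks[code] += 1
        return url


if __name__ == "__main__":
    shortener = Shortener()
    code = shortener.shorten("https://example.org/catalogue/record/123")
    shortener.resolve(code)
    shortener.resolve(code)
    print(code, shortener.clicks[code])
```

Because every lookup passes through `resolve`, any service that hands out short links gets usage counts without doing its own tracking, which is the sense in which the shortener works as a proxy.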

We’re currently using data.online.lincoln.ac.uk to document the data that is accessible over our APIs. At some point, I can see us moving to data.lincoln.ac.uk – we just need to find time to discuss this with the right people. So far, we haven’t really gone down the RDF/Linked Data route, preferring to offer data that is linked (e.g. locations and events data are linked) and publicly accessible over APIs that are authenticated where necessary and open whenever possible. We are keen to engage in the RDF/Linked Data discussion – it’s just a matter of finding time. Please invite us to your discussions, if you think we might have something to contribute!

I’m going to the JISC Information Programme Meeting on Thursday and have been asked to join a panel where I’ll talk about our work at Lincoln under the heading ‘from unprojects to services’. Here are my notes.

Over the last couple of years, staff in CERD, The Library and ICT have worked closely together on a number of ‘rapid innovation’ projects, which have sometimes later attracted JISC funding. Much of our work has been undertaken at the initiative of individual staff, who have benefited from a supportive ICT environment that allows us the freedom to develop and test our ideas without running into bureaucratic walls. ICT – in particular the head of the department, Mike Day, and head of the Online Services Team, Tim Simmonds – recognised the benefits of employing undergraduate students and recent graduates, and established a post which Nick Jackson and Alex Bilbie share. Alongside this, I have been applying for JISC funding and successful bids have allowed us to employ Nick and Alex full-time rather than part-time. In recent months, this has worked very well and currently much of their time is spent working on JISC-funded projects which bring value to the University. Below is a list of the services that this culture of innovation has allowed us to work on over the last year or so. Click on the links to go to the services.

The Common Web Design: Distributed HTML5/CSS3 template for internal services

Posters: A repository for visual communications

lncn.eu: The official URL shortener for the university. Provides real-time stats, API and acts as branded/trusted proxy for other services.

Single Sign On: OAuth/SAML/Shibboleth/NTLM/Eduroam integration

Zen Desk: University Help Desk

My Calendar: An aggregation of space-time data into a flexible web service. JISC-funded.

Nucleus: Datastore for People, Events, Bibliographic and Location data (and more to follow). Provides (open) APIs to all other services. MongoDB.

James Docherty, a third year student, used the nucleus datastore as a source of data for his final year project: Situated Displays for buildings, showing room booking information, posters and announcements.

Online Server Monitoring: A simple dashboard for anyone to check whether a service is working

QR Codes: Will be used for asset tags and already being used in rooms to create Help Desk tickets.

Most of these services push and pull data to Nucleus, the central, open datastore built on MongoDB. e.g. Zen Desk=People + Locations, My Calendar=Events, Jerome=Bibliographic

We’re currently looking at how Nucleus can also be a source for Linked Data. It has open(ish) APIs.

CWD sites transparently sign the person in to the site, if they are signed in elsewhere.

We like Open Source. SSO is mostly open source software. Alex has released his OAuth 2.0 code. CWD likely to be open source; MongoDB, bits and pieces from Jerome and My Calendar.

As we build these services, they are being integrated, too. e.g. lncn.eu will be a URL resolver for Jerome offering realtime monitoring; posters will show up in My Calendar events; CWD is the design framework for My Calendar.

Most of these services are for official launch in September. They will be included in the new ICT Handbook, included in brochures and other announcements.

Now that we know we can develop this way and that it works and we enjoy it, we’re hoping to expand from two to four student/graduate developers and have our own budget for hardware/software/conferences and to give to staff and students that want to join us.

Our approach links into the University’s Teaching and Learning Strategy: Student as Producer. We want to work with students and staff across disciplines to create useful, innovative and enjoyable online services that make the University of Lincoln a great place to work and study at. It’s not about a team that works on ‘educational technology’, but rather a network of people who develop and support technologies that make Lincoln a productive environment for research, teaching and learning. It’s inclusive, with students (and therefore learning) at its core.

We’ve recently started providing staff training on using Google Apps and one of the questions that always comes up is around privacy and security. Following one of our sessions, one member of staff is using Google Docs to manage a large number of sensitive documents, with several other colleagues. The sharing of folders and documents with different people is proving very useful. Recently, that member of staff asked me whether it was possible to encrypt files stored on Google Docs so I had a look around to see what the situation is. I knew that transport encryption is available (i.e. https) and that there was no feature in Google Docs to encrypt a file, but wanted to provide a thorough response to my colleague.

As I said, Google doesn’t provide the facility to encrypt data held in Google Docs. You can, however, encrypt a file and upload it to Google Docs for online storage only. To read the file, it has to be downloaded and decrypted. I tested this with a .pgp file.
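The encrypt, upload, download, decrypt round trip can be sketched as follows. This is a minimal sketch that assumes GnuPG (`gpg`) is installed and uses passphrase-based symmetric encryption; the post only says a .pgp file was tested, so the exact tool and options here are assumptions. The helper names are hypothetical, and the upload and download steps are left to the browser.

```python
import subprocess
from pathlib import Path


def encrypt_command(source):
    """Build the gpg command that encrypts source into source.gpg."""
    source = Path(source)
    encrypted = source.with_name(source.name + ".gpg")
    return [
        "gpg",
        "--cipher-algo", "AES256",
        "--output", str(encrypted),
        "--symmetric", str(source),
    ]


def decrypt_command(encrypted):
    """Build the gpg command that restores the file without its .gpg suffix."""
    encrypted = Path(encrypted)
    return [
        "gpg",
        "--output", str(encrypted.with_suffix("")),
        "--decrypt", str(encrypted),
    ]


def run(command):
    """Run a command, raising an error if gpg reports a failure."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is touched
    print(" ".join(encrypt_command("minutes.doc")))
    print(" ".join(decrypt_command("minutes.doc.gpg")))
```

The trade-off is the one described above: the uploaded file is opaque to the service, so it cannot be previewed, edited or worked on collaboratively online, and everyone who needs to read it must have the passphrase.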

I searched around on the web for a few more clues and there’s the suggestion (last comment) that the data is ‘sharded’ across multiple servers and when you click on the name of a file, the data is brought together into the file for you to work on. I haven’t found any official confirmation of this technique being used.

There’s a Google Docs employee on Get Satisfaction who has responded a few times to people’s questions around this area. These replies offer some clarity:

In summary, there is no encryption of data on Google’s servers, but Google are using the same systems to manage their private corporate data and they comply with international (including the UK) data privacy policies. Introducing encryption is technically feasible but would have many negative consequences for the features they provide (slower, no collaboration, etc.).

If you’ve got any other, officially confirmed, information on the security of Google Docs, please do leave a comment. Thanks.