Tagged: ukoer

In a meeting a couple of days ago discussing some of the issues around what sort of resources we might want to provide students to support GIS (geographical information system) related activities, I started chasing the following idea…

The OU has, for a long time, developed software application in-house that is provided to students to support one or more courses. More often than not, the code is devloped and maintained in-house, and not released / published as open source software.

There are a couple of reasons for this. Firstly, the applications typically offer a clean, custom UI that minimises clutter and is designed in order to support usability for learners learning about a particular topic. Secondly, we require software provided by students to be accessible.

For example, the RobotLab software, originally developed, an still maintained, by my colleague Jon Rosewell was created to support a first year undergrad short course, T184 Robotics and the Meaning of Life, elements of which are still used in one of our level 1 courses today. The simulator was also used for many years to support first year undergrad residential schools, as well as a short “build a robot fairground” activity in the masters level team engineering course.

As well as the clean design, and features that support learning (such as a code stepper button in RobotLab that lets students step through code a line at a time), the interfaces also pay great attention to accessibility requirements. Whilst these features are essential for students with particular accessibility needs, they also benefit all out students by adding to the improved usability of the software as a whole.

So those are two, very good reasons, for developing software in-house. But as a downside, it means that we limit the exposure of students to “real” software.

That’s not to say all our courses use in-house software: many courses also provide industry standard software as part of the course offering. But this can present problems too: third party software may come with complex user interfaces, or interfaces that suffer from accessibility issues. And software versions used in the course may drift from latest releases if the software version is fixed for the life of the course. (In fact, the software version may be adopted a year before the start of the course and then expected to last for 5 years of course presentation). Or if software is updated, this may cause significant updates to be made to the course material wrapping the software.

Another issue with professional software is that much of it is mature, and has added features over its life. This is fine for early adopters: the initial versions of the software are probably feature light, and add features slowly over time, allowing the user to grow with them. Indeed, many latterly added features may have been introduced to address issues surrounding a lack of functionality, power or “expressiveness” in use identfied by, and frustrating to, the early users, particularly as they became more expert in using the application.

For a novice coming to the fully featured application, however, the wide range of features of varying levels of sophistication, from elementary, to super-power user, can be bewildering.

So what can be done about this, particularly if we want to avail ourselves of some of the powerful (and perhaps, hard to develop) features of a third party application?

To steal from a motorsport engineering design principle, maybe we can add lightness?

For example, QGIS is a powerful, cross-platform GIS application. (We have a requirement for platfrom neutrality; some of us also think we should be browser first, but let’s for now accept the use of an application that needs to be run on a computer with a “desktop” applciation system (Windows, OS/X, Linux) rather than one running a mobile operating system (iOS, Android) or eveloped for use by a netbook (Chrome OS).)

The interface is quite busy, and arguably hard to quickly teach around from a standing start:

However, as well as being cross-platform, QGIS also happens to be open source.

we can then search the source code to find the files that contribute to building the UI:

In turn, this means we can take that code, strip out all the menu options and buttons we don’t need for a particular course, and rebuild QGIS with the simplified UI. Simples. (Or maybe not that simples when you actually start getting into the detail, depending on how the software is designed!)

And if the user interface isn’t as accessible as we’d like it, we can try to improve that, and contribute the imporvements back the to parent project. The advantage there is that if students go on to use the full QGIS application outside of the course, they can continue to benefit from the accessiblity improvements. As can every other user, whether they have accessibility needs or not.

So here’s what I’m wondering: if we’re faced with the decision between wanting to use an open source, third party “real” application with usability and access issues, why build the custom learning app, especially if we’re going to keep the code closed and have to maintain it ourselves? Why not join the developer community and produce a simplified, accessible skin for the “real” application, and feed accessibility improvements at least back to the core?

On reflection, I realised we do, of course, do the first part of this already (forking and customising), but we’re perhaps not so good at the latter (contributing accessibility or alt-UI patterns back to the community).

For operational systems, OU developers have worked extensively on Moodle, for example (and I think, committed to the parent project)… And in courses, the recent level 1 computing course uses an OU fork of Scratch called OUBuild, a cross-platform Adobe Air application (as is the original), to teach basic programming, but I’m not sure if any of the code changes have been openly published anywhere, or design notes on why the original was not appropriate as a direct/redistributed download?

Looking at the Scratch open source repos, Scratch looks to be licensed under BSD 3-clause “New” or “Revised” License (“a permissive license similar to the BSD 2-Clause License, but with a 3rd clause that prohibits others from using the name of the project or its contributors to promote derived products without written consent”). Although it doesn’t have to be, I’m not sure the OUBuild source code has been released anywhere or whether commits were made back to the original project? (If you know differently, please let me know:-)) At the very least, it’d be really handy if there was a public document somewhere that identifies the changes that were made to the original and why, which could be useful from a “design learning” perspective. (Maybe there is a paper being worked up somewhere about the software development for the course?) By sharing this information, we could perhaps influence future software design, for example by encouraging developers to produce UIs that are defined from configuration files that can be easily customised and selected from, in that that users can often select language packs).

I can think of a handful of flippant, really negative reasons why we might not want to release code, but they’re rather churlish… So they’re hopefully not the reasons…

But there are good reasons too (for some definition of “good”..): getting code into a state that is of “public release quality”; the overheads of having to support an open code repository (though there are benefits: other people adding suggestions, finding bugs, maybe even suggesting fixes). And legal copyright and licensing issues. Plus the ever present: if we give X away, we’re giving part of the value of doing our courses away.

At the end of the day, seeing open education in part as open and shared practice, I wonder what the real challenges are to working on custom educational software in a more open and collaborative way?

Note that with smaller samples, for the correlation at zero, the best fit line may wobble and may not have zero gradient, though in the following case, with 200 points, it looks okay…

The method breaks if I set the correlation (r parameter) values to less than zero – Error in mvrnorm(n = samples, mu = mu, Sigma = Sigma) : ‘Sigma’ is not positive definite – but we can just negate the y-values (unifvars[,2]=-unifvars[,2]) and it seems to work…

If in the corrdata2 function we stick with the pnorm(rawvars) distribution rather than the uniform (qunif(pnorm(rawvars))) one, we get something that looks like this:

I just had a “doh!” moment in the context of OERs – Open Educational Resources, typically so called because they are Resources produced by an Educator under an Open content license (which to all intents and purposes is a copyright waiver). One of the things that appeals to me about OERs is that there is no reason for them not to be publicly discoverable which makes them the ideal focus for PEER – Public Engagement with Educational Resources. Which is what the OU traditionally offered through 6am TV broadcasts of not-quite-lectures…

Or how about this one?

And which the OU is now doing through iTunesU and several Youtube Channels, such as OU Learn:

PS It also seems to me that users tend not to get too hung up about how things are licensed, particularly educational ones, because education is about public benefit and putting constraints on education is just plain stoopid. Discovery is nine tenths of law, as it were. The important thing about having something licensed as an OER is that no-one can stop you from sharing it… (which even if you’re the creator of a resource, you may not b able to do; academics, for example, often hand over the copyright of their teaching materials to their employer, and their employer’s copyright over their research output (similarly transferred as a condition of employment) to commercial publishers who then sell the content back to their employers.

Mulling over the OU’s OULearn pages on Youtube a week or two ago, colleague Bernie Clark pointed out to me how the links from the OU clip descriptions could be rather hit or miss:

Via @lauradee, I see that the OU has a new offering on YouTube.com/edu is far more supportive of links to related content, links that can represent the start of a learning journey through OU educational – and commentary – content on the OU website.

Here’s a way in to the first bit of OU content that seems to have appeared:

This links through to a playlist page with a couple of different sorts of opportunity for linking to resources collated at the “Course materials” or “Lecture materials” level:

(The language gives something away, I think, about the expectation of what sort of content is likely to be uploaded here…)

So here, for example, are links at the level of the course/playlist:

And here are links associated with each lecture, erm, clip:

In this first example, several types of content are being linked to, although from the link itself it’s not immediately obvious what sort of resource a link points to? For example, some of the links lead through to course units on OpenLearn/Learning Zone:

Others link through to “articles” posted on the OpenLearn “news” site (I’m not ever really sure how to refer to that site, or the content posts that appear on it?)

The placing of content links into the Assignments and Others tabs always seems a little arbitrary to me from this single example, but I suspect that when a few more lists have been posted some sort of feeling about what sorts of resources should go where (i.e. what folk might expect by “Assignment” or “Other” resource links). If there’s enough traffic generated through these links, a bit of A/B testing might even be in order relating to the positioning of links within tabs and the behaviour of students once they click through (assuming you can track which link they clicked through, of course…)?

The transcript link is unambiguous though! And, in this case at least), resolves to a PDF hosted somewhere on the OU podcasts/media filestore:

(I’m not sure if caption files are also available?)

Anyway – it’ll be interesting to hear back about whether this enriched linking experience drives more traffic to the OpenLearn resources, as well as whether the positioning of links in the different tab areas has any effect on engagement with materials following a click…

And as far as the linkage itself goes, I’m wondering: how are the links to OpenLearn course units and articles generated/identified, and are those links captured in one of the data.open.ac.uk stores? Or is the process that manages what resource links get associated with lists and list items on Youtube/edu one that doesn’t leave (or readily support the automated creation of) public data traces?

PS How much (if any( of the linked resource goodness is grabbable via the Youtube API, I wonder? If anyone finds out before me, please post details in the comments below:-)

It turns out that part of the grief I encountered here in trying to access OpenLearn XML content was easily resolved (check the comments: mechanise did the trick…), though I’ve still to try to sort out a workaround for accessing OpenLearn images (a problem described here)), but at least now I have another stepping stone: a database of some deconstructed OpenLearn content.

Using Scraperwiki to pull down and parse the OpenLearn XML files, I’ve created some database tables that contain the following elements scraped from across the OpenLearn units by this OpenLearn XML Processor:

glossary items;

learning objectives;

figure captions and descriptions.

You can download CSV data files corresponding to the tables, or the whole SQLite database. (Note that there is also an “errors” table that identifies units that threw an error when I tried to grab, or parse, the OpenLearn XML.)

Unfortunately, I haven’t had a chance yet to pop up a view over the data (I tried, briefly, but today was another of those days where something that’s probably very simple and obvious prevented me from getting the code I wanted to write working; if anyone has an example Scraperwiki view that chucks data into a sortable HTML table or a Simile Exhibit searchable table, please post a link below; or even better, add a view to the scraper:-)

a search for figure descriptions containing the word “communication” – select * from `figures` where desc like ‘%communication%’: try it

a search over learning outcomes that include the phrase how to followed at some point by the word data – select * from `learningoutcomes` where lo like ‘%how to%data%’: try it

a search of glossary items for glossary terms that contain the word “period” or a definition that contains the word “ancient” – select * from `glossary` where definition like ‘%ancient%’ or term like ‘%period%’: try it

What will make open data better? What will make it usable and useful? What will push people to care about the open data they produce?
SOMEONE USING IT!
Simply that. If we start using the data, we can email, write, text and punch people until their data is in a standard, useful and usable format. How do I know if my data is correct until someone tries to put pins on a map for ever meal I’ve eaten? I simply don’t. And this is the rock/hard place that open data lies in at the moment:

It’s all so moon-hoveringly bad because no-one uses it.
No-one uses it because what is out there is moon-hoveringly bad

Or broken…

Earlier today, I posted some, erm, observations about OpenLearn XML, and in doing so appear to have logged, in a roundabout and indirect way, a couple of bugs. (I did think about raising the issues internally within the OU, but as the above quote suggests, the iteration has to start somewhere, and I figured it may be instructive to start it in the open…)

So here’s another, erm, issue I found relating to accessing OpenLearn xml content. It’s actually something I have a vague memory of colliding with before, but I don’t seem to have blogged it, and since moving to an institutional mail server that limits mailbox size, I can’t check back with my old email messages to recap on the conversation around the matter from last time…

The issue started with this error message that was raised when I tried to parse an OU XML document via Scraperwiki:

nbsp is an HTML entity that shouldn’t appear untreated in an arbitrary XML doc. So I assumed this was a fault of the OU XML doc, and huffed and puffed and sighed for a bit and tried with another XML doc; and got the same result. A trawl around the web looking for whether there were workarounds for the lxml Python library I was using to parse the “XML” turned up nothing… Then I thought I should check…

Ah… vague memories… there’s some sort of handshake goes on when you first try to access OpenLearn content (maybe something to do with tracking?), before the actual resource that was called is returned to the calling party. Browsers handle this handshake automatically, but the etree.parse(URL) function I was calling to load in and parse the XML document doesn’t. It just sees the HTML response and chokes, raising the error that first alerted me to the problem.

So now it’s two hours later than it was when I started a script, full of joy and light and happy intentions, that would generate an aggregated glossary of glossary items from across OpenLearn and allow users to look up terms, link to associated units, and so on; (the OU-XML document schema that OpenLearn uses has markup for explicitly describing glossary items). Then I got the error message, ran round in circles for a bit, got ranty and angry and developed a really foul mood, probably tweeted some things that I may regret, one day, figured out what the issue was, but not how to solve it, thus driving my mood fouler and darker… (If anyone has a workaround that lets me get an XML file back directly from OpenLearn (or hides the workaround handshake in a Python script I can simply cut and paste), please enlighten me in the comments.)

I also found at least one OpenLearn unit that has glossary items, but just dumps then in paragraph tags and doesn’t use the glossary markup. Sigh…;-)

For me, one of the defining attributes of openness relates to accessibility of the machine kind: if I can’t write a script to handle the repetitive stuff for me, or can’t automate the embedding of image and/or video resources, then whatever the content is, it’s not open enough in a practical sense for me to do what I want with it.

So here’s an, erm, how can I put this politely, little niggle I have with OpenLearn XML. (For those of you not keeping up, one of the many OpenLearn sites is the OU’s open course materials site; the materials published on the site as course unit contentful HTML pages are also available as structured XML documents. (When I say “structured”, I mean that certain elements of the materials are marked up in a semantically meaningful way; lots of elements aren’t, but we have to start somewhere ;-))

The context is this: following on from my presentation on Making More of Structured Course Materials at the eSTeEM conference last week, I left a chat with Jonathan Fine with the intention of seeing what sorts of secondary product I could easily generate from the OpenLearn content. I’m in the middle of building a scraper and structured content extractor at the moment, grabbing things like learning outcomes, glossary items, references and images, but I almost immediately hit a couple of problems, first with actually locating the OU XML docs, and secondly locating the images…

Getting hold of a machine readable list of OpenLearn units is easy enough via the OpenLearn OPML feed (much easier to work with than the “all units” HTML index page). Units are organised by topic and are listed using the following format:

<outline type="rss" text="Unit content for Water use and the water cycle" htmlUrl="http://openlearn.open.ac.uk/course/view.php?name=S278_12" xmlUrl="http://openlearn.open.ac.uk/rss/file.php/stdfeed/4307/S278_12_rss.xml"/>

URLs of the form http://openlearn.open.ac.uk/course/view.php?name=S278_12 link to a ‘homepage” for each unit, which then links to the first page of actual content, content which is also available in XML form. The content page URLs have the form http://openlearn.open.ac.uk/mod/oucontent/view.php?id=398820&direct=1, where the ID is one-one uniquely mapped to the course name identifier. The XML version of the page can then be accessed by changing direct=1 in the URL to content=1. Only, we don’t know the mapping from course unit name to page id. The easiest way I’ve found of doing that is to load in the RSS feed for each unit and grab the first link URL, which points the first HTML content page view of the unit.

I’ve popped a scraper up on Scraperwiki to build the lookup for XML URLs for OpenLearn units – OpenLearn XML Processor:

The next step in the plan (because I usually do have a plan; it’s hard to play effectively without some sort of direction in mind…) as far as images goes was to grab the figure elements out of the XML documents and generate an image gallery that allows you to search through OpenLearn images by title/caption and/or description, and preview them. Getting the caption and description from the XML is easy enough, but getting the image URLs is not…

Here’s an example of a figure element from an OpenLearn XML document:

<Figure id="fig001">
<Image src="\\DCTM_FSS\content\Teaching and curriculum\Modules\Shared Resources\OpenLearn\S278_5\1.0\s278_5_f001hi.jpg" height="" webthumbnail="false" x_imagesrc="s278_5_f001hi.jpg" x_imagewidth="478" x_imageheight="522"/>
<Caption>Figure 1 The geothermal gradient beneath a continent, showing how temperature increases more rapidly with depth in the lithosphere than it does in the deep mantle.</Caption>
<Alternative>Figure 1</Alternative>
<Description>Figure 1</Description>
</Figure>

So how can we generate that image URL from the resource link in the XML document? The filename is the same, but how can we generate what are presumably contextually relevant path elements: http://openlearn.open.ac.uk/file.php/4178/!via/oucontent/course/476/

If we look at the OpenLearn OPML file that lists all current OpenLearn units, we can find the first identifier in the path to the RSS file:

But I can’t seem to find a crib for the second identifier – 476 – anywhere? Which means I can’t mechanise the creation of links to actually OpenLearn image assets from the XML source. Also note that there are no credits, acknowledgements or license conditions associated with the image contained within the figure description. Which also makes it hard to reuse the image in a legal, rights recognising sense.

[Doh – I can surely just look at URL for an image in an OpenLearn unit RSS feed and pick the path up from there, can’t I? Only I can’t, because the image links in the RSS feeds are: a) relative links, without path information, and b) broken as a result…]

Reusing images on the basis of the OpenLearn XML “sourcecode” document is therefore: NOT OBVIOUSLY POSSIBLE.

What this suggests to me is that if you release “source code” documents, they may actually need some processing in terms of asset resolution that generates publicly resolvable locators to assets if they are encoded within the source code document as “private” assets/non-resolvable identifiers.

Where necessary, acknowledgements/credits are provided in the backmatter using elements of the form:

Whilst OU-XML does support the ability to make a meaningful link to a resource within the XML document, using an element of the form:

<CrossRef idref="fig007">Figure 7</CrossRef>

(which presumably uses the Alternative label as the cross-referenced identifier, although not the figure element id (eg fig007) which is presumably unique within any particular XML document?), this identifier is not used to link the informally stated figure credit back to the uniquely identified figure element?

If the same image asset is used in several course units, there is presumably no way of telling from the element data (or even, necessarily, the credit data?) whether the images are in fact one and the same. That is, we can’t audit the OpenLearn materials in a text mechanised way to see whether or not particular images are reused across two or more OpenLearn units.

Just in passing, it’s maybe also worth noting that in the above case at least, a description for the image is missing. In actual OU course materials, the description element is used to capture a textual description of the image that explicates the image in the context of the surrounding text. This represents a partial fulfilment of accessibility requirements surrounding images and represents, even if not best, at least effective practice.

Where else might content need liberating within OpenLearn content? At the end of the course unit XML documents, in the “backmatter” element, there is often a list of references. References have the form:

Hmmm… no structure there… so how easy would it be to reliably generate a link to an authoritative record for that item? (Note that other records occasionally use presentational markup such as italics (or emphasis) tags to presentationally style certain parts of some references (confusing presentation with semantics…).)

Finally, just a quick note on why I’m blogging this publicly rather than raising it, erm, quietly within the OU. My reasoning is similar to the reasoning we use when we tell students to not be afraid of asking questions, because it’s likely that others will also have the same question… I’m asking a question about the structure of an open educational resource, because I don’t quite understand it; by asking the question in public, it may be the case that others can use the same questioning strategy to review the way they present their materials, so when I find those, I don’t have to ask similar sorts of question again;-)