Posts Tagged ‘coursedata’

A ten minute post, not thought through, just collecting together some thoughts arising from a Twitter exchange with @jmahoney127, @jacksonj04, @cgutteridge and @mhawksey that followed from picking up on a comment from Alan Paull re: the latest demo OU XCRI feed that I don’t want to forget in the midst of doing other unrelated stuff today… (Additional fragmentary thoughts in that comment thread.)

Looking at the latest OU XCRI-CAP 1.2 feed, it seems to be a little bit richer than last time I looked:

Here’s a snapshot of the University of Lincoln feed:

This represents a partial view of data available via the ONCourse API, I think?

Which leads to a couple of questions:

1) is all the data required to publish an XCRI-CAP1.2 feed available via the ONCourse API? That is, could I generate the XCRI feed from calls to the API?

2) what is available via the API that isn’t in the XCRI feed?

3) could we create an API over the XCRI feed data that resembles (in part) the ONCourse API? That is, for an institution that has an XCRI-CAP 1.2 feed but no additional #coursedata development resources, could we create a simple JSON API layer that offers at least some of the ONCourse API functionality? And would the functionality/data that such an API could make available actually be rich enough or complete enough to do anything useful with?

Note: one of the advantages of the XCRI-CAP 1.2 feed is that it can be used as a bulk transport format, a convenient source for a crude database of all the current (or future?) course offerings provided by an institution. It can also be thought of as an XML database of an institution’s course offerings.

One of the challenges that needs to be addressed when developing data driven applications is how to get data from a database to the point of use. Another relates to time-relevance. For example, in one document you might want to look at figures or data relating to a specific, fixed date or period (March 2012, for example, or the 2012-13 academic year). At other times, you might want current data: values of some indicator over the last 6 weeks, for example. Where live data feeds are available, you might want them to be displayed on a dashboard. In other circumstances, you might want a more traditional printed report or PowerPoint presentation, but one that contains absolutely up to the minute information. Ideally, the report or presentation would be “self-updating” so that each time you printed it off, it contained the latest data values.

One common user environment for data related activities is a spreadsheet. When developing an API that produces data that might be usefully manipulated within a spreadsheet context, it can be useful to provide a set of connectors that allow data to be pulled directly from the API and inserted into the spreadsheet.

Here’s a quick example of how we might start to pull data into a Google Spreadsheets context from the University of Lincoln course data API. The example calls broadly reproduce those described in Getting ONCourse With Course Data.

Although the new script management tools in the Google Apps environment confuse the process of defining spreadsheet specific scripts, at its heart defining custom spreadsheet functions for Google Spreadsheets is a trivial exercise: create a custom function (for example, myFunction(val)), pass in a value (such as a cell value) and return a value or array. You can call the function using a formula, =myFunction(A3), for example. If a single value is returned, this will be returned into the calling cell. If a one-dimensional list is returned, it will add values to the current row. If a two dimensional array is returned, it will populate rows and columns.

Here are a few helper functions for calling the ONCourse data into a Google Spreadsheet:
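By way of illustration, here’s a minimal sketch of the sort of custom functions involved (written for the Google Apps Script environment; the API root here is a made-up placeholder, and the record field names are guesses rather than the actual ONCourse schema):

```javascript
// Sketch of Google Apps Script custom functions for pulling ONCourse-style
// data into a spreadsheet. API_BASE is a hypothetical placeholder -- swap in
// the real ONCourse API root.
var API_BASE = 'https://example.lincoln.ac.uk/api';

// Fetch and parse one JSON resource from the API.
// (UrlFetchApp only exists inside the Apps Script environment.)
function fetchJSON(path) {
  var response = UrlFetchApp.fetch(API_BASE + path);
  return JSON.parse(response.getContentText());
}

// Shape a list of programme objects into a 2D array for the spreadsheet:
// one row per programme, columns for internal ID and title.
function programmesToRows(programmes) {
  return programmes.map(function (p) {
    return [p.id, p.programme_title];
  });
}

// Custom function: entering =progsFromModuleID(97) in a cell would populate
// rows and columns with the programmes linked to that module.
function progsFromModuleID(moduleID) {
  var module = fetchJSON('/modules/' + moduleID);
  return programmesToRows(module.module_links || []);
}
```

Because progsFromModuleID returns a 2D array, calling it from a single cell spills results into the rows and columns below and to the right of the calling cell, as described above.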

One thing I realised about the API from playing with this recipe was that it is defined very much as a little-l, little-d “linked data” API that works well for browsing the data. The well defined URI set and use of internal unique identifiers make it easy to traverse the data space once you are in it. However, it’s not immediately obvious to me how I could either search my way into the data, or access it via natural identifiers such as programme codes or module codes.

For example, to print the programmes associated with a module, I might use the above formula =printProgsFromModuleID(97), but the identifier I need to pass in (97) is an internal, machine generated ID. This is all well and good if you are working within the machine-ID space, but this is not the human user space. For that, it would be slightly more natural to make use of a module code, and call something like printProgsFromModuleCode('FRS2002M'). There are issues with this approach of course, with module codes being carried over from presentation to presentation, and breaking a simple bijective (one-one onto) relationship between internal module IDs and module codes. As a result, we might need to further qualify these calls with a presentation year (or by default assume the presentation from the current academic year), or whatever other arguments are required to recapture the bijectivity.
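To make the idea concrete, here’s a hypothetical sketch of a lookup layer that restores a one-to-one mapping by qualifying a module code with a presentation year (the index data and function names are made up for illustration; in practice the index would be built by walking the API’s module list):

```python
# Hypothetical lookup layer: qualify a human-facing module code with an
# academic year to recover a unique internal machine ID.

# Toy index: (module_code, academic_year) -> internal module ID.
MODULE_INDEX = {
    ("FRS2002M", "2012-13"): 97,
    ("FRS2002M", "2011-12"): 42,
}

def module_id_from_code(code, year="2012-13"):
    """Resolve a module code (plus presentation year) to an internal ID."""
    try:
        return MODULE_INDEX[(code, year)]
    except KeyError:
        raise KeyError("No module %s found for %s" % (code, year))

def print_progs_from_module_code(code, year="2012-13"):
    """Human-facing wrapper: resolve the code, then hand the internal ID on
    to a printProgsFromModuleID-style API call."""
    return module_id_from_code(code, year)
```

Defaulting the year argument to the current academic year would give the printProgsFromModuleCode('FRS2002M') calling style suggested above.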

PS in passing, here’s another view over modules by programme, broken down into core and option modules.

There are a few problems with this view – for example, the levels need ordering properly; where there are no core or no optional modules the display is not as good as it could be; the label sizing is a bit small; there is no information relating to pre-requisites for optional modules – but it’s a start, and it’s reasonably clean to look at.

It should also be easy enough to tweak the data generator script to allow us to use the same display script to show assessment types for each module in a programme, as demonstrated using enclosure charts in Visually Revealing Gaps in ONCourse Data, and maybe even learning outcomes too.

If I get a chance, I’ll look through the sorts of thing requested in the ONCourse Focus Group – 14th March 2012 and try to pull out some views that match some of the requested ones. What would also be interesting would be to have a list of use cases from people who work with the data, too…

One of the things that I never really, fully, truly appreciated until I saw Martin Hawksey mention it (I think in an off the cuff comment) at a JISC conference session a year or two ago (I think?!) was how we can use visualisations to quickly spot gaps or errors in a data set.

Following on from my previous University of Lincoln course data explorations (see Getting ONCourse With Course Data and Exploring ONCourse Data a Little More… for more details), I had another little play, this time casting the output of course data around a programme into a hierarchical tree format that can be used to feed visualisations in d3.js or Gephi (code; call the data using URLs rooted on https://views.scraperwiki.com/run/uni_of_lincoln_oncourse_tree_json/). Parameters include:

progID – programme ID (using the n2 ID for a programme);

full=true|contact|assessment – true generates a size attribute for a module relative to the points associated with the module; assessment breaks out assessment components for each module in the programme; contact breaks out contact hours for each module;

format=gexf|json – gives the output in GEXF format or d3.js tree JSON format.

Note: it should be easy enough to tweak the code to accept a moduleID and just display a view of the assessment or contact time breakdown for that module, or a programme ID and level and just display the contact or assessment details for the modules in a particular level, or a further core/option attribute to only display core or optional modules, etc.
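To give a feel for the shape of the d3.js tree JSON involved, here’s a rough Python sketch of how flat module records might be cast into the nested format (the field names are my guesses for illustration, not the scraper’s actual schema):

```python
# Shape flat module records into d3.js tree JSON: nested
# {"name": ..., "children": [...]} nodes, with leaf nodes carrying "size".

def module_node(module, breakdown="contact"):
    """One module as a tree node, with its contact (or assessment)
    components broken out as sized leaves."""
    children = [
        {"name": c["type"], "size": c["hours"]}
        for c in module.get(breakdown + "s", [])
    ]
    return {"name": module["code"], "children": children}

def programme_tree(prog_title, modules_by_level):
    """Build a programme -> level -> module -> contact-type hierarchy."""
    return {
        "name": prog_title,
        "children": [
            {"name": level, "children": [module_node(m) for m in mods]}
            for level, mods in sorted(modules_by_level.items())
        ],
    }
```

The same skeleton would serve the GEXF output path too, by walking the nested structure and emitting nodes and edges instead.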

I also did a quick demo of how to view the data using a d3.js enclosure layout. For example, here’s a peek at how core and option modules are distributed in a programme, by level, with the contact times broken out:

The outermost bubble is a programme; the next largest bubbles represent levels (Level 1, Level 2, etc., corresponding to the first year, second year, and so on, of an undergraduate degree programme). The structure then groups modules within a level as either core or option (not shown here? All modules are core for this programme, I think…?) and then for each module breaks out the contact time in hours (which proportionally sets the size of the contact circles) and the contact type.

The labelling is lacking in a flat image view (if you hover over elements in the live version you get tooltips popping up that identify module codes, whether a set of modules is core or option, the level of a set of modules, etc.), and in places the layout is hit and miss; for example, in the above example, one module must have 100% lectures, so we don’t see a module bubble around it. I maybe need to seed each module bubble with a dummy/null bubble of size 0, if that’s possible, to force a contact bubble to be visibly enclosed by a module bubble?
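A quick sketch of what that zero-size padding might look like, applied to the tree JSON before it is handed to the layout (purely illustrative, assuming the nested name/children/size structure):

```python
# Possible fix for the "module bubble swallowed by a 100% contact bubble"
# layout problem: pad every non-leaf node's children with an invisible
# zero-size dummy, so the enclosing circle is always drawn.

def pad_with_dummy(node):
    """Recursively append a zero-size dummy child to every non-leaf node."""
    children = node.get("children")
    if children is not None:
        for child in children:
            pad_with_dummy(child)
        children.append({"name": "", "size": 0})
    return node
```

Whether the d3.js pack layout actually renders an enclosing circle around a zero-size sibling would need testing, of course.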

Here’s another example, this time breaking out assessment types:

Hovering over these two images (click through them to see the live version), I noticed that for this programme, there don’t appear to be any optional modules, which may or may not be “interesting”?

Looking at the first, contact time displaying image, we also notice that several modules do not have any contacts broken out. Going back to the data, the corresponding element for these modules is actually an empty list, suggesting the data is not available. What these views give us, then, is a quick way of exploring not only how a programme is structured, but also which modules are lacking data. This is something that the treemaps in Exploring ONCourse Data a Little More… did not show – they displayed contact or assessment breakdowns across a course only for modules where that data was available, which could be misleading. (Note that I should try to force the display of a small empty circle showing the lack of core or option modules if a programme has no modules in that category?)
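The same missing-data check the eye performs over the visualisation can of course be scripted directly; a trivial sketch, assuming each module record carries a contacts (or assessments) list:

```python
# Scripted equivalent of what the visualisation surfaces by eye: list the
# modules in a programme whose contact (or assessment) breakdown is an
# empty list, i.e. where the data simply isn't there.

def modules_missing(modules, key="contacts"):
    """Return codes of modules whose breakdown under `key` is empty/absent."""
    return [m["code"] for m in modules if not m.get(key)]
```

Run over every programme, something like this would give a quick data-completeness audit across the whole feed.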

Something else that’s lacking with the visualisations is labelling regarding which programme is being displayed, etc. But it should also be noted that my intention here is not to generate end user reports; it’s to explore what sorts of views over the course data we might get using off the shelf visualisation components (testing the API along the way), how this might help us conceptualise the sorts of structures, stories and insights that might be locked up in the data, and how visual approaches might help us open up new questions to ask over, or more informative reports to draw down from, the API served data itself.

Anyway, enough for now. The last thing I wanted to try was pulling some of the API called data into an R environment and demonstrate the generation of a handful of reports in that context, but I’ve run out of time just now…

In particular, I started looking at the programme level to see how we might be able to represent the modules contained within it. Modules are associated with individual programmes, with a core attribute specifying whether a module is “core” (presumably meaning required).

I thought it might be interesting to try to pull out a “core module” profile for each programme. Looking at the data available for each module, there were a couple of things that immediately jumped out at me as being “profileable” (sp?!) – assessment data (assessment method, weighting, whether it was group work or not):

and contact times (that is, hours of contact and contact type):

I had some code kicking around for doing treemap views so I did a couple of quick demos [code]. First up, a view over the assessment strategy for core modules in a particular programme:

(I really need to make explicit the programme…)

Blocks are coloured according to level and sized according to product of points value for the course and assessment type weighting.

We can also zoom in:

I took the decision to use colour to represent level, but it would also be possible to pull the level out into the hierarchy used to configure the tree (for example having Level 1, Level 2, Level 3 groupings at the top?) I initially used the weighting to set the block size, but then tweaked it to show the product of the weighting and the number of credit points for the module, so for example a 60 point module with exam weighting 50% would have size 60*0.5 = 30, whereas a 15 point module with exam weighting 80% would have size 15*0.8 = 12.
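The sizing calculation itself is simply the product of a module’s credit points and the fractional assessment weighting, matching the worked examples above:

```python
# Treemap block sizing rule: credit points x fractional assessment weighting.

def block_size(points, weighting_pct):
    """Size of a treemap block for one assessment component of a module.

    points        -- credit points for the module (e.g. 15, 30, 60)
    weighting_pct -- assessment component weighting as a percentage
    """
    return points * weighting_pct / 100.0
```

So a 60 point module with a 50% exam weighting gets size 30, and a 15 point module with an 80% exam weighting gets size 12, as per the text.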

Note that there are other ways we could present these hierarchies. For example, another view might structure the tree as: programme – level – module – assessment type. Different arrangements can tell different stories, be differently meaningful to different audiences, and be useable in different ways… Part of the art comes from picking the view that is most appropriate for addressing a particular problem, question or intention.

Here’s an example of view over the contact hours associated with core modules in the same programme:

(Note that I didn’t make use of the group work attribute which should probably also be added in to the mix?).

Looking at different programmes, we can spot different sorts of profile. Note that there is a lot wrong with these visualisations, but I think they do act as a starting point for helping us think about what sorts of view we might be able to start pulling out of the data now it’s available. For example, how are programmes balanced in terms of assessment or contact over their core modules? One thing developing the above charts got me thinking about was how to step up a level to allow comparison of core module assessment and contact profiles across programmes leading to a particular qualification, or across schools? (I suspect a treemap is not the answer!)

It’s also worth noting that different sorts of view might be appropriate for different sorts of “customer”: potential student choosing a programme, student on a programme, programme manager, programme quality committee, and so on.

And it’s also worth noting that different visualisation types might give a more informative view over the same data structure. On my to do list is to have a play with viewing the data used above in some sort of circular enclosure diagram (or whatever that chart type is called!) for example. (See the next post in this series, which amongst other things made me realise how partial/fragmentary the data displayed in the above treemaps actually is…)

Having had a play, a couple more observations came to mind about the API. Firstly, it could be useful to annotate modules with a numerical (integer) attribute relating to a standardised educational level, such as the FHEQ levels. (At the moment, modules are given level descriptors along the lines of “Level 3”, relating to a third year course, akin to FHEQ level 6). Secondly, relating to assessment, it might be useful (down the line) to know how the grade achieved in a module at a particular level contributes to the final grade achieved at the end of the programme.

In a post a couple of weeks ago – COuRsE Data – I highlighted several example questions that might usefully be asked of course data, things like “which modules are associated with any particular qualification?” or “which modules deliver which qualification level learning outcomes?”.

As the University of Lincoln ONCourse project comes to an end [disclosure: I am contracted on the project to do some evaluation], I thought it might be interesting to explore the API that’s been produced to see just what sorts of question we might be able to ask, out of the can.

Note that I don’t have privileged access to the database (though I could possibly request read access, or maybe even a copy of the database, or at least its schema), but that’s not how I tend to work. I play with things that are in principle publicly available, ideally things via openly published URLs and without authentication (no VPN); using URL parameter keys is about as locked down as I can usually cope with;-)

Looking at the record for an actual module gives us a wealth of data, including: the module code, level and number of credit points; a synopsis and marketing synopsis; an outline syllabus (unstructured text, which could cause layout problems?); the learning and teaching strategy and assessment strategy; a set of module links to programmes the module is associated with, along with core programme data; a breakdown of contact time and assessments; and prerequisites, co-requisites and excluded combinations.

There does appear to be some redundancy in the data provided for a module, though this is not necessarily a bad thing in pragmatic terms (it can make life convenient). For example, the top level of a modules/id record looks like this:

and lower down the record, in the /assessments element we get duplication of the data:

As something of a tinkerer, this sort of thing works for me – I can grab the /assessments object out of a modules/id result and pass it around with the corresponding module data neatly bundled up too. But a puritan might take issue with the repetition…

Out of the box then, I can already start to write queries on a module if I have the module ID. I’ll give some code snippets of example Python routines as I play my way through the API…
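Here’s a minimal sketch of the style of routine involved (the API base URL is a placeholder rather than the real ONCourse root, and the record keys are illustrative guesses):

```python
# Minimal helpers for playing with the API from Python. API_BASE is a
# hypothetical placeholder; substitute the published ONCourse API root.
import json
from urllib.request import urlopen

API_BASE = "https://example.lincoln.ac.uk/api"

def get_json(path):
    """Fetch and decode one API resource (requires network access)."""
    with urlopen(API_BASE + path) as response:
        return json.loads(response.read().decode("utf-8"))

def summarise_module(record):
    """Pick a few headline fields out of a modules/id record.

    The key names here are guesses at the record shape, for illustration.
    """
    return {
        "code": record.get("module_code"),
        "title": record.get("title"),
        "points": record.get("points"),
    }
```

A query on a module then becomes something like summarise_module(get_json("/modules/97")).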

One thing I’m not sure about is the way in to a moduleID in the first instance. For example, I’d like to be able to straightforwardly get the IDs for any upcoming presentations of FRS2002M in the current academic year?

So what else might we be able to do around a module? How about check out its learning outcomes? My first thought was the learning outcomes might be available from the /modules data (i.e. actual module presentations), but they don’t seem to be there. Next thought was to look for them in the abstracted module definition (from the module_codes), but again: no. Hmmm… Assessments are associated with a module, so maybe that’s where the learning outcomes come in (as subservient to assessment rather than a module?)

The assessment record in the module data looks like this:

And if we click through to an assessment record we get something like this, which does include Learning Outcomes:

Philosophically, I’m not sure about this? I know that assessment is supposed to be tied back to LOs, and quality assurance around a course and its assessment is typically a driver for the use of LOs. But if we can only find the learning outcomes associated with a module via its assessment..? Hmmm… Data modelling is often fraught with philosophical problems, and I think this is one such case?

Anyway, how might we report on the learning outcomes associated with a particular module presentation? Here’s one possible way:
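A sketch of one possible routine: since the learning outcomes hang off assessment records rather than the module itself, walk module → assessments → learning outcomes (the record keys are illustrative guesses at the API’s shape):

```python
# Collect the learning outcomes reachable from a module via its assessments.

def learning_outcomes_for_module(module_record, fetch_assessment):
    """Walk module -> assessments -> learning outcomes.

    fetch_assessment is a callable mapping an assessment ID to its full
    record (e.g. a wrapper around the API's assessments/id call), which
    also makes the routine easy to test with stub data.
    """
    outcomes = set()
    for assessment in module_record.get("assessments", []):
        full = fetch_assessment(assessment["id"])
        for lo in full.get("learning_outcomes", []):
            outcomes.add(lo["description"])
    return sorted(outcomes)
```

Deduplicating via a set matters here: several assessment components may be tied back to the same learning outcome.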

We can also query the API to get a list of learning outcomes directly, and this turns up a list of assessments as well as the single (?) module associated with the learning outcome. Does this mean that a particular learning outcome can’t be associated with two modules? I’m not sure that’s right? Presumably, database access would allow us to query learning outcomes by moduleID?

In the next post, I’ll take a peek at some of the subject keywords, and maybe have a play at walking through the data in order to generate some graphs (eg maybe along the lines of what Jamie has tried out previously: In Search of Similar Courses). There are also programme level outcomes, so it might be interesting trying to do some sort of comparison between those and the learning outcomes associated with assessment in modules on a programme. And all this, of course, without (too much) privileged access…

PS Sod’s Law from Lincoln’s side means that even though I only touched a fragment of the data, I turned up some errors. So for example, in the documentation on bitbucket the award.md “limit” var was classed as a “bool” rather than an “int” (which suggests a careful eye maybe needs casting over all the documentation. Or is that supposed to be my job?! Erm…;-) In programme/id=961 there are a couple of typos: programme_title: “Criminology and Forensic Invesitgation”, course_title: “Criminology and Forensic Invesitgation Bachelor of Science with Honours (BSc (Hons)) 2011-12″. Doing a bit of text mining on the data and pulling out unique words can help spot some typos, though in the above case Invesitgation would appear at least twice. Hmmm…

Looking at some of the API calls, it would be generally useful to have something along the lines of offset=N to skip the first N results, as well as returning “total-Results=” in the response. Some APIs provide helper data along the lines of “next:”, where eg next=offset+num_results, that can be plugged in as the offset in the next call if you want to roll your own paging. (This may be in the API, but I didn’t spot it?) When scripting this, care just needs to be taken to check that a fence post error doesn’t sneak in.
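A sketch of rolling your own paging along those lines, with the fence post handled by advancing the offset by the number of results actually returned (fetch_page here is a hypothetical stand-in for whatever call wraps the API):

```python
# Roll-your-own paging over an offset-based API.

def page_through(fetch_page, page_size=50):
    """Yield every result from a paged API.

    fetch_page(offset, limit) should return a list of at most `limit`
    results starting at `offset`, and an empty list when exhausted.
    """
    offset = 0
    while True:
        results = fetch_page(offset, page_size)
        if not results:
            break
        for result in results:
            yield result
        # Advance by the number actually returned, not by page_size:
        # this is where the fence post error tends to sneak in.
        offset += len(results)
```

If the API did return a next value (next=offset+num_results), that could be plugged in directly in place of the offset arithmetic.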

Okay – enough for now. Time to go and play in the snow before it all melts…

One of the things that has never really been clear to me is what it is that universities think they sell and what students think they are “buying”. (OU modules have always(?) had a price tag associated with them, although large amounts of financial support have also traditionally been available.) One partial view might focus on one of the more tangible exchanges that are evident when taking a university degree, specifically the modules taken as part of a qualification programme, and the way they are bundled, organised and presented to students. Curriculum innovation works both at the level of keeping these modules up to date, and at the level of introducing new modules (and potentially new degree programmes, either as new aggregations of, and pathways through, collections of modules).

If we think of universities as organisations in the business of selling, at least in part, structured collections of course modules*, then we might speculate around the processes that are used to come up with new collections that are desirable to fee-paying students (and consequently, employers).

(* I know, I know – we might also think of the cost centre services that go along with course delivery as part of the package, the assessment, the facilities, the pastoral care, the structured academic content; or the “payoff” in terms of improved employability, or higher lifetime earnings. But when I buy a bar of chocolate, I don’t see it as covering the factory automation, raw ingredients, logistics or supply chain costs, nor am I buying in to delight or gluttony. I’m buying a bar of chocolate. I’m also not saying that the courses are necessarily the thing students are buying, it’s just one particular lens we can use to see whether it makes storytelling sense to view the system in that way…)

In part, programmes of study leading to named qualifications in particular subject or topic areas are influenced by the QAA benchmark statements:

Subject benchmark statements set out expectations about standards of degrees in a range of subject areas.
…
Subject benchmark statements do not represent a national curriculum in a subject area. Rather, they allow for flexibility and innovation in programme design within an overall conceptual framework established by an academic subject community. They are intended to assist those involved in programme design, delivery and review and may also be of interest to prospective students and employers, seeking information about the nature and standards of awards in a subject area.

The proposal will need to demonstrate that a new or revised statement would provide the
benefits of a wider understanding about the scope and nature of the subject and the academic
standards underpinning it. This could be desirable for one or more of the following reasons.
• The subject is growing and more degree programmes are being provided in it
• A degree in the subject may be required for entry into a profession, but there are no explicit
academic standards associated with the subject for this purpose. There may also be a lack of
understanding within the relevant profession of what level of attainment can be expected
of a graduate in the subject, or of its appropriateness for entry into the profession
• The prospective benefits of agreed and explicit standards in the relevant subject have been
highlighted by, for example, external examiners and validating boards, higher education
providers, subject groups, or stakeholder organisations.

Unpicking the course module view a little further, modules are typically associated with notional academic credit points, which are awarded “when you have shown, through assessment, that you have successfully completed a module or a programme by meeting the specific set of learning outcomes for that module or programme” (Academic credit in higher education in England – an introduction; see also QAA – Academic Credit). Note that credit points do not reflect how well you passed the assessment, just that you achieved at least the minimum standard required. Credit points themselves relate to two considerations: “[t]he credit value indicates both the amount of learning expected (the number of credits) and its depth, complexity and intellectual demand (the credit level).” The “amount of learning” is captured by the “notional hours of learning” spent on the subject within the module. The level is based on level descriptors that “are used to help work out the level of learning in individual modules.”

Credit level descriptors are guides that help identify the relative demand, complexity and depth of learning, and learner autonomy expected at each level, and also indicate the differences between the levels.
They are general descriptions of the learning involved at a particular level; they are not specific requirements of what must be covered in a particular module, unit or programme.

So to recap – modules are designed in order to deliver a set of learning outcomes (that include subject or topic specific learning outcomes as well as more general skills) that can be acquired in a notional amount of time and that are assessed at a particular academic level in exchange for academic credit.

Qualifications are then awarded based on credit awarded in programmes of study, such as undergraduate or postgraduate degrees. Qualifications typically require the demonstration of some sort of progression through credit levels within a subject area, specify a range of qualification level learning outcomes that need to be delivered within the context of the programme as a whole, and may also require students to demonstrate aptitude across a range of assessment styles (or alternatively, offer a range of assessment styles so as not to disadvantage students who struggle with a particular style of assessment).

Whilst “traditional” universities typically offered named degree programmes in specific areas, the Open University originally offered an Open Degree (which is still available), in which students were free to choose whatever modules (then referred to as OU courses) they wanted, subject to certain requirements on the number of courses taken at each credit level (akin to each year of a traditional university degree; see elsewhere for more on credit points and credit equivalence). Whilst course choice was free, many students followed the same common pathways through courses to come out with degrees that were, essentially, subject degrees. In recent years, the OU has moved increasingly towards the award of named degrees, where students are required to take particular modules. Indeed, it is increasingly difficult to find the individual modules that students originally “bought” on the OU website – the emphasis now is on selling qualification level credit bundles, rather than module level credit points.

But how do universities decide what modules to offer? And how does curriculum innovation work? A bottom-up approach might be to refresh modules within a qualification, and then create new qualifications by rebundling sets of modules that together define some sort of coherent whole (this is how ‘as-if’ subject degrees were self-assembled by OU students in the Open Degree). A top-down approach might be to come up with an idea for a degree programme, and then commission modules to deliver that programme of study. Alternatively, we might look to mass-dynamics in a free choice system, such as an open degree, and come up with a middle-out(?) approach that suggests programmes of study that formalise the module collections freely chosen by students interested in studying a particular set of topics that make sense to them.

(It is interesting to note that, possibly uniquely within UK Higher Education, The Open University had the scale of numbers in undergraduate students to start to say interesting things about the way students selected courses under the Open degree model. Furthermore, as the popularity of “Big Data” solutions and recommendations driven by crowd-behaviour becomes commonplace, so the OU is reducing the amount of personalisation possible by pushing hard-coded, predefined pathways. At the same time, institutions such as Southampton University seem to be looking to open up personalisation pathways (for example, Southampton Curriculum innovation, discussed here: Graduates for the 21st Century – Curriculum innovation [audio]) and standalone HE level courses are increasingly available, sans credit, (as marketing warez; but for what exactly?) via the various open online course platforms.)

So now we’re at the point where I actually wanted to start this post… How do we go about the process of curriculum innovation (for example, OECD Education Working Papers No. 82 – Bringing About Curriculum Innovations) given that we already have a load of inventory? If we sell credit points in particular subject areas or topics, how do we decide what topics to cover and how do we bundle those points up into qualifications?

One place to start might be mapping out where we are at the current time, which is where course data comes in. For example, what does the interest map based on learning outcomes delivered by your university actually look like? Or if you work in HE, do you know (or can you readily find out):

which modules are associated with any particular qualification?

which qualifications are associated with any particular module, either as a required or optional component?

which modules have path dependencies (eg where one module is the pre-requisite of another, or modules are excluded combinations)?

which modules are required in which pathways, and which are optional?

in free choice modules (that maybe span programmes), which modules tend to be taken together?

which modules deliver which qualification level learning outcomes?

which modules deliver which sort of assessment types?

are there any modules that already offer particular learning outcomes at a particular level?
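Many of these questions reduce to simple joins over module–programme (and module–outcome) link data. A toy sketch with made-up records, just to show the shape of the queries:

```python
# Toy module<->programme link table: (programme, module_code, is_core).
# The data here is entirely made up for illustration.
LINKS = [
    ("BSc Forensics", "FRS2002M", True),
    ("BSc Forensics", "FRS2010M", False),
    ("BSc Criminology", "FRS2002M", True),
]

def modules_for(programme, core_only=False):
    """Which modules are associated with a particular qualification?"""
    return [m for p, m, core in LINKS
            if p == programme and (core or not core_only)]

def programmes_for(module):
    """Which qualifications is a particular module associated with?"""
    return [p for p, m, _ in LINKS if m == module]
```

Path dependencies (pre-requisites, excluded combinations) and outcome delivery would just be further link tables queried the same way.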

To provide a little more context, imagine these scenarios:

Module X is tired and needs to be replaced – what qualifications or other modules might be affected as a result? For example, does the module uniquely cover a particular qualification level learning outcome, or assessment type?

A new qualification is proposed with a particular set of learning outcomes – what modules are available that already deliver some (or all) of these learning outcomes?

The quality folk want to know how your programme demonstrates progression across credit levels with respect to a particular set of subject related learning outcomes. Can you easily map this out?

The quality folk also want to know whether a particular course is gameable in terms of assessment types covered by the course modules. Could a student select a set of modules that means they never have to do teamwork, project work/report, a presentation, an exam, etc?

You need to generate a set of course transcripts (sets of learning outcomes, by credit level) for a proposed new assemblage of outstanding modules, some of which are core/compulsory modules, some of which are optional. Can you do it?

Do you have the scaffolding data available to build course recommenders based on population flows and module selections of previous cohorts of student?

Which modules deliver the content that potential students think they want to study, eg when searching your online course prospectus? (You do use search logs for situational awareness around what potential students are searching for, don’t you?!)

So – how well do you fare?

[Note: this post is inspired by personal reflections around the University of Lincoln ON Course Course data project, on which I have, via the OU, a small consultancy retainer, and of which: more later.]

What this means is that if web pages are appropriately marked up, they can be sorted, filtered or ranked accordingly when returned as a search result in a Google CSE. So for example, if course pages were marked up with academic level, start date, NSS satisfaction score, or price, they could be sorted along those lines.

So how do pages need to be marked up in order to benefit from this feature? There are several ways:

As PageMap data added to a sitemap, or webpage. PageMap data also allows for the definition of actions, such as “Download”, that can be emphasised as such within a custom search result. (Facebook is similarly going down the path of trying to encourage developers to use verb driven, action related semantics (Facebook Actions))
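By way of a sketch, a PageMap block for a course page in a sitemap entry might look something like the following. The PageMap/DataObject/Attribute structure is Google’s CSE format, but the attribute names here (level, start, nss_score) are made up for illustration, and the exact namespace details should be checked against the CSE documentation:

```xml
<!-- Hypothetical PageMap markup for a single course page in a sitemap. -->
<url>
  <loc>https://example.ac.uk/courses/frs2002m</loc>
  <PageMap xmlns="http://www.google.com/schemas/sitemap-pagemap/1.0">
    <DataObject type="course">
      <Attribute name="level">SCQF 8</Attribute>
      <Attribute name="start">2013-09</Attribute>
      <Attribute name="nss_score">87</Attribute>
    </DataObject>
  </PageMap>
</url>
```

Attributes declared this way could then be used for the sorting, filtering and biasing of search results described below.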

I wonder about the extent to which JISC’s current course data programme of activities could be used to encourage institutions to explore the publication of some of their course data in this way? For example, might it be possible to transform XCRI feeds such as the Open University XCRI feed, into PageMap annotated sitemaps?

Something like a tweaked Course Detective CSE could then act as a quick demonstrator of what benefits can be immediately realised? So for example, from the Google CSE documentation on Filtering and sorting search results (I have to admit I haven’t played with any of this yet…), it seems that as well as filtering results by attribute, it’s also possible to use attributes to rank (or at least, bias) results:

bias by attribute: for example, boost results according to student satisfaction levels;

restrict to range: for example, only show courses within a specified range of SCQF levels.

Note to self: have a rummage around the XCRI data definitions/vocabularies resources… I also wonder if there is a mapping of XCRI elements onto simple attribute names that could be used to populate eg meta tag or PageMap name attributes?