Category: Infoskills

I don’t know to what extent our current courses equip folk for the future (or even the present), but here’s a typical example of how I needed to figure out a solution to a problem (actually, more of a puzzle) today using the resources I had to hand…

Here’s the puzzle:

I couldn’t remember offhand what this referred to other than “body shape” and “amazon” (I’ve had a cold recently and it seems to have affected my mental indexing a bit…!), and a quick skim of my likely pinboard tags didn’t turn anything up…

Hmmm… I recall looking at a related page, so I should be able to search my browser history. An easy way to do this is using Simon Willison’s datasette app to search the SQLite database that contains my Chrome browser history:
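A minimal sketch of what that looks like on a Mac, assuming the default Chrome profile location and that datasette is already installed (pip install datasette); the copy step is there because Chrome keeps the live History file locked:

cp "$HOME/Library/Application Support/Google/Chrome/Default/History" /tmp/chrome_history.db
datasette /tmp/chrome_history.db
# then point a browser at http://localhost:8001 and filter/search the urls table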

Anyway, back to the question: what did I need to know in order to be able to do that? And where do folk learn that sort of thing, whether or not they are formally taught it? Indeed, if there is a related thing to be formally taught, what is it, at what level, and in what context?

PS I had actually also kept a note of the company and the video in a scribble pad draft post on the Digital Worlds blog where I collect irreality related stuff:

Via @simonw’s rebooted blog, I spotted this – Landsat on AWS: “Landsat 8 data is available for anyone to use via Amazon S3. All Landsat 8 scenes are available from the start of imagery capture. All new Landsat 8 scenes are made available each day, often within hours of production.”

What do things like this mean for research, and teaching?

For research, I’m guessing we’ve gone from a state 20 years ago – no data [widely] available – to 10 years ago – available under license, with a delay and perhaps as periodic snapshots – to now – daily availability. How does this impact on research, and what sorts of research are possible? And how well suited are legacy workflows and tools to supporting work that can make use of daily updated datasets?

For teaching, the potential is there to do activities around a particular dataset that is current, but this introduces all sorts of issues when trying to write and support the activity (eg we don’t know what specific features the data will turn up in the future). We struggle with this anyway trying to write activities that give students an element of free choice or open-ended exploration where we don’t specifically constrain what they do. Which is perhaps why we tend to be so controlling – there is little opportunity for us to respond to something a student discovers for themselves.

The realtime-ish-ness of the data means we could engage students with contemporary issues, and perhaps enthuse them about the potential of working with datasets that we can only hint at or provide a grounding for in the course materials. There are also opportunities for introducing students to datasets and workflows that they might be able to use in their workplace, and as such act as a vector for getting new ways of working out of the Academy, and out of the tech hinterland that the Academy may be aware of, and into more SMEs (helping SMEs avail themselves of emerging capabilities via OUr students).

At a more practical level, I wonder: if OU academics (research or teaching related) wanted to explore the Landsat 8 data on AWS, would they know how to get started?

What sort of infrastructure, training or support do we need to make this sort of stuff accessible to folk who are interested in exploring it for the first time (other than Jupyter notebooks, RStudio, and Docker of course!;-) ?
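For what it’s worth, a first look at the data needn’t require much more than the AWS command line client. The following is an untested sketch based on the landsat-pds bucket naming of the time; the bucket layout, and the PATH/ROW/SCENE_ID placeholders, are assumptions to check against the Landsat on AWS documentation:

# list what's in the public bucket – no AWS credentials required
aws s3 ls s3://landsat-pds/ --no-sign-request
# grab the index of available scenes
aws s3 cp s3://landsat-pds/scene_list.gz . --no-sign-request
# pull a single band GeoTIFF for a particular scene
aws s3 cp s3://landsat-pds/L8/PATH/ROW/SCENE_ID/SCENE_ID_B4.TIF . --no-sign-request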

When I mentioned to a colleague yesterday that the UK Parliamentary Library publishes research briefings and reports on topics of emerging interest, as well as to support legislation – documents that often provide a handy, informed, and politically neutral overview of a subject area and that could make for useful learning resources – the question was asked whether or not they might have anything on the “internet of things”. The answer is not much, but it got me thinking a bit more about the range of documents and document types produced across Parliament and Government that can be used to educate and inform, as well as contribute to debate.

In other words, to what extent might such documents be used in an educational sense, whether in the sense of providing knowledge and information about a topic, providing a structured review of a topic area and the issues associated with it, raising questions about an issue, or reporting on an analysis of it? (There are also opportunities for learning from some of the better Parliamentary debates, for example in terms of how to structure an argument, or how to explore the questions associated with an issue, but Hansard is out of scope of this post!)

(Also note that I’m coming at this as a technologist, interested as much in the social processes, concerns and consequences associated with science and technology as in the deep equations and principles that tend to be taught as the core of the subject, at least in HE. And that I’m interested not just in how we can support the teaching and learning of current undergrads, but also in how we can enculturate them into the availability and use of certain types of resource that are likely to continue being produced into the future, and as such provide a class of resources that will continue to support the learning and education of students once they leave formal education.)

So using IoT as a hook to provide examples, here’s the range of documents I came up with. (At some point it may be worth tabulating these to properly summarise the sorts of information the reports might contain, the communicative point of each document (to inform, persuade, provide evidence for or against something, etc), and any political bias that may be likely (in policy docs, for example).)

Parliamentary Library Research Briefings

The Parliamentary Library produces a range of research briefings covering matters of general interest (Commons Briefing Papers, Lords Library Notes) – perhaps identified through multiple questions asked of the Library by members? – as well as background to legislation (Commons Debate Packs, Lords In Focus), through the Commons and Lords Libraries respectively.

Wider Parliamentary Documents

The Parliament website also supports navigation of topical issues such as Science and Technology, as well as sub-topics, such as Internet and Cybercrime. (I’m not sure how the topics/sub-topics are identified or how the graph is structured… That may be one to ask about when I chat to Parliamentary Library folk next week?:-)

Within the topic areas, relevant Commons and Lords related Library research briefings are listed, as well as POSTnotes, Select Committee reports and Early Day Motions.

(By the by, it’s also worth noting that chunks of the Parliament website are currently in scope of a website redesign.)

Government Documents

Along with legislation currently going through Parliament, which is published on the Parliament website (together with Hansard reports that record, verbatim(-ish!), the proceedings of debates in either House), explanatory notes provided by the Government department bringing a bill forward provide additional, supposedly more accessible/readable, information around it.

Briefing documents also appear in a variety of other guises. For example, competitions (such as the Centre for Defence Enterprise (CDE) competition on security for the internet of things) and consultations (such as Ofcom’s consultation on More radio spectrum for the Internet of Things) may both provide examples of how to start asking questions about a particular topic area (questions that may help to develop critical thinking, prompt critical reflection, or even provide ideas for assessment!).

Summary

In providing support for all members of the House, the Parliamentary research services must produce research briefings that can be used by both sides of the House. This may stand in contrast to documents produced by Government that may be influenced by particular policy (and political) objectives, or formal reports published by industry bodies and the big consultancies (the latter often producing reports that are either commissioned on behalf of government or published to try to promote some sort of commercial interest that can be sold to government) that may have a lobbying aim.

As I’ve suggested previously (News, Courses and Scrutiny and Learning Problems and Consultation Based Curricula), maybe we could/should be making more use of them as part of higher education course readings, not just as a way of getting a quick, NPOV view over a topic area, but also as a way of introducing students to a form of free and informed content, produced in a timely way in response to issues of the day. In short, a source that will continue to remain relevant and current over the coming years, as students (hopefully) become lifelong, continuing learners.

If you select the demo option, a lab context is spawned for you, providing access to a range of tools: staples such as RStudio and Jupyter notebooks, a Linux terminal, and several website creation tools (Brackets, Omeka and WordPress, though the latter two didn’t work for me).

There’s also a file browser, which provides a common space for organising – and uploading – your own files. Files created in one application are saved to the shared file area and available for use on other applications.

The applications sit behind a (demo) password authentication scheme, which makes me wonder if persistent accounts are in the project timeline?

Once inside the application, you have full control over it. If you need additional packages in RStudio, for example, then just install them:

They work, too!

On the Jupyter notebook front, you get access to Python3 and R kernels:

In passing, I notice that RStudio’s RMarkdown now supports some notebook-like activity, demonstrating the convergence between document formats such as Rmd (and ipymd) and notebook style UIs [video].

Code for running your own DHBox installation is available on Github (DH-Box/dhbox), though I haven’t had a chance to give it a try yet. One thing it’d be nice to see is a simple tutorial showing how to add in another tool of your own (OpenRefine, for example?). If I get a chance to play with this – and can get it running – I’ll try to see if I can figure out such an example.

It also reminded me that I need to play with my own install of tmpnb, not least because of the claim that “tmpnb can run any Docker container”. Which means I should be able to set up my own tmpRStudio, or tmpOpenRefine environment?
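I haven’t tried wiring other containers into tmpnb yet, but the basic “temporary application” idea is easy enough to play with directly from Docker, using the --rm switch so the container is thrown away when you exit (the image names below are the standard Docker Hub ones as I recall them – tags and defaults change over time, so treat them as assumptions):

# a throwaway RStudio server on http://localhost:8787 (newer rocker images want -e PASSWORD=... set)
docker run --rm -p 8787:8787 rocker/rstudio
# a throwaway Jupyter notebook server on http://localhost:8888
docker run --rm -p 8888:8888 jupyter/minimal-notebook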

That such combinations are now popping up all over the web makes me think that they’ll be a commodity service before long. I’d be happy to argue this sort of thing could be used to support a “technology enhanced learning environment”, as well as extending naturally into “technology enhanced research environments”, but from what I can tell, TEL means learning analytics and not practical digital tools used to develop digital skills? (You could probably track the hell out of people using such environments if you wanted to, though I still don’t see what benefits are supposed to accrue from such activity?)

It also means I need to start looking out for a new emerging trend to follow, not least because data2text is already being commoditised at the toy/play entry level. And it won’t be VR. (Pound to a penny the Second Life hipster, hypster, shysters will be chasing that. Any VR campuses out there yet?!) I’d like to think we might see inroads being made into AR, but suspect that too will always be niche, outside certain industry and marketing applications. So… hmmm… Allotments… that’s where the action’ll be… and not in a tech sense…

Exciting:-) [UPDATE: note that that image uses an old version of the guacamole packages; I tried updating to the latest versions of the packages but it doesn’t just work so I rolled back. Support for Docker was introduced after the version used in the linuxserver/dockergui, but I don’t fully understand what that support does! Ideally, it’d be nice to run a guacamole container and then use docker-compose to link in the applications you want to expose to it? Is that possible? Anyone got an example of how to do it?]
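I don’t have a working example, but my guess is the compose file would be something along the following lines – very much an untested sketch, which omits the MySQL/PostgreSQL backend and the connection definitions that the official guacamole images need before they’ll actually serve anything:

# docker-compose.yml sketch (assumptions throughout – not tested)
version: "2"
services:
  guacd:
    image: guacamole/guacd
  guacamole:
    image: guacamole/guacamole
    links:
      - guacd
    environment:
      - GUACD_HOSTNAME=guacd
    ports:
      - "8080:8080"
  app:
    image: some-dockergui-app   # placeholder: an application container exposing VNC/RDP for guacamole to connect to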

So I gave it a go with Audacity. The files I used are contained in this gist that should also be embedded at the bottom of this post.

(Because the original linuxserver/dockergui was quite old, I downloaded their source files and built a current one of my own to seed my Audacity container.)

I tried exposing the /nobody folder by adding VOLUME /nobody to the Dockerfile and running with a -v "${PWD}/files":/nobody switch, but it seemed to break things, which is presumably a permissions thing? There are various user role settings in the linuxserver/dockergui build files, so maybe poking around with those would fix things? Otherwise, we might have to seed the container directly with any files we want in it?:-(

UPDATE: adding RUN mkdir -p /audacityfiles && adduser nobody root to the Dockerfile along with VOLUME /audacityfiles, and then adding -v "${PWD}/files":/audacityfiles when I create the container, allows me to share files in to the container, but I can’t seem to save to the folder? Nor do variations on the theme work, such as creating a subfolder in the nobody folder and giving it the same ownership and permissions as the nobody folder. I can save into the nobody folder though. (Just not share it?)
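For reference, pulling the fragments described above together, the Dockerfile additions amount to the following (with the container then run using the extra -v "${PWD}/files":/audacityfiles switch):

RUN mkdir -p /audacityfiles && adduser nobody root
VOLUME /audacityfiles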

WORKAROUND: in Kitematic, the Exec button on the container view toolbar takes you into the container shell. From there, you can copy files into the shared directory. For example: cp /nobody/test.aup /nobody/share/test.aup. Moving the share to a folder outside /nobody, eg to /audacityfiles, means we can simply copy everything from /nobody to /audacityfiles.

Another niggle is with the sound – one of the reasons I tried the Audacity app… (If we can have the visual of the desktop, we want to try to push for sound too, right?!)

Unfortunately, when I tried to play the audio file I’d created, it wasn’t having any of it:

I suspect trying to fix this is a bit beyond my ken, as too is sorting out the shared folder permissions… (I don’t really do sysadmin – which is why I like the idea of ready-to-run application containers.)

UPDATE 3: I pushed an image as psychemedia/audacity2. It works from the command line as:

docker run -d -p 8080:8080 -p 3389:3389 -e "TZ=Europe/London" --name AudacityGui -v "${PWD}":/nobody/share psychemedia/audacity2

Anyway – half-way there. If nothing else, we could create and analyse audio files visually in the browser using Audacity, even if we can’t get hold of those audio files or play them!

I’d hope there was a simple permissions fix to get the file sharing to work (anyone? anyone?! ;-) but I suspect the audio bit might be a little bit harder? But if you know of a fix, please let me know:-)

PS I just tried launching psychemedia/audacity2 public Dockerhub image via Docker Cloud, and it seemed to work…

With its focus on enterprise use, it’s probably with good reason that the Docker folk aren’t that interested in exploring the role that Docker may have to play as a technology that supports the execution of desktop applications, or at least, applications for desktop users. (The lack of significant love for Kitematic seems to be representative of that.)

But I think that’s a shame; because for educational and scientific/research applications, docker can be quite handy as a way of packaging software that ultimately presents itself using a browser based user interface delivered over http, as I’ve demonstrated previously in the context of Jupyter notebooks, OpenRefine, RStudio, R Shiny apps, linked applications and so on.

I’ve also shown how we can use Docker containers to package applications that offer machine services via an http endpoint, such as Apache Tika.
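As a reminder of the pattern, a Tika server container exposes a simple HTTP API on port 9998 that you can throw documents at. The image name below is one of the community tika-server builds I think I used at the time, so treat it as an assumption (there are official apache/tika images nowadays):

docker run -d -p 9998:9998 logicalspark/docker-tikaserver
# PUT a document to the /tika endpoint and get the extracted text back
curl -T report.pdf http://localhost:9998/tika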

I think this latter use case shows how we can start to imagine things like a “digital humanities application shelf” in a digital library (fragmentary thoughts on this), that allows users to take either an image of the application off the shelf (where an image is a thing that lets you fire up a pristine instance of the application), or a running instance of the application off the shelf. (Furthermore, the application can be run locally, on your own desktop computer, or in the cloud, for example using something like a mybinder-like service.) The user can then use the application directly (if it has a browser based UI), or call on it from elsewhere (eg in the case of Apache Tika). Once they’re done, they can keep a copy of whatever files they were working with and destroy their running version of the application. If they need the application again, they can just pull a new copy (of the latest version of the app, or the version they used previously) and fire up a new instance of it.

But the question then was how to access the tools? The tools themselves are commandline apps, so the first thing we want to do is to be able to call into the container to run the command. A handy post by Mike English entitled Distributing Command Line Tools with Docker shows how to do this, so that’s all good then…

The next step is to consider how to retain copies of the files created by the command line apps, or pass files to the apps for processing. If we take a target host directory and mount it into the container as a shared volume, we can keep the files on our desktop or allow the container to create files into the host directory. Then they’ll be accessible to us all the time, even if we destroy the container.

The gist that should be embedded below shows the Dockerfile and a simple batch file that passes the Contentmine tool commands into the container, which then executes them. The batch file idea could be further extended to produce a set of command shortcuts that essentially alias the Contentmine commands (eg a ./getpapers command rather than a ./contentmine getpapers command), or that combine the various steps associated with a particular pipeline or workflow – getpapers/norma/cmine, for example – into a single command.
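As a flavour of the aliasing idea (not the exact contents of the gist – the image name and mount point here are placeholders), a ./getpapers wrapper might be little more than:

#!/bin/bash
# pass everything through to the getpapers command inside a Contentmine container,
# sharing the current directory so any downloaded papers persist on the host
docker run --rm -v "${PWD}":/workspace -w /workspace mycontentmine getpapers "$@"

…which could then be called as, say, ./getpapers --query "internet of things" --outdir iot.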

UPDATE: the CenturyLinkLabs DRAY docker pipeline looks interesting in this respect for sequencing a set of docker containers and passing the output of one as the input to the next.

If there are other folk out there looking at using Docker specifically for self-managed “pull your own container” individual desktop/user applications, rather than as a devops solution for deploying services at scale, I’d love to chat…:-)

PS for several other examples of using Docker for desktop apps, including accessing GUI based apps using X WIndows / X11, see Jessie Frazelle’s post Docker Containers on the Desktop.

Via Tanya Elias/eliast05, a query regarding tools for harvesting historical tweets. I haven’t been keeping track of Twitter related tools over the last few years, so my first thought is often “could Martin Hawksey’s TAGSexplorer do it?”!

But I’ve also had the twecoll Python/command line package on my ‘to play with’ list for a while, so I thought I’d give it a spin. Note that the code requires python to be installed (which it will be, by default, on a Mac).

On the command line, something like the following should be enough to get you up and running if you’re on a Mac (run the commands in a Terminal, available from the Utilities folder in the Applications folder). If wget is not available, download the twecoll file to the twitterstuff directory, and save it as twecoll (no suffix).
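For the record, something along these lines (the raw file URL is an assumption based on the jdevoo/twecoll Github repo – adjust as required):

mkdir twitterstuff
cd twitterstuff
# grab the twecoll script and make it executable
wget https://raw.githubusercontent.com/jdevoo/twecoll/master/twecoll
chmod +x twecoll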

Running the code the first time prompts you for some Twitter API credentials (follow the guidance on the twecoll homepage), but this only needs doing once.
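From memory, the search itself is then something along these lines – check twecoll’s built-in help, because I may well be misremembering the exact subcommand and flags:

# search for tweets matching a query
./twecoll tweets -q "digital humanities"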

Testing the app, it works – tweets are saved as a text file in the current directory with an appropriate filename and suffix .twt – BUT the search doesn’t go back very far in time. (Is the Twitter search API crippled then…?)

Looking around for an alternative, I found the GetOldTweets python script, which again can be run from the command line; download the zip file from Github, move it into the twitterstuff directory, and unzip it. On the command line (if you’re still in the twitterstuff directory), run:

ls

to check the name of the folder (something like GetOldTweets-python-master) and then cd into it:

cd GetOldTweets-python-master/

to move into the unzipped folder.

Note that I found I had to install pyquery to get the script to run; on the command line, run: easy_install pyquery.

This script does not require credentials – instead it scrapes the Twitter web search. Data limits for the search can be set explicitly.
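For example (flag names as I recall them from the project README – check Exporter.py’s help for the current set):

# scrape tweets matching a query over a date range, capped at 500 results
python Exporter.py --querysearch "internet of things" --since 2015-09-01 --until 2015-09-30 --maxtweets 500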

Tweets are saved into the file output_got.csv and are semicolon delimited.

A couple of things I noticed with this script: it’s slow (because it “scrolls” through pages and pages of Twitter search results, each of which only contains a small number of results) and on occasion it seems to hang (I’m not sure if it gets stuck in an infinite loop; on a couple of occasions I used ctrl-z to break out). In such a case, it doesn’t currently save results as you go along, so you have nothing; reduce the --maxtweets value, and try again. On occasion, when running the script under the default Mac python 2.7, I also noticed that there may be encoding issues in tweets which break the output, so again the file can’t get written.

Both packages run from the command line, or can be scripted from a Python programme (though I didn’t try that). If the GetOldTweets-python package can be tightened up a bit (eg in respect of UTF-8/tweet encoding issues, which are often a bugbear in Python 2.7), it looks like it could be a handy little tool. And for collecting stuff via the API (which requires authentication), rather than by scraping web results from advanced search queries, twecoll looks as if it could be quite handy too.