The previous post detailed (without going into code) how I imported the Tate’s metadata into a Neo4j database, and that is still a work in progress. And the Tate metadata is only the starting point. I plan on adding many more node types as I develop an overall social media platform with elements of crowdsourcing and gamification. However, there is already some highly-connected data to play around with. While most of the queries I can run on this data using Neo4j’s Cypher language could also be done using SQL, I can still do some interesting discovery in a very visual way – a way which SQL doesn’t lend itself to.

Example 1: Shortest paths between the artists William Johnstone and John, Augustus, OM (not sure how to rearrange that particular name).

What’s returned in visual format by the Neo4j browser is the following graph:

The code snippet allShortestPaths((a1)-[*]-(a2)) retrieves all paths with a length equal to the length of the shortest path. In this case there are actually 570 possible paths of length 4, involving 82 separate nodes and 267 relationships.
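For the record, the full query looks something like this (the Artist label, the name property and the exact name strings are illustrative of my data model rather than guaranteed to match it):

```cypher
MATCH (a1:Artist {name: 'Johnstone, William'}),
      (a2:Artist {name: 'John, Augustus, OM'}),
      p = allShortestPaths((a1)-[*]-(a2))
RETURN p
```

Note the unrestricted [*] – no relationship types or directions are specified, so the paths can pass through any node type.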

Visually, we can immediately see what links the two artists. That they might have something like paper, oil paint and canvas in common is not a surprise or particularly useful, but that they share hope, cloud, tree and hill might be interesting.

The Cypher query is pretty dynamic. I don’t need to specify the types of nodes (using their labels) in between the two artists. With SQL, I would need to specify all the tables I wanted to join in the query.

It would be entirely possible to do this one in SQL in a relational database using multiple joins. It would likely be slower to run, though, and would lack the visual element of the Cypher query. Just look at the MATCH criteria:
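The original is a screenshot, so here is a reconstruction of the sort of MATCH pattern being described (the labels and relationship types are my guesses at the data model, not necessarily the exact ones I used):

```cypher
MATCH (p:Place)<-[:BORN_IN]-(a:Artist)-[:CONTRIBUTED_TO]->(aw:Artwork)-[:HAS_SUBJECT]->(s:Subject)
```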

The (a) in the middle is the artist we want to match. We go in two directions from the artist: to the place of birth, and to the subjects via the artworks. SQL isn’t as semantically rich, lacking the arrows / connectors of Cypher.

Example 3: Some recommendations.

Though the social aspect of the platform has yet to be developed – and it’s on this that much of the recommendation engine will be based – we can make some initial additions to the artist and artwork recommendation engine. For example, if we were viewing an artwork featuring a bridge (let’s go with “Oriana”), we may be interested in other artworks featuring a bridge.
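A first cut of such a query might look like the following (labels, relationship types and property names are illustrative of my data model):

```cypher
MATCH (:Artwork {title: 'Oriana'})-[:HAS_SUBJECT]->(s:Subject {name: 'bridge'})<-[:HAS_SUBJECT]-(recommended:Artwork)
RETURN recommended
```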

There are, however, 3,742 other artworks featuring a bridge, so you can see that the algorithm is a bit simplistic so far.

We could demand greater connectedness between the current artwork and the recommended artworks. For example, what about ones that also share the same movements? We can extend the previous example (i.e. we are currently viewing the artwork Oriana, which features a bridge).
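Extending the pattern to require a shared movement as well as a shared subject (again, labels and relationship types are illustrative):

```cypher
MATCH (current:Artwork {title: 'Oriana'})-[:HAS_SUBJECT]->(s:Subject {name: 'bridge'})<-[:HAS_SUBJECT]-(recommended:Artwork),
      (current)-[:HAS_MOVEMENT]->(m:Movement)<-[:HAS_MOVEMENT]-(recommended)
RETURN DISTINCT recommended
```

Each extra shared connection (movement, medium, classification) narrows the result set further and, in effect, ranks artworks by relatedness.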

In the previous post about importing the Tate’s open metadata into a Neo4j graph database, I mentioned that I was using Python. I quickly discovered a major performance issue when using Python to batch import into Neo4j. The main issue is that to use Python, I had to run the Neo4j database as a server, requiring REST-based access. This introduced a huge overhead and it crawled along processing a handful of artworks per minute. With nearly 70,000 artworks, it might have taken days for the import to run. Rather than spend too much time investigating whether this approach could be optimised to the point it would be usable, I decided to go with the obvious choice of development language for Neo4j – Java.

Neo4j is a Java-based database. One doesn’t always associate the words Java and performance in the same sentence – isn’t that the realm of C and C++ (Redis, for example, is written in C)? But in Neo4j’s case, Java doesn’t seem to hold it back. No method of access to Neo4j is as fast as using the Java libraries directly. However, while Neo4j is lightning fast at retrieving nodes (graph traversal), there is a big overhead when writing data. Unlike a relational database, where the links between tables are logical (foreign keys) and writing data is a relatively inexpensive operation, the links between nodes in Neo4j are physical, requiring their own data structures on disk, which generally results in higher write latency. On top of that, Neo4j is transactional, which adds yet more overhead when writing. For a batch process handling around 100,000 nodes and up to a million or more relationships, this is a non-runner.

The solution is Neo4j’s BatchInserter. This effectively bypasses Neo4j’s database management system and all its transaction handling and goes directly to the files on the disk. It is orders of magnitude faster when writing. However, because I was creating relationships between nodes, I needed a way to store the physical node ids for later lookup. The basic algorithm of my Tate batch importer is as follows:

For each artist read from its JSON file:
    Add the artist node
    For each movement within the artist:
        If the movement node was not already added, add the new movement node
        Link the movement node to the artist
    End For
    If the artist's place of birth was not already added, add the new place node
    Link the place of birth node to the artist
End For

For each artwork read from its JSON file:
    Add the artwork node
    Connect the artwork to the artist
    If the catalogue group was not already added, add the new catalogue group
    Link the artwork to the catalogue group
    For each movement within the artwork:
        If the movement node was not already added, add the new movement node
        Link the artwork to the movement
    End For
    For each subject within the artwork (having traversed from top level to second level to third level):
        If the subject node was not already added, add the new subject node (as a person node if a "named individual")
        Link the artwork to the subject
    End For
    If the classification was not already added, add the new classification node
    Link the artwork to the classification node
    Parse the medium into its elements based on separators (commas, spaces, "and", "on")
    For each medium element:
        If the medium node was not already added, add the new medium node
        Link the artwork to the medium
    End For
End For

The main issue with the algorithm is the frequent lookups to see if a node was already added – currently close to a million of them. Because I am using Neo4j’s BatchInserter, there is no facility available to search for a node. What the BatchInserter does provide is a physical node id for each added node. The solution is to store the ids from the JSON files (each artist, artwork, movement and subject has an id; others, such as medium and classification, just use the name as a key) along with the physical node ids generated by the BatchInserter in some kind of lookup store. The solution I chose was Redis, a very fast key-value store. I built the latest version from source code and ran it on the local host. It’s a very simple storage mechanism – I just prefix each of my keys with something like “artist:” or “movement:” and store the key and the value in one giant lookup table. For example, using redis-cli to issue a command to look up the physical node for the artist with the id 1234:

127.0.0.1:6379> get "artist:1234"
"4235"
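The lookup logic itself is simple enough to sketch. In the real importer the store is Redis; a plain in-memory map stands in here just to show the keying scheme (the class and method names are illustrative, not from my actual code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the importer's node-id lookup store. Keys are "<type>:<id>",
// e.g. "artist:1234"; values are the physical node ids returned by the
// BatchInserter. A HashMap stands in for Redis for illustration.
public class NodeIdLookup {
    private final Map<String, Long> store = new HashMap<>();

    // Record the physical node id for an entity, keyed by entity type
    // plus the id (or name) taken from the Tate JSON.
    public void put(String type, String key, long physicalNodeId) {
        store.put(type + ":" + key, physicalNodeId);
    }

    // Look up the physical node id; null means the node was not yet added.
    public Long get(String type, String key) {
        return store.get(type + ":" + key);
    }

    public static void main(String[] args) {
        NodeIdLookup lookup = new NodeIdLookup();
        lookup.put("artist", "1234", 4235L);
        lookup.put("medium", "graphite", 87L);
        System.out.println(lookup.get("artist", "1234")); // 4235
        System.out.println(lookup.get("artist", "9999")); // null
    }
}
```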

While it is possible to run my importer on a Windows-based machine (e.g. using Vagrant to host the Redis service within a headless Ubuntu instance), the best option is a Unix-based machine. For example, using the Vagrant solution on Windows [Core i3 laptop with 12GB RAM and an SSD], the process took about 23 minutes, which is more than tolerable. [Update: I installed Microsoft’s Windows-compiled version of Redis, which reduced the overall time to about 8 minutes.] However, the same process took about 3 minutes on a similarly spec’d Linux Mint PC [Core i3 with 8GB RAM]. [Note: on a 7-year-old Linux laptop with 4GB RAM it took 10 minutes.]

3 minutes to read through about 73,000 JSON files, map them to Java objects (using Jackson), create 93,631 nodes and 719,766 relationships (edges) in Neo4j, and use Redis to store 24,766 key-value pairs with almost 1.2 million key lookups. Doesn’t sound like too many bottlenecks there!

When I write about graphing, I am not referring to the graphing of mathematical functions or the production of fancy bar charts, though there is a highly visual component to it. I mean the creation of a graph database. The graph data model is one of the more niche NoSQL data models and is suited to querying highly-related / highly-connected data. By creating a graph of cultural heritage metadata (for artworks, museum objects, etc.), the hope is that connections between ‘entities’ can be discovered – or discovered more easily and more quickly than before.

While the most obvious relationship between entities in the Tate metadata is the Artist-ContributesTo->Artwork (in the Tate metadata, there can be multiple contributors to an artwork), many additional entities can be conceived and connected. Examples include medium (paper, graphite, watercolour, etc), movement (Young British Artists, Pre-Raphaelite Brotherhood, etc), subject (house, tree, man, etc) and others. Artists can be related to movements, and artworks can be related to mediums (I use mediums rather than media to avoid confusion with, well, the media) and subjects.

The following is a subset of the graph showing how several artworks relate to mediums. As you can see, even this very small subset is very dense. While highly visual, the research isn’t all about visualisation.

In the following image, a subset of the subject hierarchy is shown.

In the Tate metadata, the subjects are a three-level hierarchy with the top level being a generic categorisation of the subject, the middle being more specific, then the third / bottom level being the one that specifically applies to the artwork. Initially, I will connect artworks only to the third-level / most specific subjects, but for performance purposes, I will look at connecting artworks to all three levels in the subject hierarchy. This would allow for more generic searches, e.g. search for artworks containing “animals: mammals” rather than having to specify “cow”, “sheep”, “horse”, etc.
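Once artworks are linked to all three levels, such a generic search becomes a single simple pattern – something like the following (labels, relationship types and the subject name are illustrative):

```cypher
MATCH (a:Artwork)-[:HAS_SUBJECT]->(:Subject {name: 'animals: mammals'})
RETURN a
```

With only third-level links, the same search would need to enumerate every mammal subject or traverse up the hierarchy in the query.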

More specifics in the next post, such as my use of Python, the py2neo module for Python, the Neo4j graph database, and more.

I’m investigating a number of technologies for the PhD. In the previous 3 posts, I covered the initial development of an Android app that scans an RFID tag for object id information and also connects to an Omeka collections management system using its REST API to retrieve object metadata.

While much development is still required before I can begin to demo the Android app, I thought it would be appropriate to begin looking at other technology related to the Internet of Things that would allow me to build an interactive exhibit (or the technological components to make an existing exhibit more interactive). The interactive exhibit will connect to the Internet (directly, or via another device that does) to upload data generated by the interaction of one or more museum or gallery visitors.

I have yet to have that Eureka moment with regard to how this interaction will work. It could be that the creative spark comes from a museum professional who wants something built for their museum. I will also explore the possibility that what I build will actually be a piece of interactive art, with the art finding its way to the web or deriving something from the web.

There are a number of potential platforms. Gone are the days when it would have been necessary to have a fully-fledged PC on-site. Nowadays, small microcontrollers, such as the Arduino, allow single-purpose interactive components to be built, while fully-fledged computers, such as the Raspberry Pi, open up even more fascinating possibilities in a tiny form factor.

I have both a Raspberry Pi (Rpi) and an Arduino. I bought the Rpi last year to investigate its possible use for teaching programming. I haven’t used it very much, to be honest, but I intend to up the ante by integrating it into an interactive exhibit. There are many peripherals that can be added on, such as a camera.

The Arduino was purchased in the past week or so with funds from CIT’s Computing Department. For about €150 I got the Arduino Uno in the ARDX (Arduino Experimentation) kit, a wire cutter, soldering iron, a small 16×2 LCD display, and an RFID/NFC shield. The advantage of the Arduino over the Rpi is the number of shields, sensors, actuators, and so on, that can be connected to the device – many of them without the need for the soldering iron. The ARDX kit came with a bundle of wires, resistors, a motor, a servo, a 9V battery cable, a bunch of red and green LEDs, some push buttons, a light sensor, a temperature sensor, and one or two other bits and bobs. It’s worth looking at the web page for the full contents.

For now, my Arduino lacks connectivity. There are wireless and ethernet shields available, but for now, I can hold off on purchasing them. I plan to output messages to the 16×2 LCD display and / or use LEDs for status display.

The ARDX starter kit comes with a set of open-source lessons and circuit diagrams. See http://www.oomlout.com/a/products/ardx/ for more details. I just got the kit in the post yesterday and had time to complete the first two experiments. The second one can be seen in the image below, where an array of 8 green LEDs blink in sequence from top to bottom. I go from blinking lights to motors, servos and sensors over the next 9 experiments. My hope is that I will have an appreciation of the possibilities of the Arduino by the time I reach the end. That will help my discussions with museum professionals, to whom I can demo – the 9V battery allows me to demo anywhere.

The Arduino is programmed using a simple IDE on Windows, Mac or Linux. I had no problem installing the Linux version following the Linux Mint instructions. I would guess that within about 10 minutes I had the IDE installed, the wiring done and the code uploaded to the Arduino. The LED blinked as it was supposed to, first time.

Coding is done in good old-fashioned C. I’m quite happy with that. I was a professional C programmer in 1999 / 2000, when I was programming on the VMS operating system for a warehouse management system. There is something raw and exciting about getting closer to the hardware with C.

What I am seeing so far is that coding for the Arduino doesn’t require a huge amount of C experience. Any experience of a derived language, such as Java, PHP or C#, should allow instant access to hacking the code.

So far so good. A bit of a buzz when you see something as rudimentary as an LED blinking just because you wrote the piece of code that did it. Now I need to get the creative juices flowing to come up with something artistic and interactive.

A bit of a learning curve, so there is, with regard to converting the retrieved JSON to Java objects (Plain Old Java Objects / POJOs — Spring beans). The building block of Omeka’s JSON documents is the Element. This seems to be based on Dublin Core metadata element sets, but does not seem to be a complete implementation. I still have to get my head around that side of things.

{
  "id": 1,
  "url": "http://yourdomain.com/api/elements/1",
  "name": "Text",
  "order": 1,
  "description": "Any textual data included in the document",
  "comment": null,
  "element_set": {
    "id": 1,
    "url": "http://yourdomain.com/api/element_sets/1",
    "resource": "element_sets"
  }
}

Each element would appear to match up with Dublin Core elements, such as Title, Subject and Description. For example:
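An embedded element reference looks something like this (the id and url here are illustrative – I haven’t confirmed the actual id Omeka assigns to Title):

```json
{
  "id": 50,
  "url": "http://yourdomain.com/api/elements/50",
  "name": "Title",
  "resource": "elements"
}
```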

Omeka, it appears, only returns 4 fields for an element – at least in the example from my previous post.

If I want to add some text for an element, I need to use an Element Text document, within which is embedded the relevant Element document, as well as an Element Set document, which will almost certainly be a reference to Dublin Core’s element set – though it opens up the possibility for an alternative to Dublin Core metadata. The example below adds the description “Test Description” to an item or collection:
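Something like the following (the ids and urls are illustrative, and the exact field set may differ from what Omeka actually expects):

```json
{
  "html": false,
  "text": "Test Description",
  "element_set": {
    "id": 1,
    "url": "http://yourdomain.com/api/element_sets/1",
    "resource": "element_sets"
  },
  "element": {
    "id": 41,
    "url": "http://yourdomain.com/api/elements/41",
    "name": "Description",
    "resource": "elements"
  }
}
```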

For now I have decided to base my Java classes (used in my Android app) on the fields in retrieved documents using Omeka’s REST API. Here are the classes required for an Element Text (with getters and setters left out to save on reading time):
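Based on the JSON seen so far, the classes look something like this – field names mirror the JSON keys so that a mapper can populate them; the classes are shown nested here purely to keep the sketch self-contained, and the exact fields may change as I learn more about the API:

```java
// Hedged sketch of the POJOs behind an Omeka Element Text. Getters,
// setters and mapping annotations are left out to save on reading time.
public class ElementTextDemo {

    static class ElementText {
        String text;            // e.g. "Test Description"
        boolean html;           // whether the text contains HTML markup
        ElementSet element_set; // reference to the element set (Dublin Core)
        Element element;        // reference to the element (e.g. Description)
    }

    static class ElementSet {
        int id;
        String url;
        String resource;        // "element_sets"
    }

    static class Element {
        int id;
        String url;
        String name;            // e.g. "Description"
        String resource;        // "elements"
    }

    // Helper to show how the mapped objects would be read back.
    static String describe(ElementText et) {
        return et.element.name + ": " + et.text;
    }

    public static void main(String[] args) {
        ElementText et = new ElementText();
        et.text = "Test Description";
        et.element = new Element();
        et.element.name = "Description";
        System.out.println(describe(et)); // Description: Test Description
    }
}
```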

I had considered building my own collections management system. This would have had some advantages, but several disadvantages. The main disadvantage would have been time. Another would be the need to know the standards inside out – e.g. properly implementing / adhering to Dublin Core metadata standards.

It wasn’t difficult to find a collections management system that would provide the basis for my own collections (and those of any partner museums / galleries along the way). Omeka has some definite advantages: it is open source, so I can see why things happen the way they do and modify if necessary; it is extensible with plugin development “hooks”; it has some similarities in its philosophy with WordPress, the world’s most installed content management and blogging system; it is moving to the Zend framework – not my framework of choice, but one that is widely used and thus provides access to simplified APIs; it has a decent community of support around it.

I initially tried setting everything up myself with Omeka – I created a VirtualBox Ubuntu virtual server and eventually got it working, though there was a lot of flapping around with Apache and mod_rewrite. Later, though, I discovered a Turnkey Linux Omeka install. While it is possible to install the operating system (Debian Linux), Apache, PHP, MySQL and Omeka from a downloaded ISO image (once burned to CD or mounted directly while starting a virtual server), the easiest approach is to download the VMDK disk image and use that when creating a Linux virtual server with VirtualBox or VMware. These Turnkey Linux images tend to be cut down to almost the bare minimum. The Omeka one is a headless server – that is, there is no desktop with mouse and pointer. It only takes up 238 MB of space and can run with a fairly small memory footprint – easily under 512MB. Whether I would use one of the Turnkey Linux images for a production server, I am not sure – it might need more optimisation – but for development purposes it is perfect. I can have many instances of Omeka installed and run the ones I want.

The Turnkey Linux Omeka image was only at version 2.0.1, but was easily upgraded to the latest version (2.1.4).

Spring for Android provides support for, amongst other things, Spring’s RestTemplate class. This allows an Android app to connect to a REST API and retrieve resource data in some representation or other. In the case of Omeka, from version 2.1 there is support for REST with JSON documents returned. The REST API is turned off by default and must be explicitly turned on in Omeka’s admin control panel. In addition, only items and collections marked as public will be returned to a REST API request.

The following is an example of a JSON document returned from a request to http://192.168.1.17/api/items/1 (running in a VirtualBox virtual server on my Linux PC):
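The document comes back as one unbroken line, something like this (reconstructed for illustration – the timestamps, internal ids and exact field set are illustrative rather than the actual values returned):

```json
{"id":1,"url":"http://192.168.1.17/api/items/1","public":true,"featured":false,"added":"2014-05-01T12:00:00+00:00","modified":"2014-05-01T12:00:00+00:00","item_type":null,"collection":null,"owner":{"id":1,"url":"http://192.168.1.17/api/users/1","resource":"users"},"files":{"count":0,"url":"http://192.168.1.17/api/files?item=1","resource":"files"},"tags":[],"element_texts":[{"html":false,"text":"Test Title","element_set":{"id":1,"url":"http://192.168.1.17/api/element_sets/1","resource":"element_sets"},"element":{"id":50,"url":"http://192.168.1.17/api/elements/50","name":"Title","resource":"elements"}},{"html":false,"text":"Test Description","element_set":{"id":1,"url":"http://192.168.1.17/api/element_sets/1","resource":"element_sets"},"element":{"id":41,"url":"http://192.168.1.17/api/elements/41","name":"Description","resource":"elements"}}]}
```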

Not particularly easy to read, is it? The only data of real interest to my consuming Android app for this particular item is the id number (1), the title (“Test Title”) and the description (“Test Description”). Pretty much everything else can be ignored. Part of the reason it is difficult to read is that each item (our cultural objects are just known as items) is made up of elements. Each field value, such as title, description, creator, and so on, is stored as a separate row in the element_texts table.

We can see that the owner is the Omeka user with id 1, that there are no attached files (count is 0) and that there are no tags (annotations). The element_texts sub-document causes some complication. It would be nice if we could drill down into each element text and immediately see that one is a title, another a description, and so on. What we get, though, is the value first, with the element name buried in a further sub-document (the element). This isn’t insurmountable, but it probably rules out automagically populating Java classes (e.g. using RestTemplate’s getForObject… restTemplate.getForObject(url, Item.class);) – though I haven’t ruled this approach out just yet.

So where am I now? I can get the Android app to connect to the Omeka site and retrieve a JSON document for the RESTful URL supplied. However, I have yet to map the retrieved document to my internal Java Item class. Hopefully in part 2 I can get a bit closer to that goal.

Since the last post, I submitted a first draft literature review. It wasn’t particularly long and it probably still meanders a bit too much. Since submitting, I have continued to try to narrow my focus. Whereas before I intended to keep the technological augmentation of the museum to a minimum – with electronic id tagging of objects being as far as that went – I have now explored more advanced augmentation, with Raspberry Pi and Arduino to the fore. Both, individually or combined, provide access to a wide range of sensors and actuators that can be used to build interactive exhibits, for example.

However, the original plan to explore RFID and NFC on Android-enabled mobile devices is continuing. While I have Java experience, I have no Android experience. The Android SDK provides a whole new framework to learn with various files in various locations that you need to become familiar with. I don’t find it nearly as intuitive as getting to grips with a web MVC framework, as I have done recently with Laravel for PHP and Spring MVC for Java. But I am persevering and making slow progress.

There are a number of tools available for Android development. The most commonly used is the Android Developer Tools (ADT) plugin for Eclipse. My first Android adventures began with ADT. However, when I tried to get it working with a package / build manager – specifically Maven – I ran into all kinds of problems. I also tried the new Android Studio, which is still far from stable. Android Studio is built on the IntelliJ platform, which competes with Eclipse. One advantage it does have over ADT/Eclipse is Gradle integration. Gradle provides a package manager to download dependencies, much as you would with a Project Object Model file (pom.xml) in Maven. I’ve only just started to feel comfortable with Maven and now I have yet another Ant-style tool to learn.

While obviously less stable than the more mature Eclipse + ADT, I am going to persevere with Android Studio. I get the feeling that Google is putting its weight behind it, offering integration with various Google cloud services, such as App Engine, as well as more specific Android features. Gradle also seems to simplify package and build management in comparison to Maven, for example.

In addition, I have been looking at the Spring for Android project. The Spring Framework for Java provides layers of abstraction over complex technologies – those with a lot of configuration overhead or that require lots of error-checking code. For these technologies, Spring simplifies configuration and, through something called Aspect Oriented Programming (AOP), reduces the amount of “boilerplate” code needed (i.e. repetitive code that is error-prone due to bad habits like copying and pasting). There is also a Spring Mobile project, which is not Android-specific and is instead for developing cross-platform web-based solutions using Spring MVC. That sounds great on the face of it, but I want to develop a native Android app to ensure I can get the maximum out of the platform and have proper access to hardware features, such as NFC.

I hit a few issues when trying to build even the Hello World project when trying to include Spring libraries. I found a solution here: http://forum.spring.io/forum/spring-projects/android/743289-importing-android-spring-with-gradle-cause-error-mutiple-dex-files

What have I achieved thus far with Android? I added code to read an RF tag. The test tag I used was taken from the plastic bag covering exam papers I had to correct. It contained some simple text data: “DATA8002 – 24012”. That’s the module code and a unique id number identifying the teaching instance of the module. For test purposes, this is all I need. However, as time goes on, I will need to get some fresh RF tags and use an NFC writer to add my own id data. In addition, I will need to get my hands on some more advanced NFC cards so that I can put some user data on there, most likely encrypted.

I was able to test that the device I was using – a Google Nexus 7 tablet – was able to read the data from the tag. The NFC TagInfo Android app allowed me to verify that. Then I found a useful Android NFC tutorial that saved some time: http://code.tutsplus.com/tutorials/reading-nfc-tags-with-android–mobile-17278

The end result isn’t Earth-shattering. I was able to develop a simple app that could intercept NFC tag reads and display data on the screen. The next post will detail how I managed to connect to a collections management system, Omeka, using REST (thanks to Spring for Android) to retrieve Dublin Core-compliant object/item metadata in JSON format for display on my tablet.

One of the questions I might ask of my research into a technological platform that links objects in multiple museums is who the audience is for the platform. Is it researchers / historians and / or pursuers of “serious leisure”? Or is it the casual tourist who wants to see a cultural trail as part of their vacation? This is discussed in Balloffet et al (2014) who cite Chaumier (in French, I believe, so it might be difficult to follow up on his writing) as saying that the idea of museums becoming like amusement parks is “provocative” and “absurd”. The two institutions, if we can call them that, are “diametrically opposed”.

The paper mentions that just as museums are becoming more “innovative, lively environments that include recreational experiences in order to mediate content that is perceived as serious”, amusement parks are “including more content that is culturally rich.”

From early on, the idea of discreet technological augmentation rather than pervasive / invasive augmentation is what I have been focusing on. Not bells and whistles. Though there is a part of me that would like to open up the possibilities of museum interaction with children, and in that case an element of Disney is likely to be unavoidable. Perhaps another reason for the discreet augmentation is that it is cheaper to roll out on a multi-museum basis, which is what I aim to do.

What do I mean by discreet augmentation? I mean that the museum visitor would not be distracted by any augmentation – at least not too much. For example, the addition of tags or barcodes should be discreet. Perhaps you make it clear in the main lobby area or in the brochure that the feature is supported, and then let the tags / barcodes melt away into the background. Otherwise, the augmentation is entirely discreet, in the guise of a mobile app or a website. There could be an argument, I suppose, that lots of people with mobile devices in a museum would be distracting and in that sense not so discreet, but the mobile devices are not a fixture in the museum.

As part of a module, Technology Business Planning, I sent out a questionnaire to colleagues in CIT, by way of market research for a (fictitious) business plan covering a startup business that offers a social media and mobile app solution to museums (and the GLAM sector in general).

So far there are 113 respondents. Some of the key findings so far:

85.6% own a mobile device (smartphone or tablet)

18.6% of mobile device owners had installed a cultural heritage app

67.9% visited a museum at least once a year

67.7% were moderate to extreme users of social media

12.4% had shared their museum experiences online

74.3% thought it would be somewhat or very useful to have a mobile app in a museum that would provide additional exhibit information

44.2% found the idea of social media integration somewhat or very desirable

I can conclude that the use of mobile devices in the museum is very appealing, while social media integration had a lukewarm appeal.

It was difficult in a survey to give the respondent an idea of what the social media integration would be like. My gut feeling is that a different approach would be needed, such as field testing a prototype or at least providing screen mockups and perhaps video of the social media aspect of the platform in action.

It’s been a useful exercise so far. I intend to do some more analysis once I export from SurveyMonkey into CSV format. I’ll either learn to do this using SPSS or take the lazy approach and find a student to find some correlations – e.g. age and mobile device ownership or age and desire to use mobile devices in the museum.

The rigour of the research isn’t good enough to think about writing a publishable paper, but it gives me some ammunition for the business plan deliverable of the Technology Business Planning module, and it is something I can share with museum owners / directors to entice them into being interviewed for my PhD.

I also collected some comments as part of the survey, which were all anonymous, so I will share some of those and discuss them in another post.