Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine

Looking through the example data available via the API for an OpenCorporates company ID, I spotted that registered trademark data was available. So here's a quick, if roundabout, way of previewing trademarked images using OpenCorporates and Google Refine.

(Hmm, it seems as if we could load in data from several URLs in one go… maybe data from different BP companies?)

Having grabbed the JSON, we can say which blocks we want to import as row items:

We can preview the rows to check we’re bringing in what we expect…

We'll take this data by clicking on Create Project, and then start work on it. Because the plan is to grab trademark images, we first need to pull back the data OpenCorporates holds on each individual trademark. We can generate the API call URLs from the "datum – id" column:
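For anyone who would rather script this step than click through Refine's "Add column by fetching URLs" dialog, here is a minimal Python sketch of turning each value in the datum – id column into an API call URL. The URL template is a hypothetical placeholder: the post doesn't show the actual OpenCorporates endpoint, so swap in whatever path the API documentation gives.

```python
from urllib.parse import quote

# HYPOTHETICAL URL template: the real OpenCorporates endpoint path is not
# given in the post, so substitute whatever URL pattern the API documents.
URL_TEMPLATE = "https://api.opencorporates.com/data/{id}"


def api_urls(datum_ids, template=URL_TEMPLATE):
    """Build one API call URL per value in the 'datum – id' column."""
    urls = []
    for datum_id in datum_ids:
        if datum_id is None or str(datum_id).strip() == "":
            urls.append(None)  # keep row alignment for blank cells
        else:
            urls.append(template.format(id=quote(str(datum_id).strip())))
    return urls


print(api_urls([12345, "67890", ""]))
```

Blank cells map to None so that the list of URLs stays aligned with the rows, much as a blank cell in Refine simply produces no fetch.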

If we look through the data, there are several fields that may be interesting: representative_name_lines (the person or group that registered the trademark), representative_address_lines, mark_image_type and, most importantly of all, international_registration_number. Note that some of the trademarks are not images; we'll end up ignoring those (for the purposes of this post, at least!)

We can pull out these data items into separate columns by creating columns directly from the trademark data column:

The elements are pulled in using expressions of the following form:

Here are the expressions I used. Each one creates a new column from the trademark data column, which was populated by fetching the automatically constructed URLs:

value.parseJson().datum.attributes.mark_image_type – the first part of the expression parses the cell value as JSON, then we use dot notation to navigate to the part of the resulting JavaScript object we want…

value.parseJson().datum.attributes.mark_text

value.parseJson().datum.attributes.representative_address_lines

value.parseJson().datum.attributes.representative_name_lines

value.parseJson().datum.attributes.international_registration_number
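To make the logic of those GREL expressions explicit, here is a rough Python analogue that parses one fetched record and pulls out the same five attributes. The function name and the sample record are invented for illustration; the sample only mirrors the datum.attributes paths used above, not a real OpenCorporates response.

```python
import json

# The five attributes pulled out in Refine, one new column each.
FIELDS = (
    "mark_image_type",
    "mark_text",
    "representative_address_lines",
    "representative_name_lines",
    "international_registration_number",
)


def extract_fields(raw_json, fields=FIELDS):
    """Python analogue of value.parseJson().datum.attributes.<field>.

    Missing keys (or a missing datum/attributes block) give None rather than
    raising, so one odd record does not stop the whole run.
    """
    record = json.loads(raw_json)
    attributes = (record.get("datum") or {}).get("attributes") or {}
    return {field: attributes.get(field) for field in fields}


# Invented sample record, shaped like the paths used in the GREL expressions.
sample = (
    '{"datum": {"attributes": {"mark_image_type": "jpg", '
    '"mark_text": "EXAMPLE", "international_registration_number": "123456"}}}'
)
print(extract_fields(sample))
```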

Working out how to get images from international registration numbers was a bit of a faff. In the end, I looked up several records on the WIPO website that displayed trademarked images, then looked at the pattern of their URLs. The ones I checked seemed to have the form: http://www.wipo.int/romarin/images/XX/YY/XXYYNN.typ
where typ is gif or jpg and XXYYNN is the international registration number. (This may or may not be a robust convention, but it worked for the examples I tried…)
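Here is a small Python sketch of that observed URL convention: the first two digits of the registration number give the first directory, the next two give the second, and the full number plus extension gives the file name. Since the convention was inferred from a handful of examples, the function name is hypothetical and the URLs it returns are candidates rather than guaranteed hits.

```python
def wipo_image_urls(registration_number, extensions=("gif", "jpg")):
    """Candidate image URLs for an international registration number.

    Follows the pattern observed on a handful of WIPO Romarin records,
        http://www.wipo.int/romarin/images/XX/YY/XXYYNN.typ
    where XX and YY are the first two pairs of digits of the number. This is
    an inferred convention, not a documented one, so treat the URLs as guesses.
    """
    number = str(registration_number).strip()
    # Assumes a purely numeric registration number of at least four digits.
    if len(number) < 4 or not number.isdigit():
        raise ValueError(f"unexpected registration number: {registration_number!r}")
    xx, yy = number[:2], number[2:4]
    return [
        f"http://www.wipo.int/romarin/images/{xx}/{yy}/{number}.{ext}"
        for ext in extensions
    ]


for url in wipo_image_urls("123456"):
    print(url)
```

Since the file type isn't known in advance, one option is to request both candidates and keep whichever returns an image; alternatively, if mark_image_type turns out to hold the file type, it could be passed in as the single extension.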

The following GREL expression generates the appropriate URL from the trademark column:

Okay – so maybe I need to tidy up the registration-related columns, but as a recipe, it sort of works. (Note that it took way longer to create this blog post than it did to come up with the recipe…)

A couple of things came to mind. First, having used Google Refine to sketch out this hack, we could now move on to coding it up properly, maybe in something like Scraperwiki. For example, I only looked up trademarks registered to one legal entity associated with BP, rather than checking for trademarks held by the myriad legal entities associated with BP. Second, I wonder whether it would be possible to "compile" what Google Refine is doing (import from URL, select row items, run operations against columns, export templated data) as code so that it could be run elsewhere. For example, could all those steps be exported as a single JavaScript or Python script, maybe calling on a GREL/Google Refine library that provides some sort of abstraction layer or virtual machine for the script to make use of?
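As a rough illustration of that "compiling" idea, here is a toy Python sketch in which each Refine step becomes a plain function. The function names, the key-path approach to selecting row items and the format-string template are all invented for illustration. Refine can extract a project's operation history as JSON, but not, as far as I know, as a standalone script like this. Fetching the JSON (for example with urllib.request) is left out so the sketch runs offline.

```python
import json


def import_rows(raw_json, path):
    """'Create project' + 'select row items': walk a key path to a list of records."""
    node = json.loads(raw_json)
    for key in path:
        node = node[key]
    return [dict(item) for item in node]


def add_column(rows, new_name, expression):
    """'Add column based on this column': evaluate expression against each row."""
    for row in rows:
        try:
            row[new_name] = expression(row)
        except (KeyError, TypeError, ValueError, AttributeError):
            row[new_name] = None  # blank cell on error, rather than aborting
    return rows


def export_templated(rows, template):
    """'Templating export': render one line per row from a format string."""
    return "\n".join(template.format(**row) for row in rows)


# Invented input standing in for the JSON fetched from the API.
raw = '{"results": {"items": [{"id": 1, "name": "Alpha"}, {"id": 2, "name": null}]}}'

rows = import_rows(raw, ["results", "items"])
rows = add_column(rows, "label", lambda row: row["name"].lower())
print(export_templated(rows, "{id}: {label}"))
```

The second record has no name, so its expression fails and the new cell is left as None instead of stopping the run, which is roughly how a Refine column operation treats a failing expression when errors are set to blank.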

PS What’s next…? The trademark data also identifies one or more areas in which the trademark applies; I need to find some way of pulling out each of the “en” attribute values from the items listed in the value.parseJson().datum.attributes.goods_and_services_classifications.
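Here is one possible shape for that next step, sketched in Python. It assumes (unverified) that goods_and_services_classifications is a list of objects keyed by language code, each possibly carrying an "en" entry; the function name and sample data are invented. Within Refine, a GREL expression along the lines of forEach(value.parseJson().datum.attributes.goods_and_services_classifications, c, c.en).join("; ") might do the same job under the same assumption, though it is untested.

```python
import json


def classification_labels(raw_json, lang="en", sep="; "):
    """Join the `lang` value of each goods-and-services classification.

    ASSUMPTION: goods_and_services_classifications is a list of objects keyed
    by language code, e.g. {"en": "...", "fr": "..."}; items without a value
    for `lang` are skipped.
    """
    record = json.loads(raw_json)
    attributes = (record.get("datum") or {}).get("attributes") or {}
    items = attributes.get("goods_and_services_classifications") or []
    labels = [item[lang] for item in items if isinstance(item, dict) and item.get(lang)]
    return sep.join(labels)


# Invented sample record; the real classification objects may look different.
sample = json.dumps({
    "datum": {"attributes": {"goods_and_services_classifications": [
        {"en": "Fuels", "fr": "Combustibles"},
        {"en": "Coffee"},
        {"fr": "Seulement en français"},
    ]}}
})
print(classification_labels(sample))
```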

8 thoughts on “Looking up Images Trademarked By Companies Using OpenCorporates and Google Refine”

Wow. So much potential for this – I didn’t even know trademark data was playable-with and you’re already hacking it seemingly with ease. I’m interested to see what you come up with regarding web-UX related trademarks… terrifying to think how much of the language of our daily experience is protected and owned…

I know I don’t always comment but I do read every post, please keep up the amazing work!

Thanks for the comment ;-) Next step is to do a network map of companies and add in trademarked images? There are also trademark topic area codes/descriptions in the data, so I guess there may be some way of seeing how different companies carve up topic areas with different trademarked images?
I thought the BP thing was interesting in that they trademark Wild Bean Cafe. I guess I’d always thought of that as an agreed concession with another company rather than being a BP brand…but I can see it makes sense….

@tim ah, wonderful, thanks… Having got the steps identified, I was thinking of trying to build a company specific version in Scraperwiki as a proof of concept. But the Easter break has interrupted things (family trek, wifi free zone…:-(