A few times on the Growstuff mailing list or IRC channel, someone’s excitedly suggested that we should import data from another CC-licensed data set. Each time, I say, “Trust me, that’s pretty complicated,” but I’ve never actually sat down and explained the full gory details of why.

The following is something I wrote up for our wiki so that I could point people at it next time the subject comes up. I thought it might be interesting to a wider audience, too, so that’s why I’m posting it here.

Importing data is hard.

This is a bit of a rant by Skud, who used to work on Freebase, a large open-licensed data repository which imported data in bulk from a range of sources, including Wikipedia, Netflix, the Open Library Project, and many more. She’s had a lot of experience in this area, and learnt a lot about the weird complications of mass data imports.

The simple case

They have a database. You have a database. Your fields are the same. Their API is easy to use and their license is compatible.

map their fields to ours, eg. their “name” is our “system_name”

import data

PROFIT!
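As a rough sketch of the field-mapping step, here is what it might look like in Python. Only the "name" to "system_name" pair comes from the example above; the other field names and the sample record are made up for illustration.

```python
# Hypothetical mapping from the other database's field names to ours.
# Only "name" -> "system_name" comes from the text; the rest is invented.
FIELD_MAP = {
    "name": "system_name",
    "latin_name": "scientific_name",
}


def map_record(their_record):
    """Rename the fields we know about and drop everything else."""
    return {
        ours: their_record[theirs]
        for theirs, ours in FIELD_MAP.items()
        if theirs in their_record
    }


their_record = {
    "name": "tomato",
    "latin_name": "Solanum lycopersicum",
    "internal_id": 42,  # no equivalent on our side, so it gets dropped
}
our_record = map_record(their_record)
```

Even in the simple case you have to decide what happens to fields with no equivalent. This sketch silently drops them, which may or may not be what you want.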

Their fields aren’t the same

What if the fields aren’t quite equivalent? For instance, let’s say they have measurements in imperial and we use metric. We’ll need to have ways to convert them. That’s actually a really simple example. Import incompatibilities are more often at a semantic/ontological level. Growstuff has the idea of “crops” and “varieties” but what if the other database only has “plants” with no distinction? Or what if they have crops and varieties but draw the line somewhere slightly different to where we do? These sorts of incompatibilities are more common than not, and massively complicate any import effort.
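The unit conversion really is the easy part. A minimal sketch, assuming we store lengths in centimetres and the other database gives us a value plus a unit string (both of those are assumptions, not how Growstuff necessarily does it):

```python
# Conversion factors to centimetres. Inches and feet are exact by definition.
CM_PER_UNIT = {
    "cm": 1.0,
    "in": 2.54,
    "ft": 30.48,
}


def to_cm(value, unit):
    """Convert a length to centimetres, refusing units we don't recognise."""
    try:
        return value * CM_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"don't know how to convert {unit!r} to cm") from None


spacing_cm = to_cm(18, "in")
```

Nothing this mechanical exists for the crop/variety mismatch. That one needs a human to decide, record by record, where each of their "plants" belongs.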

Some of their data is bogus

Nothing against that other database: some of everyone’s data is bogus! But we need to check it. What “bogus” means will vary from place to place, but it might be spam entries, duplicate records, simple errors, or it might be cruft from their own broken imports. We need to look carefully at every import and make sure we’re skipping as much of this as possible. And this is largely a manual process, since the bogosity will never be the same twice. You can do this by sampling, of course, but you still need to look at something on the order of hundreds of records, and know what you’re looking for. Could you spot a mixed-up scientific name on a randomly chosen herb? I couldn’t.
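Pulling the sample is the only bit a computer can do for you. Something like this would work, where the sample size of 200 and the fixed seed are arbitrary choices of mine:

```python
import random


def sample_for_review(records, sample_size=200, seed=0):
    """Pick a repeatable random sample of records for a human to eyeball."""
    if len(records) <= sample_size:
        return list(records)
    # A fixed seed means two reviewers get the same sample to compare notes on.
    return random.Random(seed).sample(records, sample_size)


# Made-up stand-in for the records we're thinking of importing.
records = [{"id": i} for i in range(10000)]
to_review = sample_for_review(records)
```

The reviewing itself is still a person with domain knowledge reading a couple of hundred records.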

The stub problem

Let’s say we want to import from a database of plant life that lists 10,000 edible plants and their nutritional content. Growstuff has 300 crops at present. We import everything! Now we have 9,700 pages with nothing but nutritional data. Nobody on Growstuff is using them, they have no pictures, they have no planting data, they have no discussions (except maybe spam comments that nobody cleans up because nobody notices). Our “newest crops” page, usually a source of interest, is now just a wasteland of grey placeholder images.

Should we have imported all 10,000 plants, or just the nutritional data of the 300 we already have? Or something in between? The answer is usually “something in between” — you might want that data if and only if you can get other partial data from other imports to make it more interesting.

The best way to do this is to import the 300 and make a note of the 9700. Then later, you can cross-correlate the notes you’ve made from various data imports and re-import those that have, say, at least 3 useful data sources and a picture. But that’s pretty complicated. (Also, see the discussion of repeated imports, below.)
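Here is a sketch of the "import the 300, make a note of the rest" approach. The source name "NutritionDB" and the crops are invented, and a real version would keep the notes in the database rather than in a dict.

```python
from collections import defaultdict


def triage(source, records, existing_crops, notes):
    """Return the records to import now; just make a note of the rest."""
    to_import = []
    for record in records:
        if record["name"] in existing_crops:
            to_import.append(record)
        else:
            notes[record["name"]].add(source)
    return to_import


def worth_importing(notes, has_picture, min_sources=3):
    """Noted crops that now have enough data sources and a picture."""
    return sorted(
        name
        for name, sources in notes.items()
        if len(sources) >= min_sources and name in has_picture
    )


existing_crops = {"tomato", "corn"}
notes = defaultdict(set)  # crop name -> set of sources that know about it

imported = triage("NutritionDB", [{"name": "tomato"}, {"name": "salsify"}],
                  existing_crops, notes)
triage("Katie's Plants", [{"name": "salsify"}], existing_crops, notes)
triage("SuperPlantDB", [{"name": "salsify"}, {"name": "oca"}],
       existing_crops, notes)

candidates = worth_importing(notes, has_picture={"salsify"})
```

This glosses over the hard part, which is knowing that "salsify" in one database is the same plant as "salsify" in another.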

Don’t forget the license

Let’s assume that their data is licensed compatibly — that means CC-BY-SA or CC-BY in our case, since we’re CC-BY-SA and none of the other clauses (ND, NC) are compatible with us. (Ignoring CC-0 and public domain stuff for now — those don’t need attribution at all.)

So by importing, we have to credit them. Now we need some way to represent that in the database. If we do this at the object level, it’s fairly simple: each thing in the database (crop, etc) has many licensors, each of which includes a name for the work (eg. “Katie’s Plants”), a license (eg. CC-BY), a licensor name (eg. Katie Smith), and a URL to link to the original data.

Now we have to display them on the page. Where? Probably at the bottom somewhere: “Some information on this page came from: Katie’s Plants (that would be a link) — CC-BY Katie Smith; SuperPlantDB under CC-BY-SA SuperPlants Inc; etc.”
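To make that concrete, an object-level attribution record and the footer built from it could be sketched like this. The class and field names are my own invention, the URLs are placeholders, and the footer wording only loosely follows the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    work: str      # name of the work, e.g. "Katie's Plants"
    license: str   # e.g. "CC-BY"
    licensor: str  # e.g. "Katie Smith"
    url: str       # link back to the original data


def attribution_footer(attributions):
    """Build the credit line for the bottom of a crop page."""
    if not attributions:
        return ""
    credits = [
        f"{a.work} <{a.url}> ({a.license}, {a.licensor})" for a in attributions
    ]
    return "Some information on this page came from: " + "; ".join(credits)


# Placeholder URLs; these sites are made-up examples.
tomato_attributions = [
    Attribution("Katie's Plants", "CC-BY", "Katie Smith",
                "https://example.org/katies-plants"),
    Attribution("SuperPlantDB", "CC-BY-SA", "SuperPlants Inc",
                "https://example.org/superplantdb"),
]
footer = attribution_footer(tomato_attributions)
```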

The license chaining problem

Now imagine that the data on those sites came from other sites. For instance, let’s say Katie’s Plants previously did an import from Freebase.com, and SuperPlantDB did one from Wikipedia. We not only need to credit Katie’s Plants and SuperPlantDB but also those places.

Some questions to consider:

Are those second-degree sources, their licenses, licensors, etc available via the API? When Freebase imported images from Wikimedia Commons, we encountered this problem, because the license metadata had to be scraped from inconsistently-formatted HTML. Getting this wrong leads to complaints from licensors whose licenses we’re violating.

Do we know what part of the data on Katie’s Plants was sourced from Freebase? Maybe it was the international names, but we’re importing medicinal uses and not touching that part of her data. Does Katie’s license notice express this? Probably not: there’s no requirement in the CC licenses for the attribution to be at the field level, and our own attribution notices definitely don’t operate at this level of detail. Because we don’t have the detail, we end up with attribution inflation: pretty soon, every page on Growstuff has a hundred attributions at the bottom.

Sure, we could just choose not to chain licenses, or to do it in some restricted way… but the moral high road here is to respect everyone’s license and attribution, and besides, if you only attribute some contributors, where do you decide to draw the line?
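If we do chain, then working out who to credit means walking the whole tree of who imported from whom. A sketch, assuming we somehow already know each site's upstream sources (which, as noted, is the hard part):

```python
# Hypothetical record of where each site got its data from.
IMPORTED_FROM = {
    "Growstuff": ["Katie's Plants", "SuperPlantDB"],
    "Katie's Plants": ["Freebase"],
    "SuperPlantDB": ["Wikipedia"],
}


def chained_sources(site, imported_from):
    """Every source we'd need to credit, direct or indirect, nearest first."""
    found = []
    queue = list(imported_from.get(site, []))
    while queue:
        source = queue.pop(0)
        if source == site or source in found:
            continue  # sites can import from each other, so guard against loops
        found.append(source)
        queue.extend(imported_from.get(source, []))
    return found


to_credit = chained_sources("Growstuff", IMPORTED_FROM)
```

Two direct imports already turn into four credits here. That is the attribution inflation in miniature.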

Katie’s fine — she imports from Hippie Herbs’ data with impunity because she’s non-profit. She attributes them on her site, and Hippie Herbs is happy. She doesn’t have to use the same license as them because they don’t have a “SA” (Share Alike) clause.

Now Growstuff comes along and wants to import data from Katie’s Plants. Katie’s Plants is CC-BY which is compatible with Growstuff… but what about the data that originally came from Hippie Herbs? We’re commercial, so we’re not meant to use it.

But how do we tell what’s what? Katie probably doesn’t attribute HH at the level of individual bits of data, so we can’t extract just the ok-for-commercial-use bits.

Basically, if you believe in license chaining (and as I said, it’s definitely the moral high road to take, so I think we should) then you have to be constantly vigilant for the taint of NC-licensed data anywhere in the sprawling tree of ancestors to your data.
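Being vigilant could mean running a check like this before every import. It is only a sketch: the license table is made up (I'm assuming Hippie Herbs is CC-BY-NC), and it can only flag whole sources, not the individual bits of data that came from them.

```python
# What we're allowed to import, per the discussion above.
COMPATIBLE = frozenset({"CC-BY", "CC-BY-SA"})

# Hypothetical license and ancestry data for the made-up sites.
LICENSES = {"Katie's Plants": "CC-BY", "Hippie Herbs": "CC-BY-NC"}
IMPORTED_FROM = {"Katie's Plants": ["Hippie Herbs"]}


def find_taint(source, licenses, imported_from, allowed=COMPATIBLE):
    """List (site, license) for every ancestor whose license we can't use."""
    problems = []
    seen = set()
    queue = [source]
    while queue:
        site = queue.pop(0)
        if site in seen:
            continue
        seen.add(site)
        license_name = licenses.get(site)  # None: we couldn't find out
        if license_name not in allowed:
            problems.append((site, license_name))
        queue.extend(imported_from.get(site, []))
    return problems


problems = find_taint("Katie's Plants", LICENSES, IMPORTED_FROM)
```

A source whose license we can't determine is treated as a problem too, which seems like the safer default.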

What if we already have some data? (the merge case)

The simple case described above is fine for a green-field import with no existing data. But let’s say we’re importing data into an area where we already have some contributions from Growstuff members.

map the fields as before

for each piece of data imported, compare with Growstuff

is Growstuff’s field empty? IMPORT!

are the two the same? no-op!

do they differ?

If they differ, do we trust our own community or the import source? Or do we need to adjudicate?
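The steps above can be sketched as a per-field merge. The field names are invented, and treating an empty field on their side as "nothing to import" is my own assumption.

```python
def is_empty(value):
    return value is None or value == ""


def merge_record(ours, theirs):
    """Return (merged record, conflicts needing a decision)."""
    merged = dict(ours)
    conflicts = {}
    for field, their_value in theirs.items():
        our_value = ours.get(field)
        if is_empty(their_value) or our_value == their_value:
            continue  # nothing to import, or the two agree: no-op
        if is_empty(our_value):
            merged[field] = their_value  # our field is empty: import
        else:
            conflicts[field] = (our_value, their_value)  # they differ
    return merged, conflicts


ours = {"system_name": "tomato", "colours": "red, yellow", "sowing_depth_cm": None}
theirs = {"system_name": "tomato", "colours": "red", "sowing_depth_cm": 1}
merged, conflicts = merge_record(ours, theirs)
```

The code for the first two branches is trivial. Everything that makes merging hard ends up in the conflicts dict.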

Let’s say we decide to adjudicate. We now need to build an app to let people vote on which one is “correct” — probably best of three or something like that. Freebase did this (multiple times) and I was involved in some of it. We called them “data games” and had leaderboards for who’d voted the most. We couldn’t get enough throughput, though, and sometimes by the time something had been adjudicated, another community member had edited the field on our site, thus invalidating the whole thing. We ended up paying people in developing countries to churn through these votes for us (we used ODesk, but you could use Amazon’s Mechanical Turk or whatever). However, they needed training, and weren’t cheap — even after all the work of setting up the voting queue, there was still considerable expense.
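The vote counting itself is the easy bit. A minimal sketch of "best of three" with a staleness check follows; the function and status names are mine, not anything Freebase actually used.

```python
from collections import Counter


def tally(votes, value_when_queued, current_value, votes_to_win=2):
    """Decide a conflict, unless someone edited the field in the meantime."""
    if current_value != value_when_queued:
        return ("stale", None)  # the question being voted on no longer exists
    if votes:
        winner, count = Counter(votes).most_common(1)[0]
        if count >= votes_to_win:
            return ("decided", winner)
    return ("undecided", None)


decided = tally(["ours", "theirs", "ours"], "red", "red")
waiting = tally(["ours"], "red", "red")
wasted = tally(["ours", "ours"], "red", "red, yellow")
```

The "stale" case is the throughput problem described above: the slower the voting queue, the more often the votes get thrown away.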

Do we let people edit the data after import?

This came up quite often with Freebase because sometimes they would import from “authoritative sources” who licensed their work specially to Freebase but didn’t generally have a CC license or an open community editing process. For instance, the time when I was talking to some people from the BBC, and one (an older dude) said, “If we gave you our programme data, we wouldn’t want anyone to edit it because we are the experts on our programmes.” This was pretty silly of course — another, younger BBC dude immediately turned to him and said “Ha ha ha, I’ve got two words for you: Doctor Who.” — but sadly these situations are common when you’re dealing with closed/non-community-based/”authoritative” data sources who don’t understand the power of crowdsourcing information.

But even when dealing with compatibly CC-licensed sources with open developer communities, there can still be some problems around the “authority” of the data and how it’s attributed.

Take the case where the Katie’s Plants community has spent heaps of time editing their data and is very proud of it. We import it to Growstuff, then our community looks at it, decides that bits of it are wrong, and changes them.

Do we leave the license link to Katie’s Plants intact? Most likely yes, because our data has theirs in its DNA, so to speak. But what if we essentially deleted all the data from there? This might happen if, for example, we’d imported a picture from Wikimedia Commons then found that the picture was incorrect or inappropriate, so we blew it away. Now we should probably remove the license note. But how do you tell when data has been completely removed as opposed to modified or built upon?

In the Katie’s Plants example, what if Katie’s high quality medicinal plant information gets mixed up with ($DEITY forbid!) low-quality data from less experienced Growstuff members or from yet another import? What implications does this have for Katie’s site and their reputation? Under the license we’re allowed to mess it up because there is no “No Derivatives” (ND) clause, but socially/culturally they’ll be pretty unhappy if we do, and we can expect some backlash.

Repeat imports

Great news! Katie got a government grant and some fantastic press coverage, and her database has expanded enormously. We want to re-run the import. But now consider this case:

Katie’s plants, original: “Tomato – red”

Growstuff, original: “Tomato – red, yellow, green, black”

When we first imported, we put it to adjudication and found that Growstuff’s data was better, so we went with that.

Now we re-import, and Katie’s data has changed:

Katie’s plants, changed: “Tomato – red, yellow, green, striped”

So of course we put it through adjudication again. The correct answer is probably a union of the two sets.
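If we did decide the union was right, computing it is simple enough, assuming the colours are stored as a comma-separated string (they might well not be):

```python
def parse_list(text):
    """Split "red, yellow" into ["red", "yellow"]."""
    return [item.strip() for item in text.split(",") if item.strip()]


def union_in_order(ours, theirs):
    """Everything from both lists, ours first, without duplicates."""
    merged = list(ours)
    for item in theirs:
        if item not in merged:
            merged.append(item)
    return merged


growstuff = parse_list("red, yellow, green, black")
katies = parse_list("red, yellow, green, striped")
colours = union_in_order(growstuff, katies)
```

Whether a union is the right answer is still a judgment call. It works for tomato colours, but it would be wrong for a field like sowing depth.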

Now, Katie’s database is growing fast, and so is Growstuff. We want to do a regular import from there — perhaps monthly. But somehow along the way, we’ve ended up with different ideas of tomato colour. Every month, their data is different to ours, and we have to keep re-adjudicating the same question: what colour/s are tomatoes? Boring. Our community is tired of playing the voting game, and/or it’s costing us money with our Mechanical Turk people.

So we decide to implement a check: if nothing’s changed on either side since the last adjudication, leave it. But now we have to implement change tracking, not just on Growstuff, but on Katie’s Plants as well. We need to keep a history of changes for every site we import. This is in addition to the infrastructure we’ve had to build to automatically run imports at regular time intervals.
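One cheap way to do that check is to remember a fingerprint of both sides at the time of the last adjudication. This is a sketch under the assumption that values are strings; a real system would keep the history in the database, and keep much more of it.

```python
import hashlib


def fingerprint(value):
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def record_adjudication(history, key, ours, theirs):
    """Remember what both sides looked like when we last decided."""
    history[key] = (fingerprint(ours), fingerprint(theirs))


def needs_adjudication(history, key, ours, theirs):
    if ours == theirs:
        return False  # nothing to argue about
    # Only ask again if either side has changed since the last decision.
    return history.get(key) != (fingerprint(ours), fingerprint(theirs))


history = {}
key = ("tomato", "colours")
ours = "red, yellow, green, black"
theirs = "red, yellow, green, striped"

first_run = needs_adjudication(history, key, ours, theirs)
record_adjudication(history, key, ours, theirs)
next_month = needs_adjudication(history, key, ours, theirs)
after_edit = needs_adjudication(history, key, ours, theirs + ", purple")
```

This only tells us that something changed, not what or who changed it, so it is the bare minimum rather than real change tracking.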

How do we make our data available in return?

Obviously we have an API for people to access our data under CC-BY-SA. But keep in mind the license-chaining effect: if anyone uses data from Growstuff, they will also be constrained by the licenses of all the data sources we import. We will need to make that license information available in the API alongside our data, and make sure all our API docs and related materials explain the necessity of license chaining.
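For illustration, an API response that carries its chained attributions might look something like this. The shape of the payload is entirely hypothetical and is not Growstuff's actual API.

```python
import json


def crop_with_licenses(crop, attributions):
    """Bundle a crop with our own license and everyone we have to credit."""
    return {
        "crop": crop,
        "license": "CC-BY-SA",  # Growstuff's own license
        "attributions": attributions,
    }


payload = crop_with_licenses(
    {"system_name": "tomato"},
    [
        {"work": "Katie's Plants", "license": "CC-BY", "licensor": "Katie Smith"},
        # A second-degree source, credited because Katie imported from it.
        {"work": "Freebase", "license": "CC-BY", "via": "Katie's Plants"},
    ],
)
body = json.dumps(payload, indent=2)
```

Anyone consuming this would need to carry the whole attributions list forward, not just the top-level license.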

Take a look at Freebase’s Attribution Policy. They use CC-BY, but because of attribution chaining, they can’t just say that: they need a whole page with a wizard to help people figure out how to attribute something on the site. It’s incomplete, too: Freebase decided that they would only require license chaining for “content” as opposed to “facts” (a complicated issue in itself), which means images and Wikipedia-based descriptions. They don’t require chained license information for other data sources. This is dubious in terms of the legality and culture of how Creative Commons works. There are no really firm guidelines on this, but in my opinion the most moral/ethical stance is to always chain your attributions, and Freebase has chosen otherwise. In the past, this has caused some concern from the owners of other data sources that were imported to Freebase. Even Wikipedians have complained that Freebase doesn’t enforce their Wikipedia attributions strongly enough. This sort of thing can lead to reputation problems, if not legal ones.

Just the facts, ma’am

One final complication. Various courts have ruled that “facts” aren’t copyrightable. For instance, the fact that the crop “Corn” has the scientific name “Zea mays” can’t be copyrighted. Even if you have thousands of these facts all together, they can’t be copyrighted, because they’re not a “creative work”. They’re just a statement of fact.

This actually throws the whole idea of CC-licensing collections of data into doubt. And yet we have nothing better, so we do it anyway.

Some data projects have come up with various justifications for this. For instance, Freebase says that the arrangement of the facts is a creative work — that what’s CC-licensed is their schema. That’s pretty creative in itself! The thing is, none of this has really been tested. And so most open data projects have some kind of Terms of Service which explains what they think the CC-license is for and how it’s meant to be used. These generally say, “By accessing our data via our website or API, you agree to behave as if this CC license applied to it (even if there’s not a very strong legal basis for that outside this TOS).”

The original idea of CC licenses was to stop people having to write their own terms and conditions of use for their work, and standardise in such a way that people could easily re-use creative content. Yet for data projects, we end up having to make up our own TOS just to apply a CC license, and we’re back where we started — having to peer at a bunch of legalese and figure out what the hell it means.

Of course once you get into the complexities of license chaining described above, you now also have TOS chaining — if Growstuff uses Katie’s data under their TOS, and Katie uses Hippie Herbs’ under their TOS, is Growstuff now subject to Hippie Herbs’ TOS? No idea! I am not a lawyer! I don’t want to be one! I just want to make a website about growing food!

Conclusion

Importing data is hard! That doesn’t mean we shouldn’t do it, but we should go into it with an awareness of the potential potholes, and carefully weigh up whether importing something is the best choice for us at any given time.

Final note

Katie’s Plants, Hippie Herbs, and SuperPlantDB are all made-up examples. Any resemblance to actual open data projects is coincidental. Freebase, Wikimedia Commons, and the BBC are real, though.

Technically, for most of the last year or so since leaving Google, I’ve been unemployed. I didn’t receive unemployment benefits, though, because I didn’t really need them and because the paperwork overhead seemed higher than I was prepared to deal with. (Plus of course the periods when I was studying or overseas.) But now I’m working on Growstuff and I’d like to get onto the New Enterprise Incentive Scheme, which offers small business training and mentoring and some small amount of funding for a year while you work on your new thing. Thing is, you need to be on unemployment benefits to qualify, and so I recently applied for Newstart.

As a Newstart recipient, I’m required to search for jobs, even though my goal is to run my own business. Whatever, I can play the game. I applied for a number of jobs online, figuring I’m probably over-qualified for most of them, but it fulfils the requirements. Today I got an email back from one of them, asking me to fill in an online questionnaire. Obviously, to show good faith in my “job search” I need to do this, but I have to admit that it sapped my will to live.

The first part of the process was an 80 question “IQ” test which included the following questions:

The idea that the Earth is the centre of the universe is

a) improbable
b) intelligent
c) subversive
d) insular
e) astronomical

Such things as language, clothing, customs, color, idicate[sic]:

a) temperament
b) race
c) birthplace
d) location
e) personality

Which of the following is a trait of personality?

a) affluence
b) reputation
c) position
d) withdrawn
e) power

The second part of the questionnaire, about the “product” of my most recent work (i.e. my year at Google) was even worse. Luckily their web app crashed and I couldn’t actually complete it.

You’d think those Google recruiters would know not to contact me, but the other day I got another perky “Opportunities at Google” email from one of them, telling me that they’d found my “online profile” online and that based on my “experience” they think I “could be a great addition to our team!”

Riiiiight.

Since I just deleted my LinkedIn profile, I emailed them asking where they’d found this “online profile”, since it was obviously outdated. Oddly enough, it seems they’d found a page about me on the Geek Feminism Wiki, and were using the rather sketchy outline of my open source background there as justification for trying to recruit me.

The recruiter admitted that the page was out of date, and asked me to let them know what I’d been up to lately so they could add it to their records. Below is a copy of what I sent them. I’m posting it here, lightly edited, for anyone who’s interested, and in the hopes that the next Google recruiter (I have no doubt that there’ll be one) might use that web search thingamajig to find out whether I’m a suitable candidate before emailing me.

Here’s what I’ve been up to for the last couple of years, since you asked.

In July 2010 the startup I was working for, Metaweb, was acquired by Google. I was brought in on a 1-year fixed term employment contract, since the group we were acquired into (Search) didn’t really know what to do with a technical community manager. I attempted to transfer my role over to Developer Relations, but was told that I “wasn’t technical enough” for the job I’d been doing for 3+ years, presumably because I didn’t have a computer science degree and believed that supporting our developer community was more important than being able to pass arbitrary technical quizzes.

Around the same time, Google started to develop Google+. As a queer/genderqueer woman, victim of abuse, and someone who was (at that very time) experiencing online harassment and bullying, I was very vocal within Google about the need for Google+ to support pseudonymity. Google decided not to do that, and instead told people they should use “the name they are known by” while in actual fact requiring their full legal names, in many cases requiring people to provide copies of their government ID when challenged. (Extensive documentation about this is available on the Geek Feminism wiki, if you’d like to read it. See Who is harmed by a “Real Names” policy? for starters.)

When I walked out the door of Google’s San Francisco office on July 15th, 2011, I was very glad to have left a company I thought was doing evil towards any number of marginalised and at-risk people. My first tweet on leaving was to criticise them for it.

Less than a week later I got my first email from a Google recruiter — not first ever, of course; I’d been spammed with them for years, but first since I quit working for them. Here’s the blog post I wrote about it. In case you can’t be bothered clicking through and reading it, here’s the money shot:

If you are a Google recruiter, and you want me to interview for SWE or SRE or any role that has an algorithm pop quiz as part of the interview, if you want me to apply for something without knowing what team I’ll be working on and whether it meshes with my values and goals and interests, if you want me to go through your quite frankly humiliating interview process just to be told that my skills and qualifications — which you could have found perfectly easily if you’d bothered to actually look before spamming me — aren’t suitable for any of the roles you have available, then just DON’T.

I returned to Australia and went back to school. I did a semester of Sound Production at TAFE, but it turned out that the sound engineering course I was enrolled in wasn’t really my cup of tea, just like I’d previously decided, back in the ’90s, that university wasn’t for me. Like so many others, I quit my computing degree because I was more interested in the Internet and open source software than in fixing COBOL applications for banks who were worried about Y2K. But then, I’m sure Google’s HR system already knows all about that — if I’d had a degree, you might have considered me worth keeping on last year. Instead, Google’s reliance on higher education credentials causes it to weed out people like me, even though I have a track record a mile long and buckets of evidence to show that I’m good at what I do.

In the end, I’ve spent most of the last year lying in hammocks reading books, working in my garden, going to gigs, hanging around recording studios, doing the odd bit of freelancing, and, over the last few months, travelling around Europe. It’s given me a good opportunity to reflect on my previous work.

Since I’ve been out of the Silicon-Valley-centred tech industry, I’ve become increasingly convinced that it’s morally bankrupt and essentially toxic to our society. Companies like Google and Facebook — in common with most public companies — have interests that are frequently in conflict with the wellbeing of — I was going to say their customers or their users, but I’ll say “people” in general, since it’s wider than that. People who use their systems directly, people who don’t — we’re all affected by it, and although some of the outcomes are positive a disturbingly high number of them are negative: the erosion of privacy, of consumer rights, of the public domain and fair use, of meaningful connections between people and a sense of true community, of beauty and care taken in craftsmanship, of our very physical wellbeing. No amount of employee benefits or underfunded Google.org projects can counteract that.

Over time, I’ve come to consider that this situation is irremediable, given our current capitalist system and all its inequalities. To fix it, we’re going to need to work on social justice and rethinking how we live and work and relate to each other. Geek toys like self-driving cars and augmented reality sunglasses won’t fix it. Social networks designed to identify you to corporations so they can sell you more stuff won’t fix it. Better ad targeting or content matching algorithms definitely won’t fix it. Nothing Google is doing will fix it, and in fact unless Google does a sharp about-turn, they’ll only worsen the inequality and injustice there is in the world.

I guess you’ll want to know what I’m working on at the moment. My current project is an open source, open data project called Growstuff, which helps food gardeners track and share information about what they’re growing and harvesting. It is built on principles of sustainability, including a commitment to a diverse and harassment-free community, to actively supporting developers rather than excluding them based on misguided ideas of meritocracy, and to funding the project through means that will never put the people running the website in opposition to our customers. That means no ads, in case you’re wondering. We’d rather our members paid us directly; that way, we’ll never forget who we’re meant to be serving. I’m working on Growstuff from home, where I can be myself and feel safe and comfortable. I work with volunteers from all round the world, and get to teach programming and web development and system administration and project management and sustainability to all kinds of people, especially those who’ve previously been excluded from or marginalised in their technical education or careers. We get to work on things we know are wanted and appreciated, and we don’t have to screw anyone around to do it.

Let me know when Google has changed enough to offer me something more appealing than that. If you don’t think that’s likely to happen, then please put me on whatever “Do Not Contact” blacklist you might have handy. I know you must have some such list; I only wish you regularly referred to it instead of spamming people who not only don’t want to work for you, but have nightmares about it.

Well, I’m home. Have been for a few days, actually, but in between jetlag, flaky internet, and nesting, I haven’t gotten around to posting.

The flight home was ghastly and let’s never talk about it, okay? I am still processing my thoughts on the trip overall but I guess the quick version is: 2.5 months is a long time to be city-hopping, it was more expensive than I expected, it was great to meet people everywhere (hi! thanks!), and I really want to spend more time in Andalusia and in the north-east of England.

Now I’m home I’m sorting out money (yay Centrelink) and work (some balance of Growstuff and more audio stuff), settling into our rearranged home (we have a new housemate, and a significant turnover and reshuffling of furniture as a result), and trying to restart my social life. Incidentally, if you’re interested in my domestic blog it’s over here and likely to have lots of food/gardening/crafts in the near future. NESTING. SPRING CLEANING. MORE NESTING.

I’m becoming increasingly disenchanted with social networking websites and probably going to delete my Facebook account. Yes, again. Especially after they outed queer students to their parents and then blamed the students for not understanding Facebook’s “robust privacy controls” — despite the students having locked down their accounts, and Facebook ignoring those settings.

With the way Twitter is going these days, I may drop that too. Or at least stop using it as a primary interface to the world. I keep coming back to the fact that if I’m going to create stuff, I don’t want some corporate jerkwads shoving ads all over it, potentially ads for things that are anathema to me. See, for example, that time when LiveJournal put anti-equality ads all over someone’s post celebrating a same-sex marriage, or the “Meet Hot Gamer Chicks” ads we used to get on the Geek Feminism wiki. I’ll gladly pay money to support a service, but I won’t stick around for that sort of misuse of my words.

So, if you want to be sure to keep following me even if I drop off those places, you might want to subscribe to my blog (by RSS, or by email if you prefer; there’s a subscription form at the bottom of every page on my site); or subscribe to my journal on Dreamwidth (mostly an aggregate of this blog and my domesticity blog, with a few other things from time to time); or follow me on whatever not-completely-asshatty social network next gets enough people to be worth the trouble.

If you wanted to catalogue the shit-eating complacency and pretentiousness of Web 2.0, infographics would be right up there with the damn TED conference and people who put “rockstar” on their business card.

Did someone really sit down one day and think “you know, unless we have the market share of the iPad illustrated as a pie chart shaped as an apple, people will think this statistic is too dry”? The story of the iPad is an interesting one: much, much more interesting than can be displayed in three factoids hastily put together in a crappy infographic. You don’t need an infographic to tell the story of a computer that is the size and form of a magazine. You need a writer.

Everyone keeps telling me that infographics are fine, and that I’m just getting stuck in Sturgeon’s Law. I keep hearing infographics designers turn up at design events talking about the awesomeness of infographics. But in my day to day life, I can’t remember ever seeing a good infographic. That is, I can’t remember ever seeing an infographic that made it worth the page taking even half a second longer to load.

So, I replied saying that I’d take a look at the infographic and consider running it on the following terms:

The information is based on respected, preferably peer reviewed, studies, and provides citations.

The graphical display of the information provides insight that would not have been available through text.

A textual summary of the infographic is also provided, to improve accessibility for readers who have trouble interpreting a graphic.

The graphic is provided under a Creative Commons Attribution or Attribution-ShareAlike (CC-BY or CC-BY-SA) license.

I don’t expect a reply, but I’ll let you know if I get one. Now I just need to come up with similar terms for the dozens of people who keep asking to guest-post on my blog. I suspect the top condition would be, “I get to mock you and your post.” Especially that one that emailed me the other day, saying: “When searching Google for Open Source Development, we found a post on infotrope.net.” O RLY?

In case you missed it (and if you’re not in Australia, you probably did), right-wing radio personality Alan Jones gave a speech to the Sydney University Liberal (i.e. conservative — yeah, I know) Club in which he said that Prime Minister Julia Gillard’s father, who recently passed away, had “died of shame” because of his daughter. This is part of a string of really nasty, misogynistic slurs against the PM, from Jones and other right-leaning journalists, commentators, and even MPs.

Anyway, there are a number of petitions going around in protest of this. One petition I found today (ironically, via a gay friend who tweeted in support of it) contains the following text:

Sign to remove the order of Australia from Alan Jones of 2GB radio . Alan Jones was arrested in 1988 in London on two accounts for indecent acts in a public toilet . Later he was directly involved in the Cronulla riots calling bikies to be involved . Now Alan Jones on air called for our prime minister to be drowned at sea in a chaff bag and that her father died due to her actions . Sign this and share this petition to stop this injustice .

(emphasis mine)

Now don’t get me wrong. Alan Jones is a vile worm of a man, and his comments against Ms. Gillard (and on a range of other topics) are completely unsupportable. I would love to see him stripped of his OA, and for all his radio station’s sponsors to pull their support. But let’s not make this about who he has sex with. As Sarah (@Stokely) says, it’s homophobic and irrelevant.

Someone on Twitter mentioned that the problem was the “indecency”, nothing to do with the gender of the people involved. You should be aware that the charge in question would have been made under the Sexual Offences Act 1956, which predates the 1967 legalisation of homosexuality in the UK. The act says:

Indecency between men. It is an offence for a man to commit an act of gross indecency with another man, whether in public or private, or to be a party to the commission by a man of an act of gross indecency with another man, or to procure the commission by a man of an act of gross indecency with another man.

The term “indecency”, in British law in 1988, only referred to male/male activity. The law against “indecency” was repealed in the Sexual Offences Act 2003, and replaced with Sexual activity in a public lavatory, which applies to all genders equally.

OStatus: like Twitter, but open – Ooh. I'm actually quite excited about this. The HN thread has some good points about WordPress integration as well. If OStatus can get itself hooked in closely to the WordPress ecosystem, it could actually have enough people using it — non-geek people, that is — to be worthwhile.

Proofreading font – Did you know there is a special font for proofreading OCR'd texts? This one was developed by Project Gutenberg. "It's designed to constantly throw you OUT of the story and get you to focus on the letters and punctuation. It's glorious. And ugly. Wow, I didn't know it was possible to make a font that ugly and still readable."

The day I confronted my troll – An engaging story with a twist at the end. There's something about it that rubs me the wrong way. Perhaps it's the delight with which people have been latching on to a story that portrays trolls as harmless individual actors. Few such stories are solved as neatly as this.

To Encourage Biking, Cities Forget About Helmets – What makes cycling safe? Tons of cyclists on the road. Helmet laws make cycling seem difficult and scary, discouraging ordinary riders and paradoxically making cycling less safe. Take note, Melbourne!

It’s almost a month to the day since I posted my last travel update. Since then I’ve been to Paris, then to Calais and across the channel to Dover, along the south coast to Brighton and Portsmouth, up to London, stayed a week and a half, then north to York, brief visits to Durham and Newcastle, a few days in Edinburgh, then over to Liverpool, Manchester, Bristol, and five days in Cornwall.

Tonight I was meant to have taken the ferry from Plymouth to Roscoff in Bretagne, then spent a week meandering back through France and across the Pyrenees to Spain before flying home from Madrid. Problem: ferry strikes mean that my ferry’s not going anywhere. Rather than try to rearrange my travel around that, I decided I was a bit over high-energy travel and not all that enthused about working out a new route through France and Spain on short notice. So here I am back in London, staying at a friend’s place again, for a week til I hop on an EasyJet flight to Madrid and then home.

I suspect I’ll be taking it pretty easy in London, mostly just kicking around and working on Growstuff. I do want to make it to the one museum that was closed during the Olympics/Paralympics when I was last here, but that’s about it. Okay, and maybe another visit to the V&A. Um. Well, let’s just say that I don’t have any particular plans, and don’t intend to work too hard at it.

That said, if anyone wants to catch up for drinks/meals/pair programming this week, let me know.

I know I have a lot of historian and/or textile-inclined friends, so I was wondering if anyone has ever heard the word “stitch” used to mean “stitched textile goods”, in contexts like, “The importance of stitch during World War II…”?

I ask because it keeps being used that way in the book I’m reading: “Stitching for Victory” by Suzanne Griffith. I thought I read a fair bit of textile history and I’ve never encountered this usage before. Is it just this author’s idiosyncrasy, or am I missing something?

(This post brought to you by a half-assed resolution to blog things that are too long for Twitter, rather than spreading them across two or three tweets.)

My housemate Emily and I are looking for a new housemate around the time I get back from my travels. It’s for a smallish room in a largish house in Thornbury, in Melbourne’s inner north. Would suit a feminist fan who’s also a foodie, or something along those lines. It’s $152/week or $658 a month, and available from October 10th.

Click through for the full description/room ad and more details than you could poke a stick at. Please feel free to share this with anyone you know who might be interested.

Room in established Thornbury share house available mid-October. $152/week.

We’ve got an easygoing, friendly atmosphere – we love hanging out but also cheerfully give each other introvert/alone time when needed with no hard feelings. We enjoy entertaining in a low-key way: friends round to dinner, craft nights, the occasional party or summer bbq, guests staying over from time to time.

We love cooking together (& shopping at local markets), though it’s not a requirement of living here. We share expenses for some household staples (toilet paper, laundry soap, etc), and would be happy to share groceries and pantry staples if you want to cook/eat communally with us. We’re not vegetarian but are veg-friendly and eat lots of veg food.

We have some basic, discussed & agreed upon house rules to keep responsibilities for cleaning (and expected levels of cleanliness) well-defined – so no one gets stressed about it.

Located in a quiet but v conveniently placed street in Thornbury, just off St Georges Rd and south of Miller St. 112 tram.

The available bedroom is on the small side: it can fit a double bed, bedside table(s) & shelves/clothes rack, lovely view onto green front yard with lots of natural light coming in, but sadly doesn’t have (and wouldn’t fit) a wardrobe or large chest of drawers.

However, there’s a large linen closet just outside your bedroom door, where you can keep as much stuff as you like, as well as a big communal wardrobe in the hall for hanging things and a moderate amount of storage space in the backyard shed.

The rest of the house is well-furnished with our stuff, though we’d be happy to integrate yours if you have things. Communal storage & decoration in shared areas of the house is welcomed.

We share all utilities including a very good Internet connection. These average around $100/person/month in total.

You are:

Queer & kink & genderqueer/trans friendly, feminist and fat-positive

A grown up

Likes cooking & eating & sharing interesting food, esp more towards the vegetarian end of the spectrum (we are not vego, but meat occupies a pretty small part in our diets).

Good at using your words, and happy to talk openly about & agree to following some basic share house “rules” to keep the place at a comfortable standard of cleanliness for everyone.

Geek & fandom friendly, enjoys hanging out and watching a variety of tv/film/etc, from the trashy to the sublime (and sublimely trashy).