Decentralisation considered harmful

As I work - on writing standards, writing code, studying centralised and decentralised systems - and as I read the news and watch events unfold around me, bubbling away under the surface is always an unease. What if we're making it worse? We all have blind spots, limited experiences. And especially so since many of us working on decentralising the Web are not amongst those who would benefit most from the purported advantages. Some of us have been working (or watching) in this space for years, decades, longer than the Web. But more of us, an ever-increasing number, have not. We are privileged, we are nerds, most of us don't have all that much experience, and we do not know best. We've jumped on this decentralisation thing as a solution to lots of global problems.

Towards the end of last week, Tantek prompted me to actually articulate some of what was previously just subconscious discomfort. How are the decentralised technologies we're working on going to make people more vulnerable?

Smaller attack surfaces: Large centralised systems have robust network architectures; lots of money and expertise to keep things running even if under attack (except when someone uses all of the Web-enabled kettles to DDOS them, but that aside). Many decentralised architectures imagine smaller 'pods' which federate. It's possible many of these servers will be run by volunteers, hobbyists, or small/poor organisations, and could be easily knocked over and kept down by malicious actors.

Quieter takedowns: We want it to be easier for small communities, perhaps vulnerable minorities, to create safe spaces in their own corner of the Web, and to be able to keep out those who jeopardise that. If these communities are 'disappeared' (perhaps made easier by the previous point) the rest of the Web might not notice until it's too late.

Illusion of control: We promote decentralisation as a way to control who has access to your personal/social data, and to be able to move it somewhere else if you want. But a key part of decentralisation is federation, or enabling access to your data by other systems, i.e. so that you and your friends can use different applications for the same thing, without that getting in the way of your interactions. This involves open data formats and standard APIs and likely complex access control setups. Most people tell me they can't get a handle on their Facebook privacy settings, and these are for a single unified system. Just because you could move your data to a different service, doesn't mean it's safe where it is.

Illusion of control 2: Normies look at me like I'm nuts when they find out I share more about myself on my personal website than they do on social media. I tell them I know exactly what I'm sharing, rather than having it slurped up by algorithms which monitor everything they click or hover over. My explicit sharing is greater, but my implicit sharing is reduced. Or so I think. Related to the previous point, my data is all public and nicely marked up to be machine readable. The confidence I have about the fact that I have to jump through inconvenient hoops of my own making to get it online is dangerous. If social media has normalised dangerous oversharing, and the general populace is starting to clock the 'dangerous' part, then decentralised social media runs the risk of convincing people their oversharing is 'safe' again, setting us back a decade.

The filter bubble: The easier we make it for people to avoid abuse online (just imagine for half a second that the decentralisation efforts are even close to solving this, k?), the easier we make it for people to filter out diverse points of view. The first thing I noticed when Twitter introduced its recent phrase filtering thing was a bunch of privileged liberals screaming about the filter bubble and completely missing the point. But anyway. If this is an either/or we're in trouble.

This is doubtless just the beginning of a very long list, and there are others thinking/writing about this as well. I'll update this post to list other articles as I come across them.

Why are three positive conference reviews more reassuring than one thousand facebook comments?

How do I know the reviewers are experts? I don't know who they are. I know the context and background of my facebook friends, I can critically interpret their replies. The reviewers are anonymous. They're people who had some time, just barely, probably during a commute or on the toilet. Or who owed someone a favour.

I've reviewed papers, usually having been asked to by the person who was supposed to be reviewing, about topics I barely know. I think: "it's okay, the other reviewers will swing it if I'm wrong."

Digital Memories

For about fourteen years I kept detailed daily journals. I couldn't sleep unless I'd written. And I couldn't just write "Went to school, Polly and Laura fell out again." I had to write every detail. Food, friends, lessons, homework, learning to make websites, gerbils, cats (those are all the things associated with being a normal teenager, right?), and all the thoughts and feelings associated with everything I'd encountered that day. It was habit, and compulsion. It was also a burden. When I went away, I had to plan to carry a notebook with me, and keep it somewhere other people wouldn't get hold of it. I had to make time before bed to write, usually for a full hour. If I had to skip a night, I had to double time the next night, and felt the pressure of holding onto the memories I hadn't managed to record for an extra day.

This continued during my undergrad. But was harder. Entries got shorter, even though more was happening. I'd pull spontaneous movie, coursework or just hanging-out all-nighters with friends who lived across town. Even if I happened to have the journal on me for some reason, ducking into another room for an hour to write would have been.. pretty weird. Sometimes I'd miss a few days, then spend a full afternoon, day or even weekend catching up. And while I was doing that... I was missing out on other things.

I have a heavy box containing dozens of notebooks. This is also a pain when I move house (as noted by people who have moved it for me). After living things, it's what I'd save in the event of a fire. It's really heavy though.

After moving to Edinburgh, I found less and less time to write. I'd do catch up spurts, but they became less frequent. It became exhausting. Eventually they seemed fruitless because I couldn't remember nearly as much detail as I wanted to. At first this was terrifying. I've read back journals from years ago, and don't remember thinking or feeling a lot of what I wrote about. If I hadn't written it down at the time... it'd be gone. So by not writing in the present, I'm depriving my future self of a lot. But eventually not-writing normalised, and the burden started to lift. When it was no longer a compulsion, it was no longer painful when I missed it.

My social media use was happening in fits and starts around then too. Posting to facebook probably helped to ease into worrying about journaling less, as I was recording my day-to-day elsewhere. But when I realised banal status updates had become a compulsion too, coupled with some people taking facebook activity far too seriously, I deleted everything from there (post by agonising post). I didn't think to export it beforehand. I was using twitter, but not for personal stuff.

We pour a lot of ourselves into digital archives, one way or another. But how do we get it out again? Why do we need to? Most of us post to silos like facebook and twitter, providing fuel for the corporate advertising machine and seeing only fleeting value for ourselves. My skepticism of this restricted how I used social media. Then I got into this decentralised social web malarkey, and my journaling addiction started to re-stir. Now I'm posting a lot again. Some of it makes it to twitter, but I post more solely on here, rhiaro.co.uk. It's not the journal, daily records material of old, but shorter, realtime, in the moment posts that in aggregate provide a record of the day (particularly as I pull in more quantified-self activity tracking type stuff).

And the burden is back.

I noticed it when I started posting less again this past month. I was tracking every single thing I ate for long enough that it became habit; tracking when I left or arrived at home, office, meetings, events, social occasions, and more. I stopped tracking both food and locations because some bugs have materialised in my code and I haven't got around to fixing them yet. It was agonising to start with. But as I still haven't made time to find out what the problems are, I realise I don't miss the stress of trying to log food or check-in on a poor mobile connection or worse, scribbling notes on paper to back-fill later when I can't do it in the moment. Once I started logging something, it didn't seem worth doing unless it was complete, unless I logged everything. Things missing made me anxious. Wtf?

Not everyone has this problem. Some people (so I've heard) post to social media because it satisfies an immediate need, and what happens to it after that doesn't matter. Many people aren't interested in the Own Your Data mantra, because this 'data' is ephemeral, not archival. Some people are totally happy to drop their thoughts and feelings into the black hole of the social media machine, never expecting to get them back. How freeing that must be.

I've never been blackout drunk and I have never understood the appeal; I'm kind of terrified of the idea that in a few years' time, when I want to look back over my years in Edinburgh, there are going to be enormous gaps. Does that mean the memories I have retained are the only ones that were worth keeping? Or am I poorer because the things I neglected to record are gone forever?

Collectively, the Web-privileged world is recording an insane amount of unstructured personal data; so many fleeting thoughts and feelings and desires and needs. Where did this come from? Didn't we used to manage fine without? If it's a sign of progress, maybe we should be using it to progress. Whether it's stored under the originator's control or surrendered to a corporation, all together we have a detailed picture of what it is to be (certain types of) human. But nobody is using this for anything other than personalisation, recommendation, profiling... selling more crap to people. Except for the academics doing cool disaster-relief stuff with realtime twitter data: props for that.

Imagine if we could tap into the historical archive and use it to understand different perspectives, to boost empathy and tolerance. To create a concrete, collective ancestral memory that helps us build a better future for everyone.

If we're not going to do that, we should probably focus on living in the moment a bit more. I feel like that is healthier, but it goes against my impulse to (at least try to) record and permanently store everything.

Given all this, you'd think I'd have a better strategy for automatically backing up my database.

WebSci2015

ACM Web Science 2015 (Oxford, UK) was the complete opposite of WWW2015. Almost to a point of saturation... but I shouldn't complain :)

With a heavy weighting of social science (or social science influenced) papers, despite still having a majority of computer scientists in the audience, most sessions and panels were about ethics, privacy, digital rights, inclusivity and a 'pro-human' Web. The focus was overwhelmingly social media, with a side of Internet of Things, plus a weird smattering of robot ethics. Usually a focus on social media means lots of SNA, detecting content trends, user profiling and other things I hate, but instead there was substantial discussion of people as people, rather than users (or targets), which was refreshing. There was also plenty of work on sites other than Twitter and Facebook, such as individual blogs, social gaming sites, and specialist and smaller communities. Though I'm not sure if we figured out any solutions to the personal data crisis.

That's not to say it was all social! There were technical talks too, including a few papers on linked data.

Max presented our paper The Many Dimensions of Lying Online during the first Online Social Behaviours session, and together with the other three papers presented it formed a great narrative about the complex and nuanced nature of online identities, and how they both affect and are affected by technical systems.

I'm not going to write up content here. Instead, see all my posts from during the conference at /tag/websci15.

Other things. The structure of the conference felt unusual (in a good way); parallel paper sessions were dispersed amongst panels and workshops. All sessions were very interdisciplinary. The catering was really good. There were fewer people than I expected.

WWW2015 reflection

I enjoyed WWW2015 because I got to hang out with emax and Dave and other SOCIAM people and talk about decentralisation, and eat lots of vegan gelato, and was still on a travelling and hacking high from the two weeks prior.

But content-wise, what I saw was disappointing at best, largely depressing. I unfortunately missed most of the SOCM workshop, which I'm sure would have catered to my tastes a lot more, as my presence was mandated in Microposts2015 on the same day. The Microposts workshops have a bias towards SNA of Twitter, in which I'm tangentially interested, but there's only so much SNA I can take and I think I took it all in 2013. The social science 'track' (two papers) was most interesting. I much prefer it when social network users are treated as complex beings rather than as dots and lines.

I flitted between tracks on security and privacy, and web science for a couple of days. I deliberately avoided anything to do with social networks, which turned out to be damned difficult. Web science can be a bit big-data-analysis heavy, but often has some interesting human-social (as opposed to social-data) stuff to pique my interest. I went to security and privacy sessions because they're almost guaranteed to not have any social networks, and are usually either scandalous or practical, or both. They're often over my head, but they also often contain concepts directly relevant to my day-to-day that I wouldn't pick up on otherwise.

I almost cried at the programme for the last couple of days. Literally every session was about social data mining, bulk analysis of social data for tracking and profiling social network users, targeting advertisements, and generally selling people stuff. There was even a track called 'Monetization'. I figured the safest bet would be the W3C track which promised awesome WebRTC demos, which Claudio from Telecom Italia delivered, but then he reminded me (see the socialwg IRC minutes) that their interest in this stems from wanting to do live product placement in streaming video based on people's interests from social media and ohmygodicannoteven. I tweeted in anguish for a while, then dret summed it up pretty well:

"generally speaking, i'd like to see more "how to make the web work better" at #www2015, and less 'how to make more money with the web'." - @dret

In desperation (I'm on a linked data burnout currently) I dropped by the RDF session but it was ten similar-but-different-mine-is-better-i-promise ways of doing entity recognition - or if it was anything else it was so far divorced from practical application - and I just don't care.

In a last ditch attempt to learn something interesting, I went to the Industry Knowledge Graphs pecha kucha session. Google, Microsoft, Elsevier and Tagasauris talked about how great they are at absorbing all the data. And.. getting the crowd to curate it nicely... and... not giving any of it back... oh. Because if someone uses it and it's wrong they might get sued? Sure. Whatever. Oh and then Lora Aroyo broke my heart by describing how to make anything and everything a 'shoppable experience' and Dave and I bolted to join Max in a coffee shop.

I had a great ciocolatte with almond milk, then we went for pre-dinner spaghetti in a rave cafe and talked about decentralisation and I recovered.

Overall it was a pretty productive couple of days, because I wrote this post about ActivityStreams and did some more AS2.0 experiments, fixed some bugs in my micropub endpoint and templates, and tweaked my CSS and added my /travel page. And evangelised the Social Web WG to Max a bit. Maybe I couldn't have done that if the conference had captured my interest, who knows.

And I discovered polenta toast with porcini mushrooms, and ate a lot of different flavours of vegan gelato. That's a net win.

Twitter to Linked Data

Or perhaps more generically, Status to Linked Data. But Twitter is the only thing I update with statuses, so I'll start with that.

I want to be able to:

Parse the whole backlog of my tweets and turn them into Linked Data along the same schema as my blog posts will be.

Tag all tweets with the same tagset that my blog posts use, automatically from hashtags or keywords, followed by a manual second pass. The mapping of hashtags and keywords to tags for the 'automatic' tagging will be done manually, I should think. This, then, needs a nice interface.
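
A first pass at the 'automatic' half might look something like the sketch below. This is only an illustration: the mapping entries, the tag names and the `suggest_tags` function are all made up for the example, and the keyword matching is a crude substring check, which is exactly why the manual second pass would still be needed.

```python
import re

# Hypothetical hand-maintained mapping from hashtags/keywords to blog tags.
# Every entry here is an invented example, not the real tagset.
TAG_MAP = {
    "#www2015": "www2015",
    "#websci15": "websci15",
    "linked data": "linked-data",
    "decentralisation": "decentralisation",
}

HASHTAG_RE = re.compile(r"#\w+")


def suggest_tags(text, tag_map=TAG_MAP):
    """Return the sorted list of blog tags suggested for one status."""
    lowered = text.lower()
    tags = set()
    # Hashtags: exact (case-insensitive) match against the mapping.
    for hashtag in HASHTAG_RE.findall(lowered):
        if hashtag in tag_map:
            tags.add(tag_map[hashtag])
    # Keywords: plain substring match, so false positives are possible.
    for key, tag in tag_map.items():
        if not key.startswith("#") and key in lowered:
            tags.add(tag)
    return sorted(tags)


print(suggest_tags("More Linked Data burnout at #WWW2015"))
```

Anything the mapping doesn't know about comes back untagged, so those statuses would be the obvious queue for the manual interface.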

#SSSW2013: Social semantics and serendipity

We started work on the serendipity project before breakfast today, although I
didn't make it down as early as some of my teammates.

To start the day, Fabio Ciravegna talked about some really exciting practical
applications of monitoring and analysing social media streams. It's
particularly interesting during emergencies, or large events where problems
might occur. The people on the ground make the perfect sensors if you can
work out the differences between people who are saying something useful and
who aren't; people who are really there, and people who are speculating or
asking about the situation. A main problem has been that people tweet crap.
They were trying to monitor a house fire, but so many people were tweeting
lyrics from Adele's various singles at the time, which all apparently contain
references to fire, it was almost impossible.

They also put (or tapped into existing) sensors in people's cars to monitor
driving patterns with the aim of more fairly charging for car insurance. I
told my Mum about this the other day, and she was pretty alarmed by the idea.
Which made me wonder how they'll get mass adoption, if it's going to go
anywhere.

Fabio did have some interesting things to say about using all this data
ethically though, and never working for someone who is going to take that away
from you. But in case the 'bad guys' do find out about all this data you have
about people, keep a magnet handy.

This was followed by a hands-on session where we got to mess with a mini
version of the twitter topic monitoring system that Fabio's team use at large
events, to try to answer questions about the Tour de France only by
manipulating the incoming social media streams and following only links which
came through that.

Spanish omelette sandwiches were an amazing outdoor leisurely lunch. We
headed to the pool down the road and chilled out there for a couple of hours.
Us tough British folk found the water pleasantly tepid, whilst all those wimpy
Europeans and Latin Americans shivered on the grass. They'd made such a fuss
in advance about how cold the pool was going to be.

We regrouped that afternoon to work on Project Cusack, creating a slide deck
of pictures from Serendipity. I don't like slides with too much to read on,
so I enforced this. The imagery from the movie will be lost on most people,
but we have at least managed to choose pictures of John Cusack with
appropriate expressions for each part of the presentation. We worked outside
in the forest, because Oscar's 3G was faster than the residence wifi.

We also brainstormed for the required short film, which we only just
discovered doesn't have to be about our project.

We returned to the residence to find everyone eating ham and cheese, and
attempted to get some shots for our film, but other people were unwilling to
participate.

That evening we ate tasty vegetable soup, weird (in a bad way) pasta in a
creamy onion sauce, and chocolatey ice cream cake. The tutors spontaneously
organised a game where students had to arrange the tutors by age, which was
funny. Someone suggested the tutors ought to play it with the students.
Obviously there were too many students, but they elected to find the youngest
student, and that turned out to be me.

#SSSW2013: Practical semantics and human nature

Harith Alani talked about using semantics to solve problems around evaluating
the success of social media use in business. The SIOC ontology is widely used
to describe online community information. It's not as simple as measuring
someone's engagement with a brand's online presence - people are 'likeaholics'
on Facebook, so you have to look at someone's whole behaviour profile to judge
whether their like means anything or not. It's no good just aggregating your
data and spewing out numbers - you have to browse the data and try to
understand where it came from.

He mentioned how little work has been done in classifying community types.
Most of the work that has been done seems to be with social networks internal
to an organisation. A bottom-up approach to community analysis can handle
emergent behaviours and cope with role changes over time. Looking at
behaviour categories and roles can help an organisation to decide who to
concentrate on supporting and how in order to sustain the community. The
results they have seen so far suggest that a stable mix of the different types
of behaviours is needed to increase activities in forums - but they don't
know what causes what. They're reaching a point where they can use their
behaviour analysis to guess what's going to happen to a community: how long it
will last, how fast it will grow, how many replies a certain type of post is
likely to get, etc.

Next they want to be able to classify community types, and be able to look at
activities within a community over a period of time and automatically discover
what kind of community it is; it might be something different than what it was
set up for.

They created an alternative Maslow's Hierarchy of Needs to correspond with
activities seen on forums, and found that most people are happy to stay at the
lower levels of the hierarchy. For example, join a community, lurk for a bit,
ask one question and leave. Not everyone wants or needs to be a power user.

Papers are being written that find patterns in individual datasets for a
particular community in a particular context. Harith and his team are getting
tired of this; they want to generalise across communities. So they took seven
datasets and looked at how the analysis features differed, comparing the
results across community types, randomness (vs. topicality) of datasets, and
similar experiments.

Upcoming work includes the Reel Lives project, in which UoE is involved.
They're taking media fragments - photos, videos, audio clips, text recorded as
audio - and creating automated compilations to tell a story.

From Tommaso Di Noia's talk, I learnt that recommender systems have a lot of
maths behind them, especially for evaluating things, and reinforced something
I already knew: I don't maths good enough to be taken seriously by most of the
Informatics world. I think I understand the principles behind the maths, but
when something is described in just maths, I have no idea what it relates to.
I'll work on this.

Real world recommender systems use a variety of approaches, including
collaborative (based on similar users' profiles); knowledge-based (domain
knowledge, no user history); item-based (similarities between items);
content-based (combination of item descriptions and profile of user interests).
Linked Open Data is used to mitigate a lack of information about entities, and
helps with recommending across multiple domains. You do have to filter the LD
you use before feeding it to your recommender system though, to avoid noise.
Notes here.
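
To make the 'collaborative' approach a bit more concrete for myself, here is a minimal sketch of user-based collaborative filtering with cosine similarity. It is not taken from Tommaso's talk: the ratings, the item names and the `cosine` and `recommend` functions are all invented for illustration, and a real system would need normalisation and far more data.

```python
from math import sqrt

# Toy ratings, made up for illustration: user -> {item: rating}.
RATINGS = {
    "ann": {"film_a": 5, "film_b": 4},
    "bob": {"film_a": 4, "film_b": 5, "film_c": 5},
    "cat": {"film_b": 1, "film_c": 2, "film_d": 5},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings=RATINGS, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                # Ratings from more similar users count for more.
                scores[item] = scores.get(item, 0.0) + sim * rating
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


print(recommend("ann"))
```

In this toy data bob's tastes overlap with ann's much more than cat's do, so bob's unseen favourite ranks first. An item-based variant would compare columns (items) instead of rows (users).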

Tommaso's talk was followed up by a hands-on session, where we got to poke
about with some of the tools he mentioned, including FRED (transforms natural
language to RDF/OWL); Tipalo (gets entity types from natural language text);
and using DBpedia to feed a recommender system.

Then we worked on our mini-projects for the afternoon. We made some progress
towards breaking down the concept of serendipity and working out what
properties we might need to represent as linked data, and how we could
observe a user and work out if/when/how they were having serendipitous
experiences without intruding too much.

In the evening we took a coach to 'nearby' historical town Segovia.
Apparently an extremely motion-sickness-inducing two and a half hour coach
journey around twisty mountain paths is 'nearby'. Fortunately I was
distracted from this horrible journey by a conversation with Lynda Hardman,
which I wish I had recorded. Lynda challenged various aspects of my PhD until
I could explain/justify them reasonably, including:

Why digital creatives? (I'm used to that one now).

What is the outcome?

Why Semantic Web for this?

She also recommended a number of resources, including theses of her recent
former students to help me with a structure for my own, and advice on
maintaining a healthy balance between thinking and doing.

Plus she used to live in Edinburgh, more or less across the road from where I
live now. Cool. Thanks Lynda! You haven't heard the last of me :)

Once we got to Segovia, we had a guided tour of the ancient Roman
architecture, interesting building façades and local legends. It was a very
good tour, but too hot to really focus. Then they took us to a restaurant for
a local speciality. I was all set to write a whole individual blog post
surveying the barbaric nature of human beings, but I didn't do it straight
away and now the passion has faded slightly, so I'll leave it at a paragraph.
Some people watched the local 'ceremony' out of morbid curiosity I imagine,
but it was the fact that so many people took so much pleasure in the idea of
violently hacking up bodies of three-week-old piglets that really bothered me.
Fortunately the surging standing crowd allowed me (and only one other) to
inconspicuously sit it out. The veggie option was tasty, but it was difficult
to really enjoy the rest of the evening whilst wondering vaguely about the
states of minds of most of the people I was sharing a table with.

Which behaviour categories you need to cater for more than others? How roles
impact activity in online community.

Consistently see that you need some sort of stable mixture of behaviours for
activities in forums to increase.

==> Don't know what's causing which.

What is a healthy community?

Use behaviour analysis to guess what's going to happen to community. Eg.

Churn rate.

User count.

Seeds/non-seeds prop (how many / if people reply to you).

Clustering.
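
A rough sketch of how two of these features could be computed. The definitions are my assumptions, not necessarily the ones used in the talk: churn rate is taken as the fraction of users active in one time window who are gone in the next, and a 'seed' is taken to be a thread-starting post that got at least one reply. The function names are hypothetical.

```python
def churn_rate(active_before, active_after):
    """Fraction of users active in the first window but not the second."""
    before = set(active_before)
    if not before:
        return 0.0
    return len(before - set(active_after)) / len(before)


def seed_proportion(reply_counts):
    """Fraction of thread-starting posts that received at least one reply.

    reply_counts: one reply count per thread-starting post.
    """
    if not reply_counts:
        return 0.0
    seeds = sum(1 for replies in reply_counts if replies > 0)
    return seeds / len(reply_counts)


# Four users last month, two of them missing this month.
print(churn_rate(["a", "b", "c", "d"], ["a", "b", "e"]))
# Four thread starters, two of which got replies.
print(seed_proportion([0, 3, 1, 0]))
```
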

Unexpected: the fewer focused experts in the community, the more posts
received a reply.
(But quality of answers?)

Community types (Little work in this space)

Muller, M. (CHI 2012) community types in IBM Connections:

Communities of Practice

Teams

Technical support

..

.. (see slides..)

Need an ontology and inference engine of community types.
Wants an automated process to tell you what type of community it is - it might
be something it wasn't set up for.
Then you could determine what sort of patterns you would expect to find.
No one has done this yet.

Measurements of value and satisfaction

Answers different across communities. They ran it on IBM Connections -
corporate community.

Most of this work is for managers of communities - see what's happening and
help to predict what might be coming next.

Can classify users based on Maslow's Hierarchy of Needs?
Mapping the hierarchy to social media communities.
~90% users happily staying at the lower levels of the 'needs hierarchy'.

Behaviour evolution patterns

What paths they follow over time.
eg. people who become moderators eventually.

Engagement analysis

What's the best way to write a tweet so that people care about it?
Which posts are likely to generate more attention?

Getting bored of people finding patterns in individual datasets. What can be
generalised to other communities?

So experimented with 7 datasets and looked at how results differed across:

Information, support and services reciprocated between members ( overlap with ^ ?)

Shared context (culture, language)

Can be applied to virtual and offline communities.

-> More attributes = clearer example of community.

Preece:

Social interaction

Shared purpose

Common set of expected behaviours

Computer system that facilitates and mediates communication

^^ Things in common. Whittaker's is more inclusive/broad.

So for a SW SN:

Accessible via browser

Explicit links between users

System supports creation of these links

Links are visible and browseable

COP or SN may describe a community, not necessarily. IBCN will do, and could
be a COP or SN too.

Problem of Amateur Fic. is fluctuation of archive. Personal sites go down
etc. How to find a story you remember a bit of?

IBCN is also combination of WBSN and virtual community.
Lack of incentive to use FOAF (eg. on LiveJournal etc) (Plus ignorance).
Doesn't offer anything they don't already have.
They don't use much metadata, just tons of human-readable stuff.

SW would allow:

"better integration of distributed systems"

"improved searching and filtering"

"more personalised services"

experienced users

expand options

new ways to interact

new users

ease introduction re: unwritten rules, expectations, terminologies

FOP extension to FOAF for anonymous identities.
('Fan Online Persona' - why not just 'Online Persona'?)
Consistency likely in community-based system because of advantages of
reputation etc. Identity cost.
Shared set of behaviour values, or risk losing rep.
Reputation gained by taking part. (definitive part of community).
Additionally by creating works.

foaf:document and foaf:groups allow users to give details about their own
creations and review work of others.

OntoMedia to describe content complements FOP.
Options in FOP gathered from study of metadata of works in mailing lists,
websites and groups.
Recommender system -> notification system.

Allow SNS of writers to be studied at friend level and collaboration level.