Category: data-and-semantic-web

If I had a little money for every person who sent me details of Tim Berners-Lee’s Solid, I would have enough to buy a cheap flight to somewhere in Europe with a budget airline.

Solid is meant to change “the current model where users have to hand over personal data to digital giants in exchange for perceived value. As we’ve all discovered, this hasn’t been in our best interests. Solid is how we evolve the web in order to restore balance – by giving every one of us complete control over data, personal or not, in a revolutionary way.”

Solid isn’t a radical new program. Instead, “Solid is a set of modular specifications, which build on, and extend the founding technology of the world wide web (HTTP, REST, HTML). They are 100% backwards compatible with the existing web.”

Many have asked how Solid compares with Databox, and I would certainly say Databox (regardless of its name) isn’t a place to hold all your personal data. You could use it like that but it’s more of a privacy-aware data processing platform/unit. I remember the first time I heard about Vendor Relationship Management (VRM); it was clear to me how powerful this could be for many things. But then again I also identified data portability as something essential while most people just didn’t see the point.

Everything will live or die not just by developer support, privacy controls, security and cleverness, but by user demand… and it feels like personal data stores are still a while off in most people’s imagination.

Maybe once enough people personally experience the rough side of personal data breaches it may change?

In July 2018, the sales engagement startup Apollo left a database containing billions of data points publicly exposed without a password. The data was discovered by security researcher Vinny Troia who subsequently sent a subset of the data containing 126 million unique email addresses to Have I Been Pwned. The data left exposed by Apollo was used in their “revenue acceleration platform” and included personal information such as names and email addresses as well as professional information including places of employment, the roles people hold and where they’re located. Apollo stressed that the exposed data did not include sensitive information such as passwords, social security numbers or financial data.

Till this is an everyday occurrence, most people will just carry on and not care? Maybe there’s even a point where it should be part of the furniture of the web, like the new grey?

The first time was from Gregor Žavcer at MyData 2018 in Helsinki. I remember when he started by saying that if you have no control over your identity you are but a slave (paraphrased, of course). There was a bit of awe from the audience, including myself. Now to be fair he justified everything he said, but I didn’t make a note of the references he made, as he was moving quite quickly. I did note down something about no autonomy is data without self.

This looks incredible as we shift closer to the Dweb (I’m thinking there was web 1.0, then web 2.0 and now Dweb, as web 3.0/semantic web didn’t quite take root). There are many questions, including service/application support and the difficulty of getting one. This is certainly where I agree with Aral about the design of this all: the advantages could be so great, but if it takes extremely good technical knowledge to get one, then it’s going to be stuck on the ground for a long time, regardless of the critical advantages.

It’s over 14 years since the dataportability project was founded by a bunch of well-meaning people, including myself. It was a challenging time with vendor lock-in, walled gardens and social guilt trips; to be honest, little changed till very recently with GDPR.

Data export was good, but user-controlled data transfer is something special and one of the dreams of the data portability project. Service to service, not because there was a special agreement set up between the services but because you choose to move of your own free will; it makes so much sense.

In 2007, a small group of engineers in our Chicago office formed the Data Liberation Front, a team that believed consumers should have better tools to put their data where they want, when they want, and even move it to a different service. This idea, called “data portability,” gives people greater control of their information, and pushes us to develop great products because we know they can pack up and leave at any time.

In 2011, we launched Takeout, a new way for Google users to download or transfer a copy of the data they store or create in a variety of industry-standard formats. Since then, we’ve continued to invest in Takeout—we now call it Download Your Data—and today, our users can download a machine-readable copy of the data they have stored in 50+ Google products, with more on the way.

Now, we’re taking our commitment to portability a step further. In tandem with Microsoft, Twitter, and Facebook we’re announcing the Data Transfer Project, an open source initiative dedicated to developing tools that will enable consumers to transfer their data directly from one service to another, without needing to download and re-upload it. Download Your Data users can already do this; they can transfer their information directly to their Dropbox, Box, MS OneDrive, and Google Drive accounts today. With this project, the development of which we mentioned in our blog post about preparations for the GDPR, we’re looking forward to working with companies across the industry to bring this type of functionality to individuals across the web.

However! The devil is in the data, or rather the lack of it. As the EFF point out, there’s no exchange of tracking data, the real crown jewels. The transfer tool is good, but if the services don’t even share the data, then what’s the point?

Before I headed on holiday, I got a message from POF and then OkCupid a day later, saying they need the request to come from the email address which is on the account. Fair enough, so I forwarded each email to that address and then replied, to both myself and them, from that account.

A few days later I got emails, first from POF and then OkCupid.

You have recently requested a copy of your PlentyofFish (“POF”) personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your POF account. The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from POF, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used POF. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like POF.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.

You have recently requested a copy of your OkCupid personal data, and we’re happy to report that we have now verified your identity.

We are attaching a copy of your personal data contained in or associated with your OkCupid account. The password to access the personal data will be sent in a separate email.

By downloading this data, you consent to the extraction of your data from OkCupid, and assume all risk and liability for such downloaded data. We encourage you to keep it secure and take precautions when storing or sharing it.

The information contained in this archive may vary depending on the way you have used OkCupid. In general, this information includes content and photos you have provided us, whether directly or through your social media accounts, messages you have sent and other data you would expect to see from a social media service like OkCupid.

Please note that there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

Sincerely,

OkCupid Privacy Team

So on my train journey from Stockholm to Copenhagen, I had a look inside the Zip files shared with me. They are quite different; it’d be interesting to see what others will do.

Forrester, I – POF Records.zip

UserData.json | 6.2 kb

UserData.pdf | 40.5 kb

Profile_7.jpg | 30.1 kb

Profile_6.jpg | 25.0 kb

Profile_5.jpg | 17.4 kb

Profile_4.jpg | 18.8 kb

Profile_3.jpg | 26.6 kb

Profile_2.jpg | 11.7 kb

Profile_1.jpg | 30.7 kb

OkCupid_Records_-Forrester__I.zip

Ian Forrester_JSN.txt | 3.8 mb

Ian Forrester_html.html | 6.6 mb

As you can see they are quite different; interestingly there are no photos in the OkCupid data dump, not even the ones I shared as part of my profile. In POF the PDF is a copy of the JSON file, which is silly really.

So the JSON files are the most interesting parts…

Plenty of Fish

POF don’t have much interesting data, basically a copy of my profile data in JSON including Firstvisit, FirstvisitA, etc. up to FirstvisitE, complete with my IP address. I can also confirm I started my profile on 2012-01-25.

Then there is my BasicSearchData and AdvancedSearchData, which include the usual stuff plus when I LastSearch’ed and from which IP address.

Nothing else… no messages
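If you want to poke around your own export, a rough Python sketch like this would do for a first look. It assumes the file parses as a single JSON object; the sample values and the IpAddress key are made up for illustration, and only the other field names come from my dump.

```python
import json
from typing import Any


def summarise_export(data: dict[str, Any]) -> dict[str, str]:
    """Describe each top-level field of a parsed JSON export."""
    summary = {}
    for key, value in data.items():
        if isinstance(value, (list, dict)):
            # Containers: report the type and how many entries they hold.
            summary[key] = f"{type(value).__name__} with {len(value)} items"
        else:
            summary[key] = type(value).__name__
    return summary


# Invented sample shaped like the fields described above; not real data.
sample = json.loads("""
{
  "Firstvisit": "2012-01-25",
  "BasicSearchData": {"LastSearch": "2018-06-01", "IpAddress": "203.0.113.7"},
  "AdvancedSearchData": {"LastSearch": "2018-05-20", "IpAddress": "203.0.113.7"}
}
""")

for field_name, description in summarise_export(sample).items():
    print(f"{field_name}: {description}")
```

For a real file you would swap the sample for `json.load(open("UserData.json"))` and see what turns up.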

OkCupid

OkCupid has a ton more useful information in its JSON. Some interesting parts: I have logged into OkCupid a total of 24157 times! My status is Active? My job is Technology? The geolocation_history is pretty spot on and the login_history goes from July 2007 to the current year, complete with IP and time.
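To get a feel for a login history like that, you could count entries per year. This is only a sketch: I am assuming login_history is a list of objects, each with an ISO 8601 "time" string and an "ip" field, and the real export may well be laid out differently. The entries below are invented.

```python
from collections import Counter
from datetime import datetime


def logins_per_year(login_history: list[dict]) -> Counter:
    """Count entries per calendar year, assuming an ISO 8601 'time' string."""
    years = Counter()
    for entry in login_history:
        years[datetime.fromisoformat(entry["time"]).year] += 1
    return years


# Invented entries; the key names "time" and "ip" are assumptions.
sample_history = [
    {"time": "2007-07-14T09:30:00", "ip": "203.0.113.7"},
    {"time": "2007-08-02T21:05:00", "ip": "203.0.113.7"},
    {"time": "2018-06-11T08:15:00", "ip": "198.51.100.23"},
]

for year, count in sorted(logins_per_year(sample_history).items()):
    print(year, count)
```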

The messages are really interesting! They decided to share one side of the messages, so only the ones you sent rather than what you received. As the messages are not like emails, you don’t get the quoted reply, just the sent message. Each item includes who it is from (me) and the time/date. There are some which are obviously an instant messenger conversation, which look odd reading them now. In those ones, there are also fields for peer, peer_joined, time and type. It’s also clear where changes have happened, for example when you used to be able to add some formatting to the message and you used to have subject lines.

Some which stick out include, Allergic to smoking?, insomnia, ENTP and where next, The Future somewhat answered, So lazy you’ve only done 40 something questions, Dyslexia is an advantage, But would you lie in return? No bad jokes, gotland and further a field, Ok obvious question, etc.

Of course the images are publicly available via the URL, so I could pull them all down with a quick wget/curl. Not sure what to make of this idea of making them public. Security through obscurity anyone?

As long as you can see the picture above, OkCupid is making my profile pictures public

Now the image strings seem to be random, but I don’t think this is a good idea at all! Wondering how it sits with GDPR too, and also wondering if they will remove them after a period of time. Hence if the image above is broken, then you know what happened.

Then we are on to the purchases section. It details when I once tried an A-list subscription and when I cancelled it. How I paid (PayPal), how much, address, date, etc… It’s funny reading about when I cancelled it…

OkCupid has received your recent request for a copy of the personal data we hold about you.

For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.

In order for us to verify your identity, we kindly ask you to:

1. Respond to this email from the email address associated with your OkCupid account and provide us the username of your OkCupid account.

2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.

We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.

Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on OkCupid, which are not provided out of concern for the privacy of the senders.

PlentyofFish (“POF”) has received your recent request for a copy of the personal data we hold about you.

For your protection and the protection of all of our users, we cannot release any personal data without first obtaining proof of identity.

In order for us to verify your identity, we kindly ask you to:

1. Respond to this email from the email address associated with your POF account and provide us the username of your POF account.

2. In your response to this email, please include a copy of a government-issued ID document such as your passport or driving license. Also, we ask you to please cover up any personal information other than your name, photo and date of birth from the document as that is the only information we need.

We may require further verification of your identity, for example, if the materials you provide us do not establish your identity as being linked to the account in question.

Please note that if you previously closed your account, your data may be unavailable for extraction as we proceed to its deletion or anonymization in accordance with our privacy policy. Even if data is still available for extraction, there is some information we cannot release to you including information that would likely reveal personal information about other users. Those notably include messages you received on POF, which are not provided out of concern for the privacy of the senders.

Best,

POF Privacy Team

Well I guess they are being careful at least but will be interested to see what other questions they ask me.

…Duportail eventually got some of the rest of her data, but only on a voluntary basis, and only after she identified herself as a journalist. Her non-journalist friends who followed suit never got responses to similar requests.

Finally armed with the 800 pages she had clawed back from Tinder, Duportail wrote a story reflecting on her own relationship with her data, and the myopic view Tinder had of her love life. I feel her story helps bridge the chasm between those with information stored in the database and the architects behind it, providing much needed neutral common ground to democratically discuss power distributions in the digital economy.

Given the popularity of her story, and my overflowing inbox, I would say many agree. And indeed, you should expect more similar stories to be unearthed in the future because of the upcoming General Data Protection Regulation (GDPR). From May 2018, the new European-level regulation will come into force, claiming wider applicability – including on US-based companies, such as Tinder, processing the personal data of Europeans – and harmonising data protection and enforcement by “levelling up” protections for all European residents.

…beyond the much older right of access, the true revolution of GDPR will come in the form of a new right for all European citizens: the right to portability.

It seems like such a small thing but actually it has the potential to be extremely disruptive. Heck, it’s one of the things I wanted back in early 2011. Imagine all those new services which could act like brokers and enable choice! It could be standard to have the ability to export and import rich data sets like Attention Profiling Mark-up Language (APML).

I’m back at the Quantified Self conference and it’s been a few years since I was last here, due to scheduling and other conflicts. It’s actually been a while since I talked about the Quantified Self, mainly because I feel it’s so mainstream now that few people even know what it is, although they use things like Strava, Fitbits, etc.

With home automation tools, it is now possible for your personal data to influence your environment. Soon, your personal data could be used to influence how a movie is shown to you! Let’s talk about the implications and ethics of data being used this way.

It’s basically centered around the notion that our presence affects the world around us, directly linking Perceptive Media and the Quantified Self together. Of course I’m hoping to tease out some of the complexity of data ethics with people who fully understand this and have skin in the game.

Storj is an open-source, decentralized, cloud storage platform. It is based on the cryptocurrency Bitcoin’s (BTC) blockchain technology and peer-to-peer protocols. The Storj network uses its own cryptocurrency, Storjcoin X (SJCX), while its front-end software supports the use of other digital currencies such as Bitcoin and more traditional forms of payment like the dollar. Unlike traditional cloud storage providers, Storj keeps data spread across a decentralized network eliminating the problem of having a single point of failure. It also encrypts all data making it impossible for anyone, including Storj, to snoop on users’ files without having a user’s private encryption key. In return for offering storage space to the network, users are paid cryptocurrency.

Well this is Storj, and it’s frankly quite an amazing concept whose time has come.

This is a very attractive setup for someone like me with many terabytes of storage and hyperfast broadband. Unlike the risks of running a Tor exit node, everything is strongly encrypted and the host has zero knowledge of what’s being stored or transferred.
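To make the idea a bit more concrete, here is a toy Python sketch of spreading a file across several hosts. It is not Storj's actual protocol: I am assuming the blob has already been encrypted on my own machine (that step is left out), the host names are made up, and the round-robin placement is simply the easiest thing to show.

```python
import hashlib


def split_into_shards(blob: bytes, shard_size: int) -> list:
    """Cut an (already encrypted) blob into fixed-size pieces."""
    return [blob[i:i + shard_size] for i in range(0, len(blob), shard_size)]


def assign_to_hosts(shards: list, hosts: list) -> list:
    """Pair each shard with a host (round-robin) and its SHA-256 fingerprint."""
    return [
        {"host": hosts[i % len(hosts)], "sha256": hashlib.sha256(s).hexdigest(), "data": s}
        for i, s in enumerate(shards)
    ]


def reassemble(placements: list) -> bytes:
    """Check every fingerprint, then stitch the shards back together."""
    for p in placements:
        if hashlib.sha256(p["data"]).hexdigest() != p["sha256"]:
            raise ValueError(f"shard held by {p['host']} failed its integrity check")
    return b"".join(p["data"] for p in placements)


# Stand-in for data that was encrypted before it ever left my machine.
ciphertext = b"pretend this is already encrypted on my machine"
placements = assign_to_hosts(
    split_into_shards(ciphertext, 16), ["host-a", "host-b", "host-c"]
)
print(reassemble(placements) == ciphertext)
```

Each host only ever sees an opaque slice, and the fingerprints let me notice if a slice comes back altered.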

You could write tools and editors to make the recipes have everything needed to fit with the cook’s skill level, ingredients, time, allergies, preferences, party size, etc… I mean, who wouldn’t want to describe every aspect of their special dish? (I’m avoiding the copyright/licensing questions for now.)

I read about the Memento project a while ago but, now that the W3C is using it, it has recently become a reality.

The Memento protocol is a straightforward extension of HTTP that adds a time dimension to the Web. It supports integrating live web resources, resources in versioning systems, and archived resources in web archives into an interoperable, distributed, machine-accessible versioning system for the entire web. W3C finds Memento work with online reversion history extremely useful for the Web in general and practical application on its own standards to be able to illustrate how they evolve over time
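As far as I understand the protocol (RFC 7089), the time dimension mostly comes down to two HTTP headers: the client sends Accept-Datetime to a TimeGate, and the archived copy comes back carrying Memento-Datetime. Here is a small Python sketch of building and reading those headers; it makes no network calls and the function names are my own.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Build the request header asking for a resource as it was at `when`.

    `when` should be timezone-aware; it is converted to GMT as HTTP dates require.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


def memento_datetime(response_headers: dict) -> datetime:
    """Read the capture time of the archived copy from the response headers."""
    return parsedate_to_datetime(response_headers["Memento-Datetime"])


headers = accept_datetime_headers(datetime(2015, 1, 1, tzinfo=timezone.utc))
print(headers)

# A made-up response header, just to show the round trip.
print(memento_datetime({"Memento-Datetime": "Thu, 01 Jan 2015 00:00:00 GMT"}))
```

The headers dict can be handed to whatever HTTP client you like, pointed at a TimeGate for the page you care about.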

I was trying to find examples of what I meant but it’s very difficult googling for them as they get lost in a sea of other stuff, some of it very weird.

There was a period when a whole bunch of sites with domain names like youshoulddatejo.com, smartandhandsomeian.com and samwantstodateyou.com etc… (not real sites of course) were all the rage for a short while; they would pop up now and then. These people, without knowing it, could have changed the dating field. They all seemed to contain similar elements and it wouldn’t take long for someone or myself to modify the hResume microformat into a hDating microformat (I’m not going to talk about Microdata or RDFa as it’s outside the scope of this post, but yes to both). Semantically rich data published on the web as a way to bring a distribution model to online dating.

Steven was talking about the advantages of machine-readable Web pages and his point knocks right at the door of the walled gardens of the social networks. Swap the social networks of Facebook, Instagram, etc for Match.com, eHarmony, OkCupid, etc’s walled gardens… and you’ve got the same problem and the same solutions?

But imagine if profiles were part of the public internet? When I say public, I mean not hidden away behind a walled garden (hidden/private web). Because really what are you paying for, if you are paying at all?

I can hear you panic or even laugh… Here are some questions which might be crossing your mind

I don’t want my profile to be public!
This is fine, I understand some would rather not be so open about their status. It doesn’t have to be connected with the rest of your online profiles by the way (this is down to you). It doesn’t necessarily need your name or even a public photo of you (there are many ways to verify someone without such information, think about what PGP, GPG, escrow services, eBay, Airbnb, etc do). Also, like FOAF, you can even hash or encrypt parts to avoid spam, catfishers, stalkers, etc. Maybe hide parts of your dating identity till it’s required. There are endless possibilities, which I haven’t even explored.

How do I message or email someone, and what happens if things go south?
South meaning things start breaking up or you want to stop them messaging you. This is a partly solved problem. There’s no need to use your real email address. Services can step in and provide email or instant messaging solutions which expire or forward on transparently. It could also be done with a standard protocol and encrypted for further privacy. Off the Record already does this; for goodness’ sake let’s not build new protocols (badly or jokey) to do already solved stuff! (Yes, this is what most dating sites are doing now.)

How do I trust what I am seeing or reading?
The same is true of most dating sites now: how do you know the picture isn’t a catfish, or that they really are the body shape they say or show? How do you know the picture isn’t from 10 years ago? All the dating site/service is really offering you is access to single people (not that this is always true of course).
This is where the idea of a blockchain for online dating could come in quite useful, to verify with reputation, but if you don’t trust the technology, you can opt for something else… or even build your own! You only have to look at the people who have hacked OkCupid (Amy Webb and Chris McKinlay). Imagine what they could do if not restricted to the walled garden, and the systems they could write for the rest of us.

But it’s easier to pay the money and sit safely within the closed garden?
Safely…? Total illusion. But yes, it’s easier, though you are limited by how much you are willing to pay. The open way, you can have access to many more profiles, better ways to filter them and theoretically better solutions which you can share with friends.
This way also puts more emphasis on you to do the work, but I can imagine systems and services like WordPress, Medium, Squarespace, etc doing the heavy lifting for you.

How would I search?
You don’t think some startup will jump into this arena? If not one of the big search giants?! The beauty is if you feel one is better than the other, you can easily switch. No rubbish claims which can’t be verified. Just imagine when GoCompare/MoneySuperMarket get involved to show you the best sites to find what you seek. Or imagine crowdsourcing this all.

But dating site x’s algorithm is great
Don’t worry, there will be multiple services jumping over each other for your money, data or other things to prove they are the one you should use. Some will be highly manual, some will be heavily automated. Currently there is no urgency to fix, innovate or try something different. It’s not all bad news for dating services; they can run their magic algorithms on the public data set.

But my dating service offers X, Y and Z.
That’s nice, but have you thought about how effective X, Y and Z actually are? Are they a distraction or actually making dating life better? Regardless… there is the perfect opportunity to have an ecosystem of services blossom and offer unique services on top of the open, machine-readable profile network.

Distributed models are sustainable?

Think about the way search engines innovated on the structured data and offered better matches as a result. The important part is, if you don’t like what a certain service is doing or how they treat you, you can just move elsewhere without the fear of losing access to that person.
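Going back to the first question about not wanting a public profile: the FOAF trick I mentioned is the mbox_sha1sum property, which publishes a SHA-1 hash of your mailto: address instead of the address itself. A minimal Python sketch, with a made-up address, looks like this. Bear in mind it only hides the address from casual harvesting, since anyone who can guess the address can compute the same hash.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA-1 of the mailto: URI, in the style of FOAF's mbox_sha1sum property."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


# Hypothetical address; someone who already knows it can check it matches,
# but the published profile never shows the address in the clear.
published = mbox_sha1sum("someone@example.org")
print(published)
print(mbox_sha1sum("someone@example.org") == published)
```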

What I’m suggesting is similar but on your terms. There are other advantages, such as having access to the biggest market of daters, personalised choice, better tools than one dating site can or wants to create, and bespoke advice and guidance from people who really give a crap. This could usher in a new era in the art of matchmaking!

But it doesn’t stop there, oh I’ve just scratched the surface. I feel a lot of the endemic corruption in online dating is due to the centralised model.

Imagine if you could aggregate that profile into the legacy dating services. Almost an IFTTT recipe or Atomkeep? to update parts of your legacy profile on a schedule or by manual push.

You could allow Tinder to use one photo, OkCupid to upload 4-6 photos and a deeper description, and Match.com only your photos marked professional and the deeper description.
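The per-service rules above could be written down as plain data. Here is a sketch in Python of what I mean; none of this is a real API, the class and field names are invented, and a real version would still need something to push each view out to the service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Photo:
    url: str
    tags: set = field(default_factory=set)


@dataclass
class Profile:
    short_description: str
    deep_description: str
    photos: list


@dataclass
class SharingPolicy:
    max_photos: Optional[int] = None      # None means no limit
    required_tag: Optional[str] = None    # only share photos carrying this tag
    deep_description: bool = False        # share the longer description?


def view_for(profile: Profile, policy: SharingPolicy) -> dict:
    """Project the master profile down to what one service is allowed to see."""
    photos = [
        photo.url
        for photo in profile.photos
        if policy.required_tag is None or policy.required_tag in photo.tags
    ]
    description = (
        profile.deep_description if policy.deep_description else profile.short_description
    )
    return {"description": description, "photos": photos[: policy.max_photos]}


# The three rules from the paragraph above, expressed as policies.
POLICIES = {
    "tinder": SharingPolicy(max_photos=1),
    "okcupid": SharingPolicy(max_photos=6, deep_description=True),
    "match": SharingPolicy(required_tag="professional", deep_description=True),
}

me = Profile(
    short_description="Short and sweet.",
    deep_description="The longer version.",
    photos=[
        Photo("https://example.org/photos/1.jpg", {"professional"}),
        Photo("https://example.org/photos/2.jpg"),
        Photo("https://example.org/photos/3.jpg", {"professional"}),
    ],
)

for service, policy in POLICIES.items():
    print(service, view_for(me, policy))
```

The point of the design is that the master profile stays with me, and each service only ever gets the slice its policy allows.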

All is possible if you rethink the current setup. Unfortunately the controlling companies (Match Group currently own 27% plus of the online dating market, including OkCupid, POF, Tinder and many more) have zero interest in changing much. On top of that, daters seem quite lazy and less interested in working for dates?

As you can imagine, there isn’t much in this area but I did find fermat. It’s a p2p matching platform. I have yet to really look and see if it’s doing things how I would imagine.

I’ve just gotten a chance to play around with an early build of Now on Tap, Google’s wild new feature that, in essence, does Google searches inside apps automatically. It works like this: when you’re in an app — any app — you hold down the home button. Android then figures out what is on the screen and does a Google Now search against it. A Now search is slightly different from your usual Google search, because it brings back cards that are full of structured data and actions, not just a list of links.

When I first watched the keynote, I thought of the Tim Berners-Lee Semantic Web vision (paid pdf only now).

The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs. The effectiveness of such software agents will increase exponentially as more machine-readable Web content and automated services (including other agents) become available.

It’s not the semantic web, that’s for sure. The problem is that it’s amazing and the user experience is magical, but it’s all within Google’s own stack. This rather bothers (even) me for many of the ethics of data reasons. I’m sure app developers may be a little miffed too?

What makes Google Now’s pull away from apps even more compelling is that it was joined at I/O by a series of gentle pushes in the same direction. Google’s doing everything it can to get us all back to the web.

Now, I think the Wired piece is interesting but they are barking up the wrong tree. Google are climbing another tree somewhere else. OK, enough with the analogies, what do I mean?

If I saw Now on Tap working in the browser instead of on top of apps, I would be extremely impressed and would really be drawing solid ties to Tim Berners-Lee’s agents in the semantic web. But instead we are left with something slightly disappointing, like a parlour trick of sorts.

Don’t get me wrong, it’s impressive, but it’s not the big deal which I first thought it was. I’m sure the Chrome team are already working on ways to surface semi-structured data to Google Now, and when they do… wow!