A couple of days ago, I realized it had been a while since I’d made a new viz, so I headed over to /r/datasets to see if I could find something interesting. What I ended up finding was the dataset of my dreams.

This dataset was compiled by some researchers from Denmark. It contains information on over 68k users and their question answers*. It’s pretty hefty and I’m still digging into it, but I wanted to throw something fun up here before I spend too much time falling down the rabbit hole. OkCupid is an incredibly rich source of data, as evidenced by their own data blog. Just to whet your appetite for things to come from this amazing dataset, I’ve made this exploratory viz to let you compare personality traits.

The main technology that drives OkCupid is its matching algorithm. It’s based on questions it asks you, for which you choose your own answer and how you’d like the other person to answer. These questions are all broken up into categories and are also used to generate scores for different “personality traits.” For those who are curious, here’s most of mine, minus some less safe for work ones. 😉

On that note, here’s the viz! More to come, I’m sure.

*Update: There’s been some controversy over the ethics of this dataset. The authors have since removed it from the linked website. I had already removed the user name column from the dataset because it was extraneous and I didn’t need it. I’ve now also updated my viz to not include as much potentially identifying information such as location. I don’t feel that looking at this data without that stuff is unethical, but if you have thoughts on the matter, I’d love to hear them.

Stumbled upon a pretty fantastic group of Airbnb datasets for Amsterdam, Barcelona, London, NYC, Paris, Portland, San Francisco, and Sydney. You can find them here. Looks like things are spread across a few different tables, so some join/blend action will probably be necessary. But at first glance, they look pretty robust. Enjoy!

I found this link to the Beer Institute Brewer’s Almanac through a post of excellent vizzes from Data Knight Rises. There’s definitely a lot of info here. The tables may not all connect to each other directly, which makes dashboards difficult, but the data would probably do well with story points.