TOS Word Clouds

Terms of Service: the important agreements we make with countless apps and web services but never actually read.

I decided that word clouds might be a helpful way to get a sense of what the terms of service for various social media platforms are all about. Luckily, Andreas Mueller built a word cloud module for Python and made it available on GitHub. It’s quite a nice piece of code. With it, you can build word clouds that match the shape and color of whatever image you want. (Or you can specify a width and height and have yourself a nice boxy word cloud.) The font is also customizable. The font size of each word in the cloud increases with that word’s frequency (so the more often a word is used, the larger it appears in the image).
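To make the frequency-to-size idea concrete, here is a minimal sketch in plain Python. It is not how Mueller's module sizes words internally; the function names, the sample sentence, and the linear scaling between a minimum and maximum size are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count runs of letters and apostrophes in the lowercased text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def font_sizes(counts, min_size=10, max_size=100):
    """Map each word's count linearly onto [min_size, max_size] (assumed scaling)."""
    top = max(counts.values())
    low = min(counts.values())
    span = (top - low) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in counts.items()
    }

# Hypothetical sample text, not taken from any real terms of service.
sample = "content you use content we use content"
counts = word_frequencies(sample)
sizes = font_sizes(counts)
```

With this sample, the most frequent word gets `max_size` and the words that appear once get `min_size`, which is the relationship the word cloud shows visually.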

My first attempt was with the behemoth that is Facebook. I grabbed their terms of service and saved them in a text file. I found the font that Facebook uses in its logo at Social Fonts. To shape the word cloud, I created a mask based on the Facebook favicon, like the image below.

Mask of the Facebook favicon.

The words in the word cloud will be placed in the black regions of the mask. (Actually, they will be placed in any region of the image which is not pure white.) I want the font color to match Facebook’s trademark blue (#3b5998), so I built a square image of just that color, like the one below.

#3b5998
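The mask rule and the color swatch described above can be sketched without any imaging library. This is only an illustration under stated assumptions: the helper names are hypothetical, the mask is a tiny grayscale grid where 255 means pure white, and the real images would be loaded from files rather than written out by hand.

```python
def hex_to_rgb(hex_color):
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

def drawable_region(gray_pixels, white=255):
    """Return True wherever a grayscale pixel is not pure white."""
    return [[value != white for value in row] for row in gray_pixels]

def solid_color_image(width, height, rgb):
    """Build a height-by-width grid filled with a single RGB color."""
    return [[rgb for _ in range(width)] for _ in range(height)]

facebook_blue = hex_to_rgb("#3b5998")

# Hypothetical 2x4 mask: 255 is pure white, anything else can hold words.
tiny_mask = [
    [255, 0, 0, 255],
    [255, 254, 0, 255],
]
allowed = drawable_region(tiny_mask)
swatch = solid_color_image(2, 2, facebook_blue)
```

Note that the near-white value 254 still counts as drawable, which matches the point that only pure white is excluded.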

I wanted the small words to be legible, so the actual image I built was 2000 pixels by 2000 pixels. My code (which is at the bottom of this post) produced the following image.

Facebook TOS word cloud with 2000 words.

I used a similar process for several other popular social media platforms. In each case, I grabbed the terms of service from their websites, built masks of their logos, and matched the fonts and colors as best I could.

Snapchat TOS

Tinder TOS

Twitter TOS

WhatsApp TOS

Yik Yak TOS

These are fairly in line with what you might expect. Most of these services generate revenue either by directly selling information about their users (i.e., content) to third parties or by using that information to target ads. So the Facebook word cloud seems to loudly declare, “Facebook will use content.”

What was a little surprising was that Snapchat and Yik Yak both have a higher proportion of references to arbitration than the others. Perhaps it’s because of this?

My code is below. I based it on one of the examples provided by Mueller. Since his Python module is MIT-licensed, the code below has the same license.