
It has invaded our English language and turned an entire generation of potentially deep thinkers into dangerously unopinionated feelers.

I am of course talking about “I feel like,” that verbal affectation I started to notice in my social circle–and my own language use–some months ago, and have since noticed everywhere, and almost always from people under the age of 35.

You can hear it almost anywhere, whether on television, in movies, or in pretty much any casual conversation or workplace meeting. A quick Google Ngram search shows an explosion in the popularity of this phrase since 1960.

This phrase is almost never used to describe one’s feelings. It is almost exclusively used to describe an opinion. “I feel like that would look better in blue.” “I feel like the Republicans hate poor people.” “I feel like we should have a media strategy instead of a product strategy.” It is used when there is the slightest hint of disagreement, or any opinion is expressed that the speaker is uncertain about. In either case, it is used as a verbal softening: a way to have an opinion without having an opinion. “I’m not saying you’re wrong, I just feel like you’re wrong.” You can question my idea, but you can’t question my feelings.

Katie J.M. Baker over at Jezebel found that women use it more than men, although anecdotally I haven’t found that to be the case. Almost everyone I know uses it all the time, male or female. That said, she has an interesting explanation for why this verbal tic is so prevalent.

An “I feel like” preface implies that my feelings aren’t set in stone; they’re not necessarily rational or well thought-out. I strive to have faith that my opinions are worthy, but I don’t want to be the kind of person who is so convinced she has something important to say that she asserts every statement as fact, not feeling.

I have a theory that this phrase truly took hold in the feel-good, self-esteem-driven, sharing-is-caring educational environment that permeated '90s childhoods. We were always taught, in disagreement, to use personal "I" statements instead of accusatory "you" statements. For instance, if someone does something offensive to you, tell them how it made you feel, not how wrong they were. That way, we were taught, we would be better able to reach mutual understanding.

There is a legitimate point to be made for self-doubt and self-deprecation, especially regarding an opinion on an issue about which there are multiple points of view. It certainly makes the flow of conversation less confrontational and more nuanced. Ad hominems are much more difficult and arguably no longer ad hominems, for example, when you have to preface them with “I feel like,” as in: “I feel like you are an idiot.”

But what worries me about this phrase is how it has become a substitute for all disagreement, and a qualifier of any opinion. Instead of becoming more certain about ideas or beliefs we hold, we use verbal legalese to de-escalate our statements. Americans are becoming more willing to not understand their own opinions and to “feel” them out instead.

Even as someone who thinks people should say “I don’t know” a lot more often, I am troubled by the implications of this turn of phrase. For starters, there is such a thing as a right idea, and no amount of feeling is going to make up for credible evidence that an idea is right. Feeling something, instead of learning, knowing and expressing an opinion, is a cop-out: a way to avoid substantial debate to arrive at truth.

More importantly, by using “I feel like” to state opinions as well as facts, the speakers of our beloved language may very well start to lose the ability to distinguish the two. This is not an unfounded fear. In our society, we have an increasingly hard time understanding the difference between being offended and being right.

Justin P. McBrayer has examined this growing phenomenon, which, if not caused by (or causing) our addiction to "I feel like" in everyday conversation, has certainly been helped along by it.

The hard work lies not in recognizing that at least some moral claims are true but in carefully thinking through our evidence for which of the many competing moral claims is correct. That’s a hard thing to do. But we can’t sidestep the responsibilities that come with being human just because it’s hard.

He is right. It is hard work to understand the difference between feeling something and knowing something, between an idea grounded in loosely assembled self-assurance and a firm opinion grounded in evidence. Saying "I feel like" sidesteps the question of truth, and makes it easier to avoid doing the work of truly understanding an idea.

EphChat, which stands for, you guessed it, “Ephemeral Chat,” is a chat program with a twist. No data is stored server side, and messages are only visible to participants for 60 seconds before they fade away into nothingness.

Anyone can create a new chatroom with a random URL hash, or name their own. Users are anonymous, though they can edit their display names if they wish. Messages are encrypted in transit to the server, where they are relayed to the chatroom's participants and then immediately deleted. User sessions are stored only until a user disconnects; then they are deleted, too.
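The core rule described above, nothing persisted and messages visible for only 60 seconds, can be sketched in a few lines of Python. This is a toy in-memory illustration, not EphChat's actual code (the real app is built on Firebase):

```python
import time

class EphemeralRoom:
    """In-memory sketch of EphChat's core rule: messages are visible
    for 60 seconds, then dropped for good; nothing is persisted."""
    TTL = 60  # seconds a message stays visible

    def __init__(self):
        self._messages = []  # list of (timestamp, sender, text)

    def post(self, sender, text, now=None):
        stamp = now if now is not None else time.time()
        self._messages.append((stamp, sender, text))

    def visible(self, now=None):
        now = now if now is not None else time.time()
        # Prune expired messages permanently, then return the rest.
        self._messages = [m for m in self._messages if now - m[0] < self.TTL]
        return [(sender, text) for _, sender, text in self._messages]

room = EphemeralRoom()
room.post("anon", "hello", now=0)
room.visible(now=30)  # message still visible
room.visible(now=61)  # message expired and deleted
```

The pruning happens on read, so an expired message is gone from memory the next time anyone looks at the room.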

Why did I build this? Well, partly as an experiment with Firebase, but also because I like the idea of people being able to communicate in an encrypted, anonymous way without governments snooping on them. Reporters can use this to do sensitive interviews; protesters under despotic regimes can use it to organize resistance.

The code is up on GitHub, which I felt was necessary to provide transparency into the app’s inner workings and security. I don’t usually make my repositories public, for fear of being ripped apart by the hackersphere, but if anyone is going to use this app they’re going to want to know how it works.

The genesis of this idea was a couple weeks ago when my cofounder said: “Would it be possible to see what percent of our email list was female or male based on their names alone?” Thus Drillbit was born.

In the last couple of weeks I have been poring over datasets and trying different formulas to find the best way to break a list of seemingly random name data down into digestible information. The resulting app allows anyone to upload their mailing lists and see who's in them, and, in perhaps the coolest feature, segment their lists as well.

The Project

Drillbit uses publicly available datasets to create a likely demographic profile of a mailing list based on first and last names. Upload your mailing, customer, or user list with first and last names, and Drillbit will create an age, gender, and demographic profile of it.

The Datasets

Listed here are the foundational datasets of this project, including for analysis tools that haven’t yet been released.

Methodology

The essential principle behind Drillbit is that an individual's first and last names betray a lot of information about his or her background, origins, language, gender, and even income and ideology. Names vary widely in originality and popularity, yet can also be conservative in their staying power: a surname can be passed down for generations, whereas first names have a tendency to be cyclical.

As an example, take the name "Max." It seems common enough that one could learn very little from the name alone. But as it turns out, "Max" may only seem common to us because of its surge in popularity in the late '80s and early '90s, the birth years of the rapidly matriculating Generation Y. In 1974, only 400 Maxes were born nationwide!

Of course, baby name popularity is not a new idea. But the variance is astounding, and not just in terms of popularity. In 2012, the two most popular baby names for boys and girls were “Jacob” and “Sophia.” Unlike “Max,” both of these popular names seem to have spent the last 100 years on the up-and-coming list.

With this amount of unique variance in names–some names jump and others sink, some names are like fads and others never really take off–it isn’t surprising that, in the aggregate, it is possible to take a list of people and determine how old they are likely to be.

So that’s what I did. Using the above datasets on name popularity, I was able to come up with some pretty convincing initial results, benchmarking against existing lists I knew well.

The first step was to condense the data into a table comparing year of birth and gender with the percent likelihood that any random "Michael" born in the last century was actually born in that year. For example, if 10,000 Michaels were born between 1900 and 2000, and 1,000 Michaels were born in 1950, then 1950-M-MICHAEL has a 10% likelihood; i.e., given a random Michael, there is a 10% chance he was born in 1950.
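That table-building step can be sketched in a few lines of Python. The counts below are made up for illustration; the real input would be a full baby-name dataset:

```python
# Toy birth counts keyed by (name, year); illustrative numbers only.
births = {
    ("MICHAEL", 1950): 1000,
    ("MICHAEL", 1975): 4000,
    ("MICHAEL", 2000): 5000,
}

def year_likelihoods(births, name):
    """P(born in a given year | name): that year's count over the total."""
    total = sum(n for (nm, _), n in births.items() if nm == name)
    return {yr: n / total for (nm, yr), n in births.items() if nm == name}

probs = year_likelihoods(births, "MICHAEL")
# With 10,000 Michaels total and 1,000 born in 1950, the 1950 entry
# comes out to 0.10: the 10% likelihood from the example above.
```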

With the charts above, you can see how this would play out. If you were to use Drillbit to upload a list of 5000 Jacobs, you would see the age match pattern roughly cohere to the above chart. The more Jacobs there are, the higher confidence we would have in the result.

There are some obvious complications with this model. The first is that although 10% of all Michaels might have been born in 1950, they would be over 60 now, and their chance of being around is much smaller than that of a Michael born in 2000. That's where actuarial data comes in. Using the above actuarial table divided by gender, I was able to normalize the distribution based on likelihood of survival in each age cohort. No matter how many Maxes were born in the 1910s, there aren't a lot left today.

The second problem is gender: most names in the database aren't used 100% by one gender, Michael included. It became clear that age data had to be computed per gender, not from totals, because names that are popular with one gender are not necessarily popular with the other at the same time.

To compensate for this, age data was tabulated separately by gender, all the way down through the actuarial normalization. Female names were rated and graded against each other, male names separately against each other, and only at the end were the two normalized against each other.
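A minimal sketch of that per-gender tabulation, again with toy counts standing in for the real dataset:

```python
def gender_profile(counts_m, counts_f):
    """Given per-year birth counts for one name, split by gender:
    compute each gender's year distribution independently, then
    normalize the two genders against each other by total volume."""
    total_m, total_f = sum(counts_m.values()), sum(counts_f.values())
    p_male = total_m / (total_m + total_f)
    years_m = {y: c / total_m for y, c in counts_m.items()}
    years_f = {y: c / total_f for y, c in counts_f.items()}
    return p_male, years_m, years_f

# Toy "Michael" counts: overwhelmingly, but not 100%, male.
p_male, years_m, years_f = gender_profile(
    {1950: 900, 2000: 2700}, {1950: 40, 2000: 60}
)
```

Because each gender's year distribution is computed from its own total, a year that is popular for boys doesn't distort the girls' curve, and vice versa.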

Compared to age, gender and race were much easier. Gender analysis was a simpler form of the age analysis: name counts were divided by gender and then normalized by age. Race/ethnicity data was also quite simple, based on surnames; the data was already organized by the Census, albeit 13 years ago, so getting it into a searchable database wasn't tricky.
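The surname lookup can be sketched like this. The rows are modeled loosely on the Census 2000 surname data, which reports per-surname race percentages, but the numbers and category names here are illustrative, not the Census file's exact figures or headers:

```python
# Toy per-surname race percentages (illustrative, not real Census values).
surname_race = {
    "GARCIA": {"white": 5.4, "black": 0.5, "hispanic": 92.0, "asian": 1.4},
    "SMITH":  {"white": 73.3, "black": 22.2, "hispanic": 1.6, "asian": 0.4},
}

def race_profile(surnames):
    """Average the per-surname race percentages across a list,
    skipping surnames with no match in the table."""
    totals, matched = {}, 0
    for s in surnames:
        row = surname_race.get(s.upper())
        if row is None:
            continue
        matched += 1
        for race, pct in row.items():
            totals[race] = totals.get(race, 0.0) + pct
    return {r: t / matched for r, t in totals.items()} if matched else {}

profile = race_profile(["Smith", "Garcia"])
```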

Limitations

There are some obvious limitations to my method. The first is the law of large numbers, or small numbers as the case may be. If you were to put a list of 2 names into Drillbit, it would spit out a similar-looking demographic profile running the gamut of all ages and perhaps some different races as well. There are few names that are reliably "Black" names or "Over 65" names (although there are a few names with a 100% incidence within one to five years; I challenge you to find them). As with any aggregate data project, the larger the list, the more reliable Drillbit will be.

The other limitation is any sort of list that comes with existing biases. Say, a list of NBA players (heavily 25-35 and black) or a list of sitting US Congresspeople (heavily male, white, and 35-55). These inherent biases will be reflected in the analysis, but probably not to the extent that they could be. This is the House of Representatives, according to Drillbit:

Obviously, 18-24-year-old congresspeople would be impossible. And yet, even with a list of just 435 names, the real age trends poke through.

In short, you shouldn't use Drillbit to analyze a list you already know to skew heavily toward one or two demographics. However, it's worth noting that Congress is 18.3% female, and Drillbit predicted 20.5% based on names alone. Not shabby.

The final inherent bias worth mentioning is the skew toward younger ages. Since younger people are overwhelmingly more likely to be alive, the post-normalization numbers skew younger. In addition, during development I had an "Under25" category, but it became apparent that although my database could detect age variability all the way down to age 0, babies aren't going to be on mailing lists, and they were throwing off the results. So to compensate for the younger skew, I made a judgment call to cut off at 18 and not track any younger cohorts, even though some websites may have 13-to-18-year-olds as users.

Now that you know more about how I did it, upload a list and try it out!