Posts categorized "Bias"

The New York Times made waves this week with features on management practices at Amazon and workplace tracking practices at various companies (link). These are essential reading on how data can make us dumber.

I am going to ignore the shocking claim by the journalist who stated that GE is "long a standard-setter in management practices." To give him some credit, he did not say "good" management practices. It is true that business schools like to glorify GE managers. But the most famous GE doctrine is to line all employees up at the end of the year and give the bottom 10% pink slips. (See Jack Welch's Wiki page.) This practice is cut from the same cloth as the "purposeful Darwinism" that was vilified in the article about Amazon.

What I want to focus on is the completely bonkers line of argument paraded by software vendors who sell workplace tracking (i.e. surveillance) tools.

1. The performance of your workers is completely measured by our continuous and usually stealthy tracking of data.

2. Because of the continuous and stealthy nature of tracking, the data are objective, unbiased, trustworthy, and accurate.

I couldn’t imagine living in a world where I’m supposed to guess what’s important, a world filled with meetings, messages, conference rooms, and at the end of the day I don’t know if I delivered anything meaningful.

So what are the data that would allow each worker to know every day whether they "delivered something meaningful"? The article mentioned just two types of data: the usual tracking of how people spent their time at work; and little notes workers are encouraged to send to bosses to "nudge" or "cheer" each other.

Just because you can count "nudges" or "cheers", or you can count the words, or pairs of words, or triplets of words, most frequently associated with someone, doesn't mean you know anything meaningful about their performance.

In fact, much of these data are manipulated, and probably worthless.

Even within the Times articles, there are multiple examples of why employee notes are not to be trusted. "People wouldn't put something negative in a public forum, because it would reflect poorly on them," said one vendor. At Amazon, employees reported that the secret feedback system is "frequently used to sabotage others". I find it hard to believe that we even need proof of such behavior. In fact, that is one of the key points I made in Numbersense.

Counting emails, or minutes spent on the work computer, is similarly pointless. Someone who spent 20 minutes on the computer is not necessarily more productive than someone who spent 10 minutes working and 10 minutes web-surfing random things. The former employee might be slower, or confused, or learning on the job, or day-dreaming. Again, it's hard to believe that we even need proof of this point.

There is a tendency to believe that data have intrinsic value. One of the worrying trends in the age of Big Data is insufficient time spent understanding if the data collected measure the right things, and whether the analyses provide even marginally trustworthy answers to the questions being asked.

In our newest column, we take on the recent media obsession with companies who make robots that hire people. (link)

As with most articles about data science, the journalists failed to dig up any evidence that these robots work, other than glowing quotes from the people who are selling these robots. We point out a number of challenges that such algorithms must overcome in order to generate proper predictions. We also discuss why measuring the outcomes of these predictions is so hard: one problem is we have no objective standard for someone being the "correct" hire; another is the action we take based on the predictions affects the outcome that was predicted.

This piece is part of the StatBusters column written jointly with Andrew Gelman. (Hope they fix the labeling soon.) In it, we talk about two recent studies on data privacy, which lead to contradictory conclusions. How should the media report such surveys? Is the brand name of the sponsoring organization enough? In addition, we debunk the notion that consumers will definitely get something valuable out of sharing their data.

I only read nutrition studies in the service of this blog but otherwise, I don't trust them or care. Nevertheless, the health beat of most media outlets is obsessed with printing the latest research on coffee or eggs or fats or alcohol or what have you.

Now, the estimable John Ioannidis has published an editorial in BMJ titled "Implausible Results in Human Nutrition Research". John previously told us about the crisis of false positives in medical research.

Here are some statistics on nutrition "science":

In 52 attempts at using randomized experiments to validate findings from observational studies, the number of times the findings were replicated: 0

In the NHANES questionnaire (the basis of all those findings), two-thirds of the participants provided answers that imply an energy intake that is "incompatible with life". I haven't read this paper; seems like worthwhile reading.

There are at least 34,000 papers on PubMed with keywords "coffee OR caffeine" which means this one nutrient has been associated with almost any interesting outcome.

Almost every single nutrient imaginable has peer reviewed publications associating it with almost any outcome. A statistician should never give the advice "If at first you don't succeed,..."

Many findings are entirely implausible (and still get published in top journals)... for example, the idea that a couple of servings a day of a single nutrient will halve the burden of cancer is clearly "too good to be true," even more so for anyone who is familiar with this literature

"Big datasets just confer spurious precision status to noise"

Randomized experiments offer hope but are woefully undersized (like requiring 10 times the current sample).

Just to nail home the point, John concludes: "Definitive solutions will not come from another million observational papers or a few small randomized trials."

I mentioned the Harvard Business Review article on business use of customer data in the "Big Data" era. In the previous post, I looked at the nature of the evidence used by the authors. In this post, ignoring my discomfort with some of the evidence, I examine the conclusions of the article.

***

The report has a three-part structure: the first section describes the issues; the second section communicates results from a few surveys conducted by frog - a global strategy and design agency - on various issues related to data privacy; and the third section presents examples of their recommendations for clients, which they offer generally to businesses involved in collecting and monetizing customer data.

The survey results are revealing (although the sample size of 900 in five countries is tiny so I'm not sure you should believe them). The agency found that 97% of the people surveyed are concerned about businesses and governments mis-using their data. Seventy-two percent of Americans are reluctant to share information with businesses because they "just want to maintain their privacy".

The authors also learned that consumers have grossly under-estimated the extent of data collection. Only 25% of the respondents said they knew businesses tracked their location, and only 14% said they knew businesses shared their web-surfing history. Finally, their analysts attributed dollar value to the privacy of different types of data.

I follow them up to this point. In fact, the authors summed it up very nicely at the beginning of the article: most [companies] prefer to keep consumers in the dark, choose control over sharing, and ask for forgiveness rather than permission.

Unfortunately, I am let down by the list of recommendations that follow. They feel to me like tweaks on failed ideas, rather than paradigm shifts.

***

The first recommendation is to "educate the consumers". The authors give an example of one of their own consulting clients, which required "customers" to watch a video and give preliminary consent before sharing their own (genomic) data. The personal data are withheld until the "customer" returns a hard-copy agreement.

We don't need to be reminded that every day, we "voluntarily" sign Terms and Conditions which no ordinary person actually reads. Frequently, we are told not to use a website if we don't agree with any part of a lengthy agreement written in one-sided language favoring the business.

The "new" solution doesn't change the status quo. In fact, it gives businesses a stronger case for arguing that their users have voluntarily given up the right to their own data. In my view, until businesses confront the issue of properly disclosing how they collect data, what information is being collected, and how such data are being sold or traded, consumers will continue to find such practices creepy.

***

The second recommendation looks good on paper but is impractical. Another of frog's clients is featured here. This client allows customers to specify which pieces of data can go to whom.

Assume there are 100 variables (only!) being collected and five levels of access control. That amounts to 500 yes/no questions each user is required to answer in order to gain full control of the data. In practice, most users will decide not to bother because it is too complex and time-consuming. The solution is a form of suffocation by paperwork.

For the data analysts, such a solution creates headaches. It generates self-selected data of the worst kind. Each variable has its own source of bias as different subsets of users decide to withhold their data for their own reasons.

To implement such a system properly requires a herculean effort. Say I reviewed the list of 100 variables and divided them into five groups of 20 variables using the five levels of control (from allowing anyone to see my gender to hiding my age from everyone). Two months later, I changed my mind. I removed access to 80 of the 100 variables from everyone. Now, the database administrator should find all instances of those 80 variables and delete them. Some of the data may already have been sold to other entities, and what if those other entities re-sell my data after I asked for the data to be deleted by the original source?

***

The last recommendation is an argument that businesses should not need to pay users for their data. Given the finding in the second section that users assign meaningful dollar values to their data, this seems to be a solution for businesses rather than for consumers.

Pandora's free advertising-supported service is used as an example of customers' willingness to exchange their privacy for "in-kind value". The article failed to mention just how much money Pandora has been paying for such data! As this other HBR article tells us, Pandora is "13 years, 175 million users, little profit". It has never been able to establish a profitable business model because while 80% of its revenues come from advertising to those "free" accounts, 60% of its revenues immediately go out the door as royalty payments for the "free" music! It's not surprising that many consumers willingly engage in this lop-sided exchange with Pandora.

***

I often wonder: if consumers realized that over-sharing their data works to their disadvantage, would they become more interested in how businesses use their data?

For instance, insurance companies will be very interested in acquiring data from personal analytics devices, like Fitbit. They will use the data to predict whether you have health risks, and they will charge you more for insurance. Everyone is at risk for something.

The Uber app gives its users the ability to track their drivers -- in Manhattan, it's like watching a horse-race as your driver tries to negotiate the city gridlock. The same data are used by Uber to get an accurate picture of supply and demand, which drives its surge-pricing algorithms. That's how you end up paying five to ten times the normal cab rate.

Businesses use personal data to reduce information asymmetry, which in the past prevented them from extracting maximum value from consumers.

Today, the data privacy question is phrased as "Company X would like to collect information about your heart rate and in exchange, you will get notified if any irregularity is detected. Are you willing to share such data with Company X?"

Imagine you are asked a different question: "Company X would like to collect information about your heart rate and in exchange, you will get notified if any irregularity is detected. Being notified of heart-rate irregularity may help you but 80% of the warnings will be false alarms. Also, your heart rate data will be used by our insurance arm to adjust your insurance premiums. There is a 50% chance that your premium will increase after sharing your data. Are you willing to share such data with Company X?"

Last time we heard about Deflategate on this blog, Warren Sharp compiled some statistics on fumble rates, showing that the Patriots were unusually good at avoiding fumbles. (link, link) I thought the level of analysis was "above average" and remarked that statistical evidence of this type can only get you so far. The metric is indirect, and it does not speak to causation.

The official investigators have now issued their report. New York Times has its coverage here. As one reader commented, this article, currently nearing 800 comments, has more comments than most articles with more serious subject matter. The NYT article is one of the better ones out there on this subject.

Two sets of new evidence have emerged.

The first, which is getting most of the headlines and attention, are text messages involving two Patriots employees who discussed their deflating operation. These text messages are highly incriminating for the two involved and for me, also incriminating for Tom Brady, the team's superstar quarterback (who refused to release his own text messages or other correspondence to the investigators). The text messaging evidence shores up the causal evidence in a way that numbers by themselves could never accomplish.

The takeaway from the text evidence is the power of "metadata". Metadata are data about the text messages (sender, recipient, date and time of sending, length, etc.), as distinct from the content of the texts. Metadata went mainstream when the U.S. government was revealed to have been massively scooping up metadata on domestic phone calls, while denying that it collected the contents of said phone calls (see this coverage, for example). The investigators can use metadata to learn who else is in the circle of insiders, how often they communicate, when they communicate, etc. Notice that these pertinent questions do not require knowing the contents of the texts themselves. (This is not to say the contents are unimportant--at a minimum, some contents are needed to zoom in on the relevant texts.)

But these investigators could not determine when the deflation operation started, how often it occurred, or the full scope of the operation. This likely has to do with selective disclosure of the text messages by the parties involved (e.g., none from Brady).

Another takeaway is the inherent bias in surveillance data. Simply put, you only know what you can measure, and there is much that is not being measured. To get the "full scope", the investigators would need phone records, emails, and even wiretap evidence following the key players around (just kidding).

***

The second set of evidence is also extremely important to the story but it has received far less attention. One reason I like the NYT coverage is that the reporter gets to this evidence before talking about the text messages. For the first time, I see direct evidence of football tampering. The NFL rule requires footballs to be inflated to between 12.5 and 13.5 pounds per square inch. According to the NYT report, after the Colts raised suspicion at half-time of the Patriots-Colts matchup, all of the footballs were found to be underinflated (below 12.5 psi), with a minimum value of 10.5.

This is the first time I see a clear admission that all of the footballs were underinflated. This is much more convincing evidence that someone tampered with the footballs than any of the fumble analysis.

Further, the referee had already gauged the balls before the game, and at that time found all of the Colts-supplied footballs to be at about 13 psi, and only two of the Patriots-supplied footballs to be under-inflated.

Once tampering is established, the investigators can move on to finding the cause. Here, they are helped by videotapes from surveillance cameras, and also the texts.

***

One nitpick about the sentence: 'The report uses the nebulous phrase “more probable than not” several times in making its conclusions.' To a statistician, this is a very precise statement, not nebulous at all! I interpret the investigators to mean there is more than 50% chance. That is the standard of "preponderance of evidence."

***

FiveThirtyEight has a lengthy discussion of the report. They helpfully showed a screenshot of the measured ball pressures:

Harvard Business Review devotes a long article to customer data privacy in the May issue (link). The article raises important issues, such as the low degree of knowledge about what data are being collected and traded, the value people place on their data privacy, and so on. In a separate post, I will discuss why I don't think the recommendations issued by the authors will resolve the issues they raised. In this post, I focus my comments on an instance of "story time", some questions about the underlying survey, and thoughts about the endowment effect.

***

Much of the power of this article comes from its reliance on survey data. The main survey used here is one conducted in 2014 by frog, the "global product strategy and design agency" that employs the authors. They "surveyed 900 people in five countries -- the United States, the United Kingdom, Germany, China, and India -- whose demographic mix represented the general online population". (At other points in the article, the authors reference different surveys although no other survey was explicitly described other than this one.)

Story time is the moment in a report on data analysis when the author deftly moves from reporting findings of the data to telling stories based on assumptions that do not come from the data. Some degree of story-telling is required in any data analysis, so readers must be alert to when "story time" begins. Conclusions based on data carry different weight from stories based on assumptions. In the HBR article, story time begins right below the large graphic titled "Putting a Price on Data".

The graphic presented the authors' computation of how much people in the five nations value their privacy. They remarked that the valuations have very high variance. Then they said:

We don't believe this spectrum represents a "maturity model," in which attitudes in a country predictably shift in a given direction over time (say, from less privacy conscious to more). Rather, our findings reflect fundamental dissimilarities among cultures. The cultures of India and China, for example, are considered more hierarchical and collectivist, while Germany, the United States and the United Kingdom are more individualistic, which may account for their citizens' stronger feelings about personal information.

Their theory that there are cultural causes for differential valuation may or may not be right. The maturity model may or may not be right. Their survey data do not suggest that there is a cultural basis for the observed gap. This is classic "story time."

***

I wonder if the HBR editors reviewed the full survey results. As a statistician, I think the authors did not disclose enough details about how their survey was conducted. There are lots of known unknowns: we don't know the margins of error on anything, we don't know the statistical significance on anything, we don't know whether the survey was online or not, we don't know how most of the questions were phrased, and we don't know how respondents were selected.

What we do know about the survey raises questions. Nine hundred respondents spread out over five countries is a tiny poll. Gallup surveys 1,000 people in the U.S. alone. If the 900 were spread evenly across the five countries, their survey has fewer than 200 respondents per country. A rough calculation gives a margin of error of at least plus/minus 7 percent. If the sample is proportional to population size, then the margin of error for a smaller country like the U.K. will be even wider.
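The rough calculation above is just the standard margin-of-error formula for a sample proportion. A quick sketch (assuming an even 180-per-country split, which the article does not confirm):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a sample proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# 900 respondents split evenly across five countries -> 180 per country
print(round(margin_of_error(180), 3))  # about +/- 0.073, i.e. over 7 points
print(round(margin_of_error(900), 3))  # even the full sample gives +/- 3.3 points
```

The worst-case proportion p = 0.5 is used, which is why this is a floor ("at least plus/minus 7 percent").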

The authors also claim that their sample is representative of the "demographic mix" of the "general online population." This is hard to believe since the sample includes no one from South America, Africa, the Middle East, Australia, etc.

The graphic referenced above, "Putting a Price on Data," supposedly gives a dollar amount for the value of different types of data. Here is the top of the chart to give you an idea.

The article said "To see how much consumers valued their data, we did conjoint analysis to determine what amount survey participants would be willing to pay to protect different types of information." Maybe my readers can help me understand how conjoint analysis is utilized for this problem.

A typical usage of conjoint is for pricing new products. The product is decomposed into attributes so for example, the Apple Watch may be thought of as a bundle of fashion, thickness, accuracy of reported time, etc. Different watch prototypes are created based on bundling different amounts of those attributes. Then people are asked how much they are willing to pay for different prototypes. The goal is to put a value on the composite product, not the individual attributes.
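For readers unfamiliar with the mechanics, here is a minimal sketch of how part-worths are typically recovered: regress stated prices for the prototypes on dummy-coded attributes. All products, attributes, and dollar values below are hypothetical, and this is the generic pricing use of conjoint, not necessarily what the HBR authors did:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prototypes: columns are binary attributes
# (e.g. "premium band", "thin case"); rows are product bundles.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hypothetical part-worths: $50 for the band, $20 for the case,
# on top of a $100 base value; stated prices include some noise.
true_partworths = np.array([50.0, 20.0])
prices = 100.0 + X @ true_partworths + rng.normal(0, 2, size=4)

# Recover the part-worths by regressing stated prices on the attributes.
A = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(A, prices, rcond=None)
print(coefs)  # approximately [100, 50, 20]: base value plus one part-worth per attribute
```

The point is that the regression values the individual attributes only as components of priced bundles, which is why it is unclear how the authors extracted a standalone dollar value for each data type.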

***

Also interesting is the possibility of an "endowment effect" in the analysis of the value of privacy. We'd really need to know the exact questions that the survey respondents were asked to be sure. It seems that people were asked how much they would pay to protect their data, i.e. to acquire privacy. In this framing, you don't have privacy and you have to buy it. A different way of assessing the same issue is to ask how much money you would accept to sell your data. That is, you own your privacy to start with. The psychologist Daniel Kahneman and his associates pioneered research showing that the values obtained by those two methods are frequently far apart!

In a classic paper (1990), Kahneman et al. told one group of people that they had been gifted a mug, and asked how much money they would accept in exchange for it (the median was about $7). Another group of people were asked how much they were willing to pay to acquire a mug; the median was below $3.

Is this the reason why businesses keep telling the press we don't have privacy and we have to buy it? As opposed to we have privacy and we can sell it at the right price?

***

Despite my reservations, the HBR piece is well worth your time. It raises many issues about data collection that you should be paying attention to. Read the whole article here.

This is a supplement to the previous post about a new research paper on the effect of Alcoholics Anonymous, and an NY Times exposition that I commented on. A misreading of that article led me to complain about per-protocol analysis, which wasn't the methodology behind the Humphreys et al. research. I will explain their methodology, known as instrumental variables analysis, in this post.

***

In the last post, I showed this hypothetical situation, involving patients who "cross over" (disobey treatment assignment) in a randomized experiment.

In the paper, actual treatment is measured by the change in frequency of attending AA meetings (relative to baseline).

Because initial treatment assignment (rows) is random, one expects equal proportions of people to have moved out of state, got married, got divorced, etc. Similarly, one expects equal proportions of people to have increased AA attendance. But in the table above, 90% of people in the treatment arm upped attendance while only 60% of those assigned to no treatment did. (The researchers use a continuous scale of frequency rather than a proportion but the concept is the same.)

Of course, the random assignment to treatment itself is a cause of higher relative attendance. People are told to go to AA meetings. But there are other reasons for increased attendance, such as self-motivation leading those in the no-treatment arm to cross over.

In ITT analysis, you ignore the actual attendance, and analyze how treatment assignment affects the amount of drinking.

Alternatively, one can run a regression of the amount of drinking (relative to baseline) on the frequency of AA meetings. This will yield a result such as "the more meetings someone attends, the less they drink". The problem with this analysis is that while the initial assignment is random, the actual attendance is tainted by selection bias.

***

Instead of using the actual frequency of AA meetings as a regressor, the instrumental variables (IV) analysis uses a predicted frequency of AA meetings. The prediction is itself a regression of the actual frequency of AA meetings on treatment assignment and demographic variables. In other words, we only care about the proportion of the variability in AA attendance that can be explained by the random assignment (controlling for the demographic variables). The remaining variability (due to self-motivation, etc.) is left on the table.
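The two-stage logic can be sketched on simulated data. All coefficients below are made up for illustration; this is the generic two-stage least squares recipe, not the Humphreys model:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Unobserved confounder: self-motivation raises AA attendance AND reduces drinking.
motivation = rng.normal(size=n)
assigned = rng.integers(0, 2, size=n).astype(float)  # random assignment = instrument

attendance = 2.0 * assigned + motivation + rng.normal(size=n)
drinking = -1.0 * attendance - motivation + rng.normal(size=n)  # true effect: -1

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

naive = ols_slope(attendance, drinking)  # biased: picks up the motivation effect

# Stage 1: predict attendance from the random assignment alone.
Z = np.column_stack([np.ones(n), assigned])
attendance_hat = Z @ np.linalg.lstsq(Z, attendance, rcond=None)[0]

# Stage 2: regress drinking on the predicted attendance.
iv = ols_slope(attendance_hat, drinking)

print(round(naive, 2), round(iv, 2))  # naive is too negative; IV is close to -1
```

The naive regression overstates the benefit because motivated people both attend more and drink less; using only the assignment-driven part of attendance strips that out.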

This is the "correction" that Frakt inferred in the New York Times article. I think Frakt is correct that the conclusion can be applied only to those who obey the protocol but I don't think the researchers drop all non-compliers from the dataset.

Also, Humphreys et al. seem to be at odds with the author of The Atlantic article, as they say "The long-established positive association between AA involvement and better outcomes was therefore consistent with, but did not prove, causation."

[After communicating with Frakt, Humphreys and Dean Eckles, I realize that I was confused about Frakt's description of the Humphreys paper, which does not perform PP analysis. So when reading this post, consider it a discussion of ITT versus PP analysis. I will post about Humphreys's methodology separately.]

The New York Times plugged a study of the effectiveness of Alcoholics Anonymous (AA) (link). The author (Austin Frakt) used this occasion to advocate "per-protocol" (PP) analysis over "intent-to-treat" (ITT) analysis. He does a good job explaining the potential downside of ITT, but got into a mess explaining PP and never properly addressed the downside of PP. It's an opportunity missed because I fear the article confuses readers even more on an important topic.

The key issue at play is non-compliance in a randomized experiment. If some patients are assigned to AA treatment and others are assigned to some other treatment, typically some subset of patients will "cross-over," (or drop out altogether), and usually such cross-over is associated with the outcome being measured--for example, a patient assigned to AA treatment felt that AA was not working and aberrantly switched to the other treatment; or vice versa.

ITT and PP differ in how they deal with the subset of non-compliers. In ITT, you analyze everyone in the experiment based on their initial assignment, ignoring non-compliance. In PP, you drop all non-compliers from the study, and analyze the subset of compliers only. (Each analysis is "extreme" in its own way.)

Between these two, I usually prefer ITT. The PP analysis answers the question: "If everyone complied with the treatment, what would be its effect?" I don't find the assumption of zero non-compliance realistic. ITT answers a different question: "Of those who are assigned the treatment, what is the expected effect?" This effect is an average over those who complied and those who did not, weighted by the proportion of compliers.

***

Frakt lost me when he said:

In a hypothetical example, imagine that 50 percent of the sample receive treatment regardless of which group they've been assigned to. And likewise imagine that 25 percent are not treated no matter their assignment. In this imaginary experiment, only 25 percent would actually be affected by random assignment.

First of all, the arithmetic does not work. If we ignore assignment as he suggested in the first two sentences, then the patients can either have received treatment or not. But 50 percent plus 25 percent leaves 25 percent of the patients unaccounted for.

Here is an illustration of what I think Frakt wanted to get across:

Of the 50% assigned to the treatment, 90% (45 out of 50) complied and 10% crossed over. Of the other half initially assigned to no treatment, 60% (30 out of 50) crossed over to the treatment. All in all, 75% of the study population received treatment and 25% did not... regardless of their initial assignment.

In an ITT analysis, all patients in the table are analyzed. We compare the top row with the bottom row. By contrast, in a PP analysis, we only analyze the patients along the top-left, bottom-right diagonal, namely, the 65% of the patients who complied with the assigned treatment. So, we compare the top left corner with the bottom right corner.

The important question is whether this 65% subset constitutes a random sample. Frakt implies it is: "only 25 percent [i.e. 65 percent in my example] would actually be affected by random assignment." Maybe when he said "affected by", he didn't really mean random, because it should be obvious that treatment is no longer randomized within the 65% subset.

If the 65% subset were randomly drawn from the initial population, we should still see equal proportions of treated versus non-treated, but in fact we have about 70% treated versus 30% not treated. Said differently, patients assigned to no treatment are more likely to cross over than patients assigned to treatment.
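The counts in this hypothetical 100-patient study can be tallied directly:

```python
# Hypothetical 100-patient study: 50 assigned to each arm.
treated_arm = {"complied": 45, "crossed_over": 5}    # assigned to treatment
control_arm = {"complied": 20, "crossed_over": 30}   # assigned to no treatment

# Who actually received treatment, regardless of assignment?
received = treated_arm["complied"] + control_arm["crossed_over"]
print(received)  # 75 of 100 received treatment

# PP analysis keeps only the compliers on the diagonal.
compliers = treated_arm["complied"] + control_arm["complied"]
print(compliers)  # 65 of 100

# Within the PP subset, treatment is no longer balanced 50/50.
print(round(treated_arm["complied"] / compliers, 2))  # 0.69, i.e. about 70% treated
```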

Cross-over isn't something that happens randomly. Patients are assessing their own health during the experiment, and thus, the opting out is frequently related to the observed (albeit incomplete) outcome.

***

In the article, Frakt states that the study by Humphreys et al. "corrects for crossover by focusing on the subset of participants who do comply with their random assignment". I call this "filtering" rather than "correcting".

Does analyzing this subset lead to an accurate estimate of the treatment effect? I don't think so.

By filtering out the cross-overs, the researchers introduce a survivorship bias. If the cross-overs do so because they are unhappy about their assigned treatment, then these patients, if forced to continue the original treatment, are likely to have below-par outcomes compared to those who did not cross over. In a PP analysis, this subset is removed. Practically, this means that the treatment effect (PP analysis) is too optimistic.
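A small simulation illustrates the optimism. This is a sketch under one assumed model (not the Humphreys data): patients with poor prognosis who are assigned treatment cross over, while everyone else stays put:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

prognosis = rng.normal(size=n)          # latent health; lower means sicker
assigned = rng.integers(0, 2, size=n)

# Assume sicker patients in the treatment arm cross over; controls all stay.
complies = (assigned == 0) | (prognosis > -0.5)

outcome = prognosis + 1.0 * assigned    # true treatment effect is +1 for everyone

# PP estimate: compare compliers only; the sickest treated patients were filtered out.
treated = complies & (assigned == 1)
control = complies & (assigned == 0)
pp = outcome[treated].mean() - outcome[control].mean()
print(round(pp, 2))  # well above the true effect of 1.0
```

Because the dropouts come disproportionately from the sick end of the treatment arm, the surviving treated patients look healthier than the controls even before any treatment effect, so the PP estimate is inflated.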

Frakt is careless with his language when it comes to discussing the downside of PP analysis. He says (my italics):

it’s not always the case that the resulting treatment effect is the same as one would obtain from an ideal randomized controlled trial in which every patient complied with assignment and no crossover occurred. Marginal patients may be different from other patients...Despite the limitation, analysis of marginal patients reflects real-world behavior, too.

"Not always" leaves the impression that PP analysis is usually right except for rare situations. Note how he uses the word "limitation" above (paired with "despite"), and below, when discussing ITT analysis:

For a study with crossover, comparing treatment and control outcomes reflects the combined, real-world effects of treatment and the extent to which people comply with it or receive it even when it’s not explicitly offered. (If you want to toss around jargon, this type of analysis is known as “intention to treat.”) A limitation is that the selection effects introduced by crossover can obscure genuine treatment effects.

The choice of words leaves the impression that ITT is more limited than PP when both analyses suffer from problems arising from the same source: patients with worse outcomes are more likely to cross over.

***

Many readers of the NYT article link to a much longer article in The Atlantic. It appears that the scientific evidence on AA is very weak.

I was creating an online survey using Surveymonkey earlier this week. They asked me to try their new design, and so I did. There appeared to be a bug in one of the features. It kept preventing me from displaying the questions in a certain way. I tried a bunch of tricks but after ten minutes, decided to switch back to the old design. I clicked on their Feedback link and after describing my problem, they asked me to answer a few questions.

Here is one question:

This question is as standard as they come in a customer satisfaction survey.

My mood at the time was slightly unhappy. Just as I was about to click on that second-to-last radio button, I stopped. Can you see why? (Look at the choices more carefully.)

The fourth button is labelled "Slightly Satisfied". I was expecting it to say "Slightly Dissatisfied"!

Then I realized Surveymonkey is using a unipolar scale. All five answers are varying levels of satisfaction. I'm more used to a bipolar scale, such as:

Extremely satisfied

Somewhat satisfied

Neither satisfied nor dissatisfied

Somewhat dissatisfied

Extremely dissatisfied

The bipolar scale is centered in the middle and allows answers in both positive and negative directions.

I was debating between the last two choices. Was I "slightly satisfied" or "not at all satisfied"? Surely, I wasn't 100 percent unhappy, far from it. But "slightly" was also inappropriate. The mirror image of "slightly dissatisfied" should be "mostly satisfied", which meant I should be debating between the second button and the last.

However, "very satisfied" didn't fit with my mood, even though technically it was the mirror image of it. I wanted to express a negative sentiment, albeit minor, not a positive sentiment, albeit qualified. (Since I couldn't bear to pick either, I abandoned the survey at that point.)

I am not a fan of unipolar scales for many applications. For example, if you are measuring political attitudes (conservative and liberal), would your choices be:

Extremely conservative

Very conservative

Moderately conservative

Slightly conservative

Not at all conservative

or would they be

Extremely conservative

Somewhat conservative

Neither conservative nor liberal

Somewhat liberal

Extremely liberal

?

The unipolar scale automatically creates the problem of which pole to feature in the answers. Conservatives probably won't have an issue with that unipolar conservative scale, but it's difficult for a "somewhat liberal" person to think of himself or herself as "moderately conservative" or "very conservative", and vice versa.

The criticism of bipolar scales is that people (and I think this means Americans; I doubt it generalizes to other cultures) tend to bias toward the positive end relative to the negative. I don't see that as a big problem if a 7-point scale is used, or if the scale is re-centered.