I can't believe she wrote that on Amazon! - Patterns of self-disclosure in Amazon reviews

As online product reviews become ubiquitous, individuals increasingly write and rely on them. In sharing their experiences and opinions about a product, do individuals also share private and sensitive information in online reviews? Disclosing sensitive information online exposes an individual to privacy risks including deanonymization, loss of reputation, and psychological harm. This poster addresses this critical issue by examining the extent of sensitive information disclosed in Amazon.com's product reviews. We also explore whether disclosure of real name, disclosure of location, type of product, and the nature of the reviewer affect the extent of sensitive information disclosed in the reviews. We crawled Amazon.com and gathered all online reviews posted for six products pertaining to weight loss, anti-aging, sex, fragrance, baby care, and electronic goods. This yielded 3,485 reviews, which we text-analyzed using Linguistic Inquiry and Word Count (LIWC). Data processed through LIWC were then further analyzed with descriptive statistics, discriminant analysis, and ANOVA techniques. We find that Amazon's reviewers disclose higher levels of sensitive information in the following categories: family, humans, positive emotions, negative emotions, sadness, cognitive mechanisms, and concerns related to work, achievements, leisure, and money. Users who used their real names and disclosed their location revealed more personal information about sadness, health, and concerns related to achievements, and less about leisure concerns. The sensitivity of information disclosed was a function of the type of product reviewed. Finally, occasional and non-professional reviewers disclosed higher levels of sensitive information, perhaps as a way to increase their participation in the Amazon community.


Can you say a bit more about what kinds of information you got from the LIWC analyses? Why did you choose that analysis as opposed to other computational tools such as Topic Modeling or Sentiment Analysis to examine your data?

Dear Prof. Lidz,
Thanks a lot for your interest in my project!
For the scope of this study, we measured the degree of sensitivity of information disclosed using the framework adopted in Tausczik and Pennebaker (2010) and implemented through the LIWC software.
In particular, we used LIWC to measure the following: pronouns as indicators of attention allocation (self-directed, other-directed), social processes (family, friends, humans), affective processes (swearing, positive emotion, negative emotion, anxiety, anger, sadness, cognitive mechanisms), biological processes (health, sexual), and personal concerns (work, achievements, leisure, home, money, religion, death) (as detailed in Pennebaker et al., 2007; Tausczik and Pennebaker, 2010). LIWC allows a valid, reliable, and finer-grained analysis of the information revealed in text, paying particular attention to its layers and degrees of (possible) sensitivity.
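To make this concrete, here is a minimal sketch of the kind of dictionary-based category counting LIWC performs. The mini word lists below are hypothetical stand-ins (the real LIWC 2007 dictionary is proprietary and contains thousands of entries across dozens of categories):

```python
# Minimal sketch of dictionary-based category counting in the style of
# LIWC. The tiny word lists are illustrative assumptions only.
import re

CATEGORIES = {
    "family":  {"mom", "dad", "daughter", "son", "husband", "wife"},
    "health":  {"doctor", "pain", "sick", "medication", "diet"},
    "sadness": {"sad", "cry", "miserable", "lonely"},
}

def category_rates(text):
    """Return each category's share of total words, as LIWC reports."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty text
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

review = "My daughter was sick and the doctor suggested this diet."
rates = category_rates(review)  # family 1/10, health 3/10, sadness 0
```

LIWC reports these proportions for each text, which is what let us compare disclosure levels across categories, products, and reviewer types.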
I agree with you that alternative scalable methods to study sensitive information in large portions of text include opinion mining and sentiment analysis, as well as topic modeling.
The first two methods allow one to investigate point of view and subjectivity as they emerge from textual analysis and natural language processing (e.g. Liu, 2012). For example, opinion and sentiment analysis have been successfully applied to fake-review detection on Amazon (Liu, 2010). Even though opinion and sentiment mining are very powerful methodological approaches, they tend to focus on solving opinion-oriented classification problems (Liu, 2010, 2012). By focusing on sentiment and opinion, they would leave out an important component of sensitive information that LIWC can capture. As a consequence, they were not considered suitable for the scope of this study.
Topic modeling could definitely represent a valuable tool that we consider implementing in our future research, to address new, topic-oriented research questions. Yet, LIWC is built and tested for measuring the level of sensitivity, thus its functions were perfectly tailored to our purpose and scope!
Let me know if you have further questions or comments! Thanks again for your feedback :)
Federica

How might you use additional methods to understand how people interpret privacy and the use of their information, and why they are revealing it? Interesting work. However, the poster does not explain how the research fits into a larger IGERT project, or how it applies to controlling profiling or advertising.

Dear Prof. Pinel,
Thanks a lot for your interest in my project! Your questions bring up very important components and point to possible developments of this research. In fact, motivations, attitudes, values, and perceptions are fundamental aspects that intersect with privacy and data sharing online, and we mean to address them in the next steps of our work.

To answer the first part of your question: ideally, I believe it would be very interesting to reach out to people in our sample, selecting a variety of reviewers by paying attention to different “patterns of behavior” as well as to different types of reviewers. For example, we would include those who share a lot as well as those who share very little, those who use real names and those who do not, and so on. Reaching out to these individuals may be a challenge, but it would most likely be the most informative way to understand motivations, attitudes, values, and perceptions, compare them with actual behavior, and understand how these components influence sharing behavior. Interviews, for example, could follow the structure of those used in Islands of Privacy, a fascinating book in which Prof. Nippert-Eng investigates privacy motivations offline. But they would also address the context of Amazon and consumer websites, discussing possible negative consequences of sharing online to understand how users perceive those. Perhaps a more feasible alternative would be to select a number of participants for an experiment, let them use a retailing site (without priming them on privacy concerns, just asking them to freely interact with the site, browse products, vote, write reviews, etc.), observe their behaviors, and have follow-up interviews to understand their motivations.
Research on social capital (e.g. Ellison et al., 2010) would probably suggest that many perceive social media (and perhaps user-generated content as well) as a community and disclose high levels of information to facilitate tie creation and maintenance, online community formation, identity development, psychological reassurance, and self-expression, engaging in a negotiation that weighs these benefits against the risks of privacy loss.

As you pointed out, this project does not address the problem of profile control and management of personal information for advertising in its current stage. Yet its future developments are intended to include such a crucial direction. As existing research had not yet investigated the levels of sensitive information that people share on retailing websites such as Amazon, we thought it important to first study actual behaviors, so that we could tailor the directions for solving related privacy concerns in a more informed fashion. In fact, the conclusions of our study raise several open questions: first, whether it would be possible, using methods similar to ours, to provide usable warning indicators that inform end-users when they input privacy-sensitive reviews. A more ambitious (but perhaps more usable) system would also provide anonymizing suggestions when it finds certain reviews to be sensitive. Retaining the high quality of reviews found on Amazon while providing anonymizing suggestions would be a challenge for current socio-technical systems.
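As a rough sketch of what such a pre-submission warning indicator might look like (the word list and the 5% threshold here are purely illustrative assumptions, not part of our study):

```python
# Hypothetical sketch of a warning indicator: flag a draft review
# whose rate of sensitive-category words reaches a threshold before
# the user submits it. Word list and threshold are assumptions.
SENSITIVE = {"sick", "depressed", "divorce", "salary", "address", "surgery"}

def needs_warning(text: str, threshold: float = 0.05) -> bool:
    """Return True if the share of sensitive words reaches the threshold."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return False
    rate = sum(w in SENSITIVE for w in words) / len(words)
    return rate >= threshold

# e.g. warn the user before this review is posted publicly
needs_warning("After my surgery I was sick and depressed for weeks")  # True
```

A deployed system would of course need a validated dictionary and a calibrated threshold rather than these stand-ins.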

Thanks a lot for your feedback!
Federica

References

Ellison, N. B., Lampe, C., Steinfield, C., and Vitak, J. 2010. With a Little Help From My Friends: How Social Network Sites Affect Social Capital Processes. In Papacharissi, Z. (Ed.), A Networked Self: Identity, Community, and Culture on Social Network Sites. New York: Routledge.

Nippert-Eng, C. 2010. Islands of Privacy. The University of Chicago Press, Chicago.

This is a very interesting project. I always thought it odd that people would share so much personal information, and I always wondered how much of it was fake and submitted by paid reviewers. It looks like you have the data, so how did you control for what was real and what was likely paid for? Is there a way to identify how much other social media the reviewers use (references to Facebook, etc.) that might indicate how comfortable they are sharing their personal information? How does this project fit in with the larger IGERT project?

Dear Prof. Morse,
Thanks a lot for your insight and interest in our project! And thanks for the important questions you raise! I too find the amount of information that some people share about themselves and about others online unbelievable!

To address your interest in fake-review detection: Prof. Liu (2010, 2012) has conducted very interesting research using sentiment analysis and opinion mining for fake-review detection on Amazon. Informed by his findings and approach, as well as by other literature on fake and sponsored reviews, we tried to control for fake reviews in a variety of ways. In particular, we collected a number of variables that may help identify sponsored reviews. Amazon grants some reviewers badges (e.g. Hall of Fame, Vine Voice, Top Reviewer) to acknowledge their role within the Amazon community. These are either very active (probably professional) reviewers or reviewers who openly receive free products in exchange for reviewing them (Vine Voice). In addition, we used the variable “number of reviews posted” to divide reviewers into groups based on their engagement in the community, from those with 1-10 reviews all the way to those with more than 500; we identified five levels of engagement and considered them in our analysis. Regular and low-frequency reviewers (as opposed to frequent and seemingly professional ones) were often more likely to disclose sensitive information: they consistently tended to share higher levels of personal information across many categories, perhaps as a way to increase their participation in the Amazon community. We also considered the number of stars given as a potential intervening variable. Most of the reviewers in our sample did not belong to the Hall of Fame (99.8%), Top Reviewer (98.8%), or Vine Voice (96.4%) groups. The typical reviewer in our sample had published 30 reviews (SD = 216; range = 5675), with an average review length of 97 words (SD = 107.64; range = 2080). In addition, most reviews were positive: on a scale from 1 (worst) to 5 (best), the average number of stars was M = 4.26, SD = 1.23.
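The five-level engagement grouping described above could be sketched as follows. Only the 1-10 and 500+ boundaries are stated in our analysis; the intermediate cut-offs and level names below are illustrative assumptions:

```python
# Sketch of mapping "number of reviews posted" to an engagement level.
# Only the 1-10 and 500+ boundaries come from the study; the middle
# cut-offs and labels are assumed for illustration.
def engagement_level(n_reviews: int) -> str:
    if n_reviews <= 10:
        return "occasional"
    if n_reviews <= 50:
        return "regular"
    if n_reviews <= 150:
        return "frequent"
    if n_reviews <= 500:
        return "very frequent"
    return "likely professional"
```

Grouping reviewers this way is what allowed us to compare disclosure levels across engagement tiers in the ANOVA.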
We assumed it is unlikely that reviewers who post relatively few reviews have been paid to review. And we found that these “seemingly genuine” reviewers share more.

To answer your second question, it would be necessary to get consent from our reviewers to track their behavior across other social media, asking them to grant us access to their online identities. Provided we got enough people to collaborate and participate in such a study, I believe the results would improve our understanding of online behaviors across platforms. In fact, it would provide us with a more informed perspective on who these reviewers are and on how they use social media. Patterns of use and consumption are most likely an important element that influences one's willingness to disclose online, one's understanding of the possible consequences, and one's overall privacy concerns.

We believe this project fits into the larger IGERT project/mission/philosophy by providing a solid base that describes how individuals share information on retailing sites. Starting from the actual observed behavior that we measured here, we can now move on to address the privacy concerns that these behaviors may generate. In particular, as I also discussed in my answer to Prof. Pinel, the findings of our study raise several open questions: first, whether it would be possible, using methods similar to ours, to provide usable warning indicators that inform end-users when they input privacy-sensitive reviews. A more ambitious (but perhaps more usable) system would also provide anonymizing suggestions when it finds certain reviews to be sensitive. Retaining the high quality of reviews found on Amazon while providing anonymizing suggestions would be a challenge for current socio-technical systems. These interdisciplinary and integrative directions are fundamental aspects of the overarching IGERT project on electronic security and privacy.

Hi Federica,
Interesting project! Could you say more about the interesting finding that amazon.com reviewers overall revealed more about leisure activities, but the subgroup who appeared to use their real name/identity talked less about leisure? What do you think is going on there?

Dear Prof Sherman,
Thanks for your interest in our project! You suggest a very interesting angle of analysis and raise an important discussion topic in your question. I definitely agree that the one you point out is a very interesting finding. I believe it suggests fascinating (partly because unexpected!) connections between our research, research on social capital, and personality research (which I am not extremely familiar with… so thank you for pointing out this direction, and apologies for my simplifications in the answer!).

In general, research on social capital suggests that there may be a negative correlation between desired access to social capital and privacy concerns. In other words, higher levels of privacy are often correlated with decreased access to social capital (e.g. Ellison et al., 2011a, 2011b). This, together with our findings, suggests that individuals engage in a cost-benefit analysis before disclosing information: they weigh the potential privacy loss against their need for social support and access to social capital, often assigning a higher value to sociality and thus putting their information at potential risk. This may explain why people who use a real identity share more: they seek online community and social support.

But to address your question more directly, our results also seem to suggest that those who reveal their real name and their location may belong to the pessimistic personality group (I say “seem to” because our analysis does not directly measure personality types, but it would be an interesting new direction to pursue with our data!). Our real-name reviewers talk significantly more about sadness, concerns related to personal achievements, and health problems. Even though we did not find significant differences in the “negative emotion” category of words, our findings seem to point in the direction of pessimistic personalities as well as social strain (in the way you and your colleagues discuss in the 2009 paper “It’s all in how you view it: Pessimism, social relations, and life satisfaction in older adults with osteoarthritis”). I believe these users disclose information to access social support and improve their social capital. Research also seems to suggest that leisure is related to an optimistic personality (e.g. Heo & Lee, 2010). So this would be consistent with the idea that real-name reviewers are pessimistic types (I say idea, as it is not a hypothesis yet :) ). Your help would be very appreciated in formulating and testing one!

I would be very interested in hearing your opinion as well! Thanks a lot!
Federica

References

Ellison, N. B., Lampe, C., Steinfield, C., and Vitak, J. 2011a. With a Little Help From My Friends: How Social Network Sites Affect Social Capital Processes. In Papacharissi, Z. (Ed.), A Networked Self: Identity, Community, and Culture on Social Network Sites. New York: Routledge.

Dear Prof. Kofinas,
Thanks for your kind words, I am so happy to hear them! Doing research that matters and that captures people’s attention is such an important part of our work! And you pose great questions as well.
As you know, our poster (and ongoing research) investigates the level of sensitive information disclosed by users on online retailing websites. We believe this is useful for achieving two main goals. The first is to understand behavioral patterns in consumer reviews so that we can assess, in an informed way, what needs to be done in that context to improve users’ security and privacy. The second, improving users’ security and privacy, is our most important goal, and its achievement is the next step of this research. Before implementing solutions, we needed to investigate and understand actual disclosing behaviors and identify actual potential risks and problems. We believe this goal is extremely important for building upon the current directions of research on electronic security and privacy.
The privacy concerns related to self-disclosure online are often studied in the context of social media. But we believe that in such a context people are increasingly prompted to think about their privacy, partly as a consequence of the growing public discussion of privacy infringements and scandals in social media. Retailing sites are a different platform, perhaps with different dynamics and different problems. Perhaps people disclose more on platforms such as Amazon than they do on Facebook because they are not “primed” to think about privacy when posting a review, especially if they feel passionately about the product (for example, when reviewing baby products or health supplements). BUT Amazon reviews are PUBLIC. AND Amazon is crawler-friendly. (Which means: information can be easily exploited, generating risks of identity theft, psychological harm, privacy infringements, etc.) And yet, there are no studies that investigate patterns of disclosure of sensitive information on consumer review sites.
Our project is interdisciplinary, as it combines the perspectives of communication (social capital, online community, online identity formation), psychology/behavior (LIWC, sensitivity of information), and computer science (data mining, opinion mining, sentiment analysis, and the future implementation of tools that I will discuss in just a moment). In its present stage, our research combines the power of data mining techniques, the perspective of opinion mining and sentiment analysis, and the use of software such as Linguistic Inquiry and Word Count, developed and used in the domain of psychology (thus adding an important psychology-oriented component to our perspective and approach). Finally, the findings of this research open up directions for future, fundamentally interdisciplinary, research. In fact, the conclusions of our study raise several open questions: first, whether it would be possible, using methods similar to ours, to provide usable warning indicators that inform end-users when they input privacy-sensitive reviews. A more ambitious (but perhaps more usable) system would also provide anonymizing suggestions when it finds certain reviews to be sensitive. These directions, which we are undertaking as the next step of our research, more clearly combine the efforts, techniques, and perspectives of computer science, information and decision sciences, and the social sciences. Retaining the high quality of reviews found on Amazon while providing anonymizing suggestions would be a challenge for current socio-technical systems.

And YES, we believe there are motivations related to social capital. Research on social capital suggests that individuals share information to get social support, facilitate tie creation and maintenance, foster online community formation, support identity development and psychological reassurance, and express themselves. To test this hypothesis, though, we would need to interview reviewers or implement a survey… but we believe this would be an important development of our research!
Our hypothesis would be, for example, that real-name reviewers use Amazon to gather social capital, or that reviewers with a specific personality type do so.

I would be happy to get your feedback and insights! Thanks a lot!
Fede

Simon Madsen

Guest

May 22, 2013 | 02:31 p.m.

Of these three (identity development, psychological reassurance, and self-expression), in my understanding especially the first two could be related to narrativism, which I generally see as a more individual thing than a social one. Not a criticism, just thinking.

Hi Simon, thanks for your comment!
I only partially agree with your point, because I think there are social components of identity development (for example reinforcement and positive feedback – think about symbolic interactionism). Psychological reassurance also has a social component.
The same could be said for narratives. For example, I believe that narratives developed in private diaries are individual processes, whereas narratives shared online in blogs or other forms of user-generated content are performed narratives and, as such, inherently social, at least in part. I’d be happy to discuss this more!! :-)

It seems to me this might have something to do with the American cultural phenomenon of shopping for entertainment. Virtual shopping through electronic screens leaves out the social interaction, so perhaps this motivates excessive personal information sharing to make up for this void of interaction in real life. Interesting work!

Hi Geoffrey! I am glad you liked our project! Yes, I agree that an individualistic society may encourage this kind of phenomenon, due to a lack of social support offline. It would be interesting to validate this hypothesis with a cross-cultural study! Thanks for your feedback!