
My biggest takeaway from this year’s conference is that AAPOR is a very healthy organization. AAPOR attendees were genuinely happy to be at the conference, enthusiastic about AAPOR and excited about the conference material. Many participants consider AAPOR their intellectual and professional home base and really relished the opportunity to be around kindred spirits (often socially awkward professionals who are genuinely excited about our niche). All of the presentations I saw firsthand or heard about were solid and dense, and the presenters were excited about their work and their findings. Membership, conference attendance, journal and conference submissions and volunteer participation are all quite strong.

At this point in time, the field of survey research is encountering a set of challenges. Nonresponse is a growing challenge, and other forms of data and analysis are increasingly en vogue. I was really excited to see that AAPOR members are greeting these challenges and others head on. For this particular write-up, I will focus on these two challenges. I hope that others will address some of the other main conference themes and add their notes and resources to those I’ve gathered below.

As survey nonresponse becomes more of a challenge, survey researchers are moving from traditional measures of response quality (e.g. response rates) to newer measures (e.g. nonresponse bias). Researchers are increasingly anchoring their discussions about survey quality within the Total Survey Error framework, which offers a contextual basis for understanding the problem more deeply. Instead of focusing on an across the board rise in response rates, researchers are strategizing their resources with the goal of reducing response bias. This includes understanding response propensity (who is likely not to respond to the survey? Who is most likely to drop out of a panel study? What are some of the barriers to survey participation?), looking for substantive measures that correlate with response propensity (e.g. Are small, rural private schools less likely to respond to a school survey? Are substance users less likely to respond to a survey about substance abuse?), and continuous monitoring of paradata during the collection period (e.g. developing differential strategies by disposition code, focusing the most successful interviewers on the most reluctant cases, or concentrating collection strategies where they are expected to be most effective). This area of strategizing emerged in AAPOR circles a few years ago with discussions of nonresponse propensity modeling, a process which is surely much more accessible than it sounds, but it has really evolved into a practical and useful tool that can help any size research shop increase survey quality and lower costs.
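To make the idea of response propensity a little more concrete, here is a minimal sketch in Python. It is not a real propensity model (those are usually fit with something like a logistic regression on frame variables and paradata). It only computes observed response rates by subgroup, which is the crudest way to spot who is least likely to respond. The school sample, the field names and the `response_rates` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample frame for a school survey: (school_type, locale, responded).
# These cases are invented for illustration and do not come from any real study.
cases = [
    ("private", "rural", False),
    ("private", "rural", False),
    ("private", "rural", True),
    ("private", "urban", True),
    ("private", "urban", False),
    ("public", "rural", True),
    ("public", "rural", True),
    ("public", "urban", True),
    ("public", "urban", True),
    ("public", "urban", False),
]


def response_rates(cases):
    """Return the observed response rate for each (school_type, locale) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [responded, total]
    for school_type, locale, responded in cases:
        group = (school_type, locale)
        counts[group][0] += int(responded)
        counts[group][1] += 1
    return {group: done / total for group, (done, total) in counts.items()}


rates = response_rates(cases)

# The group with the lowest observed rate is a candidate for extra follow-up effort.
lowest = min(rates, key=rates.get)
for group, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(group, round(rate, 2))
```

In practice the same table could be refreshed during the collection period, so that follow-up effort goes where it is expected to matter most rather than toward raising the overall response rate.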

Another big takeaway for me was the volume of discussions and presentations that spoke to the fast-emerging world of data science and big data. Many people spoke of the importance of our voice in the realm of data science, particularly with our professional focus on understanding and mitigating errors in the research process. A few practitioners applied error frameworks to analyses of organic data, and some talks were based on analyses of organic data. This year AAPOR also sponsored a research hack to investigate the potential for Instagram as a research tool for Feed the Hungry. These discussions, presentations and activities made it clear that AAPOR will continue to have a strong voice in the changing research environment, and the task force reports and initiatives from both the membership and education committees reinforced AAPOR’s ability to be right on top of the many changes afoot. I’m eager to see AAPOR’s changing role take shape.

“If you had asked social scientists even 20 years ago what powers they dreamed of acquiring, they might have cited the capacity to track the behaviors, purchases, movements, interactions, and thoughts of whole cities of people, in real time.” – N.A. Christakis. 24 June 2011. New York Times, via Craig Hill (RTI)

AAPOR is a very strong, well-loved organization, and it is building a very strong future from a very solid foundation.

MORE DETAILED NOTES:

This conference is huge, so I could not possibly cover all of it on my own. Instead, I will try to share my notes as well as the notes and resources I can collect from other attendees. If you have any materials to share, please send them to me! The more information I am able to collect here, the better a resource it will be for people interested in AAPOR or the conference.

Patrick Ruffini assembled the tweets from the conference into this storify

Annie, the blogger behind LoveStats, had quite a few posts from the conference. I sat on a panel with Annie on the role of blogs in public opinion research (organized by Joe Murphy for the 68th annual AAPOR conference), and Annie blew me away by live-blogging the event from the stage! Clearly, she is the fastest blogger in the West and the East! Her posts from Anaheim included:

My full notes are available here (please excuse any formatting irregularities). Unfortunately, they are not as extensive as I would have liked, because wifi and power were in short supply. I also wish I had settled into a better seat and covered some of the talks in greater detail, including Don Dillman’s talk, which was a real highlight of the conference!

I believe Rob Santos’ professional address will be available for viewing or listening soon, if it is not already available. He is a very eloquent speaker, and he made some really great points, so this will be well worth your time.

Data cleaning has a bad rep. In fact, it has long been considered the grunt work of the data analysis enterprise. I recently came across a piece of writing in the Harvard Business Review that lamented the amount of time data scientists spend cleaning their data. The author feared that data scientists’ skills were being wasted on the cleaning process when they could be using their time for the analyses we so desperately need them to do.

I’ll admit that I haven’t always loved the process of cleaning data. But my view of the process has evolved significantly over the last few years.

As a survey researcher, my cleaning process used to begin with a tall stack of paper forms. Answers that did not make logical sense during the checking process sparked a trip to the file folders to find the form in question. The forms often held physical evidence of indecision on the part of the respondent, such as eraser marks or an explanation in the margin, which could not have been reflected properly by the data entry person. We lost this part of the process when we moved to web surveys. It sometimes felt like a web survey left the respondent no way to communicate with the researcher about their unique situations. Data cleaning lost its personalized feel and detective story luster and became routine and tedious.

Despite some of the affordances of the movement to web surveys, much of the cleaning process stayed rooted in the old techniques. Each form has its own id number, and the programmers would use those id numbers for corrections:

if id=1234567, set var1=5, set var7=62

At this point a “good programmer” would also document the changes for future collaborators:

*this person was not actually a forest ranger, and they were born in 1962
if id=1234567, set var1=5, set var7=62

Making these changes grew tedious very quickly, and the process seemed to drag on for ages. The researcher would check the data for potential errors, scour the records that could hold those errors for any kind of evidence of the respondent’s intentions, and then handle each form one at a time.

My techniques for cleaning data have changed dramatically since those days. My goal is to use id numbers as rarely as possible, and instead to ask myself questions like “how can I tell that these people are not forest rangers?” The answers to these questions evoke a subtly different technique:

* these people are not actually forest rangers
if var7=35 and var1=2 and var10 contains ‘fire fighter’, set var1=5

This technique requires honing and testing (adjusting the precision and recall), but I’ve found it to be far more efficient, faster, more comprehensive and, most of all, more fun (oh hallelujah!). It makes me wonder whether we have perpetually undercut the quality of the data cleaning we do simply because we hold the process in such low esteem.
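For readers who want to try this, here is a rough sketch of the rule-based approach in Python. It is only an illustration under my own assumptions: the records, the helper names (`rule_matches`, `apply_rule`, `precision_recall`) and the hand-checked ids are all hypothetical, and the variables keep whatever meaning they have in your own codebook. The rule itself is the one written out above.

```python
# Hypothetical records, invented for illustration only.
# var1 and var7 are coded fields; var10 is a free-text job description.
records = [
    {"id": 1, "var1": 2, "var7": 35, "var10": "Fire fighter, county station"},
    {"id": 2, "var1": 2, "var7": 35, "var10": "forest ranger"},
    {"id": 3, "var1": 2, "var7": 35, "var10": "volunteer fire fighter"},
    {"id": 4, "var1": 5, "var7": 35, "var10": "fire fighter"},
    {"id": 5, "var1": 2, "var7": 40, "var10": "fire fighter"},
]


def rule_matches(rec):
    """The rule from the text: var7=35 and var1=2 and var10 contains 'fire fighter'."""
    return (
        rec["var7"] == 35
        and rec["var1"] == 2
        and "fire fighter" in rec["var10"].lower()
    )


def apply_rule(recs):
    """Return corrected copies of the records plus the ids that were changed."""
    cleaned, changed = [], []
    for rec in recs:
        rec = dict(rec)  # copy, so the raw data stays untouched
        if rule_matches(rec):
            rec["var1"] = 5  # these people are not actually forest rangers
            changed.append(rec["id"])
        cleaned.append(rec)
    return cleaned, changed


def precision_recall(flagged_ids, truly_wrong_ids):
    """Compare the cases the rule changed with a hand-checked set of real errors."""
    flagged, truth = set(flagged_ids), set(truly_wrong_ids)
    hits = len(flagged & truth)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


cleaned, changed = apply_rule(records)

# Suppose a hand check of this sample found that ids 1, 3 and 5 were miscoded.
precision, recall = precision_recall(changed, [1, 3, 5])
print(changed, precision, round(recall, 2))
```

In this toy sample the rule never changes a record it should not (high precision), but it misses record 5 (lower recall). That gap is the signal to go back and loosen or extend the rule, which is the honing and testing step.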

So far I have not discussed data cleaning for other types of data. I’m currently working on a corpus of Twitter data, and I don’t see much of a difference in the cleaning process. The data types and programming statements I use are different, but the process is very close. It’s an interesting and challenging process that involves detective work, a better and growing understanding of the intricacies of the dataset, a growing set of programming skills, and a growing understanding of the natural language use in your dataset. The process mirrors the analysis to such a degree that I’m not really sure why it would be such a bad thing for analysts to be involved in data cleaning.

I’d be interested to hear what my readers have to say about this. Is our notion of the value and challenge of data cleaning antiquated? Is data cleaning a burden that an analyst should bear? And why is there so little talk about data cleaning, when we could all stand to learn so much from each other in the way of data structuring code and more?

Last night I acted as a mentor at the annual Career Exploration Expo sponsored by my graduate program. Many of the students had questions about developing a professional identity. This makes sense, of course, because graduate school is an important time for discovering and developing a professional identity.

People enter our program (and many others) with a wide variety of backgrounds and interests. They choose from a variety of classes that fit their interests and goals. And then they try to map their experience onto job categories. But boxes are difficult to climb into and out of, and students soon discover that none of the boxes is a perfect fit.

I experienced this myself. I entered the program with an extensive and unquestioned background in survey research. Early in my college years (while I was studying and working in neuropsychology) I began to manage a clinical dataset in SPSS. Working with patients and patient files was very interesting, but to my surprise working with data using statistical software felt right to me much in the way that Ethiopian meals include injera and Japanese meals include rice (IC 2006 (1997) Ohnuki Tierney Emiko). I was actually teased by my friends about my love of data! This affinity served me well, and I enjoyed working with a variety of data sets while moving across fields and statistical programming languages.

But my graduate program blew my mind. I felt like I had spent my life underwater and then discovered the sky and continents. I discovered many new kinds of data and analytic strategies, all of which were challenging and rewarding. These discoveries inspired me to start this blog and have inspired me to attend a wide variety of events and read some very interesting work that I never would have discovered on my own. Hopefully followers of this blog have enjoyed this journey as much as I have!

As a recent graduate, I sometimes feel torn between worlds. I still work as a survey researcher, but I’m inspired by research methods that are beyond the scope of my regular work. Another recent graduate of our program who is involved in market research framed her strategy in a way that really resonated with me: “I give my customers what they want and something else, and they grow to appreciate the ‘something else.’” That sums up my current strategy. I do the survey management and analysis that is expected of me in a timely, high-quality way. But I am also using my newly acquired knowledge to incorporate text analysis into our data cleaning process in order to streamline it, increasing both the speed and the quality of the process and making it better equipped to handle the data from future surveys. I do the traditional quantitative analyses, but I supplement them with analyses of the open-ended responses that use more flexible text analytic strategies. These analyses spark more quantitative analyses and make for much better (richer, more readable and more inspired) reports.

Our goal as professionals should be to find a professional identity that best capitalizes on our unique knowledge, skills and abilities. There is only one professional identity that does all of that, and it is the one you have already chosen and continue to choose every day. We are faced with countless choices about what classes to take, what to read, what to attend, what to become involved in, and what to prioritize, and we make countless assessments about each. Was it worthwhile? Did I enjoy it? Would I do it again? Each of these choices constitutes your own unique professional self, a self which you are continually manufacturing. You are composed of your past, your present, and your future, and your future will undoubtedly be a continuation of your past and present. The best career coach you have is inside of you.

Now your professional identity is much more uniquely or narrowly focused than the generic titles and fields that you see in the professional marketplace. Keep in mind that each job listing that you see represents a set of needs that a particular organization has. Is this a set of needs that you are ready to fill? Is this a set of needs that you would like to fill? You are the only one who knows the answers to these questions.

Because it turns out that you are your best career coach, and you have been all along.

My Grandma was a force to be reckoned with. My grandfather was a writer, and he described her driving down the street amidst symphonies. She was beautiful and stubborn, strong willed and sharp. Once a young woman with the good looks of a model, she wore high heels and took daily trips to the gym well into her 90’s. At the age of 94 she managed to run across her house, turn off the water and stand with her hand on her hip in front of the shower before I returned from the next room over with the shampoo I forgot (lest I waste water).

My Grandma, looking amazing

A few years ago I visited her in Florida. She collected work for all of her visitors to do, and we were busy from the moment I arrived. To my surprise, many of the tasks she had gathered involved dealing with customer service and discovering the truth in advertisements. At one point she led me into the local pharmacy with a stack of papers and asked to see the manager. Once she found the manager she began to go through the papers one by one and ask about them. The first paper on the stack was about the Magic Jack. He showed her the package, and she questioned him in depth about how it worked. I was shocked. I’d never thought of a store manager in this role before.

After that trip I began to pay closer attention to the ways in which the people around me dealt with customer service, and I became a kind of customer service liaison for my family. My older family members had an expectation that any customer service agent be both extensively knowledgeable and dependably respectful, but the problems of customer service seemed to have grown beyond this small, personable level to a point where a large network of people with structurally different areas of knowledge act together to form a question answering system. The amount and structure of knowledge necessary has become the focus of the customer service problem, and people everywhere complain about the lack of knowledge, ability and pleasant attitude of the customer service agents they encounter.

This is a problem with many layers and levels to it, and it is a problem that reflects the developing data science industry well. In order to deliver good customer service a great deal of information has to be organized and structured in a meaningful way to allow for optimal extraction. But this layer cannot be everything. The customer service interaction itself needs to be set up in such a way as to allow customers to feel satisfied. People expect personalized, accurate interactions that are structured in a way that is intuitive to them. The customer service experience cannot be the domain of the data scientists. If it is automated, it requires usability experts to develop and test systems that are intuitive and easy to use. If it is done by people, the people need to have access to the expertise necessary for them to do their job and be trained in successful interpersonal interaction. I believe that this whole system could be integrated well under a single goal: to provide timely and direct answers to customer inquiries in three steps or fewer.

The past few years have brought a rapid increase in customization. We have learned to expect the information around us to be customized, curated and preprocessed. We expect customer service to know intuitively what our problems are and answer them with ease. We expect Facebook to know what we want to see and customize our streams appropriately. We expect news sites to be structured to reflect the way we use them. This increase in demand and expectations is the drive behind our hunger for data science, and it will fuel a boom in data and information science positions until we have a ubiquitous underlayer of organized information across all necessary domains.

But data and information science are new fields and not well understood. Our expectations as users exceed the abilities of this fast-evolving field. We attract pioneers who are willing to step into a field that is changing shape beneath their feet as they work. But we ask for and expect too much of a result, because these pioneers can’t be everything across all fields. They are an important structural layer of our newly unfolding economy, but in each case, another layer of people is needed in order to achieve the end result.

Usability is an important step above the data and information science layer. Through usability studies, Facebook will eventually learn that people and goals are not constant across all visits. Sometimes I look at Facebook simply to see if I’ve missed any big developments in the lives of my friends and loved ones. Sometimes I want to catch news. Sometimes I’m bored and looking for ridiculous stuff to entertain me. Sometimes I have my daughter next to me and want to show her funny pet pictures that I normally wouldn’t look twice at. Through usability studies, Facebook will eventually learn that users need some control over the information presented to them when they visit.

Through usability studies, newspapers will better understand the important practice of headline scanning and develop pay models that work with people’s reading habits. Through qualitative research, newspapers will understand their importance as the originators of news about big events with few witnesses, like peace treaties and celebrity births and deaths, and the real value of social media for events with large numbers of witnesses and points of view. News media sources are deep in a period of transition where they are learning to better understand dissemination, virality, clicks, page views, reader behavior and reader expectations, and the strengths and weaknesses of social media news sources.

There have been many blog posts (like this one) about Isaac Asimov’s predictions for the future, because he was so right about so many things. At this point we’re at a unique vantage point where his notions of machine programmers and machine tenders are taking deeper shape. This year we will continue to see these changes form and reform around us.

I’m reading a book that I like to call “post-apocalyptic research methodology.” It’s ‘After Method: Mess in Social Science Research’ by John Law. At this point the book reads like a novel. I can’t quite imagine where he’ll take his premise, but I’m searching for clues and turning pages. In the meantime, I’ve been thinking quite a bit about failure, honesty, uncertainty and humility in research.

How is the current research environment like a utopian society?

The research process is often idealized in public spaces. Whether the goal of the researcher is to publish a paper based on their research, present to an audience of colleagues or stakeholders about their research, or market the product of their research, all researchers have a vested interest in the smoothness of the research process. We expect to approach a topic, perform a series of time-tested methods or develop innovative new methods with strong historical traditions, apply these methods as neatly as possible, and end up with a series of strong themes that describe the majority of our data. However, in Law’s words “Parts of the world are caught in our ethnographies, our histories and our statistics. But other parts are not, and if they are then this is because they have been distorted into clarity.” (p. 2) We think of methods as a neutral middle step and not a political process, and this way of thinking allows us to focus on reliability and validity as surface measures and not inherent questions. “Method, as we usually imagine it, is a system for offering more or less bankable guarantees.” (p. 9)

Law points out that research methods are, in practice, very limited in the social sciences: “talk of method still tends to summon up a relatively limited repertoire of responses.” (p. 3) Law also points out that every research method is inherently political. Every research method involves a way of seeing or a way of looking at the data, and that perspective maps onto the findings it yields. Different perspectives yield different findings, whether they are subtly or dramatically different. Law’s central assertion is that methods don’t just describe social realities but also help to create them. Recognizing the footprint of our own methods is a step toward better understanding our data and results.

In practice, the results that we focus on are largely true. They describe a large portion of the data, ascribing the rest of the data to noise or natural variation. When more of our data is described in our results, we feel more confident about our data and our analysis.

Law argues that this smoothed version of reality is far enough from the natural world that it should perk our ears. Research works to create a world that is simple and falls into place neatly and resembles nothing we know, “’research methods’ passed down to us after a century of social science tend to work on the assumption that the world is properly to be understood as a set of fairly specific, determinate, and more or less identifiable processes.” (p. 5) He suggests instead that we should recognize the parts that don’t fit, the areas of uncertainty or chaos, and the areas where our methods fail. “While standard methods are often extremely good at what they do, they are badly adapted to the study of the ephemeral, the indefinite and the irregular.” (p. 4). “Regularities and standardizations are incredibly powerful tools, but they set limits.” (p. 6)

Is the Utopia starting to fall apart?

The current research environment is a bit different from that of the past. More people are able to publish research at any stage without peer review using media like blogs. Researchers are able to discuss their research while it is in progress using social media like Twitter. There is more room to fail publicly than there ever has been before, and this allows for public acknowledgment of some of the difficulties and challenges that researchers face.

Building from ashes

Law briefly introduces his vision on p. 11: “My hope is that we can learn to live in a way that is less dependent on the automatic. To live more in and through slow method, or vulnerable method, or quiet method. Multiple method. Modest method. Uncertain method. Diverse method.”

Many modern discussions about management talk about the value of failure as an innovative tool. Some of the newer quality control measures in aviation and medicine hinge on the recognition of failure and the retooling necessary to prevent or limit the recurrences of specific types of events. The theory behind these measures is that failure is normal and natural, and we could never predict the many ways in which failure could happen. So, instead of exclusively trying to predict or prohibit failure, failures should be embraced as opportunities to learn.

Here we can ask: what can researchers learn from the failures of the methods?

The first lesson to accompany any failure is humility. Recognizing our mistakes entails recognizing areas where we fell short, where our efforts were not enough. Acknowledging that our research training cannot be universal, that applying research methods isn’t always straightforward and simple, and that we cannot be everything to everyone could be an important stage of professional development.

How could research methodology develop differently if it were to embrace the uncertain, the chaotic and the places where we fall short?

Another question: What opportunities do researchers have to be publicly humble? How can those spaces become places to learn and to innovate?

Note: This blog post is dedicated to Dr Jeffrey Keefer @ NYU, who introduced me to this very cool book and has done some great work to bring researchers together

I’ve been working on a post about humility as an organizational strategy. This is not that post, but it is also about humility.

I like to think of myself as a research methodologist, because I’m more interested in research methods than any specific area of study. The versatility of methodology as a concentration is actually one of the biggest draws for me. I love that I’ve been able to study everything from fMRI subjects and brain surgery patients to physics majors and teachers, taxi drivers and internet activists. I’ve written a paper on Persepolis as an object of intercultural communication and a paper on natural language processing of survey responses, and I’m currently studying migration patterns and communication strategies.

But a little dose of humility is always a good thing.

Yesterday I hosted the second in a series of online research, offline lunches that I’ve been coordinating. The lunches are intended as a way to get people from different sectors and fields who are conducting research on the internet together to talk about their work across the artificial boundaries of field and sector. These lunches change character as the field and attendees change.

I’ve been following the field of online research for many years now, and it has changed dramatically and continually before my eyes. Just a year ago Seth Grimes’ Sentiment Analysis Symposia were at the forefront of the field, and now I wonder if he is thinking of changing the title and focus of his events. Two years ago tagging text corpora with grammatical units was a standard midstep in text analysis, and now machine algorithms are far more common and often much more effective, demonstrating that grammar in use is far enough afield from grammar in theory to generate a good deal of error. Ten years ago qualitative research was often more focused on the description of platforms than the behaviors specific to them, and now the specific inner workings of platforms are much more of an aside to a behavioral focus.

The Association of Internet Researchers is currently having their conference in Denver (#ir14), generating more than 1000 posts per day under the conference hashtag and probably moving the field far ahead of where it was earlier this week.

My interest and focus have been on the methodology of internet research. I’ve been learning everything from qualitative methods to natural language processing and social network analysis to Bayesian methods. I’ve been advocating for a world where different kinds of methodologists work together, where qualitative research informs algorithms and linguists learn from the differences between theoretical grammar and machine learned grammar, a world where computer scientists work iteratively with qualitative researchers. But all of these methods fall short because there is an elephant in the methodological room. This elephant, ladies and gentlemen, is made of content. Is it enough to be a methodological specialist, swinging from project to project, grazing on the top layer of content knowledge without ever taking anything down to its root?

As a methodologist, I am free to travel from topic area to topic area, but I can’t reach the root of anything without digging deeper.

At yesterday’s lunch we spoke a lot about data. We spoke about how the notion of data means such different things to different researchers. We spoke about the form and type of data that different researchers expect to work with, how they groom data into the forms they are most comfortable with, how the analyses are shaped by the data type, how data science is an amazing term because just about anything could be data. And I was struck by the wide-openness of what I was trying to do. It is one thing to talk about methodology within the context of survey research or any other specific strategy, but what happens when you go wider? What happens when you bring a bunch of methodologists of all stripes together to discuss methodology? You lack the depth that content brings. You introduce a vast tundra of topical space to cover. But can you achieve anything that way? What holds together this wide realm of “research?”

We speak a lot about the lack of generalizable theories in internet research. Part of the hope for qualitative research is that it will create generalizable findings that can drive better theories and improve algorithmic efforts. But that partnership has been slow, and the theories have been sparse and lightweight. Is it possible that the internet is a space where theory alone just doesn’t cut it? Could it be that methodologists need to embrace content knowledge to a greater degree in order to make any of the headway we so desperately want to make?

Maybe the missing piece of the puzzle is actually the picture painted on the pieces?

In Jan Blommaert’s book, The Sociolinguistics of Globalization, I learned about the iconicity of language. Languages, dialects, phrases and words have the potential to be as iconic as the Statue of Liberty. As I read Blommaert’s book, I am also reading about Total Survey Error, which I believe to be an iconic concept in the field of survey research.

Total Survey Error (TSE) is a relatively new, albeit very comprehensive framework for evaluating a host of potential error sources in survey research. It is often mentioned by AAPOR members (national and local), at JPSM classes and events, and across many other events, publications and classes for survey researchers. But here’s the catch: TSE came about after many of us entered the field. In fact, by the time TSE debuted and caught on as a conceptual framework, many people had already been working in the field for long enough that a framework didn’t seem necessary or applicable.

In the past, survey research was a field that people grew into. There were no degree or certificate programs in survey research. People entered the field from a variety of educational and professional backgrounds and worked their way up through the ranks from data entry, coder or interviewing positions to research assistant and analyst positions, and eventually up to management. Survey research was a field that valued experience, and much of the essential job knowledge came about through experience. This structure strongly characterizes my own office, where the average tenure is fast approaching two decades. The technical and procedural history of the department is alive and well in our collections of artifacts and shared stories. We do our work with ease, because we know the work well, and the team works together smoothly because of our extensive history together. Challenges or questions are an opportunity for remembering past experiences.

Programs such as the Joint Program in Survey Methodology (JPSM, a joint venture between the University of Michigan and University of Maryland) are relatively new, arising, for the most part, once many survey researchers were well established in their routines. Scholarly writings and journals multiplied with the rise of the academic programs. New terms and new methods sprang up. The field gained an alternate mode of entry.

In sociolinguistics, we study evidentiality, because people value different forms of evidence. Toward this end, I did a small study of survey researchers’ language use and mode of evidentials and discovered a very stark split between those that used experience to back up claims and those who relied on research to back up claims. This stark difference matched up well to my own experiences. In fact, when I coach jobseekers who are looking for survey research positions, I draw on this distinction and recommend that they carefully listen to the types of evidentials they hear from the people interviewing them and try to provide evidence in the same format. The divide may not be visible from the outside of the field, but it is a strong underlying theme within it.

The divide is not immediately visible from the outside because the face of the field is formed by academic and professional institutions that readily embrace the academic terminology. The people who participate in these institutions and organizations tend to be long term participants who have been exposed to the new concepts through past events and efforts.

But I wonder sometimes whether the overwhelming public orientation to these methods doesn’t act to exclude some longtime survey researchers in some ways. I wonder whether some excellent knowledge and history get swept away with the new. I wonder whether institutions that represent survey research represent the field as a whole. I wonder what portion of the field is silent, unrepresented or less connected to collective resources and changes.

Particularly as the field encounters a new set of challenges, I wonder how well prepared the field will be: not just those who have been following these developments closely, but also those who have continued steadfast, strong, and with limited errors, not due to TSE adherence but due to the strength of their experience. To me, the Total Survey Error Method is a powerful symbol of the changes afoot in the field.

For further reference, I’m including a past AAPOR presidential address by Robert Groves:

Proceedings of the Fifty-First Annual Conference of the American Association for Public Opinion Research
Source: The Public Opinion Quarterly, Vol. 60, No. 3 (Autumn, 1996), pp. 471-513
ETA other references: