This past weekend, we presented the SurveyMan work for the first time, at the Off the Beaten Track workshop at POPL. I first want to say that PLASMA seriously represented. We had talks in each of the sessions. Though I didn't have the chance to see Charlie's talk on Causal Profiling, Dan said it definitely engendered discussion and that people in the audience were "nodding vigorously" in response to the work. Dimitar presented Data Debugging, which people clearly found provocative.

I was surprised by the audience's response to my talk; I know Emery had said that people whom he talked to were excited about this space, but sometimes that's hard to believe when you're a grad student chugging away at the implementation and theory behind the work. It was invigorating to be able to describe what we've done so far and hear enthusiastic feedback. In all my practice talks, I had focused on the language itself, but for OBT, at the behest of my colleagues, I took the debugging angle instead. Most of the people in the audience had used surveys for their research and were quite familiar with these problems. While language designers have tried to tackle surveys before, they frequently come from the perspective of embedding them in a language *they* already use. The approach we take leverages tools that our target audience uses. We limit the expressivity of the language and make statistical guarantees, which is what our users care about the most.

I had a few really interesting questions about system features. Someone made the point that bias cannot be entirely removed through redundancy -- that we can't know if we've found enough ways of expressing a question to control for the different underlying interpretations. In response, I suggested that we could think about using approaches from cross-language models to determine whether we have categorically the same questions. The idea is that if a set of questions produces the same distribution of responses, the questions are sufficiently similar. Of course, this approach neglects the non-local effects of question wording. Whether or not this can be controlled through question order randomization is something I'll have to think about more.
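To make the "same distribution of responses" idea concrete, here is a minimal sketch in Python. It is not the cross-language-model approach mentioned above, and it is not how SurveyMan works; it swaps in a plain permutation test on total variation distance as a stand-in. The function names and the response data are hypothetical, made up purely for illustration.

```python
import random
from collections import Counter


def total_variation(a, b):
    """Total variation distance between the empirical answer distributions of a and b."""
    ca, cb = Counter(a), Counter(b)
    options = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[o] / len(a) - cb[o] / len(b)) for o in options)


def permutation_p_value(a, b, trials=2000, seed=0):
    """Fraction of random relabelings whose distance is at least the observed one.

    A small value suggests the two wordings draw different response
    distributions; a large value means we found no evidence of a difference.
    """
    rng = random.Random(seed)
    observed = total_variation(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        shuffled = total_variation(pooled[:len(a)], pooled[len(a):])
        # Tiny tolerance so floating-point noise does not hide exact ties.
        if shuffled >= observed - 1e-12:
            hits += 1
    return hits / trials


# Hypothetical responses to three wordings of one yes/no question.
variant_a = ["yes"] * 40 + ["no"] * 60
variant_b = ["yes"] * 42 + ["no"] * 58
variant_c = ["yes"] * 80 + ["no"] * 20

if __name__ == "__main__":
    print(permutation_p_value(variant_a, variant_b))  # close wordings
    print(permutation_p_value(variant_a, variant_c))  # divergent wordings
```

A large p-value only says the test found no difference between two variants; it does not show they are interpreted the same way, and like the idea it illustrates, it ignores the non-local effects of question wording.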

As a follow-up question, I was also asked if we could reverse-engineer the distributions we get from the variants to identify different concepts. This was definitely not something I had considered before. I wasn't sure we would, in practice, have sufficient variants and responses to produce meaningful results, but it's something to consider as future work.

A lot of the other questions I had were about features of the system that I did not highlight. For example, I did not go into any detail about the language and its control flow. I was also asked if we were considering adding clustering and other automated domain-independent analyses, which I am working on right now. Quite a few of the concerns are addressed by our preference for breakoff over item-nonresponse. There was also an interesting ethics question about using our system to manipulate results. Of course, SurveyMan requires active participation from the survey designer; the idea is not to prevent the end-user from adding bias, but to illuminate its presence.