Statistical Machine Learning Analysis of Debian Mailing Lists

In this talk, I will discuss the use of state-of-the-art machine learning techniques to analyze Debian mailing lists in order to discover political, social, and technical patterns that could be used to inform project decisions. I will concentrate on a class of techniques known as statistical topic models, which automatically infer groups of semantically related words, known as topics, from word co-occurrence patterns in documents. The resulting topics can then be used to detect emergent areas of technical activity, identify subcommunities, and track trends over time. In addition to providing a brief overview of statistical topic models and their application to Debian mailing list data, I will present examples of topics inferred from Debian mailing lists, as well as some preliminary political, social, and technical findings discovered via these topics.
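
To make the idea of inferring topics from word co-occurrence concrete, here is a minimal sketch of one widely used topic model, latent Dirichlet allocation, fitted by collapsed Gibbs sampling. The abstract does not say which topic model the talk uses, so the choice of model, the function names `fit_topics` and `top_words`, the hyperparameter values, and the four toy "messages" are all assumptions made for illustration; none of it is real Debian mailing list data.

```python
import random
from collections import Counter


def fit_topics(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of token lists (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every token to a random topic.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Weight each topic by how strongly this document and this
                # word are already associated with it.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n highest-count words for each topic."""
    return [
        [w for w, c in counts.most_common() if c > 0][:n]
        for counts in topic_word
    ]


# Hypothetical toy corpus: two "technical" and two "political" messages.
docs = [
    "package upload build dependency".split(),
    "build dependency package bug".split(),
    "vote election leader ballot".split(),
    "election ballot vote leader".split(),
]

topic_word, doc_topic = fit_topics(docs)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {' '.join(words)}")
```

Words that tend to appear in the same messages are pulled toward the same topic, and each row of `doc_topic` records how many of a message's tokens were assigned to each topic. Per-message topic counts of this kind are what would be aggregated by author or by date to look for subcommunities or trends. A real analysis would also need tokenization, stop-word removal, and far more data than this sketch uses.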

Hanna Wallach is a senior postdoctoral research associate at the University of Massachusetts Amherst, where she develops machine learning techniques for identifying and answering social science questions. In her not-so-spare time, Hanna used to maintain some Debian packages and has run several projects that encourage and promote women's involvement in free software development -- most notably Debian Women, with Erinn Clark and Helen Faulkner.