Archetypal Internet-scale source code searching

Abstract

In recent years, a number of search engines have appeared on the Internet for the purpose of searching repositories consisting of hundreds of millions of lines of Open Source code. However, it is not clear precisely what, how, and why programmers search for code on the Internet. To this end, we conducted a web-based survey to understand the source code searching behavior of programmers, specifically their search motivations, search targets, tools used, and code selection criteria. Data was collected from 69 respondents, including 58 specific examples of searches. We applied open coding to these anecdotes and found two main search motivations, as well as a range of sizes for search targets. The first motivation was searching for source code that could be dropped into a project and used immediately. The second motivation was searching for source code to serve as a reference example. The targets of these searches ranged in size from a block (a few lines of code), to a subsystem (e.g., a library or API), to an entire system. Using combinations of motivations and target sizes, nine archetypes were identified. The overall search process is also discussed. The paper concludes with some guidance on how to use these archetypes to create personas for designing Internet-scale source code search engines and to generate scenarios for evaluating these tools.