SUITE 2012 – Proceedings

Preface

The fourth international workshop on Search-driven Development – Users, Infrastructure, Tools, and Evaluation (SUITE 2012) focuses on exploring the notion of search as a fundamental activity during software development, and on all aspects of integrating search into software development workflows. Two primary observations encouraged us to start this workshop series. First, results from existing research show that developers spend the majority of their time searching for code. Furthermore, developers struggle with a myriad of additional information needs, such as those related to design or requirements documents, or even to the communication between the various stakeholders in a development project. Second, there has recently been considerable effort from both academia and industry in building specialized search tools for software developers, in particular large-scale code search engines.
The three previous editions of SUITE have made significant contributions to this research area, with many good papers published at the workshop and further extended into major conference papers and a textbook chapter. In this edition, we look beyond source code with the following general theme: Code search and beyond: developing pragmatic search solutions for more effective software development. The program put together by the program committee and ourselves covers three main topics: similarity search and duplicate removal, program analysis, and automatic programming. With this wide variety of topics, we hope that SUITE 2012 will be a venue to discuss problems and the state of the art, and to share ideas and results. Along with these topics, our program also includes a keynote talk from an industrial researcher and many vibrant group discussions to set future directions in the area of search-driven development. We look forward to seeing you all and to making SUITE 2012 a great success again.

Automatically building programs has been a research goal for over 40 years. Code search technology, particularly code search combined with directed program transformations and validation, has the potential to address many of the problems related to automatic programming. In this position paper we outline an approach to using code search as a tool for generating moderate-sized programs, define three problems that will need to be addressed, and describe our first steps toward solving those problems.

Code search has always been essential to software development; it is the cornerstone of activities such as program comprehension and maintenance. Traditionally, code search required learning complex query languages with very steep learning curves.
Meanwhile, programming environments for mobile devices that target novice programmers are becoming popular, and code search is becoming increasingly important. Yet dedicated code query languages present a learning barrier for novice programmers.
In this paper we consider search-by-example as a way of dealing with this problem. Given a query code snippet, we find all similar snippets in the codebase and present them to the user. This problem is a special instance of the clone detection problem, and, by using relevant techniques, we can perform precise code search with little to no configuration, independent of code formatting, variable renamings, etc. These properties make search-by-example very easy for inexperienced programmers to use.
We built a prototype of our approach in TouchDevelop, a novel mobile app development environment for Windows Phone. We will use it as a testing ground for future evaluation.
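
The idea of matching snippets regardless of formatting and variable names can be illustrated with a minimal sketch. It is not the TouchDevelop prototype: the toy tokenizer, the `KEYWORDS` set, and the function names `tokenize`, `normalize`, and `search` are all assumptions made for illustration, and real clone detectors use considerably more robust techniques.

```python
import re

# Hypothetical keyword set for a toy language; a real tool would rely on
# the lexer of the language being searched.
KEYWORDS = {"if", "else", "while", "for", "return"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split code into tokens, discarding all whitespace and layout."""
    return TOKEN.findall(code)


def normalize(tokens):
    """Rename identifiers to id0, id1, ... in order of first appearance."""
    names = {}
    out = []
    for tok in tokens:
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            tok = names.setdefault(tok, "id%d" % len(names))
        out.append(tok)
    return out


def search(query, codebase):
    """Return (snippet name, token offset) pairs that match the query
    after whitespace removal and consistent identifier renaming."""
    q = normalize(tokenize(query))
    hits = []
    for name, code in codebase.items():
        toks = tokenize(code)
        for i in range(len(toks) - len(q) + 1):
            if normalize(toks[i:i + len(q)]) == q:
                hits.append((name, i))
    return hits


codebase = {"a": "sum=sum+item", "b": "sum = item + sum"}
hits = search("total = total + x", codebase)
```

Here the query matches snippet `a` despite different spacing and names, but not snippet `b`, where the operands appear in a different order.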

As the quantity of software artifacts, mainly source code and software models, stored in repositories increases, the need for their efficient search becomes more important. In this paper we propose a content-based query (a.k.a. query-by-example) approach for searching software model repositories, in order to retrieve significant models or model fragments. The query-by-example search conveys the user need in the form of a model or pattern specified in a coarse way. Our approach incorporates analysis and indexing of models using textual information retrieval techniques, which exploit the knowledge of the metamodel the models conform to. This allows us to explore different segmentation granularities on models and different indexing techniques, ranging from a simple bag of words to index structures that integrate metamodel information. We detail the proposed theoretical framework and the implementation of the method upon open-source architectures, and we discuss the results of our experiments upon a public dataset of UML models.
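
As a rough illustration of a bag-of-words index that takes metamodel information into account, the sketch below ranks models by cosine similarity to a query model. The model representation (a list of `(kind, name)` pairs), the `KIND_WEIGHT` values, and all function names are assumptions chosen for this example, not the framework described in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical weights: words from class names count more than words
# from attribute names. Real weights would have to be tuned.
KIND_WEIGHT = {"class": 2.0, "attribute": 1.0}


def words(name):
    """Split a camelCase element name into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def bag(model):
    """Build a weighted bag of words from (metamodel kind, name) pairs."""
    b = Counter()
    for kind, name in model:
        for w in words(name):
            b[w] += KIND_WEIGHT.get(kind, 1.0)
    return b


def cosine(a, b):
    """Cosine similarity between two weighted bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query, repository):
    """Return (score, model name) pairs, best match first."""
    q = bag(query)
    scored = [(cosine(q, bag(m)), name) for name, m in repository.items()]
    return sorted(scored, reverse=True)


repository = {
    "shop": [("class", "Customer"), ("class", "Order"),
             ("attribute", "orderDate")],
    "library": [("class", "Book"), ("attribute", "title")],
}
ranking = rank([("class", "Order"), ("attribute", "date")], repository)
```

With this toy repository, the coarse query model ranks `shop` first, because it shares the words `order` and `date` with it, and gives `library` a score of zero.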

We propose an interactive querying approach for program analysis and comprehension tasks. In our approach, an analyst uses a set of basic filters (information retrieval, structural, quantitative, and user selection) to define complex queries. These queries are built following an interactive and iterative process where basic filters are selected and executed, and their results displayed, changed, and combined using predefined operators.
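
One way to picture basic filters combined through predefined operators is the sketch below. The `Entity` record, the four filter constructors, and the operators `both`, `either`, and `without` are hypothetical names introduced here for illustration; the abstract does not specify the actual filter or operator set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A program entity as an analyst might see it (hypothetical fields)."""
    name: str
    kind: str
    loc: int


# Basic filters: each maps a set of entities to a subset of it.
def by_text(term):
    """Information-retrieval style filter on the entity name."""
    return lambda es: {e for e in es if term.lower() in e.name.lower()}


def by_kind(kind):
    """Structural filter on the entity kind."""
    return lambda es: {e for e in es if e.kind == kind}


def by_min_loc(n):
    """Quantitative filter on size in lines of code."""
    return lambda es: {e for e in es if e.loc >= n}


def by_selection(names):
    """User-selection filter: keep entities picked by hand."""
    return lambda es: {e for e in es if e.name in names}


# Operators: build a complex query out of simpler ones.
def both(f, g):
    return lambda es: f(es) & g(es)


def either(f, g):
    return lambda es: f(es) | g(es)


def without(f, g):
    return lambda es: f(es) - g(es)


entities = {
    Entity("parseQuery", "method", 40),
    Entity("QueryParser", "class", 300),
    Entity("render", "method", 120),
    Entity("parseConfig", "method", 5),
}

# Step 1: methods mentioning "parse"; step 2: keep only the larger ones;
# step 3: add an entity the analyst selected by hand.
step1 = both(by_text("parse"), by_kind("method"))
step2 = both(step1, by_min_loc(30))
step3 = either(step2, by_selection({"render"}))
result = {e.name for e in step3(entities)}
```

Because every filter and every combined query has the same shape (a function from a set of entities to a set of entities), intermediate results can be inspected after each step and then reused in the next one.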

Clustering is of great practical value in retrieving reusable requirements artifacts from ever-growing software project repositories. Despite the development of automated cluster labeling techniques in information retrieval, little is understood about automatic labeling of requirements clusters. In this paper, we review the literature on cluster labeling, and conduct an experiment to evaluate how automated methods perform in labeling requirements clusters. The results show that differential labeling outperforms cluster-internal labeling, and that the hybrid method does not necessarily lead to the labels best matching human judgment. Our work sheds light on improving automated ways to support search-driven development.
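
The difference between cluster-internal and differential labeling can be seen in a small sketch. It uses raw term frequency for the internal label and a simple difference of relative frequencies for the differential label; this scoring is an assumption made for brevity (differential methods in the literature typically use measures such as mutual information or chi-square), and the requirement texts are invented.

```python
from collections import Counter


def term_counts(docs):
    """Count whitespace-separated lowercase terms across documents."""
    return Counter(w for d in docs for w in d.lower().split())


def internal_label(clusters, name):
    """Cluster-internal: the most frequent term inside the cluster."""
    counts = term_counts(clusters[name])
    return min(counts, key=lambda w: (-counts[w], w))


def differential_label(clusters, name):
    """Differential: the term that is most over-represented in the
    cluster compared with all other clusters."""
    inside = term_counts(clusters[name])
    outside = term_counts(
        d for other, docs in clusters.items() if other != name for d in docs
    )
    n_in = sum(inside.values())
    n_out = sum(outside.values()) or 1

    def score(w):
        return inside[w] / n_in - outside[w] / n_out

    return min(inside, key=lambda w: (-score(w), w))


# Invented requirement statements, grouped into two clusters.
clusters = {
    "auth": [
        "the user shall login with user password",
        "the user shall reset the user password",
    ],
    "report": [
        "the user shall export the user report",
        "the user shall print the user report",
    ],
}
internal = internal_label(clusters, "auth")
differential = differential_label(clusters, "auth")
```

In this toy data the internal label for the first cluster is `user`, a term that is equally common in the other cluster, while the differential label is `password`, which distinguishes the cluster from the rest.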

This paper presents an Eclipse plug-in that provides source code similarity search over source code available on the Internet. We show how our Linked Data repository (SeCold) and scalable clone search approach (SeClone) can provide the enabling technology for an open Internet-scale similarity search service.

Research on software reuse over the last decade has removed many obstacles to its practical adoption. However, despite the claims in the software reuse literature of the 1990s, there are still some fundamental research challenges to be addressed, especially the problem of delivering "good" (i.e. high quality) search results with high precision and semantic recall. In terms of precision, one of the most promising approaches to have emerged in recent years is test-driven search, which only includes components in the result set that actually match a developer’s behavioral requirements as defined by a test case. However, the test-driven search prototypes available today have a low “semantic recall” because they are unable to find semantically matching components which have the wrong syntactic interface. In this paper we describe an automatic adaptation engine that alleviates this problem by automatically creating adapters to allow semantically mismatching components to be tested by test-driven search engines, thus significantly enhancing their semantic recall.
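
A minimal sketch of the general idea, under strong simplifying assumptions: components are plain functions, a test case is a list of `(arguments, expected result)` pairs, and the only interface mismatch considered is parameter order. The names `passes`, `adapters`, and `search_by_test` are hypothetical, and the adaptation engine described in the paper is not reproduced here.

```python
from itertools import permutations


def passes(fn, tests):
    """Run a callable against (args, expected) pairs; any error fails."""
    for args, expected in tests:
        try:
            if fn(*args) != expected:
                return False
        except Exception:
            return False
    return True


def adapters(component, arity):
    """Yield (parameter order, adapter) pairs, one per argument permutation."""
    for order in permutations(range(arity)):
        def adapter(*args, _order=order):
            # Reorder the caller's arguments before delegating.
            return component(*(args[i] for i in _order))
        yield order, adapter


def search_by_test(components, tests):
    """Return (component name, parameter order) for every component that
    passes the tests directly or through a reordering adapter."""
    arity = len(tests[0][0])
    results = []
    for name, component in components.items():
        for order, adapter in adapters(component, arity):
            if passes(adapter, tests):
                results.append((name, order))
                break
    return results


# The developer wants power(base, exp).
tests = [((2, 3), 8), ((3, 2), 9)]
components = {
    "pow_swapped": lambda exp, base: base ** exp,  # right behavior, wrong interface
    "add": lambda a, b: a + b,                     # wrong behavior
    "power": lambda base, exp: base ** exp,        # matches directly
}
matches = search_by_test(components, tests)
```

Without adapters, only `power` would pass the tests; with them, `pow_swapped` is also found through the order `(1, 0)`, which is the kind of gain in semantic recall the abstract refers to.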

Modern software development requires a large investment in learning application programming interfaces (APIs).
Recent research found that the learning materials themselves are often inadequate: developers struggle to find answers beyond simple usage scenarios.
Solving these problems requires a large investment in tool and search engine development.
To identify where further investment would be most useful, we ran a study with 19 professional developers to explore what a solution might look like, free of technical constraints. In this paper, we report on design implications of tools for API learning, grounded in the reality of the professional developers themselves.
The recurring themes in the participants' feedback were trustworthiness, confidentiality, information overload and the need for code examples as first-class documentation artifacts.

A search-based recommendation system looks, in the code repository,
for programs that are relevant to the program being edited.
Storing a large number of open source programs in the repository
improves the search results, but also causes the code clone
problem; i.e., recommending a set of program fragments that are
almost identical. To tackle this problem, we propose a novel approach that
ranks recommended programs by taking their "freshness" count into
account. This short paper discusses the background of the problem, and
illustrates the proposed algorithm.