Several research tools and projects require groups of similar code changes as input. Examples are recommendation and bug finding tools that can provide valuable information to developers based on such data. With the help of similar code changes they can simplify the application of bug fixes and code changes to multiple locations in a project. But despite their benefit, the practical value of existing tools is limited, as users need to manually specify the input data, i.e., the groups of similar code changes.

To overcome this drawback, this paper presents and evaluates two syntactical similarity metrics, one of them is specifically designed to run fast, in combination with two carefully selected and self-tuning clustering algorithms to automatically detect groups of similar code changes.

We evaluate the combinations of metrics and clustering algorithms by applying them to several open source projects and also publish the detected groups of similar code changes online as a reference dataset. The automatically detected groups of similar code changes work well when used as input for LASE, a recommendation system for code changes.