Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow

Abstract

Programmers often consult an online Q&A forum such as Stack Overflow to learn new APIs.
This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow.
To reduce manual assessment effort, we design Maple, an API usage mining approach
that extracts patterns from over 380K Java repositories on GitHub and subsequently
reports potential API usage violations in Stack Overflow posts.
We analyze 217,818 Stack Overflow posts using Maple and find that around 31% of them
have potential API usage violations that may produce symptoms such as program
crashes and resource leaks. Such API misuse has three main causes:
missing control constructs, missing or incorrect order of API calls, and
incorrect guard conditions. Even the posts that are accepted as correct answers or
upvoted by other programmers are not necessarily more reliable than other posts in
terms of API misuse. This finding calls for a new human-in-the-loop approach
to augment Stack Overflow code snippets and help the user consider better or
alternative API usage.
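
The three causes of misuse named in the abstract can be illustrated with a minimal Java sketch. The class and method names below (ApiMisuseSketch, firstUnguarded, firstGuarded, parseOrDefault, readFirstLine) are hypothetical and chosen only for illustration. The snippets are not taken from the study's data set or from the patterns mined by Maple; they show what each category might look like with standard JDK APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative only: hypothetical examples, not code from the study.
public class ApiMisuseSketch {

    // Incorrect guard condition (misuse): next() is called without checking
    // hasNext(), so an empty list raises NoSuchElementException.
    static String firstUnguarded(List<String> items) {
        return items.iterator().next();
    }

    // Guarded variant: returns null when there is nothing to read.
    static String firstGuarded(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.hasNext() ? it.next() : null;
    }

    // Control construct present: without this try/catch, malformed input
    // would propagate NumberFormatException to the caller.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Order of API calls: the reader is read first and closed afterwards.
    // try-with-resources calls close() even if readLine() throws.
    static String readFirstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstGuarded(List.of()));
        System.out.println(parseOrDefault("42x", -1));
        System.out.println(readFirstLine("hello\nworld"));
        try {
            firstUnguarded(List.of());
        } catch (NoSuchElementException e) {
            System.out.println("unguarded next() failed on an empty list");
        }
    }
}
```

A snippet shaped like firstUnguarded works on the happy path, which is one plausible reason such code can be accepted or upvoted while still crashing on edge cases.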

BibTeX Reference

@inproceedings{ReliableQA2018,
author = {Tianyi Zhang and Ganesha Upadhyaya and Anastasia Reinhardt and Hridesh Rajan and Miryung Kim},
title = {Are Code Examples on an Online Q\&A Forum Reliable? A Study of API Misuse on Stack Overflow},
booktitle = {ICSE'18: The 40th International Conference on Software Engineering},
location = {Gothenburg, Sweden},
month = {May 27--June 3, 2018},
year = {2018},
entrysubtype = {conference},
abstract = {
Programmers often consult an online Q\&A forum such as Stack Overflow to learn new APIs.
This paper presents an empirical study on the prevalence and severity of API misuse on Stack Overflow.
To reduce manual assessment effort, we design Maple, an API usage mining approach
that extracts patterns from over 380K Java repositories on GitHub and subsequently
reports potential API usage violations in Stack Overflow posts.
We analyze 217,818 Stack Overflow posts using Maple and find that around 31% of them
have potential API usage violations that may produce symptoms such as program
crashes and resource leaks. Such API misuse has three main causes:
missing control constructs, missing or incorrect order of API calls, and
incorrect guard conditions. Even the posts that are accepted as correct answers or
upvoted by other programmers are not necessarily more reliable than other posts in
terms of API misuse. This finding calls for a new human-in-the-loop approach
to augment Stack Overflow code snippets and help the user consider better or
alternative API usage.
}
}