Abstract: In recent years, offensive, abusive and hateful language, sexism, racism and
other types of aggressive and cyberbullying behavior have been manifesting with
increased frequency on many online social media platforms. Past scientific work
has focused on studying these forms of behavior on popular platforms such as
Facebook and Twitter. Building on such work, we present an 8-month study of the
various forms of abusive behavior on Twitter, in a holistic fashion. Departing
from past work, we simultaneously examine a wide variety of labeling schemes
that cover different forms of abusive behavior. We propose an
incremental and iterative methodology that utilizes the power of crowdsourcing
to annotate a large-scale collection of tweets with a set of abuse-related
labels. By applying our methodology, which includes statistical analysis for
label merging or elimination, we identify a reduced but robust set of labels.
Finally, we offer a first overview of, and findings from, our collected and
annotated dataset of 100 thousand tweets, which we make publicly available for
further scientific exploration.