Abstract

Raw but valuable user data is continuously generated on social media platforms. This data becomes more valuable when it is mined using approaches such as machine learning. User-generated data can also potentially help save lives, especially those of vulnerable social media users, as several studies have shown a correlation between social media and suicide. In this study, we aim to contribute to research on suicide communication on social media. We measured the performance of five machine learning algorithms, Prism, Decision Tree, Naïve Bayes, Random Forest and Support Vector Machine, in classifying suicide-related text from Twitter. The results showed that the Prism algorithm outperformed the other machine learning algorithms, with an F-measure of 0.84 for the target classes (Suicide and Flippant). To the best of our knowledge, this is the highest performance that has been achieved in classifying suicide-related text from social media.