Empirical Study of Features and Unsupervised Sentiment Analysis Techniques for Depression Detection in Social Media (Published)
This study provides an empirical evaluation of traditional machine learning, deep learning, and unsupervised techniques, based on diverse feature sets, for the problem of detecting depression among Twitter and Reddit users. Its main objective is to identify the most appropriate features, document representations, and text classifiers for depression detection on social media, covering both microblogs, such as tweets, and macroblogs, such as Reddit posts. For features, the investigation concentrates on linguistic characteristics, blogging behavior, and topics; for document representation, on multi-word terms and word embeddings (WE); and for text clustering, on unsupervised learning. The best approaches in the literature are selected as baselines and examined in practice on a dataset of depressive and non-depressive blog posts constructed for this work. Integrations and ensembles of the selected baselines are also evaluated in order to recommend a design for an effective social media blog classifier based on unsupervised learning and WE document representation. The experiments show that a stacking ensemble of Adam deep learning with self-organizing map (SOM) clustering followed by agglomerative hierarchical clustering, using topic features and pre-trained word2vec embeddings, achieved an accuracy of more than 92% on the Twitter and Reddit depression analysis datasets.
Keywords: analysis, depression, social media, computing, data analytics, empirical study
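The abstract does not give implementation details, so the following is only a minimal sketch, under stated assumptions, of the two-stage idea it names: represent each post as the average of its word embeddings, cluster the vectors with a self-organizing map (SOM), then group the SOM prototypes with agglomerative hierarchical clustering. The toy 2-D table `TOY_EMBEDDINGS` and the example posts are hypothetical stand-ins for pre-trained word2vec vectors and real tweets or Reddit posts; the function names, the 2x2 grid, the learning-rate and neighborhood schedules, and average linkage are illustrative choices, not the authors' settings. The Adam-trained deep learning model and the stacking ensemble are not shown.

```python
import math
import random

# Hypothetical toy 2-D "embeddings" (an assumption for illustration); a real
# system would load pre-trained word2vec vectors with far more dimensions.
TOY_EMBEDDINGS = {
    "sad": (1.0, 0.0), "hopeless": (0.9, 0.1), "tired": (0.8, 0.2),
    "happy": (0.0, 1.0), "great": (0.1, 0.9), "fun": (0.2, 0.8),
}


def doc_vector(text, emb, dim):
    """Represent a post as the mean of the embeddings of its known tokens."""
    known = [emb[tok] for tok in text.lower().split() if tok in emb]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[j] for v in known) / len(known) for j in range(dim)]


def bmu(weights, x):
    """Index of the best-matching unit (closest prototype) for vector x."""
    return min(range(len(weights)), key=lambda u: math.dist(weights[u], x))


def train_som(data, rows, cols, epochs, lr0, sigma0, seed=0):
    """Online self-organizing map on a rows x cols grid with a Gaussian neighborhood."""
    rng = random.Random(seed)
    n = len(data)
    # Initialize prototypes from the data (cycling if there are more units than samples).
    weights = [list(data[i % n]) for i in range(rows * cols)]
    total, t = epochs * n, 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for idx in order:
            frac = t / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.05)  # shrinking neighborhood radius
            x = data[idx]
            br, bc = divmod(bmu(weights, x), cols)
            for u, w in enumerate(weights):
                r, c = divmod(u, cols)
                d2 = (r - br) ** 2 + (c - bc) ** 2    # squared distance on the grid
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for j in range(len(w)):
                    w[j] += lr * h * (x[j] - w[j])    # pull unit toward the sample
            t += 1
    return weights


def agglomerate(points, k):
    """Average-linkage agglomerative clustering; returns a cluster label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair (b > a, so index a stays valid)
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def cluster_posts(posts, emb, k=2, rows=2, cols=2, epochs=50, lr0=0.3, sigma0=0.5):
    """Two-stage clustering: SOM prototypes first, then agglomerative clustering of them."""
    dim = len(next(iter(emb.values())))
    vecs = [doc_vector(p, emb, dim) for p in posts]
    prototypes = train_som(vecs, rows, cols, epochs, lr0, sigma0)
    unit_labels = agglomerate(prototypes, k)
    # Each post inherits the cluster label of its best-matching SOM unit.
    return [unit_labels[bmu(prototypes, v)] for v in vecs]


posts = [
    "I feel sad and hopeless",
    "so tired and sad",
    "what a great fun day",
    "happy and great",
]
labels = cluster_posts(posts, TOY_EMBEDDINGS)
print(labels)  # → [0, 0, 1, 1]
```

The cluster IDs are arbitrary; deciding which cluster corresponds to depressive posts would need labeled data or manual inspection. A common motivation for running a SOM before agglomerative clustering is cost: the SOM compresses many documents into a small set of prototypes, so the pairwise-distance step of agglomerative clustering runs over a handful of units rather than every post.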