Chronological detection of depression in social media threads by means of natural language processing
Date: 2023-11-28
Author: Garin Aldezabal, Asier
Abstract
Detecting depression in social media has become an increasingly important research area
in recent years. With the widespread use of social media platforms, individuals at risk of
suicide often express their thoughts and emotions online, providing an opportunity for
early detection and intervention.
Artificial Intelligence and, particularly, Natural Language Processing open pathways
towards the processing of massive amounts of messages and the detection of depression
traits and other risks related to mental health. Our main thesis question concerns the early
detection of depression in social media messages. We explore the accuracy
gained by a system as more and more information (in terms of more social media messages over
time) from a user becomes available. Does the system keep becoming more accurate given
subsequent information, or is there a limit? How many messages do we need to train a
simple model capable of attaining an accuracy above a given threshold? Do recent messages add
much information to older ones? These are the research questions addressed in our work.
A key cornerstone of artificial intelligence-based approaches is, needless to say,
the available data-sets. The available data bounds the ability of the system to gain
knowledge. Thus, an important part of this work consists of an overview of the data-sets
used to detect depression in social media, also mentioning various additional data-sets along
the way. In our study we found that there are international challenges devoted to this task,
CLPsych among others.
We explore simple yet efficient inference algorithms able to classify messages; next,
we test the ability of the models to classify a user as at risk or not at risk, given only the social
media messages written by that user. To keep the focus on our main research question
(i.e. assessing the impact of getting more and more information over time on the accuracy
of message classification in the frame of early detection of depression signs) we
opted for simple classifiers, that is, linear approaches, and left the comparison of
different classification approaches out of scope. Our experimental framework is developed
using the practice data-set made available at CLPsych 2021. To make more
intelligent use of the data, the chronological factor is added. Using a technique that progressively
takes into account new data (in chronological order) at each step, we can observe promising
changes in the classification accuracy. These values might provide key ideas about how
depression signs evolve, which is relevant for detection. In other words, the results along a time-line might
help to gather evidence that a user is showing traces of, or moving towards, depression.
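
The chronological evaluation described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: a plain perceptron stands in for the unspecified linear classifier, the bag-of-words features are an assumption, and the function names (`featurize`, `train_perceptron`, `accuracy_by_prefix`) and the toy messages are all hypothetical. No CLPsych data is used. The idea shown is only this: for each k, train and test on the first k messages of every user in chronological order, and record how accuracy changes as k grows.

```python
from collections import Counter


def featurize(timed_messages, k):
    """Bag-of-words counts over a user's first k messages, ordered by timestamp."""
    ordered = sorted(timed_messages, key=lambda item: item[0])
    counts = Counter()
    for _, text in ordered[:k]:
        counts.update(text.lower().split())
    return counts


def score(model, feats):
    """Linear score: bias plus the weighted sum of word counts."""
    weights, bias = model
    return bias + sum(weights[word] * count for word, count in feats.items())


def train_perceptron(examples, epochs=10):
    """Train a perceptron (a stand-in linear classifier) on (features, label) pairs.

    Labels are +1 (at risk) or -1 (not at risk).
    """
    weights, bias = Counter(), 0
    for _ in range(epochs):
        for feats, label in examples:
            # Update only when the example is misclassified or on the boundary.
            if label * score((weights, bias), feats) <= 0:
                for word, count in feats.items():
                    weights[word] += label * count
                bias += label
    return weights, bias


def predict(model, feats):
    return 1 if score(model, feats) > 0 else -1


def accuracy_by_prefix(train_users, test_users, max_k):
    """Accuracy when only the first k messages per user are visible, for k = 1..max_k."""
    curve = []
    for k in range(1, max_k + 1):
        train = [(featurize(msgs, k), label) for msgs, label in train_users]
        model = train_perceptron(train)
        hits = sum(predict(model, featurize(msgs, k)) == label
                   for msgs, label in test_users)
        curve.append(hits / len(test_users))
    return curve


# Invented toy data: (list of (timestamp, text), label). Purely illustrative.
train_users = [
    ([(1, "good day"), (2, "feel tired"), (3, "feel hopeless")], 1),
    ([(1, "good day"), (2, "nice walk"), (3, "fun game")], -1),
]
test_users = [
    ([(1, "good morning"), (2, "so tired"), (3, "hopeless again")], 1),
    ([(1, "good morning"), (2, "nice game"), (3, "fun walk")], -1),
]

curve = accuracy_by_prefix(train_users, test_users, max_k=3)
for k, acc in enumerate(curve, start=1):
    print(f"first {k} message(s): accuracy = {acc:.2f}")
```

Plotting such a curve against k is one way to see whether accuracy keeps improving with later messages or levels off, which is the question the abstract raises.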
Finally, we compare and discuss our results against past research work
related to this field, in order to analyse them critically.
Language: English.