Chronological detection of depression in social media threads by means of natural language processing

Garin Aldezabal, Asier

Date

2023-11-28

URI

http://hdl.handle.net/10810/63201

Abstract

Detecting depression in social media has become an increasingly important research area in recent years. With the widespread use of social media platforms, individuals at risk of suicide often express their thoughts and emotions online, providing an opportunity for early detection and intervention. Artificial Intelligence and, in particular, Natural Language Processing make it possible to process massive amounts of messages and to detect traits of depression and other mental health risks.

Our main research question concerns the early detection of depression in social media messages. We explore the accuracy gained by a system as more information (that is, more messages over time) from a user becomes available. Does the system keep becoming more accurate as subsequent information arrives, or is there a limit? How many messages do we need to train a simple model capable of attaining an accuracy above a given threshold? Do recent messages add much information to older ones? These are the research questions that have arisen in our work.

A cornerstone of artificial intelligence-based approaches is the available data, which bounds the ability of the system to gain knowledge. Thus, an important part of this work consists of an overview of the datasets used to detect depression in social media, with mentions of several additional datasets along the way. In our study we found that there are international challenges devoted to this task, CLPsych among others. We explore simple yet efficient inference algorithms able to classify messages; next, we test the ability of the models to classify a user as at risk or not at risk, given only the messages written by that user. To keep the focus on our main research question (i.e. assessing how accuracy in message classification changes as more information becomes available over time, in the context of early detection of signs of depression), we opted for simple classifiers, that is, linear approaches, and left the comparison of different classification approaches out of scope.

Our experimental framework is developed using the practice dataset made available at CLPsych 2021. To make better use of the data, the chronological factor is added. Using a technique that progressively incorporates new data in chronological order, we observe promising changes in classification accuracy. These values might provide key ideas about how signs of depression evolve, for detection purposes. In other words, results along a timeline might help to gather evidence that a user is showing traces of depression or moving towards it. Finally, we compare and discuss our results against past research in this field in order to analyse them critically.

Language: English.
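The chronological evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the perceptron stands in for an unspecified linear classifier, the function names (`featurize`, `train_perceptron`, `accuracy_over_time`) are hypothetical, and the messages are invented toy examples rather than CLPsych data. The idea shown is only that a user is classified from their first k messages, and accuracy is recorded as k grows.

```python
from collections import Counter

def featurize(messages):
    """Bag-of-words counts over a list of messages (assumed feature set)."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts

def score(model, feats):
    """Linear score: bias plus weighted sum of word counts."""
    weights, bias = model
    return bias + sum(weights[w] * c for w, c in feats.items())

def train_perceptron(examples, epochs=10):
    """Train a perceptron; examples are (Counter, label) with label in {+1, -1}."""
    weights, bias = Counter(), 0
    for _ in range(epochs):
        for feats, label in examples:
            # Update only when the example is misclassified (or on the boundary).
            if label * score((weights, bias), feats) <= 0:
                for w, c in feats.items():
                    weights[w] += label * c
                bias += label
    return weights, bias

def predict(model, feats):
    """Return +1 (at risk) or -1 (not at risk)."""
    return 1 if score(model, feats) > 0 else -1

def accuracy_over_time(model, users, max_k):
    """User-level accuracy when only the first k messages of each user are seen.

    users: list of (messages in chronological order, label).
    """
    curve = []
    for k in range(1, max_k + 1):
        correct = sum(
            predict(model, featurize(msgs[:k])) == label for msgs, label in users
        )
        curve.append(correct / len(users))
    return curve

# Invented toy data, for illustration only.
train_users = [
    (["feeling hopeless today", "cannot sleep again"], 1),
    (["great hike today", "dinner with friends"], -1),
]
test_users = [
    (["went to work", "feeling hopeless again"], 1),
    (["great dinner", "hike with friends"], -1),
]

model = train_perceptron([(featurize(m), y) for m, y in train_users])
curve = accuracy_over_time(model, test_users, max_k=2)
print(curve)  # one accuracy value per number of messages seen
```

Plotting such a curve against k is one way to ask whether accuracy keeps improving with later messages or levels off, which is the question the abstract raises.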