Large Scale Linguistic Processing of Tweets to Understand Social Interactions among Speakers of Less Resourced Languages: The Basque Case

Fernández de Landa, Joseba; Agerri Gascón, Rodrigo; Alegría Loinaz, Iñaki

dc.contributor.author	Fernández de Landa, Joseba
dc.contributor.author	Agerri Gascón, Rodrigo
dc.contributor.author	Alegría Loinaz, Iñaki
dc.date.accessioned	2020-02-04T08:22:48Z
dc.date.available	2020-02-04T08:22:48Z
dc.date.issued	2019-06-13
dc.identifier.citation	Information 10(6) : (2019) // Article ID 212	es_ES
dc.identifier.issn	2078-2489
dc.identifier.uri	http://hdl.handle.net/10810/40404
dc.description.abstract	Social networks like Twitter are increasingly important in the creation of new ways of communication. They have also become useful tools for social and linguistic research due to the massive amounts of public textual data available. This is particularly important for less resourced languages, as it allows current natural language processing techniques to be applied to large amounts of unstructured data. In this work, we study the linguistic and social aspects of young and adult people's behaviour based on their tweets' contents and the social relations that arise from them. With this objective in mind, we have gathered over 10 million tweets from more than 8000 users. First, we classified each user in terms of their life stage (young/adult) according to the writing style of their tweets. Second, we applied topic modelling techniques to the personal tweets to find the most popular topics according to life stages. Third, we established the relations and communities that emerge based on the retweets. We conclude that using large amounts of unstructured data provided by Twitter facilitates social research using computational techniques such as natural language processing, giving the opportunity both to segment communities based on demographic characteristics and to discover how they interact with or relate to one another.	es_ES
dc.description.sponsorship	The second author is funded by the Spanish Ministry of Economy and Competitiveness (MINECO/FEDER, UE), under the project CROSSTEXT (TIN2015-72646-EXP) and the Ramon y Cajal Fellowship RYC-2017-23647. He also acknowledges the support of the BBVA Big Data 2018 "BigKnowledge for TextMining (BigKnowledge)" project.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	MDPI	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/TIN2015-72646-EXP	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/RYC-2017-23647	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject	social informatics	es_ES
dc.subject	social networks	es_ES
dc.subject	topic modelling	es_ES
dc.subject	relations	es_ES
dc.subject	less resourced languages	es_ES
dc.subject	text classification	es_ES
dc.subject	information extraction	es_ES
dc.subject	natural language processing	es_ES
dc.subject	benchmark	es_ES
dc.title	Large Scale Linguistic Processing of Tweets to Understand Social Interactions among Speakers of Less Resourced Languages: The Basque Case	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.holder	This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0)	es_ES
dc.rights.holder	Atribución 3.0 España	*
dc.relation.publisherversion	https://www.mdpi.com/2078-2489/10/6/212	es_ES
dc.identifier.doi	10.3390/info10060212
dc.departamentoes	Arquitectura y Tecnología de Computadores	es_ES
dc.departamentoeu	Konputagailuen Arkitektura eta Teknologia	es_ES

Files in this item

Name: information-10-00212.pdf
Size: 987.9Kb
Format: PDF
Description: Main article

Name: license_rdf
Size: 914 bytes
Format: application/rdf+xml

This item appears in the following Collection(s)

Articles

Except where otherwise noted, this item's license is described as This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY 4.0)