Labour Statistics vs. Static Word Embeddings: a Comparison of Gender Bias
Figueroa Vásquez, Andrés
This project explores the relationship between labour statistics and three static word embedding models, GloVe, word2vec and fastText, in both English and Spanish. The aim is to see how gender bias in the real world, as reflected in labour statistics, differs from gender bias in word embedding spaces. To this end, several linguistic data sets were created from what previous authors called extreme "she" occupations and extreme "he" occupations. To better assess their behaviour, the results for these occupations were compared with those for gender-neutral professions. In this way, the project determines how the outcomes vary across static word embeddings, corpora and natural languages, in order to uncover the patterns that underlie them.
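As a rough illustration of how gender association can be read off a static embedding space, the sketch below scores a word by the difference between its cosine similarity to "he" and to "she". This is one common approach and an assumption here, not necessarily the measure used in the project. The three-dimensional vectors and the names `occupation_a`, `occupation_b` and `occupation_c` are invented placeholders, not values from GloVe, word2vec or fastText.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_score(word_vec, he_vec, she_vec):
    """Positive: closer to 'he'; negative: closer to 'she'; near zero: neutral."""
    return cosine(word_vec, he_vec) - cosine(word_vec, she_vec)


# Toy vectors made up for illustration only (assumption, not real embeddings).
toy = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "occupation_a": [0.8, 0.1, 0.5],  # placeholder for an extreme "he" occupation
    "occupation_b": [0.1, 0.9, 0.4],  # placeholder for an extreme "she" occupation
    "occupation_c": [0.5, 0.5, 0.7],  # placeholder for a gender-neutral profession
}

scores = {
    word: gender_score(vec, toy["he"], toy["she"])
    for word, vec in toy.items()
    if word not in ("he", "she")
}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With real embeddings, the toy dictionary would be replaced by vectors looked up in a pre-trained model, and the resulting scores could then be set against the share of women and men that labour statistics report for each occupation.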