Dealing with dialectal variation in the construction of the Basque historical corpus
Date
2020-12
Authors
Estarrona Ibarloza, Ainara
Etxeberria Uztarroz, Izaskun
Etxepare Igiñiz, Ricardo
Padilla Moyano, Manuel
Soraluze Irureta, Ander
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects: 79-89 (2020)
Abstract
This paper analyses the challenge of dealing with dialectal variation when semi-automatically normalising and analysing historical Basque texts. The work is part of a broader ongoing project to build a morphosyntactically annotated historical corpus of Basque, Basque in the Making (BIM): A Historical Look at a European Language Isolate, whose main objective is the systematic, diachronic study of a number of grammatical features. The corpus will be not only the first tagged corpus of historical Basque, but also a means of improving language processing tools by analysing historical Basque varieties that are more or less distant from present-day standard Basque.