dc.contributor.advisor: Agerri Gascón, Rodrigo
dc.contributor.advisor: Rigau Claramunt, Germán
dc.contributor.author: Ustaszewski, Michael
dc.contributor.other: Lenguajes y Sistemas Informáticos - Hizkuntza eta Sistema Informatikoak
dc.date.accessioned: 2016-11-30T08:53:42Z
dc.date.available: 2016-11-30T08:53:42Z
dc.date.issued: 2016-11-30
dc.date.submitted: 2016-09-27
dc.identifier.uri: http://hdl.handle.net/10810/19647
dc.description.abstract: In morphologically complex languages, many high-level tasks in natural language processing rely on accurate morphosyntactic analyses of the input. However, in light of the risk of error propagation in present-day pipeline architectures for basic linguistic pre-processing, the state of the art for morphosyntactic tagging is still not satisfactory. The main obstacle here is data sparsity inherent to natural language in general and highly inflected languages in particular. In this work, we investigate whether semi-supervised systems may alleviate the data sparsity problem. Our approach uses word clusters obtained from large amounts of unlabelled text in an unsupervised manner in order to provide a supervised probabilistic tagger with morphologically informed features. Our evaluations on a number of datasets for the Polish language suggest that this simple technique improves tagging accuracy, especially with regard to out-of-vocabulary words. This may prove useful to increase cross-domain performance of taggers, and to alleviate the dependency on large amounts of supervised training data, which is especially important from the perspective of less-resourced languages.
dc.language.iso: eng
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: tagging
dc.subject: morphosyntax
dc.subject: tagging in Polish
dc.subject: natural language processing
dc.title: Data sparsity in highly inflected languages: the case of morphosyntactic tagging in Polish
dc.type: info:eu-repo/semantics/masterThesis
dc.rights.holder: Attribution-NonCommercial-NoDerivatives 4.0 International


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International