dc.contributor.advisor | Agerri Gascón, Rodrigo  | |
dc.contributor.advisor | Rigau Claramunt, Germán  | |
dc.contributor.author | Ustaszewski, Michael | |
dc.contributor.other | Lenguajes y Sistemas Informáticos - Hizkuntza eta Sistema Informatikoak | es |
dc.date.accessioned | 2016-11-30T08:53:42Z | |
dc.date.available | 2016-11-30T08:53:42Z | |
dc.date.issued | 2016-11-30 | |
dc.date.submitted | 2016-09-27 | |
dc.identifier.uri | http://hdl.handle.net/10810/19647 | |
dc.description.abstract | In morphologically complex languages, many high-level tasks in natural language
processing rely on accurate morphosyntactic analyses of the input. However, in
light of the risk of error propagation in present-day pipeline architectures for basic
linguistic pre-processing, the state of the art for morphosyntactic tagging is still
not satisfactory. The main obstacle here is data sparsity inherent to natural
language in general and highly inflected languages in particular.
In this work, we investigate whether semi-supervised systems may alleviate the
data sparsity problem. Our approach uses word clusters obtained from large
amounts of unlabelled text in an unsupervised manner in order to provide a
supervised probabilistic tagger with morphologically informed features. Our
evaluations on a number of datasets for the Polish language suggest that this simple
technique improves tagging accuracy, especially with regard to out-of-vocabulary
words. This may prove useful to increase cross-domain performance of taggers,
and to alleviate the dependency on large amounts of supervised training data,
which is especially important from the perspective of less-resourced languages. | es |
dc.language.iso | eng | es |
dc.rights | info:eu-repo/semantics/openAccess | es |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | tagging | es |
dc.subject | morphosyntax | es |
dc.subject | tagging in Polish | es |
dc.subject | natural language processing | es |
dc.title | Data sparsity in highly inflected languages: the case of morphosyntactic tagging in Polish | es |
dc.type | info:eu-repo/semantics/masterThesis | es |
dc.rights.holder | Attribution-NonCommercial-NoDerivatives 4.0 International | * |