OpenTagger: A flexible and user-friendly linguistic tagger

Sanjurjo-González, Hugo; Andaluz Pinedo, Olaia

dc.contributor.author	Sanjurjo-González, Hugo
dc.contributor.author	Andaluz Pinedo, Olaia
dc.date.accessioned	2020-11-28T10:47:47Z
dc.date.available	2020-11-28T10:47:47Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/10810/48683
dc.description.abstract	Linguistic annotation adds valuable information to a corpus. Annotated corpora are highly useful for linguists, since they increase the range of linguistic phenomena that can be registered, categorised and retrieved. They are also important for machines: Natural Language Processing applications rely on well-annotated data (e.g. Imran, Mitra and Castillo 2016), and some machine learning classifiers use annotated data to train or test new language annotation tools, among other uses. In this regard, Pustejovsky and Stubbs (2012) describe the stages of building annotated corpora to train machine learning algorithms. This paper describes OpenTagger, a new linguistic tagger that allows users to attach any type of information to the paragraphs, sentences or words that make up a text. OpenTagger is characterised by its high usability and flexibility. It is a web application in which users manually annotate texts with either their own predefined tagset or a new one created in the tool, so it can meet any need for a tailor-made annotation system. The tagset may include nested categories, and multiple layers of annotation are possible. The annotation process is simple and offers two options: i) selecting text and then tagging it; ii) selecting a tag and then annotating as much text as needed. OpenTagger also includes a search box to query the text and retrieve relevant sections for tagging. In sum, the open character of this tool and its user-friendliness allow the benefits of annotation to be extended to a wider variety of research questions. OpenTagger differs from other well-known taggers such as NooJ (Silberztein, 2005) in its simplicity and web access, as it is not specialised for grammar construction or other complex processes. Potential users range from novice linguistic researchers to experts. 
Finally, further integration into the corpus analysis software ACTRES Corpus Manager (ACM) (Sanjurjo-González, 2017) is planned. Once integrated, OpenTagger will make building and querying custom annotated corpora in ACM more straightforward.	es_ES
dc.description.sponsorship	ACTRES, TRALIMA/ITZULIK, GIU19/067, Gobierno Vasco IT1209/19	es_ES
dc.language.iso	eng	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	custom annotation	es_ES
dc.subject	corpus	es_ES
dc.subject	web application	es_ES
dc.subject	XML	es_ES
dc.title	OpenTagger: A flexible and user-friendly linguistic tagger	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.holder	Attribution-NonCommercial-NoDerivs 3.0 Spain	es_ES

Files in this item

Name:: LingColl_OpenTagger.pptx.pdf
Size:: 318.7 KB
Format:: PDF
Description:: Presentation

Name:: license_rdf
Size:: 811 bytes
Format:: application/rdf+xml

This item appears in the following Collection(s)

TRALIMA/ITZULIK-Congresos

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain