An Open Source Corpus and Automatic Tool for Section Identification in Spanish Health Records
Date
2023-09
Author
De la Iglesia, Iker
Vivó, María
Chocrón, Paula
de Maeztu, Gabriel
Atutxa Salazar, Aitziber
Journal of Biomedical Informatics 145 (2023), Article ID 104461
Abstract
Objective:
This work aims to provide the scientific community with an open-source Spanish dataset for building and evaluating automatic section identification systems. Alongside the dataset, it designs and implements a suitable evaluation measure and a fine-tuned language model adapted to the task.
Conclusion:
Although section identification in unstructured clinical narratives is challenging, this work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the evaluation scripts, and the section identification language model are open-sourced in the hope that this contribution will foster the development of more and better systems.