
dc.contributor.author: Moles, Luis
dc.contributor.author: Andrés Fernández, Alain
dc.contributor.author: Echegaray López, Goretti
dc.contributor.author: Boto Sánchez, Fernando
dc.date.accessioned: 2024-06-27T13:46:02Z
dc.date.available: 2024-06-27T13:46:02Z
dc.date.issued: 2024-06-19
dc.identifier.citation: Mathematics 12(12) : (2024) // Article ID 1898 [es_ES]
dc.identifier.issn: 2227-7390
dc.identifier.uri: http://hdl.handle.net/10810/68683
dc.description.abstract: Despite the increasing availability of vast amounts of data, the challenge of acquiring labeled data persists. This issue is particularly serious in supervised learning scenarios, where labeled data are essential for model training. In addition, the rapid growth in data required by cutting-edge technologies such as deep learning makes the task of labeling large datasets impractical. Active learning methods offer a powerful solution by iteratively selecting the most informative unlabeled instances, thereby reducing the amount of labeled data required. However, active learning faces some limitations with imbalanced datasets, where majority class over-representation can bias sample selection. To address this, combining active learning with data augmentation techniques emerges as a promising strategy. Nonetheless, the best way to combine these techniques is not yet clear. Our research addresses this question by analyzing the effectiveness of combining both active learning and data augmentation techniques under different scenarios. Moreover, we focus on improving the generalization capabilities for minority classes, which tend to be overshadowed by the improvement seen in majority classes. For this purpose, we generate synthetic data using multiple data augmentation methods and evaluate the results considering two active learning strategies across three imbalanced datasets. Our study shows that data augmentation enhances prediction accuracy for minority classes, with approaches based on CTGANs obtaining improvements of nearly 50% in some cases. Moreover, we show that combining data augmentation techniques with active learning can reduce the amount of real data required. [es_ES]
dc.description.sponsorship: This work was financed by the Basque Government through their Elkartek program (SONETO project, ref. KK-2023/00038). [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: MDPI [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/es/
dc.subject: active learning [es_ES]
dc.subject: CTGAN [es_ES]
dc.subject: data augmentation [es_ES]
dc.subject: entropy sampling [es_ES]
dc.subject: machine learning [es_ES]
dc.title: Exploring data augmentation and active learning benefits in imbalanced datasets [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.date.updated: 2024-06-26T13:24:29Z
dc.rights.holder: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). [es_ES]
dc.relation.publisherversion: https://www.mdpi.com/2227-7390/12/12/1898 [es_ES]
dc.identifier.doi: 10.3390/math12121898
dc.departamentoes: Ciencia de la computación e inteligencia artificial
dc.departamentoeu: Konputazio zientziak eta adimen artifiziala
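
Illustrative note on the "entropy sampling" subject term: the abstract describes active learning as iteratively selecting the most informative unlabeled instances. The record does not say how the article implements this, so the following is only a minimal sketch of the general idea, assuming that informativeness is measured by the Shannon entropy of a model's predicted class probabilities. The function names and the probability values are hypothetical and are not taken from the article.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, k):
    """Return the indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical predicted class probabilities for four unlabeled instances
# (illustrative values only, not data from the article).
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
]

# Indices of the two instances that would be sent to an annotator next.
queried = select_most_uncertain(pool_probs, k=2)
print(queried)
```

In a full active learning loop, the queried instances would be labeled, added to the training set, and the model retrained before the next selection round.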


Except where otherwise noted, this item's license is described as © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).