dc.contributor.author: Tellaetxe Abete, Maitena
dc.contributor.author: Calvo Molinos, Borja
dc.contributor.author: Lawrie, Charles
dc.date.accessioned: 2021-12-15T09:23:34Z
dc.date.available: 2021-12-15T09:23:34Z
dc.date.issued: 2021-12
dc.identifier.citation: NAR Genomics and Bioinformatics 3(4) : (2021) // Article ID lqab092 [es_ES]
dc.identifier.issn: 2631-9268
dc.identifier.uri: http://hdl.handle.net/10810/54479
dc.description.abstract: [EN] Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1600000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix. [es_ES]
dc.description.sponsorship: Departamento de Educación, Universidades e Investigación of the Basque Government [PRE 2019 2 0211 to M.T.A.]; Ikerbasque, Basque Foundation for Science [to C.L.]; Starmer–Smith Memorial Fund [to C.L.]; Ministerio de Economía, Industria y Competitividad (MINECO) of the Spanish Central Government [to C.L., PID2019-104933GB-10 to B.C.]; ISCIII and FEDER Funds [PI12/00663, PIE13/00048, DTS14/00109, PI15/00275 and PI18/01710 to C.L.]; Departamento de Desarrollo Económico y Competitividad and Departamento de Sanidad of the Basque Government [to C.L.]; Asociación Española Contra el Cancer (AECC) [to C.L.]; Diputación Foral de Guipuzcoa (DFG) [to C.L.]; Departamento de Industria of the Basque Government [ELKARTEK Programme, project code: KK-2018/00038 to C.L., ELKARTEK Programme, project code: KK-2020/00049 to B.C., IT-1244-19 to B.C.] [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Oxford University Press [es_ES]
dc.relation: info:eu-repo/grantAgreement/MINECO/PID2019-104933GB-10 [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/3.0/es/ [*]
dc.title: Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.holder: © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com [es_ES]
dc.rights.holder: Attribution-NonCommercial 3.0 Spain [*]
dc.relation.publisherversion: https://academic.oup.com/nargab/article/3/4/lqab092/6412600#309156260 [es_ES]
dc.identifier.doi: 10.1093/nargab/lqab092
dc.departamentoes: Ciencia de la computación e inteligencia artificial [es_ES]
dc.departamentoeu: Konputazio zientziak eta adimen artifiziala [es_ES]



© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work
is properly cited. For commercial re-use, please contact journals.permissions@oup.com