Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data
dc.contributor.author | Tellaetxe Abete, Maitena | |
dc.contributor.author | Calvo Molinos, Borja | |
dc.contributor.author | Lawrie, Charles | |
dc.date.accessioned | 2021-12-15T09:23:34Z | |
dc.date.available | 2021-12-15T09:23:34Z | |
dc.date.issued | 2021-12 | |
dc.identifier.citation | NAR Genomics and Bioinformatics 3(4) : (2021) // Article ID lqab092 | es_ES |
dc.identifier.issn | 2631-9268 | |
dc.identifier.uri | http://hdl.handle.net/10810/54479 | |
dc.description.abstract | [EN]Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. To address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1.6 million variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested on two independent datasets, which yielded AUC values of 0.96, whereas previously published tools reached a maximum AUC value of 0.92 in the same comparison. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of FFPE samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix. | es_ES |
dc.description.sponsorship | Departamento de Educación, Universidades e Investigación of the Basque Government [PRE 2019 2 0211 to M.T.A.]; Ikerbasque, Basque Foundation for Science [to C.L.]; Starmer–Smith Memorial Fund [to C.L.]; Ministerio de Economía, Industria y Competitividad (MINECO) of the Spanish Central Government [to C.L., PID2019-104933GB-10 to B.C.]; ISCIII and FEDER Funds [PI12/00663, PIE13/00048, DTS14/00109, PI15/00275 and PI18/01710 to C.L.]; Departamento de Desarrollo Económico y Competitividad and Departamento de Sanidad of the Basque Government [to C.L.]; Asociación Española Contra el Cáncer (AECC) [to C.L.]; Diputación Foral de Guipuzcoa (DFG) [to C.L.]; Departamento de Industria of the Basque Government [ELKARTEK Programme, project code: KK-2018/00038 to C.L., ELKARTEK Programme, project code: KK-2020/00049 to B.C., IT-1244-19 to B.C.] | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Oxford University Press | es_ES |
dc.relation | info:eu-repo/grantAgreement/MINECO/PID2019-104933GB-10 | es_ES |
dc.rights | info:eu-repo/semantics/openAccess | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/es/ | * |
dc.title | Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.holder | © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com | es_ES |
dc.rights.holder | Attribution-NonCommercial 3.0 Spain (Atribución-NoComercial 3.0 España) | * |
dc.relation.publisherversion | https://academic.oup.com/nargab/article/3/4/lqab092/6412600#309156260 | es_ES |
dc.identifier.doi | 10.1093/nargab/lqab092 | |
dc.departamentoes | Ciencia de la computación e inteligencia artificial | es_ES |
dc.departamentoeu | Konputazio zientziak eta adimen artifiziala | es_ES |
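The evaluation protocol summarised in the abstract, leave-one-sample-out cross-validation scored by AUC, can be illustrated with a minimal, dependency-free Python sketch. It is not Ideafix's implementation (Ideafix is an R package). The nearest-centroid scorer is a hypothetical stand-in for XGBoost or random forest, the two features (variant allele frequency and a read pair orientation bias score) and all values are invented toy data, and computing one AUC per held-out sample rather than pooling predictions across folds is an assumption.

```python
import math
from statistics import mean


def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic.

    Over all (artefact, real) pairs, counts how often the artefact (label 1)
    gets the higher score; ties count as half. Returns None when the held-out
    set contains only one class, because AUC is then undefined.
    """
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def centroid(rows):
    # Feature-wise mean of a list of feature vectors.
    return [mean(col) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Hypothetical placeholder classifier: one centroid per class (0 = real, 1 = artefact)."""
    real = centroid([x for x, t in zip(X, y) if t == 0])
    artefact = centroid([x for x, t in zip(X, y) if t == 1])
    return real, artefact


def score(model, x):
    """Higher score = closer to the artefact centroid than to the real-variant centroid."""
    real, artefact = model
    return math.dist(x, real) - math.dist(x, artefact)


def leave_one_sample_out(X, y, samples):
    """Hold out each sample in turn, fit on all other samples, return per-sample AUCs."""
    results = {}
    for held_out in sorted(set(samples)):
        train = [i for i, s in enumerate(samples) if s != held_out]
        test = [i for i, s in enumerate(samples) if s == held_out]
        model = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        results[held_out] = auc([y[i] for i in test], [score(model, X[i]) for i in test])
    return results


# Invented toy variants: [variant allele frequency, orientation bias score].
X = [
    [0.05, 0.90], [0.08, 0.80], [0.45, 0.10], [0.50, 0.20],  # sample A
    [0.06, 0.85], [0.40, 0.15], [0.55, 0.05], [0.07, 0.95],  # sample B
    [0.04, 0.70], [0.35, 0.30], [0.10, 0.90], [0.48, 0.10],  # sample C
]
y = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # 1 = FFPE artefact, 0 = real variant
samples = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

if __name__ == "__main__":
    for sample, value in leave_one_sample_out(X, y, samples).items():
        print(sample, value)
```

Grouping the folds by sample rather than by individual variant keeps every variant of the evaluated sample out of training, so sample-specific artefact patterns cannot leak into the model and inflate the AUC.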