Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data

Tellaetxe Abete, Maitena; Calvo Molinos, Borja; Lawrie, Charles

dc.contributor.author	Tellaetxe Abete, Maitena
dc.contributor.author	Calvo Molinos, Borja
dc.contributor.author	Lawrie, Charles
dc.date.accessioned	2021-12-15T09:23:34Z
dc.date.available	2021-12-15T09:23:34Z
dc.date.issued	2021-12
dc.identifier.citation	NAR Genomics and Bioinformatics 3(4) : (2021) // Article ID lqab092	es_ES
dc.identifier.issn	2631-9268
dc.identifier.uri	http://hdl.handle.net/10810/54479
dc.description.abstract	[EN] Increasingly, treatment decisions for cancer patients are being made from next-generation sequencing results generated from formalin-fixed and paraffin-embedded (FFPE) biopsies. However, this material is prone to sequence artefacts that cannot be easily identified. In order to address this issue, we designed a machine learning-based algorithm to identify these artefacts using data from >1 600 000 variants from 27 paired FFPE and fresh-frozen breast cancer samples. Using these data, we assembled a series of variant features and evaluated the classification performance of five machine learning algorithms. Using leave-one-sample-out cross-validation, we found that XGBoost (extreme gradient boosting) and random forest obtained AUC (area under the receiver operating characteristic curve) values >0.86. Performance was further tested using two independent datasets that resulted in AUC values of 0.96, whereas a comparison with previously published tools resulted in a maximum AUC value of 0.92. The most discriminating features were read pair orientation bias, genomic context and variant allele frequency. In summary, our results show a promising future for the use of these samples in molecular testing. We built the algorithm into an R package called Ideafix (DEAmination FIXing) that is freely available at https://github.com/mmaitenat/ideafix.	es_ES
dc.description.sponsorship	Departamento de Educación, Universidades e Investigación of the Basque Government [PRE 2019 2 0211 to M.T.A.]; Ikerbasque, Basque Foundation for Science [to C.L.]; Starmer–Smith Memorial Fund [to C.L.]; Ministerio de Economía, Industria y Competitividad (MINECO) of the Spanish Central Government [to C.L., PID2019-104933GB-10 to B.C.]; ISCIII and FEDER Funds [PI12/00663, PIE13/00048, DTS14/00109, PI15/00275 and PI18/01710 to C.L.]; Departamento de Desarrollo Económico y Competitividad and Departamento de Sanidad of the Basque Government [to C.L.]; Asociación Española Contra el Cancer (AECC) [to C.L.]; Diputación Foral de Guipuzcoa (DFG) [to C.L.]; Departamento de Industria of the Basque Government [ELKARTEK Programme, project code: KK-2018/00038 to C.L., ELKARTEK Programme, project code: KK-2020/00049 to B.C., IT-1244-19 to B.C.]	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Oxford University Press	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/PID2019-104933GB-10	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc/3.0/es/	*
dc.title	Ideafix: a decision tree-based method for the refinement of variants in FFPE DNA sequencing data	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.holder	© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com	es_ES
dc.rights.holder	Atribución-NoComercial 3.0 España	*
dc.relation.publisherversion	https://academic.oup.com/nargab/article/3/4/lqab092/6412600#309156260	es_ES
dc.identifier.doi	10.1093/nargab/lqab092
dc.departamentoes	Ciencia de la computación e inteligencia artificial	es_ES
dc.departamentoeu	Konputazio zientziak eta adimen artifiziala	es_ES

Files in this item

Name: lqab092.pdf
Size: 1.075Mb
Format: PDF
Description: Article

Name: license_rdf
Size: 920 bytes
Format: application/rdf+xml

This item appears in the following Collection(s)

Artículos

© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work
is properly cited. For commercial re-use, please contact journals.permissions@oup.com
