A Preprocessing Procedure for Haplotype Inference by Pure Parsimony
Abstract
Haplotype data is especially important in the study of complex diseases
since it contains more information than genotype data. However,
obtaining haplotype data is technically difficult and expensive. Computational
methods have proved to be an effective way of inferring haplotype
data from genotype data. One of these methods, the haplotype inference
by pure parsimony approach (HIPP), casts the problem as an optimization
problem and as such has been proved to be NP-hard. We have designed
and developed a new preprocessing procedure for this problem. Our proposed
algorithm works with groups of haplotypes rather than individual
haplotypes. It iterates searching and deleting haplotypes that are not
helpful in order to find the optimal solution. This preprocess can be coupled
with any of the current solvers for the HIPP that need to preprocess
the genotype data. In order to test it, we have used two state-of-the-art
solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to
the computational time and memory reduction caused by our preprocess,
problem instances that were previously unaffordable can be now efficiently
solved.