Pérez Sena, Francis Damián
The problem addressed is the deduplication of accommodations. Deduplication is a special case of entity resolution (ER) that consists of grouping different representations of the same entity, usually coming from different sources. It is a complex process that requires several phases, the most common being blocking and pair resolution. In addition to these, we introduce a third phase, clustering, which was not considered in previous work. We aim to build a framework that covers all of these phases and to design a clustering strategy that maximizes precision while keeping recall as high as possible.
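The three phases named above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the method developed in this work. The record fields (`name`, `city`), the blocking rule (exact match on the normalized city), the token-set Jaccard similarity on names, the 0.5 threshold, and the connected-components clustering are all hypothetical choices made only for the example.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking rule: only records sharing a normalized city are compared."""
    return record["city"].strip().lower()


def jaccard(a, b):
    """Token-set Jaccard similarity between two strings (illustrative matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def build_blocks(records):
    """Blocking: group record indices by their blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    return blocks


def resolve_pairs(records, blocks, threshold=0.5):
    """Pair resolution: within each block, keep pairs whose name similarity reaches the threshold."""
    matches = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            if jaccard(records[i]["name"], records[j]["name"]) >= threshold:
                matches.append((i, j))
    return matches


def cluster(n, matches):
    """Clustering: connected components of the match graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in matches:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())


def deduplicate(records, threshold=0.5):
    """Run blocking, pair resolution and clustering in sequence."""
    blocks = build_blocks(records)
    matches = resolve_pairs(records, blocks, threshold)
    return cluster(len(records), matches)


if __name__ == "__main__":
    records = [
        {"name": "Hotel Sol Playa", "city": "Valencia"},
        {"name": "Sol Playa Hotel", "city": "valencia "},
        {"name": "Hostal Centro", "city": "Valencia"},
        {"name": "Hotel Sol Playa", "city": "Madrid"},
    ]
    print(deduplicate(records))  # → [[0, 1], [2], [3]]
```

Connected components is the simplest clustering choice, but it takes the transitive closure of the matched pairs: a single false-positive pair can merge two otherwise distinct groups, as `cluster(3, [(0, 1), (1, 2)])` returning one group of three shows. That loss of precision is one reason a dedicated clustering strategy is worth designing.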