Relative Content

Tag Archive for python, duplicates, record-linkage

How do I perform deduplication with the Python Record Linkage Toolkit on large data sets?

I use blocking to reduce the number of record pairs in the index, but sometimes I need a full index (or a sorted neighbourhood index on a couple of columns) on a large data set of approximately 1M records, which results in billions of record pairs.
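For n records, a full deduplication index holds n*(n-1)/2 unordered pairs. For n = 1,000,000 that is 499,999,500,000 pairs, far too many to hold at once as a single pandas MultiIndex. One common workaround is to generate the pairs lazily, score them in fixed-size chunks, and keep only the pairs above a similarity threshold. The sketch below assumes that approach. The helper names (`full_index_size`, `iter_full_index`, `chunked`, `dedupe_in_chunks`) are hypothetical. `difflib.SequenceMatcher` stands in for the toolkit's comparators so the sketch runs with the standard library alone.

```python
from difflib import SequenceMatcher
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Pair = Tuple[int, int]


def full_index_size(n: int) -> int:
    """Number of unordered pairs in a full deduplication index of n records."""
    return n * (n - 1) // 2


def iter_full_index(n: int) -> Iterator[Pair]:
    """Lazily yield every pair (i, j) with i < j; no pair list is ever built."""
    for i in range(n):
        for j in range(i + 1, n):
            yield (i, j)


def chunked(pairs: Iterable[Pair], size: int) -> Iterator[List[Pair]]:
    """Group a pair stream into lists of at most `size` pairs."""
    it = iter(pairs)  # ensure islice consumes the stream instead of restarting it
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def dedupe_in_chunks(records: Sequence[str], threshold: float = 0.8,
                     chunk_size: int = 100_000) -> List[Pair]:
    """Score a full index chunk by chunk and keep only pairs >= threshold.

    Hypothetical helper. With the Record Linkage Toolkit, each chunk could be
    turned into pd.MultiIndex.from_tuples(chunk) and passed to
    Compare.compute(pairs, df), assuming df has a default RangeIndex so that
    positions equal labels. difflib is only a stand-in comparator here.
    """
    matches: List[Pair] = []
    for chunk in chunked(iter_full_index(len(records)), chunk_size):
        for i, j in chunk:
            score = SequenceMatcher(None, records[i], records[j]).ratio()
            if score >= threshold:
                matches.append((i, j))
        # Only the matching pairs are kept; the rest of the chunk is discarded.
    return matches


if __name__ == "__main__":
    names = ["Jonathan Smith", "Jonathon Smith", "Mary Jones",
             "Marie Jones", "Peter Brown"]
    print(full_index_size(1_000_000))  # 499999500000
    print(dedupe_in_chunks(names, threshold=0.8, chunk_size=3))  # [(0, 1), (2, 3)]
```

Chunking bounds memory, not total work. Streaming roughly 5×10^11 pairs through a per-pair Python loop would still take days, so blocking or sorted neighbourhood indexing remains the main way to shrink the problem. Chunking helps most when the candidate set is a few billion pairs that can be scored one batch at a time, and chunks are independent, so they can also be distributed across processes.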
