Biometric deduplication

Nationwide biometric systems contain tens or hundreds of millions, or even billions, of fingerprints. This data has usually been collected over many years, from different regions of a country, by hundreds or thousands of operators.

During enrollment, the data was not necessarily checked for duplicates. Because of human error or identity fraud, biometric databases in an AFIS or ABIS can contain many errors and duplicates.

The most frequent types of errors
  • Full duplicate of a record
    All fingerprints in a record are the same as in another record. Such errors occur when a person gets a new ID without a duplicate check, or as a result of fraud.
  • Errors inside a record
    A finger in one person's record is a duplicate of another finger of the same person. Such errors are caused by operator mistakes.
  • Partial duplicates
    Some fingerprints in a record are duplicated from another record. Such errors can be caused by human error when fingerprints are manually added to the AFIS from folders.
Up to 2% of errors
Large-scale and nationwide biometric databases can contain 1-2% errors of various kinds.
All or most of these errors can be discovered by automated deduplication.
Problem
The ideal way to find all errors in a database is to match every fingerprint template against every other one. The problem is that the number of matching operations required for an all-to-all comparison is huge. It is calculated by the formula:

n × (n − 1) / 2 ≈ n² / 2

where n is the number of fingerprints in the database.
For example, for a database of 10 million people with 10 fingerprints each (100 million fingerprints), all-to-all matching means about 5 000 000 000 000 000 matching operations. At the matching speed of “traditional” fingerprint recognition algorithms, deduplication with all-to-all matching would take roughly 1 600 to 16 000 years of processing on one modern computer*.

This makes the task almost impossible for “traditional” algorithms, which is why vendors use various approaches to reduce the number of matches. One is a “logic” approach, such as using one finger and matching fingers only by position (index to index, thumb to thumb, etc.). Another is to combine several algorithms, a fast but less accurate one and a slow but more accurate one, for sequential search in the database.
Using such approaches together with a large number of servers makes deduplication feasible, but by definition it gives less accurate results than direct all-to-all matching.

* Numbers are taken from open sources (i.e. biometric companies' websites and published NIST benchmarks). According to these sources, the average matching speed of “traditional” fingerprint recognition algorithms is from 10 000 to 100 000 matches per second on one CPU.
Solution
Machine learning based algorithms make all-to-all matching possible in huge biometric databases. The Neurodactyl fingerprint recognition algorithm offers not only world's top-tier recognition accuracy but also impressive matching speed: up to 1 billion matching operations per second on one CPU and up to 10 billion on a GPU**.
In the same example, with a database of 100 million fingerprints and 5 000 000 000 000 000 matching operations, deduplication would take less than 2 months on one server with one CPU, or less than 10 days on one server with one GPU. This is direct all-to-all matching with the most accurate algorithm, which finds the maximum number of errors of all types.

** Benchmarked on an Intel Xeon Gold 6256 CPU and an RTX 3090 GPU.
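The arithmetic behind the Problem and Solution figures can be checked with a short Python sketch. It assumes only the pair-count formula n × (n − 1) / 2 and the matching rates quoted in the text; the function names are illustrative and are not part of any Neurodactyl tool. The same-position count illustrates why matching only thumb to thumb, index to index, and so on reduces the workload (about tenfold for 10 fingers), at the cost of missing duplicates enrolled under a different finger position.

```python
from math import comb

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def all_to_all_pairs(n: int) -> int:
    """Number of unordered pairs among n fingerprints: n * (n - 1) / 2."""
    return comb(n, 2)


def same_position_pairs(people: int, positions: int = 10) -> int:
    """Pairs compared when each finger is matched only against the same
    finger position of other people (one fingerprint per position assumed)."""
    return positions * comb(people, 2)


def duration_seconds(pairs: int, matches_per_second: float) -> float:
    """Time for `pairs` comparisons at a constant single-device rate."""
    return pairs / matches_per_second


if __name__ == "__main__":
    people, fingers = 10_000_000, 10
    pairs = all_to_all_pairs(people * fingers)
    print(f"all-to-all pairs:    {pairs:.3e}")
    print(f"same-position pairs: {same_position_pairs(people, fingers):.3e}")

    # Rates quoted in the text (matches per second on one device).
    for label, rate in [
        ("traditional, slow", 1e4),
        ("traditional, fast", 1e5),
        ("ML-based, CPU", 1e9),
        ("ML-based, GPU", 1e10),
    ]:
        seconds = duration_seconds(pairs, rate)
        print(f"{label:18s} {seconds / SECONDS_PER_YEAR:12.2f} years "
              f"({seconds / SECONDS_PER_DAY:14.1f} days)")
```

These are idealized estimates: they ignore I/O, memory limits and any scheduling overhead, so real run times may differ.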
How It Works
A simple deduplication program solves the deduplication task without complex integration with an existing AFIS.
Import your fingerprints
The program takes fingerprint images as input, together with each person's ID and the finger positions for that ID.
The ID and finger positions can also be taken from the file name.
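The file-name format the program expects is not specified here. As an illustration only, the sketch below assumes a hypothetical convention `<person_id>_<finger_position>.<ext>` with positions numbered 1 to 10; the pattern, the `FingerprintRef` type and the function name are all assumptions, not the program's actual interface.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Hypothetical naming convention: "<person_id>_<finger_position>.<ext>",
# e.g. "A123_07.png". The real program may expect something different.
FILENAME_RE = re.compile(r"^(?P<person_id>.+)_(?P<position>\d{1,2})$")


class FingerprintRef(NamedTuple):
    person_id: str
    position: int  # assumed numbering: 1..10
    path: Path


def parse_fingerprint_path(path) -> FingerprintRef:
    """Extract the person ID and finger position from a file name."""
    path = Path(path)
    match = FILENAME_RE.match(path.stem)
    if match is None:
        raise ValueError(f"unrecognized file name: {path.name}")
    position = int(match.group("position"))
    if not 1 <= position <= 10:
        raise ValueError(f"finger position out of range: {path.name}")
    return FingerprintRef(match.group("person_id"), position, path)
```

Rejecting unparseable names up front, rather than guessing, keeps mislabeled files from silently entering the database as new records.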
Extract biometric templates
The program extracts proprietary biometric templates from the fingerprint images. Template extraction for a database of 100 million fingerprints takes around 20 days on a single computer with one GPU, or around 5 days on a single server with 4 GPUs.
All-to-all matching
After template extraction, the user sets a matching score threshold and the program runs the all-to-all matching process. For a database of 100 million fingerprints, matching takes around 2 months on one Intel Xeon Gold CPU, or less than 10 days on one high-end consumer GPU.
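The shape of the all-to-all step can be sketched in a few lines of Python. This is a toy illustration, not the Neurodactyl algorithm: templates are plain numeric vectors, and cosine similarity stands in for the proprietary matcher. The names `cosine_score` and `all_to_all_matches` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_score(a, b) -> float:
    """Stand-in similarity score; a real matcher would go here."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def all_to_all_matches(templates, threshold, score=cosine_score):
    """Compare every template with every other one exactly once.

    templates: dict mapping (person_id, finger_position) -> template vector
    Returns (key_a, key_b, score) for each pair scoring at or above threshold.
    """
    matches = []
    for (key_a, tpl_a), (key_b, tpl_b) in combinations(templates.items(), 2):
        s = score(tpl_a, tpl_b)
        if s >= threshold:
            matches.append((key_a, key_b, s))
    return matches
```

`itertools.combinations` visits each unordered pair once, which is the n × (n − 1) / 2 count discussed earlier. A pure-Python loop like this is only practical for small test sets; at the scale described, matching would have to run in batches on optimized CPU or GPU code.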
Get results
Deduplication results are provided as 3 lists, one per error type:
- Pairs of IDs where the fingerprints of all fingers are the same (full duplicate)
- IDs where a fingerprint is duplicated within one person's record (one finger is enrolled twice in the same record)
- Pairs of IDs and finger positions where N or more fingerprints match (N is set by the user)
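One way to derive the three lists from raw pairwise matches is sketched below. The input format, the function name and the exact rule for a "full duplicate" (every finger of both records is matched) are assumptions made for illustration; the program's actual logic is not documented here.

```python
from collections import defaultdict


def classify_matches(matches, finger_counts, min_partial):
    """Split matched fingerprint pairs into the three result lists.

    matches:       iterable of ((id_a, pos_a), (id_b, pos_b)) matched pairs
    finger_counts: dict mapping person ID -> number of enrolled fingers
    min_partial:   N, the minimum number of matching fingers to report
    Returns (full_duplicates, intra_record_ids, partial_duplicates).
    """
    intra_record = set()
    by_id_pair = defaultdict(set)
    for (id_a, pos_a), (id_b, pos_b) in matches:
        if id_a == id_b:
            # Same finger enrolled twice within one record.
            intra_record.add(id_a)
            continue
        if id_a > id_b:
            # Normalize order so (A, B) and (B, A) share one key.
            id_a, pos_a, id_b, pos_b = id_b, pos_b, id_a, pos_a
        by_id_pair[(id_a, id_b)].add((pos_a, pos_b))

    full, partial = [], []
    for (id_a, id_b), pos_pairs in sorted(by_id_pair.items()):
        matched_a = {pa for pa, _ in pos_pairs}
        matched_b = {pb for _, pb in pos_pairs}
        if (len(matched_a) == finger_counts[id_a]
                and len(matched_b) == finger_counts[id_b]):
            full.append((id_a, id_b))
        elif min(len(matched_a), len(matched_b)) >= min_partial:
            partial.append((id_a, id_b, sorted(pos_pairs)))
    return full, sorted(intra_record), partial
```

Keeping the matched finger positions in the partial list lets an operator see exactly which fingers to re-check in each record.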
Done!
The Neurodactyl deduplication program finds the maximum number of errors in big databases within days, using minimal hardware and the most accurate matching algorithm.
Get deduplication for free!
Buy the Neurodactyl fingerprint recognition SDK and get the deduplication program for free.
Send a request to learn more.