A growing trend in financial institutions is rule-based auto elimination on secondary attributes. For example, when we have a match on a name but a mismatch on a secondary attribute such as date of birth, nationality, or location, the match is eliminated before a person looks at it.
We at SQA Consulting encourage this practice, but it should be done with care. Computer-based rules have no leeway for applying common sense, so your rules should be precise and cover all eventualities.
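To make "cover all eventualities" concrete, here is a minimal sketch of a date of birth elimination rule that spells out every case rather than relying on common sense. The names `Decision`, `decide_on_dob` and the `untrusted` parameter are hypothetical and chosen only for illustration; a real rule set would be shaped by your own screening system.

```python
from datetime import date
from enum import Enum
from typing import FrozenSet, Optional


class Decision(Enum):
    ELIMINATE = "eliminate"  # safe to discard the match automatically
    REVIEW = "review"        # keep the match for a person to look at


def decide_on_dob(
    customer_dob: Optional[date],
    listed_dob: Optional[date],
    untrusted: FrozenSet[date] = frozenset(),
) -> Decision:
    """Decide whether a name match can be eliminated on date of birth alone."""
    # A missing date on either side proves nothing, so a person must review.
    if customer_dob is None or listed_dob is None:
        return Decision.REVIEW
    # A placeholder or otherwise untrusted value cannot prove a mismatch.
    if customer_dob in untrusted or listed_dob in untrusted:
        return Decision.REVIEW
    # Matching dates strengthen the match rather than eliminate it.
    if customer_dob == listed_dob:
        return Decision.REVIEW
    # Both dates are present, trusted and different.
    return Decision.ELIMINATE
```

The design choice here is that elimination is the last branch reached: anything the rule does not explicitly understand falls back to human review.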
In a previous article, we gave several reasons why it would be unwise to eliminate on differing dates of birth. One of those reasons was the use of a default date of birth, and in this article we explain in more detail why that is the case.
A default date of birth is used when the operator inputting the data, or a data migration program, cannot determine the real date of birth for whatever reason and so, for the sake of completeness, substitutes a standard date. This is of course very bad practice, as an incorrect date of birth is much more dangerous than a missing one. The date chosen could be anything, but it often follows a pattern, such as:
- A date with repeated year/month/day figures – 1911/11/11
- A system base date – 1900/01/01
- A significant event, such as the independence date of a country, or the date England won the World Cup
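The first two patterns in the list can be checked mechanically. The following is a rough sketch only: the function name `looks_like_default` and the contents of `SYSTEM_BASE_DATES` are assumptions for illustration, and the base dates your own systems use may well differ. Event-based defaults, the third pattern, cannot be recognised this way without knowing the event in advance.

```python
from datetime import date

# Assumed examples only; replace with the base dates your own systems use.
SYSTEM_BASE_DATES = {date(1900, 1, 1), date(1970, 1, 1)}


def looks_like_default(dob: date) -> bool:
    """Heuristic: does this date follow a predictable default pattern?"""
    # A system base date such as 1900/01/01.
    if dob in SYSTEM_BASE_DATES:
        return True
    # Repeated figures such as 1911/11/11: the day, the month and the
    # last two digits of the year are all the same number.
    if dob.day == dob.month == dob.year % 100:
        return True
    return False


print(looks_like_default(date(1911, 11, 11)))  # True
print(looks_like_default(date(1985, 3, 14)))   # False
```

A date flagged by this heuristic is not necessarily wrong, since real people are born on those days too. It only marks the value as one that should not be relied on to eliminate a match.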
Some of the above are predictable, others not so. To determine the extent to which dates of birth have been defaulted in our system, we can construct a histogram of how often each individual date of birth occurs. The following graph shows this for 100,000 people.
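As a minimal sketch of how such counts might be gathered, assuming the dates of birth have already been loaded into a Python list (the function name `dob_frequencies` and the sample data are made up for illustration):

```python
from collections import Counter
from datetime import date


def dob_frequencies(dobs):
    """Count how many records share each date of birth, skipping missing values."""
    return Counter(d for d in dobs if d is not None)


# Tiny made-up sample; in practice this would be the full customer base.
sample = [
    date(1980, 5, 17),
    date(1900, 1, 1),
    date(1900, 1, 1),
    None,
    date(1975, 2, 3),
    date(1900, 1, 1),
]

freq = dob_frequencies(sample)

# List the most frequently shared dates of birth first.
for dob, count in freq.most_common(3):
    print(dob.isoformat(), count)
```

Plotting `freq` with the dates on one axis and the counts on the other gives the kind of histogram described above.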
The first thing to notice is that even among 100,000 people, remarkably few share a date of birth. This is why it is such a powerful elimination attribute.
The next thing to notice is that there are spikes in the graph where a date of birth is shared more often than the surrounding dates. Many of these are for the first of January, as described in a previous article, but others relate to different behaviour.
This example is relatively safe and clean but does show a small amount of bad behaviour. In another example where we performed similar data profiling, we found that 5% of all people shared the date of birth 9th December 1959, which was the independence day of the country involved.
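Spikes like these can also be picked out without reading the graph by eye. The sketch below compares each date's count with the median count of the dates around it. The function name `find_spikes` and the `window_days`, `ratio` and `min_count` thresholds are assumptions for illustration and would need tuning against your own data.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def _count_at(freq, dob, offset):
    """Count for the date `offset` days away, or 0 if that date is out of range."""
    try:
        return freq.get(dob + timedelta(days=offset), 0)
    except OverflowError:
        return 0


def find_spikes(freq, window_days=7, ratio=5.0, min_count=5):
    """Return (date, count) pairs whose count is far above nearby dates."""
    spikes = []
    for dob, count in freq.items():
        # Ignore dates shared by only a handful of people.
        if count < min_count:
            continue
        neighbours = [
            _count_at(freq, dob, offset)
            for offset in range(-window_days, window_days + 1)
            if offset != 0
        ]
        # Use at least 1 as the baseline so sparse data does not divide the bar to zero.
        baseline = max(median(neighbours), 1)
        if count >= ratio * baseline:
            spikes.append((dob, count))
    # Largest spikes first.
    return sorted(spikes, key=lambda item: item[1], reverse=True)


# Made-up data: one person per day for 20 days, plus one heavily shared date.
counts = Counter({date(1965, 3, day): 1 for day in range(1, 21)})
counts[date(1965, 3, 10)] = 50
print(find_spikes(counts))
```

Dates returned by a check like this are candidates for investigation, not confirmed defaults. Each one still needs a person to work out what behaviour lies behind it.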
At SQA Consulting we regularly perform data profiling targeted at screening issues such as the above. Getting to know your data intimately, and its implications for the systems that consume it, is the first step towards building compliant solutions.
By applying auto elimination rules we can achieve great efficiency improvements; the key is applying the rules safely. Please read the rest of our growing set of articles on how to auto eliminate safely here.
Alternatively, you can contact us at SQA Consulting to see how we may assist you in developing the skills needed to implement these strategies.