Setting thresholds, whether for a system upgrade, a new system implementation, or a drive to improve the effectiveness or efficiency of an existing system, requires a controlled process so that you have the right information to make your decision. The end goal should be a balance between operational efficiency and system effectiveness that satisfies your internal risk appetite.
Too many false positives will increase the workload unnecessarily, ultimately leading to investigator fatigue and the potential for genuine matches to be missed. A threshold that is too tight can result in real matches never being alerted by the system, with the risk that they are identified later by a correspondent bank.
How do you select the appropriate threshold?
A number of the activities recommended here typically require input from external assurance providers such as ourselves at SQA Consulting.
Guidance from the system vendor
If you are carrying out a system upgrade, understand what the upgrade involves and whether it has any bearing on the matching algorithm. Engaging the vendor early is important, particularly if it is a significant upgrade. Don’t be fobbed off with a response that the details will be included in the release notes, as that will be too late for test planning.
If you are embarking on a new system implementation, discussions with the vendor regarding how they support the setting of the threshold will be critical. Do they recommend a threshold? What threshold do their clients typically run the system at – are the clients comparable to your business model?
Determine early on what support they offer – is implementation tuning offered as a service, and is it an additional cost? If the model is cloud-based, how many test files will they process for you as part of the tuning exercise?
Internal Baseline
Whether you are upgrading or changing systems altogether, you have a baseline level for matching: the matching level in your current version. It is possible to create a test file based on the types of matches currently being processed and the results being generated; essentially, this is putting together a sample of matches that do currently hit on the system at the different thresholds. Depending on the type of ongoing assurance you carry out, this may be something you already have. A simple test is to run this file and identify the records that no longer generate an alert. Depending on the changes to the threshold, it may be valid that they do not match, or it may indicate that something else has changed in the algorithm. This is an acid test rather than an in-depth review.
This will show you the types of matches you currently see, but it won't show you the types of matches the system doesn't alert on.
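As a rough illustration of this acid test, the sketch below compares the alert output of the current system with that of the upgraded or candidate system and lists the records that no longer alert. The file names (baseline_alerts.csv, candidate_alerts.csv) and the record_id column are assumptions for the example, not a prescribed export format.

```python
import csv

def load_alerted_ids(path, id_field="record_id"):
    """Load the set of record IDs that generated an alert in a screening run."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[id_field] for row in csv.DictReader(f)}

# Hypothetical exports: alerts from the current system (baseline) and from the
# upgraded or candidate system, both run against the same test file.
baseline = load_alerted_ids("baseline_alerts.csv")
candidate = load_alerted_ids("candidate_alerts.csv")

dropped = sorted(baseline - candidate)   # alerted before, silent now
new_hits = sorted(candidate - baseline)  # alert now, silent before

print(f"{len(dropped)} records no longer alert at the new threshold:")
for record_id in dropped:
    print(" ", record_id)

print(f"{len(new_hits)} records alert only at the new threshold:")
for record_id in new_hits:
    print(" ", record_id)
```

Records in the "dropped" set are the ones to review: either the tighter threshold legitimately excludes them, or the matching algorithm itself has changed.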
Efficiency-Based Tuning
This requires understanding the level of false positives that the system is generating and determining if that is an acceptable level of work for the investigations team. Banks and organisations have finite resources, and these must be employed in the most effective way. Investigating low-quality alerts is not necessarily the most appropriate resource deployment as it leads to operator fatigue. There has been regulatory guidance stating that high numbers of false positives may indicate a need to review the screening programme [1] and also that undue levels of false positives may have a negative impact on the efficacy of the process [2].
Your MI and statistics should be able to demonstrate the volumes of alerts currently closed as false positives. You will have an idea of the level of resource required to manage these volumes and of the overall turnaround times for closing the alerts. It can be hard to determine these rates when assessing a new threshold in a test environment unless you can run a file copied from production to see the level of false positives generated at the new threshold; this, however, creates information security issues because live customer data is used on a test system.
SQA Consulting has a defined approach for measuring false positive levels on a test system using test data. A file of approximately 25,000 records can be run through at different thresholds to determine the anticipated false positive rate. This rate, considered in the context of expected daily production volumes, indicates the expected level of false positive alerts at the threshold under consideration.
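As a worked illustration of that arithmetic, the sketch below scales a test-file false positive rate up to an expected daily alert volume. All figures are illustrative assumptions, not measured results from the SQA Consulting approach.

```python
# Illustrative figures only; the volumes and alert counts below are assumptions.
test_file_size = 25_000           # records in the tuning test file
alerts_at_threshold = 450         # alerts generated at the threshold under review
daily_production_volume = 80_000  # records screened per day in production

# Treat every alert on the test file as a false positive for sizing purposes.
false_positive_rate = alerts_at_threshold / test_file_size
expected_daily_alerts = false_positive_rate * daily_production_volume

print(f"False positive rate at this threshold: {false_positive_rate:.2%}")
print(f"Expected false positive alerts per day: {expected_daily_alerts:.0f}")
```

Repeating this calculation for each candidate threshold gives the investigations team a concrete view of the workload each setting would generate.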
It is these false positive rates that can then be considered relative to how effective the different thresholds are at identifying matches for different business scenarios, as discussed in the next section.
Absolute Rule Tuning
This involves looking at certain business scenarios for matching and how the different thresholds affect their matching capability, or effectiveness. Multiple business scenarios can be tested for both personal and company names. For example, how does the fuzzy matching cater for a title included in the given-name field, or an added Spanish maternal name in the surname field? For a company name, how well will the threshold perform when matching against a company suffix in another language, e.g. Ltd versus LTDA? Understanding how each scenario performs at the different thresholds, and looking at which names no longer match when the threshold is increased, is key here. Again, this is testing that will likely require input from an external assurance provider such as ourselves.
SQA Consulting runs over 30 different scenarios for both personal and company names as part of absolute rule tuning testing. These business scenarios have been built up from experience in the field of fuzzy matching, ongoing analysis of sanctions list data, in-depth knowledge of how names are used and translated, reviews of regulator fines, client feedback, and so on.
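The sketch below gives a flavour of this kind of scenario testing. The scenarios and scores are illustrative assumptions, and Python's difflib similarity is used purely as a stand-in for a real screening system's matching algorithm.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Stand-in similarity score (0-100); a real system uses its own algorithm."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical business scenarios: the list name and the variant a customer
# record might realistically present.
scenarios = [
    ("title in given name",       "Juan Garcia",      "Dr Juan Garcia"),
    ("added maternal surname",    "Juan Garcia",      "Juan Garcia Lopez"),
    ("translated company suffix", "Acme Trading Ltd", "Acme Trading LTDA"),
]

for threshold in (80, 85, 90, 95):
    print(f"\nThreshold {threshold}:")
    for label, list_name, variant in scenarios:
        score = similarity(list_name, variant)
        outcome = "MATCH" if score >= threshold else "missed"
        print(f"  {label:28s} score={score:5.1f}  {outcome}")
```

Running the same scenario pack at each candidate threshold shows exactly which business scenarios stop matching as the threshold is tightened.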
Plotting the effectiveness scores for a scenario at the different thresholds against the false positive rate can help identify the most cost-effective threshold options. At some points, gains in effectiveness are coupled with large increases in the false positive rate, whereas at other points gains in effectiveness have far less impact on the false positive rate, and it is these points that we try to identify.
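As a minimal sketch of that trade-off analysis, the example below compares the marginal effectiveness gained against the extra false positives incurred when moving between candidate thresholds. The effectiveness and false positive figures are assumptions for illustration only.

```python
# Illustrative (effectiveness %, false positive %) pairs per candidate threshold;
# these figures are assumptions, not measured results.
results = {
    80: (96.0, 3.2),
    84: (94.5, 1.9),
    88: (90.0, 1.5),
    92: (81.0, 1.3),
}

thresholds = sorted(results)
for lower, higher in zip(thresholds, thresholds[1:]):
    eff_lo, fp_lo = results[lower]
    eff_hi, fp_hi = results[higher]
    eff_gain = eff_lo - eff_hi   # effectiveness gained by loosening the threshold
    fp_cost = fp_lo - fp_hi      # extra false positives incurred by loosening it
    ratio = eff_gain / fp_cost if fp_cost else float("inf")
    print(f"{higher} -> {lower}: +{eff_gain:.1f} pts effectiveness "
          f"for +{fp_cost:.1f} pts false positives (ratio {ratio:.1f})")
```

Threshold moves with a high ratio buy significant effectiveness for little extra investigation workload; moves with a low ratio are the expensive ones to avoid.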
When the detailed analysis of the internal thresholds is complete, it is pertinent to consider them relative to your peers. This is where benchmarking comes into its own.
Relative Rule Tuning – Benchmarking
This involves reviewing the effectiveness and efficiency rates of the system at the different thresholds under review and comparing them against other banks and organisations. To access the benchmarking data, an external assurance provider will be required.
Ultimately each bank has its own risk appetite, and as such comparing one bank to others has its dangers: other banks do not necessarily set a precedent for the fuzzy matching capability that is needed. However, it is comforting to know that you are operating at a level commensurate with other institutions; no one wants to be at the bottom of the curve unless they are looking for budget to upgrade or replace the system. Plotting the various thresholds under review against the benchmarks will also help provide evidence to support the decision making.
SQA Consulting Benchmarking Example – Threshold Testing
Orange = peers; blue = thresholds under review, ranging from 75 to 100. The vertical axis represents effectiveness scores; the horizontal axis represents the false positive rate.
Thresholds higher than 84 (up to 100) give effectiveness results that fall below the peer benchmark scores. The higher the threshold, the tighter the rules, and as such fewer matches will be identified as part of the fuzzy testing.
If this is a new system and potential threshold settings are being reviewed, as represented by the blue dots above, then the same test pack can also be run through the existing screening system that is being considered for replacement. This provides similar information to the Internal Baseline testing described previously, but it also gives a current-system baseline point to place on the graph above, providing a valuable comparison between the current system and the proposed replacement.
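For illustration, a benchmarking plot of this kind could be assembled along the lines of the sketch below, with peer benchmark points in orange and the thresholds under review in blue. All data points are assumptions, not real benchmark results.

```python
import matplotlib.pyplot as plt

# Illustrative data only: peer benchmark points and candidate thresholds (75-100).
peer_fp_rates   = [2.1, 2.8, 1.7, 3.4, 2.5]      # false positive rate (%)
peer_scores     = [88, 91, 85, 93, 90]            # effectiveness score (%)
threshold_fp    = [4.0, 2.6, 1.8, 1.2, 0.8, 0.5]  # one point per candidate threshold
threshold_score = [97, 94, 90, 84, 76, 65]
threshold_label = [75, 80, 85, 90, 95, 100]

plt.scatter(peer_fp_rates, peer_scores, color="orange", label="Peers")
plt.scatter(threshold_fp, threshold_score, color="blue", label="Thresholds under review")
for fp, score, t in zip(threshold_fp, threshold_score, threshold_label):
    plt.annotate(str(t), (fp, score), textcoords="offset points", xytext=(5, 5))

plt.xlabel("False positive rate (%)")
plt.ylabel("Effectiveness score (%)")
plt.legend()
plt.show()
```

A point for the current system, generated from the same test pack, can be added to the same axes to complete the comparison described above.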
Conclusion
Implementing a threshold that achieves the balance between operational efficiency and system matching effectiveness that satisfies your internal risk appetite will take time and effort, but it is a fundamental step. Without it, any investment in a screening system will be wasted, as you will not be able to prove that the system is working as expected or stand over the veracity of the system.
A combination of the above approaches will provide the tools necessary to make an informed decision regarding the appropriate threshold for your screening system.
For further information regarding threshold setting and screening in general please contact us at SQA Consulting.
[1] Federal Financial Institutions Examination Council (FFIEC)
[2] Joint Money Laundering Steering Group (JMLSG)