Noise Words – Which words to use?

 

Commonly used words from your customer names should be the first port of call. How many of your customers have LTD or Bank in their name? To support the decision, it helps to understand the make-up of the sanctions lists and the names commonly recorded for companies in your jurisdiction.

The most commonly used words for company names registered in the United Kingdom are:

[Image: table of the most commonly used words in UK company names]

Some words really stand out as good candidates, such as “LTD”, “Group”, “Holdings”, “PLC” and “Services”. Other words, such as “London” and “Europe”, are geography-related, so their inclusion on the list is not as clear-cut. Would you consider the following to be a valid match or not:

RGI International London = RGI International Paris?
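To make the question concrete, here is a minimal sketch of what happens when geography words are, or are not, on the noise words list. The `strip_noise` helper and both word sets are hypothetical and purely illustrative; a real screening engine will tokenise and compare names in its own way.

```python
def strip_noise(name, noise_words):
    """Drop every token that appears in the noise-word set (case-insensitive)."""
    return " ".join(t for t in name.split() if t.lower() not in noise_words)

a = "RGI International London"
b = "RGI International Paris"

# Two hypothetical noise-word lists: one without geography, one with it.
without_geo = {"international"}
with_geo = {"international", "london", "paris"}

print(strip_noise(a, without_geo) == strip_noise(b, without_geo))  # False
print(strip_noise(a, with_geo) == strip_noise(b, with_geo))        # True
```

With geography words treated as noise, both names reduce to "RGI" and become an exact match, which is the trade-off to weigh before adding such words to the list.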

Less frequent words include “Property” and “Properties”; both seem like good candidates, and they illustrate the value of including both the singular and plural forms of a noise word. The examples above are common words from UK company names, but other languages must also be considered, particularly where your operations and clients are geographically diverse. For company names from France, for example, the more commonly used words include S.A., sarl, groupe, international, technologies, &, societe and France. This is a different set of words, although there can be some overlap. Which languages does the vendor provide noise words in?

The sanctions lists also contain common words, which can be easily identified through analysis of the various consolidated lists. Some examples include “company”, “and”, “for”, “limited”, “ltd” and “ltda”.
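The kind of analysis described above can be sketched as a simple word count over a set of names. The sample names below are invented for illustration, and `word_frequencies` is a hypothetical helper; in practice the input would be your customer base or a consolidated sanctions list.

```python
from collections import Counter
import re

def word_frequencies(names):
    """Count how many names each token appears in (case-insensitive)."""
    counts = Counter()
    for name in names:
        # A set ensures a word repeated within one name is counted once.
        tokens = set(re.findall(r"[a-z0-9&.]+", name.lower()))
        counts.update(tokens)
    return counts

# Invented sample data, for illustration only.
sample_names = [
    "Acme Holdings Ltd",
    "Northern Property Services Ltd",
    "Acme Group PLC",
    "Harbour Properties Ltd",
]

freq = word_frequencies(sample_names)
for word, n in freq.most_common(3):
    print(word, n)
```

High-frequency words are only candidates. In this sample, "acme" appears twice but is a distinguishing name rather than noise, so each word still needs a judgement call before it goes on the list.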

The use of noise words can be extremely beneficial in refining the matching process, but there are scenarios where it can backfire.

The more words that are included on the noise words list, the more likely it is that a name is composed entirely of noise words, so that removing them results in an empty name. This is, of course, undesirable, and those deploying strategies that include high volumes of noise words should consider matching both with and without noise words.
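One way to guard against the empty-name problem is sketched below. It assumes a hypothetical noise words list and a hypothetical `screening_variants` helper that always keeps the original name and only adds the stripped form when something is left after stripping.

```python
# Hypothetical noise-word list, for illustration only.
NOISE_WORDS = {"ltd", "group", "holdings", "plc", "services", "property", "properties"}

def strip_noise(name, noise_words=NOISE_WORDS):
    """Remove noise words from a name (case-insensitive)."""
    kept = [t for t in name.split() if t.lower() not in noise_words]
    return " ".join(kept)

def screening_variants(name, noise_words=NOISE_WORDS):
    """Return the forms of a name to screen: always the original,
    plus the stripped form when it is non-empty and different."""
    variants = [name]
    stripped = strip_noise(name, noise_words)
    if stripped and stripped != name:
        variants.append(stripped)
    return variants

print(screening_variants("Property Services Group Ltd"))  # ['Property Services Group Ltd']
print(screening_variants("RGI Holdings Ltd"))             # ['RGI Holdings Ltd', 'RGI']
```

The first name consists entirely of noise words, so only the original form is screened and nothing is silently reduced to an empty string.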

Another danger with noise words arises when a spelling error occurs. If a noise word is misspelled on the list side, for example, the noise words rule will not recognise it, so the word is removed from the customer name but not from the list name, and the matching becomes unbalanced.

For example:

RGI Intermational versus RGI International

In the example above, there is a spelling error in “International” on the left-hand side, so when the noise words rule is applied the comparison becomes:

RGI Intermational versus RGI

So rather than a very close match, the pair is now much less likely to match.

This is another argument for treating matching both with and without noise words as standard practice. Rather than taking the default noise words list and assuming it works because other clients use it, test, and then test again.
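The misspelling example can be reproduced with a short sketch that scores a pair of names both with and without noise words and keeps the better result. Python's `difflib.SequenceMatcher` is used here only as a stand-in similarity measure, not as any vendor's matching algorithm; the noise words list and the `best_score` helper are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical noise-word list, for illustration only.
NOISE_WORDS = {"international", "ltd", "group"}

def strip_noise(name):
    """Remove noise words from a name (case-insensitive)."""
    return " ".join(t for t in name.split() if t.lower() not in NOISE_WORDS)

def similarity(a, b):
    """Stand-in similarity score between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_score(a, b):
    """Score the pair with and without noise words and keep the higher score."""
    raw = similarity(a, b)
    sa, sb = strip_noise(a), strip_noise(b)
    # Skip the stripped comparison if either name was all noise words.
    stripped = similarity(sa, sb) if sa and sb else 0.0
    return max(raw, stripped)

list_name = "RGI Intermational"   # misspelled noise word on the list side
customer = "RGI International"

raw = similarity(list_name, customer)
stripped = similarity(strip_noise(list_name), strip_noise(customer))
print(f"raw={raw:.2f} stripped={stripped:.2f} best={best_score(list_name, customer):.2f}")
```

Here the raw comparison scores far higher than the stripped one, because only the correctly spelled "International" is removed. Taking the better of the two scores keeps the close match, while pairs such as "RGI International Ltd" and "RGI Group" still benefit from stripping.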

 

If you would like to learn more about this, and how SQA Consulting may assist you, please contact us.

 
