
Linguistic Identity Matching PDF

257 Pages·2013·3.061 MB·English

Preview Linguistic Identity Matching

[Cover: name variants including Eltsine, Yeltsin, Ieltsin, Abdal-Rahman]

Linguistic Identity Matching
Bertrand Lisbach, Victoria Meyer

Bertrand Lisbach, Freiburg, Germany
Victoria Meyer, Zurich, Switzerland

ISBN 978-3-8348-1370-1
ISBN 978-3-8348-2095-2 (eBook)
DOI 10.1007/978-3-8348-2095-2
Library of Congress Control Number: 2013939631
Springer Vieweg
© Springer Fachmedien Wiesbaden 2013

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Printed on acid-free paper. Springer is part of Springer Science+Business Media www.springer.com

Foreword by David Smith

My interest in identity matching comes from my years’ fraud investigation and AML work. A few years ago, I would have expected that banks and other financial institutions would, by now, have reduced the AML process to a routine, low level back office function. How wrong I was!
At time of writing, the revolutions and uprisings in the Middle East have resulted in wide-scale sanctions where identity matching will be critical. The death of Osama bin Laden and the reported capture of a vast hoard of data will no doubt start a major anti terrorist financing effort. Looking at this from a bank’s point of view, this is a worrying time, given the scale of fines and sanctions imposed by regulators on both the firms and individuals. No decent bank wants to get caught out but this may not be as easy as it may appear.

Before looking at how Linguistics should be a major step forward in Identity Matching it is worth considering the context in which it can be used in AML work in financial firms. The Wolfsberg Group 2009 Statement on AML Screening, Monitoring and Searching usefully sets out 4 key areas involving matching:

• Screening (real time payment screening to prevent sanctions and other breaches)
• Monitoring (either live or delayed monitoring for unusual transactions)
• Client Screening (KYC not only at the opening of accounts but also at appropriate periods thereafter)
• Searches (typically in response to regulatory enquiries, updated sanctions lists, etc.)

The first major issue is scale – the volume of data in any large bank is vast, so the only realistic way of searching for names is using technology. Large banks that have grown, particularly by mergers, are frequently relying on legacy IT systems that were never designed with AML matching in mind. In addition, customer data may not be in the same format in different parts of the same bank and AML watch lists are not yet standardised between the various issuing authorities. The scale and complexity all conspire to make automated matching very difficult. Because of the dangers of missing a relevant match (a “false negative”), often the systems are allowed to return large numbers of false positive hits, which then need to be manually sorted.
It is also worth considering the information contained on AML / sanctions lists. A full name with nationality, date of birth, address, etc. would be very useful but is obviously not readily available for drug lords or terrorists, so matching even just a few names that are not presented in a set format presents a real challenge.

The information set out in this book is fascinating – the range of linguistic variations is quite amazing. How any matching worked before must be something of a mystery unless it is of the so-called exact name matching variety of a decade ago. I am sure the book will be read with interest by AML professionals but I would be disappointed if it were not studied very carefully by regulators and authorities issuing AML / sanctions lists. If they are short of time I would recommend at least chapter one, some of chapters three to six, which get to grips with linguistics, and of course, chapter seven on errors. The how-to-match considerations are essential for AML professionals but would also be a good place for regulators to start before doing reviews of banks’ AML procedures.

To me, a linguistic approach to matching is a more logical starting point anyway. Take, for example, a Mexican regional bank being ordered to search for names of one of the murderous drug gangs. If the search was being done manually, the staff, being local, would automatically be using their knowledge of the language, common variations, etc. whereas the same search at, say, a Jersey bank would not benefit from that local knowledge. If linguistic matching can bridge that gap in knowledge then it could be key to ensuring that the danger of false negatives is eliminated whilst helping to reduce false positives.

David Smith is a forensic accountant with over 20 years’ experience leading major fraud investigations. As the senior partner in KPMG Forensic Accounting, he worked extensively throughout Europe and the Middle East.
He currently works as a consultant dealing with investigations and corporate governance issues.

Contents

Introduction: Paradigm Shift in Identity Matching Theory

Part I: Introduction to Linguistic Identity Matching

Chapter 1: Basic Concepts
  1.1 Identity Matching and Name Matching
  1.2 Database Profiles and Search Profiles
  1.3 True and False Positives, True and False Negatives
  1.4 Hit Rate and Search Accuracy (Recall and Precision)
  1.5 Linguistic Identity Matching

Chapter 2: The Application of Identity Matching Techniques
  2.1 Customer Relationship Management (CRM)
  2.2 Anti-Money Laundering and Counter Terrorist Financing
  2.3 Criminal Investigation and Crime Management
  2.4 Information Search
  2.5 Conclusion

Chapter 3: Introduction to Proper Names
  3.1 Important Characteristics of Proper Names
  3.2 The Historical Development of Naming Conventions
  3.3 Western Names: the Germanic Example
  3.4 Other Features of Western Naming Systems
  3.5 The Russian Example
  3.6 The Arabic Example
  3.7 The Chinese Example
  3.8 Conclusion

Chapter 4: Transcription
  4.1 Transcription, Transliteration and Translation
  4.2 Latin and Non-Latin Scripts
  4.3 Romanisation of Cyrillic Names
  4.4 Romanisation of Arabic Names
  4.5 Romanisation of Names from the Chinese Script
  4.6 Conclusion: Transcription as the Achilles Heel of Name Matching

Chapter 5: Derivative Forms of Names
  5.1 Aliases and Derivative Names
  5.2 Hypocorisms
  5.3 Translated Names
  5.4 Derivative and Translated Forms of Names of Legal Persons

Chapter 6: Phonetically Similar Names
  6.1 Homophones
  6.2 Linguistic Matching and Phonetics

Chapter 7: Typos
  7.1 Variations, Spelling Mistakes and Typos
  7.2 Motor Function and the Role of the Keyboard
  7.3 Optical Character Recognition
  7.4 Typos in the Identity Matching Process

Part II: Name Matching Methods

Chapter 8: Name Matching Methods of the First Generation
  8.1 Introduction
  8.2 Pattern Matching with Levenshtein Distance
  8.3 Pattern Matching with N-Gram Methods
  8.4 Phonetic Encoding with Soundex
  8.5 Thesaurus-Based Matching Methods
  8.6 Summary of the Application of First Generation Name Matching Methods
  8.7 Reasons for the Continued Application of First Generation Methods

Chapter 9: Second Generation Name Matching Methods
  9.1 Introduction
  9.2 G2 Pattern Matching: Advancements of Edit Distance and N-Gram Methods
  9.3 G2 Phonetic Encoding: Advancements on Soundex
  9.4 Generative Algorithms for Name Variants
  9.5 Summary of the Application of Second Generation Matching Methods
  9.6 Conclusion: Three Decades of Name Matching

Chapter 10: Third Generation Name Matching Methods
  10.1 Introduction
  10.2 Principle Requirements for G3 Solutions
  10.3 Linguistic Similarity Keys for Transcription and Homophones
  10.4 Thesauri for Derivative Names and Special Cases
  10.5 Generative Algorithms for Covering Simple Typos
  10.6 Integration of Methods
  10.7 Conclusion

Chapter 11: Benchmark Study
  11.1 Introduction
  11.2 Match Methods to be Assessed
  11.3 Methodology and Findings
  11.4 Conclusion

Part III: Into the New Paradigm

Chapter 12: Name Matching and Identity Matching
  12.1 Space-Related Identity Attributes
  12.2 Time-Related Identity Attributes
  12.3 Classifying Attributes
  12.4 Identification Codes
  12.5 Integration of Single Attribute Comparisons
  12.6 Conclusion

Chapter 13: Evaluation of Identity Matching Software
  13.1 Introduction
  13.2 Defining Requirements
  13.3 Selecting Potential Candidates
  13.4 Test Focus and Test Design
  13.5 Analysis of Results
  13.6 Conclusion

Chapter 14: A Linguistic Search Standard
  14.1 The Need for a Search Standard
  14.2 A Proposed Linguistic Search Standard
  14.3 Defining a Corporate Linguistic Search Standard
  14.4 Calculating the Match Level of a Full Name
  14.5 Applying a Linguistic Search Standard

Index

Introduction: Paradigm Shift in Identity Matching Theory
