Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS)
Department of Industrial Engineering and Business Information Systems

Data Driven Banking: Applying Big Data to accurately determine consumer creditworthiness

Author: Shen Yi Man
Final MSc Thesis Business Information Technology
Track: IT Management & Innovation
September 2016

Supervisors University of Twente: ir. K. Sikkel, dr. H.C. van Beusichem
External Supervisor: Anonymous

30-9-2016

Document Title: Data Driven Banking: Applying Big Data to accurately determine consumer creditworthiness
Date: 30-9-2016
Author: Y. Man (Shen Yi), [email protected], s1128337
Educational Institution: University of Twente, The Netherlands
Faculty: Faculty of Electrical Engineering, Mathematics and Computer Science
Department: Industrial Engineering and Business Information Systems
Educational Program: MSc. Business Information Technology
Specialization: IT Management & Innovation

Graduation Committee

Ir. Klaas Sikkel
Faculty of Electrical Engineering, Mathematics and Computer Science
Dpt. of Industrial Engineering and Business Information Systems
University of Twente, Zilverling 4102
[email protected]

Dr.ing. Henry van Beusichem
Faculty of Behavioural, Management and Social Sciences
Dpt. of Finance & Accounting
University of Twente, Ravelijn 2315
[email protected]

Anonymous
Project Manager
NeoBank The Netherlands
Consumptive Finance
[email protected]

N.E.O. Bank and its employees, divisions and products are fictional entities that replace a large bank in the Netherlands, which shall remain anonymous throughout this thesis.

Preface

While writing this foreword, I am finally realizing that after six years of studying at the university, my academic career is coming to a close. It was an incredibly enjoyable period in which I grew a lot and learned that there is much more to life than I could have imagined beforehand. With my internship at NeoBank coming to an end, this concludes another chapter of my life, one that I will always remember with a smile on my face.

This master thesis is the end result of the past six months I spent at NeoBank for my graduation project. Obstacles appeared in the course of this graduation project which I found quite challenging at times, but I’m glad to have done it altogether. It offered me the chance to learn a lot about Big Data analytics, credit scoring and the banking sector. My master in Business Information Technology at the University of Twente will officially be completed upon approval of this thesis. It also marks the beginning of my professional career, which will start after a short vacation in Asia.

I would like to express my gratitude to all the people who supported me during this journey and who helped me finalize this last project. I’d like to thank my friends and family for all their relentless nagging, support and understanding. It provided me with motivation and inspiration at difficult times. In particular, I’d like to thank all my supervisors for their input, feedback and wisdom. Thank you Klaas, for assuming a leading role and helping me through each step of the way to this final product. My thanks to you Henk, for helping me in the initial stages of the thesis. Thank you Henry, for jumping in on such short notice and helping me put the finishing touches on the project. I’d like to thank Anonymous for the support and guidance during this pleasant period at Neo. Whenever I was stuck in a frame of thought, I could discuss things with you over the phone or in person. A special thanks to all the great colleagues at NeoBank who provided me with the help and information needed to complete my thesis.

I do hope that this report will be of good use and interesting to read. Enjoy!

Shen Yi Man
Nijmegen, September 2016

Executive Summary

Financial institutions judge consumer creditworthiness on a frequent basis.
Errors and inaccuracies in this process increase the value of outstanding loans that banks cannot recover because of default. Failing to comply with payment obligations can mark consumers for years, lowering their creditworthiness and making it even more difficult for them to obtain a future loan. These are concerns of both authorities and banks when designing a financial product and its application process. To address these problems, which accompany structural consumer debt, we turn to Big Data analytics.

Conventional credit scoring methods at traditional banks are becoming less relevant in today’s age of massive data generation. Millennials are well-connected and more digitized than ever before. This opens up new possibilities, both in the contents of the new data and in the applications that become feasible through thorough analysis of large, representative quantities of data. Specifically, Machine Learning can be used to greatly improve three of the five steps in which a credit scoring process is currently structured:

(1) Data Identification
(2) Data Collection
(3) Data Conversion
(4) Score Distribution
(5) Decision Making

Within Data Identification, new relevant data variables can be detected and used as proxies to measure the two components of creditworthiness: the ability to repay and the willingness to repay. These data variables can be used to enhance a credit scoring or risk model, which is used in Data Conversion to compute a consumer credit score. The model is used to improve the credit scoring process against a list of conditions (Subsection 3.1.1). It is also possible in this step to let the model build and enhance itself through Machine Learning algorithms. The last step, Decision Making, can be improved by creating a proprietary automatic decision-making algorithm through Machine Learning, which will streamline the underwriting process.
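
To make the relation between Data Conversion and Decision Making more concrete, the following is a minimal sketch in Python and not the model used at NeoBank. It assumes that the two components of creditworthiness have already been reduced to proxy values between 0 and 1. The names `Applicant`, `convert_to_score` and `decide`, the equal weighting and the threshold of 0.6 are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # Hypothetical proxy values, each scaled to [0, 1]; not taken from the thesis.
    ability_to_repay: float
    willingness_to_repay: float


def convert_to_score(applicant: Applicant, weight_ability: float = 0.5) -> float:
    """Step 3 (Data Conversion): weighted mix of the two components, in [0, 1]."""
    # The weights are an assumption; a real model would learn them from data.
    weight_willingness = 1.0 - weight_ability
    return (weight_ability * applicant.ability_to_repay
            + weight_willingness * applicant.willingness_to_repay)


def decide(score: float, threshold: float = 0.6) -> str:
    """Step 5 (Decision Making): simple automatic accept/reject rule."""
    # The threshold is an assumed cut-off, not a value from the thesis.
    return "accept" if score >= threshold else "reject"


if __name__ == "__main__":
    applicant = Applicant(ability_to_repay=0.8, willingness_to_repay=0.7)
    score = convert_to_score(applicant)
    print(round(score, 2), decide(score))
```

In a Machine Learning setting, the fixed weights and threshold in this sketch would be replaced by parameters estimated from historical loan data.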

After an initial model is created using the training data, it has to be validated and tested to measure its performance. The validation step is used to enhance and calibrate the model before it is tested in practice or discarded completely. The accuracy is determined by taking historical data sets in which the result is known and comparing these true outcomes with the results generated by the new model.

                              Predicted Result
                              No Default             Default
Actual Result   No Default    True Positive (n)      False Negative (n)
                Default       False Positive (n)     True Negative (n)

During the data identification process, new credit models that make use of newly discovered data variables are built for testing. These new variables need to be validated through the Six-Point FICO Test before integration and testing within the model. The six points are the following.

1. Regulatory compliance: All data sources and data variables must comply with the legislation.
2. Depth of information: This factor covers the detail and context of data variables. The richer the data, the more accurate the computed score will be. High quality data must be acquirable.
3. Scope and consistency of coverage: To be relevant, the data source must cover a large percentage of the population. Format consistency is needed for operating on, analyzing and storing the data.
4. Accuracy: The incoming data must be validated and tested on the basis of historical data.
5. Predictiveness: To add value to credit risk models, data variables must be proven predictive of consumer repayment behavior. This can be tested in practice through Machine Learning.
6. Additive Value (Orthogonality): Data must be uniquely additive and not “double counted”.

------------------------------------- Paragraph Deleted Due to Confidentiality -------------------------------------

The Big Data maturity was measured qualitatively in order to gauge the possibilities for implementing Machine Learning at NeoBank.
The assessment shows that the technical requirements to start implementation are currently easily fulfilled. However, if demand for Big Data oriented projects continues to grow, the department will soon be short-handed and unable to capitalize on each opportunity. This research forms the basis of a recommendations plan (Section 6.2) to improve the Big Data maturity of NeoBank as a whole, based on the TDWI Big Data maturity model assessment.

                        DIMENSIONS                                                               TOTAL
EVALUATED PARTIES       Organization   Infrastructure   Data Management   Analytics   Governance   Big Data Maturity
NeoBank Company-wide
DDA Department

------------------------------------- Paragraph Deleted Due to Confidentiality -------------------------------------

The High Level Solutions (HLSs), which are provided additionally, explain strategic designs that make use of the technology to gain a competitive advantage in the market. These solutions intersect with the different interests of various stakeholders and with qualitative criteria. Extended drafts have been made to describe the scenarios in which these HLSs could be implemented. The solutions are ordered by level of disruptiveness. According to market research, the first solutions in Personalization, Automatic Client Appraisal and Budget Counseling are more likely to be feasible in the future than the last HLS: the IOIS platform. This is largely due to the high dependency of inter-organizational information systems on joint funding and a collaborative research effort. This risky endeavor would require extensive funding and commitment from all parties involved. Most banks would rather depend on their own internal system.

Table of Contents

Preface
Executive Summary
List of Figures
List of Abbreviations
1 Introduction
  1.1 Problem Statement
  1.2 Research Objectives
  1.3 Research Questions
  1.4 Research Approach
  1.5 Reading Guide
2 Methodology
  2.1 Systematic Literature Review
  2.2 Non-Scientific Literature Review
  2.3 External Information Acquisition
  2.4 Semi-Structured Interviews
  2.5 Drafting Solutions
3 Literature Review
  3.1 Academic Literature
    3.1.1 Structure of Credit Scoring
    3.1.2 Relevant Data
    3.1.3 Applying Big Data Technology
    3.1.4 Accuracy of Creditworthiness
    3.1.5 Big Data Maturity
  3.2 Non-Academic Literature
    3.2.1 Consultancy and IT Company Whitepapers
    3.2.2 Fintech Organizations
    3.2.3 Financial Authorities & Government
  3.3 Literary Insights on Research Questions
4 Analysis & Results
  4.1 Big Data Maturity of NeoBank
    4.1.1 NeoBank-wide Assessment
    4.1.2 Data Driven Analytics Assessment
    4.1.3 Big Data Maturity in the Financial Sector
    4.1.4 Final Assessment
  4.2 Traditional Credit Scoring at NeoBank
    4.2.1 Credit Scoring Processes
    4.2.2 Models within Credit Scoring
  4.3 External Market Review
    4.3.1 Overview of Data-Driven Lenders
    4.3.2 External Interview Insights
    4.3.3 Legal Impediments
5 High Level Solutions
  5.1 Machine Learning Basis
  5.2 Automatic Personalization and Client Appraisal
  5.3 Budget Counseling
  5.4 Collaboration and Standardization (IOIS)
6 Conclusion & Discussion
  6.1 Validation
  6.2 Recommendations
  6.3 Limitations of the Study
  6.4 Future Work
Bibliography
Appendices
  Appendix A – Approached Organizations
  Appendix B – Interview Questions
  Appendix C – TDWI Big Data Maturity Model Criteria Description
  Appendix D – Big Data Maturity Dimensions Assessment NeoBank
  Appendix E – Big Data Maturity Dimensions Assessment DDA
  Appendix F – Capital Requirements Regulation and Directive IV
  Appendix G – Straight Through Processing
  Appendix H – Wet op het Financieel Toezicht
  Appendix I – Improvement (ML) Process for Credit Scoring
  Appendix J – Machine Learning Techniques
  Appendix K – Machine Learning Model Validation Techniques

List of Figures

Figure 1. Intermediate Research Results
Figure 2. Filtering Process of Literature Review
Figure 3. The DSRM Process Model (Peffers et al., 2007)
Figure 4. High Level Solution Template
Figure 5. Stakeholder Overview Template
Figure 6. Computing the Accuracy of a Credit Scoring Model (Confusion Matrix)
Figure 7. Stages of Big Data Maturity (Halper & Krishnan, 2014)
Figure 8. Big Data Maturity Assessment Criteria (Halper & Krishnan, 2014)
Figure 9. Maturity Scoring Table (Halper & Krishnan, 2014)
Figure 10. Big Data Maturity Final Assessment
Figure 11. Potential Improvement in Credit Scoring Steps
Figure 12. Stakeholder Overview HLS 1
Figure 13. Stakeholder Overview HLS 2
Figure 14. Mockup of Data Driven App – Login & Navigation Page
Figure 15. Mockup of Data Driven App – Views
Figure 16. Stakeholder Overview HLS 3
Figure 17. Rough Architecture Sketch IOIS
Figure 18. Effective Level of Required Regulatory Capital (European Commission, 2016)
Figure 19. Transition of Required Capital Buffers based on CRD IV (Financial Market Lawyers, 2016)
Figure 20. Example of a STP-Based Process in Trading (Docupace, 2016)
Figure 21. An Overview of the Wft (De Nederlandse Overheid, 2016) (Rijksoverheid, 2016)
Figure 22. Machine Learning in Credit Scoring

List of Abbreviations

AFM         Autoriteit Financiële Markten (Authority for the Financial Markets)
AI          Artificial Intelligence
ANN         Artificial Neural Networks
AWS         Amazon Web Services
BTS         Binding Technical Standards
CBR         Case Based Reasoning
CDR         Call Detail Records
CI          Customer Intelligence
CoE         Center of Excellence
CRR/CRD IV  Capital Requirements Regulation and Directive IV
DDA         Data Driven Analytics (NeoBank)
DNB         De Nederlandsche Bank (Dutch National Bank)
DSRM        Design Science Research Methodology
EAD         Exposure At Default
EBA         European Banking Authority
FCRA        Fair Credit Reporting Act
GRC         Governance, Risk management and Compliance
HDFS        Hadoop Distributed File System
HLS         High Level Solution
ID3         Iterative Dichotomiser 3
IDB         Inter-American Development Bank
IOIS        Inter-Organizational Information System
LGD         Loss Given Default
LML         Lifelong Machine Learning
LTI         Loan-To-Income
LTV         Loan-To-Value
MDA         Multi-Discriminant Analysis
ML          Machine Learning
NEOFC       NEO Fast Credit
NVB         Nederlandse Vereniging van Banken
P2P         Peer-to-Peer
PD          Probability of Default
PFC         Paleo Fast Credit
PMO         Program Management Office
ROC         Receiver Operating Characteristic
ROI         Return On Investment
SEPA        Single Euro Payments Area
STP         Straight Through Processing
SVM         Support Vector Machines
TDWI        The Data Warehousing Institute
TILA        Truth In Lending Act
VfN         Vereniging van financieringsondernemingen in Nederland
Wbp         Wet bescherming persoonsgegevens
Wck         Wet op het consumentenkrediet
Wft         Wet op het financieel toezicht
WSBI        World Savings and Retail Banking Institute

1 Introduction

Recent research has shown that consumer debt is growing structurally worldwide and is correlated to some degree with economic prosperity (Brown, Stein, & Zafar, 2015). Keynesian theory suggests that lending money is beneficial to the economy, as it leads to more expenditure. The increase in expenditure leads to increased production and growing industries.
Growing industries lead to increased employment rates and provide a stimulus to the economy. Financial institutions play an essential role in this process, as they collect idle savings and redistribute these funds in an uncertain environment. In spite of precautions, some consumers still borrow to the extent that they structurally cannot repay the money they owe. When this occurs on a massive scale, as seen in an economic crisis, it is called an “economic credit bubble”. The value of assets deviates from their intrinsic value, just as the credit obtained by consumers deviates from their actual creditworthiness. Consumers spend money they do not actually own and, while trying to pay it back, fall behind in economic wealth and stay indebted. Detailed and accurate risk assessment is of key importance in preventing this.

In the Netherlands, financial authorities such as the “Autoriteit Financiële Markten” (AFM) and De Nederlandsche Bank (DNB), together with the government, come into play when the risk of such an unfortunate event grows. These institutions create laws and guidelines that limit the playing field of banks in order to protect consumers. They supervise all national financial institutions to keep the relevant parties in check. The main goal of a financial firm will always be to generate value and earn a profit in order to guarantee its existence. However, ethical conduct and a positive impact on society are also primary goals for a bank. This sometimes results in a conflict of interest between various key stakeholders.

1.1 Problem Statement

In this specific case of consumer credit, NeoBank in the Netherlands released a financial product called “NEO Fast Credit” (NEOFC). This income-based credit was designed as a short-term, high-interest loan which facilitates small, sudden payments in a convenient manner. The utility of this product lies in the fact that a relatively small amount of credit can be borrowed for a short period of time, without hassle and in a consumer-friendly way.
The credit has to be paid off completely every three months, after which a new loan cycle can be started. The application process has been streamlined by implementing an automated, superficial income test without a credit scoring model.

Recently, the AFM clashed with the “Nederlandse Vereniging van Banken” (NVB) in a dispute over extending the requirements that must be met before this type of short-term credit is granted to an individual. The AFM argues that this financial product requires a wider client evaluation based on calculations in order to reduce the risk of structural consumer debt. The NVB disagrees, as this type of credit legally does not have to comply with the “Wet op het consumentenkrediet” (Wck). This law imposes extensive client evaluation requirements only on financial products with a lending period of more than three months. Furthermore, extensive evaluation entails increased transaction and overhead costs to facilitate and process such an income and expenses test. Moreover, it reduces the utility and consumer friendliness of this financial product, as an extensive screening in its current form delays the application.

There is a concerning and growing issue of consumer debt which can only be solved by gathering true, accurate and timely data on consumers. Improving risk assessment entails using more information to construct a complete and relevant consumer profile. This research project explores the possibilities of solving the problem of structural consumer debt through the use of Big Data.