Automatic disease detection by analysing peripheral blood smear images

Mathew Ramsay

Thesis presented in fulfilment of the requirements for the degree of Bachelor of Science in Computer Science (Honours) at the University of the Western Cape

Supervisor: Dr Jean-Baka Domelevo Entfellner
Co-supervisors: M. Ghaziasgar, R. Dodds, S. Tondeur

This version: November 24, 2017

Declaration

I, Mathew Ramsay, declare that this thesis “Automatic disease detection by analysing peripheral blood smear images” is my own work, that it has not been submitted before for any degree or assessment at any other university, and that all the sources I have used or quoted have been indicated and acknowledged by means of complete references.

Signature: ........................ Date: ........................

Mathew Ramsay

Abstract

Whilst drepanocytes are important for the diagnosis of sickle cell anemia, schizocyte detection is important for the diagnosis of thrombotic thrombocytopenic purpura and related thrombotic microangiopathies. Schizocytes are also a key indicator of a life-threatening condition affecting a human patient. At present, only the latest generation of automated cell counters can flag their operators when a schizocyte is detected, and very few of them are able to provide a schizocyte count. The primary focus of this project is the detection of schizocytes in peripheral blood smear images. Once detection is robust enough, the program could be adapted to flag, and perhaps count, schizocytes versus erythrocytes. The implementation couples a high-performance image processing library with machine learning. We make use of Gaussian filtering, Otsu’s Binarization, Canny edge detection, and a coefficient of roundness to extract objects and draw each of them onto a 20x20 pixel image.
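Two of the preprocessing steps named above, Otsu's binarization and the coefficient of roundness, can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the function names are hypothetical, the image is assumed to be a flat list of 8-bit grey levels, and the roundness formula 4πA/P² is one common definition assumed here, since the abstract does not state which definition was used.

```python
import math


def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form one class, the rest form the other.
    `pixels` is assumed to be a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the intensities of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to the variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def roundness(area, perimeter):
    """Assumed roundness coefficient 4*pi*A / P**2 (equals 1 for a circle)."""
    return 4.0 * math.pi * area / (perimeter ** 2)


# Usage: a toy image with a dark and a bright population of pixels.
toy_pixels = [10] * 50 + [200] * 50
t = otsu_threshold(toy_pixels)
binary = [1 if p > t else 0 for p in toy_pixels]
```

Under this assumed definition a perfect circle scores 1 and any other outline scores lower, which is what would make such a coefficient useful for separating round erythrocytes from fragmented or sickle-shaped cells.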
The k-Nearest Neighbours (kNN) algorithm and a support vector machine (SVM) are then trained and tested after we convert each extracted image to a flattened feature vector of 400 pixels. This allows us to cluster and classify images into groups based on their pixel feature set. A total of i = 103 images representing erythrocytes, drepanocytes, and various schizocytes were used. The resulting classification prediction of kNN was only 10.5% when using n = 5000 unsupervised learning samples and k = 8 neighbours. The best resulting classification prediction, 92.8%, was obtained with an SVM using a linear kernel. For the SVM we posed a supervised learning problem with l = 2 labels, that is, a binary classification problem. We suspect that the poor kNN prediction is due to an inconsistent and highly variable dataset. We are yet to find a uniform dataset of schizocyte images taken under constant conditions. I believe this project shows that machine learning and image processing are feasible when attempting to classify and count red blood cells based on their morphology. Acquiring a larger and better dataset is likely to improve results by combating the variability in image quality, magnification, and cell shape. Lastly, this mini-thesis details the analysis, design, development, implementation, and deployment phases of the project from a software point of view.

Key words

Canny edge detection
Classification
Clustering
Disease detection and diagnosis
Drepanocyte
Erythrocyte
Image processing
K-nearest neighbours
Kernel
Machine learning
Otsu’s Binarization
Peripheral blood smear image
Prediction
Support vector machine
Schizocyte

Acknowledgment

This thesis is a compilation of the efforts of many people who helped me through the years. I would first like to thank my supervisor Dr. JB for encouraging me during my study.
Without our weekly meetings, this work would not have been possible. At this time I would like to extend a very special thanks to my co-supervisors. Without their help I would certainly not be where I am today. I would also like to thank the Trustees of the NRF government bursary scheme for their unwavering financial assistance, without which my efforts would have been impossible.
Description: