DISCRIMINATION OF ACCEPTABLE AND CONTAMINATED HEPARIN BY CHEMOMETRIC ANALYSIS OF PROTON NUCLEAR MAGNETIC RESONANCE SPECTRAL DATA

By Qingda Zang

A Dissertation Submitted to the University of Medicine and Dentistry of New Jersey – School of Health Related Professions in partial fulfillment of the Requirements for the Degree of Doctor of Philosophy

Department of Health Informatics

April, 2011

ABSTRACT

Heparin is a highly effective anticoagulant that can contain varying amounts of undesirable galactosamine impurities (mostly dermatan sulfate, or DS), the level of which indicates the purity of the drug substance. Currently, the United States Pharmacopeia (USP) monograph for heparin purity dictates that the weight percent of galactosamine in total hexosamine (%Gal) may not exceed 1%. In 2007 and 2008, heparin contaminated with oversulfated chondroitin sulfate (OSCS) was associated with adverse clinical effects, i.e., a rapid and acute onset of a potentially fatal anaphylactoid-type reaction. To develop efficient and reliable screening methods for detecting and identifying contaminants in existing and future lots of heparin, and thereby ensure the integrity of the global supply, chemometric techniques were applied to heparin proton nuclear magnetic resonance (1H NMR) spectral data. The goal was to establish multivariate statistical models that discriminate between pure heparin samples and those deemed unacceptable based on their levels of DS and/or OSCS.

The research consisted of two parts. (1) Quantitative regression models were developed to predict the %Gal in various heparin samples from NMR spectral data. Multivariate analyses including multiple linear regression (MLR), ridge regression (RR), partial least squares regression (PLSR), and support vector regression (SVR) were employed in this investigation. To obtain stable and robust models with high predictive ability, variables were selected by genetic algorithms (GA) and stepwise methods. (2) Pure heparin samples were differentiated from impure and contaminated ones by pattern recognition and classification approaches, such as principal components analysis (PCA), partial least squares discriminant analysis (PLS-DA), linear discriminant analysis (LDA), k-nearest-neighbor (kNN), classification and regression tree (CART), artificial neural networks (ANN), and support vector machine (SVM), as well as the class modeling techniques soft independent modeling of class analogy (SIMCA) and unequal dispersed classes (UNEQ).

Overall, the results of this study demonstrate that NMR spectroscopy coupled with multivariate chemometric techniques shows promise as a valuable tool for evaluating the quality of heparin sodium active pharmaceutical ingredients (APIs). The models developed here may also be useful for monitoring the purity of other complex pharmaceutical products from high-information-content data.

ACKNOWLEDGEMENTS

I would like to acknowledge my advisor, Dr. Dinesh P. Mital, for his inspiring supervision and supportive attitude. The completion of this dissertation would not have been possible without his invaluable guidance and unending patience.

I wish to express my gratitude to my co-advisor, Dr. William J. Welsh, who has given me the opportunity to be where I am today. I would like to thank him for trusting me and letting me go my own way.

I want to express my sincere thanks to the faculty members at the Department of Health Informatics, especially Dr. Syed S. Haque, Dr. Shankar Srinivasan, and Dr. Masayuki Shibata, for their expertise, training, advice, and assistance throughout my graduate study.

I am very grateful to Dr. Richard D. Wood at Snowdon, Inc. for his stimulating discussions, timely encouragement, and constructive suggestions.

I would also like to thank the staff at the US Food and Drug Administration (FDA).
They provided the analysis data and, more importantly, the financial support that made the research possible. The collaboration with them has greatly broadened my perspectives, and I have learned a great deal from them. Special thanks to Dr. Lucinda F. Buhse, Dr. David A. Keire, Dr. Christine M. V. Moore, Dr. Moheb Nasr, Dr. Ali Al-Hakim, and Dr. Michael L. Trehy.

I would like to extend my gratitude to Dr. Dmitriy Chekmarev at the Department of Pharmacology for taking the time to review this dissertation and for his valuable comments and feedback.

Finally, I wish to thank my colleagues in Dr. Welsh's group, Dr. Ni Ai, Dr. Vladyslav Kholodovych, Dr. Eric Kaipeen Yang, and Dr. Oyenike Olabisi, for their consistent enthusiasm, their willingness to help, and the friendly and pleasant environment they created.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

Chapter I. INTRODUCTION
  1.1 Statement of the Problem
  1.2 Background of the Problem
  1.3 Objectives of the Research
  1.4 Research Hypotheses
  1.5 Results and Significance of the Research

Chapter II. LITERATURE REVIEW
  2.1 The Structure, Preparation and Medical Use of Heparin
    2.1.1 Structures of Glycosaminoglycans (GAGs)
    2.1.2 Preparation of Heparin
    2.1.3 Medical Use of Heparin
  2.2 Heparin Crisis
    2.2.1 Adverse Events
    2.2.2 Contaminant Identification
    2.2.3 USP Monograph for Heparin Quality
  2.3 Chemometrics and its Application in Heparin Field
    2.3.1 Variable Selection
    2.3.2 Multivariate Regression Analysis
    2.3.3 Chemometric Pattern Recognition
    2.3.4 Application of Chemometrics in Heparin Field

Chapter III. DATA AND METHODS
  3.1 Heparin Samples
    3.1.1 Pure, Impure and Contaminated Heparin APIs for Classification
    3.1.2 Heparin API Samples for %Gal Determination
    3.1.3 Blends of Heparin Spiked with other GAGs
  3.2 Proton NMR Spectra
  3.3 Data Processing
  3.4 Computational Programs
  3.5 Performance Validation

Chapter IV. RESULTS AND DISCUSSION
  4.1 Multivariate Regression Analysis for Predicting %Gal
    4.1.1 Variable Selection
    4.1.2 Multiple Linear Regression Analysis
    4.1.3 Ridge Regression Analysis
    4.1.4 Partial Least Squares Regression Analysis
    4.1.5 Support Vector Regression Analysis
  4.2 Classification of Pure and Contaminated Heparin Samples
    4.2.1 Principal Components Analysis
    4.2.2 Partial Least Squares Discriminant Analysis
    4.2.3 Linear Discriminant Analysis
    4.2.4 k-Nearest-Neighbor
    4.2.5 Classification and Regression Tree
    4.2.6 Artificial Neural Networks
    4.2.7 Support Vector Machine
    4.2.8 Analysis of Misclassifications
    4.2.9 Classification Analysis of Heparin Spiked with other GAGs
  4.3 Class Modeling for Discriminating Heparin Samples
    4.3.1 SIMCA Analysis
    4.3.2 UNEQ Analysis

Chapter V. SUMMARY AND CONCLUSIONS
  5.1 Multivariate Regression for Predicting %Gal
  5.2 Classification for Pure and Contaminated Heparin Samples
  5.3 Class Modeling Using SIMCA and UNEQ

Chapter VI. FUTURE DIRECTION FOR RESEARCH

References
Appendix A: Abbreviations
Appendix B: Index

LIST OF TABLES

Table 1. Summary Statistics of %Gal Measured from HPLC
Table 2. Variable IDs and their Corresponding Chemical Shifts
Table 3. The Stepwise Variable Selection Procedure for Dataset A
Table 4. The Stepwise Variable Selection Procedure for Dataset B
Table 5. Parameters for the Genetic Algorithms
Table 6. The Variables (ppm) Selected by Genetic Algorithms
Table 7. Model Parameters of Multiple Linear Regression (MLR)
Table 8. Model Parameters of Ridge Regression (RR)
Table 9. Model Parameters of Partial Least Squares Regression (PLSR)
Table 10. Model Parameters for Support Vector Regression with RBF Kernel
Table 11. Number and Type of Misclassifications (Errors) by PLS-DA Classification
Table 12. Wilks' Lambda (λ) & F-to-enter (F) of Variables (V) for Various Models
Table 13. Performance of LDA Classification Models under Different Variables
Table 14. Performance of kNN Classification Models for Original Data
Table 15. Performance of PCA-kNN Classification Models under Different PCs
Table 16. Model Parameters and Classification Rates for CART
Table 17. Model Parameters and Classification Rates for ANN
Table 18. Model Parameters and Classification Rates for SVM
Table 19. Classification Matrices for the Heparin vs DS Model in the 1.95-5.70 ppm Region
Table 20. Classification Matrices for the Heparin vs [DS + OSCS] Model in the 1.95-5.70 ppm Region
Table 21. Classification Matrices for the Heparin vs DS vs OSCS Model in the 1.95-5.70 ppm Region
Table 22. Compositions of the Series of Blends of Heparin Spiked with other GAGs and Test Results for Classification from SVM, CART and ANN in the 1.95-5.70 ppm Region
Table 23. Sensitivity and Specificity from SIMCA Modeling for Heparin, DS, and OSCS
Table 24. Classification Matrices and Success Rates from SIMCA Class Modeling for Heparin, DS and OSCS
Table 25. Discriminant Powers (DP) of Variables (V) for Various Models
Table 26. The Compositions of the Series of Blends of Heparin Spiked with other GAGs and Test Results from Class Modeling
Table 27. Wilks' Lambda (λ) and F-to-enter (F) Values of Variables (V)
Table 28. Sensitivity and Specificity from UNEQ Class Modeling for Heparin, DS and OSCS
Table 29. Classification Matrices from UNEQ Class Modeling for Heparin, DS and OSCS