THE DESIGN AND DEVELOPMENT OF GPU ACCELERATED ALGORITHMS FOR AB INITIO INTEGRALS AND INTEGRAL DERIVATIVES ILLUSTRATED ON AB INITIO QUANTUM AND HYBRID QM/MM DYNAMICS

CARINA ALICIA RENISON

SUPERVISOR: PROFESSOR KEVIN J. NAIDOO

A THESIS SUBMITTED IN ACCORDANCE WITH THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
IN THE SCIENTIFIC COMPUTING RESEARCH UNIT
THE DEPARTMENT OF CHEMISTRY
UNIVERSITY OF CAPE TOWN

APRIL 2016

The copyright of this thesis vests in the author. No quotation from it or information derived from it is to be published without full acknowledgement of the source. The thesis is to be used for private study or non-commercial research purposes only.

Published by the University of Cape Town (UCT) in terms of the non-exclusive license granted to UCT by the author.

ABSTRACT

Graphical Processing Units (GPUs) are highly parallel, programmable accelerators boasting high peak floating point performance. Over the last couple of years the use of GPUs for general purpose computing has revolutionized quantum chemistry. The computational bottleneck in an ab initio quantum method is the calculation of a large number of two-electron integrals. To date, a number of GPU accelerated two-electron integral implementations have been developed, significantly improving the performance of static quantum mechanical (QM) calculations. However, when performing an ab initio QM gradient calculation, optimization, or QM or Hybrid Quantum Mechanical/Molecular Mechanical (QM/MM) dynamics simulation, the two-electron integral derivatives arise as an additional bottleneck. Hybrid QM/MM methods, particularly dynamics methods, are commonly used to study large chemical/biological systems.
These methods are a popular choice for studying reaction mechanisms and conformational and configurational structures important in glycobiology. Usually a semi-empirical QM method is used; however, these methods have shown variable accuracy for the study of carbohydrate conformation and prevent an analytical investigation of electronic structure. The use of a higher level of theory, such as an ab initio method, is desirable; however, this comes at a much greater computational cost. In addition to the bottlenecks above, this cost results from the polarization of the QM region within an electrostatic embedding scheme, which requires the calculation of a large number of one-electron integral derivatives. Thus, for QM/MM calculations, the one-electron integral derivatives become a third bottleneck alongside the two mentioned above. Recently, the above bottlenecks have become popular GPU acceleration targets.

This thesis describes an extension of the GPU based Quantum Supercharger Library (QSL) to perform the above calculations/simulations. In contrast to GPU packages developed from the ground up, the QSL is a library of routines aimed at accelerating legacy codes, such as GAMESS-UK, GAMESS-US and NWChem, used in electronic structure calculations. Algorithms are presented for accelerating the one- and two-electron integral derivatives on a GPU. In addition to the derivatives, the one-electron integral calculation was ported to the GPU in order to remove a small additional cost arising for large QM/MM systems. Furthermore, the use of automatic code generation for generating GPU kernels was explored and compared to the original approach. Using the QSL implemented in GAMESS-UK, several benchmark calculations and simulations were performed. These were performed in double precision on a single GPU (Kepler K20) and compared to a single CPU using the 6-31G basis set.
A speedup of up to 9.3X is achieved for an ab initio gradient calculation compared to the CPU running an optimized serial version of GAMESS-UK using the Schlegel method. For a single point QM/MM calculation of cellobiose in different sized water spheres (3267 to 24843 point charges), speedups of between 13X and 34X are achieved. QSL/GAMESS-UK coupled to the CHARMM molecular dynamics package was then used to perform accelerated molecular dynamics simulations. Benchmark QM and QM/MM molecular dynamics simulations were performed on cellobiose in vacuo and in a water sphere (45 QM atoms and 24843 point charges respectively). The QSL is able to perform 9.7 ps/day of ab initio QM dynamics and 6.4 ps/day of QM/MM dynamics. Testing of the auto-generated version of the QSL showed better performance for lower angular momentum classes but reduced performance for higher angular momentum classes.

The efficiency of the integral and derivative routines within the QSL was tested on a computationally intensive, realistic glycobiological condensed phase free energy computation. Ab initio pucker free energy surfaces/volumes of β-Ribofuranose and β-Glucopyranose in vacuum and in water were computed. These are the first converged Hartree-Fock/MM free energy simulations for these carbohydrates performed in solution. The value of the ab initio free energy surfaces/volumes was demonstrated through an analysis of solvent polarization effects that are evident through a comparison of the vacuum and solution minimum free energy pathways. In particular, analysis of the water structure around the pucker conformers, of the primary alcohol distribution and of the electronic structure of these conformers reveals that water significantly affects the free energy pathways, the primary alcohol distribution and the barriers for inter-conversion between pucker conformers.

PUBLICATIONS

While this thesis describes my own work performed under the guidance of my supervisor Prof.
Kevin J. Naidoo, aspects of the work have been published. My own work represents all aspects related to a) the 1-electron integral algorithm and code development, b) the 1-electron integral derivatives algorithm and code development and c) the 2-electron integral derivatives algorithm and code development. Sections of this work have been published in:

1. C. Alicia Renison, Kyle D. Fernandes and Kevin J. Naidoo, J. Comput. Chem. 2015, 36, 1410-1419
In this paper only development work performed by me is reported. This was combined with the HF QSL plugged into GAMESS-UK, to be reported in Mr. Fernandes’ thesis (a doctoral student under Prof. Naidoo’s supervision). The entire QSL was needed to perform optimization and QM/MM simulations. Hence Mr. Fernandes, in addition to his helpful discussions, is a co-author on this paper.

2. Kyle D. Fernandes, C. Alicia Renison and Kevin J. Naidoo, J. Comput. Chem. 2015, 36, 1399-1409
In this paper the development work of the HF QSL performed by Mr. Fernandes and the 1-electron integral development performed by me are reported.

DECLARATION

I, Carina Alicia Renison, hereby declare that the work on which this thesis is based is my original work (except where acknowledgements indicate otherwise) and that neither the whole work nor any part of it has been, is being, or is to be submitted for another degree in this or any other university. I authorise the University to reproduce for the purpose of research either the whole or any portion of the contents in any manner whatsoever.

CARINA ALICIA RENISON
APRIL 2016

ACKNOWLEDGEMENTS

I am thankful for financial support provided by the University of Cape Town, the South African Research Chairs Initiative (SARChI) and the National Research Foundation (NRF). I would like to thank the Nvidia Professor Partnership Program (to Kevin J. Naidoo) for a generous donation of equipment used in this study and for the use of their development cluster.
In addition I would like to thank Mark Berger from Nvidia for always keeping me updated on all the latest in the GPU world. Thank you to my supervisor Prof. Kevin Naidoo for allowing me the opportunity to work with you. In addition I would like to thank all the members of the Scientific Computing Research Unit who helped me at some point during my stay and with whom I shared interesting discussions. In particular I would like to thank Kyle Fernandes for all the things he taught me, especially about GPU computing, and for sharing the joys and frustrations of being a PhD student. I would also like to thank Dr. Chris Barnett for lots of discussions about puckering and other random topics over a cup or two of coffee. My sincere thanks go to my husband Martin for supporting me with my studies over several years. Thank you for all your encouragement and understanding. Last but not least I would like to thank the rest of my family for all their support during the last couple of years.

ABBREVIATIONS

ALU: Arithmetic Logic Unit
AM1: Austin Model 1
API: Application Programming Interface
ATP: Adenosine triphosphate
CP: Cremer and Pople method
CPU: Central Processing Unit
CUDA: Compute Unified Device Architecture
DFT: Density Functional Theory
DNA: Deoxyribonucleic acid
ERI: Electron Repulsion Integral
FEARCF: Free Energy from Adaptive Reaction Coordinate Forces
FPGA: Field Programmable Gate Array
GPU: Graphical Processing Unit
HF: Hartree-Fock
ILP: Instruction Level Parallelism
IUPAC: International Union of Pure and Applied Chemistry
LJ: Lennard-Jones
MD: Molecular Dynamics
MM: Molecular Mechanics
NMR: Nuclear Magnetic Resonance Spectroscopy
NVT: Canonical ensemble
PM3: Parametric Method Number 3
PMF: Potential of Mean Force
ps: picosecond
QM: Quantum Mechanics
QM/MM: Hybrid Quantum Mechanics/Molecular Mechanics
QSL: Quantum Supercharger Library
RNA: Ribonucleic acid
SCF: Self Consistent Field
SIMD: Single Instruction Multiple Data
SM/SMX: Streaming Multiprocessor
STO: Slater-type orbital
GTO: Gaussian-type orbital
TIP3P: Transferable Intermolecular Potential 3P
VVER: Velocity Verlet method
VV2: New Velocity Verlet method
WHAM: Weighted Histogram Analysis Method