Bachelor’s thesis in Physics Hydration models for thione-, thiol- and deprotonised 2-Mercaptopyridine, from molecular dynamics simulations. Author: Supervisors: Anton Ljungdahl, Michael Odelius, [email protected] Jesper Norell Stockholm University, Department of Physics November 8, 2016 Abstract The water solvation near nitrogen- and sulphur-sites, in the thione- and thiol-forms of 2-Mercaptopyridine(labeled2TPand2MPrespectively),andinthedeprotonised,neg- atively charged version (labeled 2AP), has been investigated. The aim of the project was to provide insights into the solvation, which can form the basis for future investi- gations of ground state, and photo-excited state, proton transfer processes. Molecular dynamics simulations were performed of 2TP, 2MP and 2AP, in liquid water solution, under both constant pressure and constant volume, at ambient condi- tions, using a time step of 1 fs for a 90 ns production simulation time. Thefirstsolvationshellwaterstructures,nearthemolecules’nitrogen-andsulphur- sites, were obtained from analysis of both radial and spatial distributions functions. Based on this analysis, a solvation model for each organic molecule is presented. The deprotonised 2AP had the most well defined water structure, and its N- and S-site oxygen- and hydrogen-distribution functions were, qualitatively, replicated in corresponding unprotonated sites in 2TP and 2MP. There were on average 5 water molecules within the 2AP S-site first solvation shell, and 4 within the 2TP. The 2AP had on average 2 water molecules within the N-site first solvation shell, which was also the case for 2MP. The distribution functions showed no clear indication of hydrogen- bonding between the protonated sites and water oxygen. Sammanfattning F¨orflyttning och ¨overf¨oring av protoner ¨ar en viktig kemisk process i biologiska sys- tem. Thione-thiol tautomerer, d¨ar en proton f¨orflyttas mellan aktiva kv¨ave- och svavel-grupper, ¨ar en prototyp av dessa ¨overf¨oringsprocesser. Vattensolvatiseringen vidkva¨ve-ochsvavel-gruppernahosthione-ochthiol-formernaav2-Mercaptopyridine (kallade 2TP och 2MP), samt den deprotonerade versionen (kallad 2AP), har stud- erats. Detta i syfte att ge underlag f¨or f¨orst˚aelse av proton¨overf¨oring i grund- och fotoexciterade tillst˚and. Molekyldynamiksimuleringar av 2TP, 2MP och 2AP utf¨ordes, i vattenl¨osning, un- der konstant tryck och konstant volym, vid ambienta f¨orh˚allanden. Tidssteget var 1 fs, och l¨angden av simuleringen var 90 ns. Fr˚an radiella och spatiala f¨ordelningsfunktioner, r¨aknade fr˚an molekylernas kv¨ave- och svavel-grupper till vattnets syre- respektive v¨ate-atomer, kunde solvatiseringen i det f¨orsta skalet best¨ammas. Baserat p˚a denna analys s˚a presenteras tre solvatiser- ingsmodeller, en f¨or varje organisk molekyl, f¨or framtida anv¨andning i kvantkemiska bera¨kningar. 1 Acknowledgements I want to thank my wonderful family, Anna, Tor, Vide and my children’s grandparents. Without you my studies would not be possible. Thanks to Michael and Jesper, for giving me the chance to work on this project, and for all the great support and feedback. I feel privileged to have had you as supervisors. Thanks to my fellow thesis project workers in the C4:aquarium, Johan, Mihae and Math- ias, for the good company and interesting discussions. And special thanks to Jesper and Johan, for all the bash, text editor and general Linux tips. 2 Contents 1 Introduction 4 1.1 The thione-thiol tautomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Purpose and motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Theory 7 2.1 Molecular dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.1 Basic principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.2 Molecular dynamics in statistical ensembles . . . . . . . . . . . . . . 9 2.2 Force field methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 Quantum chemistry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.4 Structure and the distribution functions . . . . . . . . . . . . . . . . . . . . 13 2.4.1 Radial distribution function (RDF) . . . . . . . . . . . . . . . . . . . 14 2.4.2 Spatial distribution function (SDF) . . . . . . . . . . . . . . . . . . . 15 3 Simulation 16 3.1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.2 Production . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 4 Results 20 4.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.2 The liquid structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 5 Discussion and conclusion 29 5.1 Suggestions for the quantum chemistry setup . . . . . . . . . . . . . . . . . . 29 5.2 General discussion of the results . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.3 Molecular flexibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.4 Uncertainties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 A Appendix 38 A.1 The Born-Oppenheimer approximation . . . . . . . . . . . . . . . . . . . . . 38 A.2 The leap-frog algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 A.3 Molecular dynamics using GROMACS . . . . . . . . . . . . . . . . . . . . . 40 A.4 Input parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3 1 Introduction The ambition when writing this thesis, was to present the project in a manner such that a hypothetical reader, with a background corresponding to an undergraduate physics student, would be able to understand and judge the method and results. This means, arguably, that the work should be somewhat self-contained, and as a consequence the background is rather extensive. A reader only interested in the scientific content could probably skip the entirety of Chapter 2, and the main part of Chapter 3 (except Section 3.2). To guide the reader through the text, a small summary of each chapter’s content is presented at its beginning. To introduce some necessary terminology and concepts, this chapter starts off with pre- sentingthestudiedchemicalsystems, inSection1.1. Thepurposeof, andmotivationbehind, the project is then given in Section 1.2. And last, Section 1.3 summarises the work done, and gives an overview of the thesis content. 1.1 The thione-thiol tautomers Three very similar, small, aromatic organic molecules, were studied. Two of them, 2- Thiopyridone (2TP) and 2-Mercaptopyridine (2MP), are structurally identical, up to the placement of a proton. The third is called 2AP, an anionic (negatively charged) variant, in which a proton is lost. Figure 1.1 shows structural drawings of the molecules, which illustrates how 2TP and 2MP differ from each other only by the placement of a proton, present either on the nitrogen (N-site) in the case of 2TP, or on the sulphur (S-site) in the case of 2MP. 2MP 2TP 2AP H H H H H H C C C C C C H C C H solv. pol. pH H C C H H C C H N C N C N C S H S S– H Figure 1.1: Schematic drawings of the two tautomers 2-Mercaptopyridine (2MP) and 2- Thiopyridone (2TP), with the deprotonised version, 2AP, to the far right. The 2AP negative charge is distributed over the whole molecule, and not only on the sulphur. The back-and-forth arrows illustrates how the equilibrium concentrations, of each form, depend on the solution envi- ronment (solvent polarity, pH, thermal equilibrium, optical excitation [8, 7]). 4 Theorganicchemistrynomenclatureforsuchmoleculesisthione-thioltautomers. Tautomers aredefinedasstructuralisomers1, thatinterconvertwitheachother, andtheequilibriumcan (as in this case), be controlled by solvent-solute interactions, e.g. polarity. This interconver- sion, or tautomerisation process, consists of transferring a proton. It is this tautomerisation proton which is lost in 2AP. The thione-thiol prefix means that the proton transfer is be- tween the nitrogen and sulphur groups, and 2TP is called the thione-form, while 2MP is the thiol-form. 1.2 Purpose and motivation The scientific goal was to obtain information about the tautomer-water structure, by per- forming molecular dynamics simulations of the three organic molecules in aqueous solution. Based on analysis of the structural data, a solvation model for quantum chemical calcu- lations, applied to soft X-ray spectroscopy, was to be developed. This in the context of a research project of Jesper Norell and Michael Odelius (of the Quantum Chemistry and Molecular Dynamics-group (QCMD) at the Stockholm University Physics department), in collaboration with Sebastian Eckert and Alexander F¨ohlisch (Helmholtz-Zentrum Berlin) and coworkers. They study excited state proton transfer processes, with the thione-thiol tautomers studied herein, as model systems. Proton-transfer processes are important chemical reactions in chemistry and biochem- istry. They represent the smallest, but still everywhere present, rearrangement in the chem- istry that is vital to life [8, 7]. Thione-thiol (and keto-enol, a transfer between carbon and oxygen groups) tautomerisms constitute prototypes of these processes, and has been studied extensively [8]. The amino acid cysteine, for example, has a sulphur-proton group that can be activated by thione-thiol interconversion, and this may play an important role in how a protein’s biological function is defined [8]. The tautomers studied in this thesis are smaller than cysteine, however, but contains the same functional nitrogen and sulphur groups, thus the study of these smaller systems may aid in understanding the more complex biochemical processes. The proton-transfer process was not directly studied in this project, but rather how the solvent (liquid water) reacts to the transfer. The idea was to investigate the hydrogen- bondingcapabilitiesoftheprotonatedandunprotonatedS-andN-sites,inthethreedifferent molecules. The primary simulation method was molecular dynamics (MD), and this method is pre- sented in more detail in Section 2.1. The open-source simulation package GROMACS (ver- sion5.1.2)wasusedforperformingthesimulations, andalsofor(mostof)theanalysis. Now, there is a big difference between learning how to get already available software to perform simulations corresponding to some pre-determined settings, and actually understanding why those settings were decided on, and how the results were calculated and analysed. Another purpose of this thesis, or secondary goal, was to demonstrate this understanding. This is accomplished by providing enough background on the methods used, both in the general context of computer simulations (with a focus on molecular systems), and in the specific 1Structural isomers are molecules that share the same structural formula, in this case C H NS (for 2TP 5 5 and 2MP), but have different bonds. 5 context of this project. 1.3 Overview MD simulations of the three organic molecules solvated in water, in NVT and NPT en- sembles, with a simulation time of 100 ns (allowing for 10 ns of equilibration time) and a time step of 1 fs, were performed (using GROMACS [3]) and analysed for structural infor- mation. The potential energy surface was obtained by force field methods, with OPLS/AA [16] as the chosen force field. The TIP4P/2005 [1] water model was used for water repre- sentation. Radial distribution functions from the N- and S-sites, to the water oxygen and hydrogen atoms, were calculated using GROMACS’ analysis tools. Corresponding spatial distributions were calculated using the free software TRAVIS [5]. To obtain starting con- figurations of the organic molecules, as well as distributed charges, preparatory quantum chemical calculations were performed, using the software Gaussian [10] (with HF/6-31G* level of theory). Nowtheaboveparagraphisrathertechnical, andprobablynotverytransparenttosome- one not familiar with the methods. The entireties of Chapter 2 and Section 3.1 are devoted to its explanation. The primary concern is introducing the MD method, and understanding ofthedistributionfunctions, butaverybriefoverviewofquantumchemistryisalsoincluded. A great deal can be said about all these subjects of course, but the focus here is on basic intuition and the underlying approximations, without a lot of time spent on technicalities. In Chapter 4, some simulation benchmarks are discussed, before the liquid structure results are presented. It is important to have some notion of how well the simulations rep- resent the chosen target conditions and model. Chapter 5 begins with a presentation of the solvation models, and then continues on to discuss the structure results and uncertainties. A slightly more formal (compared to Chapter 2) look on the Born-Oppenheimer approx- imation and on the leap-frog variant of the Verlet algorithm are given in Appendices A.1 and A.2. The Appendix A.3 explains the GROMACS specific workflow, to further help in understanding the molecular dynamics method. To make the performed simulations more replicable, the relevant GROMACS input parameters are given in Appendix A.4. An observation from the early simulations was that the organic molecules showed a consid- erable degree of flexibility, i.e breaking of the molecular plane, and this was hypothesized to affect the distribution functions. To investigate, rigid simulations were attempted, using the SHAKE algorithm (as available in GROMACS) to constrain the atoms to the molecular plane. These attempts largely failed, since the systems “exploded” under the constraints, one could say that the simulation engine broke down. The only successful rigid simulation was of 2TP, and with a 0.1 fs time step, for 10 ns. The full investigation of the rigid ver- sus flexible simulations was abandoned, due to time constraints, but the 2TP results are discussed in Section 5.3. 6 2 Theory Thepointofthischapteristopresentabriefoverviewoftheideasandtechniquesunderlying the methods used for this thesis project. Technical details are avoided, and the emphasis is put on the basic principles and ideas. This is at the cost of completeness, and many things have been left out. The important approximations are discussed however, and examples are given in an attempt to illustrate the ideas in relation to this thesis project. Most central is perhaps Section 2.4, on the liquid structure and distribution functions, since the results are related to this. But the chapter starts with a discussion of molecular dynamics, in Section 2.1, and then Section 2.2 further explains the concept of simulat- ing a molecular system, without explicitly solving the Schro¨dinger equation, via force field methods. Preparatory quantum chemical calculations were made, to optimise geometry and determine force field charge parameters, and so a very short presentation on quantum chem- istry is given in Section 2.3. Its focus is on the important approximations, and basically just mentions the two main computational methods of Hartree-Fock theory and Density Functional Theory. 2.1 Molecular dynamics The idea of the molecular dynamics (MD) simulation technique is to follow the classical dynamics of a many-particle system and efficiently sample the configurational space, and evolve the system in time. Classical in this context means that the simulation is modeling the dynamics of the atomic nuclei by using classical mechanics, and that any quantum mechanical behavior of the nuclei is ignored. The strong coupling of electronic and nuclear degrees of freedom makes it difficult to use molecular dynamics to study reaction processes, which involve the transitions of electrons. 2.1.1 Basic principles The core of a molecular dynamics simulation is solving (numerically) Newton’s equations of motion, for all particles in the system. Since these are second order differential equations of the form F = m ¨r , i = 1,2,...,N, (2.1) i i i foranN-particlesystem, adynamicssimulationneedsasetofinitialpositionsandvelocities, or phase space coordinates, at the initial time t . It also needs a potential energy function 0 V(r ,r ,...,r ) for the whole system, so that the force acting on each particle i can be 1 2 N 7 computed through ∂V F = . (2.2) i −∂r i The initial conditions, and the potential, can then be used to estimate the position and velocities, at a later time t , provided that the time step ∆t is small enough, so that the 1 interaction between the particles can be considered constant in that time step. Then the positions and velocities at t can be used obtain a new set of phase space coordinates at time 1 t = t +∆t, and so forth. The principle is, that by integrating the equations of motion, for 2 1 a large number of small time steps, the time evolution of the system can be obtained. What ∆t is a small enough time step? Typical dynamical properties in molecular sys- tems, such as rotational and vibrational motion, occur with frequencies between 1011-1014 s−1. A time step on the order of a femtosecond (fs), 10−15 s, is necessary to describe such motion accurately enough [15]. There are various ways of numerically integrating the equations of motion, common ex- amples are Verlet type algorithm (like the leap-frog algorithm) or Runge-Kutta methods [15]. The leap-frog method was used for the simulations in this thesis, and this algorithm is discussed in more detail in the Appendix A.2. The algorithm illustrates how the previous positions, velocities and potential energy are used to obtain the next configuration. Clearly, then, it is necessary to have an explicit expression for the potential energy at each time step in order to perform the integration. This potential has to take into account all the interactions between the particles of the system, which include the bonded interac- tions between the atoms in a molecule, as well as the intermolecular interactions. These interactions are of electronic nature [22]. It is easily imagined that, for even a moderately complex system, finding such an explicit potential is a non-trivial task. So how can this be accomplished? The potential can in principle be calculated quantum mechanically by separating the nuclear and electronic parts of the Schr¨odinger equation (SE) ˆ H Ψ(X,x) = E Ψ(X,x) , (2.3) | (cid:105) | (cid:105) where X and x denote the positions of the atomic nuclei and electrons respectively. If we make the approximation that the nuclei are stationary, the electronic wave-function depends parametrically only on the positions of the nuclei (and not on the velocities). This is called the Born-Oppenheimer (BO) approximation, and is often very good, even for light atoms and molecules (such as H ) [15]. See the Appendix A.1 for a more detailed treatment of the 2 BO approximation. In this approximation the electronic wave function gives rise to what is called a potential energy surface (PES), denoted by V(X), which is defined by the solutions to the electronic SE for all nuclear positions X = r ,r ,...,r [15, 9, 22]. The PES can be written 1 2 N { } V(X) = E +V to illustrate that it contains both the electronic energy and the nuclear- e nn nuclear repulsion potential. But solving the Schr¨odinger equation (numerically) for the whole system in every time step is computationally very heavy. Even though such methods exist, called ab initio MD, a faster (but less accurate) method is to approximate V(X) by a parametrised function called a force field [15]. The force field method was used in this thesis project, and is further explained in Section 2.2. Another concern is that the particle counts in real world systems are in the order of Avogadro’s number ( 1024), while MD simulations typically have particle numbers on ∼ 8 the order of 102 105 [9, 22]. A common method [22], to closer approximate physical − systems, is the introduction of periodic boundary conditions (PBC). The idea is that the MD simulation take place in some box (not necessarily cubic), and that if a particle in the simulation passes through any side of that box it is put back inside the box on the opposite side. This simulation box is then replicated, in principle infinitely, in all directions, so that the simulated particles interact not only with the other particles in the box, but also with the replicated particles. The replicated boxes contain precisely the same configurations as the “parent” simulation box, at any given instant in the simulation. It is assumed that such PBC help in modeling a system, of the same (relatively small) size, that is embedded in a real, macroscopic system [22]. But the force calculated on a particle results from the particle’s interaction with all other particles in the system, and if there are infinitely many copies of the system, infinitely many interactions are implied. The interaction forces typically decay rapidly with distance [22, 15], however, and the particles in the replicate boxes will not contribute much to the force(anexampleistheLennard-Jonesinteraction,proportionaltor−6,whichwillbefurther discussedin Section2.2). Inparticular, ifthe interactions between particlescan beneglected forseparationslongerthanhalfthelinearsystemsize, theinteractionforce(foreachparticle) can be calculated efficiently by only considering the closest neighbour of the other particles [22]. This is called the minimum image convention. But electrostatic interactions can not be truncated in this way; it can be shown that the interaction potential cannot be cut off unless it decays faster than r−3 [9]. To take the long-range interactions into account, in a computationally feasible way, special methods, such as Ewald sums, are required [6, 9]. 2.1.2 Molecular dynamics in statistical ensembles The discussion above concerns the study of a system of N particles, in some volume V, whose time evolution is constrained to a phase space surface defined by the constant energy E. The type of statistics that can be extracted from MD simulations are time averages. But in a statistical ensemble, the ensemble averages are functions of the distribution of microstates. If we assume these averages to be equal, however, we can view the MD method above as a representation of the microcanonical (constant NVE) ensemble [9]. This as- sumption of equality between time and ensemble averages is called the ergodic hypothesis. We also need to assume that, for long enough simulation times, the time averages does not depend on the initial conditions. If not, the extracted data would only apply to the specific initial conditions. But the simulations performed in this thesis were in constant NVT and constant NPT ensembles, and not the microcanonical, so how does that work? The way that the NVT, or canonical ensemble, is introduced in statistical mechanics, is by considering the system of interest as being connected to some large reservoir, of constant temperature T, in such a way that energy can be exchanged between the system and the reservoir [18]. In constant NVT MD simulations then, the idea is to simulate in such a way that a canonical distribution is obtained. That is obtaining a well defined time averaged temperature, while still allow- ing for temperature fluctuations typical of the canonical distribution. This is, in principle, achieved by coupling the system to a reservoir, or heat bath, but there are different coupling methods. This thesis utilised the Nos´e-Hoover thermostat, which relies on a Lagrangian reformulation of the equations of motion. This Lagrangian, however, contains additional artificial coordinates and velocities [9], to couple the system to the reservoir. Similar ideas 9
Description: