Specialist Periodical Reports
Chemical Modelling: Applications and Theory comprises critical
literature reviews of all aspects of molecular modelling. Molecular
modelling in this context refers to modelling the structure, properties
and reactions of atoms, molecules and materials.
Each chapter provides a selective review of recent literature,
incorporating sufficient historical perspective for the non-specialist to
gain an understanding.
With chemical modelling covering such a wide range of subjects, this
Specialist Periodical Report serves as the first port of call to any chemist,
biochemist, materials scientist or molecular physicist needing to
acquaint themselves with major developments in the area.
Editor A Hinchliffe
Chemical Modelling:
Applications and Theory
Volume 5
ISBN 978-0-85404-248-7
www.rsc.org/spr
Specialist Periodical Reports
Specialist Periodical Reports provide systematic and detailed review
coverage in major areas of chemical research. Compiled by teams of
leading experts in their specialist fields, this series is designed to help
the chemistry community keep current with the latest developments
in their field. Each volume in the series is published either annually or
biennially and is a superb reference point for researchers.
A Specialist Periodical Report
Chemical Modelling
Applications and Theory
Volume 5
A Review of the Literature Published between June 2005 and May 2007
Editor
A. Hinchliffe, School of Chemistry, The University of Manchester, Manchester, UK
Authors
D. J. Evans, Australian National University, Canberra, Australia
P. B. Karadakov, University of York, York, UK
J. R. Kitchin, Carnegie Mellon University and National Energy Technology
Laboratory, Pittsburgh, PA, USA
R. A. Lewis, Novartis Institutes for Biomedical Research, Basel, Switzerland
S. D. Miller, Carnegie Mellon University, Pittsburgh, PA, USA
E. A. Moore, The Open University, Milton Keynes, UK
A. J. Mulholland, University of Bristol, Bristol, UK
A. H. Pakiari, Shiraz University, Shiraz, Iran
D. Pugh, University of Strathclyde, Glasgow, UK
D. J. Searles, Griffith University, Brisbane, Australia
D. S. Sholl, Carnegie Mellon University and National Energy Technology
Laboratory, Pittsburgh, PA, USA
T. E. Simos, University of Peloponnese, Tripolis, Greece
M. Springborg, University of Saarland, Saarbrücken, Germany
S. Wilson, University of Oxford, Oxford, UK and Comenius University, Bratislava,
Slovakia
C. J. Woods, University of Bristol, Bristol, UK
If you buy this title on standing order, you will be given FREE access
to the chapters online. Please contact with proof of
purchase to arrange access to be set up.
Thank you.
ISBN 978-0-85404-248-7
ISSN 1472-0965
A catalogue record for this book is available from the British Library
© The Royal Society of Chemistry 2008
All rights reserved
Apart from any fair dealing for the purpose of research or private study for
non-commercial purposes, or criticism or review as permitted under the terms
of the UK Copyright, Designs and Patents Act, 1988 and the Copyright and
Related Rights Regulations 2003, this publication may not be reproduced,
stored or transmitted, in any form or by any means, without the prior
permission in writing of The Royal Society of Chemistry, or in the case of
reprographic reproduction only in accordance with the terms of the licences
issued by the Copyright Licensing Agency in the UK, or in accordance with the
terms of the licences issued by the appropriate Reproduction Rights
Organization outside the UK. Enquiries concerning reproduction outside the
terms stated here should be sent to The Royal Society of Chemistry at the
address printed on this page.
Published by The Royal Society of Chemistry,
Thomas Graham House, Science Park, Milton Road,
Cambridge CB4 0WF, UK
Registered Charity Number 207890
For further information see our web site at www.rsc.org
Typeset by Macmillan India Ltd, Bangalore, India
Printed by Henry Ling Ltd, Dorchester, Dorset, UK
Preface
DOI: 10.1039/b801788n
Welcome to Volume 5 of the ‘Chemical Modelling’ SPR. Naturally, I want to start
by thanking my team of Reporters for the hard work they have put into making this
the best and most comprehensive volume so far. We hope you will derive benefit and
perhaps even pleasure from our efforts.
It seems a long time since I wrote the following in my Preface to Volume 1
(1999) . . .
‘Starting a new SPR is never easy, and there was the problem of where the
contributors should start their accounts; since time began? five years ago? An SPR
should be the first port of call for an up-to-the-minute account of trends in a specialist
subject rather than a dull collection of references. My solution was to ask contributors
to include enough historical perspective to bring a non-specialist up to speed, but to
include all pertinent references through May 1999. Volume 2 will cover the literature
from June 1999 to May 2001 and so on. In subsequent Volumes, I shall ask those
Contributors dealing with the topics from Volume 1 to start from there. New topics will
be given the same generous historical perspective opportunity as Volume 1 but will have
to cover the literature to 2001 + n where n = 0, 2, 4, . . . . This process will continue
until equilibrium is reached.’
Equilibrium was reached a couple of Volumes ago and some mature topics don’t
need cover every Volume.
My final sentence for Volume 1 was
‘I am always willing to listen to convincing ideas for new topics’
as indeed I am.
Alan Hinchliffe
Manchester 2008
alan.hinchliff
RSC Specialist Periodical Report
Chemical Modelling: Applications and Theory (Volume 5)
Chapter  Corresponding author     Topic
1        Dr Adrian Mulholland     Multiscale modelling of biological systems
2        Dr Richard Lewis         Computer-aided drug design 2005–2007
3        Prof Michael Springborg  Solvation effects
4        Dr Elaine A Moore        The solid state
5        Prof David Sholl         DFT studies of alloys in heterogeneous catalysis
6        Prof Debra Searles       Fluctuation relations, free energy calculations and irreversibility
7        Dr Steven Wilson         Many body perturbation theory and its application to the molecular structure problem
8        Dr David Pugh            Experiment and theory in the determination of molecular hyperpolarizabilities in solution
9        Prof Ali Pakiari         The Floating Spherical Gaussian Orbital (FSGO) method
10       Dr Peter Karadakov       Advances in valence bond theory
11       Prof Theodore Simos      Numerical methods in chemistry
Chem. Modell., 2008, 5, 7–8 | 7
This journal is © The Royal Society of Chemistry 2008
CONTENTS

Cover
The icosahedral golden fullerene WAu12 reproduced by permission of Pekka Pyykkö,
Chemistry Department, University of Helsinki, Finland.

Preface
Alan Hinchliffe

Multiscale modelling of biological systems
Christopher J. Woods and Adrian J. Mulholland
    Introduction
    Interfacing QM with MM models
    Interfacing atomistic with coarse grain models
    Interfacing particle with continuum models
    Beyond continuum models
    Conclusion

Computer-aided drug design 2005–2007
Richard A. Lewis
    Introduction
    QSAR and ADMET
    Structure-based drug design
    Virtual screening
    De novo structure generation
    Fragment-based screening
    Target fishing
    Library design
    Conclusions

Solvation effects
Michael Springborg
    Introduction
    Fundamental methods
    Recent studies
    Conclusions

The solid state
E. A. Moore
    Introduction
    Interatomic potential methods
    Ab initio methods
    QM/MM
    Molecular dynamics and related methods
    Properties
    Applications
    Minerals
    Conclusions

Density functional theory studies of alloys in heterogeneous catalysis
John R. Kitchin, Spencer D. Miller and David S. Sholl
    Introduction
    Segregation
    Adsorption properties on alloy surfaces
    Reaction kinetics
    Miscellaneous
    Electrocatalysis
    Conclusions

Fluctuation relations, free energy calculations and irreversibility
Debra J. Searles and Denis J. Evans
    Introduction
    Fluctuation relations
    Free energy relations
    Fluctuation relations and irreversibility
    Comment on the interpretation of the fluctuation relation
    Conclusions

Many-body perturbation theory and its application to the molecular structure problem
S. Wilson
    Introduction
    An overview of previous reports
    Applications
    An overview of applications of second order theory
    Application area 1: periodic systems
    Application area 2: DNA bases and amino acids
    Application area 3: DFT benchmarking
    Application area 4: basis set extrapolation and the calibration of general energy models
    Summary and future directions

Experiment and theory in the determination of molecular hyperpolarizabilities in solution; pNA and MNA in dioxane
David Pugh
    Introduction
    General theory of the response to frequency-dependent electric fields
    The sum over states method
    General theory of the EFISH experiment
    Ab initio and DFT calculations of the pNA β tensor
    Gas phase measurement
    Solution EFISH studies of pNA and MNA in dioxane
    Theoretical approaches to the calculation of the EFISH nonlinearity of pNA in solution
    Recent work on pNA/MNA
    Appendix I. Conversion of units

The floating spherical Gaussian orbital (FSGO) method
A. H. Pakiari
    Introduction
    Part I
    Part II, Development of original theory
    Part III, Application of FSGO method
    Part IV, Using the FSGO concept in other methods
    Appendix

Advances in valence bond theory
Peter B. Karadakov
    Introduction
    Comparison of the MO and VB approaches
    Developments in VB methodology
    Applications of VB theory
    Concluding remarks

Numerical methods in chemistry
T. E. Simos
    Newton–Cotes formulae for the numerical integration of the Schrödinger equation
    Stabilization of a multistep exponentially-fitted methods and their application to the Schrödinger equation
    Appendix A
    Appendix B
    Appendix C
    Appendix D
    Appendix E
    Appendix F
    Appendix G
    Appendix H
    Appendix I
Multiscale modelling of biological systems
Christopher J. Woods and Adrian J. Mulholland
DOI: 10.1039/b608778g
1. Introduction
At what point does a collection of molecules become a biomolecular system? At what
length scale does biology begin, and chemistry end? Biological phenomena involve
the flow of information across a range of length and timescales. For example, a cell
may be placed under physical stress at the macroscopic level, which causes an
increase in pressure within its protective membrane. This pressure has the effect of
opening1 or closing2 mechanosensitive ion channels, thereby changing the flow of
individual ions into the cell. This changes the ionic concentration within the cell,
which then acts as the trigger for a signal sent via a protein signalling pathway. A
chemist would look at this as a molecular system that was capable of converting
mechanical forces into electrical signals. A biologist would however look at this as
the mechanism a cell uses to adapt to stress, and thereby stay alive. Biology is full of
such examples. Every thought we have involves the passage of signals between
neurons, which itself requires the conversion of electrical signals into flows of ions.
These ions trigger the release of neurotransmitter molecules, which cross the synaptic
gap between neurons, and bind to individual receptor proteins at the synapse. This
causes a change in protein conformation, which opens nearby ion channels, causing
ions to rush in or out of the neuron, thereby continuing the signal. Information is
constantly flowing between the macroscopic world and the atomic, chemical world.
Indeed it is this interplay between the chemical and macroscopic worlds that is a real
beauty of biology, and it is the recent advances made by the science of biochemistry
that have revealed the elegance of the chemicals of life to all. However, while it is
possible to use a microscope to watch how an individual cell responds to external
stimuli, it is not possible to ‘zoom in’ further and observe what is occurring at the
chemical level. Experiments can infer what is happening, and can provide supporting
evidence for a particular hypothesis, but there is no experimental technique or
microscope that allows us to watch a chemical reaction within an enzyme active site.
Until such techniques are developed, the most appealing route that currently exists is
to use computers to create models of the biochemical world. Computational
scientists can create virtual enzymes, and models of cell membranes, and then use
these to provide a window through which the interactions of biomolecules can be
observed. If the models are constructed on the firm foundations of physics and
chemistry, and if their predictions are carefully compared and validated against
experiment, then simulations using these models can provide the valuable insight
necessary to link the chemical and biological worlds.
Computational scientists have developed many tools for modelling molecules.
Computer models are not perfect recreations of reality. Instead, approximations and
assumptions have to be made, and the model compromised for the sake of computational efficiency. As the size of the system gets larger, and so the size and number of
molecules increases, so too does the computational expense of the calculation. This
means that the larger the system, the more compromises and approximations must
be made. This act of compromise has led computational scientists to develop four
main levels of biomolecular modelling:
1. Quantum mechanics (QM). Quantum chemical calculations model the fine
detail of the electrons in the molecule. They achieve this by modelling the electrons
as a quantum mechanics wavefunction that interacts with the electrostatic potential
Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol, UK
BS8 1TS
Chem. Modell., 2008, 5, 13–50 | 13
This journal is © The Royal Society of Chemistry 2008
field generated by the atomic nuclei. Quantum chemical calculations provide the
most physically realistic and accurate models of molecules, but this accuracy comes
at a cost. While methods have been developed that allow QM calculations on
complete proteins,3 in general the high computational expense of QM methods limits
their application to small molecular systems.
2. Molecular mechanics (MM). Atomistic molecular mechanics calculations apply
the assumption that the fine detail information about the behaviour of the electrons
can be ignored, and instead they are approximated by representing their effects using
simple descriptors such as atomic partial charges or polarisabilities. By modelling the
electrons implicitly, MM methods are much less expensive than QM methods, and so
they are able to model significantly larger systems. By including atomic detail, MM
models are still limited to the molecular level, and even today’s largest applications
can only achieve the modelling of hundreds of thousands of atoms over hundreds of
nanoseconds.
3. Coarse grain (CG). Coarse grain (or coarse grained) calculations apply the
assumption that the fine detail information about the position of each atom in the
molecule can be ignored, and instead groups of atoms are approximated by smearing
them out into single ‘beads’. So, for example, rather than modelling each atom in a
protein, a CG representation would portray each residue as a single bead. This
approximation allows CG simulations to achieve length and timescales that are far
beyond those possible using atomistic MM models.
4. Continuum. Continuum models apply the assumption that the fine detail
information about the location of any particles or groups can be ignored, and
instead systems are modelled as continuum regions. For example, implicit solvent
models ignore the location of each individual solvent molecule, but instead represent
the complete solvent as a fuzzy dielectric continuum. Equally, continuum models of
a cell membrane ignore the individual locations of each lipid molecule, and instead
model the membrane as a homogenous elastic sheet. By ignoring particles, and
instead modelling biological systems as continuous fields or homogenous assemblies,
continuum models are able to simulate the largest length scales and longest timescales of any of the four levels.
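The coarse-graining step described in item 3 can be sketched in a few lines. The example below is a minimal illustration only, assuming one bead per residue placed at the residue's centre of mass; the function name `coarse_grain` and the toy coordinates are hypothetical, and practical CG models use more elaborate mappings and also need their own interaction potentials.

```python
from collections import defaultdict


def coarse_grain(coords, masses, residue_ids):
    """Collapse atoms into one bead per residue, placed at the centre of mass.

    Returns a dict mapping residue id -> ((x, y, z), bead_mass).
    The one-bead-per-residue mapping is an assumption made for illustration.
    """
    # Running sums per residue: [sum(m*x), sum(m*y), sum(m*z), sum(m)]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for (x, y, z), m, res in zip(coords, masses, residue_ids):
        acc = sums[res]
        acc[0] += m * x
        acc[1] += m * y
        acc[2] += m * z
        acc[3] += m
    return {res: ((sx / m, sy / m, sz / m), m)
            for res, (sx, sy, sz, m) in sums.items()}


# Hypothetical toy input: four atoms belonging to two residues.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 6.0, 0.0)]
masses = [1.0, 1.0, 3.0, 1.0]
residue_ids = [1, 1, 2, 2]
beads = coarse_grain(coords, masses, residue_ids)
```

Four atomic positions are reduced to two bead positions, which is the loss of atomic detail that buys the longer length and timescales mentioned above.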
These four levels of biomolecular modelling are each well-suited to modelling
phenomena at the length and timescales for which they were designed. However,
what makes biology work, and what makes it scientifically interesting, is the
interplay and flow of information across the different length and timescales. It is
not possible for simulations at any one of these biomolecular modelling levels to
represent these complex, multiscale biological phenomena on their own, and so
methods that allow the combination of different levels of biomolecular model
together must therefore be sought. Multiscale modelling, in which calculations at
multiple length and/or timescales are combined together into a single simulation, is
now becoming popular, and its development is now the focus of significant research
effort. Multiscale modelling is not new; for example, combined QM/MM methods
and MM/continuum implicit solvent methods have been used for over 30 years, and
multiscale methods have a rich heritage of applications in the fields of materials
modelling and nanomaterials,4 and modelling fluid and gas flow.5,6 Recently, there
has been a huge increase in the development and application of multiscale methods
for biomolecular modelling. This review focuses on these developments, in particular
the application of multiscale methods to biomolecules covering the period from 2005
to 2007. Coveney7 has produced a review of biological multiscale modelling that
covers the period up to 2005.
Before starting this review, there first needs to be a definition of what is meant by a
multiscale method. There are several different definitions that vary depending on the
type of coupling between the different modelling levels. This review will adopt
perhaps the broadest definition of a multiscale method, namely that it is any
method that involves a flow of information from one modelling level to another. By
definition, if there is a flow of information from one level to another, then there must
be an interface between the levels through which this information will flow.
Throughout this review it will become clear that there are four main classes of
interface:
1. One-way, bottom-up interfaces. These involve a one-way, often one-time
transfer of information from a lower level of modelling to a higher level. Examples
include using a QM calculation to parameterize an MM forcefield, or using an MM
simulation to parameterize a CG potential.
2. One-way, top–down interfaces. These involve a one-way transfer of information
from a higher level of modelling to a lower level. Examples include using a CG model
to reconstruct an atomistic model of a protein, or using a continuum model to
provide the boundary conditions for an atomistic simulation.
3. Two-way parallel interfaces. These involve a two-way dynamic transfer of
information between two simulations running in parallel at two different modelling
levels. An example includes running both an MM and CG simulation of a system
and using replica exchange8–10 moves to exchange coordinates between the two
levels.
4. Two-way embedded interfaces. These involve embedding a low modelling level
region within a simulation at a higher level, e.g. embedding a QM model of a
substrate and active site within an MM model of the enzyme, or embedding an MM
model of an ion channel within a CG model of a membrane.
This review is therefore organised according to the different interfaces between
levels (QM/MM, atomistic/CG, particle/continuum), and then by the different
classes of interface that are used between these levels.
2. Interfacing QM with MM models
The most accurate physical description of atoms and molecules is provided by
quantum chemical calculations. Quantum chemical calculations are capable of
correctly predicting the energetics and conformations of small molecules from first
principles, using broadly applicable approximations (e.g. the Born-Oppenheimer
approximation) and nothing more than fundamental physical constants as input.11
Quantum chemical calculations model electrons as a quantum mechanics (QM)
wavefunction that interacts with the electrostatic potential field created by the
atomic nuclei of the molecule. QM provides the most exact physical model of
matter at the atomic scale, and QM calculations are capable of predicting chemical
bonding and chemical reactivity. There are several recent reviews of quantum
chemical methods,11–13 and QM methods may now be used across a length and
timescale that ranges from modelling the femtosecond interactions of infra-red
laser light with carbon monoxide,14 to modelling the sub-nanosecond dynamics of a
complete protein.15 There are a range of QM methods available with varying degrees
of approximation, with a range that includes fast semi-empirical Hamiltonians such
as AM116 or PM3,17,18 and highly exact coupled cluster methods such as
LCCSD(T).19 Because QM methods include an explicit representation of electrons,
they are able to model chemical processes such as charge transfer, bond breaking
and formation, and changes of molecular polarisation. However, the high computational expense of QM methods prevents their application to the large length and
timescales that are required to understand complex biomolecular processes.
MM methods provide a simpler representation of molecules, in which the fine
detail of the electrons is represented implicitly via partial charges and, in some cases,
molecular polarisabilities.20,21 MM models represent molecules as a collection of
atoms interacting through classical potentials. There are several MM models (or
forcefields), and they differ in the functional forms of the interaction potential used
between atoms, and in the means by which these interaction potentials are
parameterized. Several good recent reviews of MM forcefields have been produced.22–26 Several MM forcefields have been developed for application to biomolecular systems. The most popular of these are the CHARMM,27 AMBER,28,29
GROMOS30–32 and OPLS33,34 forcefields. Each of these forcefields has evolved over
time, with different versions produced periodically. However, despite the proliferation and evolution of MM forcefields for biomolecular modelling, their functional
forms all remain broadly similar. Each atom is modelled as a single point in space.
Pairs of atoms in separate molecules interact through a pairwise non-bonded
potential, Enb, which depends only on the distance between the atoms, r. The
electrostatic part of this non-bonded potential, Eelec, is modelled using Coulomb’s
law, assigning a fixed partial charge to each atom in the molecule. The non-bonded
potential must also model the van der Waals (vdW) forces between the molecules,
which result from the combination of the Pauli repulsion that results from the
inability of two electrons to occupy the same space with the same set of quantum
numbers, and the attractive dispersion (London) forces whose physical basis lies in
the ‘instantaneous dipoles’ that result from the wavefunctions of close atoms moving
in phase. These vdW forces have their origin in the behaviour of electrons, which are
not explicitly modelled in MM forcefields. These forces must therefore be approximated. The most common approximation used for biomolecular applications is the
Lennard-Jones (LJ) potential. This approximates the vdW interactions using a 12–6
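repulsive–attractive potential, ELJ,

$$E_{\mathrm{LJ}}(r_{ij}) = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \qquad (1)$$

where $E_{\mathrm{LJ}}(r_{ij})$ is the Lennard-Jones energy between atom i and atom j, $r_{ij}$ is the
distance between the pair of atoms, and $\sigma_{ij}$ and $\varepsilon_{ij}$ are parameters that are tuned to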
reproduce the strength of the vdW forces between this pair of atoms, often by fitting
to macroscopic properties. Note that this is a pairwise potential that acts only
between pairs of atoms. This is despite the fact that unlike the permanent electrostatic forces, vdW forces are not pairwise in nature. Indeed, while permanent
electrostatic forces are pairwise, MM forcefields use Coulomb’s law and fixed atomic
partial charges to model both the permanent electrostatics of the molecule and,
implicitly, its polarisation. Charge polarisation is also not a pairwise phenomenon.
The non-bonded potential energy between two molecules is given by the sum of the
Coulomb and LJ energies between all pairs of atoms in the molecules. It is therefore
an effective pair potential, as the derivation of the partial charges and LJ parameters
must account for the errors implicit in only using a pairwise sum over atoms, and
must therefore include 3-, 4- to n-body effects implicitly.
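The pairwise sum described above can be sketched as follows. This is a minimal illustration, not the implementation used by any particular forcefield: the dictionary layout for atoms, the Lorentz-Berthelot combining rules, the unit choice (kcal/mol, Å, elementary charges) and the rounded Coulomb conversion factor are all assumptions made for the example.

```python
import math

# Approximate conversion factor for energies in kcal/mol with distances in
# angstroms and charges in units of e (assumed unit system; individual
# forcefields use slightly different values).
COULOMB_CONSTANT = 332.06


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy of equation (1)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj):
    """Coulomb energy between two fixed partial charges."""
    return COULOMB_CONSTANT * qi * qj / r


def nonbonded_energy(mol_a, mol_b):
    """Sum Coulomb and LJ energies over all atom pairs of two molecules.

    Each atom is a dict with keys 'xyz', 'q', 'sigma' and 'epsilon'
    (a hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for a in mol_a:
        for b in mol_b:
            r = math.dist(a["xyz"], b["xyz"])
            # Lorentz-Berthelot combining rules: an assumption, other rules exist.
            sigma = 0.5 * (a["sigma"] + b["sigma"])
            epsilon = math.sqrt(a["epsilon"] * b["epsilon"])
            total += lennard_jones(r, sigma, epsilon) + coulomb(r, a["q"], b["q"])
    return total


# Two neutral, identical single-atom "molecules" with illustrative parameters,
# separated by the distance at which the LJ potential has its minimum.
r_min = 2.0 ** (1.0 / 6.0) * 3.0
atom = {"xyz": (0.0, 0.0, 0.0), "q": 0.0, "sigma": 3.0, "epsilon": 0.2}
other = {"xyz": (r_min, 0.0, 0.0), "q": 0.0, "sigma": 3.0, "epsilon": 0.2}
energy = nonbonded_energy([atom], [other])
```

For this neutral pair the result is the LJ well depth, approximately -0.2 in the assumed energy units. Because the sum runs only over pairs, any many-body effect has to be absorbed into the values of the charges and LJ parameters, which is why the potential is called an effective pair potential.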
Modelling the electronic detail of a molecule, as well as providing an explicit
representation of polarisation and vdW forces, is also responsible for giving a correct
representation of chemical bonding. As MM forcefields do not explicitly model
electrons, they must include classical interaction potentials that mimic the effects of
chemical bonding. MM forcefields include classical intramolecular interaction
potentials, e.g. a harmonic bond potential, Ebond that acts between bonded atoms
(called 1–2 atoms), a harmonic angle potential, Eangle that acts on the angle between
a series of three bonded atoms (1–3 atoms), and a torsional potential, Etorsion, that
acts about the dihedral formed by four bonded atoms (1–4 atoms). Atoms that are
separated by more than three bonds (1–5+ atoms) are treated as being non-bonded,
and so their interaction energy is calculated using the sum of their Coulomb and LJ
interaction energies. The total intramolecular energy of a molecule is then given by
the sum of the bond energy between all 1–2 atoms, the sum of the angle energy
between all 1–3 atoms, the sum of torsion energy between all 1–4 atoms, and the sum
of the non-bonded Coulomb and LJ energies between all pairs of 1–5+ atoms. The
total energy of a system of molecules can then be calculated as the sum of the
intramolecular energies of all of the molecules, together with the sum of the non-bonded potential energies between all pairs of molecules.
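The bookkeeping of the intramolecular energy can be sketched as below. The text specifies only that the bond and angle terms are harmonic and that a torsional term acts on dihedrals; the specific functional forms used here (no factor of one half in the harmonic terms, a single cosine term per torsion) and all parameter names are assumptions for illustration, and conventions differ between forcefields.

```python
import math


def bond_energy(r, k_bond, r0):
    """Harmonic bond term for a 1-2 pair (assumed form k (r - r0)^2)."""
    return k_bond * (r - r0) ** 2


def angle_energy(theta, k_angle, theta0):
    """Harmonic angle term for a 1-3 triple (angles in radians)."""
    return k_angle * (theta - theta0) ** 2


def torsion_energy(phi, k_phi, n, delta):
    """Single cosine torsion term for a 1-4 quadruple (assumed form)."""
    return k_phi * (1.0 + math.cos(n * phi - delta))


def intramolecular_energy(bonds, angles, torsions, nonbonded_terms=()):
    """Sum bonded terms plus the non-bonded energies of 1-5+ pairs.

    bonds    : iterable of (r, k_bond, r0)
    angles   : iterable of (theta, k_angle, theta0)
    torsions : iterable of (phi, k_phi, n, delta)
    nonbonded_terms : Coulomb + LJ energies of 1-5+ pairs, evaluated elsewhere
    """
    total = sum(bond_energy(r, k, r0) for r, k, r0 in bonds)
    total += sum(angle_energy(t, k, t0) for t, k, t0 in angles)
    total += sum(torsion_energy(p, k, n, d) for p, k, n, d in torsions)
    total += sum(nonbonded_terms)
    return total


# Illustrative values only: one stretched bond, one relaxed angle,
# one eclipsed threefold torsion and one attractive 1-5+ contact.
example = intramolecular_energy(
    bonds=[(1.1, 100.0, 1.0)],
    angles=[(2.0, 50.0, 2.0)],
    torsions=[(0.0, 2.0, 3, 0.0)],
    nonbonded_terms=[-0.5],
)
```

Every term is a cheap closed-form expression of the geometry, which is what makes the energy of a large system fast to evaluate.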
By using classical potentials, MM models allow for a very rapid evaluation of the
energy and forces acting on each atom within a large biomolecular system. This
rapid evaluation allows these forces and energies to be used by statistical conformational sampling methods, such as molecular dynamics (MD)35,36 or Monte Carlo
(MC),36,37 to generate large ensembles of configurations of the system, from which
macroscopic (thermodynamic) properties may be evaluated. However, by not
explicitly modelling electrons, MM models struggle to model many chemically
important phenomena, such as chemical bond breaking and formation, electronic
polarisation and charge transfer. There is therefore a strong motivation to combine
MM models with quantum mechanics (QM) calculations within a multiscale
modelling framework, and combined QM/MM biomolecular simulation methods
have a rich history of application and evolution since their original inception in the
early 1970s.38,39
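How a cheap energy function feeds a sampling method can be illustrated with a minimal Metropolis Monte Carlo sketch. The one-dimensional harmonic "system", the step size, the number of steps and the reduced units (kT = 1) are all assumptions chosen to keep the example short; real biomolecular simulations sample many thousands of coordinates with far more sophisticated moves.

```python
import math
import random


def metropolis(energy, x0, n_steps, max_step, kT, seed=0):
    """Generate an ensemble of configurations with the Metropolis criterion."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-max_step, max_step)   # trial move
        e_new = energy(x_new)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)   # a rejected move counts the old configuration again
    return samples


def harmonic(x, k=1.0):
    """Toy classical potential standing in for an MM energy function."""
    return 0.5 * k * x * x


samples = metropolis(harmonic, x0=0.0, n_steps=50000, max_step=1.5, kT=1.0)

# An ensemble average: for this potential the mean of x^2 should approach kT/k.
mean_x2 = sum(s * s for s in samples) / len(samples)
```

The average over the generated ensemble plays the role of the macroscopic property; its statistical quality depends directly on how many energy evaluations can be afforded, which is the practical argument for classical potentials.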
Using a broad definition, QM/MM multiscale methods are those that involve a
transfer of information across an interface between the QM and MM levels of
modelling. There are several different types of interface in use, which fall into four
categories:
1. One-way bottom-up methods. These involve a single transfer of information
from a QM calculation to a classical simulation, e.g. by using QM calculations to
parameterize the classical potentials used in MM force fields.
2. Two-way dynamic parameterisation methods. These involve a dynamic transfer
of information between separate classical and quantum calculations, e.g. using
successive QM calculations to dynamically re-parameterize the classical potentials
of an MM forcefield during a live simulation.
3. Two-way embedded methods. These involve embedding of molecules or parts of
molecules modelled using QM into a system of molecules modelled using MM, e.g.
using QM to model a substrate, and MM to model the enzyme and solvent.
4. Two-way parallel methods. These involve the running in parallel of classical and
quantum simulations, and dynamically sharing information between them at run
time.
Examples of each of these different types of interface, and recent developments in
their methodology, will now be discussed in turn.
2.1 One-way bottom-up QM/MM interfacing methods
As described in the last section, molecular mechanics (MM) forcefields use classical
potentials to calculate the interaction energy between pairs of atoms. Several types of
interaction potential are required to fully describe the MM energy of a set of
molecules:
1. Non-bonded potentials. These typically take the form of a Coulomb potential
between non-bonded pairs of atoms to describe polarisation and permanent
electrostatics, and a Lennard-Jones (LJ) potential between non-bonded atom pairs
to describe the van der Waals (vdW) interactions.
2. Bonded potentials. These typically involve harmonic terms that are applied
between 1–2 bonded and 1–3 bonded pairs of atoms, and cosine terms between 1–4
bonded pairs. These potentials try to model the effect of chemical bonding.
These classical interaction potentials must be parameterized, e.g. the magnitude of
the partial charges on each atom in the molecule must be assigned, and the
equilibrium bond length and size of the harmonic force constant must be attached
to each bond. In the early biomolecular MM forcefields, these parameters were
developed to produce molecular models that could reproduce known experimental
properties of the bulk system. For example, several MM water models have been
developed.26,40,41 One of the earliest successful models, TIP3P,42 was parameterized
such that simulations of boxes of TIP3P molecules reproduced known thermodynamic properties of water, such as liquid density and heats of vaporisation. Such a
parameterisation scheme is to be applauded, as it ties the molecular model closely to
experiment. Indeed many of the common MM models of amino acids were
developed by comparison to experiment, e.g. OPLS.33 Indeed it is such a good
scheme that even some modern water models, like TIP5P,43 are still parameterized in
this way. However, it was quickly realised that parameterisation against experiment
required large amounts of physical data that was just not available for the novel
molecules being conceived during rational drug design. The developers of biomolecular forcefields therefore created recipes that allowed for the parameters of new
molecules to be derived from quantum chemical calculations. One popular example
of such a recipe is GAFF (generalised AMBER forcefield),44 which is an MM
forcefield for small drug-like ligands that is compatible with AMBER. GAFF uses
generic atom types that are assigned to each atom in the drug-like molecule, e.g.
aliphatic carbon or aromatic hydrogen. These atom types are used to assign LJ,
bond, angle and dihedral parameters to the molecule from a large parameter library.
The partial charges for the atoms are derived by performing an AM116 semiempirical
QM calculation, and calculating BCC45,46 charges. In a very broad sense, this
parameterisation is a multiscale method, as information (the charge distribution)
from a QM calculation is transferred to an MM simulation via the parameterisation
of the atomic partial charges. This multiscale parameterisation therefore represents a
one-time, one-way flow of information up from the QM level to the MM level. A
similar scheme is available for the OPLS forcefield,47 which uses CM1A charges48
that are also derived from semiempirical AM1 QM calculations. Wang and Sandberg49 used a more complex multiscale parameterisation to derive intramolecular
CHARMM parameters for the interaction of DNA bonded to a gold surface. The
parameters were calculated by fitting to density functional theory (DFT) QM
calculations.
2.2 Two-way dynamic parameterisation methods
QM calculations are now used routinely as the source of parameters for MM
forcefields. Indeed this application is now so routine that most workers would not
consider forcefield parameterisation to be an example of a multiscale method.
However, there is a drawback with using QM calculations to provide MM forcefield
parameters. The problem is that the information flow is only in one direction, from
the QM to the MM level. This means that the QM-derived parameters for a molecule
have to be very general, and be able to represent the molecule in a variety of
conformations and environments. This is an unreasonable requirement, as it is clear
that the charge distribution and polarisation of the molecule depend on both its
conformation and its environment, e.g. whether it is in bulk solvent or whether it is
bound to the active site of a protein. Because information only flows from the QM to
MM level, there is no mechanism that allows information about the environment
and conformation of the molecule experienced during the MM simulation to be fed
back to the QM calculation. A solution is to modify multiscale parameterisation so
that there is a two-way interface between the QM and MM levels of modelling.
There have been two recent applications that have made such a modification: one
targeted at creating a QM/MM multiscale docking method,50 and another targeted
at QM/MM multiscale free energy calculations.47,51,52
Docking is one of the primary tools used during the process of rational drug
design.53 The aim of docking is to predict the binding mode of a ligand with a
protein. Because docking calculations are typically used to study how libraries of
thousands of ligands bind to a protein, the calculations involved must be simple and
efficient. This means that the interaction potentials used in docking tend to be based
on molecular mechanics forcefields.53 MM forcefields struggle to model the changes
in polarisation upon protein-ligand binding, an effect that is thought to account for
as much as 10–40% of the binding affinity.54 Multiscale docking methods that
attempt to use QM calculations to overcome this problem have therefore been
developed.50,55 Cho et al.50 have developed a QM/MM docking method that uses
multiscale parameterisation dynamically throughout a docking calculation. The
electrostatic interaction energy between the ligand being docked and the protein is
calculated using Coulomb’s law, with atomic partial charges placed on the protein
and ligand atoms. Cho et al. first showed that the docking prediction is improved if
the atomic partial charges on the ligand are derived from a QM calculation of the
ligand in the bound geometry. They demonstrated this by selecting several test
protein-ligand systems whose bound geometries were known via crystal structures
available in the Protein Data Bank (PDB). They calculated the partial charges by
performing a density functional theory (DFT) calculation of the ligand in the bound
geometry. The wavefunction was polarised by embedding the partial charges from
the protein within the Hamiltonian of the QM calculation. By using the bound
geometry, and by embedding the partial charges of the protein, information about
the environment of the ligand in the protein active site was made available to the QM
calculation. The polarised wavefunction was used to obtain the molecular electrostatic potential (MEP) surface around the ligand. Partial charges were generated
using an electrostatic potential fitting procedure to reproduce the QM MEP. By
calculating these partial charges from a QM calculation that had information about
the bound geometry, it could be argued that these atomistic partial charges were
optimised for the bound geometry. Cho et al. demonstrated this50 by running redocking calculations where the ligands were docked using both optimised and
generic partial charges. The results demonstrated that docking calculations using
the optimised charges were significantly more likely to rediscover the known
experimental binding modes. To turn this observation into an effective docking
algorithm, Cho et al. created an iterative dynamic reparameterisation algorithm:
1. Dock the ligand using the default partial charges taken from the docking MM
forcefield.
2. Perform a QM calculation on the predicted binding mode to obtain optimised
partial charges for the ligand.
3. Dock the ligand again, this time using the optimised partial charges.
4. Keep iterating until the charges and predicted binding mode converge to within
a set limit.
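The four steps above can be sketched as a simple loop. This is a hedged illustration of the control flow only, not the implementation of Cho et al.: the callables `dock`, `qm_charges` and `pose_distance`, the tolerances and the cycle limit are hypothetical placeholders that a real docking and QM package would have to provide.

```python
def iterative_qm_mm_docking(dock, qm_charges, default_charges, pose_distance,
                            charge_tol=0.01, pose_tol=0.5, max_cycles=20):
    """Iterate docking and QM charge fitting until both converge.

    dock(charges) -> pose            (hypothetical MM docking engine)
    qm_charges(pose) -> charges      (hypothetical QM charge fit in the site)
    pose_distance(a, b) -> float     (hypothetical measure, e.g. an RMSD)
    Returns (pose, charges, converged).
    """
    charges = list(default_charges)
    pose = dock(charges)                          # step 1: default charges
    for _ in range(max_cycles):
        new_charges = list(qm_charges(pose))      # step 2: QM on current pose
        new_pose = dock(new_charges)              # step 3: re-dock
        charge_shift = max(abs(a - b) for a, b in zip(new_charges, charges))
        pose_shift = pose_distance(new_pose, pose)
        charges, pose = new_charges, new_pose
        if charge_shift < charge_tol and pose_shift < pose_tol:
            return pose, charges, True            # step 4: converged
    return pose, charges, False
```

Returning a convergence flag, rather than raising an error, lets a caller decide how to treat ligands whose charges and poses keep oscillating.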
By dynamically reparameterising the ligand throughout the docking calculation,
Cho et al. allow information to flow both ways between the QM and MM levels. A
similar idea has been developed by Jorgensen and co-workers to create a QM/MM
multiscale method for free energy calculations.47,51,52 Jorgensen and co-workers
developed a method to obtain atomic partial charges efficiently from a QM
calculation that were compatible with the partial charges from the standard OPLS
all-atom forcefield.47 The charges were calculated using the charge model 1
(CM1A)48 analysis of an AM1 semiempirical QM calculation. However, as CM1A
charges were parameterized to reproduce gas-phase dipole moments,48 they had to
be scaled by a factor of 1.2 so that an implicit account could be made for the extra
polarising effects of polar solvents. Jorgensen and co-workers first used this method
to perform QM/MM hydration free energy calculations. The solute molecule was
modelled using QM (AM1), while the solvent molecules were modelled using MM
(OPLS). The QM calculation was used to obtain the intramolecular energy of the
solute. The QM calculation was also used to obtain the atomic partial charges on the
solute using AM1/CM1A. These partial charges were used to calculate the electrostatic interaction energy between the solute and solvent via Coulomb’s law. The LJ
equation was used to obtain the vdW interaction between the solute and solvent
using pre-assigned OPLS LJ parameters $\varepsilon$ and $\sigma$. The QM calculation in this method
was used only to dynamically re-parameterize the MM atomic partial charges during
the simulation. The only information flow from the MM to QM level was the change
in conformation of the solute. The solvent environment around the solute was not
passed explicitly, as it was not included within the QM calculation. The effect of the
solvent was only felt implicitly in the QM calculation via the application of the scale
factor. Despite the lack of explicit modelling of the solvent environment in the QM
calculation, Jorgensen and co-workers have successfully used this method to study
solution-phase Diels-Alder reactions,56 and to study the enzyme-catalysed Claisen
rearrangement reaction of chorismate to prephenate.51 Jorgensen and co-workers
have since adapted this method57–59 to use the PDDG/PM3 semiempirical QM
Hamiltonian,60 using the CM361 method to extract charges, which are then scaled by
a factor of 1.14,62 again to provide an implicit account for the extra polarising effects
of the solvent.
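The solute-solvent energy used in this type of scheme can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published code: the function names, the atom tuple layout, the approximate Coulomb constant and the use of geometric-mean combining rules for the LJ parameters are choices made here for illustration, and the default scale factor of 1.2 is simply the value quoted above for CM1A charges.

```python
import math

COULOMB_CONSTANT = 332.06  # approximate, kcal mol^-1 Angstrom e^-2 (assumption)


def scale_gas_phase_charges(charges, scale=1.2):
    """Scale QM-derived gas-phase charges to mimic solvent polarisation."""
    return [scale * q for q in charges]


def solute_solvent_energy(solute_atoms, solvent_atoms):
    """Coulomb plus LJ energy between a QM solute and MM solvent.

    Each atom is (position, charge, epsilon, sigma), with position an
    (x, y, z) tuple in Angstrom. Solute charges are assumed to be the
    already-scaled QM charges; LJ parameters are pre-assigned.
    """
    energy = 0.0
    for pos_i, q_i, eps_i, sig_i in solute_atoms:
        for pos_j, q_j, eps_j, sig_j in solvent_atoms:
            r = math.dist(pos_i, pos_j)
            energy += COULOMB_CONSTANT * q_i * q_j / r
            eps = math.sqrt(eps_i * eps_j)   # geometric-mean combining rule
            sig = math.sqrt(sig_i * sig_j)
            if eps > 0.0 and sig > 0.0:
                sr6 = (sig / r) ** 6
                energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy
```

In a simulation the charges would be recomputed from the QM calculation whenever the solute conformation changes, while the LJ parameters stay fixed.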
2.3 Two-way embedded methods
Dynamic multiscale reparameterisation, i.e. reparameterisation repeated periodically during a simulation, provides a two-way conduit for information flow between the QM and MM levels of
modelling. However, the coupling between levels is not particularly strong. The
interchange of information between levels occurs only periodically, which can lead to
the MM level falling out of step with the QM. This problem was experienced during
the development of the QM/MM docking method of Cho et al.50 If the ligand was
initially docked in a poor configuration, then the partial charges derived for that
conformation could bias the subsequent docking runs to rediscover the poor
configuration in preference to the correct binding mode. Cho et al. developed a
survival of the fittest algorithm50 that ran multiple docking runs in parallel, thereby
preventing one poor result from biasing the rest of the calculation.
A closer coupling between the QM and MM levels can be achieved using an
embedded interface method. A QM region of the biomolecular system is embedded
within a larger MM simulation. One of the primary application areas for embedded
QM/MM methods is computational enzymology (the computational modelling of
enzyme-catalysed reaction mechanisms), where typically a QM model of the substrate and part of the enzyme active site is embedded within an MM model of the rest
of the enzyme and solvent.63 Embedding a QM model within an MM simulation
creates a dynamic and permanent interface between the two levels, with information
flow across that interface having to be managed for each configuration of the
simulation. The ONIOM method, developed by Morokuma and co-workers,64
provides such an interface via the use of multilevel corrections. The ONIOM method
partitions the system into multiple layers. For example, consider a two-layer system,
where a high-level QM region, A, is embedded within a low-level MM region, B. The
energy of both regions, A + B, is first calculated using only the MM Hamiltonian,
giving $E_{\mathrm{MM}}(A+B)$. This total energy is corrected by adding the difference
between the QM and MM energies of the QM region, $E_{\mathrm{QM}}(A) - E_{\mathrm{MM}}(A)$.
The total ONIOM energy of the system is therefore $E_{\mathrm{MM}}(A+B) + E_{\mathrm{QM}}(A) - E_{\mathrm{MM}}(A)$.
The generalisation of this algorithm to multiple levels is straightforward,
e.g. a system can be divided into an ab initio QM region, A, which is embedded
within a semiempirical QM region, B, which is itself embedded within an MM
system, C. The ONIOM energy in this case would be
$E_{\mathrm{MM}}(A+B+C) + [E_{\mathrm{semiempirical}}(A+B) - E_{\mathrm{MM}}(A+B)] + [E_{\mathrm{ab\,initio}}(A) - E_{\mathrm{semiempirical}}(A)]$.
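The subtractive ONIOM expressions can be written as two small functions. This is a minimal sketch for illustration: the function and argument names are invented here, and the energies are assumed to come from separate calculations at each level of theory on the regions named in the text.

```python
def oniom2_energy(e_mm_ab, e_qm_a, e_mm_a):
    """Two-layer ONIOM: MM energy of A + B, corrected by QM - MM for region A."""
    return e_mm_ab + (e_qm_a - e_mm_a)


def oniom3_energy(e_mm_abc, e_semi_ab, e_mm_ab, e_abinitio_a, e_semi_a):
    """Three-layer ONIOM: ab initio A inside semiempirical B inside MM C."""
    semiempirical_correction = e_semi_ab - e_mm_ab
    ab_initio_correction = e_abinitio_a - e_semi_a
    return e_mm_abc + semiempirical_correction + ab_initio_correction
```

For example, with illustrative energies of -100.0 for the MM calculation on A + B, -50.0 for QM on A and -40.0 for MM on A, `oniom2_energy(-100.0, -50.0, -40.0)` returns -110.0. Each correction only requires calculations on the smaller, inner region, which is what keeps the scheme affordable.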
The ONIOM method allows the facile combination of QM and MM levels of
modelling. However, the electrostatic interaction between the QM and MM regions
in the original ONIOM implementation is handled at the MM level only, via
$E_{\mathrm{MM}}(A+B)$, based on partial atomic charges derived from the QM calculation.
The use of this method, called classical or mechanical embedding,65 means that
information about the electrostatics of the system flows only from the QM up to the
MM level. There is no conduit by which the electrostatic environment of the MM
atoms is able to flow down to the QM region, to polarise the QM wavefunction.65
An alternative method of interfacing QM and MM calculations, called electronic
embedding, solves this problem. In electronic embedding, the partial charges of the
MM atoms are embedded within the QM Hamiltonian. This allows the QM
wavefunction to be polarised by the MM atoms, thereby providing a two-way
conduit for electrostatic information between the QM and MM regions. The
ONIOM method has since been extended to use electronic embedding,65 thereby
overcoming one of the fundamental weaknesses of the algorithm.
Electronic embedding has had a rich and long history of application in other QM/
MM schemes. The first application of electronic embedded QM/MM to a biomolecular system was in the ground-breaking work of Warshel and Levitt in 1976.39
They developed the method to study the reaction mechanism of hen egg-white
lysozyme. Warshel has recently produced a detailed, clear and very interesting
review63 of the details of the algorithm and the developments in QM/MM methodology since his pioneering work in the 1970s. Acceptance of this approach took a
long time,63 but there is now a large body of QM/MM applications that use such
methodology (see one of the very many reviews of QM/MM methods available in the
literature66–69). The underlying theory of this method has been covered by many
different authors68,69 so only a brief description will be given here. The biomolecular
system is divided into a QM region, for example a substrate, and an MM region, e.g.
an enzyme and surrounding solvent molecules. The total energy of the system is the
sum of the energy of the MM region, evaluated using a standard MM forcefield, the
energy of the QM region, and the QM/MM interaction energy between the two
regions. The QM/MM interaction energy is split into two parts: an electrostatic part
and a vdW part. The vdW part is calculated by assigning LJ parameters to all of the
QM atoms and calculating the interaction between the QM and MM atoms using the
LJ equation. The electrostatic part of the QM/MM interaction is calculated by
bundling it together with the calculation of the QM energy of the QM region. This is
achieved by embedding within the QM Hamiltonian the locations and partial
charges of all of the MM atoms that are within a pre-determined cutoff distance
of the QM atoms. Normally the MM atoms are represented as point charges in the
QM calculation, but Gaussian charge distributions, with widths similar to the
covalent radii of the MM atoms, may be used instead.70 The
MM partial charges act to polarise the QM wavefunction of the QM atoms, and
therefore the evaluation of the energy of this wavefunction returns both the
intramolecular energy of the QM atoms and the electrostatic interaction between
the QM and MM atoms. The simple split of the QM/MM interaction into
electrostatic and vdW parts is, however, only possible if the interface between
regions lies between molecules, i.e. all molecules are either QM or MM, and there are
no molecules that sit across the interface. It is desirable (e.g. within computational
enzymology) for a single molecule to be able to bridge this interface, e.g. while the
majority of an enzyme is modelled at the MM level, it is usually necessary to
represent some (e.g. catalytic) active site residues at the QM level. The problem with
having a single molecule straddle the interface is that the QM/MM interaction
energy, as well as modelling electrostatic polarisation, must now also include terms
that account for the chemical bonding between atoms modelled at the QM level and
atoms modelled at the MM level. This is a particular problem with the QM side of
the calculation, as when the boundary bisects a covalent bond, the electron density is
terminated abruptly at the end of the QM region and electrons of the bonded atom
are missing, potentially leading to unpaired electrons.66 The three most popular
methods for solving this problem are the link atom method,66,71,72 the local self-consistent field (LSCF) method73,74 and the generalised hybrid orbital (GHO)
method.75,76
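For the simple case in which no molecule straddles the interface, the energy decomposition described above can be summarised as follows. This is a sketch in atomic units using notation assumed here rather than taken from the original: $i$ runs over the QM electrons, $\alpha$ over the QM nuclei with charges $Z_\alpha$, and $M$ over the MM atoms with partial charges $q_M$ that lie within the cutoff.

```latex
E_{\mathrm{total}} = E_{\mathrm{MM}}(\text{MM region})
  + \langle \Psi | \hat{H}_{\mathrm{QM}} + \hat{H}^{\mathrm{elec}}_{\mathrm{QM/MM}} | \Psi \rangle
  + E^{\mathrm{LJ}}_{\mathrm{QM/MM}}

\hat{H}^{\mathrm{elec}}_{\mathrm{QM/MM}} =
  - \sum_{i} \sum_{M} \frac{q_M}{|\mathbf{r}_i - \mathbf{R}_M|}
  + \sum_{\alpha} \sum_{M} \frac{Z_\alpha q_M}{|\mathbf{R}_\alpha - \mathbf{R}_M|}
```

Because the point-charge term sits inside the expectation value, the wavefunction $\Psi$ is optimised in the field of the MM charges, so a single QM energy evaluation returns both the intramolecular energy of the QM atoms and their electrostatic interaction with the MM atoms.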
The ‘‘dummy junction atom’’ or link atom approach introduces so-called link
atoms to satisfy the valence of the atoms on the QM side of the QM/MM
interface.66,71,72 Usually this atom is a hydrogen, but other atom types have also
been used, e.g. halogens such as fluorine or chlorine.77 The link atom method can be
used with both the Warshel-type QM/MM methods and ONIOM methods.65,78 The
link atom method has been criticised because it introduces extra unphysical atoms to
the system, which come with associated extra degrees of freedom. Another problem
is that a C–H bond is clearly not chemically exactly equivalent to a C–C bond.
Despite these problems, the simplicity of the link atom method means that it is used
widely in the QM/MM modelling of proteins and other biological molecules.79
Zhang et al.80,81 addressed these problems with their pseudobond method. If X and Y
are the bonded QM and MM boundary atoms, respectively, then the link atom
method effectively replaces the Y atom with a hydrogen, thereby changing X–Y into
X–H. This has the problem that H is not necessarily chemically similar to Y, and also
the X–H bond length may not be the same as the X–Y bond length. In the
pseudobond method, Y is instead replaced by $Y_{\mathrm{eff}}$. This is a one-free-valence
boundary atom that has a parameterized effective core potential that mimics the
strength of the X–Y bond. This method has been tested successfully80 with
Hartree-Fock (HF), density functional theory (DFT) and MP282 QM/MM calculations.
Antes and Thiel83 have also introduced a conceptually similar approach,
which they call ‘‘adjusted connection atom’’, that works at the semiempirical QM
level. Parameterisation of these effective link atoms is not straightforward, as they
must minimally perturb the electron density compared to a QM calculation of the
entire molecule. Röthlisberger and co-workers have developed a sophisticated
scheme that derives parameters for these atoms via density functional perturbation
theory.84
A second approach to define the bonding interface between the QM and MM
regions is the local self-consistent field (LSCF) algorithm developed by Rivail and
co-workers.73,74 In the LSCF method the bonds between the QM and MM atoms are
represented by strictly localised bond orbitals, which are parameterized by separate
QM calculations on small molecules.75 These localised orbitals are assumed to be
transferable to the protein system, and are used, and kept constant, throughout the
self-consistent field (SCF) QM calculation. An elegant feature of the LSCF method
is that it does not require use of link atoms, and a comparison of the LSCF and link
atom methods85 showed that both give equivalent energy results. However, the
parameters for the localised bond orbitals have to be determined for each new
system studied.75 Fornili et al.86 have recently shown that it is possible to use frozen
core orbitals on the MM frontier atom within the LSCF scheme. This provides an
explicit description of the core electrons of the MM frontier atom,
thereby improving the physical description of the interface, thus reducing the error
on the calculation.
While the LSCF method is elegant, and has been shown to work well, the
parameters for the localised orbitals are not very portable. Gao et al. addressed
this problem by developing generalised hybrid orbitals (GHOs).75,76 The QM
boundary atom at the QM/MM interface has the same standard valence s and p orbitals
as all of the other non-hydrogen atoms in the QM region. These four s and p orbitals are
transformed into a set of orthogonal hybrid orbitals, which can be defined by the
bound geometry of the atom.75 These hybrid orbitals are used, along with the atomic
orbitals of the QM region, to determine the QM energy. However, only one hybrid
orbital points along the bond between the QM and MM boundary atoms, and it is
this active hybrid orbital that needs to be optimised. The complete set of this one
active hybrid orbital, plus all of the atomic orbitals from the other QM atoms form
the active set that is optimised during the SCF calculation. The remaining three
hybrid orbitals, called auxiliary orbitals, act, together with the nucleus charge, to
generate an effective core potential for the QM boundary atom. Gao et al. realised
that these auxiliary orbitals may be parameterized to mimic the effective core
potential for the active electrons from the MM region. Therefore rather than
parameterising the charge density of the hybrid orbitals for each specific system,
as is the case for the LSCF method, they instead optimise the semiempirical
parameters for the boundary atom to reproduce the bonding properties of full
QM systems. As a result, the parameters for this GHO method are expected to be
general and transferable in the same way as all the semiempirical parameters.75 The
method has since been extended by Gao and co-workers87 to work with the self-consistent charge density-functional tight binding (SCC-DFTB) method.88 The
accuracy and efficiency of SCC-DFTB is making it a popular choice to model the
QM region, and its use in computational enzymology has increased greatly in recent
years.88 Gao and co-workers76,89 have also developed a GHO method that is suitable
for ab initio Hartree-Fock (HF)89 and density functional theory (DFT) calculations.76
König et al.90 recently compared several different algorithms to model chemical
bonds between QM and MM atoms, and concluded that while none of the
approaches was perfect, error cancellation meant that enzyme-catalysed reaction
free energies using an SCC-DFTB QM region were only marginally affected by the
choice of algorithm, if total charge was conserved during the reaction. The conclusion contrasts with a comparison study in which we were involved,91 in which we
compared the link atom and GHO boundary methods. While we found that both
methods, when properly applied, can lead to similar behaviour, the inclusion of
conformational sampling amplified the effects of the differences between the
methods. This led to the two methods returning different free energy reaction
profiles despite being applied to the same enzyme system. It is important to make
clear that while the same enzyme system was used, the QM region was not the same
for the two boundary methods. This was because of the different constraints on the
partitioning of the two methods (the GHO method requires that the QM atom
bound to an MM atom must be an sp3 carbon91). The origin of the observed
difference may well be in the different QM and MM systems, rather than in the
partitioning schemes themselves. Despite these questions over the fine detail of how
QM/MM methods are applied, their development and application has now matured
to a point where they can provide near-quantitative results for activation enthalpies
and free energies of reaction.92 It is now possible to perform electronic structure
calculations on large systems approaching chemical accuracy, thus allowing quantitative studies of reaction mechanisms in enzymes.92
2.4 Two-way parallel QM/MM methods
Quantum mechanics calculations are computationally demanding. This has caused
difficulties with their application to the calculation of thermodynamic properties,
such as free energies. This is because thermodynamic properties are calculated as an
average over a large ensemble of conformations of a system. Sampling methods,
such as molecular dynamics (MD)35,36 or Monte Carlo (MC)36,37 must be used for
the rigorous generation of such ensembles. The computational expense of QM
calculations means that it is only practical to generate small ensembles via standard
MD or MC, e.g. current methods are limited to picoseconds of molecular dynamics.
Two-way parallel QM/MM interfaces have therefore been developed in an attempt
to overcome this problem. We use the term parallel methods to mean those that run
a QM or QM/MM simulation in parallel with a standard MM
simulation, using only periodic exchange of information between the two simulation
levels.
Warshel and co-workers developed a successful QM/MM parallel method in a set
of pioneering papers in the late 1990s.93,94 The aim of this method was to calculate
the free energy difference between two systems, A and B. For example, system A
could be a substrate bound to an enzyme, while system B could be the transition
state. The free energy difference between these two corresponds to the activation free
energy of the enzyme-catalysed reaction. Warshel and co-workers calculated the
relative free energy of A and B by first using a molecular mechanics type potential.
Because an MM potential was used, molecular dynamics sampling was efficient, so
a large ensemble could be generated and a well-converged relative free energy calculated.
This relative free energy, $\Delta G_{\mathrm{MM}}(A \rightarrow B)$, can only be as good as the MM potential
used during the calculation. Ideally, this free energy should be calculated using a QM
or QM/MM representation of A and B, giving $\Delta G_{\mathrm{QM}}(A \rightarrow B)$. However, the
computational expense of the QM calculation prevents the efficient generation of
the large ensembles necessary to evaluate converged free energies. Warshel and co-workers solved this problem as follows: rather than calculating $\Delta G_{\mathrm{QM}}(A \rightarrow B)$ directly, they used the
Fig. 1 The free energy cycle93,94 used to calculate the QM/MM free energy difference between
systems A and B, $\Delta G_{\mathrm{QM/MM}}(A \rightarrow B)$. The free energy difference between A and B is first
estimated using an approximate potential (e.g. an MM potential), giving $\Delta G_{\mathrm{MM}}(A \rightarrow B)$. This
is then corrected to the QM/MM value by calculating the free energy necessary to perturb
system A from MM to QM/MM ($\Delta G_{\mathrm{MM \rightarrow QM/MM}}(A)$) and the free energy to perturb system B
from MM to QM/MM ($\Delta G_{\mathrm{MM \rightarrow QM/MM}}(B)$).
MM ensembles to calculate the difference in free energy between the QM and MM
representations of A and B. In essence, Warshel and co-workers calculated the free
energy error associated with using the MM forcefield. By calculating these errors,
Warshel and co-workers were able to correct $\Delta G_{\mathrm{MM}}(A \rightarrow B)$ so that it was formally
equal to $\Delta G_{\mathrm{QM}}(A \rightarrow B)$93,94 (see Fig. 1). The correction free energies were calculated
by generating ensembles for systems A and B using the MM model. The difference in
energy between the QM and MM models was calculated for a subset of each
ensemble, and the difference between these energies used as input to a single-step free
energy perturbation (FEP)95,96 between the MM model (the FEP reference state) and
the QM model (the FEP perturbed state). As long as the MM model is a good
approximation of the QM model, i.e. the phase space overlap of the two models is
good, then the average calculated via the FEP equation will converge to an accurate
estimate of the correction free energy. The key advantage of this method is that all of
the thermodynamic sampling is performed using only the MM model of the system.
QM or QM/MM calculations are run in parallel with the MM sampling to estimate
the correction free energies. The disadvantage of this method is the requirement of
good overlap between the QM and MM models. Warshel and co-workers mitigate
this disadvantage through the development of the empirical valence bond
(EVB)93,94,97 forcefield, which has been designed to give energies that are in good
agreement with experiment and QM calculations. In addition, the EVB potential has
been developed so that it can be used to study chemical reactions, something that is
not possible using most of the biological MM forcefields. The EVB potential and the
Warshel parallel QM/MM method have been very successful, and have been used to
study a variety of systems.98–102
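The correction step described above can be sketched with the standard single-step (Zwanzig) FEP estimator and the thermodynamic cycle of Fig. 1. This is a hedged illustration rather than the authors' code: the function names, the kcal/mol unit system and the default temperature are assumptions, and the input lists are taken to be QM and MM energies of the same configurations drawn from an MM-generated ensemble.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1


def fep_correction(e_qm, e_mm, temperature=298.15):
    """Single-step FEP free energy to perturb the MM model into the QM model.

    e_qm and e_mm hold the QM and MM energies of the same configurations,
    all sampled with the MM model (the FEP reference state).
    """
    beta = 1.0 / (K_B * temperature)
    diffs = [q - m for q, m in zip(e_qm, e_mm)]
    shift = min(diffs)  # factor out the smallest difference for stability
    average = sum(math.exp(-beta * (d - shift)) for d in diffs) / len(diffs)
    return shift - math.log(average) / beta


def corrected_free_energy(dg_mm_ab, correction_a, correction_b):
    """Close the cycle: go A(QM) -> A(MM) -> B(MM) -> B(QM)."""
    return dg_mm_ab + correction_b - correction_a
```

The exponential average is dominated by the configurations with the lowest QM minus MM energy difference, which is why poor phase space overlap between the two models makes the estimate converge slowly.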
Warshel and co-workers developed their method to avoid the problem of poor
sampling of a QM or QM/MM Hamiltonian. MD methods are currently limited to
picoseconds of QM/MM dynamics for typical biomolecular applications, even using
relatively low levels of QM theory. Monte Carlo (MC) methods suffer from even
greater problems. MC works by performing typically millions of small random
moves of the biomolecular system, each of which is tested according to the change
in energy associated with that move. MC sampling of a QM/MM Hamiltonian
would potentially require millions of QM energy evaluations, which is impractical
using current methods and computers. A second class of parallel QM/MM methods
attempts to solve this problem. These methods use a novel Monte Carlo algorithm
developed by Hastings in 1970.103 This is a multiscale sampling method that uses
MC sampling at one modelling level to generate an ensemble which is formally
correct for a different modelling level. The algorithm works by creating a new type of
Monte Carlo move. The move starts at configuration i. The energy of this configuration is evaluated using both the fast, low-level forcefield, giving $E_{\mathrm{fast}}(i)$, and the
slower, high-level forcefield, giving $E_{\mathrm{slow}}(i)$. A block of MC moves is then performed
using the fast forcefield. This results in a new configuration, j. The energy of this
configuration is evaluated using both forcefields, giving $E_{\mathrm{slow}}(j)$ and $E_{\mathrm{fast}}(j)$. These
Fig. 2 Application of the Metropolis-Hastings103 algorithm to accelerate sampling of a system
represented using a QM/MM Hamiltonian.104,105 The Monte Carlo move starts at configuration i. The energy of this configuration is evaluated using the target QM/MM Hamiltonian
(giving $E_{\mathrm{QM/MM}}(i)$) and an approximate (MM) Hamiltonian (giving $E_{\mathrm{MM}}(i)$). Standard
Metropolis Monte Carlo moves are then attempted from configuration i using only the
approximate MM Hamiltonian until, after a set number of moves, the system is in configuration j. The energy of configuration j is evaluated using both the QM/MM and MM
Hamiltonians (giving $E_{\mathrm{QM/MM}}(j)$ and $E_{\mathrm{MM}}(j)$). Configuration j is then accepted into the QM/MM
ensemble according to the probability $\min\{1, \exp(-\Delta\Delta E / k_{\mathrm{B}}T)\}$, where
$\Delta\Delta E = [E_{\mathrm{QM/MM}}(j) - E_{\mathrm{MM}}(j)] - [E_{\mathrm{QM/MM}}(i) - E_{\mathrm{MM}}(i)]$.
energies are used to test configuration j according to a new MC acceptance test.
Configuration j is accepted into the ensemble if this test is passed (see Fig. 2).
Otherwise the whole block of sampling is rejected and the simulation is reset to
configuration i. The form of the MC test is such that even though the
trial configurations are generated using the fast forcefield, they are accepted
into the ensemble of the slow forcefield with the correct Boltzmann probability.
This algorithm was popularised for applications to QM and QM/MM
systems by Schofield and co-workers,104,106 who coined the phrase ‘‘molecular
mechanics-based importance function’’ (MMBIF). One of the problems with this
algorithm is that the acceptance ratio of the MC test will be low if there is poor
overlap between the fast and slow forcefields. This is similar to the problem
encountered in Warshel’s correction free energy method. Effort may therefore need
to be spent optimising the fast forcefield such that it is a better match to the slow
forcefield.
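The acceptance test of Fig. 2 can be sketched as follows. This is a minimal illustration with invented names: `run_fast_block`, `slow_energy` and `fast_energy` are hypothetical callables standing in for a block of ordinary Metropolis MC moves on the fast forcefield and for the two energy functions, and `kT` is the thermal energy in the same units as the energies.

```python
import math
import random


def mmbif_acceptance_probability(e_slow_i, e_fast_i, e_slow_j, e_fast_j, kT):
    """Probability of accepting configuration j into the slow-model ensemble."""
    dde = (e_slow_j - e_fast_j) - (e_slow_i - e_fast_i)
    if dde <= 0.0:
        return 1.0
    return math.exp(-dde / kT)


def mmbif_move(config_i, run_fast_block, slow_energy, fast_energy, kT, rng=random):
    """One composite move: sample on the fast model, then test on the slow one.

    run_fast_block(config) must itself be a valid Metropolis MC block for the
    fast forcefield (hypothetical callable). On rejection the whole block is
    discarded and the simulation returns to configuration i.
    """
    config_j = run_fast_block(config_i)
    probability = mmbif_acceptance_probability(
        slow_energy(config_i), fast_energy(config_i),
        slow_energy(config_j), fast_energy(config_j), kT)
    return config_j if rng.random() < probability else config_i
```

If the fast and slow energies differ only by a constant, the double difference vanishes and every block is accepted. The acceptance ratio falls as the two energy surfaces diverge, which is the overlap problem noted above.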
We have developed a parallel QM/MM method105 that combines the advantages
of both the Warshel and MMBIF algorithms. The method works by using the
MMBIF method to generate QM/MM ensembles from which the Warshel correction free energies can be calculated in full. The correction free energy is calculated
using thermodynamic integration (TI)107,108 over a fictional $\lambda$ scaling parameter that
maps from the QM model to the MM model. This $\lambda$ parameter allows the QM model
to be changed over a series of windows into the MM model. The MMBIF algorithm
can then be used to generate ensembles of conformations of the system at different
values of $\lambda$, so that the gradient of the free energy with respect to $\lambda$ can be calculated
at several points between the QM and MM models. These gradients can then be
integrated across $\lambda$ to return the correction free energy. We overcome the problems
associated with potentially poor overlap between the QM and MM models by using
replica exchange8,9,109 moves across the $\lambda$ coordinate during the simulation. Replica
exchange moves are additional Monte Carlo moves that lightly couple multiple
trajectories together. All of the MC simulations at different $\lambda$ values are run in
parallel. Neighbouring pairs of simulations are tested periodically according to a
replica exchange MC acceptance test. If this test is passed, then the $\lambda$ values of the
neighbouring pairs are swapped. This has the effect of allowing each MC simulation
to sample multiple $\lambda$ values during the simulation. This enhances convergence of the
free energy averages. We have used this method105 to calculate converged relative
hydration free energies of water and methane, using an MP2 ab initio QM model of
water and methane, solvated by a periodic box of MM waters. While this was a nonbiological application, we are currently using this method to calculate QM/MM
relative binding free energies of protein-ligand systems (using a DFT QM model of
the ligand and an MM model of the protein and explicit solvent), and are planning to
use it to perform some computational enzymology calculations.
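The two numerical ingredients of this scheme, integrating the free energy gradients across λ and testing a swap between neighbouring λ values, can be illustrated with the following Python sketch. The text does not specify the quadrature rule or the exact form of the replica exchange test, so the trapezium rule and the standard Hamiltonian replica exchange Metropolis criterion are assumed here; all function names and the energy unit are likewise hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal mol^-1 K^-1 (unit choice is an assumption)


def integrate_gradients(lambdas, gradients):
    """Integrate dG/dlambda over lambda with the trapezium rule.

    lambdas must be sorted in increasing order; gradients[k] is the ensemble
    average of dE/dlambda sampled at lambdas[k].
    """
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * width * (gradients[k] + gradients[k + 1])
    return total


def swap_probability(e_a_xa, e_a_xb, e_b_xa, e_b_xb, temperature):
    """Acceptance probability for swapping neighbouring lambda values a and b.

    e_a_xb is the energy of the configuration currently held by replica b,
    evaluated with the Hamiltonian at lambda = a, and so on. The test is
    min{1, exp(-delta / kB T)}, where delta is the change in total energy
    of the pair of replicas caused by the swap.
    """
    delta = (e_a_xb + e_b_xa) - (e_a_xa + e_b_xb)
    if delta <= 0.0:
        return 1.0
    return math.exp(-delta / (K_B * temperature))


# Usage with made-up gradients at five lambda windows (illustrative numbers only).
windows = [0.0, 0.25, 0.5, 0.75, 1.0]
mean_gradients = [1.2, 0.9, 0.4, -0.1, -0.3]
correction_free_energy = integrate_gradients(windows, mean_gradients)
```

In practice more windows, or a higher-order quadrature, may be needed where the gradient varies rapidly with λ.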
Chem. Modell., 2008, 5, 13–50 | 25
This journal is © The Royal Society of Chemistry 2008
3. Interfacing atomistic with coarse grain models
In mid 2007, Leontiadou, Mark and Marrink110 produced a paper in which they
used atomistic molecular dynamics simulations to model the effects of ionic
concentration on the transport of ionic species across a pore in a lipid membrane.
This work involved several large simulations of systems containing 128 dipalmitoylphosphatidylcholine (DPPC) lipids (each containing 130 atoms) and about 6000 water
molecules. This work pushed the limits of what is achievable with current atomistic
molecular dynamics. By using approximations such as a reaction field treatment of
long-range electrostatics, and bond constraints that permitted a 5 fs
integration timestep, they were able to run several simulations of
between 50 ns and 100 ns in length. Despite the impressive size of these simulations,
they are still limited to biologically small length and time scales. 128 lipids form merely
an 8 × 8 membrane bilayer (64 lipids per leaflet), which is too small to model effects such as membrane
curvature or membrane waves.111,112 Similarly, 100 ns is too short a time to capture events
such as membrane protein aggregation or lipid raft formation within a membrane.111
Coarse grain models provide a route to longer time and length scales in biomolecular
simulations. Coarse grain models are a class of mesoscale model that work by
grouping several atoms together and modelling them as a single interaction site. In
effect, groups of atoms are smeared together into beads. For example, a coarse grain
model could be constructed that represents an amino acid residue as a single bead,
and a protein as a string of beads. The use of coarse grain models reduces the
computational expense of a simulation, as coarse graining reduces the number of
interaction sites. In addition, CG models contain fewer degrees of freedom, and use
forcefields that lead to smoother potential energy surfaces. The smoother potential
energy surface reduces the problems associated with frustration or non-ergodic
trapping, thereby leading to improved sampling and a lower correlation time. Coarse
graining also tends to remove the stiffest degrees of freedom from the model (e.g.
the C–H bond vibrational modes), thereby allowing a CG model to use a larger
timestep. All of these effects mean that CG simulations provide a route to modelling
length and time scales that are far beyond those that are practically achievable via
atomistic molecular dynamics. Coarse grain modelling is currently undergoing a
renaissance, and there is now significant international effort being spent developing
and applying coarse grain methods to model biological systems. It is not the purpose
of this review to cover all of these recent developments in depth, so the interested
reader is directed to several excellent modern reviews of the development and
application of CG methods.113–115
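The idea of grouping several atoms into a single interaction site can be made concrete with a short Python sketch. It assumes, purely for illustration, that each bead is placed at the centre of mass of the atoms it represents; other mappings (for example, placing the bead on a chosen atom) are equally possible, and the function names and data layout are hypothetical.

```python
def centre_of_mass(coords, masses):
    """Return the mass-weighted mean position of a group of atoms."""
    total_mass = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    )


def coarse_grain(coords, masses, bead_map):
    """Map atomistic coordinates onto coarse grain beads.

    coords   : list of (x, y, z) tuples, one per atom
    masses   : list of atomic masses, one per atom
    bead_map : list of lists; bead_map[b] holds the indices of the atoms
               that are grouped into bead b (an assumed, user-chosen mapping)

    Returns a list of (position, mass) pairs, one per bead.
    """
    beads = []
    for atom_indices in bead_map:
        group_coords = [coords[i] for i in atom_indices]
        group_masses = [masses[i] for i in atom_indices]
        beads.append((centre_of_mass(group_coords, group_masses), sum(group_masses)))
    return beads


# Usage: four atoms mapped onto two beads (toy coordinates and masses).
atom_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 4.0, 0.0)]
atom_masses = [1.0, 3.0, 1.0, 1.0]
cg_beads = coarse_grain(atom_coords, atom_masses, [[0, 1], [2, 3]])
```

Here four interaction sites are reduced to two, which illustrates how the number of pairwise interactions, and hence the cost of each energy evaluation, falls as atoms are merged into beads.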
Coarse grain models allow simulators to routinely access length and time scales
that are not practically possible using atomistic modelling methods. However, in
smearing out the atomistic detail, CG models run the risk of missing
important atomistic effects, much in the same way that molecular mechanics models,
in smearing out all of the electronic detail, can fail to model important electronic
effects such as polarisation. There is now significant interest in interfacing atomistic
and coarse grain models within a multiscale framework, so that this problem may be
overcome. Just as there is significant variation in the type and strength of interaction
in the different methods developed to interface QM and MM models, so too is there
significant variation in the type and strength of interaction of the different methods
of interfacing atomistic and CG models. The types of interface fall broadly into four
categories, which are similar in nature and definition to those that have been
developed for QM/MM models:
1. One-way, bottom-up methods. These involve a single transfer of information
from atomistic simulations or calculations to the CG simulation, e.g. by using an
atomistic simulation to parameterize a CG model.
2. One-way, top-down methods. These involve a transfer of information from the
CG simulation to the atomistic simulation, e.g. using a CG model to enhance