In Silico Drug Design of Biofilm Inhibitors of Staphylococcus epidermidis

by Al-mulla, Aymen Faraoun, MS

III. Molecular Docking
Docking is an automated computational procedure that predicts how a
compound will bind in the active site of a protein. This includes
determining the orientation of the compound, its conformational geometry,
and a score (Figure 2.3). The score may be a binding energy, a free
energy, or a qualitative numerical measure. In one way or another, every docking
algorithm places the compound in many different
orientations and conformations in the active site and then computes a score
for each. Some programs store the data for all of the tested orientations, but
most keep only a number of those with the best scores (Young, 2009). In
general, there are two key components of molecular docking (Leach and
Gillet, 2003), as follows:
a. Accurate pose prediction or binding conformation of the ligand inside
the binding site of the target protein.
b. Accurate binding free energy prediction, which later is used to rank the
order of the docking poses.
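
The generate-score-keep cycle described above can be illustrated with a minimal Python sketch. It is not taken from any docking program: the pose representation (a 3D position plus one torsion angle), the function names and the toy score are all assumptions made for illustration, with lower scores treated as better.

```python
import heapq
import random

def toy_score(pose):
    """Hypothetical score (lower is better); best at the origin with torsion 180."""
    x, y, z, torsion = pose
    return x * x + y * y + z * z + ((torsion - 180.0) / 90.0) ** 2

def random_poses(n_trials, rng):
    """Yield random poses: a 3D position plus one torsion angle in degrees."""
    for _ in range(n_trials):
        yield (rng.uniform(-5, 5), rng.uniform(-5, 5),
               rng.uniform(-5, 5), rng.uniform(0, 360))

def dock(score_fn, n_trials=2000, keep=5, seed=0):
    """Score every trial pose but retain only the `keep` best (score, pose) pairs."""
    rng = random.Random(seed)
    scored = ((score_fn(pose), pose) for pose in random_poses(n_trials, rng))
    # nsmallest consumes the generator lazily, so only `keep` poses are stored.
    return heapq.nsmallest(keep, scored)

best_poses = dock(toy_score)
for score, pose in best_poses:
    print(round(score, 3), [round(v, 2) for v in pose])
```

Keeping only the best few poses, rather than every tested orientation, corresponds to the behaviour attributed above to most programs.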

Figure 2.3 Example of molecular docking between chemical compound and a protein

The docking algorithm usually carries out the first part of the docking
(predicting the binding conformation), and the scoring function associated with
the docking program carries out the second part, the binding free energy
calculation. The key components of molecular docking are described further
below.
Pose prediction: Docking algorithms usually perform pose predictions,
which aim to identify the molecular features that are responsible for molecular
recognition. Pose predictions are very complex and often difficult to
interpret when simulated on a computer (Kitchen et al., 2004).
Activity prediction: After pose prediction by the docking algorithm, the
next step in the docking process is activity prediction, which is also
termed scoring. The docking score is computed by the scoring functions
associated with the particular docking software. Scoring functions are
designed to estimate biological activity from the interactions
between the compound and the protein target. In the early days of
docking, scoring was based on simple shape
and electrostatic complementarity. Currently, however, the docked
conformers are often treated with more sophisticated scoring methods that
include van der Waals interactions, electrostatic interactions, solvation
effects and entropic effects (Gohlke and Klebe, 2002; Young, 2009).
Docking Algorithms
Depending on the flexibility of protein and ligand, docking algorithms can
be divided into 3 types (Kasam, 2009), as follows:

- Semi-flexible docking: the protein is fixed and the ligand is flexible

Based on the principle of conformation generation, the search methods are
categorized into

The search algorithm positions molecules in various locations,
orientations, and conformations within the active site. Some of the earliest
docking programs positioned a molecule in the active site while holding it rigid
with respect to conformational changes, but all modern docking algorithms
include ligand conformational changes. The choice of search algorithm
determines how thoroughly the program checks possible molecule
positions and how long it takes to run. The search algorithm does not by itself
determine whether the docking program gives accurate results; the
scoring function is responsible for determining whether the orientations
chosen by the search algorithm are the most energetically favorable, and for
computing the binding energy. Nevertheless, a search algorithm
that does not sample the space thoroughly will give inaccurate results if the
correct orientation is never sampled. Most search algorithms will
sample the space adequately if they are given the correct input parameters.
Many search algorithms have been developed, differing in the principle
of conformation generation. One of the earliest was the Monte
Carlo search algorithm, which is built around a random number generator.
In the simplest implementation, position, orientation, and conformation are
all chosen at random. Sometimes, position and conformation are sampled
independently: a position is chosen and many conformations are
tested in that position; then a new position is chosen, and the process
repeats. Another important algorithm is the tabu search algorithm. Most
tabu searches are implemented as a modified version of the Monte Carlo
search. Like the Monte Carlo search, the tabu search chooses orientations
and conformations randomly. However, the Monte Carlo algorithm keeps
no record of which positions have already been sampled, and thus
sometimes recomputes positions that have already been
computed. The tabu algorithm keeps track of which positions have already
been sampled and avoids sampling them again. Thus, it can give
the same results with fewer iterations by eliminating duplicated
work. Genetic algorithms can sample a space thoroughly if the parameters
are chosen wisely, and can run very quickly. Many docking algorithms
were originally developed to simulate ligand binding in a crevice in the
surface of the folded protein. Some programs have difficulty docking
compounds in an active site that is completely enclosed. This can happen
when the protein folds down over the active site, or when the entire active site
opens and closes via a clamshell movement of two large sections of the
protein. When this occurs, additional inputs are needed to allow the
docking program to function correctly with an encapsulated active site
(Young, 2009).
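
The difference between the Monte Carlo and tabu searches described above can be sketched in a few lines of Python. This follows the simplified description in the text, not the implementation of any specific docking program. The discretised pose space (a grid position plus one of eight rotation states), the function names and the toy score are assumptions for illustration only.

```python
import random

def score(pose):
    """Hypothetical stand-in for a scoring function; lower is better."""
    x, y, rot = pose
    return (x - 3) ** 2 + (y + 2) ** 2 + (rot - 4) ** 2

def random_pose(rng):
    """Discretised pose: grid position (x, y) and one of 8 rotation states."""
    return (rng.randint(-5, 5), rng.randint(-5, 5), rng.randrange(8))

def monte_carlo(n_iter, seed=0):
    """Plain random sampling: a pose seen before is simply scored again."""
    rng = random.Random(seed)
    best, evaluations = None, 0
    for _ in range(n_iter):
        pose = random_pose(rng)
        s = score(pose)
        evaluations += 1
        if best is None or s < best[0]:
            best = (s, pose)
    return best, evaluations

def tabu(n_iter, seed=0):
    """Same random sampling, but already-visited poses are skipped."""
    rng = random.Random(seed)
    visited = set()
    best, evaluations = None, 0
    for _ in range(n_iter):
        pose = random_pose(rng)
        if pose in visited:
            continue  # tabu: do not rescore a pose that was already sampled
        visited.add(pose)
        s = score(pose)
        evaluations += 1
        if best is None or s < best[0]:
            best = (s, pose)
    return best, evaluations

mc_best, mc_evals = monte_carlo(2000)
tabu_best, tabu_evals = tabu(2000)
print(mc_best, mc_evals)
print(tabu_best, tabu_evals)
```

With the same random seed both searches visit the same sequence of poses and therefore find the same best pose, but the tabu variant calls the scoring function fewer times. In a real docking run, scoring is the expensive step, so this is where the saving arises.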
Scoring functions
One of the two important components of molecular docking is scoring.
While docking aims at reproducing a binding conformation close to the X-ray
crystal structure, the aim of scoring is to quantify the free energy
associated with the formation of the protein-ligand
complex. Most docking programs are equipped with scoring
functions, which compute the free energy associated with protein-ligand
interactions (the docking score). The docking score is used to rank the
chemical compounds in a virtual screening campaign. A wide range of
scoring functions is available to estimate the binding between the protein
and a virtual ligand. These methods range from estimating binding by
simple shape and electrostatic complementarity to estimating the free
energy of the protein-ligand complex in aqueous solution. Only a few of
them are capable of addressing the thermodynamics of the
binding process. However, methods based on thermodynamic parameters
require extensive simulation, and consequently significant CPU
time. These methods are therefore restricted to small sets of compounds,
which makes them impractical for large-scale virtual screening
experiments. Currently, three main types of scoring functions are applied:
force field-based, empirical, and knowledge-based
scoring functions (Moitessier et al., 2008).
Force field-based scoring functions: This type relies on molecular
mechanics methods. Force field-based methods calculate both the protein-ligand
interaction energy and the ligand internal energy and then sum the two
energies. The total energy based on a force field can be written as

Etotal = Ecovalent + Enoncovalent

where the components of the covalent and
noncovalent contributions are given by the following summations:

Ecovalent = Ebond + Eangle + Edihedral
Enoncovalent = Eelectrostatic + Evan der Waals

Ebond represents the potential energy of covalent bonds.
Eangle represents the potential energy of bond angles.
Edihedral represents the potential energy of torsion of bonded atoms.
Eelectrostatic represents the potential energy of electrostatic forces.
Evan der Waals represents the potential energy of van der Waals forces.
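
As a minimal sketch of how the two noncovalent terms might be evaluated between a ligand and a protein, the Python example below assumes a 12-6 Lennard-Jones form for the van der Waals term and a Coulomb term with a distance-dependent dielectric, ε(r) = r. The atom tuples, parameter values and Lorentz-Berthelot combining rules are illustrative assumptions, not the parameters of any particular force field.

```python
import math

# Conversion factor so that charges in e and distances in Å give kcal/mol.
COULOMB_CONSTANT = 332.06

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: well depth epsilon, zero crossing at r = sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q1, q2, dielectric_scale=1.0):
    """Coulomb energy with a distance-dependent dielectric eps(r) = scale * r."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric_scale * r * r)

def nonbonded_energy(ligand_atoms, protein_atoms):
    """Sum van der Waals and electrostatic terms over all ligand-protein atom pairs.

    Each atom is a tuple (position, charge, epsilon, sigma); this layout is
    an assumption made for the sketch.
    """
    total = 0.0
    for pos_l, q_l, eps_l, sig_l in ligand_atoms:
        for pos_p, q_p, eps_p, sig_p in protein_atoms:
            r = math.dist(pos_l, pos_p)
            epsilon = math.sqrt(eps_l * eps_p)  # Lorentz-Berthelot combining rules
            sigma = 0.5 * (sig_l + sig_p)
            total += lennard_jones(r, epsilon, sigma) + coulomb(r, q_l, q_p)
    return total

ligand = [((0.0, 0.0, 0.0), -0.4, 0.15, 3.2)]
protein = [((3.5, 0.0, 0.0), 0.3, 0.10, 3.4), ((0.0, 4.0, 0.0), -0.2, 0.20, 3.0)]
print(round(nonbonded_energy(ligand, protein), 3))
```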

Different force field-based scoring functions are based on different force field
parameter sets. For example, AutoDock relies on the Amber force field and
G-Score relies on the Tripos force field (Moitessier et al., 2008). The van der
Waals and electrostatic energy terms describe both the internal energy of
the ligand and the interactions between the protein and the ligand. The van der
Waals energy term is described by the Lennard-Jones potential.
Electrostatic terms are described by the Coulombic formula with a distance-dependent
dielectric constant for charge separation. Advantages of force
field-based scoring functions include accounting for solvent, and
disadvantages include overestimation of binding affinity (Moitessier et
al., 2008) and the arbitrary choice of non-bonded cutoff terms (Kitchen et al.,
Knowledge-based scoring functions: These use atom pair interaction
potentials, as in the potential of mean force (PMF). Atom pair interaction
potentials are usually derived from the structural information on
protein-ligand complexes stored in databases (the Cambridge Structural
Database and the Protein Data Bank). The approach relies on the assumption
that close intermolecular interactions between certain types of
functional groups or atom types that occur more often than would be
expected by chance are energetically favorable and therefore contribute to the
binding affinity. The robust nature of this type of scoring function makes it usable
in virtual screening. Because knowledge-based scoring functions rely on existing
structural databases, one major limitation of this method
is the limited availability of such structural information.
D-score (Gohlke et al., 2000) and the PMF score (Muegge and Martin, 1999)
are knowledge-based scoring functions.
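
The idea of turning observed contact frequencies into a pair potential is commonly expressed through an inverse Boltzmann relation, w(r) = -kT ln[p_obs(r) / p_ref(r)]. The Python sketch below assumes that form. The histogram counts, bin width and function names are invented for illustration and do not come from any structural database or published scoring function.

```python
import math

KT = 0.593  # approximate k_B*T in kcal/mol near 298 K

def pair_potential(observed, reference, kt=KT):
    """Inverse Boltzmann potential per distance bin for one atom-type pair.

    `observed` and `reference` are histogram counts over the same distance bins.
    """
    n_obs, n_ref = sum(observed), sum(reference)
    potential = []
    for obs, ref in zip(observed, reference):
        if obs == 0 or ref == 0:
            potential.append(0.0)  # no statistics: treated as neutral (a simplification)
        else:
            potential.append(-kt * math.log((obs / n_obs) / (ref / n_ref)))
    return potential

def knowledge_score(distances, potential, bin_width=1.0):
    """Sum the potential over the pair distances found in a docked pose."""
    total = 0.0
    for r in distances:
        index = int(r // bin_width)
        if index < len(potential):  # distances beyond the last bin contribute nothing
            total += potential[index]
    return total

# Hypothetical counts: contacts at 1-2 Å are over-represented versus the reference.
potential = pair_potential([10, 40, 25, 25], [25, 25, 25, 25])
print([round(w, 3) for w in potential])
print(round(knowledge_score([1.2, 1.5, 3.7], potential), 3))
```

Bins that are observed more often than in the reference state receive a negative (favorable) energy, which is the assumption stated in the paragraph above.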
Empirical scoring functions: An empirical scoring function derives the score
by summing the individual energy contributions of each component
involved in the intermolecular interactions, as shown in the equation

ΔGbind = ΔGdesolvation + ΔGmotion + ΔGconfiguration + ΔGinteraction

ΔGdesolvation: enthalpic penalty for removing the ligand from solvent.
ΔGmotion: entropic penalty for reducing the degrees of freedom when a
ligand binds to its receptor.
ΔGconfiguration: conformational strain energy required to put the ligand in
its "active" conformation.
ΔGinteraction: enthalpic gain for "resolvating" the ligand with its receptor
(Böhm, 1994).
Empirical scoring functions are easier to apply and are subject to less
computational error. For example, Kuntz in his early work emphasized
molecular shape, because shape complementarity is essential
for a ligand to be placed in the binding site and can be computed easily and
accurately. In his later work, however, he added chemical information,
molecular mechanical energies, and empirical hydrophobicities to make
the scoring function more accurate (Kuntz et al., 1982; Kuntz, 1992).
Böhm developed another empirical scoring function that takes into account
hydrogen bonding, ionic interactions, lipophilic contact surface and the
number of rotatable bonds (Böhm, 1994; Böhm et al., 2000). Due to their
robust nature, empirical scoring functions are widely used in virtual
screening experiments along with knowledge-based scoring functions.
One of the major limitations of empirical scoring functions is that they
work very well with rigid ligands but give unsatisfactory results with
flexible ligands. This is because most empirical scoring functions
ignore the internal energy of the ligand. Scoring functions such as ChemScore
(docking tool) (Eldridge et al., 1997) and Ludi (de novo design tool)
(Böhm, 1994) are empirical scoring functions.
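
An empirical scoring function of the kind described above is essentially a weighted sum of countable interaction terms. The Python sketch below uses the four term types mentioned in the text (hydrogen bonds, ionic interactions, lipophilic contact surface and rotatable bonds). The weights are hypothetical placeholders, not the published coefficients of any scoring function; in practice such weights are fitted by regression against experimental binding data.

```python
from dataclasses import dataclass

@dataclass
class ComplexFeatures:
    """Countable features of one protein-ligand pose (illustrative layout)."""
    n_hbonds: int
    n_ionic: int
    lipophilic_area: float  # contact surface in Å^2
    n_rotatable: int

# Hypothetical weights in arbitrary energy units; negative terms favor binding.
WEIGHTS = {
    "constant": 5.0,
    "hbond": -4.0,
    "ionic": -8.0,
    "lipophilic": -0.2,
    "rotor": 1.5,  # penalty per rotatable bond frozen on binding
}

def empirical_score(features, weights=WEIGHTS):
    """Weighted sum of interaction terms; lower means stronger predicted binding."""
    return (weights["constant"]
            + weights["hbond"] * features.n_hbonds
            + weights["ionic"] * features.n_ionic
            + weights["lipophilic"] * features.lipophilic_area
            + weights["rotor"] * features.n_rotatable)

rigid_ligand = ComplexFeatures(n_hbonds=3, n_ionic=1, lipophilic_area=100.0, n_rotatable=0)
flexible_ligand = ComplexFeatures(n_hbonds=3, n_ionic=1, lipophilic_area=100.0, n_rotatable=4)
print(empirical_score(rigid_ligand), empirical_score(flexible_ligand))
```

In this sketch the only ligand-flexibility term is the rotatable-bond count. Nothing accounts for the internal (strain) energy of the ligand, which is the limitation noted above for flexible ligands.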

B- Ligand Based Drug Design
In the past century, the target proteins of many drugs were unknown. This is
still fairly common, although it is slowly becoming less so as the body of
knowledge about biological systems expands. The chance of a successful design is
greater if the target is known and a structure-based drug design process can
be followed. However, there are times when there is good reason for
carrying out drug design without a known target. For example, cell surface
receptors make excellent drug targets but are very difficult to crystallize.
If homology modelling is also unreliable, or only a low identity score for the
homolog protein is observed, the techniques used for
structure-based drug design cannot be applied. Pharmacophore models and
3D-QSAR models can be used instead. A 3D-QSAR is a computational
procedure used for quantitatively predicting the interaction between a
molecule and the active site of a specific target. The great advantage of a
3D-QSAR is that it is not necessary to know what the active site looks like.
Thus, it is possible to use this technique when the target is unknown. A 3D-QSAR
is a mathematical attempt to define the properties of the active site
without knowing its structure. This is done by computing the electrostatic
and steric interactions that an imaginary probe atom would have if it were
placed at various positions on a grid surrounding a known active
compound. In some cases, other interactions, such as hydrogen bonding,
are also included. After doing this for multiple active compounds, a
partial least squares algorithm can be used to determine what spatial
arrangement of features in an active site could interact with
the known active molecules (Young, 2009).
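
The grid step of a 3D-QSAR can be sketched as follows. The Python example places a probe atom at each point of a regular grid around one molecule and records a steric (Lennard-Jones) and an electrostatic (Coulomb) energy at each point. The atom layout, parameter values, energy cutoff and function names are assumptions for illustration. The later partial least squares step, which would be run on such descriptor vectors from many aligned active compounds, is not shown.

```python
import itertools
import math

COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2)

def grid_points(lo, hi, step):
    """Regular cubic grid from lo to hi (inclusive) along each axis."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return list(itertools.product(axis, axis, axis))

def probe_fields(atoms, points, probe_charge=1.0, cutoff=30.0):
    """Steric and electrostatic probe energies at every grid point.

    Each atom is (position, charge, epsilon, sigma); energies are truncated
    at +/- cutoff so that points inside the molecule do not dominate.
    """
    steric, electrostatic = [], []
    for point in points:
        e_steric = e_elec = 0.0
        for pos, charge, epsilon, sigma in atoms:
            r = max(math.dist(point, pos), 0.5)  # avoid the singularity at r = 0
            sr6 = (sigma / r) ** 6
            e_steric += 4.0 * epsilon * (sr6 * sr6 - sr6)
            e_elec += COULOMB_CONSTANT * probe_charge * charge / r
        steric.append(min(e_steric, cutoff))
        electrostatic.append(max(-cutoff, min(e_elec, cutoff)))
    return steric, electrostatic

# A hypothetical two-atom "molecule" centred near the origin.
molecule = [((0.0, 0.0, 0.0), -0.5, 0.10, 3.0), ((1.5, 0.0, 0.0), 0.3, 0.10, 3.0)]
points = grid_points(-4.0, 4.0, 2.0)
steric, electrostatic = probe_fields(molecule, points)
descriptor = steric + electrostatic  # one row of the matrix that PLS would analyse
print(len(points), len(descriptor))
```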

2.2 Bacterial Studies
2.2.1 Staphylococcus epidermidis
Staphylococci are Gram-positive cocci with a diameter of
0.5-1.5 μm, arranged in pairs, tetrads and small clusters. They
usually produce the enzyme catalase and are non-spore-forming. Under
anaerobic conditions, almost all staphylococci produce acid from glucose,
lowering the pH of the surrounding environment. The ability to clot plasma
separates them into coagulase-positive and coagulase-negative
staphylococci (Holt et al., 1994). Staphylococcus epidermidis is the
staphylococcal species most commonly isolated from human sources. It can be
distinguished from S. aureus by its inability to produce coagulase. For a
long time it was considered a non-pathogenic organism, but it is now
recognized as the most important pathogen in foreign body device
infections. It rarely causes infection in the healthy host; compromised
hosts and patients with foreign devices or implants are more likely to be
the target of S. epidermidis infections. Some strains can produce a slime-like
material which enhances their adherence to and accumulation on the
smooth surfaces of metal devices (Vuong and Otto, 2002).
S. epidermidis colonies are smooth, raised, glistening, circular and
translucent or nearly opaque. Single colonies may reach 2.6-6 mm in
diameter on non-selective media. With time and elevated temperature
(above 35°C) or crowding, colonies develop depressed dark centres and
become stickier in consistency. The colonies of slime-producing
strains become very sticky. Most strains produce grey to greyish-white
colonies. Other, rarer strains may produce colonies that are yellowish,
brownish or violet in colour (Balows et al., 1992).

2.2.2 Clinical importance of Staphylococcus epidermidis
For the following reasons, S. epidermidis is considered a commensal
opportunistic pathogen and has attracted substantial interest in recent years
(Shahrooei, 2010):
- S. epidermidis is the major pathogen among coagulase-negative
staphylococci; it comprises 65-90% of all Staphylococcus spp. recovered
from human sources (Huebner and Goldmann, 1999).
- It is estimated that approximately 250,000 cases of intravascular
catheter-related bloodstream infections (CRBSI) occur yearly in the United States
alone (Rogers et al., 2009), and S. epidermidis is responsible for most of
these infections (Cherifi et al., 2013).
- Prosthetic valve endocarditis (PVE) and native valve endocarditis (NVE)
are caused by coagulase-negative staphylococci in 15-50% and 5-8% of
cases, respectively. S. epidermidis is the most prominent cause of both PVE
and NVE (Huebner and Goldmann, 1999; Lalani et al., 2006; Chu et al.,
2008; Rogers et al., 2009).
- Coagulase-negative staphylococci are the predominant pathogens, causing
48% to 67% of cerebrospinal fluid shunt infections; S. epidermidis is the
most common bacterial species isolated from these infections (Roos,
- Among staphylococcal urinary tract infections (UTIs), S. epidermidis was
found to be a more frequent cause than S. aureus (Fadhel et al., 2013).
- S. epidermidis and S. aureus are ranked second as causes of surgical site
infections (Rogers et al., 2009; Wilson et al., 1988).
- Coagulase-negative staphylococci are currently responsible for more than
50% of all nosocomial infections in neonatal intensive care units. S.
