doi:10.1016/S0022-2836(02)00975-0, available online at http://www.idealibrary.com
J. Mol. Biol. (2002) 323, 523–532
Conserved Positions for Ribose Recognition:
Importance of Water Bridging Interactions Among
ATP, ADP and FAD-protein Complexes
Mariana Babor*, Vladimir Sobolev and Marvin Edelman
Department of Plant Sciences
Weizmann Institute of Science
Rehovot 76100, Israel
Analysis of the spatial arrangement of protein and water atoms that form polar interactions with ribose has been performed for a structurally non-redundant dataset of ATP, ADP and FAD-protein complexes. The 26 ligand–protein structures were separated into two groups corresponding to the most populated furanose ring conformations (N and S-domains). Four conserved positions were found for S-domain protein–ligand complexes and five for N-domain complexes. Multiple protein folds and secondary structural elements were represented at a single conserved position. The following novel points were revealed: (i) two complementary positions sometimes combine to describe a putative atomic spatial location for a specific conserved binding spot; (ii) more than one third of the interactions scored were water-mediated. Thus, conserved spatial positions rich in water atoms are a significant feature of ribose–protein complexes.
© 2002 Elsevier Science Ltd. All rights reserved
*Corresponding author
Keywords: protein–ligand interactions; molecular recognition; hydrogen bonding; consensus structure; nucleotides
Introduction
Ribose is an essential component of many
nucleotides that constitute a broad family of
ligands. Mononucleotides, such as ATP and ADP,
are involved in energy exchange processes in biological systems, whereas FAD, among other
dinucleotides, acts as an ancillary electron carrier
for a wide variety of enzymes involved in
metabolism. Knowledge of the principles that
govern ribose recognition by proteins is sketchy
because tertiary and even secondary structural
elements involved in nucleotide recognition are
very diverse among proteins having different
folds.1 Furthermore, although some sequence
motifs have been identified within ligand binding
folds, in general they are not extendable to
unrelated structures.2,3 Therefore, identification of
common recognition patterns among different
folds may require an analysis of the spatial
arrangement of atoms that interact with a given
moiety. Indeed, this strategy enabled the identification of an adenine binding motif shared by
ATP-grasp and protein kinase folds.4,5 Subsequent studies showed that this motif is also present in many mononucleotide and dinucleotide binding folds.6,7 Similarly, analysis of the phosphate moiety present in mononucleotides allowed the identification of structural phosphate binding motifs among several superfamilies, even in cases where there was no sequence signature for binding phosphate.8 In both the adenine and phosphate binding motifs, primarily backbone atoms participate in nucleotide binding. This may explain why these motifs are independent of protein sequence.
Abbreviations used: PDB, Protein Data Bank.
E-mail address of the corresponding author: mariana.babor@weizmann.ac.il
Ribose, the sugar present in most nucleotides, contains a semi-flexible furanose ring whose conformation can be described in terms of the maximum torsion angle and the pseudorotation phase angle (P).9 The P parameter ranges from 0° to 360° and describes the puckering of the ring by considering its five endocyclic torsion angles. The five-membered ring is asymmetrically substituted in nucleotides. As a result, potential energy thresholds are created that limit pseudorotation and lead to two preferred ranges of conformations, centered at C3′-endo (P = −1° to 34°) and C2′-endo (P = 137° to 194°).9–12
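The two preferred ranges quoted above can be turned into a simple classifier that assigns a ring to the N or S-domain from its phase angle. The following is a minimal sketch. It assumes that the quoted C3′-endo (−1° to 34°) and C2′-endo (137° to 194°) ranges serve as hard cut-offs. The paper does not state the exact N/S boundaries it used to group the structures, and the function name `classify_pucker` is hypothetical.

```python
def classify_pucker(p_deg: float) -> str:
    """Assign a pseudorotation phase angle P (degrees) to a pucker domain.

    Hypothetical hard cut-offs taken from the preferred ranges quoted in the text:
    N-domain (C3'-endo): -1 to 34 degrees; S-domain (C2'-endo): 137 to 194 degrees.
    """
    p = p_deg % 360.0  # normalise to [0, 360), so -1 becomes 359
    # The N range wraps through 0/360: [-1, 34] == [359, 360) U [0, 34]
    if p >= 359.0 or p <= 34.0:
        return "N"
    if 137.0 <= p <= 194.0:
        return "S"
    return "other"  # outside both preferred ranges (cf. the grey points of Figure 1)


for p in (18.0, -1.0, 162.0, 90.0):
    print(p, classify_pucker(p))
```

Because P is periodic, the N range has to be tested as a wrap-around interval. Reducing P modulo 360 first lets a single comparison handle both the −1° and the 359° spellings of the same angle.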
Figure 1. Ribose P phase angles within ATP, ADP, and FAD-protein complexes. ATP, ADP, and FAD-protein structures of resolution 2.2 Å or better, with R-factors lower than 23% and sharing 30% sequence identity or less, were grouped according to their ribose ring pseudorotation (P) phase angle. In the vast majority of cases, the P values cluster in two specific domains (S and N) of the pseudorotational cycle. S-domain structures: 1djn, 1efv, 1qjd, 1h82, 3uag, 1el5, 1bg0, 1hpm, 1pbe, 3grs, 1qla, 1f8r, 1b4v, 1d7y, 1h7w, 1cjc, 1cqx, 1fl2, 1d4x, 1npx, 1fwk, 1ep3, 1bg2, 1fnn, 1hi5, 1ayl, 1fp6, 1hck, 1dak, 1nsf, 1qf9, 1d4a, 1fo4, 2mbr, 1f0x, and 1gpe; N-domain structures: 1gsa, 1rkd, 1a49, 1a8p, 1csn, 1vom, 1b0u, 1nsy, 1qnf, 1cs0, 1b8a, 1byq, 1atp, 1mjh, 1kdn, 1iov, 1fnd and 2hgs. P values of structures that do not belong to either of these domains are shown in light gray (1e19, 1php, 1f9a and 1ndh).
ATP binding proteins engage the adenine moiety by hydrophobic atomic contacts above and beneath the plane of the purine ring, and by hydrophilic contacts, including hydrogen bonds, in the plane of the rings.2,6,13 Structural and functional studies
show that the ribose moiety can also be essential
for ligand recognition. For instance, the affinity of an
NADP-dependent dehydrogenase is switched
from NADP to NAD by introduction of an acidic
residue that forms hydrogen bonds with the ribose
hydroxyls.14,15 Likewise, changing NAD coenzyme
activity to NADP requires the replacement of
this acidic amino acid.16 Detailed analysis of the
structural arrangement of atoms that interact with
the ribose moiety, however, has not been performed.
Here, we address this issue, analyzing ATP, ADP
and FAD-protein complexes. Specifically, protein
atoms and water molecules that form polar interactions with ribose hydroxyls were identified.
A non-redundant dataset of ribose-containing structures was created. The data files were superimposed, and atoms from different structures that interact with equivalent ribose hydroxyl groups were clustered. Conserved spatial positions for binding either the ribose 2′- or 3′-hydroxyl were identified that are shared by a large number of diverse folds. These conserved positions are independent of amino acid sequence as well as of secondary or tertiary structural elements. Moreover, some of them are occupied primarily by water oxygen atoms, indicating that water plays a key role in ribose–protein interactions.
Results
Grouping of structures according to ribose
ring conformation
Table 1. Structurally non-redundant data set for the ribose S-domain
PDB code | Res. (Å) | Protein | Ligand | CATH fold code | No. of members | Water mol. (a)
1hpm | 1.70 | Heat shock protein 70 kDa | ADP | 7.1.44 | 2 | 14
1bg2 | 1.80 | Kinesin motor domain | ADP | 3.40.850 | 1 | 13
1bg0 | 1.86 | Arginine kinase | ADP | 3.30.590 | 1 | 12
1dak | 1.60 | Dethiobiotin synthetase | ADP | 3.40.50 | 7 | 7
1fnn | 2.00 | Cell division control protein 6 | ADP | 8.1.59 | 1 | 11
3uag | 1.77 | D-Glutamate ligase | ADP | 3.90.190 | 1 | 11
1hi5 | 1.80 | Eosinophil-derived neurotoxin | ADP | 3.10.130 | 1 | 10
1ayl | 1.80 | PEP carboxykinase | ATP | 2.170.8 | 1 | 13
1hck | 1.90 | Cyclin-dependent kinase 2 | ATP | 1.10.510 | 1 | 4
1cjc | 1.70 | Adrenodoxin reductase | FAD | 6.1.67 | 2 | 20
3grs | 1.54 | Glutathione reductase | FAD | 3.50.50 | 11 | 19
1f0x | 1.90 | D-Lactate dehydrogenase | FAD | 7.1.247 | 1 | 7
1fo4 | 2.10 | Xanthine dehydrogenase | FAD | 8.1.15 | 1 | 16
2mbr | 1.80 | UDP-N-acetylmuramate dehydrogenase | FAD | 3.30.43, 3.90.78 (b) | 1 | 13
1cqx | 1.75 | Flavohemoprotein | FAD | 2.40.10 | 1 | 15
(a) Number of water molecules associated with the ligand.
(b) The ribose hydroxyls interact with several CATH domains.
Table 2. Structurally non-redundant data set for the ribose N-domain
PDB code | Res. (Å) | Protein | Ligand | CATH fold code | No. of members | Water mol. (a)
1cs0 | 2.00 | Carbamoyl phosphate synthetase | ADP | 3.30.470 | 4 | 7
1byq | 1.50 | Heat shock protein 90 kDa | ADP | 3.30.565 | 1 | 19
1rkd | 1.84 | Ribokinase | ADP | 3.90.77 | 1 | 7
1kdn | 2.00 | Nucleoside diphosphate kinase | ADP | 3.30.70 | 1 | 7
1vom | 1.90 | Myosin | ADP | 3.30.538 | 1 | 10
1csn | 2.00 | Casein kinase-1 | ATP | 1.10.510 | 2 | 6
1mjh | 1.70 | Protein Mj0577 | ATP | 3.40.50 | 3 | 7
1b8a | 1.90 | Aspartyl-tRNA synthetase | ATP | 3.40.690 | 1 | 13
1a49 | 2.10 | Pyruvate kinase | ATP | 3.20.20, 2.40.10 (b) | 1 | 10
1qnf | 1.80 | Photolyase | FAD | 1.10.399 | 1 | 8
1fnd | 1.70 | Ferredoxin-NADP+ reductase | FAD | 5.1.860 | 2 | 15
(a) Number of water molecules associated with the ligand.
(b) The ribose hydroxyls interact with several CATH domains.
The ribose furanose ring is flexible. We therefore sought populations with homogeneous ribose ring conformations for the analysis of binding sites. A database of 145 wild-type PDB structures of FAD, ATP and ADP-protein complexes with a resolution of 2.2 Å or better and an R-factor lower than 23% was extracted from the PDB. Complexes sharing more than 30% sequence identity were grouped, and the member with the highest resolution was retained, yielding 58 structures. Sorting these according to their ribose pseudorotation phase angle (P) placed 54 structures in two groups (Figure 1) corresponding to the known S and N-domains.9,10,12 The amplitude of pseudorotation (τmax) ranged from about 35° to 50°, with a few exceptions (22°, 25°, 26°, 32° and 55° in PDB entries 1fwk, 1e15, 1b0u, 1pbe and 1fp6, respectively). The S-domain was populated with 36 structures and the N-domain with 18. Four structures with ribose ring conformations outside the S and N-domains were not analyzed further. Structures with the same CATH fold classification17 were grouped, and the member with the highest resolution was chosen as the fold representative. Three structures (or their homologs) not classified by CATH were excluded from the list. The representative PDB files of the final CATH folds are listed in Table 1 for S-domain structures and in Table 2 for N-domain structures. This structurally non-redundant data set was used to identify conserved atomic positions forming hydrogen bonds with either the ribose 2′- or 3′-hydroxyl.
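The selection steps above can be sketched as a short pipeline: quality cut-offs, redundancy reduction at 30% sequence identity, and one representative per CATH fold. This is an illustrative sketch only. The `Entry` record, the function names and the pairwise `identity` callback are hypothetical, and sequence identities are assumed to be precomputed. The greedy redundancy pass (best-resolved structure first) is one common way to implement "group and keep the highest-resolution member", not necessarily the exact procedure used in the paper. Unclassified structures are simply dropped here, whereas the paper assigned some of them to folds by sequence identity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    pdb_id: str
    resolution: float          # in Angstrom
    r_factor: float            # as a fraction, e.g. 0.19 for 19%
    cath_fold: Optional[str]   # CATH topology code such as "3.40.50", or None


def quality_filter(entries: List[Entry], max_res: float = 2.2, max_r: float = 0.23) -> List[Entry]:
    """Keep structures at 2.2 A resolution or better with an R-factor below 23%."""
    return [e for e in entries if e.resolution <= max_res and e.r_factor < max_r]


def remove_redundancy(entries: List[Entry], identity: Callable[[str, str], float],
                      cutoff: float = 0.30) -> List[Entry]:
    """Greedy pass: best-resolved first; skip entries >30% identical to one already kept."""
    kept: List[Entry] = []
    for e in sorted(entries, key=lambda x: x.resolution):
        if all(identity(e.pdb_id, k.pdb_id) <= cutoff for k in kept):
            kept.append(e)
    return kept


def fold_representatives(entries: List[Entry]) -> Dict[str, Entry]:
    """Highest-resolution entry per CATH fold; unclassified entries are dropped here for simplicity."""
    best: Dict[str, Entry] = {}
    for e in entries:
        if e.cath_fold is None:
            continue
        current = best.get(e.cath_fold)
        if current is None or e.resolution < current.resolution:
            best[e.cath_fold] = e
    return best


# Toy example with invented identifiers and identities.
entries = [
    Entry("AAAA", 1.8, 0.20, "3.40.50"),
    Entry("BBBB", 1.6, 0.19, "3.40.50"),
    Entry("CCCC", 2.5, 0.18, "1.10.510"),   # fails the resolution cut-off
    Entry("DDDD", 1.9, 0.21, None),          # not classified in CATH
]
_pairwise = {frozenset({"AAAA", "BBBB"}): 0.25}


def ident(a: str, b: str) -> float:
    """Hypothetical precomputed pairwise sequence identity (fraction)."""
    return _pairwise.get(frozenset({a, b}), 0.0)


reps = fold_representatives(remove_redundancy(quality_filter(entries), ident))
print({fold: e.pdb_id for fold, e in reps.items()})
```

In the toy data, AAAA and BBBB survive the 30% redundancy step because their identity is 25%. They are then merged at the fold level, where the better-resolved BBBB becomes the representative of fold 3.40.50.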
Conserved positions of atoms that form
hydrogen bonds with ribose
All protein atoms in the S and N data sets that hydrogen bond with either the ribose 2′- or 3′-hydroxyl were determined. Water can mediate hydrogen bonds between ligand and protein; therefore, water molecules that form hydrogen bonds with the 2′- or 3′-hydroxyl and with at least one protein atom were included. The number of water molecules associated with the ligand in each structural complex is given in Tables 1 and 2. Approximately 60% of these water molecules are hydrogen bonded to both ligand and protein, and about a quarter of the latter involve the ribose moiety.
Table 3. S-domain clusters forming polar interactions with ribose hydroxyls
Ligand: PDB codes
ADP: 1hpm, 1bg2, 1bg0, 1dak, 1fnn, 3uag, 1hi5
ATP: 1ayl, 1hck
FAD: 1cjc, 3grs, 1f0x, 1fo4, 2mbr, 1cqx
Columns: 2′-hydroxyl clusters S1, S2 and S3; 3′-hydroxyl cluster S4
Atoms and water molecules in the clusters: Glu268_Oε2, His93_Nδ1, Thr311_O, Water, Gln40A_N, Met74A_O, Leu257A_N, Ile45_O, Pro232A_O, Lys271_Nζ, Water, Water, Glu211A_Oε2, Water, Lys39A_Nζ, Glu297_Oε2, Gln131_O, Water, Arg204A_Nη2, Water, Water, Asp317A_Oδ2, Water, Water, Glu38A_Oε2, Glu50_Oε2, Asp86_Oδ2, Water, Water, Water, Water, Water
Figure 2. Conserved positions of atoms that interact with ribose 2′- or 3′-hydroxyl groups. (a) S-domain: ribose binding sites (see Materials and Methods) of the S-domain structures shown in Table 3 were superimposed spatially, and atoms hydrogen-bonded to the ribose 2′- or 3′-hydroxyl were clustered together. Four conserved positions were identified among ATP, ADP and FAD-protein complexes. Three form hydrogen bonds with the ribose 2′-hydroxyl and correspond to clusters S1 (r = 1.5 Å; dark blue), S2 (r = 0.8 Å; light blue) and S3 (r = 1.5 Å; green). The fourth forms hydrogen bonds with the ribose 3′-hydroxyl and corresponds to cluster S4 (r = 1.3 Å; red). (b) N-domain: ribose binding sites of the N-domain structures shown in Table 4 were superimposed spatially, and atoms hydrogen-bonded to the ribose 2′- or 3′-hydroxyl were clustered together. Five conserved positions were observed among ATP and ADP-protein complexes. Three form hydrogen bonds with the ribose 2′-hydroxyl and correspond to clusters N1 (r = 1.1 Å; dark blue), N2 (r = 1.2 Å; light blue) and N3 (r = 1.1 Å; red). Two form hydrogen bonds with the ribose 3′-hydroxyl and correspond to clusters N4 (r = 0.6 Å; green) and N5 (r = 1.2 Å; yellow). Radius r is defined as the distance between the geometric center and the furthest atom.
Ribose binding sites were superimposed spatially, and atoms hydrogen-bonded to the ribose 2′- or 3′-hydroxyl were clustered using a 1.5 Å radius. Four clusters were identified for S-domain structures (Table 3). Three are composed of atoms that form hydrogen bonds with the ribose 2′-hydroxyl and one of atoms that bond the ribose 3′-hydroxyl (Figure 2(a)). In all cases, the geometric centers of the clusters agree with accepted hydrogen bond angle definitions.18,19 In cluster S1, the C–O–Cc angle (where Cc is the cluster center) is 120°; the corresponding angles for S2, S3, and S4 are 115°, 119°, and 141°, respectively. All the ATP, ADP and FAD-protein complexes are represented by a protein atom at least once. Ten different folds are represented in cluster S1, in nine cases by protein atoms that belong to different residues. These residues are located in diverse secondary structural elements. Moreover, in some cases these atoms belong to the backbone and in others to the side-chains. Nor is there a preference for acceptor or donor atoms within the cluster. Cluster S2 is mainly represented by ATP and ADP-protein complexes. Although this cluster has few members, when it is considered together with cluster S1, 14 out of 15 structures are represented (Table 3). This suggests that clusters S1 and S2 are complementary (Figure 3). Cluster S3 is compact: all atoms except one are closer than 0.7 Å to the geometric center of the cluster. All protein atoms in cluster S3 belong to side-chains of acidic or basic residues, located at the ends of β-strands or α-helices, respectively. Cluster S4 is rich in water oxygen atoms that mediate interactions between the ribose 3′-hydroxyl and the protein. The fact that these water atoms are detected in the crystal structures indicates that they form stable interactions and can thus be considered part of the ligand–protein binding site. Indeed, 44% of all the hydrogen bond interactions listed between the proteins and the ribose hydroxyls of the S-domain structures are water-mediated.
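The two geometric quantities used above are straightforward to compute from superimposed coordinates. The first is the cluster radius r, the distance from the geometric center to the furthest atom as defined for Figure 2. The second is the C–O–Cc angle at the hydroxyl oxygen. The following is a minimal sketch in plain Python, assuming coordinates in ångströms given as (x, y, z) tuples. The function names and the example coordinates are hypothetical, and the choice of which ribose carbon plays the role of C is left to the caller.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) positions."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def cluster_radius(points):
    """Radius r: distance from the geometric center to the furthest atom."""
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def angle_deg(a, b, c):
    """Angle a-b-c in degrees with b at the vertex (here C-O-Cc, O = hydroxyl oxygen)."""
    u = [a[k] - b[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise


# Hypothetical coordinates (Angstrom): ribose carbon, hydroxyl oxygen, two clustered atoms.
c_atom, o_atom = (-1.43, 0.0, 0.0), (0.0, 0.0, 0.0)
cluster = [(1.0, 2.0, 0.5), (1.0, 2.0, -0.5)]
cc = centroid(cluster)
print(cc, cluster_radius(cluster), f"{angle_deg(c_atom, o_atom, cc):.1f}")
```

The angle is measured between the O→C and O→Cc vectors. A value near 100° to 140°, like those reported for the S and N clusters, is what one expects for a hydrogen-bond partner of a hydroxyl oxygen.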
Five clusters were identified for N-domain structures (Table 4). Three correspond to atoms that form hydrogen bonds with the ribose 2′-hydroxyl and two to atoms that bond the ribose 3′-hydroxyl (Figure 2(b)). Here too, the geometric centers of the clusters agree with accepted hydrogen bond angle definitions.18,19 The C–O–Cc angle for N1, N2, N3, N4, and N5 is 124°, 123°, 115°, 108°, and 100°, respectively. All the ATP and ADP-protein complexes are represented by a protein atom at least once, except 1vom and 1byq, which are represented by water atoms alone in at least two clusters. FAD-protein folds, which were poorly represented in the N-domain group to begin with (Table 2), did not contribute any structures to the N-domain clusters (Table 4). In this regard, we note that the adenosine moiety in the ferredoxin reductases is considerably more mobile than the rest of the ligand and forms very few or no hydrogen bonds with protein atoms.20 Thus, more structures are required before N-domain FAD-protein complexes can be analyzed.
Atoms forming polar interactions from the nine N-domain ATP and ADP-protein complexes were amassed in several clusters or cluster pairs. Cluster N1 is represented mainly by protein atoms that belong either to the backbone or to side-chains, and no preference for donors or acceptors was observed. Considering clusters N1 and N2 (both of which form polar interactions with the ribose 2′-hydroxyl) as a complementary pair, all the ATP and ADP-protein complexes except one are represented (Figure 3). Cluster N3 is rich in water atoms, indicating again the important role that water plays in stabilizing ligand–protein complexes. Clusters N4 and N5 (which form polar interactions with the ribose 3′-hydroxyl) can also be considered a complementary pair, in which seven of the nine ADP and ATP-protein complexes are represented (Figure 3).
Figure 3. Complementary cluster pairs. S-domain cluster-pair S1 and S2 (dark gray) and N-domain cluster-pair N1 and N2 (dark gray) exhibit complementary positions of atoms that form hydrogen bonds with the ribose 2′-hydroxyl, while cluster-pair N4 and N5 (black) is composed of atoms that form hydrogen bonds with the ribose 3′-hydroxyl group. Clusters S4 and N3, which are rich in water-mediated interactions, are unpaired. A white area indicates that the entry is not represented in the cluster.
Conservation of cluster positions within a fold
Several CATH folds in the high-resolution, non-redundant data sets of Tables 1 and 2 have additional members besides the chosen representative. We analyzed whether the clusters identified using these representatives (see Tables 3 and 4) effectively represent the members of a fold as a whole.
S-domain
FAD-NAD(P) binding domain. CATH fold 3.50.50 contains 11 high-resolution members, as indicated in Table 1. The results in Table 5 show that all but one of the members of this fold have protein atoms that hydrogen bond with the ribose 2′-hydroxyl group. In cluster S3, all ten atoms are located within 0.9 Å of the cluster center, and all belong to a residue positioned at the end of a β-strand of the βαβ dinucleotide binding motif (DBM). In all cases except one, the contacting atom is part of an acidic amino acid, as in the Rossmann-fold fingerprint region.21–23 The only structure (1d7y) not represented in S3 has a water oxygen in cluster S2 that interacts with this conserved acidic residue. Six of the 11 FAD-NAD(P) binding domain structures also interact with the ribose 3′-hydroxyl (Table 5, cluster S4). Most of these interactions involve water atoms, and all of these water atoms interact with a backbone oxygen located three to four residues downstream of the residue in cluster S3.
Rossmann-like fold. CATH fold 6.1.67 contains two high-resolution members, as indicated in Table 1. The pattern observed in clusters S3 and S4 for the FAD-NAD(P) binding domain also holds for this fold, which is a Rossmann-related fold with a similar nucleotide binding mode.21,23–25
Actin fold. CATH fold 7.1.44 contains two high-resolution members, as indicated in Table 1. Their interactions with the S-domain clusters are highly similar (Table 5). In both, protein atoms in clusters S1 and S3 belong to residues located in α-helices; in the former they are consecutive in sequence, and in the latter consecutive in helix turns. Both structures are represented in cluster S4 by water atoms. However, in this fold, the water molecules interact with side-chain oxygen atoms of residues located far in sequence from those represented in cluster S3.
Table 4. N-domain clusters forming polar interactions with ribose hydroxyls
Ligand: PDB codes
ADP: 1cs0, 1byq, 1rkd, 1kdn, 1vom
ATP: 1csn, 1mjh, 1b8a, 1a49
Columns: 2′-hydroxyl clusters N1, N2 and N3; 3′-hydroxyl clusters N4 and N5
Atoms and water molecules in the clusters: Glu215A_Oε2, Gly241A_N, Water, Glu215A_Oε1, Water, Gln285A_Nε2, Water, His279_Nδ1, Gly228_O, Lys16A_Nζ, Asn119A_Nδ2, Water, Gly127A_N, Ile362A_O, Lys206A_Nζ, Water, Water, Pro11A_O, Water, Asp135_O, Glu361A_Oε2
Table 5. S-domain clusters that interact with ribose hydroxyl groups for each fold
Fold (CATH code): PDB codes
FAD-NAD(P) (3.50.50): 3grs, 1pbe, 1npx, 1f8r, 1qjd, 1b4v, 1qla, 1el5, 1gpe, 1h82, 1d7y
Rossmann-like (6.1.67): 1cjc, 1h7w
Actin (7.1.44): 1hpm, 1d4x
Mononucleotide (3.40.50): 1dak, 1fp6, 1qf9, 1nsf, 1efv, 1djn, 1d4a
Columns: 2′-hydroxyl clusters S1, S2 and S3; 3′-hydroxyl cluster S4
Atoms and water molecules in the clusters: Arg32_Nη2, Lys157A_Nζ, Water, Gln338A_Nε2, Gln83A_Nε2, Lys39A_Nζ, Water, Glu268_Oε2, Glu214A_Oε2, Tyr399A_OH, Water, Water, Glu50_Oε2, Glu32_Oε2, Glu32_Oε2, Glu63A_Oε2, Glu156A_Oε2, Glu40A_Oε1, Ser35A_Oγ2, Asp33A_Oδ2, Glu55A_Oε2, Glu35A_Oε2, Water, Water, Water, Glu38A_Oε2, Glu218A_Oε2, Lys271_Nζ, Lys213A_Nζ, Gly103A_O, Water, Glu41A_Oε2, Water, Water, Water, Water, Gln218A_Oε1, Water, Glu211A_Oε2, Water, Asn505_N, Asn300A_Oδ1, Asp419A_Oδ1
Classical mononucleotide binding fold. CATH fold 3.40.50 contains seven high-resolution members, as indicated in Table 1. The results in Table 5 show that six of the seven are represented in at least one of the clusters, with cluster S3 again being the most populated. The remaining one, quinone reductase (1d4a), the only member of this fold with an FAD ligand, forms no hydrogen bonds at all with the ribose. The poor hydrogen-bonding pattern with FAD is also observed for ferredoxin reductase (1fnd, see below).
N-domain
ATP-grasp fold. CATH fold 3.30.470 contains four high-resolution members, as indicated in Table 2. All structures of this fold are represented in at least one ribose 2′-hydroxyl cluster and one 3′-hydroxyl cluster (Table 6). Three structures are represented in clusters N2 and N4 by atoms belonging to, or interacting with, glutamic acid. Indeed, alignment of proteins sharing an ATP-grasp fold showed that the glutamic acid of cluster N4 is highly conserved.26 The remaining structure, glutathione synthase (1gsa), has the same fold and binding site, but its acidic residue, likewise highly conserved,6,26 interacts with the ribose 3′-hydroxyl and is represented in cluster N5.
Protein kinase fold. CATH fold 1.10.510 contains two high-resolution members, as indicated in Table 2. Both structures of this fold are represented in clusters N2 and N4 (Table 6). As previously described for the protein kinase fold,6 the ribose 2′-hydroxyl sometimes contacts a glutamic acid side-chain and sometimes interacts with the serine backbone through a water molecule. Here, we observe that the glutamic acid residue and the water molecule are represented in the same cluster (Table 6, cluster N2).
Classical mononucleotide binding fold. CATH fold 3.40.50 contains three high-resolution members, as indicated in Table 2. All three structures in this fold are represented in cluster N3 (Table 6).
Table 6. N-domain clusters that interact with ribose hydroxyl groups for each fold
Fold (CATH code): PDB codes
ATP-grasp (3.30.470): 1cs0, 1iov, 2hgs, 1gsa
Protein kinase (1.10.510): 1csn, 1atp
Mononucleotide (3.40.50): 1mjh, 1nsy, 1b0u
Columns: 2′-hydroxyl clusters N1, N2 and N3; 3′-hydroxyl clusters N4 and N5
Atoms and water molecules in the clusters: Glu215A_Oε2, Glu187_Oε1, Water, Gly241A_N, Glu215A_Oε1, Glu187_Oε2, Glu425A_Oε1, Gln285A_Nε2, Lys452A_Nζ, Gly234_N, Asp208_Oδ2, Water, Glu127E_Oε2, Gly127A_N, Water, Pro11A_O, Asp177A_Oδ2, Water, Asp135_O, Glu170E_O, Water
Figure 4. Structures with different folds share common ribose binding positions. Overall folds are shown on the left and common spatial positions are enlarged on the right. Upper panel: actin fold protein HSP-70 (1hpm) is represented in clusters S1 and S3 by atoms from Glu268 and Lys271, respectively (consecutive in terms of the helical wheel), and in cluster S4 by a water atom that mediates interaction with a side-chain oxygen of Asp234. Middle panel: Rossmann fold protein adrenodoxin reductase (1cjc) is represented in clusters S1 and S3 by atoms from Lys39 and Glu38, respectively (located at the end of a β-sheet and at the beginning of the loop that follows), and in cluster S4 by a water atom that mediates interaction with the backbone oxygen of Val42. Lower panel: uridine diphospho-N-acetylenolpyruvylglucosamine reductase fold protein UDP-N-acetylmuramate dehydrogenase (2mbr) is represented in clusters S1 and S3 by an atom from Ile45 and by a water atom that mediates interaction with Gln168, respectively, and in cluster S4 by a water atom that interacts with a side-chain oxygen of Glu334.
Ferredoxin reductase fold. CATH fold 5.1.860 contains two high-resolution members, as indicated in Table 2. In neither case do the ribose hydroxyls hydrogen bond with the protein. The substantial mobility of the adenosine moiety in these cases results in a weak interaction between the protein and the molecule. This may explain why the ferredoxin reductase fold is the only one known that can use either FAD or FMN (which lacks the adenosine) without loss of activity.20 Therefore, the ribose moiety does not seem to be involved in ligand recognition within this fold.
Discussion
Analysis of the spatial arrangement of atoms that hydrogen bond the ribose 2′- or 3′-hydroxyls in ATP, ADP and FAD-protein complexes within different folds enabled the identification of four conserved, clustered positions for S-domain protein–ligand complexes and five for N-domain ones. It is clear from our results that this cluster pattern could not have been identified at the amino acid sequence or secondary structural level, since the atoms within a conserved position can belong to different residues, located in different structural elements, or even to water molecules mediating ligand–protein hydrogen bonds. Analysis of the clusters within a fold showed that, in general, all structures were represented and, for most folds, tended to share a similar pattern. Therefore, the conserved positions identified are independent of the fold representatives used.
Water molecules are found in abundance at protein interfaces and play major roles in polar interactions that stabilize complexes.27–29 Furthermore, there are conserved hydration sites mediating interactions within DNA–protein complexes.30–32 Thus, it is important to consider water-mediated ligand–protein interactions when studying the rules that govern ligand recognition. In fact, we have observed that water molecules play a crucial role in ribose hydroxyl–protein interactions. Overall, one third of the 98 interactions scored in this study involved first-shell mediating water molecules; however, these were not evenly distributed. Cluster S4 showed a relatively high content of water oxygen atoms (76–78%, Tables 3 and 5), whereas cluster S1 was poor in water interactions (10–13%, Tables 3 and 5). In general, water molecules belonging to clusters of the S-domain in the non-redundant dataset formed hydrogen bonds equally with backbone and side-chain protein atoms (26 bonds in total; as expected,18 mainly with oxygen), while those of the N-domain formed hydrogen bonds (nine in total) almost exclusively with side-chain atoms. We do not have an explanation for this difference between the S and N-domain groups.
We observed for both S and N-domains that two conserved cluster positions interacting with the same ribose oxygen may represent almost all the members of the structurally non-redundant data set. The complementary nature of cluster-pairs S1–S2, N1–N2 and N4–N5, depicted in Figure 3, may partly be due to their close proximity in 3D space. Distances between the centers of complementary clusters range from 2.4 to 2.8 Å (the shortest distance between non-complementary clusters interacting with the same oxygen is 3.9 Å). Because of steric effects, this close proximity limits the simultaneous appearance of atoms in both positions. In the infrequent cases where both positions were occupied (i.e. 1cjc, 1cs0 and 1qjd in Tables 3–5), the atoms in the first two structures are hydrogen-bonded to each other, while in the third they are 4.3 Å apart. We suggest that, in the case of complementary pairs, pattern recognition for a specific position can be viewed as the sum of the contributions of the two spatial positions.
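The notion of a complementary pair can be made operational with a center-distance criterion. The sketch below is a hypothetical illustration, not the authors' procedure. It treats two clusters that bind the same ribose oxygen as complementary when their centers lie within an assumed 3.0 Å, a value chosen between the 2.4–2.8 Å observed for complementary pairs and the 3.9 Å minimum for non-complementary ones. It then reports the pair's combined coverage and any structure that occupies both positions. The cluster labels, coordinates and memberships are invented toy data.

```python
import math
from itertools import combinations


def complementary_pairs(centers, max_dist=3.0):
    """Cluster pairs whose centers lie within max_dist Angstrom.

    Assumes every cluster in `centers` binds the same ribose oxygen (2' or 3').
    """
    return [
        (a, b)
        for a, b in combinations(sorted(centers), 2)
        if math.dist(centers[a], centers[b]) <= max_dist
    ]


def pair_summary(members, a, b, n_structures):
    """Fraction of structures covered by the pair, and structures occupying both positions."""
    union = members[a] | members[b]
    both = members[a] & members[b]
    return len(union) / n_structures, sorted(both)


# Invented toy data: three cluster centers (Angstrom) and the structures in each cluster.
centers = {"A": (0.0, 0.0, 0.0), "B": (2.5, 0.0, 0.0), "C": (0.0, 4.0, 0.0)}
members = {"A": {"s1", "s2", "s3"}, "B": {"s4", "s5"}, "C": {"s1", "s4"}}
for a, b in complementary_pairs(centers):
    coverage, both = pair_summary(members, a, b, n_structures=6)
    print(a, b, round(coverage, 3), both)
```

In this toy case only clusters A and B qualify as a pair. Together they cover five of the six structures, and no structure occupies both positions, which mirrors the mutual exclusion expected from steric effects.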
An aspect of ribose structural clusters is their independence from traditional structural classification. For example, three S-domain ribose–protein complexes from the non-redundant data set are displayed in Figure 4: heat shock protein 70 kDa (1hpm), adrenodoxin reductase (1cjc), and UDP-N-acetylmuramate dehydrogenase (2mbr). Each complex contributes an atom to clusters S1, S3 and S4 (Table 3); however, each complex derives from a different fold. Each is represented in cluster S4 by a water atom: in the first case the water atom interacts with a side-chain oxygen of a helix aspartate, in the second with a backbone oxygen of a loop valine, and in the third with a side-chain oxygen of a loop glutamate. The protein atoms in clusters S1 and S3 are from consecutive residues; however, in the first case they are consecutive in a helical wheel sense, whereas in the second they are directly consecutive, spanning the end of a β-sheet and the beginning of a loop. In the third case, cluster S3 is water-mediated. Thus, ribose, which is bound in two major conformations (S and N forms) in ATP, ADP and FAD, provides an example of a ligand moiety whose polar binding spots cut across traditional elements of structural classification.
A network of binding spots that comprises the binding pocket for a ligand moiety in any protein might be an elemental “constant”.33–36 To date, the accuracy of the algorithms used to identify ligand binding pockets is limited by the difficulty of discriminating between several putative binding sites. Incorporating conserved spatial positions for binding specific moieties may improve accuracy by imposing new spatial constraints.
Materials and Methods
Generation of data sets
ATP, ADP, and FAD-protein complexes with a resolution of 2.2 Å or better, and an R-factor lower than 23%,
were extracted from the Protein Data Bank (PDB).37,38
Structures with sequence identity greater than 30% were
grouped,39 and the member with the highest resolution
was retained. Structures were sorted according to their ribose ring conformation using the pseudorotation phase angle (P). The five endocyclic torsion angles of the furanose ring were used to define the P angle according to the equation of Altona & Sundaralingam:9
$$\tan P = \frac{(\tau_4 + \tau_1) - (\tau_3 + \tau_0)}{2\tau_2\,(\sin 36^\circ + \sin 72^\circ)}$$
where τ0–τ4 are the ribose ring dihedral angles about the O4′–C1′, C1′–C2′, C2′–C3′, C3′–C4′ and C4′–O4′ bonds, respectively; P ranges from 0° to 360°, and if τ2 < 0 then P = P + 180°. The amplitude of pseudorotation was calculated as τmax = τ2/cos P.
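The pseudorotation equation translates directly into code. The following is a minimal sketch, assuming torsions in degrees ordered τ0–τ4 as defined in the text. `math.atan2` is used because the denominator carries the sign of τ2. It therefore reproduces the "add 180° when τ2 < 0" rule and avoids division by zero when τ2 = 0. The result is then wrapped into [0°, 360°), so a phase angle of −1° is reported as 359°. The helper `ideal_torsions` is hypothetical and serves only as a check. It builds torsions from the standard pseudorotation model τj = τmax cos(P + 144°(j − 2)).

```python
import math

# sin 36 + sin 72, the constant in the denominator of tan P
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(t0, t1, t2, t3, t4):
    """Phase angle P and amplitude tau_max (degrees) from endocyclic torsions t0..t4 (degrees).

    tan P = ((t4 + t1) - (t3 + t0)) / (2 * t2 * (sin 36 + sin 72)).
    atan2 picks the quadrant from the signs, which matches "if t2 < 0, P = P + 180".
    """
    num = (t4 + t1) - (t3 + t0)
    den = 2.0 * t2 * _S
    p = math.degrees(math.atan2(num, den)) % 360.0
    tau_max = t2 / math.cos(math.radians(p))  # tau_max = t2 / cos P
    return p, tau_max


def ideal_torsions(p, tau_max):
    """Hypothetical test helper: t_j = tau_max * cos(P + 144 * (j - 2)) for j = 0..4."""
    return [tau_max * math.cos(math.radians(p + 144.0 * (j - 2))) for j in range(5)]


for p_true in (18.0, 162.0):  # roughly C3'-endo (N) and C2'-endo (S)
    p, amp = pseudorotation(*ideal_torsions(p_true, 38.0))
    print(f"P = {p:.1f} deg, tau_max = {amp:.1f} deg")
```

For torsions generated by the ideal model, the numerator and denominator reduce to 2τmax(sin 36° + sin 72°) times sin P and cos P respectively. The function therefore recovers the input P and τmax exactly, up to floating-point rounding.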
Fold classification was according to the topology level of version 2.3 of the CATH database.17,40 Structures with the same overall fold (i.e. similar number, arrangement and connectivity of secondary structures) were grouped, and the member with the highest resolution was retained as the fold representative of the set. For proteins having more than one domain, the comparisons were performed only on the domains that hydrogen bond ribose hydroxyls. Domain length was determined using the CATH-linked web tool PDBsum. Structures that were not classified in CATH but shared more than 35% identity with a classified structure were assigned to that structure's fold. This stringent cutoff ensures a correct fold assignment.40 Among the five remaining unclassified structures, two were reported to share a high degree of structural similarity with other classified structures in the vicinity of the ligand binding site,41,42 and they were therefore assigned to the fold group of the similar structure.
Identification and clustering of atoms
All protein and water atoms that form hydrogen bonds with the ribose 2′- or 3′-hydroxyl were identified by the ligand–protein contact (LPC) program,43 using a maximum distance of 3.5 Å between donor and acceptor atoms. Water interactions were considered only if the water molecules also formed hydrogen bonds with the protein. An option to identify water molecules was introduced into the LPC software. Water molecules associated with ligands, as shown in Tables 1 and 2, were identified by the LPC software using the distance constraints described by Thanki et al.44
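A simplified, distance-only version of the hydrogen-bond and water-bridging criteria can be written with NumPy. This sketch is an assumption-laden stand-in for LPC, not a reimplementation of it: it applies only the 3.5 Å donor–acceptor cutoff, does not check atom types or hydrogen-bond angles, and the helper name `ribose_contacts` and its array layout are hypothetical.

```python
import numpy as np

def ribose_contacts(hydroxyl_o, protein_polar, waters, cutoff=3.5):
    """Distance-only stand-in for the hydrogen-bond search (hypothetical helper).

    hydroxyl_o:    (3,) coordinates of the ribose 2'- or 3'-hydroxyl oxygen
    protein_polar: (N, 3) protein donor/acceptor atom coordinates
    waters:        (M, 3) water oxygen coordinates
    Returns indices of protein atoms within `cutoff` of the hydroxyl, and of
    waters within `cutoff` of the hydroxyl that are also within `cutoff` of at
    least one protein polar atom (i.e. candidate bridging waters).
    """
    o = np.asarray(hydroxyl_o, float)
    prot = np.asarray(protein_polar, float).reshape(-1, 3)
    wat = np.asarray(waters, float).reshape(-1, 3)
    protein_idx = np.flatnonzero(np.linalg.norm(prot - o, axis=1) <= cutoff)
    near_ribose = np.linalg.norm(wat - o, axis=1) <= cutoff
    # water-protein distance matrix, shape (M, N)
    wp = np.linalg.norm(wat[:, None, :] - prot[None, :, :], axis=2)
    bonded_to_protein = (wp <= cutoff).any(axis=1)
    water_idx = np.flatnonzero(near_ribose & bonded_to_protein)
    return protein_idx, water_idx
```

A water that contacts only the ribose hydroxyl is discarded, mirroring the rule that water interactions count only when the water is also hydrogen-bonded to the protein.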
Each ribose 2′-hydroxyl binding site, defined as the ribose together with the protein and water atoms that hydrogen-bond to the ribose 2′-hydroxyl, was placed in a new coordinate system by superimposing its ribose ring onto a reference ring, selected as the furanose ring with the minimum average RMSD within the group. Ribose 3′-hydroxyl binding sites were superimposed in the same manner. After the ribose binding sites from all the structures had been superimposed, an algorithm that searches for atoms that hydrogen-bond one of the ribose hydroxyls and lie close to one another in 3D space was used to cluster them. Briefly, the procedure13 consisted of taking one atom and counting the number of neighboring atoms within a radius of 1.5 Å. After this was repeated for all atoms, the atom with the highest number of neighbors, together with its neighbors within the accepted radius, was taken as the first cluster. The second, third and subsequent clusters were found by reiterating this procedure after eliminating the atoms already clustered.13 In choosing a suitable threshold for our study, several values were tested, and 1.5 Å was found to be optimal: it is small enough to place the central atom within the dense region of the cluster, yet large enough not to exclude atoms lying in the vicinity of the cluster.
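The superposition and clustering steps can be sketched in Python with NumPy. The superposition method is not specified in the text, so a least-squares (Kabsch) fit on the five ring atoms is assumed here, and "minimum average RMSD" is read as the smallest mean fitted RMSD to all other rings in the group. The clustering function follows the neighbour-counting procedure as described, counting each atom among its own neighbours. All function names are illustrative.

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation r and translation t minimising the RMSD of mobile @ r.T + t to target.

    mobile, target: paired (K, 3) coordinate arrays (e.g. the five ring atoms).
    """
    mobile, target = np.asarray(mobile, float), np.asarray(target, float)
    pc, qc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - pc).T @ (target - qc)          # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, qc - r @ pc

def fitted_rmsd(a, b):
    r, t = kabsch(a, b)
    diff = np.asarray(a, float) @ r.T + t - np.asarray(b, float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def pick_reference(rings):
    """Index of the ring with the smallest mean fitted RMSD to the others (needs >= 2)."""
    n = len(rings)
    means = [np.mean([fitted_rmsd(rings[i], rings[j]) for j in range(n) if j != i])
             for i in range(n)]
    return int(np.argmin(means))

def superimpose_sites(rings, sites):
    """Move each site's H-bonding atoms with the transform that fits its ring to the reference."""
    ref = rings[pick_reference(rings)]
    moved = []
    for ring, site in zip(rings, sites):
        r, t = kabsch(ring, ref)
        moved.append(np.asarray(site, float) @ r.T + t)
    return np.vstack(moved)

def cluster_atoms(coords, radius=1.5):
    """Greedy neighbour-count clustering; returns a list of index arrays."""
    coords = np.asarray(coords, float)
    remaining = np.arange(len(coords))
    clusters = []
    while remaining.size:
        pts = coords[remaining]
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        neighbours = dist <= radius                 # each atom counts itself
        centre = int(np.argmax(neighbours.sum(axis=1)))
        clusters.append(remaining[neighbours[centre]])
        remaining = remaining[~neighbours[centre]]  # drop clustered atoms, recount
    return clusters
```

Neighbour counts are recomputed after each cluster is removed, matching the "reiterating this procedure" wording; ties in neighbour count are broken here by atom order, which the text does not specify.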
Acknowledgements
We thank E. Eyal for programming assistance
and Drs Z. Shakked, M. Eisenstein and
B. McConkey for valuable comments on the manuscript. V.S. is supported in part by the Ministry of
Absorption, the Center for Absorption of Scientists.
References
1. Schulz, G. E. (1992). Binding of nucleotides by proteins. Curr. Opin. Struct. Biol. 2, 61–67.
2. Moodie, S. L., Mitchell, J. B. & Thornton, J. M. (1996). Protein recognition of adenylate: an example of a fuzzy recognition template. J. Mol. Biol. 263, 486–500.
3. Traut, T. W. (1994). The functions and consensus motifs of nine types of peptide segments that form different types of nucleotide-binding sites. Eur. J. Biochem. 222, 9–19.
4. Kobayashi, N. & Go, N. (1997). A method to search for similar protein local structures at ligand binding sites and its application to adenine recognition. Eur. Biophys. J. 26, 135–144.
5. Kobayashi, N. & Go, N. (1997). ATP binding proteins with different folds share a common ATP-binding structural motif. Nature Struct. Biol. 4, 6–7.
6. Denessiouk, K. A. & Johnson, M. S. (2000). When fold is not important: a common structural framework for adenine and AMP binding in 12 unrelated protein families. Proteins: Struct. Funct. Genet. 38, 310–326.
7. Denessiouk, K. A., Rantanen, V. V. & Johnson, M. S. (2001). Adenine recognition: a motif present in ATP-, CoA-, NAD-, NADP-, and FAD-dependent proteins. Proteins: Struct. Funct. Genet. 44, 282–291.
8. Kinoshita, K., Sadanami, K., Kidera, A. & Go, N. (1999). Structural motif of phosphate-binding site common to various protein superfamilies: all-against-all structural comparison of protein–mononucleotide complexes. Protein Eng. 12, 11–14.
9. Altona, C. & Sundaralingam, M. (1972). Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation. J. Am. Chem. Soc. 94, 8205–8212.
10. Saenger, W. (1984). Principles of Nucleic Acid Structure, Springer, New York.
11. Olson, W. & Sussman, J. L. (1982). How flexible is the furanose ring? A comparison of experimental and theoretical studies. J. Am. Chem. Soc. 104, 270–278.
12. Moodie, S. L. & Thornton, J. M. (1993). A study into the effects of protein binding on nucleotide conformation. Nucl. Acids Res. 21, 1369–1380.
13. Kuttner, Y. Y., Babor, M., Edelman, M. & Sobolev, V. (2001). Structural commonality in protein binding sites for ATP. In Currents in Computational Molecular Biology 2001 (El-Mabrouk, N., Lengauer, T. & Sankoff, D., eds), pp. 91–92, Les Publications CRM, Montreal.
14. Scrutton, N. S., Berry, A. & Perham, R. N. (1990). Redesign of the coenzyme specificity of a dehydrogenase by protein engineering. Nature, 343, 38–43.
15. Hurley, J. H., Chen, R. & Dean, A. M. (1996). Determinants of cofactor specificity in isocitrate dehydrogenase: structure of an engineered NADP+ → NAD+ specificity-reversal mutant. Biochemistry, 35, 5670–5678.
16. Galkin, A., Kulakova, L., Ohshima, T., Esaki, N. & Soda, K. (1997). Construction of a new leucine dehydrogenase with preferred specificity for NADP+ by site-directed mutagenesis of the strictly NAD+-specific enzyme. Protein Eng. 10, 687–690.
17. Pearl, F. M., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P. et al. (2000). Assigning genomic sequences to CATH. Nucl. Acids Res. 28, 277–282.
18. Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Progr. Biophys. Mol. Biol. 44, 97–179.
19. McDonald, I. K. & Thornton, J. M. (1994). Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777–793.
20. Bruns, C. M. & Karplus, P. A. (1995). Refined crystal structure of spinach ferredoxin reductase at 1.7 Å resolution: oxidized, reduced and 2′-phospho-5′-AMP bound states. J. Mol. Biol. 247, 125–145.
21. Wierenga, R. K., De Maeyer, M. C. H. & Hol, W. G. J. (1985). Interaction of pyrophosphate moieties with α-helices in dinucleotide binding proteins. Biochemistry, 24, 1346–1357.
22. Wierenga, R. K., Terpstra, P. & Hol, W. G. (1986). Prediction of the occurrence of the ADP-binding βαβ-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.
23. Rossmann, M. G., Liljas, A., Branden, C. I. & Banaszak, L. J. (1975). Evolutionary and structural relationships among dehydrogenases. In The Enzymes (Boyer, P. D., ed.), Academic Press, New York.
24. Ziegler, G. A., Vonrhein, C., Hanukoglu, I. & Schulz, G. E. (1999). The structure of adrenodoxin reductase of mitochondrial P450 systems: electron transfer for steroid biosynthesis. J. Mol. Biol. 289, 981–990.
25. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
26. Galperin, M. Y. & Koonin, E. V. (1997). A diverse superfamily of enzymes with ATP-dependent carboxylate-amine/thiol ligase activity. Protein Sci. 6, 2639–2643.
27. Janin, J. (1999). Wet and dry interfaces: the role of solvent in protein–protein and protein–DNA recognition. Struct. Fold. Des. 7, R277–R279.
28. Meyer, E. (1992). Internal water molecules and H-bonding in biological macromolecules: a review of structural features with functional implications. Protein Sci. 1, 1543–1562.
29. Williams, M. A., Goodfellow, J. M. & Thornton, J. M. (1994). Buried waters and internal cavities in monomeric proteins. Protein Sci. 3, 1224–1235.
30. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A. & Sigler, P. B. (1994). Determinants of repressor/operator recognition from the structure of the trp operator binding site. Nature, 368, 469–473.
31. Schneider, B. & Berman, H. M. (1995). Hydration of the DNA bases is local. Biophys. J. 69, 2661–2669.
32. Reddy, C. K., Das, A. & Jayaram, B. (2001). Do water molecules mediate protein–DNA recognition? J. Mol. Biol. 314, 619–632.
33. Klebe, G. (1994). The use of composite crystal-field environments in molecular recognition and the de novo design of protein ligands. J. Mol. Biol. 237, 212–235.
34. Laskowski, R. A., Thornton, J. M., Humblet, C. & Singh, J. (1996). X-site: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins. J. Mol. Biol. 259, 175–201.
35. Verdonk, M. L., Cole, J. C. & Taylor, R. (1999). SuperStar: a knowledge-based approach for identifying interaction sites in proteins. J. Mol. Biol. 289, 1093–1108.
36. Rantanen, V.-V., Denessiouk, K. A., Gyllenberg, M., Koski, T. & Johnson, M. S. (2001). A fragment library based on gaussian mixtures predicting favorable molecular interactions. J. Mol. Biol. 313, 197–214.
37. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H. et al. (2000). The Protein Data Bank. Nucl. Acids Res. 28, 235–242.
38. Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. E., Jr, Brice, M. D., Rodgers, J. R. et al. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
39. Needleman, S. B. & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
40. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. & Thornton, J. M. (1997). CATH – a hierarchic classification of protein domain structures. Structure, 5, 1093–1108.
41. Senda, T., Yamada, T., Sakurai, N., Kubota, M., Nishizaki, T., Masai, E. et al. (2000). Crystal structure of NADH-dependent ferredoxin reductase component in biphenyl dioxygenase. J. Mol. Biol. 304, 397–410.
42. Dobritzsch, D., Schneider, G., Schnackerz, K. D. & Lindqvist, Y. (2001). Crystal structure of dihydropyrimidine dehydrogenase, a major determinant of the pharmacokinetics of the anti-cancer drug 5-fluorouracil. EMBO J. 20, 650–660.
43. Sobolev, V., Sorokine, A., Prilusky, J., Abola, E. E. & Edelman, M. (1999). Automated analysis of interatomic contacts in proteins. Bioinformatics, 15, 327–332.
44. Thanki, N., Thornton, J. M. & Goodfellow, J. M. (1988). Distributions of water around amino acid residues in proteins. J. Mol. Biol. 202, 637–657.
Edited by J. Thornton
(Received 21 June 2002; received in revised form 1 September 2002; accepted 5 September 2002)