doi:10.1016/S0022-2836(02)00975-0, available online at http://www.idealibrary.com
J. Mol. Biol. (2002) 323, 523–532

Conserved Positions for Ribose Recognition: Importance of Water Bridging Interactions Among ATP, ADP and FAD-protein Complexes

Mariana Babor*, Vladimir Sobolev and Marvin Edelman

Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel

*Corresponding author

Analysis of the spatial arrangement of protein and water atoms that form polar interactions with ribose has been performed for a structurally non-redundant dataset of ATP, ADP and FAD-protein complexes. The 26 ligand–protein structures were separated into two groups corresponding to the most populated furanose ring conformations (N and S-domains). Four conserved positions were found for S-domain protein–ligand complexes and five for N-domain complexes. Multiple protein folds and secondary structural elements were represented at a single conserved position. The following novel points were revealed: (i) Two complementary positions sometimes combine to describe a putative atomic spatial location for a specific conserved binding spot. (ii) More than one third of the interactions scored were water-mediated. Thus, conserved spatial positions rich in water atoms are a significant feature of ribose–protein complexes.

© 2002 Elsevier Science Ltd. All rights reserved

Keywords: protein–ligand interactions; molecular recognition; hydrogen bonding; consensus structure; nucleotides

Introduction

Ribose is an essential component of many nucleotides that constitute a broad family of ligands. Mononucleotides, such as ATP and ADP, are involved in energy exchange processes in biological systems, whereas FAD, among other dinucleotides, acts as an ancillary electron carrier for a wide variety of enzymes involved in metabolism.
Knowledge of the principles that govern ribose recognition by proteins is sketchy because the tertiary and even secondary structural elements involved in nucleotide recognition are very diverse among proteins having different folds.1 Furthermore, although some sequence motifs have been identified within ligand binding folds, in general they are not extendable to unrelated structures.2,3 Therefore, identification of common recognition patterns among different folds may require an analysis of the spatial arrangement of atoms that interact with a given moiety. Indeed, this strategy enabled the identification of an adenine binding motif shared by ATP-grasp and protein kinase folds.4,5 Subsequent studies showed that this motif is also present in many mononucleotide and dinucleotide binding folds.6,7 Similarly, analysis of the phosphate moiety present in mononucleotides allowed the identification of structural phosphate binding motifs among several superfamilies, even in cases where there was no sequence signature for binding phosphate.8 In both adenine and phosphate binding motifs, primarily backbone atoms participate in the nucleotide binding. This may explain why these motifs are independent of protein sequence.

Abbreviations used: PDB, Protein Data Bank.
E-mail address of the corresponding author: mariana.babor@weizmann.ac.il

Ribose, the sugar present in most nucleotides, contains a semi-flexible furanose ring, whose conformation can be described in terms of the maximum torsion angle and the pseudorotation phase angle (P).9 The P parameter ranges from 0° to 360° and describes the puckering of the ring by considering its five endocyclic torsion angles. The five-membered ring is asymmetrically substituted in nucleotides.
As a result, potential energy thresholds are created that limit pseudorotation and lead to two preferred ranges of conformations, centered at C3′-endo (P = −1° to 34°) and C2′-endo (P = 137° to 194°).9–12

Figure 1. Ribose P phase angles within ATP, ADP, and FAD-protein complexes. ATP, ADP, and FAD-protein structures of resolution 2.2 Å or better, with R-factors lower than 23% and sharing 30% sequence identity or less, were grouped according to their ribose ring pseudorotation (P) phase angle. In the vast majority of cases, the P values cluster in two specific domains (S and N) of the pseudorotational cycle. S-domain structures: 1djn, 1efv, 1qjd, 1h82, 3uag, 1el5, 1bg0, 1hpm, 1pbe, 3grs, 1qla, 1f8r, 1b4v, 1d7y, 1h7w, 1cjc, 1cqx, 1fl2, 1d4x, 1npx, 1fwk, 1ep3, 1bg2, 1fnn, 1hi5, 1ayl, 1fp6, 1hck, 1dak, 1nsf, 1qf9, 1d4a, 1fo4, 2mbr, 1f0x, and 1gpe; and N-domain structures: 1gsa, 1rkd, 1a49, 1a8p, 1csn, 1vom, 1b0u, 1nsy, 1qnf, 1cs0, 1b8a, 1byq, 1atp, 1mjh, 1kdn, 1iov, 1fnd and 2hgs. P values of structures that do not belong to either of these domains are shown in light gray (1e19, 1php, 1f9a and 1ndh).

0022-2836/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved
Water Bridging in Ribose Recognition

ATP binding proteins engage the adenine moiety by hydrophobic atomic contacts above and beneath the plane of the purine ring, and by hydrophilic contacts, including hydrogen bonds, in the plane of the rings.2,6,13 Structural and functional studies show that the ribose moiety can also be essential for ligand recognition. For instance, the affinity of NADP-dependent dehydrogenase is switched from NADP to NAD by introduction of an acidic residue that forms hydrogen bonds with the ribose hydroxyls.14,15 Likewise, changing NAD coenzyme activity to NADP requires the replacement of this acidic amino acid.16 Detailed analysis of the structural arrangement of atoms that interact with the ribose moiety, however, has not been performed.
Here, we address this issue, analyzing ATP, ADP and FAD-protein complexes. Specifically, protein atoms and water molecules that form polar interactions with ribose hydroxyls were identified. A non-redundant dataset of ribose-containing structures was created. The data files were superimposed and atoms from different structures that interact with equivalent ribose hydroxyl groups were clustered. Conserved spatial positions for the binding of either ribose 2′ or 3′-hydroxyls were identified which are shared by a large number of diverse folds. These conserved positions are independent of amino acid sequence as well as of secondary or tertiary structural elements. Moreover, some of them are occupied primarily by water oxygen atoms, indicating that water plays a key role in ribose-protein interactions.

Results

Grouping of structures according to ribose ring conformation

The ribose furanose ring is flexible. We therefore sought populations with homogeneous ribose ring conformations for analysis of binding sites. A database of 145 wild-type structures of FAD, ATP and ADP-protein complexes with a resolution of 2.2 Å or better and an R-factor lower than 23% was extracted from the PDB. Complexes sharing more than 30% sequence identity were grouped and the member with the highest resolution retained, yielding 58 structures. Sorting according to their ribose pseudorotation phase angle (P) resulted in 54 structures clustering within two groups (Figure 1) corresponding to the known S and N-domains.9,10,12 The amplitude of pseudorotation (τmax) ranged from about 35° to 50° with a few exceptions (22°, 25°, 26°, 32° and 55° in PDB entries 1fwk, 1e15, 1b0u, 1pbe and 1fp6, respectively). The S-domain was populated with 36 structures and the N-domain with 18.

Table 1. Structurally non-redundant data set for the ribose S-domain

PDB code  Res. (Å)  Protein                              Ligand  CATH fold code      No. of members  Water mol.(a)
1hpm      1.70      Heat shock protein 70 kDa            ADP     7.1.44              2               14
1bg2      1.80      Kinesin motor domain                 ADP     3.40.850            1               13
1bg0      1.86      Arginine kinase                      ADP     3.30.590            1               12
1dak      1.60      Dethiobiotin synthetase              ADP     3.40.50             7               7
1fnn      2.00      Cell division control protein 6      ADP     8.1.59              1               11
3uag      1.77      D-Glutamate ligase                   ADP     3.90.190            1               11
1hi5      1.80      Eosinophil-derived neurotoxin        ADP     3.10.130            1               10
1ayl      1.80      PEP carboxykinase                    ATP     2.170.8             1               13
1hck      1.90      Cyclin-dependent kinase 2            ATP     1.10.510            1               4
1cjc      1.70      Adrenodoxin reductase                FAD     6.1.67              2               20
3grs      1.54      Glutathione reductase                FAD     3.50.50             11              19
1f0x      1.90      D-Lactate dehydrogenase              FAD     7.1.247             1               7
1fo4      2.10      Xanthine dehydrogenase               FAD     8.1.15              1               16
2mbr      1.80      UDP-N-acetylmuramate dehydrogenase   FAD     3.30.43, 3.90.78(b) 1               13
1cqx      1.75      Flavohemoprotein                     FAD     2.40.10             1               15

(a) Number of water molecules associated with the ligand.
(b) The ribose hydroxyls interact with several CATH domains.

Table 2. Structurally non-redundant data set for the ribose N-domain

PDB code  Res. (Å)  Protein                              Ligand  CATH fold code      No. of members  Water mol.(a)
1cs0      2.00      Carbamoyl phosphate synthetase       ADP     3.30.470            4               7
1byq      1.50      Heat shock protein 90 kDa            ADP     3.30.565            1               19
1rkd      1.84      Ribokinase                           ADP     3.90.77             1               7
1kdn      2.00      Nucleoside diphosphate kinase        ADP     3.30.70             1               7
1vom      1.90      Myosin                               ADP     3.30.538            1               10
1csn      2.00      Casein kinase-1                      ATP     1.10.510            2               6
1mjh      1.70      Protein Mj0577                       ATP     3.40.50             3               7
1b8a      1.90      Aspartyl-tRNA synthetase             ATP     3.40.690            1               13
1a49      2.10      Pyruvate kinase                      ATP     3.20.20, 2.40.10(b) 1               10
1qnf      1.80      Photolyase                           FAD     1.10.399            1               8
1fnd      1.70      Ferredoxin-NADP(+) reductase         FAD     5.1.860             2               15

(a) Number of water molecules associated with the ligand.
(b) The ribose hydroxyls interact with several CATH domains.
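The grouping step above can be illustrated with a short sketch. It evaluates the Altona & Sundaralingam relation quoted in Materials and Methods for P and τmax, then assigns a ring to the N or S-domain. The function names are hypothetical, and the domain cut-offs simply reuse the preferred ranges quoted in the Introduction (P = −1° to 34° and P = 137° to 194°); the exact boundaries used to draw Figure 1 are not stated in the text, so these limits are an assumption. The helper `ideal_torsions` builds torsions from the standard pseudorotation model τj = τmax cos(P + 144°(j − 2)) and is included only to exercise the code.

```python
import math

# sin 36 deg + sin 72 deg, the constant in the Altona & Sundaralingam relation
SIN_SUM = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(t0, t1, t2, t3, t4):
    """Return (P, tau_max) in degrees from the five endocyclic torsions (degrees)."""
    numerator = (t4 + t1) - (t3 + t0)
    denominator = 2.0 * t2 * SIN_SUM
    # atan2 takes the quadrant from the sign of the denominator (that is, of t2),
    # which is equivalent to the rule "if tau2 < 0 then P = P + 180 degrees".
    p = math.degrees(math.atan2(numerator, denominator)) % 360.0
    cos_p = math.cos(math.radians(p))
    if abs(cos_p) < 1e-9:
        raise ValueError("tau_max = tau2 / cos P is undefined when cos P is zero")
    return p, t2 / cos_p


def ribose_domain(p):
    """Assign 'N', 'S' or 'other' (cut-offs are an assumption, see lead-in)."""
    p = p % 360.0
    if p <= 34.0 or p >= 359.0:      # -1 deg to 34 deg, wrapped into 0..360
        return "N"
    if 137.0 <= p <= 194.0:
        return "S"
    return "other"


def ideal_torsions(p, tau_max):
    """Idealized torsions tau_0..tau_4 for a given P and tau_max (demo helper)."""
    return [tau_max * math.cos(math.radians(p + 144.0 * (j - 2))) for j in range(5)]


if __name__ == "__main__":
    for phase in (18.0, 162.0, 90.0):
        p, tau_max = pseudorotation(*ideal_torsions(phase, 38.0))
        print(round(p, 1), round(tau_max, 1), ribose_domain(p))
```

Using `atan2` rather than a bare arctangent avoids a division by zero when τ2 is zero and removes the need for an explicit 180° correction.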
Four structures with ribose ring conformations different from S or N were not dealt with further. Structures with the same CATH fold classification17 were grouped and the member with the highest resolution was chosen as fold representative. Three structures (or their homologs) not classified by CATH were excluded from the list. The representative PDB files of the final CATH folds are listed in Table 1 for S-domain structures and in Table 2 for N-domain structures. This structurally non-redundant data set was used for identification of conserved atomic positions forming hydrogen bonds with either ribose 2′ or 3′-hydroxyls.

Conserved positions of atoms that form hydrogen bonds with ribose

All protein atoms in the S and N data sets that hydrogen bond with either ribose 2′ or 3′-hydroxyls were determined. Water can mediate hydrogen bonds between ligand and protein. Therefore, water molecules that form hydrogen bonds with either 2′ or 3′-hydroxyl and with at least one protein atom were included. The number of water molecules associated with the ligand in each structural complex is given in Tables 1 and 2. Approximately 60% of these water molecules are hydrogen bonded to both ligand and protein, while about a quarter of the latter involve the ribose moiety. Ribose binding sites were superimposed spatially and atoms hydrogen-bonded to ribose 2′ or 3′-hydroxyl, and within a distance of 1.5 Å, were clustered together.

Table 3. S-domain clusters forming polar interactions with ribose hydroxyls
2′-Hydroxyl Ligand ADP ATP FAD PDB code Cluster S1 1hpm 1bg2 1bg0 1dak 1fnn 3uag 1hi5 1ayl 1hck 1cjc 3grs 1f0x 1fo4 2mbr 1cqx Glu268_O12 His93_Nd1 Thr311_O Cluster S2 Water Gln40A_N Met74A_O Leu257A_N Ile45_O Pro232A_O Cluster S3 Cluster S4 Lys271_Nz Water Water Glu211A_O12 Water Lys39A_Nz 3′-Hydroxyl Glu297_O12 Gln131_O Water Arg204A_Nv2 Water Water Asp317A_Od2 Water Water Glu38A_O12 Glu50_O12 Asp86_Od2 Water Water Water Water Water

Figure 2. Conserved positions of atoms that interact with ribose 2′ or 3′-hydroxyl groups. (a) S-domain: ribose binding sites (see Materials and Methods) of the S-domain structures shown in Table 3 were superimposed spatially, and atoms hydrogen-bonded to ribose 2′ or 3′-hydroxyl were clustered together. Four conserved positions were identified among ATP, ADP and FAD-protein complexes. Three form hydrogen bonds with ribose 2′-hydroxyl and correspond to clusters S1 (r = 1.5 Å; dark blue), S2 (r = 0.8 Å; light blue) and S3 (r = 1.5 Å; green). The fourth forms hydrogen bonds with ribose 3′-hydroxyl and corresponds to cluster S4 (r = 1.3 Å; red). (b) N-domain: ribose binding sites of the N-domain structures shown in Table 4 were superimposed spatially, and atoms hydrogen-bonded to ribose 2′ or 3′-hydroxyl were clustered together. Five conserved positions were observed among ATP and ADP-protein complexes. Three form hydrogen bonds with ribose 2′-hydroxyl and correspond to clusters N1 (r = 1.1 Å; dark blue), N2 (r = 1.2 Å; light blue) and N3 (r = 1.1 Å; red). Two form hydrogen bonds with ribose 3′-hydroxyl and correspond to clusters N4 (r = 0.6 Å; green) and N5 (r = 1.2 Å; yellow). Radius r is defined as the distance between the geometric center and the furthest atom.

Four clusters were identified for S-domain structures (Table 3). Three are composed of atoms that form hydrogen bonds with ribose 2′-hydroxyl and one with ribose 3′-hydroxyl (Figure 2(a)). In all cases, the geometric centers of the clusters agree with accepted hydrogen bond angle definitions.18,19 In cluster S1, angle C–O–Cc (where Cc is a cluster center) is equal to 120°. Corresponding angles for S2, S3, and S4 are 115°, 119°, and 141°, respectively. All the ATP, ADP and FAD-protein complexes are represented by a protein atom at least once. Ten different folds are represented in cluster S1, in nine cases by protein atoms that belong to different residues.
These residues are located in diverse secondary structural elements. Moreover, in some cases these atoms belong to the backbone and in others to the side-chains. Nor is there a preference for acceptor or donor atoms within the cluster. Cluster S2 is mainly represented by ATP and ADP-protein complexes. Although this cluster has few members, when considered together with cluster S1, 14 out of 15 structures are represented (Table 3). This suggests that clusters S1 and S2 are complementary (Figure 3). Cluster S3 is compact, all atoms except one lying within 0.7 Å of the geometrical center of the cluster. All protein atoms in cluster S3 belong to side-chains of acidic or basic residues, located at the end of β-strands or α-helices, respectively. Cluster S4 is rich in water oxygen atoms that mediate interactions between ribose 3′-hydroxyl and the protein. The fact that these water atoms are detected in the crystal structures indicates that they form stable interactions and, thus, can be considered part of the ligand–protein binding site. Indeed, 44% of all the hydrogen bond interactions listed between the proteins and the ribose hydroxyls of the S-domain structures are water-mediated.

Five clusters were identified for N-domain structures (Table 4). Three correspond to atoms that form hydrogen bonds with ribose 2′-hydroxyl and two with ribose 3′-hydroxyl (Figure 2(b)). Here too, the geometric centers of the clusters agree with accepted hydrogen bond angle definitions.18,19 The C–O–Cc angle for N1, N2, N3, N4, and N5 is 124°, 123°, 115°, 108°, and 100°, respectively. All the ATP and ADP-protein complexes are represented by a protein atom at least once, except 1vom and 1byq, which are represented by water atoms alone in at least two clusters. FAD-protein folds, poorly represented within the N-domain group to begin with (Table 2), did not contribute any structures to the N-domain clusters (Table 4).
In this regard, we note that the adenosine moiety in the ferredoxin reductases is considerably more mobile than the rest of the ligand, producing very few hydrogen bonds or even none at all with protein atoms.20 Thus, more structures are required before an analysis of N-domain FAD-protein complexes can be made.

Atoms forming polar interactions from the nine N-domain ATP and ADP-protein complexes were amassed in several clusters or cluster pairs. Cluster N1 is represented mainly by protein atoms that belong either to the backbone or to side-chains, and no preference for donors or acceptors was observed. Considering clusters N1 and N2 (both of which form polar interactions with ribose 2′-hydroxyl) as a complementary pair, all the ATP and ADP-protein complexes, except one, are represented (Figure 3). Cluster N3 is rich in water atoms, indicating again the important role that water plays in stabilizing ligand–protein complexes. Clusters N4 and N5 (which form polar interactions with ribose 3′-hydroxyl) can also be considered as a complementary pair, in which seven out of the nine ADP and ATP-protein complexes are represented (Figure 3).

Figure 3. Complementary cluster pairs. S-domain cluster-pair S1 and S2 (dark gray), and N-domain cluster-pair N1 and N2 (dark gray), exhibit complementary positions of atoms that form hydrogen bonds with ribose 2′-hydroxyl, while cluster-pair N4 and N5 (black) is composed of atoms that form hydrogen bonds with the ribose 3′-hydroxyl group. Clusters S4 and N3, which are rich in water-mediated interactions, are pair-less. A white area indicates that the entry is not represented in the cluster.

Conservation of cluster positions within a fold

Several CATH folds in the high-resolution, non-redundant data sets of Tables 1 and 2 have additional members besides the chosen representative. We analyzed whether the clusters identified employing these representatives (see Tables 3 and 4) effectively represent the members of a fold as a whole.

S-domain

FAD-NAD(P) binding domain. CATH fold 3.50.50 contains 11 high-resolution members, as indicated in Table 1. The results in Table 5 show that all except one of the members of this fold have protein atoms which hydrogen bond with the ribose 2′-hydroxyl group. In cluster S3, all ten atoms are located within 0.9 Å of the cluster center and all belong to a residue positioned at the end of a β-strand of the βαβ dinucleotide binding motif (DBM). In all cases except one, the contacting atom is part of an acidic amino acid, as in the Rossmann-fold fingerprint region.21–23 The only structure (1d7y) not represented in S3 has a water oxygen in cluster S2 that interacts with the mentioned conserved acidic residue. Six of the 11 FAD-NAD(P) binding domain structures also interact with ribose 3′-hydroxyl (Table 5, cluster S4). Most interactions involve water atoms and all of these interact with a backbone oxygen located three to four residues downstream of the one in cluster S3.

Rossmann-like fold. CATH fold 6.1.67 contains two high-resolution members, as indicated in Table 1. The same pattern observed for clusters S3 and S4 for the FAD-NAD(P) binding domain holds for this fold, which is a Rossmann-related fold with a similar nucleotide binding mode.21,23–25

Actin fold. CATH fold 7.1.44 contains two high-resolution members, as indicated in Table 1. Their interactions with the S-domain clusters are highly similar (Table 5). In both, protein atoms in clusters S1 and S3 belong to residues located in α-helices; in the former, consecutive in sequence, in the latter, consecutive in helix turns. The two structures are represented in cluster S4 by water atoms. However, in this fold, the water molecules interact with side-chain oxygen atoms of residues located, in sequence, far from those represented in cluster S3.
Classical mononucleotide binding fold. CATH fold 3.40.50 contains seven high-resolution members, as indicated in Table 1. The results in Table 5 show that six out of the seven are represented in at least one of the clusters, with cluster S3 again being the most populated. The remaining one, quinone reductase (1d4a), the only member of this fold with an FAD ligand, does not form hydrogen bonds at all with the ribose. The poor hydrogen-bonding pattern with FAD is also observed for ferredoxin reductase (1fnd, see below).

Table 4. N-domain clusters forming polar interactions with ribose hydroxyls
2′-Hydroxyl Ligand ADP ATP PDB code 1cs0 1byq 1rkd 1kdn 1vom 1csn 1mjh 1b8a 1a49 Cluster N1 3′-Hydroxyl Cluster N2 Cluster N3 Cluster N4 Cluster N5 Glu215A_O12 Gly241A_N Water Glu215A_O11 Water GlnA285_N12 Water His279_Nd1 Gly228_O Lys16A_Nz Asn119A_Nd2 Water Gly127A_N Ile362A_O Lys206A_Nz Water Water Pro11A_O Water Asp135_O Glu361A_O12

Table 5. S-domain clusters that interact with ribose hydroxyl groups for each fold
2′-Hydroxyl Fold FadNad(P) CATH code PDB code 3.50.50 3grs 1pbe 1npx 1f8r 1qjd 1b4v 1qla 1el5 1gpe 1h82 1d7y 1cjc 1h7w 1hpm 1d4x 1dak 1fp6 1qf9 1nsf 1efv 1djn 1d4a Rossmann-like 6.1.67 Actin 7.1.44 Mononucleotide 3.40.50 Cluster S1 Cluster S2 Arg32_Nv2 Lys157A_Nz Water Gln338A_N12 Gln83A_N12 Lys39A_Nz Water Glu268_O12 Glu214A_O12 Tyr399A_OH Water Water 3′-Hydroxyl Cluster S3 Cluster S4 Glu50_O12 Glu32_O12 Glu32_O12 Glu63A_O12 Glu156A_O12 Glu40A_O11 Ser35A_Og2 Asp33A_Od2 Glu55A_O12 Glu35A_O12 Water Water Water Glu38A_O12 Glu218A_O12 Lys271_Nz Lys213A_Nz Gly103A_O Water Glu41A_O12 Water Water Water Water Gln218A_O11 Water Glu211A_O12 Water Asn505_N Asn300A_Od1 Asp419A_Od1

N-domain

ATP-grasp fold. CATH fold 3.30.470 contains four high-resolution members, as indicated in Table 2. All structures of this fold are represented in at least one ribose 2′-hydroxyl and one 3′-hydroxyl cluster (Table 6).
Three structures are represented in clusters N2 and N4 by atoms belonging to, or interacting with, glutamic acid. Indeed, alignment of proteins sharing an ATP-grasp fold showed that the glutamic acid of cluster N4 is highly conserved.26 The remaining structure, glutathione synthase (1gsa), has the same fold and binding site, but its acidic residue, likewise highly conserved,6,26 interacts with ribose 3′-hydroxyl and is represented in cluster N5.

Protein kinase fold. CATH fold 1.10.510 contains two high-resolution members, as indicated in Table 2. Both structures of this fold are represented in clusters N2 and N4 (Table 6). As previously described for the protein kinase fold,6 ribose 2′-hydroxyl sometimes contacts a glutamic acid side-chain and sometimes interacts with a serine backbone through a water molecule. Here, we observe that the glutamic acid residue and water molecule are represented in the same cluster (Table 6, cluster N2).

Classical mononucleotide binding fold. CATH fold 3.40.50 contains three high-resolution members, as indicated in Table 2. All three structures in this fold are represented in cluster N3 (Table 6).

Ferredoxin reductase fold. CATH fold 5.1.860 contains two high-resolution members, as indicated in Table 2. Ribose hydroxyls do not hydrogen bond with the proteins in either case. The substantial mobility of the adenosine moiety in these cases results in a weak interaction between the protein and the molecule. This may explain why the ferredoxin reductase fold is the only one known that can use either FAD or FMN (that lacks the adenosine) without loss of activity.20 Therefore, it seems that the ribose moiety is not involved in ligand recognition within this fold.

Table 6. N-domain clusters that interact with ribose hydroxyl groups for each fold
CATH Fold Atp-grasp Code Code 3.30.470 1cs0 1iov 2hgs 1gsa 1csn 1atp 1mjh 1nsy 1b0u Protein Kinase 1.10.510 Mononucleotide 3.40.50 2′-Hydroxyl PDB Cluster N1 3′-Hydroxyl Cluster N2 Cluster N3 Cluster N4 Cluster N5 Glu215A_O12 Glu187_O11 Water Gly241A_N Glu215A_O11 Glu187_O12 Glu425A_O11 Gln285A_N12 Lys452A_Nz Gly 234_N Asp208_Od2 Water Glu127E_O12 Gly127A_N Water Pro11A_O Asp177A_Od2 Water Asp135_O Glu170E_O Water

Figure 4. Structures with different folds share common ribose binding positions. Overall folds are shown on the left and common spatial positions are enlarged on the right. Upper panel: actin fold protein HSP-70 (1hpm) is represented in clusters S1 and S3 by atoms from Glu268 and Lys271, respectively (consecutive in terms of the helix wheel); and in cluster S4 by a water atom that mediates interaction with a side-chain oxygen of Asp234. Middle panel: Rossmann fold protein adrenodoxin reductase (1cjc) is represented in clusters S1 and S3 by atoms from Lys39 and Glu38, respectively (located at the end of a β-sheet and at the beginning of the loop that follows), and in cluster S4 by a water atom that mediates interaction with the backbone oxygen of Val42. Lower panel: uridine diphospho-N-acetylenolpyruvylglucosamine reductase fold protein UDP-N-acetylmuramate dehydrogenase (2mbr) is represented in clusters S1 and S3 by atoms from Ile45 and a water atom that mediates interaction with Gln168, respectively; and in cluster S4 by a water atom that interacts with a side-chain oxygen of Glu334.

Discussion

Analysis of the spatial arrangement of atoms that hydrogen bond ribose 2′ or 3′-hydroxyls in ATP, ADP and FAD-protein complexes within different folds enabled identification of four conserved, clustered positions for S-domain protein–ligand complexes and five for N-domain ones. It is clear from our results that this cluster pattern could not have been identified at the amino acid sequence or secondary structural level, since the atoms within a conserved position can belong to different residues, located in different structural elements, or even to water molecules mediating ligand–protein hydrogen bonds.
Analysis of the clusters within a fold showed that, in general, all structures were represented and, for most folds, tended to share a similar pattern. Therefore, the conserved positions identified are independent of the fold representatives used.

Water molecules are found in abundance at protein interfaces and play major roles in polar interactions that stabilize complexes.27–29 Furthermore, there are conserved hydration sites mediating interactions within DNA–protein complexes.30–32 Thus, it is important to consider water-mediated ligand–protein interactions when studying the rules that govern ligand recognition. In fact, we have observed that water molecules play a crucial role in ribose hydroxyl-protein interactions. Overall, one third of the 98 interactions scored in this study were mediated by first-shell water molecules; however, these were not evenly distributed. Cluster S4 showed a relatively high content of water oxygen atoms (76–78%, Tables 3 and 5). On the other hand, cluster S1 was water-interaction poor (10–13%, Tables 3 and 5). In general, water molecules belonging to clusters of the S-domain in the non-redundant dataset formed hydrogen bonds equally with backbone and side-chain protein atoms (in total, 26 bonds; as expected,18 mainly with oxygen), while those of the N-domain formed hydrogen bonds (in total, nine) almost exclusively with side-chain atoms. We do not have an explanation for this difference between the S and N-domain groups.

We observed for both S and N-domains that two conserved cluster positions interacting with the same ribose oxygen may represent almost all the members of the structurally non-redundant data set. The complementary nature of cluster-pairs S1-S2, N1-N2 and N4-N5, depicted in Figure 3, may partly be due to their close proximity in 3D space.
Distances between centers of complementary clusters range from 2.4 to 2.8 Å (the shortest distance between non-complementary clusters interacting with the same oxygen is 3.9 Å). This close proximity limits the appearance of atoms in both positions simultaneously because of steric effects. In the infrequent cases where both positions were occupied (i.e. 1cjc, 1cs0 and 1qjd in Tables 3–5), the atoms in the first two structures are hydrogen-bonded, while in the third, they are 4.3 Å apart. We suggest that, in the cases of complementary pairs, the pattern recognition for a specific position can be viewed as the sum of the contributions of the two spatial positions.

An aspect of ribose structural clusters is their independence of traditional structural classification. For example, three S-domain ribose-protein complexes from the non-redundant data set are displayed in Figure 4: heat shock protein 70 kDa (1hpm), adrenodoxin reductase (1cjc), and UDP-N-acetylmuramate dehydrogenase (2mbr). Each complex contributes an atom to clusters S1, S3 and S4 (Table 3); however, each complex derives from a different fold. Each is represented in cluster S4 by a water atom: in the first case the water atom interacts with a side-chain oxygen from a helix aspartate, in the second, with a backbone oxygen of a loop valine, and in the third, with a side-chain oxygen of a loop glutamate. The protein atoms in clusters S1 and S3 are from consecutive residues; however, in the first case, consecutive in a helix wheel sense whereas, in the second, directly consecutive between the end of a β-sheet and the beginning of a loop. In the third case, cluster S3 is water-mediated. Thus, ribose, which is bound in two major conformations (S and N forms) in ATP, ADP and FAD, provides an example of a ligand moiety whose polar binding spots cut across traditional elements of structural classification.
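As a concrete illustration of how conserved positions of this kind can be located, the neighbor-count clustering described in Materials and Methods can be sketched as follows. This is a minimal sketch and not the authors' program: the function names, the inclusive distance test, the tie-breaking by lowest index and the `min_size` stopping rule are all assumptions, and the coordinates in the usage example are invented. The input is taken to be the superimposed coordinates of atoms that hydrogen bond one given ribose hydroxyl. The second function reports a cluster's geometric center and its radius r, defined as in Figure 2 as the distance between the center and the furthest atom.

```python
import math


def greedy_clusters(points, radius=1.5, min_size=2):
    """Cluster 3D points by repeatedly taking the atom with the most neighbors.

    points   -- list of (x, y, z) tuples in angstroms, already superimposed
    radius   -- neighbor cut-off in angstroms (1.5 in the text)
    min_size -- assumed stopping rule: ignore groups smaller than this
    Returns a list of clusters, each a list of indices into `points`.
    """
    remaining = set(range(len(points)))
    clusters = []
    while remaining:
        best, best_neighbors = None, []
        for i in sorted(remaining):
            neighbors = [j for j in sorted(remaining)
                         if j != i and math.dist(points[i], points[j]) <= radius]
            # Strict ">" keeps the lowest index on ties (an assumption).
            if best is None or len(neighbors) > len(best_neighbors):
                best, best_neighbors = i, neighbors
        cluster = [best] + best_neighbors
        if len(cluster) < min_size:
            break  # no denser group is left among the unclustered atoms
        clusters.append(cluster)
        remaining -= set(cluster)  # clustered atoms are removed before iterating
    return clusters


def center_and_radius(points, members):
    """Geometric center of a cluster and distance to its furthest member."""
    n = len(members)
    center = tuple(sum(points[i][k] for i in members) / n for k in range(3))
    r = max(math.dist(center, points[i]) for i in members)
    return center, r


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
             (5.0, 5.0, 5.0), (5.5, 5.0, 5.0), (10.0, 10.0, 10.0)]
    for members in greedy_clusters(atoms):
        print(members, center_and_radius(atoms, members))
```

Because the radius is measured from the central atom rather than from the cluster center, two atoms of one cluster can be up to twice the cut-off apart, which is consistent with cluster radii r approaching the 1.5 Å threshold.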
A network of binding spots that comprises the binding pocket for a ligand moiety in any protein might be an elemental “constant”.33–36 To date, the accuracy of the algorithms used for the identification of ligand binding pockets is limited by the difficulty of discriminating between several putative binding sites. Incorporation of conserved spatial positions for binding specific moieties may improve accuracy by imposing new spatial constraints.

Materials and Methods

Generation of data sets

ATP, ADP, and FAD-protein complexes with a resolution of 2.2 Å or better, and an R-factor lower than 23%, were extracted from the Protein Data Bank (PDB).37,38 Structures with sequence identity greater than 30% were grouped,39 and the member with the highest resolution was retained. Structures were sorted according to their ribose ring conformation using the pseudorotation phase angle (P). The five endocyclic torsion angles of the furanose ring were used to define the P angle according to the equation of Altona & Sundaralingam:9

\[ \tan P = \frac{(\tau_4 + \tau_1) - (\tau_3 + \tau_0)}{2\,\tau_2\,(\sin 36^\circ + \sin 72^\circ)} \]

where τ0–τ4 are ribose ring dihedral angles about the O4′–C1′, C1′–C2′, C2′–C3′, C3′–C4′ and C4′–O4′ bonds, respectively; P ranges from 0° to 360°; and if τ2 < 0 then P = P + 180°. The amplitude of pseudorotation was calculated as τmax = τ2/cos P.

Fold classification was according to the topology level of the v2.3 version of the CATH database.17,40 Structures with the same overall fold (i.e. similar number, arrangement and connectivity of secondary structures) were grouped and the member with the highest resolution retained as the fold representative of the set. For proteins having more than one domain, the comparisons were performed only on the domains that hydrogen bond ribose hydroxyls. Domain length was determined using the CATH-linked web tool, PDBsum. Structures that were not classified in CATH but shared more than 35% identity with another that was, were assigned to that corresponding fold. This stringent cutoff ensures a correct fold assignment.40 Note that among the five remaining unclassified structures, two were reported to share a high degree of structural similarity to other classified structures in the vicinity of the ligand binding site;41,42 thus they were assigned to the fold group of the similar structure.

Identification and clustering of atoms

All protein and water atoms that form hydrogen bonds with ribose 2′ or 3′-hydroxyl were identified by the ligand–protein contact (LPC) program,43 using a maximum distance of 3.5 Å between donor and acceptor atoms. Water interactions were considered only if the water molecules also formed hydrogen bonds with the protein. An option to identify water molecules was introduced into the LPC software. Water molecules associated with ligands, as shown in Tables 1 and 2, were identified by LPC software using the distance constraints described by Thanki et al.44

Each ribose 2′-hydroxyl binding site, defined as the ribose, and protein and water atoms that hydrogen-bond to the ribose 2′-hydroxyl, was placed in a new system of coordinates by superimposing its ribose ring to a reference one, selected as the furanose ring with the minimum average RMSD within the group. Ribose 3′-hydroxyl binding sites were superimposed in the same manner. After ribose binding sites from all the structures were superimposed spatially, an algorithm that searches for those atoms that hydrogen-bond one of the ribose hydroxyls and are close to one another in 3D space was used to cluster the atoms. Briefly, the procedure13 consisted of taking one atom and counting the number of neighboring atoms within a radius of 1.5 Å. After repeating this for all atoms, the atom with the highest number of neighbors, plus its neighbors within the accepted radius, were considered the first cluster.
The second, third, and following clusters were found by reiterating this procedure after eliminating the atoms already clustered.13 In choosing a suitable threshold for our study, several values were tested. We found 1.5 Å to be optimal. This value is small enough to locate the central atom within the dense region of the cluster and large enough that atoms in the vicinity of the cluster are not left out.

Acknowledgements

We thank E. Eyal for programming assistance and Drs Z. Shakked, M. Eisenstein and B. McConkey for valuable comments on the manuscript. V.S. is supported in part by the Ministry of Absorption, the Center for Absorption of Scientists.

References

1. Schulz, G. E. (1992). Binding of nucleotides by proteins. Curr. Opin. Struct. Biol. 2, 61–67.
2. Moodie, S. L., Mitchell, J. B. & Thornton, J. M. (1996). Protein recognition of adenylate: an example of a fuzzy recognition template. J. Mol. Biol. 263, 486–500.
3. Traut, T. W. (1994). The functions and consensus motifs of nine types of peptide segments that form different types of nucleotide-binding sites. Eur. J. Biochem. 222, 9–19.
4. Kobayashi, N. & Go, N. (1997). A method to search for similar protein local structures at ligand binding sites and its application to adenine recognition. Eur. Biophys. J. 26, 135–144.
5. Kobayashi, N. & Go, N. (1997). ATP binding proteins with different folds share a common ATP-binding structural motif. Nature Struct. Biol. 4, 6–7.
6. Denessiouk, K. A. & Johnson, M. S. (2000). When fold is not important: a common structural framework for adenine and AMP binding in 12 unrelated protein families. Proteins: Struct. Funct. Genet. 38, 310–326.
7. Denessiouk, K. A., Rantanen, V. V. & Johnson, M. S. (2001). Adenine recognition: a motif present in ATP-, CoA-, NAD-, NADP-, and FAD-dependent proteins. Proteins: Struct. Funct. Genet. 44, 282–291.
8. Kinoshita, K., Sadanami, K., Kidera, A. & Go, N. (1999). Structural motif of phosphate-binding site common to various protein superfamilies: all-against-all structural comparison of protein–mononucleotide complexes. Protein Eng. 12, 11–14.
9. Altona, C. & Sundaralingam, M. (1972). Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation. J. Am. Chem. Soc. 94, 8205–8212.
10. Saenger, W. (1984). Principles of Nucleic Acids Structure, Springer, New York.
11. Olson, W. & Sussman, J. L. (1982). How flexible is the furanose ring? A comparison of experimental and theoretical studies. J. Am. Chem. Soc. 104, 270–278.
12. Moodie, S. L. & Thornton, J. M. (1993). A study into the effects of protein binding on nucleotide conformation. Nucl. Acids Res. 21, 1369–1380.
13. Kuttner, Y. Y., Babor, M., Edelman, M. & Sobolev, V. (2001). Structural commonality in protein binding sites for ATP. In Currents in Computational Molecular Biology 2001 (El-Mabrouk, N., Lengauer, T. & Sankoff, D., eds), pp. 91–92, Les Publications CRM, Montreal.
14. Scrutton, N. S., Berry, A. & Perham, R. N. (1990). Redesign of the coenzyme specificity of a dehydrogenase by protein engineering. Nature, 343, 38–43.
15. Hurley, J. H., Chen, R. & Dean, A. M. (1996). Determinants of cofactor specificity in isocitrate dehydrogenase: structure of an engineered NADP+ → NAD+ specificity-reversal mutant. Biochemistry, 35, 5670–5678.
16. Galkin, A., Kulakova, L., Ohshima, T., Esaki, N. & Soda, K. (1997). Construction of a new leucine dehydrogenase with preferred specificity for NADP+ by site-directed mutagenesis of the strictly NAD+-specific enzyme. Protein Eng. 10, 687–690.
17. Pearl, F. M., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P. et al. (2000). Assigning genomic sequences to CATH. Nucl. Acids Res. 28, 277–282.
18. Baker, E. N. & Hubbard, R. E. (1984). Hydrogen bonding in globular proteins. Progr. Biophys. Mol. Biol. 44, 97–179.
19. McDonald, I. K. & Thornton, J. M. (1994). Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777–793.
20. Bruns, C. M. & Karplus, P. A. (1995). Refined crystal structure of spinach ferredoxin reductase at 1.7 Å resolution: oxidized, reduced and 2′-phospho-5′-AMP bound states. J. Mol. Biol. 247, 125–145.
21. Wierenga, R. K., De Maeyer, M. C. H. & Hol, W. G. J. (1985). Interaction of pyrophosphate moieties with α-helices in dinucleotide binding proteins. Biochemistry, 24, 1346–1357.
22. Wierenga, R. K., Terpstra, P. & Hol, W. G. (1986). Prediction of the occurrence of the ADP-binding beta alpha beta-fold in proteins, using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.
23. Rossmann, M. G., Liljas, A., Branden, C. I. & Banaszak, L. J. (1975). Evolutionary and structural relationships among dehydrogenases. In The Enzymes (Boyer, P. D., ed.), Academic Press, New York.
24. Ziegler, G. A., Vonrhein, C., Hanukoglu, I. & Schulz, G. E. (1999). The structure of adrenodoxin reductase of mitochondrial P450 systems: electron transfer for steroid biosynthesis. J. Mol. Biol. 289, 981–990.
25. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
26. Galperin, M. Y. & Koonin, E. V. (1997). A diverse superfamily of enzymes with ATP-dependent carboxylate-amine/thiol ligase activity. Protein Sci. 6, 2639–2643.
27. Janin, J. (1999). Wet and dry interfaces: the role of solvent in protein–protein and protein–DNA recognition. Struct. Fold. Des. 7, R277–R279.
28. Meyer, E. (1992). Internal water molecules and H-bonding in biological macromolecules: a review of structural features with functional implications. Protein Sci. 1, 1543–1562.
29. Williams, M. A., Goodfellow, J. M. & Thornton, J. M. (1994). Buried waters and internal cavities in monomeric proteins. Protein Sci. 3, 1224–1235.
30. Shakked, Z., Guzikevich-Guerstein, G., Frolow, F., Rabinovich, D., Joachimiak, A. & Sigler, P. B. (1994). Determinants of repressor/operator recognition from the structure of the trp operator binding site. Nature, 368, 469–473.
31. Schneider, B. & Berman, H. M. (1995). Hydration of the DNA bases is local. Biophys. J. 69, 2661–2669.
32. Reddy, C. K., Das, A. & Jayaram, B. (2001). Do water molecules mediate protein–DNA recognition? J. Mol. Biol. 314, 619–632.
33. Klebe, G. (1994). The use of composite crystal-field environments in molecular recognition and the de novo design of protein ligands. J. Mol. Biol. 237, 212–235.
34. Laskowski, R. A., Thornton, J. M., Humblet, C. & Singh, J. (1996). X-site: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins. J. Mol. Biol. 259, 175–201.
35. Verdonk, M. L., Cole, J. C. & Taylor, R. (1999). SuperStar: a knowledge-based approach for identifying interaction sites in proteins. J. Mol. Biol. 289, 1093–1108.
36. Rantanen, V.-V., Denessiouk, K. A., Gyllenberg, M., Koski, T. & Johnson, M. S. (2001). A fragment library based on gaussian mixtures predicting favorable molecular interactions. J. Mol. Biol. 313, 197–214.
37. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H. et al. (2000). The Protein Data Bank. Nucl. Acids Res. 28, 235–242.
38. Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. E., Jr, Brice, M. D., Rodgers, J. R. et al. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
39. Needleman, S. B. & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
40. Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. & Thornton, J. M. (1997). CATH – a hierarchic classification of protein domain structures. Structure, 5, 1093–1108.
41. Senda, T., Yamada, T., Sakurai, N., Kubota, M., Nishizaki, T., Masai, E. et al. (2000). Crystal structure of NADH-dependent ferredoxin reductase component in biphenyl dioxygenase. J. Mol. Biol. 304, 397–410.
42. Dobritzsch, D., Schneider, G., Schnackerz, K. D. & Lindqvist, Y. (2001). Crystal structure of dihydropyrimidine dehydrogenase, a major determinant of the pharmacokinetics of the anti-cancer drug 5-fluorouracil. EMBO J. 20, 650–660.
43. Sobolev, V., Sorokine, A., Prilusky, J., Abola, E. E. & Edelman, M. (1999). Automated analysis of interatomic contacts in proteins. Bioinformatics, 15, 327–332.
44. Thanki, N., Thornton, J. M. & Goodfellow, J. M. (1988). Distributions of water around amino acid residues in proteins. J. Mol. Biol. 202, 637–657.

Edited by J. Thornton

(Received 21 June 2002; received in revised form 1 September 2002; accepted 5 September 2002)