Structural Chemistry of Human SET Domain Protein Methyltransferases
Matthieu Schapira*, 1, 2
Identifiers and Pagination:
Year: 2011
Issue: Suppl 1
First Page: 85
Last Page: 94
Publisher Id: CCGTM-5-85
Article History:
Received Date: 27/1/2011
Revision Received Date: 6/4/2011
Acceptance Date: 25/4/2011
Electronic publication date: 22/8/2011
Collection year: 2011
open-access license: This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
The human genome encodes about fifty SET domain protein methyltransferases (PMTs), which transfer a methyl group from S-adenosyl-L-methionine (SAM) to substrate lysines on histone tails or other peptides. A number of structures in complex with cofactor, substrate, or inhibitors have revealed the mechanisms of substrate recognition, methylation state specificity, and chemical inhibition. Based on these structures, we review the structural chemistry of SET domain PMTs, and propose general concepts towards the development of selective inhibitors.
Epigenetic mechanisms rely extensively on histone-mediated signaling, in which chemical modifications can make or break complex biological circuits [1, 2]. Among the different histone marks, methylation of specific lysine and arginine side-chains can regulate chromatin compaction, repress or activate transcription, and control cellular differentiation [3, 4]. The transfer of a methyl group from the cofactor S-adenosyl-L-methionine (SAM) to substrate peptides can be catalyzed by two classes of enzymes [5, 6]. Nine arginine protein methyltransferases (PMTs) are known in human, whose function, structure, chemistry, and chemical inhibition have recently been reviewed [7-9] (Yost et al. this issue). Lysine methylation is catalyzed by SET domain PMTs, a family of about fifty proteins in human, and by DOT1L, an enzyme that lacks the canonical SET domain, but shares the same fold as arginine PMTs. This review focuses on the SET domain lysine PMTs. The SET domain is a sequence of 130 amino-acids, named after the Drosophila genes Su(var), E(z) and Trithorax in which it was originally identified. It is defined by a specific fold organized around a pseudo-knot, and by the presence of two signature motifs, ELxF/YDY and NHS/CxxPN, x being any amino-acid [12-14].
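The two signature motifs can be turned into a simple sequence scan. The sketch below is only illustrative: it assumes that ELxF/YDY reads as E, L, any residue, F or Y, D, Y, and that NHS/CxxPN reads as N, H, S or C, any two residues, P, N. The function name `find_set_motifs` and the toy sequence are hypothetical, and a motif hit alone does not establish a SET fold.

```python
import re

# Assumed reading of the two signature motifs quoted in the text:
#   ELxF/YDY  -> E, L, any residue, F or Y, D, Y
#   NHS/CxxPN -> N, H, S or C, any two residues, P, N
SET_MOTIFS = {
    "ELxF/YDY": re.compile(r"EL.[FY]DY"),
    "NHS/CxxPN": re.compile(r"NH[SC]..PN"),
}


def find_set_motifs(sequence):
    """Return {motif name: [(1-based start, matched text), ...]}."""
    seq = sequence.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]
        for name, pattern in SET_MOTIFS.items()
    }


# Hypothetical toy sequence, NOT a real PMT: it only embeds one copy of each motif.
toy = "MAAANHSCAPNGGGGELAFDYKK"
hits = find_set_motifs(toy)
for name, matches in hits.items():
    print(name, matches)
```

In practice such a scan would only be a first filter, to be followed by a fold-level check such as a domain profile search.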
While the SET domain is responsible for catalysis, the methyltransferase activity of PMTs also depends on the presence of adjacent domains that recruit the substrate, or other structural modules, sometimes remote, that act as binding platforms for interaction partners within large multi-subunit complexes. For instance, the PMT EZH2 is only active within the PRC2 complex when associated with EED and SUZ12; recruitment of EED is mediated by a region located 500 residues upstream of EZH2’s SET domain. Remote structural modules may not be necessary for PMT activity, but sometimes recognize the methylation substrate or reaction product. For instance, it was shown that an Ankyrin repeat distinct from the catalytic domain of GLP could recognize mono- or di-methylated lysine 9 of histone 3, the very reaction product of GLP’s SET domain.
As previously observed for histone deacetylases and histone acetyltransferases, it is becoming clear that histones are not the only substrates of some PMTs. For instance, G9a and GLP can methylate the tumor suppressor p53. These emerging signaling mechanisms, unrelated to the histone code, add to the already large body of evidence associating SET domain PMTs with multiple disease areas, and further drive the research community towards the development of chemical tools to better interrogate their function.
OVERALL ARCHITECTURE OF THE CATALYTIC DOMAIN
The catalytic domain is composed of a core SET domain that is structurally conserved and includes residues critical for catalysis, surrounded by a limited set of regions that vary in nature, sequence and shape (Fig. 2). These adjacent domains act like a shell around the SET fold, and can be divided into two categories. First, the I-SET and Post-SET domains (respectively inserted within, and immediately C-terminal to the SET fold) form the binding groove for the substrate peptide, and, to a lesser extent, contribute to the cofactor binding pocket. A landmark feature of SET domain PMTs is that the substrate peptide and cofactor bind distinct sites, on different sides of the protein, and meet at the core of the structure where catalysis takes place. Available ternary complexes of SETD7 and GLP reveal how the side-chain of the substrate lysine inserts into a narrow channel at the junction of the SET, Post-SET, and I-SET domains [14, 18]. In this configuration, the lysine is shielded from the solvent, which is believed to be required for catalysis. In the SETD8 ternary structure, a wide pocket, rather than a channel, is occupied by the histone lysine and a flanking histidine [19, 20]. A catalytically inactive structure of MLL1 features a more open peptide binding groove, which leaves the substrate lysine exposed to solvent. Other protein binding partners, part of the MLL complex, are expected to stabilize the active conformation, probably closer to that captured by the GLP, SETD7 and SETD8 structures. Since both I-SET and Post-SET domains are involved in substrate recognition, it comes as no surprise that both are present in all SET domain PMTs. The I-SET domain has a fixed topology, and is structurally static, while the Post-SET domain adopts variable folds and is structurally dynamic, which has implications for the mechanism of substrate recognition (vide infra).
Available PRDM structures reveal an extremely short and unfolded Post-SET, which may explain the absence of observed biochemical activity for these protein constructs.
Other domains surrounding the SET domain include Pre-SET (N-terminal to SET), N-SET (N-terminal to SET or Pre-SET), MYND (between SET and I-SET), and CTD (C-terminal to Post-SET) (Fig. 2). It is believed that some, if not all, of these variable domains act as binding interfaces to other proteins or DNA. A general concept would therefore be that different combinations of domains with diverse sequence, structure, and electrostatics dress the core SET fold in very distinct ways, and allow selective recruitment of interaction partners, or facilitate specific positioning relative to the nucleosome, with functional implications.
Recent structures of SMYD proteins illustrate how the CTD domain can adopt an open, catalytically competent conformation, as observed in SMYD1, or an inactive conformation that partially occupies the substrate peptide binding site (Fig. 3) [22, 23]. It was proposed that binding of HSP90, which activates SMYD3, stabilizes the open conformation of the protein. Domains adjacent to SET may therefore not only act as protein interaction interfaces, but also as auto-inhibitory components.
Display of the molecular surface of PMTs according to their electrostatic potential reveals that the substrate-binding groove is consistently electronegative, as illustrated in Fig. (4) for an H3K9, H3K4, and H4K20 PMT (GLP, SETD7 and SETD8 respectively). This is in contrast with histone tails, which are enriched in lysine and arginine residues, and highly electropositive. This observation suggests a general mechanism whereby long-range electrostatic attractions can bring the PMTs and their peptide substrates together in a loose complex, prior to sequence-specific recognition.
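The electropositive character of a histone tail can be illustrated with a crude formal-charge count. This is a minimal sketch, not the surface electrostatics calculation behind Fig. (4). It assumes the first 40 residues of mature human histone H3 (quoted here from memory, and worth checking against a sequence database), counts +1 for each lysine or arginine and -1 for each aspartate or glutamate, and ignores histidine and the termini. The function name `formal_charge` is hypothetical.

```python
# Assumed sequence: N-terminal 40 residues of mature human histone H3
# (initiator methionine removed); verify against a sequence database before reuse.
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"


def formal_charge(sequence):
    """Crude net charge: +1 per K/R, -1 per D/E; histidine and termini are ignored."""
    positive = sum(sequence.count(residue) for residue in "KR")
    negative = sum(sequence.count(residue) for residue in "DE")
    return positive - negative


charge = formal_charge(H3_TAIL)
print(f"{len(H3_TAIL)} residues, formal charge {charge:+d}")
```

Roughly one residue in three carries a positive charge in this stretch, which is consistent with long-range attraction towards an electronegative groove.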
A close inspection of PMT structures co-crystallized with substrate peptides reveals that the substrate lysine is anchored in a deep channel, and is the major contributor to binding enthalpy. Surprisingly, in all available structures, an arginine side-chain located one to four residues upstream or downstream of the substrate lysine is the next most important contributor to interaction, and makes extensive contacts with a well-defined cleft of the I-SET domain (Fig. 4) [14, 18, 19, 21, 24]. Interestingly, the shape, structural environment, and position of this cleft relative to the lysine binding channel vary from one enzyme to another, suggesting that it could be exploited to design selective inhibitors. This concept was validated in the case of G9a and GLP. Indeed, co-crystallized selective inhibitors were shown to occupy the arginine binding site, as discussed below [25, 26]. Another observation with possible mechanistic consequences is the fact that histone residues projecting towards the groove are enriched in serine and threonine, two other sites of post-translational modification. It is tempting to speculate that this trend reflects a general structural mechanism where distinct combinations of histone marks would antagonize or possibly enhance substrate recognition by specific PMTs. This hypothesis is supported by some experimental observations, but is beyond the scope of this study (see for instance [27-29]).
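The sequence side of this observation can be checked directly: for each substrate lysine, list the arginines found within four residues. The sketch below assumes the N-terminal sequences of mature human histones H3 and H4 (quoted from memory, and worth verifying), and the helper `flanking_arginines` is hypothetical. It says nothing about which arginine actually occupies the I-SET cleft in a given structure.

```python
# Assumed N-terminal sequences of mature human histones (initiator Met removed);
# verify against a sequence database before reuse.
HISTONE_TAILS = {
    "H3": "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR",
    "H4": "SGRGKGGKGLGKGGAKRHRKVLRD",
}


def flanking_arginines(sequence, lysine_position, window=4):
    """Signed offsets of arginines within `window` residues of a lysine.

    `lysine_position` is 1-based; negative offsets are upstream (N-terminal).
    """
    index = lysine_position - 1
    if sequence[index] != "K":
        raise ValueError(f"residue {lysine_position} is {sequence[index]}, not K")
    offsets = []
    for offset in range(-window, window + 1):
        neighbor = index + offset
        if offset != 0 and 0 <= neighbor < len(sequence) and sequence[neighbor] == "R":
            offsets.append(offset)
    return offsets


# Substrate lysines of the three enzymes discussed in the text.
for histone, position in [("H3", 9), ("H3", 4), ("H4", 20)]:
    offsets = flanking_arginines(HISTONE_TAILS[histone], position)
    print(f"{histone}K{position}: arginine offsets {offsets}")
```

Each of the three substrate lysines has at least one arginine in this window, in line with the one-to-four residue spacing noted above.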
As mentioned above, the I-SET domain varies in sequence, but is structurally conserved across PMTs. On the other hand, the Post-SET domain has variable topologies, sometimes organized around a coordinating Zn atom, as observed, for instance, in the H3K9 PMT G9a or the H3K4 PMT MLL1. SETD7 was crystallized in its apo state, in a binary complex with cofactor, and in a ternary complex with cofactor and substrate peptide [14, 30, 31]. The I-SET structure remains unchanged between the three states (with the exception of a tryptophan side-chain), while the conformation of the Post-SET domain varies considerably (Fig. 5). Interestingly, a sequential mechanism seems to take place: in the apo state, the Post-SET domain is completely unfolded. Binding of the cofactor induces partial folding, where a helix contributing to the cofactor binding site adopts its final conformation. Finally, proper positioning of the substrate peptide relative to the static I-SET induces a final conformational adjustment of the Post-SET domain. Based on similar observations, a model was proposed for the processivity of substrate methylation where an opening and closing motion of the Post-SET domain would allow release into the solvent of the cofactor and of a proton from the substrate lysine after a first methylation event. Cofactor exchange and deprotonation of the substrate are both necessary before further methylation can take place.
We propose a general structural mechanism integrating electrostatic phenomena, Post-SET dynamics, and histone mark cross-talk (Fig. 5). Long range electrostatic attractions bring together the electropositive histone tail and a loose electronegative binding groove, composed of a pre-formed I-SET and open Post-SET. SAM binding stabilizes a partially folded Post-SET conformation. I-SET acts as a reading platform for the substrate peptide. The PMT may slide along the histone tail, held in place by non-specific electrostatics. Once a specific combination of histone side-chains comes into register with I-SET, the substrate lysine loses a proton to the solvent, and the complex clicks into a catalytically competent conformation where (1) a catalytic tyrosine located at the C-terminus of the SET domain completes formation of the lysine channel and projects towards the active site, (2) a conserved double-hydrogen bond flanking the substrate lysine is engaged with the I-SET domain, (3) the Post-SET domain closes onto the bound peptide, shielding the catalytic center from solvent. Histone marks deposited by other enzymes on flanking serine, threonine or arginine side-chains can affect the formation of this catalytically competent state.
Structures of the three H3K36 PMTs SETD2, SETMAR, and NSD1 (PDB codes 3h6l, 3bo5, 3ooi resp.), and the H3K9 tri-methylase SUV39H2 (PDB code 2r3a) lack a peptide binding groove, which seems to contradict this model (Fig. 6, top). In these structures, the I-SET domain superimposes well with the I-SET of active structures, such as histone-bound GLP, but a side-chain of the Post-SET domain projects into what would be the substrate lysine channel, and flanking Post-SET residues occupy the peptide binding groove (Fig. 6, bottom). The functional relevance of this auto-inhibitory mechanism, originally reported for SUV39H2, remains unknown at this time.
Catalysis takes place at the SET domain, where the departing methyl of the cofactor lies in close proximity with the de-protonated ε-nitrogen of the substrate lysine, at the bottom of the lysine channel (Fig. 7). The electrophilicity of the departing methyl group is enhanced by neighboring main chain carbonyl oxygens, and the hydroxyl end of a catalytic tyrosine. Another surrounding tyrosine forms a hydrogen bond with the substrate lysine, thereby aligning the lone pair of the deprotonated ε-nitrogen with the scissile methyl-sulfur bond. A nucleophilic attack follows, which results in methylation of the lysine, and release of SAH. A correlation has been observed between the number of residues capable of forming a hydrogen bond with the substrate lysine (generally tyrosines) and the methylation state. Indeed, adding hydrogen bonds restrains the rotational freedom of the nitrogen atom, which is necessary to align its lone pair with the scissile bond of the sulfonium group. Mutational analyses have confirmed experimentally that a Tyr-Phe switch in the active site can effectively control the methylation product (Fig. 7) [18, 32-34]. Additionally, the extra bulk created by the tyrosine’s hydroxyl group, or, as shown in SETD8, by a bound water molecule, can sterically prohibit higher methylation states. Interestingly, this switch was recently reported as a frequent somatic mutation in lymphoma, changing EZH2 from a multifunctional mono-, di-, and trimethylase to an enzyme with increased trimethylase activity, but little or no mono- and dimethylase activity [35, 36]. Inhibitors specifically recognizing the mutant enzyme may be of interest.
Fig. (7). Catalysis and control of methylation specificity. Backbone carbonyl oxygens and a catalytic tyrosine (orange) surrounding the departing methyl group of SAM (yellow) favor a nucleophilic attack by the ε-nitrogen of the substrate lysine (magenta), which must have been de-protonated beforehand. A limited number of residues (green, gray, cyan) restrict the alignment of a lone-pair on the accepting nitrogen with the scissile sulfur-methyl bond, a geometry necessary for methyl transfer to occur. Mutational analyses reported for G9a [18, 33], SETD7 [14, 37], SETD8, MLL and somatic mutations reported in EZH2 confirm that these residues control the final methylation product: mono-, di- or tri-methyl lysine (Kme, Kme2, or Kme3 respectively).
The cofactor and substrate peptide bind at two distinct pockets and meet at the catalytic site (Fig. 2, top). This suggests two avenues for drug design: competitive inhibition of cofactor or peptide binding. Potent small molecule inhibitors can only be developed if a site is druggable. Selective inhibition relies on the site’s diversity. Currently the highly homologous enzymes GLP and G9a are the only two lysine PMTs that have been crystallized in complex with substrate peptide competitors: Bix-01294 (IC50 ~1 µM), E67 (IC50 ~270 nM), E72 (IC50 ~100 nM), UNC0224 (IC50 ~15 nM) and UNC0638 (IC50 ~10 nM) (Fig. 1) [25, 26, 38-40]. We used the program SiteMap (Schrodinger, NY) to evaluate the druggability of the pockets exploited by these inhibitors (Fig. 8). A druggability score (Dscore), validated against a large training set, is calculated as a function of volume, hydrophobicity, and enclosure of the site. A score larger than 0.95 indicates that the site is druggable; a value below 0.8 reflects poor druggability; a Dscore between 0.8 and 0.95 is in the gray zone, where no reliable conclusion can be drawn. Bix-01294 occupies the open section of the peptide binding groove, but does not exploit the lysine channel (Fig. 8). The druggability of the corresponding pseudo-site, which artificially excludes the lysine channel, is unclear. UNC0638, another peptide competitor, recapitulates the binding pose of Bix-01294, but has an additional aliphatic chain ending with a pyrrolidine that extends into the lysine channel (Fig. 8). With a Dscore of 1.05, the corresponding site is clearly druggable, as confirmed by the high potency of the ligand.
Fig. (8). Druggability of the peptide and cofactor binding sites. A druggability score was calculated with SiteMap (Schrodinger, NY) for the binding sites of the GLP/G9a inhibitors Bix-01294, UNC0638, and the cofactor (top). The peptide-binding site (center) is druggable, and can accommodate highly potent compounds, such as UNC0638. When the lysine channel is ignored, the druggability index drops sharply (left). The cofactor site (right) is also predicted to be druggable (blue bar represents the mean value across all available co-crystallized human SET domain PMT structures), but with varying druggability index from one enzyme to another (blue error bar), which reflects a variation in hydrophilicity and enclosure of the site. Green: pocket used to calculate the druggability score. The molecular surface of the enzyme is clipped across the Z-axis for better visualization.
We also calculated the druggability of the cofactor binding site, defined as the pocket occupied by SAM, SAH, or the close analogue sinefungin. The Dscore varied from 0.92 to 1.1 across all co-crystallized structures of human lysine PMTs, with a mean value of 1.0 (Fig. 8). While this site appears more druggable in some lysine PMTs than others, it is predicted to be druggable in nearly all cases. The SETMAR structure is the exception, with a Dscore of 0.92 that falls in the gray zone, due to its particularly high hydrophilicity. This highlights a challenging feature shared by all enzymes. The cofactor site includes numerous polar groups that cannot be buried by hydrophobic ligands without significant desolvation penalty. These must be matched by a complex and specific network of hydrogen bond donors and acceptors decorating the inhibitor.
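The Dscore cutoffs quoted earlier (above 0.95 druggable, below 0.8 poorly druggable, gray zone in between) amount to a three-way classification. The sketch below applies them to the values reported in the text. The function name `classify_dscore` and the site labels are illustrative only. This is not part of SiteMap, and the treatment of values exactly equal to a cutoff is an assumption.

```python
def classify_dscore(dscore, druggable_above=0.95, poor_below=0.8):
    """Classify a SiteMap Dscore with the cutoffs quoted in the text.

    Values exactly on a cutoff are assigned to the gray zone (an assumption).
    """
    if dscore > druggable_above:
        return "druggable"
    if dscore < poor_below:
        return "poorly druggable"
    return "gray zone"


# Dscore values taken from the text; labels are descriptive, not official names.
reported_sites = {
    "GLP/G9a peptide site with lysine channel (UNC0638)": 1.05,
    "cofactor site, mean across structures": 1.0,
    "cofactor site, highest value": 1.1,
    "cofactor site, SETMAR": 0.92,
}

for site, dscore in reported_sites.items():
    print(f"{site}: Dscore {dscore} -> {classify_dscore(dscore)}")
```

With these cutoffs, only the SETMAR cofactor site lands in the gray zone.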
We have seen that in all available ternary structures, an arginine side-chain flanking the substrate lysine is an important contributor to binding enthalpy. It is interesting to note that the co-crystallized inhibitors all occupy the arginine binding site (Fig. 8), a feature that could inspire by analogy the design of SETD7 or SETD8 inhibitors (Fig. 4 bottom). Interaction hot spots that should be exploited by potent chemical inhibitors can be predicted based on receptor-ligand contacts conserved across all available structures. At the peptide binding site, a conserved double hydrogen-bond between the backbone of the substrate lysine and a beta-strand of the I-SET domain seems to be important for the interaction (Fig. 9A, top). Interestingly, this interaction is partially recapitulated by the pyrrolidine group of the potent inhibitor UNC0638 (Fig. 9A, bottom). At the cofactor binding site, a series of six hydrogen bonds engaged with five backbone atoms and one conserved asparagine side-chain of the SET domain is observed in all available structures (Fig. 9B, top). These hydrogen-bonds are clustered at two specific locations, acting as anchoring points for the cofactor, one at the adenine ring, the other at the methionine end. It is likely that potent inhibitors will need to mimic this profile of interaction.
Selective inhibition can only be achieved if the structural chemistry of the pocket is sufficiently specific to a given enzyme. The peptide binding sites of lysine PMTs have evolved to recognize specific sequences. It is therefore reasonable to infer that structural features used to read specific sequences can be exploited to design selective inhibitors. This is in part confirmed by the selectivity profile of UNC0638, an inhibitor that specifically inhibits the H3K9 PMTs G9a and GLP, but not the H3K4 PMT SETD7, the H4K20 PMT SETD8, or even the H3K9 PMT SUV39H2. The question of selectivity is not as clear for the cofactor site, as it recognizes the same cofactor across all enzymes. The chemogenomic profiling of human kinases has demonstrated that selectivity can be engineered into ATP competitors. A recent study shows that the structural diversity of the SAM site in PMTs is comparable to that of the ATP site in kinases, suggesting that selective inhibition could be achieved at the PMT cofactor site. The selectivity profile of chaetocin, a fungal metabolite that competes with SAM with some specificity for H3K9 PMTs, reinforces the hypothesis that selective inhibition at the cofactor site is chemically tractable.
We have highlighted general concepts regarding the structural mechanism of SET domain PMTs. A variety of domains can dress the core SET structure, and act as docking platforms for specific binding partners associated with diverse cellular events. Sequence and post-translational modification status of substrate peptides are mainly recognized by the I-SET domain, while a limited set of polar groups surrounding the substrate lysine control methylation specificity. The peptide and cofactor binding sites are chemically tractable, and can be targeted by selective small molecule inhibitors, independently or simultaneously. Conserved interaction patterns observed in co-crystal structures strongly suggest the presence of a discrete number of interaction hotspots that can be exploited to achieve potent inhibition.
The SGC is a registered charity (number 1097737) that receives funds from the Canadian Institutes for Health Research, the Canadian Foundation for Innovation, Genome Canada through the Ontario Genomics Institute, GlaxoSmithKline, Karolinska Institutet, the Knut and Alice Wallenberg Foundation, the Ontario Innovation Trust, the Ontario Ministry for Research and Innovation, Merck & Co., Inc., the Novartis Research Foundation, the Swedish Agency for Innovation Systems, the Swedish Foundation for Strategic Research and the Wellcome Trust.
CONFLICT OF INTEREST
I would like to thank Cheryl Arrowsmith for comments on the manuscript, as well as Kong Nguyen and Calvin Santiago for computing druggability indices.