Melissa Moore - U. Mass/HHMI - Part 2 - Spliceosome structure and dynamics

and the Howard Hughes Medical Institute and in this lecture I'm going to tell you about some work from our own lab on spliceosome structure and dynamics. So as we saw in the first lecture, eukaryotic genes are split, in that they have expressed regions or exons, up here, and introns. And the introns have to be removed and in my previous cartoon I just used the scissors and tape. So in this lecture we're going to be talking about "What is the nature of those scissors and tape, and how do they actually do the reaction?" So first let's talk about how the splicing actually happens. And it happens in two chemical steps. In the first step of splicing, the branch site--so we talked about that, that's a conserved adenosine in the intron near the 3 prime splice site. The 2 prime hydroxyl of the branch site attacks the phosphate at the 5 prime splice site, and that generates a lariat intermediate, which has a 2 prime, and 5 prime and a 3 prime phosphate all coming off that adenosine. And it releases the 5 prime exon. Now that 5 prime exon doesn't float away, it's going to be held onto by the splicing machinery, which we'll be talking about in a moment. The second step of splicing--so here are intermediates in the reaction. This 3 prime hydroxyl that was generated in the first step on the 5 prime exon--it now attacks the phosphate at the 3 prime splice site, and now it kicks out the intron and ligates the two exons together. Now we know this is the chemistry and that these occur as single transesterification reactions. The two steps of splicing are catalyzed by a large complex in the cell called the spliceosome. The spliceosome is arguably the most complicated macromolecular machine in the cell, as we'll see in a moment. And the spliceosome consists of four major chunks, large pieces, subunits that need to come together with each round of splicing in this complicated dance called the spliceosome cycle that we'll talk about in a few slides. 
But what I want to show you here is that the major pieces of the spliceosome are these components called U1, U2, the triple snRNP U4/5/6, and the NineTeen complex, or the NTC. Now what are these U1, U2 things that I showed you in the last slide? Well, they are so-called snRNPs, for small nuclear RNA-protein complexes. And so each of these snRNPs consists of a small nuclear RNA, somewhere between a hundred and a couple hundred nucleotides long. It's a stable RNA complexed with a set of proteins. So for example, U1 snRNP has U1 snRNA in it and a set of core proteins called the Sm proteins. There are seven proteins. They make a ring-like structure, and you can see they are common to a lot of the snRNPs. And then there are some specific proteins that are unique to U1 snRNP, in U1's case 70K, A and C. U2 is more complicated, and these are all of the proteins that are associated with U2 snRNP. And then there's the triple snRNP, so called because it has three small nuclear RNAs in it, and it has an even larger set of proteins. In addition to the snRNPs, there's the NineTeen complex, so named because it contains a protein called Prp19 and its associated factors. So it is just like a snRNP except it does not contain an RNA component. And then in addition to these main stable components, there are also things called splicing factors, and these are proteins that come and go but are not stably associated with any single snRNP. And included in this class are RNA helicases that change the structure of RNAs or can change the structure of RNA-protein complexes. Certainly there are RNA binding proteins; we talked about two of the classes of those in the last lecture, the SR proteins and the hnRNP proteins. And there are unexpected proteins like cis/trans prolyl isomerases and ubiquitin ligases.
Altogether the complete spliceosome parts list as we now understand it consists of 5 snRNAs (U1, U2, U4, U5 and U6) and a hundred proteins in yeast, a hundred different proteins, and about three hundred different proteins in humans. And the reason that the human splicing machinery is so much more complicated than the yeast splicing machinery, you can imagine why because we have so many different introns, and we do all this alternative splicing that the yeast don't. And so most of these proteins, the extra ones, are involved in alternative splicing. Now let me tell you a little bit about the snRNAs. They're called "U" snRNAs because they're uridine-rich RNAs, and their numbering came from--if one just purifies all of the stable RNAs out of the nucleus and runs them on a gel, the most abundant one is U1, the second most abundant is U2, and so forth. And it turns out that U3 does exist, but it is involved in ribosome biogenesis as is U7. But the other five of the six most abundant are all involved in pre-mRNA splicing, or the spliceosome. Here we are at the spliceosome, back at the spliceosome cycle, and we're going to look at this a little more closely because I'm going to be telling you about some experiments we've recently done to test this cycle. So in the first part of the cycle, at the beginning of spliceosome assembly, U1 snRNP interacts with the 5 prime splice site and U1 snRNA actually base pairs and recognizes the 5 prime splice site. Similarly, U2 snRNA base pairs with and recognizes the branch site consensus sequence, and U2 snRNA, when it joins the U1 snRNA, forms A complex. E complex stands for "early" complex and then A, B, and C is-- we'll see in a minute--were just named by where they ran on a gel. So A complex comes first, and then the next big chunk of the spliceosome that comes in is the triple snRNP-- U4/5/6. Once the triple snRNP is there, we have B complex. Then there's a structural rearrangement, where U1 snRNP is actually ejected. 
It's kicked out, and somehow things are majorly rearranged such that U6 snRNA is interacting with the 5 prime splice site. In another rearrangement U4 snRNP is kicked out, and then the NineTeen complex comes in, and now we have the catalytically active spliceosome, where the first step and then the second step of splicing occur. Once the second step of splicing is complete, the splice product is released, and the spliced out intron leaves with the rest of the spliceosome. That splicing machinery has to disassemble, and then it's reassembled anew on each new round of splicing, so ergo the spliceosome cycle. So how do we know a lot of these details of the mechanism that I've been telling you about? Well one of the ways that we know is by performing in vitro splicing reactions. So in an in vitro splicing reaction, we take a piece of RNA, usually a piece of RNA that would be a couple hundred of nucleotides that would consist of an exon with one intron followed by a downstream exon. And the little asterisk here, the red asterisks, are to indicate that this RNA would be radioactively labeled, so we would transcribe it in vitro and put radioactive nucleotides all throughout. We then mix this RNA with either whole cell extract if we're working on the yeast spliceosome or nuclear extract if we're working on the human spliceosome. And for example, we might get that from HeLa cells, which is a very common tissue culture cell for humans. And then also ATP because ATP is essential for splicing because many of the spliceosome transitions that I was showing you here all of the ones after E complex formation, each one of those steps requires ATP and also going around the backside here. Now why do we use whole cell extract or nuclear extract? Well I've just told you that the splicing machinery is incredibly complicated. It has in yeast a hundred different polypeptides; in humans, three hundred different polypeptides. 
There's really just no way at present that we can purify each one of those proteins, have that in a test tube and fully reconstitute the machinery. So right now, the best way to study the spliceosome is simply almost in its native environment, and that is in unfractionated cell extract. So if we then take those splicing reactions and take out time points and then purify the radioactive RNA and run it on a denaturing gel--and in this case a pretty high percentage denaturing gel--what we can see is that over time (and this time course goes from about zero to sixty minutes in vitro) we can see the substrate gradually disappear. And then at early time points, the two intermediates of splicing appear, the lariat intermediate and the 5 prime exon. And then you can see at later time points the lariat product, the intron product, and the spliced exon product appear. And the reason that the lariats run high in the gel, even though they're smaller than the pre-messenger RNA, is that they have this circular structure, and because of that unusual structure, they're retarded in the gel more than a linear RNA and so they actually run higher than you would expect. But now if instead we take that same splicing reaction but don't purify the RNA and just run it on a native gel, so now we are looking at the complexes that contain the RNA. Here we can see those complexes I told you about before. So here's E complex, the early complex, A, B and C and they build up and go away over time, as you might expect. And so that's where the names of these different complexes came from, simply by their migration on the gel. Alright, now these different complexes can be purified. They're stable. They're stable enough to survive a native gel, and there have been many different ways devised now to purify the different complexes. Here's one example from my laboratory. 
What we did was we took a splicing substrate where we mutated the 3 prime splice site, so the splicing machinery could build up on the RNA. It could do the first step, but it couldn't do the second step because the 3 prime splice site was mutated. And into that intron, we built stem loops that were recognized by the MS2 protein, which is a viral coat protein that binds very tightly to its recognition sequence. That viral coat protein called MS2 we linked to maltose binding protein. Maltose binding protein likes to bind amylose resin, so we could use that as an affinity tag to pull down these spliceosomes and purify them. And you can see here an EM image of those spliceosomes, and the spliceosomes are all around 20-30 nanometers in size. Now from those EM images, you can do single particle reconstruction to start to get structures of the splicing machinery. And at this point, in comparison to the ribosome, for example, the structural information we have for the spliceosome is rather limited. We do have crystal structures now--two different crystal structures of U1 snRNP, which is the most abundant of the snRNPs. And those are at 5.5 and 4.4 Angstrom resolution, so enough to see the RNA and the proteins. But for the bigger complexes, we are still at the electron microscopy stage, and so for example, here are some images from Reinhard Lührmann's lab of the yeast splicing complexes--the B complex, the activated B complex and then the C complex, which is the spliceosome that has intermediates in it. And so in the coming years, the splicing community is really struggling hard but looking forward to having high-resolution structures, because we would really like to see where all of these parts bind and how they all fit together to form these just really remarkable machines. Now in the meantime, that's where we are on the structural front.
In addition to structure, for any biochemical complex or machine you really need to know something about its dynamics. So I've gone over this whole splicing cycle with you, but as I explained before, the splicing cycle is based on the complexes that are stable enough to be resolved on a gel or affinity purified. But it doesn't tell you about the kinetics of things coming and going. So for example, is it going to be true that on every intron U1 has to come before U2, and do these two have to come before the triple snRNP? And all of these arrows that we're showing here are one-way arrows, but most biochemical reactions and chemical reactions really are two-way. So are these arrows really one-way: is it a one-way street? Or is this process in any way reversible? So to get at that information my laboratory has recently been collaborating with Jeff Gelles's laboratory at Brandeis University and Virginia Cornish's laboratory at Columbia University, as well as some co-workers at New England Biolabs, to develop new tools in order to look at the dynamics of the spliceosome. The main method that we have been utilizing is called total internal reflection. Now this is an experiment you can try at home. So this is simply a laser pointer that's going into an aquarium of water. And if you correctly position the laser pointer at or beyond the critical angle, when there's a change in refractive index, in this case between water and air, then all of the laser light will be completely reflected; that's called total internal reflection. Except right at the point where the laser contacts there is a little bit of energy that goes through to the other side, called the evanescent wave. So the evanescent wave--in this case now we're going to be having the lasers come through the air to a microscope slide. So here the change in refractive index is going from the microscope slide to the aqueous layer above the microscope slide.
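The total internal reflection geometry described here can be made concrete with Snell's law. What follows is only a minimal Python sketch, not anything shown in the lecture: the function names are hypothetical, and the refractive indices (about 1.00 for air, 1.33 for water, 1.52 for glass), the 532 nm wavelength and the 65 degree incidence angle are textbook-style assumptions rather than values from the talk. It computes the critical angle for the aquarium example and for a glass slide under water, plus the standard 1/e intensity decay depth of the evanescent field.

```python
import math


def critical_angle_deg(n_dense, n_rare):
    """Smallest incidence angle (measured from the normal) that gives
    total internal reflection when light travels from the denser medium
    toward the less dense one."""
    if n_rare >= n_dense:
        raise ValueError("total internal reflection needs n_dense > n_rare")
    return math.degrees(math.asin(n_rare / n_dense))


def penetration_depth_nm(wavelength_nm, n_dense, n_rare, incidence_deg):
    """1/e decay depth of the evanescent intensity:
    d = wavelength / (4 * pi * sqrt(n_dense^2 * sin^2(theta) - n_rare^2))."""
    s = (n_dense * math.sin(math.radians(incidence_deg))) ** 2 - n_rare ** 2
    if s <= 0:
        raise ValueError("incidence angle is below the critical angle")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


if __name__ == "__main__":
    # Assumed refractive indices; real values depend on the materials used.
    print("water -> air critical angle:", critical_angle_deg(1.33, 1.00))
    print("glass -> water critical angle:", critical_angle_deg(1.52, 1.33))
    # Assumed green laser hitting the glass/water interface at 65 degrees.
    print("decay depth (nm):", penetration_depth_nm(532, 1.52, 1.33, 65))
```

With these assumed numbers the decay depth comes out on the order of a hundred nanometers, which is why only molecules held very close to the slide are excited.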
The evanescent wave will go about a hundred nanometers into the solution above the microscope slide. So imagine having not just one laser but let's say three different colors of lasers. It turns out we can now do five lasers. I'm only going to show you three, three today. But imagine having three different colors of lasers all going in at their critical angles, and having something tethered to the surface within this 100 nanometers and having molecules that are fluorescent in the colors that are excited by your three different lasers. So molecules that are in solution above the evanescent wave are not fluorescent because they're outside the area of where the light energy is. And so only the molecules that are tethered to the surface are going to be fluorescent. So we can use this to then ask, for anything that's tethered to the surface what different colored molecules at any one time are associated with the molecule that's tethered to the surface. So let's just see how this looks. So imagine you're looking down at this surface and we're going to be looking at the molecules on the surface. So we call this technique colocalization of single molecules spectroscopy, or CoSMos. And this technique was pioneered by Jeff Gelles and his co-worker Larry Friedman at Brandeis University. So here we are looking at the fluorescence, and each one of these spots is a single molecule on a glass coverslip that has different colored things on it. In this case, the molecules are a strand of DNA, and the colored things are different oligos that are complementary to that strand of DNA, but they have different fluorophores on them. And so you can see for example that this molecule of DNA had all three of the oligos bound to it but this molecule of DNA only had this one--only had the green and the blue bound to it. And you can see, here's one that only had the blue molecule bound to it. 
So it's very simple, because all we're doing is looking at these, say, constellations, different constellations of spots, and we're going to learn something about our biological system. And in particular if we can look at how these spots change over time, we can learn about the dynamics of the system. Now in order to use this to study the spliceosome, we had to develop a number of new technologies to enable us to label parts of the spliceosome so that we could see them. And so one of the things that we had to do was to create fluorescently tagged pre-mRNAs because we need to know where the pre-mRNAs are on the surface. Also our pre-mRNAs have to have some way to be tethered to the surface. The way we do that is to put a biotin molecule on one end, and then we also have biotin on the glass surface. We have biotinylated PEG--polyethylene glycol. And then we make a sandwich, where we have streptavidin. Streptavidin can bind four molecules of biotin, so you can use that to make a sandwich and bind your RNA there. Now the other thing that we had to develop were ways of tagging the snRNPs, because what we really wanted to do was look at the snRNPs coming and going in real time. So the way that we're tagging the snRNPs is using two protein tags. One is the SNAP tag that was developed by Kai Johnsson and is now available through New England Biolabs. SNAP is based on a protein that is a suicide enzyme that removes alkyl groups from guanine nucleotides of DNA. And so it transfers those alkyl groups to itself. So in this case if you have benzylguanine--so here's guanine and then there's a benzyl group on it--and if you attach to that a fluorescent dye, the SNAP tag protein will transfer that dye to itself, and if you've made a fusion protein between the SNAP tag and your protein of interest--in this case a snRNP protein--then you can specifically label your snRNP protein.
The other tag that we've been using is the E. coli DHFR, dihydrofolate reductase, tag, and bacterial dihydrofolate reductase binds very tightly to trimethoprim--this molecule down here. It's a non-covalent interaction. Trimethoprim is an inhibitor of E. coli DHFR but this molecule does not bind to eukaryotic DHFR. But this is a very tight interaction, and again if we tether a dye to that, this dye will interact with our DHFR tag and allow us to label that protein. And this technology was developed by Virginia Cornish and her coworkers at Columbia University. So how do we get these tags on our snRNPs? The way that we do this is we're using the yeast system and we're using homologous recombination. So we make versions of different protein genes that we want to tag--in this case two U1 proteins and a U2 protein. We then place the tag of interest at the C-terminus of that protein, or the gene for that protein, and then we have a selectable marker. And we use homologous recombination to put these modified genes into haploid yeast. And that means the only gene that is encoding that protein in the yeast is our protein of interest--our tagged protein. So then from those yeast strains, we can make a whole cell extract. And in this case we have U1 having two DHFR tags on two different proteins, or U1 having two tags and U2 having a SNAP tag on it, so a triple-tagged strain. We then take those extracts and we can either simply add the TMP to label the DHFR or, to label the SNAP tag, we take our benzylguanine that has a fluorescent tag on it. We react that with the whole cell extract. We remove the excess dye by gel filtration, and then we can add our TMP. And so the really great thing about this system is, first of all, we know that the proteins we're tagging are active because they're the only copy of the protein in the cell and we're only tagging essential proteins.
Many of the proteins in the spliceosome are essential, and so we know that if the cells grow, because splicing is essential, then that protein must be active. Secondly, there's absolutely no protein reconstitution required, so we're not making any recombinant proteins, purifying them and putting them back in. We're using the endogenous proteins. We just added a small protein tag to the thing. So let's look now at how these experiments are going to go. So we're going to have our pre-mRNA that's attached to the surface via this biotin-streptavidin sandwich. It has a fluorophore in it so we can keep track of where the pre-mRNA molecules are. And this is actually a view through the microscope of what a field of these pre-mRNAs look like, where each one of these spots is a single molecule of pre-mRNA. And in the movies I'm going to show you we're going to be looking at U1 snRNP binding to those pre-mRNAs over time in splicing reactions. One of the things about single molecule reactions is that you're really seeing everything that's going on-- anything that's fluorescent, any kind of dust or anything you can see, so you really need to do a lot of controls to make sure that you know what you're looking at. So the first thing I'm going to show you is a movie where we're doing some controls, where either we have left off the fluorescent RNA, or we don't have the tags on U1 snRNP (and so we wouldn't expect signal) or we have the complete reaction where we have the fluorescently tagged RNA and the fluorescently tagged snRNP. So let's watch that movie. This movie shows two control fields of view and then one experimental field of view on the right. The field of view all the way to the left, we have the wild type pre-mRNA present. We can't see that in this field of view because we're not looking in that channel. We're looking at the Cy3 channel, which is the TMP channel. And we also have the Cy3-TMP in the extract, but there is no tagged protein. 
So you can see that we do have a little bit of background with material binding non-specifically to the slide and so this is why it's important to do those controls, to make sure your background is not too high. In the middle panel, we now have the tagged U1 and the Cy3-TMP but we don't have any pre-mRNA on the slide, so again we only see background binding. And then in the rightmost panel, which is the one with all the blinking lights, we have all three components. So we have tagged U1. We have the pre-mRNA on the surface, and we have Cy3-TMP. Now one thing that you can see immediately from this is that U1 interaction with the RNAs is highly dynamic. So even in the absence of ATP, U1 is binding and releasing multiple times from each pre-mRNA. Now that we know our system's working, let's really do some experiments. And the really cool thing about these experiments is that you can just see the answer with your eyes. So I'm going to show you some movies next, where we've put two fluorescent tags on each of the major subcomplexes. So in one extract, in one quadrant, you're going to see extract that has labels on U1 snRNP, as you've already seen, on two different proteins. Then we have another extract that has labels on U2 snRNP, on the U5 component of the triple snRNP, and also on the NineTeen complex. And in this first set of movies, we're going to not have ATP present, so in the absence of ATP we've known from the studies on gels that the only complex that should form is this E complex, so only U1 should be able to stably interact with the RNA. So now let's look and see if that's the case. Here's a movie, showing four different extracts with a different snRNP labeled in each extract--either U1 in the top left-hand corner, the triple snRNP in the bottom left-hand corner, U2 in the top right-hand corner or the NTC in the bottom right-hand corner.
And in the absence of ATP, what you can see as we saw in the previous movies, that U1 is coming and binding reversibly, but for all the other snRNPs, we do not see any significant binding over background. If we take the data from each of those fields of view and simply count the number of spots over time--the total number of spots over time--what you can see is that in the absence of ATP only U1 builds up. And none of the other snRNPs or the NTC really have much occupancy at any one time in the absence of ATP. So now let's run the whole spliceosome cycle, so now we're going to add ATP and see what happens. This movie is now in the same order as before but now we've added ATP to the reaction. And if you watch very carefully, you can see the apparent order of addition of the snRNPs. So early in the movie, and the movie is continually looping, you can see that U1 is binding and coming and going. The next snRNP to build up is U2 and then after U2, we start to see U4, 5 and 6 come up. And then the NTC, we see less of it, but it accumulates much later in the reaction. So what you can see from this movie is that we can see in real time all four of these snRNPs binding to the surface that's covered with pre-mRNA molecules. And as I will show you in the next slide, we can see that all of the snRNPs are binding dynamically--that is, they're coming and going, that none of them are coming and staying permanently. One of the things that you can see from those movies is not only can we see that all of the snRNPs are binding in the presence of ATP, but unlike the spliceosome cycle that I showed you before with all the one-way arrows, all of the snRNPs are binding reversibly. And we can see this by--here's looking at one individual RNA molecule, and we're just looking at the intensity over time for that one RNA molecule and you can see for U1 it bound twice. Here's an RNA molecule where two molecules of U2 bound, U5, and the NTC. 
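The idea of counting the total number of spots over time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used for these experiments: the function name `occupancy_over_time` is hypothetical, and it assumes each binding event has already been reduced to an (arrival, departure) pair in seconds.

```python
def occupancy_over_time(intervals, frame_times):
    """Count how many labeled complexes are bound at each frame time.

    intervals   -- (arrival, departure) pairs pooled over all pre-mRNA
                   spots in a field of view (hypothetical input format)
    frame_times -- times at which the camera frames were taken
    A complex counts as bound from its arrival up to, but not including,
    its departure.
    """
    return [
        sum(1 for on, off in intervals if on <= t < off)
        for t in frame_times
    ]


if __name__ == "__main__":
    # Made-up binding events, for illustration only.
    events = [(0, 10), (5, 15), (12, 20)]
    frames = [0, 5, 10, 12, 15, 20]
    print(occupancy_over_time(events, frames))
```

Dividing each count by the number of pre-mRNA spots in the field of view would give an occupancy per pre-mRNA, which makes fields with different numbers of tethered molecules comparable.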
The reason we don't just see two binding events often times especially for U1, we'll see three, sometimes up to ten binding events. The reason we're showing these particular traces is it shows you that this binding is due to reversible binding and not due to photobleaching. So photobleaching is always a problem in these single molecule reactions because under the intense laser light the dyes can often photobleach and then they go blank and then when the signal disappears you don't know if it's because your complex has gone out of the evanescent wave or your dye has photobleached. So this is why we attached two different fluorophores to each snRNP because when we see the stepping behavior, this is either due to dye release because we're using the DHFR tag or it's due to photobleaching. But that means that this molecule which went away in one step really had to be a molecule that went away. The whole complex went away. Because it's very unlikely that you'd have two simultaneous photobleaching steps or two simultaneous dye steps. And also you can see it went away and then another one came back. So again this is another one of the controls that you need to do when you're doing single molecule experiments. Alright so thinking back to those movies, if we count up the total number of spots in each frame and just plot that number here, and this would be the number of dyes per pre-mRNA molecule. You can see now in the presence of ATP U1 builds up first, then U2, then U5, and then the NTC after that. So this gives us an apparent ordered process for spliceosome assembly, and it's consistent with that. But it doesn't actually tell us for any one molecule that the spliceosome assembly was ordered. But we can test that directly with our single molecule methods simply by following two snRNPs at once. So now we're going to do three-color experiments. So one color on the pre-mRNA, one color on--in this case--U2 snRNP, and another color on U1. 
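The two-dye logic for telling complex departure apart from dye loss can be written down as a small classifier. This is a hedged sketch only: the function name and the input format (a per-frame count of visible dyes at one spot, already extracted from the intensity trace) are assumptions for illustration, and real traces would first need noise filtering and step detection.

```python
def classify_disappearance(dye_counts):
    """Label each downward step in a per-frame dye count (0, 1 or 2).

    2 -> 0 in a single frame: both dyes vanish together, so the whole
               complex most likely left.
    2 -> 1: one dye was lost (photobleaching or dye release).
    1 -> 0: ambiguous, since either the last dye was lost or the
            complex left.
    """
    events = []
    for prev, cur in zip(dye_counts, dye_counts[1:]):
        if prev == 2 and cur == 0:
            events.append("complex departed")
        elif prev == 2 and cur == 1:
            events.append("single dye lost")
        elif prev == 1 and cur == 0:
            events.append("ambiguous")
    return events


if __name__ == "__main__":
    # Made-up trace: one clean departure, then a stepwise loss.
    print(classify_disappearance([0, 2, 2, 0, 2, 1, 0]))
```

The design choice mirrors the argument in the talk: a single-frame drop from two dyes to none is attributed to departure because two independent dye-loss events in the same frame are very unlikely.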
And in the same experiment, by watching these snRNPs simultaneously, we can see: does U1 come first or does U2 come first? And so here's what these data look like. So here again is one of these individual single molecule traces, but this is one pre-mRNA molecule where U1 and U2 came to the same pre-mRNA in the same extract. And you can see very clearly that U1 came first and then U2 came. But we can quantify this by measuring the on time for both U2 and U1 and taking this difference, the time of arrival of U2 minus the time of arrival of U1, and that's the delay time between the two. So if that number's positive, that means U2 came after U1. If that number is negative, that means U2 came before U1. And then we can look at this number over many different pre-mRNA molecules. So this is a so-called probability density plot. It's a bar graph where the probability density is the fraction of events in a bin divided by the bin width. The important thing to see is that here we're looking at 82 different molecules and that almost all of them had a positive number for this tU2 minus tU1. So that means that on almost all of them, U2 came after U1. Now there were a few here where U2 apparently came first, but that would be consistent with the amount of labeling of our extracts because we can't get complete, 100% labeling. It is impossible without our extracts dying. So we have about 90% labeling efficiency in our extracts, and this level would be consistent with a dark U1 coming before U2. So it's not inconsistent with an ordered model. So we've done this for all of the pairs of the complexes. Here's U2 versus U1. This is another set of data now with 111 molecules. Here's U5 versus U2, and here's the NTC versus U5. And you can see that in all of these, most of the events give you a positive number, so therefore the second complex came after the first complex.
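The delay-time analysis described above (arrival of the second complex minus arrival of the first, then a probability density histogram) can be sketched as follows. This is a minimal illustration, not the lab's analysis code: the function names are hypothetical, the arrival times are made up, and the density is defined here as the fraction of events in a bin divided by the bin width so that the bar areas sum to one.

```python
def delay_times(first_arrivals, second_arrivals):
    """Per-molecule delay: arrival of the second complex minus arrival of
    the first (for example t_U2 - t_U1). Positive means the second
    complex came later."""
    return [t2 - t1 for t1, t2 in zip(first_arrivals, second_arrivals)]


def probability_density(values, bin_edges):
    """Bar heights for a probability density histogram:
    (fraction of values in the bin) / (bin width).
    Bins are half-open, [low, high)."""
    n = len(values)
    heights = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        count = sum(1 for v in values if low <= v < high)
        heights.append(count / n / (high - low))
    return heights


def fraction_positive(delays):
    """Fraction of molecules on which the second complex arrived later."""
    return sum(1 for d in delays if d > 0) / len(delays)


if __name__ == "__main__":
    # Made-up arrival times in seconds, for illustration only.
    t_u1 = [10, 20, 30, 40]
    t_u2 = [15, 18, 50, 70]
    delays = delay_times(t_u1, t_u2)
    print(delays)
    print(probability_density(delays, [-10, 0, 10, 40]))
    print(fraction_positive(delays))
```

Using unequal bin widths is allowed with this definition, because dividing by the width keeps the total area at one regardless of how the bins are chosen.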
And so that leads us to conclude that for this particular pre-mRNA that we've been working with (and this is RP51A--it's a model splicing substrate in yeast), it is a highly ordered assembly pathway with U1 almost always preceding U2 and then the triple snRNP, and then the NTC comes after that. But what we now know that's new that we didn't know before is that every step in this pathway is reversible. So that means that the pre-mRNA is not necessarily committed to splicing at the very first step with U1 addition but that it increasingly gets committed as you go through the spliceosome assembly pathway. Also, in terms of alternative splicing in the human system if the spliceosome can be disassembled, for example here at later points, then you can imagine you could inhibit splicing at particular splice sites anywhere along this pathway because it could go backward along the pathway if it's inhibited. So this has important implications for our understanding of alternative splicing. Finally, I need to thank the people who actually did the work. And obviously anything this complicated took the input of many different people. And so from my laboratory I showed you data today from Melissa, Aaron, Danny, Eric, Jing, and Nick contributed. Also all of this work was done in collaboration with Jeff Gelles's lab, which developed the CoSMos technology, in particular Larry and Alex. And then also the Cornish Laboratory, who developed the DHFR labeling technique, and finally the New England Biolabs for their help with the SNAP tag. And this work was funded by HHMI and the National Institutes of Health. Thank you very much.