Pranav Kalavade - Intel - NAND cell operation, disturb, technology scaling

So, yes, everybody is making all kinds of drives. There is really no single class of enterprise drives; there is a whole bunch of classes. The real high-performance one, a petabyte drive, was published, I think, two weeks ago. I forget who it was from, but yes, they make SLC drives that can go 250K cycles or north of that. So it's a very high cycle count, and yes, the enterprise guys look at that, and they do pay a premium for it. The market size, however, is very small. So that's the [UNKNOWN]. So it all comes down to what is good enough. MLC can become good enough, and then suddenly there is [INAUDIBLE] demand for SLC, and TLC can also become good enough, and then again you'll have no demand for MLC, because it becomes cheaper to make.
So we talked a little about read. The second thing we wanted to do was erase. The name Flash comes about because this technology is not bit-erasable. You have to erase a whole bunch of bits in the same shot. It's good for parallelism, but it's very bad if you just want to do a bit-toggling kind of thing; it adds a whole bunch of overhead. The flash erase mechanism is very simple: it's Fowler-Nordheim tunneling. You apply a very high positive voltage on the well, hold the control gate at ground, create the large negative field that you want, and you can take off all the electrons that you've put in the floating gate. The characteristics are very textbook-like: erase versus time follows log-time behavior. And the erase is done very fast. If I issue an erase command, a whole block, which is something like four megabytes or two megabytes, is going to be erased in one millisecond. So if you [UNKNOWN] over a bit, it's a very fast operation compared to program. So program, the last operation that we really want to do, is again exactly the same, but in the reverse direction.
So you're applying a high voltage on the gate and grounding the substrate. The actual program time is relatively small compared to erase, so you can finish the whole operation in something like 100 microseconds. The beauty of NAND, again, is that it's all displacement current. The injection efficiency is essentially one. You're not really wasting any current, unlike [INAUDIBLE], where you have to supply a whole bunch of current and then hope for some lucky electron to actually go into the floating gate. As a result, you can do a whole bunch of parallelism. You can write 16 Kbytes in one shot, and that gives you very, very competitive [UNKNOWN] in applications like cameras, smartphones, or even SSDs. So a low programming current is a key feature, and it enables the real parallelism that we have in NAND Flash.
The biggest risk in functionality for NAND is program disturb, or program inhibit. So I'm trying to program a cell; that's great. But when you put it even in a small array, you need to make sure that the program operation is not messing up the rest of the data that you've written or are going to write. This is a very key part of NAND, and it is always the hardest one to meet as a device engineer. It is a very hard criterion to meet for every single technology. I've probably worked on four technologies by now, and this is the problem that we really face. So what do I mean by program disturb? I am trying to program this cell. The way I program is, I take the programming word line high, and I make sure that the channel of the Flash cell is held at ground by keeping the select gate open (conducting) and the bit line grounded. So essentially I have the whole 20 volts sitting across the tunnel oxide, and I can program. How do I make sure that the other cells on the same word line, which is the odd page in this particular example, don't get programmed?
The way to do that is to make sure the bit line is cut off from the world, and as a result the substrate here doesn't really have any grounding going on; it's not connected to any fixed bias. So when you take all the word lines to the program or inhibit voltage, the voltage goes up, the channel comes along with it for the ride, and the field is essentially reduced. You have the channel going to something like six or seven volts. Twenty minus seven is 13 volts, which is much smaller than twenty minus zero and too small for appreciable FN tunneling, so you don't really get a disturb on this particular word line. So what you use is capacitive coupling between the control gate and the channel to boost the channel high enough to actually inhibit programming. >> [INAUDIBLE] >> Yes. So essentially you're making sure that the bit line is floating, and it just goes along for the ride. The only way the boost voltage really goes down is through generation-recombination, which is our friend: it takes a long time, because the silicon quality is pretty good.
When I say capacitive coupling, you can clearly see that not only the program voltage is important, but also the inhibit voltage. The larger the voltages used to couple the channel up, the stronger this capacitive boost is going to be, and the higher this voltage that I show, which is six in this particular example, is going to be, and the better it is for my program disturb. So it's tempting to just crank it up as high as you want. But there is another factor that limits us from doing that, and that is inhibit disturb. What happens to the cells on the selected string now? We have grounded the channel, so all of them are sitting at zero. We need to do that so that this guy programs at 20 volts. If you crank up the voltage on the unselected word lines, the electric field on those word lines is going to be pretty high, and those cells are going to start programming up.
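The bias arithmetic above can be laid out as a small Python sketch. The 20 V program voltage and the roughly 7 V boosted channel come from the talk; the lumped `BOOST_RATIO`, the pass-voltage values, and the function names are illustrative assumptions, not device data. The linear boost model also ignores leakage and the separate contribution of the selected word line.

```python
V_PGM = 20.0        # program voltage on the selected word line (from the talk)
BOOST_RATIO = 0.7   # hypothetical lumped coupling: floating channel rises to 0.7 * V_pass


def gate_to_channel_voltage(v_gate, v_channel):
    """Voltage that stresses the cell in this simplified picture."""
    return v_gate - v_channel


def boosted_channel(v_pass, boost_ratio=BOOST_RATIO):
    """Assumed linear model of a floating channel riding up with the word lines."""
    return boost_ratio * v_pass


def disturb_stresses(v_pass):
    """Return (program-disturb stress, inhibit-disturb stress) in volts."""
    v_channel = boosted_channel(v_pass)
    # Inhibited cell on the selected word line: gate at V_PGM, channel boosted.
    program_disturb = gate_to_channel_voltage(V_PGM, v_channel)
    # Cell on an unselected word line of the selected string: channel grounded.
    inhibit_disturb = gate_to_channel_voltage(v_pass, 0.0)
    return program_disturb, inhibit_disturb


for v_pass in (6.0, 8.0, 10.0):
    pd_stress, id_stress = disturb_stresses(v_pass)
    print(f"V_pass={v_pass:4.1f} V  program-disturb stress={pd_stress:5.1f} V  "
          f"inhibit-disturb stress={id_stress:4.1f} V")
```

In this toy model, raising the pass voltage lowers the first number and raises the second, which is the trade-off the inhibit voltage has to balance.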
So that limits us from increasing the inhibit voltage too high, and that's what sets the engineering boundary: you need to go as high as you can, but not too high, because then you're going to have inhibit disturb concerns. So program disturb is on the cells on the word line that is being programmed, whereas inhibit disturb is on the other word lines in the block, on the string that is being programmed. This, along with the other criterion that I mentioned in terms of [UNKNOWN], are the two big reliability concerns for NAND, and then everything just gets worse with cycling. But essentially those are the two functionality risks that we have in NAND. Any questions? Comments?
All right, so moving on to scaling. We talked about the basics of Flash. We know the structure, which is a floating gate. We know how to read, how to program, and how to erase. Let's see how it actually behaves in the real world when we start making things smaller and smaller. We are probably, at least I was when I was a grad student, a lot more aware of CMOS scaling issues than of Flash scaling issues. So how does CMOS scaling work? You make all dimensions small, you make all voltages small, and that's how you keep scaling. Flash scaling is very different from that kind of approach. You have a very hard limit on how thin you can make the tunnel oxide, or gate oxide. You just can't go below about 70 angstroms, because you're trying to guarantee retention, and if you go below that, hopping and all those retention mechanisms kick in, and you lose retention. So it's set at a minimum thickness, and that thickness is around 70 angstroms. If you try to go below that, you're essentially trading retention for other capabilities. So you can't do that, which is the bread and butter of the whole MOS [INAUDIBLE] industry. They use high-k and all that to make the effective gate oxide thinner. You can't really do that in Flash.
The ONO has a very similar limit. You can't really go below a certain thickness, for exactly the same vertical-field reasons, and that is limited to around 120 to 140 angstroms electrically. The other thing we do in CMOS is scale down the voltages. No, you're not allowed to do that either, because you need to have 20 volts between the control gate and the channel to either program or erase, and making cells smaller laterally doesn't change that voltage at all. So you're stuck with very high voltages. You're stuck with very tall, non-scaling vertical stacks in some sense, but you want to make things smaller in the horizontal direction. And you can see that this is essentially not an engineer-friendly situation, where you're increasing all kinds of fields but not really allowing things to scale down proportionally. So limited voltage and oxide scaling essentially set Flash on a very different scaling paradigm than CMOS. Sorry, I didn't see that, go ahead. >> [INAUDIBLE] >> If you scale the oxide, the voltage goes down, but bullet number one says, hey, don't scale the oxide, because then you don't have any retention left. So you can go all the way; you can think of it like a [UNKNOWN]. If we go to 20 angstroms, it's essentially not going to have any retention, but your voltage is going to go down. But it's a nonvolatile memory; you want the data to be good for five years. And as a result it just follows that you can't scale the vertical stack at all, at least in the conventional CMOS sense.
The good news is that we have been able to scale anyway. Cell area has scaled 2X every generation for as long as I can show on this particular plot, and it keeps the engineering minds very excited, because every scaling node is that much more challenging.
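A rough Python sketch of why this is "not engineer friendly": if cell area halves every generation while the voltages stay fixed, lateral spacings shrink by about 0.71x per node and any field of the form voltage over spacing grows by about 1.41x. The assumption that every lateral spacing scales with the square root of the cell area is a simplification made for illustration, and the function names are hypothetical.

```python
import math

AREA_FACTOR_PER_GENERATION = 0.5  # "cell area has scaled 2X every generation"


def relative_area(generations):
    """Cell area relative to the starting node."""
    return AREA_FACTOR_PER_GENERATION ** generations


def relative_spacing(generations):
    """Lateral spacing relative to the starting node (assumed ~ sqrt(area))."""
    return math.sqrt(relative_area(generations))


def relative_lateral_field(generations):
    """Field ~ V / spacing with V held fixed, relative to the starting node."""
    return 1.0 / relative_spacing(generations)


for n in range(5):
    print(f"generation {n}: area x{relative_area(n):.4f}, "
          f"spacing x{relative_spacing(n):.3f}, "
          f"lateral field x{relative_lateral_field(n):.2f}")
```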
So a second plot, a challenge-versus-node curve, is essentially, I would say, logarithmic [UNKNOWN] in the reverse sense, where it's really getting very hard to scale and you need to be very creative to scale. I look at it as a good opportunity to be very creative and contribute a lot. So it's been fun working on these kinds of scaled nodes. What are the key challenges? We talked qualitatively about voltage scaling. Is there anything else? There are a lot of other challenges. The key challenge from a floating-gate NAND point of view is interference, cell-to-cell interference. On a much earlier slide I said that the gate coupling ratio is very important, which means how much control the control gate has over the floating gate versus all the parasitics that a cell has. You can imagine that as things come closer, the parasitics are going to dominate over the control of the actual gate, and that's essentially what it boils down to, [UNKNOWN] called cell-to-cell interference. That really sets a pretty hard limit. In subsequent slides, what I'm going to try to do is tell you what the first-principles limits are, and what engineering solutions we have been using as an industry, and at Intel and Micron, to actually overcome those. So that's one key limiter. We'll go into details in the subsequent slides.
Few-electron effects. We also went through a very simple equation which said that as you scale down capacitances, the number of electrons that you need to program and to retain a particular state also goes down proportionally. As you start dealing with a finite number, I mean a few electrons, you have to start worrying about the next level of problem, where random telegraph noise becomes a deal breaker in some sense, as well as program fluctuations, where every electron counts: if you're trying to put ten electrons, you can end up putting nine or 11, and that makes a pretty big difference to the Vt you are programming.
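The few-electron argument can be made concrete with a short Python sketch. It assumes the "very simple equation" has the usual charge-over-capacitance form, delta_Vt = n * q / C, where C is the control-gate-to-floating-gate capacitance. The capacitance values below are hypothetical round numbers chosen only to show the trend, and the function names are illustrative.

```python
ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def vt_shift_volts(n_electrons, coupling_capacitance_f):
    """Threshold shift from n stored electrons, assuming delta_Vt = n * q / C."""
    return n_electrons * ELECTRON_CHARGE_C / coupling_capacitance_f


def electrons_for_shift(delta_vt_volts, coupling_capacitance_f):
    """Number of electrons needed for a given threshold shift (same assumption)."""
    return delta_vt_volts * coupling_capacitance_f / ELECTRON_CHARGE_C


# Hypothetical capacitances, shrinking by 10x each step.
for capacitance_f in (1e-16, 1e-17, 1e-18):
    per_electron_mv = vt_shift_volts(1, capacitance_f) * 1e3
    needed = electrons_for_shift(1.0, capacitance_f)
    print(f"C={capacitance_f:.0e} F: {needed:7.1f} electrons per volt of shift, "
          f"{per_electron_mv:7.2f} mV per electron")
```

As the capacitance drops, fewer electrons define a state, so a single-electron error (nine or 11 instead of ten) moves the threshold by a proportionally larger step.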
Those are pretty fundamental limits, and we deal with them, but they are pretty fundamental limits as you keep scaling the cell size. We also talked about voltage scaling, so I won't repeat myself here, and then I'll go into what the other solutions would be for future scaling and trends.
Cell-to-cell interference. With scaling, the influence of the neighboring cells becomes important as well. Just from a 2D cross-section diagram, you can see the neighboring control gates coming closer to the floating gate, so that capacitance is going to go up. The source and drain capacitance is going to go up, and the word-line-to-word-line, or floating-gate-to-floating-gate, capacitance in the word line direction is going to go up. And exactly the same thing happens in the other direction also. Now, this is a good picture maybe for a hundred-nanometer node, but as you keep scaling down, you need to worry about many, many second-order effects also. For example, the depletion layer. You have a floating-gate-to-channel coupling, which we didn't even know existed at the 70-nanometer node, but it's a very large proportion on our floating-gate node today. What really happens here is that the floating-gate potential of cell one affects the inverted channel of cell two, shown here. So effectively, if cell one is programmed, a higher gate potential is required to achieve the same inversion layer, whereas if it's not, you need a lower voltage. That adds to the effect of coupling, and it is not a direct floating-gate-to-floating-gate, two-conductor parallel-plate kind of problem. This is actually affecting your depletion regions at the bottom. So it's the next level of the problem. And as a result, capacitances increase pretty non-linearly. So we tried to plot the historic trend.
Floating-gate interference would have become about 50% of a cell's whole capacitance at the node that we are making today. So at the 20-nanometer node, on historic projections, if we weren't trying to be smart about things, you would have had essentially no control from the gate. All the cells would have been controlled by the parasitics. So what did we really do? Again, the idea is pretty simple; the trick is actually making sure it works. We put in air gaps to reduce the dielectric constant and, as a result, the capacitance. We have tried putting in air gaps in different ways, and this is our paper at IEDM three years ago, two years ago. On our 25-nanometer node, we were the first ones to actually put in an air gap. I think a [UNKNOWN] student did something similar in the bit line direction a long, long time ago, but we are putting it very, very close to the cell, right there, and that has been a savior in terms of coupling. You can see that at the 25-nanometer node, where we would have been at something like 37%, we are actually at a very, very manageable 25%, and that gives us at least one, maybe two, scaling-node opportunities to actually make things work. So we put in an air gap between the floating gates.
The other things that we do are not as visually impressive, but we do a whole bunch of floating-gate thickness reduction. Again, this is a parallel-plate example: you make things as small as possible and you reduce the coupling. And behind this there is a whole bunch of algorithmic tricks that are also being played. You want to make sure that you are programming in a way that minimizes the effect of neighbors being programmed. So we do a large amount of algorithm innovation also, to keep the impact of floating-gate coupling the same.
So you can think of this as: first, go and change the floating-gate coupling in the first place, and second, be smart about how you program things, so that the impact for the same floating-gate coupling is also optimized.
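As a back-of-the-envelope check on the air-gap numbers quoted above (about 37% interference without the gap, about 25% with it), the Python sketch below treats interference as parasitic capacitance divided by total capacitance and asks how much the parasitic term must shrink. This lumped two-capacitor model and the function names are assumptions made for illustration; the relative permittivity of about 3.9 for silicon dioxide is the usual textbook value, and the talk does not say which dielectric the air gap replaces.

```python
EPS_R_OXIDE = 3.9  # textbook relative permittivity of SiO2 (assumed fill material)
EPS_R_AIR = 1.0


def interference_fraction(c_parasitic, c_other):
    """Share of the total capacitance that comes from parasitic coupling."""
    return c_parasitic / (c_parasitic + c_other)


def fraction_after_scaling(fraction_before, parasitic_scale):
    """New interference fraction if only the parasitic term is multiplied by a factor."""
    c_parasitic = fraction_before * parasitic_scale
    c_other = 1.0 - fraction_before
    return interference_fraction(c_parasitic, c_other)


def implied_parasitic_scale(fraction_before, fraction_after):
    """Factor by which the parasitic term must shrink to reach fraction_after."""
    return (fraction_after * (1.0 - fraction_before)) / (
        fraction_before * (1.0 - fraction_after))


scale = implied_parasitic_scale(0.37, 0.25)
print(f"37% -> 25% implies parasitic capacitance x{scale:.2f}")

# Idealized upper bound: the whole parasitic path switches from oxide to air.
ideal_scale = EPS_R_AIR / EPS_R_OXIDE
print(f"full oxide-to-air swap: x{ideal_scale:.2f}, "
      f"fraction {fraction_after_scaling(0.37, ideal_scale):.1%}")
```

Under these assumptions the quoted improvement corresponds to roughly a 40 to 45% cut in the parasitic term, well short of the idealized full swap, which is consistent with an air gap that replaces only part of the dielectric between cells.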