So, yes, everybody is making all kinds of drives.
There is really no one class of enterprise drives;
there is a whole bunch of classes of enterprise drives.
The real high performance, a petabyte drive,
was published, I think, two weeks ago.
I forget who it was from, but yes, they make
SLC drives, they can go 250K cycles or north of that.
So it's a very high cycle count, and
yes, the enterprise guys look at that, and they do pay a premium for it.
The market size, however, is very small.
So that's the
[UNKNOWN].
So, it all comes down to what is good enough.
MLC can become good enough, and then suddenly the
[INAUDIBLE]
demand for SLC, and TLC can also become good enough, and then again you'll have
no demand for MLC, because TLC is cheaper to make.
So we talked a little about read.
The second thing we wanted to do was erase.
The name Flash comes about because this technology is not bit erasable.
You have to erase a whole bunch of bits in the same shot.
It's good for parallelism, but it's very bad if
you just want to do a bit-toggling kind of thing.
It adds a whole bunch of overhead that you really have to deal with.
So the erase mechanism is very simple.
It's by Fowler-Nordheim tunneling.
You apply a very high positive voltage on the well,
hold the control gate at ground, create the
large negative field that you want, and you
can pull off all the electrons that you've put in the floating gate.
The characteristics are very textbook-like.
They follow a log-time behavior in terms of erase versus time.
And the erase is done very fast.
So if I issue an erase command, a whole block,
which is something like four megabytes or two
megabytes, is going to be erased in one millisecond.
So if you
[UNKNOWN]
over a bit, it's a very fast operation as compared to program.
So, program, the last operation that we really want to
do, is again exactly the same, but in the reverse direction.
So you're applying a high voltage on the gate and grounding the substrate.
The actual program time is relatively small compared to erase,
so you can finish the whole operation in something like 100 microseconds.
The beauty of NAND, again, is that it's all displacement current.
The injection efficiency is essentially one.
You're not really wasting any current, unlike
[INAUDIBLE]
where you have to supply a whole bunch of current and then
hope for some lucky electron to actually go into the floating gate.
And as a result, you can actually do a whole bunch of parallelism.
So you can write 16 K bytes in one shot and that gives you very, very competitive
[UNKNOWN]
words in applications like cameras, smart phones, or even SSDs.
So a low programming current is a key feature and
that enables the real parallelism that we have in NAND Flash.
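To put rough numbers on that parallelism, here is a minimal back-of-the-envelope sketch in Python. It uses only the illustrative figures quoted above (a 16 KB page programmed in about 100 microseconds, a 4 MB block erased in about 1 millisecond). The function name and the convention 1 MB = 10^6 bytes are assumptions made for illustration, and real parts vary.

```python
# Illustrative figures quoted in the lecture; real devices differ.
PAGE_BYTES = 16 * 1024           # "16 K bytes in one shot"
PROGRAM_TIME_S = 100e-6          # "something like 100 microseconds"
BLOCK_BYTES = 4 * 1024 * 1024    # "four megabytes" per block
ERASE_TIME_S = 1e-3              # "one millisecond"


def throughput_mb_per_s(num_bytes, seconds):
    """Bytes handled per second, in MB/s (assuming 1 MB = 10**6 bytes)."""
    return num_bytes / seconds / 1e6


# Raw rate of one page program and one block erase on a single die.
program_rate = throughput_mb_per_s(PAGE_BYTES, PROGRAM_TIME_S)
erase_rate = throughput_mb_per_s(BLOCK_BYTES, ERASE_TIME_S)

# Erase time amortized over every bit in the block.
erase_time_per_bit = ERASE_TIME_S / (BLOCK_BYTES * 8)

print(f"program: {program_rate:.2f} MB/s")
print(f"erase:   {erase_rate:.2f} MB/s")
print(f"erase time per bit: {erase_time_per_bit:.3e} s")
```

These are raw array rates only; they ignore data transfer, verify steps, and controller overhead.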
The biggest functionality risk for
NAND is program disturb, or program inhibit.
So I can program one cell, that's great.
But when you put it in even
a small array, you need to make sure that the
program operation is not messing up the rest of
the data that you've written or are going to write.
So this is a very key part of NAND and
always is the hardest one to meet as a device engineer.
This is a very hard criterion to meet for every single technology.
So I've probably worked on four technologies by now
and this is the problem that we really face.
Program disturb.
What do I mean by program disturb?
So I am trying to program this cell.
The way I program is, I have taken the programming word line high.
I have made sure that the channel of the Flash cell is held
to ground by keeping the select gate open and the bit line grounded.
So, essentially I have the whole 20 volts
sitting across the tunnel oxide and I can program.
How do I make sure that the other cells on
the same word line, which is the odd page in this particular example,
don't get programmed?
The way to do that is to make sure that the bit line is cut off from the world,
and as a result the substrate here doesn't really have
any grounding going on; it's not connected to any fixed bias.
So when you take all the word lines to program or inhibit, the voltage goes up.
The channel comes along with it for a ride and the field essentially reduces.
So you have the channel going to something like six or seven volts.
Twenty minus seven is 13 volts, which is much smaller than the 20 volts
you need for Fowler-Nordheim tunneling, and so
you don't really get a disturb on this particular word line.
So what you use is capacitive coupling between
the control gate and the channel to boost the channel
high enough to actually inhibit programming.
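The boosting arithmetic above can be sketched in a few lines of Python. This is a toy model, not a device simulation: it assumes the floating channel follows the average word-line voltage scaled by a boost ratio below one. The 20 V program voltage and the 6 to 7 V boosted channel come from the lecture; the 9 V pass voltage, 64 word lines, and 0.7 boost ratio are hypothetical values chosen so the result lands in that range.

```python
def boosted_channel_voltage(v_pgm, v_pass, n_wordlines, boost_ratio):
    """Estimate the potential of a floating (inhibited) channel.

    Assumed toy model: the channel follows the average word-line voltage,
    scaled by a boost ratio < 1 that stands in for the capacitance tying
    the channel to the substrate.
    """
    avg_wordline_v = (v_pgm + (n_wordlines - 1) * v_pass) / n_wordlines
    return boost_ratio * avg_wordline_v


V_PGM = 20.0        # program voltage on the selected word line (from the lecture)
V_PASS = 9.0        # hypothetical inhibit/pass voltage on the other word lines
N_WORDLINES = 64    # hypothetical string length
BOOST_RATIO = 0.7   # hypothetical coupling efficiency

v_ch_inhibited = boosted_channel_voltage(V_PGM, V_PASS, N_WORDLINES, BOOST_RATIO)

# Gate-to-channel voltage seen by cells on the selected word line.
stress_selected = V_PGM - 0.0              # channel held at ground: programs
stress_inhibited = V_PGM - v_ch_inhibited  # channel boosted: should not program

print(f"boosted channel:  {v_ch_inhibited:.2f} V")
print(f"selected cell:    {stress_selected:.2f} V")
print(f"inhibited cell:   {stress_inhibited:.2f} V")
```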
>>
[INAUDIBLE]
>> Yes.
So essentially you're making sure that the bit line is floating,
and the channel just goes along for the ride.
And the only way the boost
voltage really goes down is through generation-recombination,
which is our friend here:
it takes a long time, since the silicon quality is pretty good.
When I say capacitive coupling, you can clearly see
that not only is the program voltage important, but so is the inhibit voltage.
The larger the voltages used to couple it
up, the higher this capacitive boost is going to be,
the higher this voltage that I show, which
is six in this particular example, is going to be,
and the better it is going to be for my program disturb.
So, one is tempted to just crank it up as high as you want.
But, there is another factor
that limits us from doing that and that is inhibit disturb.
So, what happens to the cells on the selected string now?
We have grounded the channel; all of them are sitting at zero.
We need to do that so that this cell programs at 20 volts, and if you crank up
the voltage on the unselected word lines, the electric
field on those word lines is going to be pretty high,
and those cells are going to start programming up.
So that
limits us from increasing the inhibit voltage too much.
And that's what sets the engineering boundary:
you need to go as high as you
can, but not too high, because then you're going to have inhibit disturb concerns.
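The tradeoff between the two disturb mechanisms can be tabulated with a small Python sketch. It assumes a toy model in which the inhibited channel follows the average word-line voltage times a boost ratio; the 20 V program voltage is from the lecture, while the 64 word lines, the 0.7 boost ratio, and the swept inhibit voltages are hypothetical. It only shows the direction of the two trends, not where a real design would sit.

```python
def disturb_stresses(v_pass, v_pgm=20.0, n_wordlines=64, boost_ratio=0.7):
    """Return (program_disturb_v, inhibit_disturb_v) for one inhibit voltage.

    program_disturb_v: gate-to-channel voltage on the selected word line
        of an inhibited (boosted) string.
    inhibit_disturb_v: gate-to-channel voltage on an unselected word line
        of the selected string, whose channel is held at ground.
    """
    avg_wordline_v = (v_pgm + (n_wordlines - 1) * v_pass) / n_wordlines
    v_channel = boost_ratio * avg_wordline_v   # assumed toy boosting model
    program_disturb_v = v_pgm - v_channel
    inhibit_disturb_v = v_pass - 0.0
    return program_disturb_v, inhibit_disturb_v


# Sweep a few hypothetical inhibit voltages.
sweep = [(v_pass, *disturb_stresses(v_pass)) for v_pass in (6.0, 8.0, 10.0, 12.0)]

for v_pass, program_disturb_v, inhibit_disturb_v in sweep:
    print(f"V_inhibit={v_pass:5.1f} V  "
          f"program-disturb stress={program_disturb_v:5.2f} V  "
          f"inhibit-disturb stress={inhibit_disturb_v:5.2f} V")
```

Raising the inhibit voltage lowers the first column and raises the second, which is the engineering boundary described above.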
So program disturb is on the cells on the
word line that is being programmed, whereas inhibit disturb
is on the other word lines in the
block, on the string which is being programmed.
This, along with the, other criteria that I mentioned in terms of
[UNKNOWN]
are the two big reliability concerns for NAND
and then everything just gets worse with cycling.
But, essentially, those are the two functionality
risks that we have in NAND.
Any questions?
Comments? Alright, so moving on to
scaling. We talked about the basics of Flash.
We know the structure, which is a floating gate.
We know how to read.
We know how to program. We know how to erase.
Let's see how it actually behaves in the real
world when we start making things smaller and smaller.
We are probably, at least I was when I was a
grad student, a lot more aware of CMOS scaling issues
than of flash scaling issues.
So, how does CMOS scaling work?
You make all dimensions small, you make all
voltages small, and that's how you keep scaling.
Flash scaling is very different from that kind of approach.
You have a very hard limit in terms of how thin you can make the tunnel oxide, or
the gate oxide.
You just can't go below about 70 angstroms because
you're trying to guarantee retention, and
if you go below that, hopping and all
those retention-loss mechanisms kick in, and you lose retention.
So it's set at a minimum thickness, and that thickness is around 70 angstroms.
If you try to go below that,
you're essentially trading retention for other capabilities.
So you can't do that, which is the bread and butter
of the whole CMOS
[INAUDIBLE]
industry.
They use high-k and all that to make the effective gate oxide thinner.
You can't really do that in Flash.
The ONO has a very similar limit.
You can't really go below a certain thickness, for exactly the same
vertical field reasons, and that is
limited to around 120 to 140 angstroms electrically.
The other thing we do in CMOS is scale down the voltages.
No, you're not allowed to do that either, because you need to have 20 volts
between the control gate and the channel to either
program or erase, and making cells smaller laterally
doesn't change that voltage at all. So, you're stuck with very high voltages.
You're stuck with a very tall vertical stack that doesn't scale, in some
sense, but you want to make things smaller in the horizontal direction.
And you can see that this is
essentially not an engineer-friendly situation, where you're
actually increasing all kinds of fields but
not really allowing things to scale down proportionally.
So,
limited voltage and oxide scaling essentially sets Flash on a
very different scaling paradigm than CMOS.
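A small Python sketch can contrast the two paradigms. It uses the plain relation field = voltage / distance. Every number in it is hypothetical and chosen only to show the trend: a CMOS-style node shrinks voltage and oxide thickness together, so the field stays constant, while a flash-style node keeps the voltage difference fixed as the lateral spacing shrinks, so the field climbs.

```python
def field_mv_per_cm(voltage_v, distance_nm):
    """Electric field in MV/cm for a voltage dropped across a distance in nm."""
    distance_cm = distance_nm * 1e-7   # 1 nm = 1e-7 cm
    return voltage_v / distance_cm / 1e6


SCALE = 1.4   # hypothetical linear shrink factor per node
NODES = 3     # number of generations to tabulate

# CMOS-style constant-field scaling: voltage and oxide thickness shrink together.
CMOS_V0, CMOS_TOX0_NM = 2.0, 4.0   # hypothetical starting point
cmos_fields = [
    field_mv_per_cm(CMOS_V0 / SCALE**n, CMOS_TOX0_NM / SCALE**n)
    for n in range(NODES)
]

# Flash-style scaling: the voltage difference stays put while spacing shrinks.
FLASH_DELTA_V = 10.0                      # hypothetical word-line to word-line difference
FLASH_SPACINGS_NM = [50.0, 35.0, 25.0]    # hypothetical lateral spacings
flash_fields = [field_mv_per_cm(FLASH_DELTA_V, s) for s in FLASH_SPACINGS_NM]

for n in range(NODES):
    print(f"node {n}: CMOS field {cmos_fields[n]:.2f} MV/cm, "
          f"flash lateral field {flash_fields[n]:.2f} MV/cm")
```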
Sorry, I didn't see that, go ahead. >>
[INAUDIBLE]
>> If you scale the oxide the voltage goes down, but bullet number one
says, hey, don't scale the oxide because then you don't have any retention left.
So you can go all the way, you can think of it like a
[UNKNOWN].
If we go to 20 angstroms, it's essentially not going to have any retention,
but your voltage is going to go down.
It's a non-volatile
memory; you want the data to be good for five years.
And as a result,
it just follows that you can't scale the vertical stack at all,
at least in the conventional CMOS sense.
The good news is that we have been able to keep scaling anyway.
So cell area has scaled 2X every generation for
as long as I can show it on this
particular plot, and it keeps the engineering minds very
excited because every scaling node is that much more challenging.
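As a quick illustration of what a 2X area shrink per generation implies, the Python sketch below assumes cell area is proportional to the square of the feature size, so each generation shrinks the linear dimension by a factor of the square root of two. The 50 nm starting point is a hypothetical value for illustration.

```python
import math


def next_node_nm(feature_nm):
    """Feature size after one generation, assuming cell area halves."""
    return feature_nm / math.sqrt(2)


START_NM = 50.0   # hypothetical starting feature size
nodes = [START_NM]
for _ in range(3):
    nodes.append(next_node_nm(nodes[-1]))

# Area ratio between consecutive generations (area assumed ~ feature size squared).
area_ratios = [(a / b) ** 2 for a, b in zip(nodes, nodes[1:])]

for generation, feature_nm in enumerate(nodes):
    print(f"generation {generation}: about {feature_nm:.1f} nm")
```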
So if I plot a challenge-versus-node
curve, it's essentially, I would say, logarithmic
[UNKNOWN]
in the reverse sense, where it's really getting very hard
to scale and you need to be very creative to scale.
I look at it as a good opportunity to be very creative and contribute a lot.
So it's been fun working on these kinds of scaling nodes.
What are the key challenges?
So, we talked qualitatively about the voltage scaling.
Is there anything else?
There are a lot of other challenges.
The key challenge, from a floating gate NAND
point of view, is interference, cell-to-cell interference.
So, in a much earlier slide I said that the gate coupling ratio is very
important, which means how much control the
control gate has over the floating gate
versus all the parasitics that a cell has.
You can imagine that as things come
closer, the parasitics are going to dominate
over the control of the actual gate, and that's essentially what it boils down to,
[UNKNOWN]
called cell-to-cell interference. That really sets a pretty hard limit.
And in subsequent slides, what I'm going to try to
do is tell you what the first-principles limits are,
and what engineering solutions we have been using as an
industry, and at Intel and Micron, to actually overcome those.
So that's one key challenge. We'll go into details in
the subsequent slides. Next, few-electron effects.
We also went through a very simple equation which
said that as you scale down capacitances, the number of
electrons that you need to program, and to
retain a particular state, also goes down proportionally.
As you start dealing with few electrons,
you have to start worrying about the next
level of problem, where random telegraph noise
becomes a deal breaker in some sense,
as well as program fluctuations, where every electron counts.
So if you're trying to put in ten electrons, you
can end up putting in either nine or 11, and that
makes a pretty big difference to the Vt you program.
Those are pretty fundamental limits, and we deal with them, but
they are pretty fundamental limits as you keep scaling the cell size.
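The few-electron argument can be made concrete with the charge relation Q = C * dV. The Python sketch below is an assumption-laden illustration: the capacitance values are hypothetical, and the capacitance is treated as a single lumped value between the control gate and the floating gate. It shows how the electron count for a fixed threshold shift falls with capacitance, and how much one stray electron then matters.

```python
ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge in coulombs


def electrons_for_shift(capacitance_f, delta_vt_v):
    """Number of stored electrons needed for a given threshold shift (Q = C * dV)."""
    return capacitance_f * delta_vt_v / ELECTRON_CHARGE_C


def vt_shift_per_electron(capacitance_f):
    """Threshold shift in volts caused by adding or losing one electron."""
    return ELECTRON_CHARGE_C / capacitance_f


TARGET_SHIFT_V = 1.0   # hypothetical threshold shift between two states

# Hypothetical lumped capacitances, shrinking as the cell scales down.
for capacitance_f in (100e-18, 10e-18, 1e-18):
    n_electrons = electrons_for_shift(capacitance_f, TARGET_SHIFT_V)
    per_electron_mv = vt_shift_per_electron(capacitance_f) * 1e3
    print(f"C = {capacitance_f * 1e18:6.1f} aF: "
          f"{n_electrons:7.1f} electrons per volt of shift, "
          f"{per_electron_mv:6.1f} mV per electron")

# With ten electrons intended, landing on nine or eleven is a 10% error.
relative_error_at_ten = 1 / 10
```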
We also talked about voltage scaling.
So I won't repeat myself here, and then I'll go
into what the solutions would be for future scaling and trends.
First, cell-to-cell interference.
So, with scaling, the influence of
the neighboring cells becomes important as well.
So, in this
cross-section 2D diagram, you can see the neighboring
control gates coming closer to the floating gate.
That capacitance is going to go up.
The source/drain capacitance is going to go up; the word line to word line, or
floating gate to floating gate, capacitance in the word
line direction is going to go up.
And exactly the same thing happens in the other direction also.
Now this is a good picture maybe for a hundred nanometer node, but as
you keep scaling down, you need to
worry about many, many second-order effects also.
For example, the depletion layer.
So you have a floating gate to channel coupling,
which we didn't even know existed at the 70 nanometer node,
but it's a very large proportion at our floating gate node today.
And what really happens here is that the floating gate potential of cell one
affects the inverted channel of cell two, shown here.
So effectively, if cell one is programmed, a higher gate potential is required to
achieve the same inversion layer, whereas if it's not, you need a lower voltage.
So that adds to the effect of coupling, and
this is not a direct floating gate to floating gate,
two-conductor parallel-plate kind of problem.
This is actually affecting your depletion regions at the bottom.
So it's the next level of the problem.
And as a result, capacitances increase pretty non-linearly.
So, if you
try to plot the historic trend,
floating gate interference would have become about 50%
of the whole cell capacitance at the node that we are making today.
So, if you are at the 20 nanometer node on the historic projection, and we weren't trying
to be smart about things, you would have had essentially no control from the gate.
All the cells would have been controlled by the parasitics.
So what did we really do?
Again, the idea is pretty simple.
The trick is actually making sure it works.
We put in air gaps to essentially reduce
the dielectric constant and, as a result, the capacitance.
So we have tried putting in air gaps in a different way, and
this is our paper at IEDM three years ago, two years ago.
And in our 25 nanometer node, we actually were
the first ones to put in an air gap.
I think a
[UNKNOWN]
student did something similar in the bit line direction a long, long time ago.
But we are actually putting it very, very close to the cell,
right there, and that has been a savior in terms of coupling.
So, you can see at the 25 nanometer node, where we would have been at
something like 37%, we are actually more like
a very, very manageable 25%, and that gives us
at least one,
maybe two scaling nodes' worth of opportunity to actually make things work.
So, we put in an air gap between the floating gates.
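To see how much parasitic capacitance has to go away to move from roughly 37% interference to roughly 25%, here is a minimal Python sketch. It assumes the interference figure is the parasitic share of total cell capacitance and that only the parasitic part changes; the function names are illustrative. The relative permittivities (about 3.9 for silicon dioxide, about 1.0 for air) are standard textbook values, and the parallel-plate picture is an idealization.

```python
def parasitic_fraction(c_parasitic, c_gate):
    """Share of total capacitance that comes from parasitic coupling."""
    return c_parasitic / (c_parasitic + c_gate)


def required_parasitic_scale(fraction_before, fraction_after):
    """Factor by which parasitic capacitance must shrink, gate capacitance fixed."""
    ratio_before = fraction_before / (1.0 - fraction_before)
    ratio_after = fraction_after / (1.0 - fraction_after)
    return ratio_after / ratio_before


# Figures quoted in the lecture for the 25 nm node.
FRACTION_WITHOUT_AIR_GAP = 0.37
FRACTION_WITH_AIR_GAP = 0.25

scale = required_parasitic_scale(FRACTION_WITHOUT_AIR_GAP, FRACTION_WITH_AIR_GAP)

# Ideal parallel-plate limit if the whole oxide gap were replaced by air:
# capacitance scales with relative permittivity (C = eps0 * eps_r * A / d).
EPS_R_SIO2 = 3.9
EPS_R_AIR = 1.0
full_air_scale = EPS_R_AIR / EPS_R_SIO2

# Consistency check: apply the scale to a normalized cell (gate capacitance = 1).
c_parasitic_before = FRACTION_WITHOUT_AIR_GAP / (1.0 - FRACTION_WITHOUT_AIR_GAP)
fraction_check = parasitic_fraction(scale * c_parasitic_before, 1.0)

print(f"parasitic capacitance must drop to {scale:.3f} of its original value")
print(f"ideal full air-gap limit: {full_air_scale:.3f}")
```

Under these assumptions the required reduction sits well short of the ideal full-air limit, which is consistent with an air gap that only replaces part of the dielectric between neighbors.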
The other things that we do, which are not as visually
impressive: we do a whole bunch of floating gate thickness scaling.
So again, this is a parallel-plate example: you make things as small as possible and
you reduce the coupling. And behind this, there
is a whole bunch of algorithmic tricks that
are also being played. So you want to make sure that you
are doing programming in a way where you are minimizing the effect of neighbors
being programmed. So we do a large amount
of algorithm innovation also to keep our floating
gate impact the same. So you can think
of this as: go and change the floating gate coupling in the first place,
and then secondly, be smart about how you program things so that the
impact for the same floating gate coupling is also optimized.