Now let's ask how fast the procedure we just described for index creation actually is.
Let's assume there are n documents and m words,
and on average w words per document.
For example, in the case of the web,
n will be a really large number, maybe in the millions, billions, or trillions.
m will be another large number, perhaps not as large as n, but quite large nevertheless.
And w may or may not be very large, because every document may have
a few thousand or a few hundred thousand words,
but certainly not billions of words.
Let's see if you know what the answer is.
Recall the setup: we have n documents,
m total possible words, and on average w words per document.
The first thing you notice is that
every document needs to be read at least once,
so the complexity is at least order n w.
Additionally, as each word is read,
we need to look up the sorted structure of words
to figure out whether or not it has already been inserted before.
The cost of looking up this kind of structure is log m, as we've seen earlier,
for example if a balanced binary tree is used to store the words.
Further, we must insert the URL of the document against the word
if we happen to find it already in the index.
Else, if we don't find it, we must insert an entry for that word
and then insert the document against it.
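
The lookup-then-insert step just described can be sketched in Python. This is only an illustration under stated assumptions, not the lecture's own implementation: the function name build_index and the input format (a dict from document id to text) are made up for the example, and a sorted list searched with the standard bisect module stands in for the balanced binary tree. The lookup is O(log m) as in the lecture, but inserting a new word into a Python list costs O(m), which a real balanced tree would avoid.

```python
from bisect import bisect_left


def build_index(documents):
    """Build an inverted index from a dict of {document id: text}.

    Returns (words, postings): words is a sorted list of distinct words,
    and postings[i] is the list of document ids containing words[i].
    """
    words = []     # sorted words; a stand-in for the balanced binary tree
    postings = []  # postings[i] holds the document ids for words[i]
    for doc_id, text in documents.items():
        for word in text.lower().split():
            i = bisect_left(words, word)  # binary search, O(log m)
            if i < len(words) and words[i] == word:
                # Word already indexed: add this document against it,
                # unless the same document was just added for this word.
                if postings[i][-1] != doc_id:
                    postings[i].append(doc_id)
            else:
                # Word not found: create its entry with one fresh document.
                # (list.insert is O(m) here; a balanced tree would not be.)
                words.insert(i, word)
                postings.insert(i, [doc_id])
    return words, postings
```

For instance, indexing the two hypothetical documents {"d1": "the cat sat", "d2": "the dog sat down"} gives the word "the" a list holding both d1 and d2, while "cat" is listed against d1 only.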
Now we claim that each of these is just a constant cost per word.
Obviously the second part, which is inserting a blank entry for a word
along with just one fresh document,
is clearly a constant cost once we know where to insert the word.
But suppose the word already exists, and there are thousands of documents already against it.
Then it's not that clear that it's a constant cost to insert
the fresh ID in an existing long list of IDs.
So it's a bold assumption that this is going to be a constant cost,
and the reason for it we'll come to in a later lecture.
For now we'll go ahead with that assumption.
If any of you missed it in the homework and got confused,
that's okay, because you were thinking pretty well.
But for now we'll make the assumption.
Therefore the complexity of the procedure is order n w log m,
using a balanced binary tree of words,
and assuming that inserting a fresh document
in an existing list against some word
is a constant cost.
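
To get a feel for the n w log m estimate, here is a small Python sketch that simply evaluates it. The function name estimated_steps and the sample values are assumptions chosen for illustration, and the base-2 logarithm reflects the balanced binary tree; the result is an order-of-magnitude count of elementary steps, not a running time.

```python
import math


def estimated_steps(n, w, m):
    """Rough step count for index creation: n * w * log2(m).

    n: number of documents
    w: average number of words per document
    m: number of distinct words held in the balanced binary tree
    """
    return n * w * math.log2(m)


# Hypothetical small collection: 1,000 documents of 100 words each,
# with 1,024 distinct words, so each lookup takes about 10 comparisons.
print(estimated_steps(1_000, 100, 1_024))
```

Doubling n or w doubles the estimate, while doubling m only adds one more comparison per word read, which is why the size of the vocabulary matters far less than the total amount of text.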