>> SMOLA: ...templates which you can pretty much use right away to extend and parallelize a number
of algorithms, and a lot of those are actually from more or less the systems world. So
part of this talk is a bit of a tutorial to, you know, show you some basic tricks that
you can use, so that, you know, if you have questions you can apply them to
actually make that work directly for specific algorithms. So, first of all, thanks to those
five guys who contributed to that--Sergei the most, that's why he's in the middle--and
likewise Jay and Marcos had some great inspiring discussions. So, let's talk a little
bit about, okay, let's say the agenda of this. I'll first explain various
distributed load balancing algorithms. Those of you who've taken a systems class,
some of this may look very, very straightforward; those of you who haven't, it may look actually
quite surprising what you can do. So I'll talk a little bit about consistent hashing,
distributed hash tables and so on. Maybe in the end, if we have time, I'll get to peer-to-peer
networks but probably we won't have time and then I'll show you one application in detail
and that will be data sketches: how you can actually turn a data sketch into something
that is both scalable, fault tolerant, and also deals with temporal variability in the
data. Those are actually three attributes that you usually don't find in conjunction
with run-of-the-mill data sketch algorithms. And I'll very quickly show you one or
two slides on how the very same ideas can be used for graphical models, and
if you want to know more about the latter you should probably go to the [INDISTINCT]
workshop in the afternoon, but don't run away yet. So, distribution strategies.
So well, let's just look a little bit at what we would want. One thing that we would
want is, you know, a balanced and reliable load distribution. In other words,
the objects need to be distributed over a possibly large pool of machines. And well, those
machines can be faulty, the data needs to go across various numbers of machines,
and the computational load needs to be distributed. One way of actually achieving that
quite nicely is by what's called consistent hashing, and I'll show you that in a moment.
In particular, you can actually get entirely uniform distributions, but you could also have
machines with different capabilities--you know, some maybe have twice as fast a processor
or a larger disk--and you can actually send more data to them; this is what's called
proportional hashing. And I'll probably skip over the overlay networks entirely. Before
going into the details, let's briefly talk about hash functions. A hash function is essentially
a computational device to fake a random number. I mean, yes, we use random number generators
anyway, but I'm willing to go even more extreme and say: well, you know, rather than
invoking your random number generator, you should probably seed the hash function with the parameters,
such that you can just recall that random number at any time. The reason why you might
want to do that is because maybe you want to re-evaluate that random number, if you want
to, for instance, randomly assign items to machines at a later time. The naive thing would
be: well, you know, for each new key that we see, we compute a random number and store it
in a big lookup table. That's, you know, as random as the number generator
can be, but it requires a ridiculous amount of memory, because I need to grow that table
as I insert more items. It gets slow as we do more of that, and I can't really merge
it well between computers, and that's a big deal. So basically, if I want to synchronize random
number generators between different machines, it's a pain, because yes, you could basically
start them with the same seed, but then, oh my god, if some part of your code
calls the random number generator more often than another part, you're in trouble; you can't really
go back. So the better idea is to, well, replace the random number generator with a hash of the seed,
and you just invoke it each time. The good thing is you don't really need any memory
for that, because you just, you know, use your key as the seed and then you just, you know,
pull one random number out. And now you can actually merge things between machines,
which is really handy. And well, now the speed of this operation is of course independent
of how many times I've invoked it. There's a lot more beautiful [INDISTINCT] and analysis
for that, so I very much recommend that you actually have a look at this paper here
by [INDISTINCT]. He talks a lot about cryptography, pseudorandom permutation generators and so
on. It's a very beautiful treatment of, you know, what various types of independence mean.
In practice, use something like MurmurHash. MurmurHash version three is actually very,
very fast; I think it processes maybe four or five gigabytes per second per core.
And if you want to do something very cheap, a fast linear congruential generator is probably
good enough. In other words, you take your key, multiply it by an integer, add another
integer to it, and take the result modulo a third integer, h(x) = (a x + b) mod c. Those
coefficients a, b, and c you can look up on Wikipedia; there are probably about 10 different
sets and you pick the one that you like.
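Just to make that concrete, here is a minimal sketch in Python. This is my own illustration: I'm using MD5 from the standard library as a stand-in for MurmurHash, and the LCG coefficients below are Knuth's MMIX constants, just one published choice.

    import hashlib

    def hash_rng(key, seed=0):
        # A recallable "random" number: the same (key, seed) pair always
        # yields the same value, so there is no lookup table to grow and
        # different machines agree without synchronizing generator state.
        digest = hashlib.md5(f"{seed}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little")

    def lcg_hash(key, a=6364136223846793005, b=1442695040888963407):
        # The cheap variant: (a * key + b) mod c, here with c = 2**64.
        return (a * key + b) % 2**64

So, why all this introduction about hash functions?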
Well, the first thing I want to show you is the argmin hash. This is probably one
of the older ones; I could trace it back at least to a '99 paper by [INDISTINCT], but it's
probably older than that. The idea is: I have a set of keys and I want to uniformly
distribute them over a machine pool, and I want to have that fully determined by a hash
function. So what I do is I basically take the hash of the key paired with the machine ID;
for each machine in my machine pool I do that, and then I pick the machine with
the smallest hash value. Okay. Now, this procedure will, as you can easily see, give me
a uniform distribution of keys over the machines. The
nice thing about it is that if a machine dies, then, well, for all the keys for which this
machine wasn't the smallest one, nothing happens; nothing gets reassigned. Furthermore, for
those keys for which the dead machine did have the smallest hash entry, the next machine
is again uniformly random. So in other words, if I have n machines, only a fraction of
one over n of the data gets reassigned, uniformly over the other n minus one machines.
That's quite convenient, because it basically means that if I have a failure, my load balancing
is going to spread across my entire set of machines. Likewise, if I need to add additional
resources, again only a very small fraction of keys will be reassigned, and it will pull
them uniformly from all the other machines. It's kind of cute. You can do a little bit
better than that if you use a lot of replications.
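As a minimal sketch of the argmin idea (my own illustration in Python, not any particular system's code):

    import hashlib

    def h(key, machine):
        # Deterministic hash of the (key, machine) pair.
        digest = hashlib.md5(f"{key}|{machine}".encode()).digest()
        return int.from_bytes(digest[:8], "little")

    def assign(key, machines, k=1):
        # Argmin hash: rank machines by their hash with this key and
        # take the k smallest. If a machine dies, only its keys move,
        # and they spread uniformly over the survivors.
        return sorted(machines, key=lambda m: h(key, m))[:k]

    machines = [f"node{i}" for i in range(8)]
    print(assign("user:1234", machines))       # single owner
    print(assign("user:1234", machines, k=3))  # k replicas

So, for instance, if I want to have a distributed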
file system with lots of replication and no single master, then what I could do is
just, you know, take the k smallest guys rather than the smallest guy, and the same
properties hold. There's actually a beautiful file system which does this; it's
called Ceph, C-E-P-H. It's a very nice distributed replicated file system and they
do a lot of other [INDISTINCT] tricks in it, so if you're interested in that kind of stuff,
look it up. There's one big downside to this, as you can immediately see: it's
okay if your machine pool is, you know, a couple of hundred, maybe a thousand machines,
but from there on it gets really messy, because I need to compute this hash for every
single machine. That can be very, very expensive. So can we fix that? Well, one way to
fix it is what's called a distributed hash table. For instance, Cassandra uses a, well,
slightly hacky version of that. Basically, what you do is you take your machines and,
for each machine ID--let's say their IP number--I compute the hash
and I put them onto a ring of hash values. And now,
for each item that comes along, I go and hash it, and I assign it to,
well, in this case the rightmost machine after that key. A new item comes along, it gets
assigned to this machine, and so on. Now, if you do that, you're obviously very fast, because
you have a logarithmic-time lookup in the number of machines, but the problem is that,
as you can easily see, those distances here can actually vary quite a bit. The good thing
is--as I'll show you in a moment--that variation can be made reasonably small, but you have
a lot more variation. The other downside is, if I lose a machine here, then immediately,
well, this machine here will get all the keys from that one. So I have significant load
imbalance as I insert or remove machines, and furthermore, well, the overall
blocks are quite large too. So, let's see how we can actually fix that a little bit.
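A minimal sketch of such a ring lookup in Python--my own illustration, not Cassandra's actual code; real systems typically also place several virtual nodes per machine, which is exactly the multi-segment trick I'll get to in a moment:

    import bisect
    import hashlib

    def ring_hash(x):
        return int.from_bytes(hashlib.md5(str(x).encode()).digest()[:8], "big")

    class HashRing:
        def __init__(self, machines):
            # Place every machine on the ring at the hash of its ID.
            self.points = sorted((ring_hash(m), m) for m in machines)
            self.positions = [p for p, _ in self.points]

        def lookup(self, key):
            # Binary search for the first machine at or after the key's
            # position, wrapping around: O(log M) instead of O(M).
            i = bisect.bisect_left(self.positions, ring_hash(key))
            return self.points[i % len(self.points)][1]

    ring = HashRing([f"node{i}" for i in range(16)])
    print(ring.lookup("some query"))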
Let's first do a little bit of analysis. As you can reasonably straightforwardly see,
you can always define the machine that you're interested in as being at position zero.
Then the probability of the segment to its right being larger than c is simply
(1 - c)^(M - 1), where M is the number of machines. You can work out a density just by
taking the derivative, and obviously, by symmetry, the expected length of a segment is
one over M. So plugging this in and making the assumption of a very large number of
machines, you can see that the probability that a segment is k times as long as the
average segment behaves like e^(-k). You can use this fact to then, you know, say:
well, actually, per machine we don't only pick one segment but maybe ten segments or so.
And then, you know, concentration starts kicking in, and basically the size distribution
of those ten sub-segments is much tighter than, you know, what you would have gotten
with a single one, and then also the load balancing works better.
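Written out, with \ell denoting the arc owned by a machine on a ring of unit circumference and M machines (my notation):

    \Pr[\ell > c] = (1 - c)^{M-1}, \qquad \mathbb{E}[\ell] = \frac{1}{M},
    \qquad
    \Pr\!\left[\ell > \frac{k}{M}\right] = \left(1 - \frac{k}{M}\right)^{M-1}
    \;\approx\; e^{-k} \quad \text{for large } M.

You can do more beautiful things, so this is called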
proportional hashing. There's for instance a very nice paper by [INDISTINCT] 2011 which
describes that in great detail for designing a video caching system, and the idea is
essentially very similar to what you get in a rejection sampler. So, you basically
pre-allocate a certain interval, and for each machine in your machine pool you
pre-define a small segment whose size is proportional to its capacity. Then, when a key
comes along, I go and compute the hash--okay, here I hit something. The next key: I go
and compute the hash; well, it didn't hit anything, so I need to rehash, and now I hit
something. I keep on doing this until I hit something. So, you can easily see this now
gives me a load distribution proportional to the segment sizes; however, you can also
see that, you know, once you run out of space you're done for. So, you know, you leave
maybe a factor of 10 of slack to make sure that there's enough space. Now, for those of you who have
built samplers, you'd immediately see this looks awfully like a rejection sampler, right?
You have a proposal distribution which is uniform over the larger interval, and my acceptance
is only if I hit those bins. And I think there are actually some very nice statistical
connections between, you know, sampling and load distribution in machine learning that are
worthwhile exploring; talk to me later if you're interested in that.
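Here's a minimal sketch of that rejection-style lookup in Python; the machine names, capacities, and the factor-of-10 slack are made-up numbers for illustration:

    import hashlib

    TABLE = 10_000  # pre-allocated interval, roughly 10x total capacity

    def slot_hash(key, salt):
        d = hashlib.md5(f"{salt}:{key}".encode()).digest()
        return int.from_bytes(d[:8], "big") % TABLE

    def build_slots(capacities):
        # Each machine gets a number of slots proportional to its
        # capacity; most of the interval stays empty on purpose.
        slots, pos = {}, 0
        for machine, cap in capacities.items():
            for _ in range(cap):
                slots[pos] = machine
                pos += 1
        return slots

    def lookup(key, slots):
        # Rehash until we land on an occupied slot: a rejection
        # sampler whose acceptance region is the machines' slots.
        salt = 0
        while True:
            pos = slot_hash(key, salt)
            if pos in slots:
                return slots[pos]
            salt += 1

    slots = build_slots({"fast-node": 200, "slow-node": 100})
    print(lookup("query:cats", slots))  # fast-node twice as likely

So, a nice application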
of that are the random caching trees of Akamai, from Karger et al. That's, I think, pretty
much the paper that started Akamai, and the idea is you build random trees for each key;
then, well, you're not creating hotspots because, you know, each key is being cached in
a different way. So this would be the master, and those other labeled nodes would only
be instantiated whenever you actually have enough requests. And there's a lot more fun
stuff to it. Okay. So this is what you get: basically you get the same [INDISTINCT] at
each node, the same, you know, uniform work. So, now that's just a little bit of background.
To give you a bit of an idea of what's going on, let me
now show you what we use this for. Let's take sketching algorithms. So, well, what you have
is, you know, a real-time data stream, for instance queries, and you might want to
get a real-time response of aggregate statistics. And I'm going to make the following relaxation,
namely: it's perfectly okay if I don't quite get the exact answer, as long as it's reasonably
good. And I want to be able to get aggregate counts, as in I want to be able to ask,
you know, how many times did I see a particular query in a particular time range. And unfortunately,
of course, the set of queries, or whatever other items, is not known beforehand. And what I want
to make sure of is that I have perfect scaling, as in: more machines means I can process more
data, means I have higher fault tolerance, means I have higher accuracy, and I can
also pull more requests out of it. And I'll now show you how those very simple
hashing ideas that I demonstrated before can be used to extend something like the
Count-min sketch immediately to such an algorithm. Why do we care? Well, you know, if I have
stuff that's, you know, several days old, well, okay, maybe I can use Hadoop or whatever to
do offline batch processing; this is for the real-time sketching system, and it makes for
a very simple system. So the three tools that we'll need are the Count-min sketch, which
I'm going to explain to you now, then consistent hashing, which we looked at before, and then
some interpolation tricks, you know, for where to look back in terms of the marginals. Okay.
So here's the Count-min sketch. So, who in the audience knows how a Bloom filter
works? Cool. Okay. Now, think of a Bloom filter with integers rather than bits and
you pretty much have the Count-min sketch. Okay, so here's what you do. You
basically have one row per hash function, so that's a little bit different from a Bloom
filter. And the insert algorithm works as follows: for a given key, you compute for each
row the hash of this key, and you increment the corresponding bin by one. At query time,
I go and look at all the bins that I have touched and I take the minimum over those
bins. So why is this a good idea? Well, first of all--and the assumption is we only do inserts;
there are interesting variations on that theme--but basically, for insert-only streams we know
all those numbers are non-negative. Furthermore, we know that each of those bins got incremented
whenever we saw that particular item. So, therefore, each of those numbers is going to
be an upper bound on the true count. Since all of them are upper bounds, I can take the minimum,
and that's still going to be an upper bound. This is a very nice algorithm by Cormode
and Muthukrishnan.
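A minimal Count-min sketch in Python, just to pin down the insert and query paths (the width and depth here are arbitrary):

    import hashlib

    class CountMinSketch:
        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _bins(self, key):
            # One hash per row, made independent via row-specific seeds.
            for row in range(self.depth):
                d = hashlib.md5(f"{row}:{key}".encode()).digest()
                yield row, int.from_bytes(d[:8], "big") % self.width

        def insert(self, key, count=1):
            for row, b in self._bins(key):
                self.table[row][b] += count

        def query(self, key):
            # Collisions only ever add, so every bin is an upper bound
            # on the true count and the minimum is the tightest one.
            return min(self.table[row][b] for row, b in self._bins(key))

    cms = CountMinSketch()
    for q in ["cats", "dogs", "cats", "cats"]:
        cms.insert(q)
    print(cms.query("cats"))  # >= 3, and usually exactly 3

So, the guarantees--and that's the really amazing thing: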
I mean, okay, this part you can see immediately: the count that I get from this is going
to be greater than or equal to the actual count. You can also see--and this
is a little bit more of a proof, but it's actually a very elementary proof--that
it's basically upper bounded by the true count plus epsilon, where epsilon is essentially
one over the number of bins times the total number of inserts that I made.
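For reference, the guarantee from the Cormode-Muthukrishnan paper, in my notation--w bins per row, d rows, N total inserts, n_x the true count of item x, and \hat{n}_x the sketch's answer:

    \hat{n}_x \;\ge\; n_x,
    \qquad
    \Pr\!\left[\hat{n}_x \le n_x + \frac{e}{w}\, N\right] \;\ge\; 1 - e^{-d}.

Furthermore,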
if I have a power-law distribution, that's actually in there too. So, this is a very, very
nice sketch, because it was actually the first--and I'm not sure whether it's so far still the
only--sketch that has an order one-over-number-of-bins dependency rather than one over
the square root of the number of bins. The way you prove it is: okay, the upper bound
is trivial; you first bound the expected value, then you show the probability that you
have a certain deviation per hash row (essentially Markov's inequality), and then you
just boost the probability by multiplying across the independent rows in order to get
a tight bound. So, it's a very, very elementary proof and you will
enjoy reading that paper; it's beautiful. So, the problem, however, is, well, you know,
the accuracy, while it's really great, like one over the number of bins--well, at some
point you run out of memory, so if I want very high accuracy, well, I can't have it. Furthermore,
this works beautifully if I have a single machine where I insert; if I have many machines,
well, you know, I first of all need to be able to actually process the data, secondly
I need reliability, and then finally this is a fully [INDISTINCT] sketch--what I want to
be able to do is actually get a similar estimate for arbitrary time ranges. Okay. So
the second tool we're going to use is an application of consistent hashing, and
I'll now show you this in about three simple steps. So, first of all, how do we increase
throughput? Well, that's really easy, right? We had a single machine; now we have several
servers, and I just shard my keys over those servers. In other words, I just, you know,
uniformly distribute the keys over them through this argmin hash. I could use
something different, but argmin is good enough for me here because I might only have dozens
of servers. Now, this is really good, right? Because, you know, the accuracy actually improves
like one over the number of servers, because now each server gets only a smaller fraction
of the data, and since the accuracy was the amount of data inserted over the number of
memory locations, my accuracy goes up, subject to some small load factors, but I'm ignoring
them here. Furthermore, the throughput increases, because if I have k servers I can insert
k times as many keys. But I'm really screwed because the reliability decreases: if I lose
one machine, basically my system goes down. How can I fix this? Now, let's look at that
part. So, suppose now, here
we have various hashes, all of them going into one machine. Well,
there's absolutely no need for that. I could go and essentially insert one hash row per machine,
and then the good thing is, you know, you would really have to kill all those machines in
order to lose all information about that data. Okay? So, in other words, I now insert
into, you know, k servers rather than a single one, and that way I gain reliability.
This is great. The accuracy, though, actually decreases, because now each machine needs to store
data for all the keys. The throughput is more or less constant, depending on my inter-process
communication library, but the problem is that, you know, the latency actually goes up a lot
for queries, because I need to wait until all servers have returned their answer, and there's
no acceleration. Okay. So, that's pretty much the opposite end of what I showed you before
in terms of scaling up: now we increased reliability. Okay. Next step. Can we actually increase
query throughput? Well, that's fairly straightforward, right? I just over-insert into more servers:
let's say I need four hashes to get a good answer; I insert into maybe eight machines,
and if I have a lot more queries than inserts, you know, I just pick a random subset of four
machines out of the eight that have my data and I'll be fine. And on top of that, I gain
reliability. So this is also easy. Now, let's just combine the tricks. The trick is basically
you assign keys only to a subset of machines: you over-replicate them for reliability, and
you over-replicate them for query parallelism. There will be a picture of it in a moment.
So what you do is, again, you use this consistent hashing, so for each key there will
be a random subset of k machines that are responsible for it. Each of them
gets one hash function; just make sure that you seed it with a machine-specific key.
Otherwise, if you have the same hash functions on several machines, you get no improvement--
it's just, if you implement it, a useful thing to know. And then you request from a
smaller subset of those machines; again, you can use consistent hashing to have
a random subset depending on the client.
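Putting the pieces together, a rough sketch of the combined scheme, reusing the CountMinSketch and the argmin-hash assign from the earlier snippets; the replica counts are made up, and I'm waving my hands over the actual networking:

    import random

    machines = [f"node{i}" for i in range(20)]
    # One sketch per machine; in practice each machine seeds its hash
    # functions with its own ID, so rows differ across replicas.
    sketches = {m: CountMinSketch() for m in machines}

    REPLICAS = 8  # machines holding each key, for fault tolerance
    D = 4         # replicas actually contacted at query time

    def distributed_insert(key, count=1):
        # Consistent hashing picks a per-key subset of machines, and
        # every one of those replicas receives the insert.
        for m in assign(key, machines, k=REPLICAS):
            sketches[m].insert(key, count)

    def distributed_query(key):
        # Ask only D of the replicas and take the minimum: every
        # replica's answer is an upper bound, so the min stays valid,
        # and the random choice spreads the query load.
        owners = assign(key, machines, k=REPLICAS)
        return min(sketches[m].query(key) for m in random.sample(owners, D))

So then you can actually prove a few useful things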
for it. So, the useful bit here is--I mean, simplifying all of this--essentially
you need about one extra machine to hold your sketch if you have, let's say, 10 failures and
at least 20 machines. The argument goes as follows: since hashing randomizes
the assignment of keys to machines, an adversary cannot pick the failures
in an arbitrarily bad way, so I can essentially randomize over
the failures. If I insert into less than half the machines in the machine
pool, I can majorize drawing without replacement by drawing with replacement--because
basically, if a machine is dead, well, I can't kill it again, so the process where a machine
gets killed over and over upper bounds the number of failures that I can experience. And it's
a very simple three-line analysis where you just use a binomial expansion
to get the inequality. There's more detail in the paper, which is at the moment
under submission. It's a very, very simple analysis. So just to show you the picture
of what's going on: here is our machine pool, and we decide to insert into those
machines, and for a query I pick a random subset of three of those machines to get the
answer. This scales linearly with the number of servers in terms of accuracy, reliability,
and, well, scalability. So this is kind of nice. Now, I haven't told you yet how to fix
the temporal aspect. So, the time series interpolation essentially works as follows: the Count-
min sketch is a linear statistic, that's the key idea. In other words, if I want to
sketch the union of two sets, I just sum the two sketches and everything's fine. So if I want a sketch
of a longer time interval, I take the first sketch and the second sketch and sum them
up, easy. The other thing is, I can also reduce the bit resolution by essentially folding
the sketch over onto itself, and the good thing is I can do that with the processed
data, as opposed to ever having to touch the original data again.
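Both operations in a minimal sketch, reusing the CountMinSketch from before (my own illustration; folding assumes the width is a power of two, so that h mod w/2 equals (h mod w) mod w/2):

    def merge(a, b):
        # Linearity: the sketch of two streams is the sum of their
        # sketches, so adjacent time intervals aggregate without ever
        # touching the raw data.
        out = CountMinSketch(a.width, a.depth)
        for r in range(a.depth):
            for i in range(a.width):
                out.table[r][i] = a.table[r][i] + b.table[r][i]
        return out

    def fold(a):
        # Halve the bit resolution: bins i and i + width/2 collapse
        # into one, trading item accuracy for half the memory.
        half = a.width // 2
        out = CountMinSketch(half, a.depth)
        for r in range(a.depth):
            for i in range(half):
                out.table[r][i] = a.table[r][i] + a.table[r][i + half]
        return out

So let me first show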
you the time aggregation. What we do is actually something very straightforward:
we pick an exponentially increasing set of intervals, so 1, 2, 4, 8 and so on. And
every 2^n time steps--so it's not 2n but 2 to the n--I need to recompute all the bins
up to 2^n. The trick here is to wait until the cumulative sums align, which makes
everything work out nicely, so the interval boundaries are basically
just powers of two. In other words, if I want to compute a long interval, I just need to sum
over shorter intervals that are consecutively aligned. So what I do is I basically let those
intervals age until they are perfectly aligned; then I do an aggregation sweep and I get the
next interval. I'll show you that in a picture in a moment. The good thing is, I only have
to write into the first bin--this is the only one that's really going to get all the inserts;
everything else is fine, so I have reasonable memory locality. And you can actually
show that the aggregation cost is basically log log t, so it's a non-issue. The other
good thing is, we get logarithmic storage in terms of the time interval that we sketch.
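One way to realize the aging scheme is as a binary counter over sketches--my own rendering of the idea, reusing merge from above; a real deployment would also keep the recent fine-grained sketches around for queries:

    class TimeAggregates:
        def __init__(self):
            self.levels = []  # levels[n] covers 2**n unit intervals

        def step(self, fresh):
            # 'fresh' is the sketch of the unit interval that just
            # ended. Like a binary counter: two aligned sketches of
            # length 2^n merge into one of length 2^(n+1), so storage
            # is logarithmic in the total time span.
            carry, n = fresh, 0
            while True:
                if len(self.levels) <= n:
                    self.levels.append(None)
                if self.levels[n] is None:
                    self.levels[n] = carry
                    return
                carry = merge(self.levels[n], carry)
                self.levels[n] = None
                n += 1

    agg = TimeAggregates()
    for hour in range(8):
        s = CountMinSketch()
        # ... insert this hour's stream into s ...
        agg.step(s)

So, let me show you the intervals. Let's say, first we have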
an interval of length one. Then, okay, it ages; we write into another interval and it ages. Now
we basically sum those two up and get an interval of length two, and this one gets aged.
Okay, so now these move over; now we can compute an interval of length four, and so on
and so on. So we have lengths one, two, four and eight, and this is stored in memory in
that way, right? We always write here and these intervals age; the rest is just basically
going, you know, backwards and forwards in time. Now, I can do a similar thing with the
bit resolution.
So if I halve the bit resolution every 2^n steps, you can easily see that, you know,
a time span of length 2^n will always require the same amount of memory. So whatever storage
I need for time intervals two and three, I need the same storage for four, five, six
and seven. Okay. So why did I show you those two techniques? All right. I mean, one of
them gives you aggregation over time; the other one gives you aggregation over items.
So we basically get a trade-off with
two knobs here. We can either reduce the item resolution of our sketch and keep the
time resolution accurate--so in the most extreme case, for historical data that's two
years old, we will only know how many times the query "machine learning" was issued
in that particular year--or, the other extreme, for, you know, one year ago at exactly
around midnight, I know I had maybe a million queries, all right, but I won't know anymore
which ones they were. Now, that's effectively, you know, pinning down the marginals of
a joint distribution, which I would really like to be able to store but which I can't
[INDISTINCT] of storage cost. And this is fixed--at least approximately fixed--by just
taking the product of the marginals as a good estimate.
Now, you would immediately say, well, hey, there are so many smart algorithms out there--like,
you know, any exponential-family sketching estimation algorithm will do better, sure. But
those algorithms require more than order-one time in order to get the number, so
therefore we do the really cheap and dirty thing. There's an actually interesting question
of what you could do to tie in the marginals. Okay. Let me show you very briefly that it
works. Okay. So the data that we had followed a nice power-law distribution. And, okay, so
here are three sketches compared, which is actually quite interesting. So,
okay, you can't really see the colors, but basically we compare a sketch which just
aggregates over items--you know, which basically at every 2^n-th
step reduces the item resolution by one--with one that basically just, you know, makes longer
and longer intervals, and then [INDISTINCT] with something that uses the product
of the marginals. The green line is the product, and you can see
it's doing reasonably well. The other thing you can see is that as we get more and
more data, this blue line, which basically just reduces the bits, does a bit
better. And that's not very surprising, because if you have a very heavy hitter--if you look
at the original Cormode and Muthukrishnan guarantees, you can see that they are basically absolute
deviations rather than relative ones--so if I have a very heavy hitter, then
it doesn't hurt me so much if I have collisions with a lot of irrelevant queries. So you
could probably do something smart here by switching between those two estimators. The
fact that these purple curves have slight peaks has to do with the fact that the volume
is non-uniform. So, of course, if you have a lot of volume, well, the error
will go up, especially if I estimate something [INDISTINCT] constant. I can show you similar
figures for different error measures, and things look qualitatively the same.
This is probably quite interesting. Here is the number of read requests, and on the next
slide the number of insert requests, over, well, basically between one and ten servers.
Obviously, if I want to read from at least three machines, I need to start here,
and for five machines I need to start here. And what you can see is basically linear
scaling. So this is as good as it gets, and a similar thing here. You can use the same tricks for
other sketches, for instance the SpaceSaving sketch. In that case, well, okay, I can't really
go into much detail on how it works, but basically you use the very same machinery. Okay. We're
going to skip peer-to-peer, and I'm quickly going to show you more or less the same picture
here. So, for distributing random variables in an inference procedure, again you have
essentially a tree--well, you basically need to build a replica tree, and that replica
tree is instantiated through a tree which reflects, you know, the layout that
you have within your cluster. Within the rack you want to be as flat as possible, because
you sit on a switch and you have perfect, you know, communication; between the racks,
you probably want to, well, basically respect the fact that, you know,
the between-rack bandwidth is a lot lower than the within-rack bandwidth.
And, okay, I can't talk about synchronization, distribution, fault tolerance--Joey told me
anyway that my time is up. Here's a very, very brief thing that you sometimes may want
to worry about. Namely, suppose I have, you know, k machines that need to talk to another
set of k machines, and suppose that the message size each of those machines has is like,
you know, one over the number of machines. So, basically, the more machines I have, the
smaller the packets each machine needs to send to another one. If I
just naively let them all talk to everybody at the same time, I will get a very low data
rate between any pair, which is really bad, because basically my switch and my entire
[INDISTINCT] will create big overheads. So, one way to fix this is to actually, you know,
schedule who talks to whom in a specific order. Now, you could always think: okay,
let's just pick a nice, you know, synchronized order, but that's not a good idea, because as
soon as a single machine is delayed, all the delays pile up and bad things happen. So,
what you can do instead is, you essentially sample with replacement by picking a random
schedule for each machine. Now, if you do that and I just pick, you know, one partner
per machine at a time, I create two problems. One is some machines will be entirely left
out--that's exactly the one-over-e that you also know from the bootstrap--and on the other
hand, some machines will be hotspots, because they will actually receive inserts from more
than one machine. Now, there's a very easy fix to it: well, you pick more than one,
maybe two; it turns out four is a good number, and there's a little inequality that I'll
show you on the next slide. So, for instance, with two connections, you can already see
that the efficiency of this communication scheme is between .78 and .89. Okay. So, how
did I get those numbers? Well, say these are basically nine machines. This machine is left
out, so I can definitely only get eight ninths of the efficiency. To get the lower bound,
you basically say: well, each local machine spreads its network bandwidth evenly between
all its links, and then I just add up with the [INDISTINCT]--if I have, you know,
let's say three coming in, I'll just throw one of them away. That gives
you the lower bound. In practice it's somewhere in between; that's just for this specific
random graph. In general, you can get this inequality in the limit of a large
number of machines, and it shows you four simultaneous connections are enough; if you
do that, that's what you get.
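A toy simulation of that effect (my own sketch, along the lines of the pessimistic throw-away-the-excess model, so the numbers come out a bit lower than the slide's):

    import random

    def efficiency(k=1000, c=2, trials=20):
        total = 0.0
        for _ in range(trials):
            load = [0.0] * k
            for _ in range(k):                    # every sender...
                for r in random.sample(range(k), c):
                    load[r] += 1.0 / c            # ...splits bandwidth over c links
            # A receiver absorbs at most unit bandwidth; the excess is
            # thrown away, and unpicked receivers idle (the 1/e effect).
            total += sum(min(1.0, x) for x in load) / k
        return total / trials

    for c in (1, 2, 4):
        print(c, round(efficiency(c=c), 3))  # efficiency rises with c

And I think now I've fully overstayed my time. So, the example here is just--well, two billion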
documents--not million, billion--on a thousand machines, and that's basically the same picture,
you know, a Gibbs sampler procedure to, well, do some user profiling. So, the last slide,
and this is essentially what you could almost use as a, sort of, research agenda:
take your algorithm and ask what happens--what guarantees can I get if some of the machines
die? Furthermore, I want to have linear scaling as I throw more hardware at the problem. The
third issue is, well, I actually have communication constraints, because, well, I have a non-uniform
connectivity structure within my server centers and I want to keep parameters local. And this
gives me a lot of really fun questions. One corollary: if you design the algorithms, try
to avoid having a master, because masters are single points of failure and you have
to work hard to make them go away. Okay. That's all I want to say. Yes?
>> [INDISTINCT] we have time to [INDISTINCT] your questions.
>> You said [INDISTINCT] the connections between problem solving here and [INDISTINCT] of the
[INDISTINCT] measuring sides and flows [INDISTINCT] with [INDISTINCT]
>> SMOLA: Great. So--it's actually a flow problem. It's definitely--it's a very simple
flow problem, and well, we did the very--I'm sorry, that's my--all right. That's mine.
Thank you. We did a very sloppy analysis of just checking that, actually, you know,
your flow is reasonably well controlled. So, basically 80% was good enough for us, because
everything beyond that is just, you know, lost in the network stack anyway. But yes, there
is also a connection or relationship to random codes and so on. Yes?
>> [INDISTINCT]
>> SMOLA: Yes. So, as a matter of fact, you will find a lot--oh, are you okay?
>> Yes.
>> SMOLA: You will find a lot of the same ideas of consistent caching and distribution
also in--for instance in the [INDISTINCT] paper. The [INDISTINCT] paper has to worry about a few
other things, such as ordering and so on, which we had to worry about before when we had
a, well, latency-bound protocol, because, I mean, at first we didn't have [INDISTINCT] and
so on, but that's essentially a small instance of what you will get, you know, just with
this [INDISTINCT]. And the difference is we have lower durability requirements, at
least, you know, for these scenarios, than what you would get maybe in the [INDISTINCT]. But
you could use a lot of those tricks. So, for instance, if you want to make sure that even
under [INDISTINCT] partitions your writes still work, you could essentially take the
[INDISTINCT], modify it and turn it into a sketching service. And that would probably actually
work fairly well.
>> [INDISTINCT]
>> SMOLA: Yes.
>> [INDISTINCT]
>> SMOLA: Okay.
>> [INDISTINCT]
>> SMOLA: Thanks. I mean, thanks for the pointer; I'll definitely look it up. I mean, a
lot of really cool things in sketching have happened recently--as in maybe the past
five to ten years. So, it's maybe a good idea to revisit some of those old ideas
and just, you know, apply them to the [INDISTINCT] sketches.
>> [INDISTINCT] time.