>> SMOLA: ...templates which you can pretty much use right away to extend and parallelize a number
of algorithms, and a lot of those are actually from more or less the systems world. So
part of this talk is a bit of a tutorial to, you know, show you some basic tricks that
you can use, so that, you know, if you have questions you can apply them to
actually make that work directly for specific algorithms. So, first of all, thanks to those
five guys who contributed to that--Sergei the most, that's why he's in the middle--and
likewise Jay and Marcos had some great inspiring discussions. So, let's talk a little
bit about, okay, let's say the agenda of this. I'll first explain various
distributed load balancing algorithms. Those of you who've taken a systems class,
some of this may look very, very straightforward; those of you who haven't, it may look actually
quite surprising what you can do. So I'll talk a little bit about consistent hashing,
distributed hash tables and so on. Maybe in the end, if we have time, I'll get to peer-to-peer
networks but probably we won't have time and then I'll show you one application in detail
and that will be data sketches: how you can actually turn a data sketch into something
that is both scalable, fault tolerant, and also deals with temporal variability in the
data. Those are actually three attributes that you usually don't find in conjunction
with run-of-the-mill data sketch algorithms. And I'll very quickly show you one or
two slides on how the very same ideas can be used for graphical models, and
if you want to know more about the latter you should probably go to the [INDISTINCT]
workshop in the afternoon, but don't run away yet. So, distribution strategies.
So well, let's just look a little bit at what we would want. One thing that we would
want is, you know, a balanced and reliable load distribution. In other words,
the objects need to be distributed over a possibly large pool of machines. And well, those
machines can be faulty, the data needs to go across various numbers of machines,
and the computational load needs to be distributed. One way of actually achieving that
quite nicely is by what's called consistent hashing, and I'll show you that in a moment.
In particular, you can actually get entirely uniform distributions, but you could also have
machines with different capabilities--you know, some maybe have twice as fast a processor
or a larger disk--and you can actually send more data to them; this is what's called
proportional hashing. And I'll probably skip over the overlay networks entirely. Before
going into the details, let's briefly talk about hash functions. A hash function is essentially
a computational device to fake a random number. I mean, yes, we use random number generators
anyway, but I'm willing to go even more extreme and say: well, you know, rather than
invoking your random number generator, you should probably seed the hash function with the parameters,
such that you can just recall that random number at any time. The reason why you might
want to do that is because maybe you want to re-evaluate that random number, if you want
to, for instance, randomly assign items to machines at a later time. The naive thing would
be: well, you know, for each new key that we see, we compute a random number and store it
in a big lookup table. That's, you know, as random as the number generator
can be, but it requires a ridiculous amount of memory, because I need to grow that table
as I insert more items. It gets slow as we do more of that, and I can't really merge
it well between computers, and that's a big deal. So basically, if I want to synchronize random
number generators between different machines, it's a pain, because yes, you could basically
start them with the same seed, but then, oh my god, if some part of your code
calls the random number generator more often than another part, you're in trouble; you can't really
go back. So the better idea is to, well, replace the random number generator with a hash of the seed,
and you just invoke it each time. The good thing is you don't really need any memory
for that, because you just, you know, use your key as the seed and then you just, you know,
pull one random number out. And now you can actually merge things between machines,
which is really handy. And well, now the speed of this operation is of course independent
of how many times I've invoked it. There's a lot more beautiful [INDISTINCT] and analysis
for that, so I very much recommend that you actually have a look at this paper here
by [INDISTINCT]. He talks a lot about cryptography, pseudorandom permutation generators and so
on. It's a very beautiful treatment of, you know, what various types of independence mean.
In practice, use something like MurmurHash. MurmurHash version three is actually very,
very fast; I think it processes maybe four or five gigabytes per second per core.
And if you want to do something very cheap, a fast linear congruential generator is probably
good enough. In other words, you take your key, multiply it by an integer, add another
integer to it, and take the result modulo a third integer, h(x) = (a x + b) mod c. Those
coefficients a, b, and c you can look up on Wikipedia; there are probably about 10 different
sets and you pick the one that you like.
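Just to make that concrete, here is a minimal sketch in Python. This is my own illustration: I'm using MD5 from the standard library as a stand-in for MurmurHash, and the LCG coefficients below are Knuth's MMIX constants, just one published choice.

    import hashlib

    def hash_rng(key, seed=0):
        # A recallable "random" number: the same (key, seed) pair always
        # yields the same value, so there is no lookup table to grow and
        # different machines agree without synchronizing generator state.
        digest = hashlib.md5(f"{seed}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little")

    def lcg_hash(key, a=6364136223846793005, b=1442695040888963407):
        # The cheap variant: (a * key + b) mod c, here with c = 2**64.
        return (a * key + b) % 2**64

So, why all this introduction about hash functions?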
Well, the first thing I want to show you is the argmin hash. This is probably one
of the older ones; I could trace it back at least to a '99 paper by [INDISTINCT], but it's
probably older than that. The idea is: I have a set of keys and I want to uniformly
distribute them over a machine pool, and I want to have that fully determined by a hash
function. So what I do is I basically take the hash of the key paired with the machine ID;
for each machine in my machine pool I do that, and then I pick the machine with
the smallest hash value. Okay. Now, this procedure will, as you can easily see, give me
a uniform distribution of keys over the machines. The
nice thing about it is that if a machine dies, then, well, for all the keys for which this
machine wasn't the smallest one, nothing happens; nothing gets reassigned. Furthermore, for
those keys for which the dead machine did have the smallest hash entry, the next machine
is again uniformly random. So in other words, if I have n machines, only a fraction of
one over n of the data gets reassigned, uniformly over the other n minus one machines.
That's quite convenient, because it basically means that if I have a failure, my load balancing
is going to spread across my entire set of machines. Likewise, if I need to add additional
resources, again only a very small fraction of keys will be reassigned, and it will pull
them uniformly from all the other machines. It's kind of cute. You can do a little bit
better than that if you use a lot of replications.
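As a minimal sketch of the argmin idea (my own illustration in Python, not any particular system's code):

    import hashlib

    def h(key, machine):
        # Deterministic hash of the (key, machine) pair.
        digest = hashlib.md5(f"{key}|{machine}".encode()).digest()
        return int.from_bytes(digest[:8], "little")

    def assign(key, machines, k=1):
        # Argmin hash: rank machines by their hash with this key and
        # take the k smallest. If a machine dies, only its keys move,
        # and they spread uniformly over the survivors.
        return sorted(machines, key=lambda m: h(key, m))[:k]

    machines = [f"node{i}" for i in range(8)]
    print(assign("user:1234", machines))       # single owner
    print(assign("user:1234", machines, k=3))  # k replicas

So, for instance, if I want to have a distributed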
file system with lots of replication and no single master, then what I could do is
just, you know, take the k smallest guys rather than the smallest guy, and the same
properties hold. There's actually a beautiful file system which does this; it's
called Ceph, C-E-P-H. It's a very nice distributed replicated file system and they
do a lot of other [INDISTINCT] tricks in it, so if you're interested in that kind of stuff,
look it up. There's one big downside to this, as you can immediately see: it's
okay if your machine pool is, you know, a couple of hundred, maybe a thousand machines,
but from there on it gets really messy, because I need to compute this hash for every
single machine. That can be very, very expensive. So can we fix that? Well, one way to
fix it is what's called a distributed hash table. For instance, Cassandra uses a, well,
slightly hacky version of that. Basically, what you do is you take your machines and,
for each machine ID--let's say their IP number--I compute the hash
and I put them onto a ring of hash values. And now,
for each item that comes along, I go and hash it, and I assign it to,
well, in this case the rightmost machine after that key. A new item comes along, it gets
assigned to this machine, and so on. Now, if you do that, you're obviously very fast, because
you have a logarithmic-time lookup in the number of machines, but the problem is that,
as you can easily see, those distances here can actually vary quite a bit. The good thing
is--as I'll show you in a moment--that variation can be made reasonably small, but you have
a lot more variation. The other downside is, if I lose a machine here, then immediately,
well, this machine here will get all the keys from that one. So I have significant load
imbalance as I insert or remove machines, and furthermore, well, the overall
blocks are quite large too. So, let's see how we can actually fix that a little bit.
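A minimal sketch of such a ring lookup in Python--my own illustration, not Cassandra's actual code; real systems typically also place several virtual nodes per machine, which is exactly the multi-segment trick I'll get to in a moment:

    import bisect
    import hashlib

    def ring_hash(x):
        return int.from_bytes(hashlib.md5(str(x).encode()).digest()[:8], "big")

    class HashRing:
        def __init__(self, machines):
            # Place every machine on the ring at the hash of its ID.
            self.points = sorted((ring_hash(m), m) for m in machines)
            self.positions = [p for p, _ in self.points]

        def lookup(self, key):
            # Binary search for the first machine at or after the key's
            # position, wrapping around: O(log M) instead of O(M).
            i = bisect.bisect_left(self.positions, ring_hash(key))
            return self.points[i % len(self.points)][1]

    ring = HashRing([f"node{i}" for i in range(16)])
    print(ring.lookup("some query"))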
Let's first do a little bit of analysis. As you can reasonably straightforwardly see,
you can always define the machine that you're interested in as being at position zero.
Then the probability of the segment to its right being larger than c is simply
(1 - c)^(M - 1), where M is the number of machines. You can work out a density just by
taking the derivative, and obviously, by symmetry, the expected length of a segment is
one over M. So plugging this in and making the assumption of a very large number of
machines, you can see that the probability that a segment is k times as long as the
average segment behaves like e^(-k). You can use this fact to then, you know, say:
well, actually, per machine we don't only pick one segment but maybe ten segments or so.
And then, you know, concentration starts kicking in, and basically the size distribution
of those ten sub-segments is much tighter than, you know, what you would have gotten
with a single one, and then also the load balancing works better.
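Written out, with \ell denoting the arc owned by a machine on a ring of unit circumference and M machines (my notation):

    \Pr[\ell > c] = (1 - c)^{M-1}, \qquad \mathbb{E}[\ell] = \frac{1}{M},
    \qquad
    \Pr\!\left[\ell > \frac{k}{M}\right] = \left(1 - \frac{k}{M}\right)^{M-1}
    \;\approx\; e^{-k} \quad \text{for large } M.

You can do more beautiful things, so this is called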
proportional hashing. There's for instance a very nice paper by [INDISTINCT] 2011 which
describes that in great detail for designing a video caching system, and the idea is
essentially very similar to what you get in a rejection sampler. So, you basically
pre-allocate a certain interval, and for each machine in your machine pool you
pre-define a small segment whose size is proportional to its capacity. Then, when a key
comes along, I go and compute the hash--okay, here I hit something. The next key: I go
and compute the hash; well, it didn't hit anything, so I need to rehash, and now I hit
something. I keep on doing this until I hit something. So, you can easily see this now
gives me a load distribution proportional to the segment sizes; however, you can also
see that, you know, once you run out of space you're done for. So, you know, you leave
maybe a factor of 10 of slack to make sure that there's enough space. Now, for those of you who have
built samplers, you'd immediately see this looks awfully like a rejection sampler, right?
You have a proposal distribution which is uniform over the larger interval, and my acceptance
is only if I hit those bins. And I think there are actually some very nice statistical
connections between, you know, sampling and load distribution in machine learning that are
worthwhile exploring; talk to me later if you're interested in that.
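Here's a minimal sketch of that rejection-style lookup in Python; the machine names, capacities, and the factor-of-10 slack are made-up numbers for illustration:

    import hashlib

    TABLE = 10_000  # pre-allocated interval, roughly 10x total capacity

    def slot_hash(key, salt):
        d = hashlib.md5(f"{salt}:{key}".encode()).digest()
        return int.from_bytes(d[:8], "big") % TABLE

    def build_slots(capacities):
        # Each machine gets a number of slots proportional to its
        # capacity; most of the interval stays empty on purpose.
        slots, pos = {}, 0
        for machine, cap in capacities.items():
            for _ in range(cap):
                slots[pos] = machine
                pos += 1
        return slots

    def lookup(key, slots):
        # Rehash until we land on an occupied slot: a rejection
        # sampler whose acceptance region is the machines' slots.
        salt = 0
        while True:
            pos = slot_hash(key, salt)
            if pos in slots:
                return slots[pos]
            salt += 1

    slots = build_slots({"fast-node": 200, "slow-node": 100})
    print(lookup("query:cats", slots))  # fast-node twice as likely

So, a nice application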
of that are the random caching trees of Akamai, from Karger et al. That's, I think, pretty
much the paper that started Akamai, and the idea is you build random trees for each key;
then, well, you're not creating hotspots because, you know, each key is being cached in
a different way. So this would be the master, and those other labeled nodes would only
be instantiated whenever you actually have enough requests. And there's a lot more fun
stuff to it. Okay. So this is what you get: basically you get the same [INDISTINCT] at
each node, the same, you know, uniform work. So, now that's just a little bit of background.
To give you a bit of an idea of what's going on, let me
now show you what we use this for. Let's take sketching algorithms. So, well, what you have
is, you know, a real-time data stream, for instance queries, and you might want to
get a real-time response of aggregate statistics. And I'm going to make the following relaxation,
namely: it's perfectly okay if I don't quite get the exact answer, as long as it's reasonably
good. And I want to be able to get aggregate counts, as in I want to be able to ask,
you know, how many times did I see a particular query in a particular time range. And unfortunately,
of course, the set of queries, or whatever other items, is not known beforehand. And what I want
to make sure of is that I have perfect scaling, as in: more machines means I can process more
data, means I have higher fault tolerance, means I have higher accuracy, and I can
also pull more requests out of it. And I'll now show you how those very simple
hashing ideas that I demonstrated before can be used to extend something like the
Count-min sketch immediately to such an algorithm. Why do we care? Well, you know, if I have
stuff that's, you know, several days old, well, okay, maybe I can use Hadoop or whatever to
do offline batch processing; this is for the real-time sketching system, and it makes for
a very simple system. So the three tools that we'll need are the Count-min sketch, which
I'm going to explain to you now, then consistent hashing, which we looked at before, and then
some interpolation tricks, you know, for where to look back in terms of the marginals. Okay.
So here's the Count-min sketch. So, who in the audience knows how a Bloom filter
works? Cool. Okay. Now, think of a Bloom filter with integers rather than bits and
you pretty much have the Count-min sketch. Okay, so here's what you do. You
basically have one row per hash function, so that's a little bit different from a Bloom
filter. And the insert algorithm works as follows: for a given key, you compute for each
row the hash of this key, and you increment the corresponding bin by one. At query time,
I go and look at all the bins that I have touched and I take the minimum over those
bins. So why is this a good idea? Well, first of all--and the assumption is we only do inserts;
there are interesting variations on that theme--but basically, for insert-only streams we know
all those numbers are non-negative. Furthermore, we know that each of those bins got incremented
whenever we saw that particular item. So, therefore, each of those numbers is going to
be an upper bound on the true count. Since all of them are upper bounds, I can take the minimum,
and that's still going to be an upper bound. This is a very nice algorithm by Cormode
and Muthukrishnan.
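A minimal Count-min sketch in Python, just to pin down the insert and query paths (the width and depth here are arbitrary):

    import hashlib

    class CountMinSketch:
        def __init__(self, width=1024, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _bins(self, key):
            # One hash per row, made independent via row-specific seeds.
            for row in range(self.depth):
                d = hashlib.md5(f"{row}:{key}".encode()).digest()
                yield row, int.from_bytes(d[:8], "big") % self.width

        def insert(self, key, count=1):
            for row, b in self._bins(key):
                self.table[row][b] += count

        def query(self, key):
            # Collisions only ever add, so every bin is an upper bound
            # on the true count and the minimum is the tightest one.
            return min(self.table[row][b] for row, b in self._bins(key))

    cms = CountMinSketch()
    for q in ["cats", "dogs", "cats", "cats"]:
        cms.insert(q)
    print(cms.query("cats"))  # >= 3, and usually exactly 3

So, the guarantees--and that's the really amazing thing: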
I mean, okay, this part you can see immediately: the count that I get from this is going
to be greater than or equal to the actual count. You can also see--and this
is a little bit more of a proof, but it's actually a very elementary proof--that
it's basically upper bounded by the true count plus epsilon, where epsilon is essentially
one over the number of bins times the total number of inserts that I made.
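For reference, the guarantee from the Cormode-Muthukrishnan paper, in my notation--w bins per row, d rows, N total inserts, n_x the true count of item x, and \hat{n}_x the sketch's answer:

    \hat{n}_x \;\ge\; n_x,
    \qquad
    \Pr\!\left[\hat{n}_x \le n_x + \frac{e}{w}\, N\right] \;\ge\; 1 - e^{-d}.

Furthermore,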
if I have a power-law distribution, that's actually in there too. So, this is a very, very
nice sketch, because it was actually the first--and I'm not sure whether it's so far still the
only--sketch that has an order one-over-number-of-bins dependency rather than one over
the square root of the number of bins. The way you prove it is: okay, the upper bound
is trivial; you first bound the expected value, then you show the probability that you
have a certain deviation per hash row (essentially Markov's inequality), and then you
just boost the probability by multiplying across the independent rows in order to get
a tight bound. So, it's a very, very elementary proof and you will
enjoy reading that paper; it's beautiful. So, the problem, however, is, well, you know,
the accuracy, while it's really great, like one over the number of bins--well, at some
point you run out of memory, so if I want very high accuracy, well, I can't have it. Furthermore,
this works beautifully if I have a single machine where I insert; if I have many machines,
well, you know, I first of all need to be able to actually process the data, secondly
I need reliability, and then finally this is a fully [INDISTINCT] sketch--what I want to
be able to do is actually get a similar estimate for arbitrary time ranges. Okay. So
the second tool we're going to use is an application of consistent hashing, and
I'll now show you this in about three simple steps. So, first of all, how do we increase
throughput? Well, that's really easy, right? We had a single machine; now we have several
servers, and I just shard my keys over those servers. In other words, I just, you know,
uniformly distribute the keys over them through this argmin hash. I could use
something different, but argmin is good enough for me here because I might only have dozens
of servers. Now, this is really good, right? Because, you know, the accuracy actually improves
like one over the number of servers, because now each server gets only a smaller fraction
of the data, and since the accuracy was the amount of data inserted over the number of
memory locations, my accuracy goes up, subject to some small load factors, but I'm ignoring
them here. Furthermore, the throughput increases, because if I have k servers I can insert
k times as many keys. But I'm really screwed because the reliability decreases: if I lose
one machine, basically my system goes down. How can I fix this? Now, let's look at that
part. So, suppose now, here
we have various hashes, all of them going into one machine. Well,
there's absolutely no need for that. I could go and essentially insert one hash row per machine,
and then the good thing is, you know, you would really have to kill all those machines in
order to lose all information about that data. Okay? So, in other words, I now insert
into, you know, k servers rather than a single one, and that way I gain reliability.
This is great. The accuracy, though, actually decreases, because now each machine needs to store
data for all the keys. The throughput is more or less constant, depending on my inter-process
communication library, but the problem is that, you know, the latency actually goes up a lot
for queries, because I need to wait until all servers have returned their answer, and there's
no acceleration. Okay. So, that's pretty much the opposite end of what I showed you before
in terms of scaling up: now we increased reliability. Okay. Next step. Can we actually increase
query throughput? Well, that's fairly straightforward, right? I just over-insert into more servers:
let's say I need four hashes to get a good answer; I insert into maybe eight machines,
and if I have a lot more queries than inserts, you know, I just pick a random subset of four
machines out of the eight that have my data and I'll be fine. And on top of that, I gain
reliability. So this is also easy. Now, let's just combine the tricks. The trick is basically
you assign keys only to a subset of machines: you over-replicate them for reliability, and
you over-replicate them for query parallelism. There will be a picture of it in a moment.
So what you do is, again, you use this consistent hashing, so for each key there will
be a random subset of k machines that are responsible for it. Each of them
gets one hash function; just make sure that you seed it with a machine-specific key.
Otherwise, if you have the same hash functions on several machines, you get no improvement--
it's just, if you implement it, a useful thing to know. And then you request from a
smaller subset of those machines; again, you can use consistent hashing to have
a random subset depending on the client.
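Putting the pieces together, a rough sketch of the combined scheme, reusing the CountMinSketch and the argmin-hash assign from the earlier snippets; the replica counts are made up, and I'm waving my hands over the actual networking:

    import random

    machines = [f"node{i}" for i in range(20)]
    # One sketch per machine; in practice each machine seeds its hash
    # functions with its own ID, so rows differ across replicas.
    sketches = {m: CountMinSketch() for m in machines}

    REPLICAS = 8  # machines holding each key, for fault tolerance
    D = 4         # replicas actually contacted at query time

    def distributed_insert(key, count=1):
        # Consistent hashing picks a per-key subset of machines, and
        # every one of those replicas receives the insert.
        for m in assign(key, machines, k=REPLICAS):
            sketches[m].insert(key, count)

    def distributed_query(key):
        # Ask only D of the replicas and take the minimum: every
        # replica's answer is an upper bound, so the min stays valid,
        # and the random choice spreads the query load.
        owners = assign(key, machines, k=REPLICAS)
        return min(sketches[m].query(key) for m in random.sample(owners, D))

So then you can actually prove a few useful things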
for it. So, the useful bit here is--I mean, simplifying all of this--essentially
you need about one extra machine to hold your sketch if you have, let's say, 10 failures and
at least 20 machines. The argument goes as follows: since hashing randomizes
the assignment of keys to machines, an adversary cannot pick the failures
in an arbitrarily bad way, so I can essentially randomize over
the failures. If I insert into less than half the machines in the machine
pool, I can majorize drawing without replacement by drawing with replacement--because
basically, if a machine is dead, well, I can't kill it again, so the process where a machine
gets killed over and over upper bounds the number of failures that I can experience. And it's
a very simple three-line analysis where you just use a binomial expansion
to get the inequality. There's more detail in the paper, which is at the moment
under submission. It's a very, very simple analysis. So just to show you the picture
of what's going on: here is our machine pool, and we decide to insert into those
machines, and for a query I pick a random subset of three of those machines to get the
answer. This scales linearly with the number of servers in terms of accuracy, reliability,
and, well, scalability. So this is kind of nice. Now, I haven't told you yet how to fix
the temporal aspect. So, the time series interpolation essentially works as follows: the Count-
min sketch is a linear statistic, that's the key idea. In other words, if I want to
sketch the union of two sets, I just sum the two sketches and everything's fine. So if I want a sketch
of a longer time interval, I take the first sketch and the second sketch and sum them
up, easy. The other thing is, I can also reduce the bit resolution by essentially folding
the sketch over onto itself, and the good thing is I can do that with the processed
data, as opposed to ever having to touch the original data again.
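Both operations in a minimal sketch, reusing the CountMinSketch from before (my own illustration; folding assumes the width is a power of two, so that h mod w/2 equals (h mod w) mod w/2):

    def merge(a, b):
        # Linearity: the sketch of two streams is the sum of their
        # sketches, so adjacent time intervals aggregate without ever
        # touching the raw data.
        out = CountMinSketch(a.width, a.depth)
        for r in range(a.depth):
            for i in range(a.width):
                out.table[r][i] = a.table[r][i] + b.table[r][i]
        return out

    def fold(a):
        # Halve the bit resolution: bins i and i + width/2 collapse
        # into one, trading item accuracy for half the memory.
        half = a.width // 2
        out = CountMinSketch(half, a.depth)
        for r in range(a.depth):
            for i in range(half):
                out.table[r][i] = a.table[r][i] + a.table[r][i + half]
        return out

So let me first show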
you the time aggregation. What we do is actually something very straightforward:
we pick an exponentially increasing set of intervals, so 1, 2, 4, 8 and so on. And
every 2^n time steps--so it's not 2n but 2 to the n--I need to recompute all the bins
up to 2^n. The trick here is to wait until the cumulative sums align, which makes
everything work out nicely, so the interval boundaries are basically
just powers of two. In other words, if I want to compute a long interval, I just need to sum
over shorter intervals that are consecutively aligned. So what I do is I basically let those
intervals age until they are perfectly aligned; then I do an aggregation sweep and I get the
next interval. I'll show you that in a picture in a moment. The good thing is, I only have
to write into the first bin--this is the only one that's really going to get all the inserts;
everything else is fine, so I have reasonable memory locality. And you can actually
show that the aggregation cost is basically log log t, so it's a non-issue. The other
good thing is, we get logarithmic storage in terms of the time interval that we sketch.
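One way to realize the aging scheme is as a binary counter over sketches--my own rendering of the idea, reusing merge from above; a real deployment would also keep the recent fine-grained sketches around for queries:

    class TimeAggregates:
        def __init__(self):
            self.levels = []  # levels[n] covers 2**n unit intervals

        def step(self, fresh):
            # 'fresh' is the sketch of the unit interval that just
            # ended. Like a binary counter: two aligned sketches of
            # length 2^n merge into one of length 2^(n+1), so storage
            # is logarithmic in the total time span.
            carry, n = fresh, 0
            while True:
                if len(self.levels) <= n:
                    self.levels.append(None)
                if self.levels[n] is None:
                    self.levels[n] = carry
                    return
                carry = merge(self.levels[n], carry)
                self.levels[n] = None
                n += 1

    agg = TimeAggregates()
    for hour in range(8):
        s = CountMinSketch()
        # ... insert this hour's stream into s ...
        agg.step(s)

So, let me show you the intervals. Let's say, first we have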
an interval of length one. Then, okay, it ages; we write into another interval and it ages. Now
we basically sum those two up and get an interval of length two, and this one gets aged.
Okay, so now these move over; now we can compute an interval of length four, and so on
and so on. So we have lengths one, two, four and eight, and this is stored in memory in
that way, right? We always write here and these intervals age; the rest is just basically
going, you know, backwards and forwards in time. Now, I can do a similar thing with the
bit resolution.
So if I halve the bit resolution every 2^n steps, you can easily see that, you know,
a time span of length 2^n will always require the same amount of memory. So whatever storage
I need for time intervals two and three, I need the same storage for four, five, six
and seven. Okay. So why did I show you those two techniques? All right. I mean, one of
them gives you aggregation over time; the other one gives you aggregation over items.
So we basically get a trade-off with
two knobs here. We can either reduce the item resolution of our sketch and keep the
time resolution accurate--so in the most extreme case, for historical data that's two
years old, we will only know how many times the query "machine learning" was issued
in that particular year--or, the other extreme, for, you know, one year ago at exactly
around midnight, I know I had maybe a million queries, all right, but I won't know anymore
which ones they were. Now, that's effectively, you know, pinning down the marginals of
a joint distribution, which I would really like to be able to store but which I can't
[INDISTINCT] of storage cost. And this is fixed--at least approximately fixed--by just
taking the product of the marginals as a good estimate.
Now, you would immediately say, well, hey, there are so many smart algorithms out there--like,
you know, any exponential-family sketching estimation algorithm will do better, sure. But
those algorithms require more than order-one time in order to get the number, so
therefore we do the really cheap and dirty thing. There's an actually interesting question
of what you could do to tie in the marginals. Okay. Let me show you very briefly that it
works. Okay. So the data that we had followed a nice power-law distribution. And, okay, so
here are three sketches compared, which is actually quite interesting. So,
okay, you can't really see the colors, but basically we compare a sketch which just
aggregates over items--you know, which basically at every 2^n-th
step reduces the item resolution by one--with one that basically just, you know, makes longer
and longer intervals, and then [INDISTINCT] with something that uses the product
of the marginals. The green line is the product, and you can see
it's doing reasonably well. The other thing you can see is that as we get more and
more data, this blue line, which basically just reduces the bits, does a bit
better. And that's not very surprising, because if you have a very heavy hitter--if you look
at the original Cormode and Muthukrishnan guarantees, you can see that they are basically absolute
deviations rather than relative ones--so if I have a very heavy hitter, then
it doesn't hurt me so much if I have collisions with a lot of irrelevant queries. So you
could probably do something smart here by switching between those two estimators. The
fact that these purple curves have slight peaks has to do with the fact that the volume
is non-uniform. So, of course, if you have a lot of volume, well, the error
will go up, especially if I estimate something [INDISTINCT] constant. I can show you similar
figures for different error measures, and things look qualitatively the same.
This is probably quite interesting. Here is the number of read requests, and on the next
slide the number of insert requests, over, well, basically between one and ten servers.
Obviously, if I want to read from at least three machines, I need to start here,
and for five machines I need to start here. And what you can see is basically linear
scaling. So this is as good as it gets, and a similar thing here. You can use the same tricks for
other sketches, for instance the SpaceSaving sketch. In that case, well, okay, I can't really
go into much detail on how it works, but basically you use the very same machinery. Okay. We're
going to skip peer-to-peer, and I'm quickly going to show you more or less the same picture
here. So, for distributing random variables in an inference procedure, again you have
essentially a tree--well, you basically need to build a replica tree, and that replica
tree is instantiated through a tree which reflects, you know, the layout that
you have within your cluster. Within the rack you want to be as flat as possible, because
you sit on a switch and you have perfect, you know, communication; between the racks,
you probably want to, well, basically respect the fact that, you know,
the between-rack bandwidth is a lot lower than the within-rack bandwidth.
And, okay, I can't talk about synchronization, distribution, fault tolerance--Joey told me
anyway that my time is up. Here's a very, very brief thing that you sometimes may want
to worry about. Namely, suppose I have, you know, k machines that need to talk to another
set of k machines, and suppose that the message size each of those machines has is like,
you know, one over the number of machines. So, basically, the more machines I have, the
smaller the packets each machine needs to send to another one. If I
just naively let them all talk to everybody at the same time, I will get a very low data
rate between any pair, which is really bad, because basically my switch and my entire
[INDISTINCT] will create big overheads. So, one way to fix this is to actually, you know,
schedule who talks to whom in a specific order. Now, you could always think: okay,
let's just pick a nice, you know, synchronized order, but that's not a good idea, because as
soon as a single machine is delayed, all the delays pile up and bad things happen. So,
what you can do instead is, you essentially sample with replacement by picking a random
schedule for each machine. Now, if you do that and I just pick, you know, one partner
per machine at a time, I create two problems. One is some machines will be entirely left
out--that's exactly the one-over-e that you also know from the bootstrap--and on the other
hand, some machines will be hotspots, because they will actually receive inserts from more
than one machine. Now, there's a very easy fix to it: well, you pick more than one,
maybe two; it turns out four is a good number, and there's a little inequality that I'll
show you on the next slide. So, for instance, with two connections, you can already see
that the efficiency of this communication scheme is between .78 and .89. Okay. So, how
did I get those numbers? Well, say these are basically nine machines. This machine is left
out, so I can definitely only get eight ninths of the efficiency. To get the lower bound,
you basically say: well, each local machine spreads its network bandwidth evenly between
all its links, and then I just add up with the [INDISTINCT]--if I have, you know,
let's say three coming in, I'll just throw one of them away. That gives
you the lower bound. In practice it's somewhere in between; that's just for this specific
random graph. In general, you can get this inequality in the limit of a large
number of machines, and it shows you four simultaneous connections are enough; if you
do that, that's what you get.
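A toy simulation of that effect (my own sketch, along the lines of the pessimistic throw-away-the-excess model, so the numbers come out a bit lower than the slide's):

    import random

    def efficiency(k=1000, c=2, trials=20):
        total = 0.0
        for _ in range(trials):
            load = [0.0] * k
            for _ in range(k):                    # every sender...
                for r in random.sample(range(k), c):
                    load[r] += 1.0 / c            # ...splits bandwidth over c links
            # A receiver absorbs at most unit bandwidth; the excess is
            # thrown away, and unpicked receivers idle (the 1/e effect).
            total += sum(min(1.0, x) for x in load) / k
        return total / trials

    for c in (1, 2, 4):
        print(c, round(efficiency(c=c), 3))  # efficiency rises with c

And I think now I've fully overstayed my time. So, the example here is just--well, two billion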
documents--not million, billion--on a thousand machines, and that's basically the same picture,
you know, a Gibbs sampler procedure to, well, do some user profiling. So, the last slide,
and this is essentially what you could almost use as a, sort of, research agenda:
take your algorithm and ask what happens--what guarantees can I get if some of the machines
die? Furthermore, I want to have linear scaling as I throw more hardware at the problem. The
third issue is, well, I actually have communication constraints, because, well, I have a non-uniform
connectivity structure within my server centers and I want to keep parameters local. And this
gives me a lot of really fun questions. One corollary: if you design the algorithms, try
to avoid having a master, because masters are single points of failure and you have
to work hard to make them go away. Okay. That's all I want to say. Yes?
>> [INDISTINCT] we have time to [INDISTINCT] your questions.
>> You said [INDISTINCT] the connections between problem solving here and [INDISTINCT] of the
[INDISTINCT] measuring sides and flows [INDISTINCT] with [INDISTINCT]
>> SMOLA: Great. So--it's actually a flow problem. It's definitely--it's a very simple
flow problem, and well, we did the very--I'm sorry, that's my--all right. That's mine.
Thank you. We did a very sloppy analysis of just checking that, actually, you know,
your flow is reasonably well controlled. So, basically 80% was good enough for us, because
everything beyond that is just, you know, lost in the network stack anyway. But yes, there
is also a connection or relationship to random codes and so on. Yes?
>> [INDISTINCT]
>> SMOLA: Yes. So, as a matter of fact, you will find a lot--oh, are you okay?
>> Yes.
>> SMOLA: You will find a lot of the same ideas of consistent caching and distribution
also in--for instance in the [INDISTINCT] paper. The [INDISTINCT] paper has to worry about a few
other things, such as ordering and so on, which we had to worry about before when we had
a, well, latency-bound protocol, because, I mean, at first we didn't have [INDISTINCT] and
so on, but that's essentially a small instance of what you will get, you know, just with
this [INDISTINCT]. And the difference is we have lower durability requirements, at
least, you know, for these scenarios, than what you would get maybe in the [INDISTINCT]. But
you could use a lot of those tricks. So, for instance, if you want to make sure that even
under [INDISTINCT] partitions your writes still work, you could essentially take the
[INDISTINCT], modify it and turn it into a sketching service. And that would probably actually
work fairly well.
>> [INDISTINCT]
>> SMOLA: Yes.
>> [INDISTINCT]
>> SMOLA: Okay.
>> [INDISTINCT]
>> SMOLA: Thanks. I mean, thanks for the pointer; I'll definitely look it up. I mean, a
lot of really cool things in sketching have happened recently--as in maybe the past
five to ten years. So, it's maybe a good idea to revisit some of those old ideas
and just, you know, apply them to the [INDISTINCT] sketches.
>> [INDISTINCT] time.