JAKE: He wrote the High Performance Browser
Networking book for O'Reilly, which
is also available for free in the links on his website.
If the internet is a series of tubes,
then this is one of the world's greatest plumbers.
Hands together for Ilya Grigorik.
[APPLAUSE]
ILYA GRIGORIK: All right, thanks, Jake.
All right, so we're going to talk a little bit
about optimizing network performance and specifically
some of the things that we've been doing on the Chrome team
for helping deliver better apps.
And I guess the first thing that we should ask
is, does it matter, right?
What's the problem we're trying to solve?
And Tony Gentilcore, who's actually somewhere here
in the room, ran a number of different tests
over the last couple of months, where
he's been kind of deep diving into where do we
spend our time.
Like when we try to render a web page, what
are the bottlenecks today?
And he has a series of these posts
on Blink-dev if you guys are interested in kind
of low-level guts of how Blink works
and in Chrome kind of end to end.
But one test to me stood out, in particular.
And this is a test where we took the top 1 million Alexa sites
and just ran them through Chrome and looked
at where do we spend our time?
Like in terms of the actual main Blink thread,
where is the time going?
And the big takeaway here is that, approximately 70%
of the time, we're just basically idling
on the network, right?
That's that big chunk right here in the blue.
And then after that, you have all of your usual offenders,
things like, well, we've got to get the JavaScript,
we've got to paint pixels, and all the rest, do layouts.
So this should not be surprising, right?
This is specifically for the first page load.
There's a very different profile,
of course, once the page is loaded
and you're interacting with the page.
That's a different problem.
But this, in part, is one big problem
that we're trying to solve.
Like how do we make this blue part smaller or just go faster?
So there's two takeaways that you can take from this.
One is, for page loads, the network is a problem, right?
That's 70% of loading the page today.
But the good news is that if we can do anything to the network
stack in terms of improving that latency and improving
performance, it's going to have a significant impact on how
we experience the web.
So even small fractional wins in this space
will, in fact, have huge performance impact.
So kind of with that in mind, what I wanted to do
is actually take a look at some of the things
that we've been working on internally in Chrome.
This is kind of looking under the hood.
This is not perhaps something that you
would be, as a developer, looking at APIs
or trying to figure out how to optimize.
This is the kind of stuff that Chrome does internally.
But we have a very dedicated and awesome performance team
working on this stuff.
And I wanted to highlight some of the wins
that we had over the last year,
so you know what we're working on,
and also point out potential areas for improvement
in the future.
And after that, we're going to look
at some of the new additions, specifically,
kind of low-level network plumbing
stuff that we support in Chrome, so things
like SPDY, some notes about QUIC, and other things.
And then finally we'll talk about measurements, right?
Of course, performance is the big theme
throughout this entire event.
And we want to make sure that we give you the tools
to measure performance in the best way possible.
You should be able to measure anything you need in the stack.
So first, let's actually do a quick survey.
This is going to be kind of all over the map,
but I want to highlight a few things.
First, in Chrome 26, we landed the new asynchronous DNS
resolver, which is kind of low-level plumbing stuff,
so we're no longer relying on the operating system DNS
resolver.
We actually have our own.
Today, it's available on Windows, Mac, and Chrome
OS, so this is not yet on mobile.
Hopefully, it will be.
So why did we want to do this?
Well, first of all, it gives us a lot more control.
We can implement a lot smarter strategies
for how we resolve names and other things.
And here's some performance numbers
in terms of what we've seen since we landed it in M26.
It took us a couple of tries to actually kind of get
the performance numbers as good as they are.
But you can see that there's significant wins
across the board.
And for things like Chrome OS, we've
reduced the DNS resolution time significantly, 36%.
And not only that, but we're also
measuring the resolve plus TCP connect.
And you can see that there are wins across the board.
And of course, some of these are platform specific.
Some platforms just do a better job
of implementing their DNS resolvers in the first place.
But the cool thing is that now that we've
got the basic plumbing working,
we can take control and do smarter things.
So, for example, we can race different resolutions
for IPv6 and IPv4.
We are now actually doing adaptive retry,
so we actually remember which DNS servers we've used.
So we can do a better job of making these resolutions faster
in the future.
And this is definitely a space with a lot of room
for improvement, including subtle things
like providing better error pages to the user, right?
Before you would just get a failed timeout
from DNS resolution.
I mean, you just kind of like, we give up.
We have no idea.
We can't give any useful feedback to the user.
Now we can go much, much further.
So that's pretty cool.
Moving on, in M27, we landed this big and important
improvement, which is we completely rewrote how
we schedule resources.
It's one thing for us to get the HTML bytes.
We then discover the resources.
And then we need to figure out how do we schedule them
efficiently on the wire, like we care about JavaScript
before images and other things.
And the big change that we've done in there, in M27, is we
replaced that scheduler.
And we also started focusing on perceived performance.
So instead of just measuring the page load time,
we started measuring things like speed index.
So we looked at what kind of optimizations we can
do in the Resource Scheduler to improve speed index.
In fact, we've made decisions where we've intentionally
chosen speed index over page load time, or onload.
So there are changes that have gone
in where we've regressed, in some cases, on load time.
But we've improved speed index, because we
think that perceived performance, getting
useful pixels on the screen, is a win for the user.
And one interesting takeaway from this work
that was done in M27 was that we realized that a lot of pages
were actually competing for bandwidth unnecessarily.
So they were trying to download too many things.
We've gotten so good at sharding our assets that it's actually
backfiring on a lot of sites.
So, in particular, one big interesting change
that went in in that iteration was that the new scheduler
would only download up to 10 images in parallel.
So, for example, if you have a gallery of images,
you have let's say 30 of them on the page,
and you sharded them in 20 different ways,
we would not open more than 10 connections at once.
Because we found that that actually
hurts performance in most cases.
So if you're developing your site today,
Chrome will limit you to 10 image downloads.
But in other browsers, you may still have this problem.
I'm not sure what exact scheduling algorithms
they're using, but perhaps something you
should consider on your site.
There is such thing as oversharding your site.
Later, in M28, speaking of perceived performance,
we also improved SPDY performance quite a bit.
So the change here is actually pretty awesome
and pretty trivial in that now that we
have control over the Resource Scheduler
we said, look, if you're using SPDY,
we have a much better way to schedule resources, which
is we know the priority.
We can send that priority to the server.
The server can do the right thing.
So we won't delay any resource scheduling
on the client, which is kind of this fake latency--
not fake, unnecessary latency that we're otherwise
introducing.
So if you're using SPDY, this is a nice performance
win because it allows us once again to get
those pixels visible earlier on the screen.
So if you haven't already, I definitely
encourage you to look into playing with SPDY.
So if you're using Apache, you can install mod_spdy;
nginx and other servers are supported as well.
And actually, we'll come back to SPDY a little bit later.
In M30, there's been yet more improvements
to the Resource Scheduler.
We keep improving and iterating on all
of these different strategies.
One interesting kind of takeaway that we had in this iteration
was that we actually started distinguishing
between optimizing for the popular sites versus sites
in the tail.
There's different ways that sites
are constructed in terms of kind of patterns that they use,
how they lay out the resources, and all the rest.
And this iteration, in particular,
actually helped quite a bit in terms
of accelerating the sites in the long tail.
And if you think about a 10% improvement
in firing the onload from just
one Chrome milestone revision, it's huge.
That's a 10% win in onload and a 9% improvement in speed index.
So there's just faster pixels on the screen.
So these are impressive numbers.
And I think what's most exciting for me is if we look forward,
based on the work that we have in the pipeline now,
and project it a little bit, we see significant improvements
that we can still make to these algorithms.
So right now, at least based on the current code that we have,
you can expect more wins rolling out to our users.
So this is great.
As far as I'm concerned, this is free performance.
Like the apps, it's the same apps,
they're just rendering faster, because we're
doing a better job of how we schedule
those resources in Chrome.
So that's pretty exciting.
Another huge win that's coming and that's available on Android
today is what we're calling the "Simple Cache".
So one of the problems that we realized
that we had on Android and mobile phones,
in particular, is that in order for us to dispatch a network
request, we actually had to do a number of different context
switches.
Like we would go from the main thread to an I/O thread,
and then do another jump.
We would always do a check on the file system, which
in itself can take quite a bit of time.
And the idea behind Simple Cache is to try to simplify that,
as the name implies, to the extent
that we can, and ideally avoid any context switches
when going to disk.
So that should help quite a bit in terms
of the actual performance of the Simple Cache.
And here's some early numbers.
These look very, very good.
The blue line on the bottom is the original,
and what you see here is the latency.
So you kind of had this like long tail distribution,
where basically every request incurred
a minimum of several milliseconds.
But then you had this long tail, where
it wasn't atypical for a request to take 50 milliseconds
before we could even dispatch it.
Because we had to kind of do a couple of thread
hops and then check disk, or check
Flash, in this case, and kind of bubble that back up.
With the new Simple Cache, basically we
can just complete it immediately,
most of the requests.
Every once in a while, we still have some delays,
but this is the type of line
you want to see on all of your performance charts.
And this is quite amazing because once we have the Simple
Cache, based on our measurements,
it has improved the speed of all HTTP transfers,
in terms of the time from the first request byte
that we want to send to completion, by 10%, which,
if you think about it, is massive, right?
And not only that, but in M31 we're
seeing 7% page load time improvement.
So this is simply eliminating that extra latency
at the beginning of each and every request.
And once again, there's more work going into M32,
and we hope that we can improve this even further.
So this is huge, and this will be
an awesome win for mobile browsers.
And then finally, one of the last things that we've
started iterating towards the end of the year here,
and something that I'm really, really excited about,
is focusing on improving the speculative optimizations
that we already do in Chrome.
We do a lot of speculative optimization as it is today.
But now we're also looking at how do we refine these?
How do we expose the right primitives,
and how do we make better use of them?
One example is something like prefetch, right?
So if you're familiar with a link rel=prefetch,
what it allows you to say is, hey,
I will need this resource perhaps on the next page.
That could be an HTML page, that could
be a CSS file, an image, what have you.
Please fetch this for me, such that I
don't have to fetch that, or I can just fetch it out
of the cache when the user initiates that load.
One of the gotchas there was, if that request did not
complete in time for the next navigation,
it would get canceled.
So you would kind of incur a double download,
and it just didn't make sense.
So, for example, we have this new patch that's in.
It's not available in Canary yet, but it's coming soon,
called detachable prefetch, which will actually
keep the prefetch alive even as you navigate away, such
that you can still make use of that resource
once you get to your destination.
So that's pretty awesome.
And this will also apply to other things like prerenders
and other types of improvements.
So this is pretty cool.
And this is how, basically, it looks.
Chrome allows you to actually dynamically create these hints.
So, for example, if, let's say, the user initiates
some sort of an action, like they click on the Checkout
button or they click on Add To Cart button
and you know that they're going to go to the checkout page,
at that moment you can actually inject one of these link
elements and say, hey, I would like
you to prefetch that asset for me, because now I
know I will need it.
And vice versa, you can actually delete this element out
of the DOM, and we will cancel the prefetch as well.
So you can basically script this dynamically and
drive Chrome to do these prefetches for you.
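A minimal sketch of that pattern (the checkout URL is hypothetical, and the document object is parameterized only so the helper is easy to exercise outside a browser):

```javascript
// Sketch: dynamically injecting and cancelling a link rel=prefetch hint.
function addPrefetchHint(url, doc = globalThis.document) {
  const link = doc.createElement('link');
  link.rel = 'prefetch';
  link.href = url;
  doc.head.appendChild(link); // Chrome may start fetching at this point
  return link;                // keep a reference so we can cancel later
}

function cancelPrefetchHint(link) {
  // Removing the element from the DOM cancels the prefetch
  if (link.parentNode) link.parentNode.removeChild(link);
}

// e.g. on an Add To Cart click: addPrefetchHint('/checkout.html');
```

The hint is advisory, so there's no error if the prefetched resource never ends up being used.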
So this is pretty cool stuff.
And I think this is a place where
we can do a lot more in the future as well.
So that's a little bit about kind of the low-level guts
and improvements in Chrome.
Now let's take a look at some of the protocols
that we've been working on.
So back in 2009, roughly, actually four years ago
almost on the dot, we announced our work
on SPDY or initial efforts around SPDY.
And since then we've gone, I think, quite a long way.
We've had several iterations of the protocol itself, so v2,
v3, 3.1.
Now we're working on Version 4.
And that actually became the foundation
of HTTP 2.0, which is pretty exciting.
And HTTP 2.0 work in itself is progressing quite rapidly,
and I'm really excited about that.
So today we actually have both SPDY and HTTP 2.0 support
in Chrome, although HTTP 2.0 is under a flag.
But it is there.
It's something that we're iterating on.
And then once HTTP 2.0-- I know this is a common question.
Once HTTP 2.0 is marked as ready, as a standard,
we'll just switch over to HTTP 2.0.
So think of SPDY as kind of like an experimental ground for us
to try different ideas and feed them back
into the HTTP 2.0 spec, right?
So like it'd be great if we had this feature.
Let's go and try and implement that feature.
We try it, and we discover the rough edges,
and then we kind of feed that back into HTTP 2.0.
So earlier in the year, we actually
deployed SPDY 3.1 across all of our Google servers
and, of course, added support in Chrome.
Firefox also supports SPDY v3.1.
And here's some numbers.
We've never released this before,
but these are the performance numbers
that we see for SPDY across some of the major Google properties,
and these are consistent across all the different Google sites.
So you're kind of looking at the right order of magnitude,
anywhere between 20 to 40 to 50% improvement in latency as
compared to HTTPS.
And in some cases, we're actually-- so even
despite the fact that we have these extra handshake round
trips and all the rest in TLS, oftentimes
we actually end up going faster than just vanilla HTTP
as well, which is, of course, the point
of this whole exercise to begin with.
So this is really exciting.
And I guess the important bit here is also that not only is
it helping the median, which is, of course, what we like to see,
but it's also consistently helping all of our users,
the ones on fast connections, and especially so for the ones
that are on the slow connections or the ones
with the high RTT times, which is especially
relevant for things like mobile, where RTTs are definitely
higher.
So this is really exciting.
This is very promising.
And I hope that this will help kind of drive the HTTP 2.0
adoption as well.
So if you haven't looked at SPDY,
I definitely encourage you to do so.
There are modules for virtually every popular server
out there today that you can just
enable and play with on your site.
And there's also commercial support for it
as well, so F5, Akamai, and others support SPDY.
So that's pretty cool.
And as I mentioned, we also do have HTTP 2.0.
If you're curious, if you want to play with it,
we do have HTTP 2.0 support under a flag.
So you can actually enable that and then
run it against your local server.
I think the only big public site that supports
HTTP 2.0 today is twitter.com.
So in theory, you can test it on that.
But there are also open source servers
that speak HTTP 2.0 today that you can play with.
So SPDY is kind of a production version, if you will.
HTTP 2.0 is coming soon and hopefully, fingers
crossed, sometime in 2014.
So that's SPDY.
You may have caught the wind of some other protocol
that we started working on earlier in the year, which
is QUIC, which is Quick UDP Internet Connections.
And the idea here is actually to kind of take
what we've done with SPDY and go one step beyond.
And this was actually our intent right at the very beginning
when we started thinking of SPDY.
But it was just too much of a leap
to change both kind of the application protocol
and the transport protocol.
So we kind of decoupled those, and QUIC is basically that.
We're trying to go one step further and say, well,
could we build a better transport
for HTTP traffic, period, on top of UDP?
Could we experiment with new ideas?
The core premise of this stuff is it's all about latency.
We're trying to eliminate latency everywhere we can.
So can we eliminate extra round trips
to establish the secure tunnel?
Can we do better congestion control?
What if we do packet pacing?
What if we do forward error correction?
What can we do to innovate in the space
to help reduce the page load times on the web?
And there's a lot of interesting ideas.
If you guys are curious about this kind of stuff,
we posted our design docs.
And it's a very long doc.
I encourage you to read it and give us feedback.
We have a Google group for that.
And this question comes up quite frequently,
which is, like, what's the point?
What are you trying to do here?
And the answer is very simple.
We just want to make faster internet for everybody to use.
And there are two ways that this will happen.
One is we end up building a really awesome protocol
that everybody loves and we take it to the IETF.
And just like with HTTP 2.0 and SPDY,
we work with the community and kind of make that the standard.
That's plausible and maybe that will happen.
The alternative route is, we just experiment with QUIC.
We experiment with different ideas.
And those ideas get adopted, the good ones get adopted
into existing protocol stacks, like TCP and TLS.
And actually we're already seeing
some of that, where based on our experience with the encryption
stuff in QUIC, the TLS working group is looking
at improvements in terms of can we
eliminate some extra round trips.
So in either case, the point is, no matter which one of these
happens, the users will win.
We'll get faster internet.
And that's our intent with QUIC.
So that's pretty awesome.
We don't have any benchmarks for it as of today.
We're still at a point where we want to make sure that it works
and it works correctly before we start
optimizing kind of all the edges around it.
But you can actually play with QUIC today.
We have it deployed on Google servers,
and you can also enable it.
If you go into Chrome flags, you can flip QUIC Support.
And then you can, for example, access YouTube,
and you'll get served-- youtube.com or other Google
service-- over UDP, over QUIC.
And if you're curious, you can dive into Chrome net internals
and kind of look at the actual protocol
and all this other stuff.
So if you're into kind of low-level networking protocols,
definitely a thing you want to check out and play with.
There's lots of interesting ideas in the protocol.
All right, shifting gears.
Linus mentioned Chrome data compression.
This is something that we launched early in the year.
As you heard, it provides roughly 50% data savings.
That's kind of the average number for a lot of users.
It turns out there's a lot of poorly compressed content
on the web.
People still forget to gzip their content, which
is one of the optimizations that we apply for text, like
[INAUDIBLE].
And we also convert all the images
to WebP, which provides a significant savings.
So this is a big benefit to a lot of users.
But one thing that Linus didn't mention
is that there are other secondary benefits to that.
Because we run over SPDY, so between your phone
and the Google server, it's actually a SPDY connection.
It's an encrypted connection.
So I actually use Chrome data compression in part
for the data compression part, but also
partially to secure my browsing.
Because when I enable this, if you're
connecting to your bank, for example, an HTTPS site,
that traffic will go directly to the site.
So that traffic is encrypted.
But if you're trying to connect to some unencrypted site,
it'll just flow basically as it is on the wire.
With Chrome data compression, that
goes through a secure tunnel, so even if you're on a Starbucks
Wi-Fi or whatever, some unencrypted Wi-Fi
and you're browsing around, all of your data is encrypted.
So that's really nice.
And maybe one important thing to highlight with Chrome data
compression is, it is still the full fidelity HTML5 web
experience, right?
We're not doing anything to modify your site.
We're not trying to render it on the server.
Like you have all of the flexibility of JavaScript, CSS,
and all the rest on your phone.
That's where the code gets executed.
So we're just modifying and optimizing some of the assets
as they get delivered.
Some common questions that I get about Chrome data compression,
something you should know, is this is going through a proxy.
So if you're developing a site where you're
relying on GeoIP functionality to customize
content to the user's location or maybe serve relevant ads,
you should be looking for the X-Forwarded-For
header, which is the IP address
of the client as forwarded by the Chrome data proxy.
And similarly, if for whatever reason
you absolutely want to make sure that we don't do anything
to your content, you can actually
opt out on a per-resource basis.
If you add a no-transform header,
it basically tells the Chrome data proxy to just
be hands off with that resource.
So we won't reoptimize that image,
or we won't recompress that text, or other things.
So these are standard kind of proxy directives,
and the Chrome data compression proxy supports them.
So, just an FYI.
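As a sketch of the GeoIP point above (the helper name is mine, the IPs are made up, and header handling is simplified), the original client address is the first entry of X-Forwarded-For:

```javascript
// Sketch: recovering the real client IP behind the Chrome data
// compression proxy. Assumes a Node-style lowercased headers object.
function clientIp(headers) {
  const xff = headers['x-forwarded-for'];
  if (!xff) return null; // request did not come through a proxy
  // The header can list several hops; the original client comes first
  return xff.split(',')[0].trim();
}
```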
Shifting gears, web sockets.
This is really, really exciting.
Do we have any web socket developers in the room?
Yes.
Awesome.
So web socket compression is going
to be live in M32, which is a long overdue feature.
One of the gotchas with web sockets
was that you could transfer binary and text,
but it would always go uncompressed
in both directions.
Now that we have the spec up to date
and we already have the code in Chrome,
you can actually negotiate the deflate compression
to apply in both directions, and the server can selectively
compress any given frame.
And on the client side, as of today, Chrome
will compress every single frame going out
from your mobile device or from your desktop device.
And I'm not going to go into details here,
but the spec also provides
a number of different parameters to customize
how the compression will be done.
For example, the size of the sliding window, so essentially
you can control the resources used on your server
and on your client, plus some other flags.
So this is really, really exciting,
because this has definitely been a sore point for web sockets.
We heard about WebRTC and DataChannel.
The way I think about DataChannel
is basically WebSocket, but over UDP and P2P.
So we can communicate directly between devices.
We don't have to go through an intermediary like a server.
And DataChannel in M31 has now officially switched
to the SCTP protocol.
So previously we were using RTP data channels,
and that was the reason for some of the incompatibilities
with some of the other vendors.
But as of M31, SCTP is the default,
and we will aggressively remove support for RTP data channels.
So if you're using data channels today,
this is something you want to revisit.
And if you're not familiar with data channels,
I encourage you to check out the links.
I'll post the slides later for how this works
and why this is awesome.
Because it allows you to define things like,
fire-and-forget semantics, don't retransmit.
So it's a really nice transport for doing low-latency data
exchange.
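A sketch of what such a channel might look like; the label is made up, and the connection object is passed in so the helper is easy to exercise:

```javascript
// Sketch: an unreliable, unordered DataChannel for low-latency updates.
// `pc` is an RTCPeerConnection (anything exposing createDataChannel works).
function createLossyChannel(pc, label) {
  return pc.createDataChannel(label, {
    ordered: false,    // don't stall on out-of-order packets
    maxRetransmits: 0, // fire-and-forget: lost messages are never resent
  });
}
```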
And then finally, let's talk about measurements, right?
So there's a lot of kind of protocol improvements
that are going on.
But as we know, we need to be able to measure things
in order to improve them.
So, of course, we're all familiar with navigation
timing, or I hope we are.
Most of the people here I expect would be.
You can get detailed low-level stats
about how long each connection took
in terms of DNS time, TCP time, and all the other things.
You can throw that into your analytics solution here.
I'm showing you Google Analytics, which
allows you to segment this data to say, well,
I want to look at my mobile users versus desktop.
You can segment it by any other variable
you define, like has a user clicked the Checkout button,
or have they registered, et cetera.
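Pulling a few of those Navigation Timing deltas might look roughly like this (the summary field names are my own):

```javascript
// Sketch: summarizing Navigation Timing (performance.timing) into
// deltas you could beacon to an analytics backend.
function navigationSummary(t) {
  return {
    dns:  t.domainLookupEnd - t.domainLookupStart,
    tcp:  t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.navigationStart, // time to first byte
    load: t.loadEventStart - t.navigationStart,
  };
}

// In the page: navigationSummary(performance.timing)
```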
This is all great.
One gotcha with this is this is only for the main page, right?
What about the other 85 resources or 100 resources
that you have on your page?
How are those performing?
Well in Chrome, we have support for resource timing, which
gives you that same level of access to all of the network
metadata, or timestamps, I should
say, on a per-resource basis.
So you can see here that you can actually
query for a specific resource, like your JavaScript
file that you're loading.
Maybe you're loading it from a CDN and you're
wondering how well your CDN is performing.
You can get your real user measurement
data for that specific resource and then look up
the time for DNS, TCP connect time, total transfer time,
et cetera.
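A sketch of that kind of query (the CDN URL in the test is hypothetical, and the performance object is parameterized only so the helper is easy to exercise):

```javascript
// Sketch: per-resource network timings via Resource Timing.
function resourceTimings(url, perf = globalThis.performance) {
  const [entry] = perf.getEntriesByName(url);
  if (!entry) return null; // not loaded, or timing data not exposed
  return {
    dns:   entry.domainLookupEnd - entry.domainLookupStart,
    tcp:   entry.connectEnd - entry.connectStart,
    total: entry.responseEnd - entry.startTime,
  };
}
```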
The only thing that you need to be aware of
is that the resource has to manually opt in and allow
the data to be gathered to begin with.
This is done for privacy reasons to make sure
that somebody can't just iterate over your cache and figure out
where you've been in the past, or something like it.
So for your own resources you need to add this header.
And then if you're using third party resources,
if that origin is not already providing this header,
then you should ask them to do so.
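The opt-in is the Timing-Allow-Origin response header; a minimal sketch of setting it, assuming a Node-style response object:

```javascript
// Sketch: opting a response in to cross-origin Resource Timing.
// `res` is any object with setHeader (e.g. a Node http.ServerResponse).
function allowTiming(res, origin = '*') {
  res.setHeader('Timing-Allow-Origin', origin);
  return res;
}
```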
Because here's one example where I have a web font on my site.
Web fonts delay when the text gets painted.
So the question is, how is-- in this case,
this is a Google CDN.
How is Google CDN performing in terms
of serving the actual font?
Is it hurting my users?
Well, now I can actually grab that data from Resource Timing,
just as I showed you a few slides ago.
And we can just pump that into Google Analytics.
Here you can see that I'm tracking the DNS, TCP,
and transfer times.
And it turns out that the fonts coming
from Google CDN, at least for my site,
are being loaded in this case within 150 milliseconds, which
to me was an acceptable time.
And that was fine for me.
But you can now think about using this sort of data
to define third party SLAs.
If you rely on third party widgets, you can say, well,
your widgets must load in x amount of time, et cetera.
You can actually track this with Resource Timing,
which is pretty awesome.
So as a quick recap, we covered a lot of ground.
There's a new DNS resolver in Chrome,
which is a double-digit performance improvement
in actual DNS resolutions.
And the new scheduler is definitely
something we're really excited about.
We've already seen huge improvements there,
10% and 20% improvement in the actual speed
index and page load times.
The Simple Cache stuff is a huge win on mobile,
and I'm really excited to have that out there.
And then moving forward, I'm hoping
that we can make the preresolve and prefetch and the prerender
stuff much, much smarter.
And you saw the SPDY wins, right?
So all of these things are incremental, 10% here,
20% there.
Before you know it, you're actually
saving hundreds of milliseconds, and sometimes seconds,
for the user, which is a huge win.
And some of these things you guys need to optimize for.
These are the things where you need to install SPDY,
you need to configure SPDY, you need
to make sure that your stacks are configured correctly.
And in other cases, it's just also doing
a better job of scheduling this kind of stuff.
And then finally, if you haven't already,
I definitely encourage you to look
at things like Nav Timing, User Timing, and Resource Timing.
So I talked about Resource Timing.
User Timing allows you to measure any chunk of code
and just get high-resolution time stamps for this
is when I started, this is when I ended, and beacon that back
to your server.
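A minimal User Timing sketch (the mark and measure names are made up):

```javascript
// Sketch: wrapping a chunk of work in User Timing marks, then reading
// back the high-resolution measurement to beacon to a server.
performance.mark('work-start');
for (let i = 0; i < 1e5; i++) {} // stand-in for the code being measured
performance.mark('work-end');
performance.measure('work', 'work-start', 'work-end');
const [measure] = performance.getEntriesByName('work');
// measure.duration is the elapsed time in milliseconds
```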
So all of these things are supported in Chrome.
And what you can measure, you can optimize.
So with that, I'll leave you with the link to the slides.
Thank you.