JAKE: He wrote the High Performance Browser
Networking book for O'Reilly, which
is also available for free in the links on his website.
If the internet is a series of tubes,
then this is one of the world's greatest plumbers.
Hands together for Ilya Grigorik.
[APPLAUSE]
ILYA GRIGORIK: All right, thanks, Jake.
All right, so we're going to talk a little bit
about optimizing network performance and specifically
some of the things that we've been doing on the Chrome team
for helping deliver better apps.
And I guess the first thing that we should ask
is, does it matter, right?
What's the problem we're trying to solve?
And Tony Gentilcore, who's actually somewhere here
in the room, ran a number of different tests
over the last couple of months, where
he's been kind of deep diving into where do we
spend our time.
Like when we try to render a web page, what
are the bottlenecks today?
And he has a series of these posts
on Blink-dev if you guys are interested in kind
of low-level guts of how Blink works
and in Chrome kind of end to end.
But one test to me stood out, in particular.
And this is a test where we took the top 1 million Alexa sites
and just ran them through Chrome and looked
at where do we spend our time?
Like in terms of the actual main Blink thread,
where is the time going?
And the big takeaway here is that, approximately 70%
of the time, we're just basically idling
on the network, right?
That's that big chunk right here in the blue.
And then after that, you have all of your usual offenders,
things like, well, we've got to get the JavaScript,
we've got to paint pixels, and all the rest, do layouts.
So this should not be surprising, right?
This is specifically for the first page load.
There's a very different profile,
of course, once the page is loaded
and you're interacting with the page.
That's a different problem.
But this, in part, is one big problem
that we're trying to solve.
Like how do we make this blue part smaller or just go faster?
So there's two takeaways that you can take from this.
One is, for page loads, the network is a problem, right?
That's 70% of loading the page today.
But the good news is that if we can do anything to the network
stack in terms of improving that latency and improving
performance, it's going to have a significant impact on how
we experience the web.
So even small fractional wins in this space
will, in fact, have huge performance impact.
So kind of with that in mind, what I wanted to do
is actually take a look at some of the things
that we've been working on internally in Chrome.
This is kind of looking under the hood.
This is not perhaps something that you
would be, as a developer, looking at APIs
or trying to figure out how to optimize.
This is the kind of stuff that Chrome does internally.
But we have a very dedicated and awesome performance team
working on this stuff.
And I wanted to highlight some of the wins
that we had over the last year,
so you know what we're working on,
and also point out potential areas for improvement
in the future.
And after that, we're going to look
at some of the new additions, specifically,
kind of low-level network plumbing
stuff that we support in Chrome, so things
like SPDY, some notes about QUIC, and other things.
And then finally we'll talk about measurements, right?
Of course, performance is the big theme
throughout this entire event.
And we want to make sure that we give you the tools
to measure performance in the best way possible.
You should be able to measure anything you need in the stack.
So first, let's actually do a quick survey.
This is going to be kind of all over the map,
but I want to highlight a few things.
First, in Chrome 26, we landed the new asynchronous DNS
resolver, which is kind of low-level plumbing stuff,
so we're no longer relying on the operating system DNS
resolver.
We actually have our own.
Today, it's available on Windows, Mac, and Chrome
OS, so this is not yet on mobile.
Hopefully, it will be.
So why did we want to do this?
Well, first of all, it gives us a lot more control.
We can implement a lot smarter strategies
for how we resolve names and other things.
And here's some performance numbers
in terms of what we've seen since we landed it in M26.
It took us a couple of tries to actually kind of get
the performance numbers as good as they are.
But you can see that there's significant wins
across the board.
And for things like Chrome OS, we've
reduced the DNS resolution time significantly, 36%.
And not only that, but we're also
measuring the resolve plus TCP connect.
And you can see that there are wins across the board.
And of course, some of these are platform specific.
Some platforms just do a better job
of implementing their DNS resolvers in the first place.
But the cool thing is that now that we've
got the basic plumbing working,
we can take control and do smarter things.
So, for example, we can race different resolutions
for IPv6 and IPv4.
We are now actually doing adaptive retry,
so we actually remember which DNS servers we've used.
So we can do a better job of making these resolutions faster
in the future.
And this is definitely a space with a lot of room
for improvement, including subtle things
like providing better error pages to the user, right?
Before you would just get a failed timeout
from DNS resolution.
I mean, you just kind of like, we give up.
We have no idea.
We can't give any useful feedback to the user.
Now we can go much, much further.
So that's pretty cool.
Moving on, in M27, we landed this big and important
improvement, which is we completely rewrote how
we schedule resources.
It's one thing for us to get the HTML bytes.
We then discover the resources.
And then we need to figure out how do we schedule them
efficiently on the wire, like we care about JavaScript
before images and other things.
And the big change that we've done in there, in M27, is we
replaced that scheduler.
And we also started focusing on perceived performance.
So instead of just measuring the page load time,
we started measuring things like speed index.
So we looked at what kind of optimizations we can
do in the Resource Scheduler to improve speed index.
In fact, we've made decisions where we've intentionally
chosen speed index over page load time, or onload.
So there are changes that have gone
in where we've regressed, in some cases, on load time.
But we've improved speed index, because we
think that perceived performance, getting
useful pixels on the screen, is a win for the user.
And one interesting takeaway from this work
that was done in M27 was that we realized that a lot of pages
were actually competing for bandwidth unnecessarily.
So they were trying to download too many things.
We've gotten so good at sharding our assets that it's actually
backfiring on a lot of sites.
So, in particular, one big interesting change
that went in in that iteration was that the new scheduler
would only download up to 10 images in parallel.
So, for example, if you have a gallery of images,
you have let's say 30 of them on the page,
and you sharded them in 20 different ways,
we would not open more than 10 connections at once.
Because we found that that actually
hurts performance in most cases.
So if you're developing your site today,
Chrome will limit you to 10 image downloads.
But in other browsers, you may still have this problem.
I'm not sure what exact scheduling algorithms
they're using, but perhaps something you
should consider on your site.
There is such thing as oversharding your site.
Later, in M28, speaking of perceived performance,
we also improved SPDY performance quite a bit.
So the change here is actually pretty awesome
and pretty trivial in that now that we
have control over the Resource Scheduler
we said, look, if you're using SPDY,
we have a much better way to schedule resources, which
is we know the priority.
We can send that priority to the server.
The server can do the right thing.
So we won't delay any resource scheduling
on the client, which is kind of this fake latency--
not fake, unnecessary latency that we're otherwise
introducing.
So if you're using SPDY, this is a nice performance
win because it allows us once again to get
those pixels visible earlier on the screen.
So if you haven't already, I definitely
encourage you to look into playing with SPDY.
So if you're using Apache, you can install mod_spdy;
nginx and other servers are supported as well.
And actually, we'll come back to SPDY a little bit later.
In M30, there's been yet more improvements
to the Resource Scheduler.
We keep improving and iterating on all
of these different strategies.
One interesting kind of takeaway that we had in this iteration
was that we actually started distinguishing
between optimizing for the popular sites versus sites
in the tail.
There's different ways that sites
are constructed in terms of kind of patterns that they use,
how they lay out the resources, and all the rest.
And this iteration, in particular,
actually helped quite a bit in terms
of accelerating the sites in the long tail.
And if you think about a 10% improvement
in firing the onload from just
one Chrome milestone revision, it's huge.
That's a 10% win in onload and a 9% improvement in speed index.
So there's just faster pixels on the screen.
So these are impressive numbers.
And I think what's most exciting for me is if we look forward,
based on the work that we have in the pipeline now,
and project it a little bit, we see significant improvements
that we can still make to these algorithms.
So right now, at least based on the current code that we have,
you can expect more wins rolling out to our users.
So this is great.
As far as I'm concerned, this is free performance.
Like the apps, it's the same apps,
they're just rendering faster, because we're
doing a better job of how we schedule
those resources in Chrome.
So that's pretty exciting.
Another huge win that's coming and that's available on Android
today is what we're calling the "Simple Cache".
So one of the problems that we realized
that we had on Android and mobile phones,
in particular, is that in order for us to dispatch a network
request, we actually had to do a number of different context
switches.
Like we would go from the main thread to an I/O thread,
and then do another jump.
We would always do a check on the file system, which
in itself can take quite a bit of time.
And the idea behind Simple Cache is to try to simplify that,
as the name implies, to the extent
that we can, and ideally avoid any context switches
when going to disk.
So that should help quite a bit in terms
of the actual performance of the Simple Cache.
And here's some early numbers.
These look very, very good.
The blue line on the bottom is the original,
and what you see here is the latency.
So you kind of had this like long tail distribution,
where basically every request incurred
a minimum of several milliseconds.
But then you had this long tail, where
it wasn't atypical for a request to take 50 milliseconds
before we could even dispatch it.
Because we had to kind of do a couple of thread
hops and then check disk, or check
Flash, in this case, and kind of bubble that back up.
With the new Simple Cache, basically we
can just complete it immediately,
most of the requests.
Every once in a while, we still have some delays,
but this is the type of line
you want to see on all of your performance charts.
And this is quite amazing because once we have the Simple
Cache, based on our measurements,
it has improved the speed of all HTTP transfers,
in terms of the time from the first request byte
that we want to send to completion, by 10%, which,
if you think about it, is massive, right?
And not only that, but in M31 we're
seeing 7% page load time improvement.
So this is simply eliminating that extra latency
at the beginning of each and every request.
And once again, there's more work going into M32,
and we hope that we can improve this even further.
So this is huge, and this will be
an awesome win for mobile browsers.
And then finally, one of the last things that we've
started iterating towards the end of the year here,
and something that I'm really, really excited about,
is focusing on improving the speculative optimizations
that we already do in Chrome.
We do a lot of speculative optimization as it is today.
But now we're also looking at how do we refine these?
How do we expose the right primitives,
and how do we make better use of them?
One example is something like prefetch, right?
So if you're familiar with a link rel=prefetch,
what it allows you to say is, hey,
I will need this resource perhaps on the next page.
That could be an HTML page, that could
be a CSS file, an image, what have you.
Please fetch this for me, such that I
don't have to fetch that, or I can just fetch it out
of the cache when the user initiates that load.
One of the gotchas there was, if that request did not
complete in time for the next navigation,
it would get canceled.
So you would kind of incur a double download,
and it just didn't make sense.
So, for example, we have this new patch that's in.
It's not available in Canary yet, but it's coming soon,
called detachable prefetch, which will actually
keep the prefetch alive even as you navigate away, such
that you can still make use of that resource
once you get to your destination.
So that's pretty awesome.
And this will also apply to other things like prerenders
and other types of improvements.
So this is pretty cool.
And this is how, basically, it looks.
Chrome allows you to actually dynamically create these hints.
So, for example, if, let's say, the user initiates
some sort of an action, like they click on the Checkout
button or they click on Add To Cart button
and you know that they're going to go to the checkout page,
at that moment you can actually inject one of these link
elements and say, hey, I would like
you to prefetch that asset for me, because now I
know I will need it.
And vice versa, you can actually delete this element out
of the DOM, and we will cancel the prefetch as well.
So you can basically script this dynamically and
drive Chrome to do these prefetches for you.
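A minimal sketch of that pattern (the checkout URL is hypothetical, and the document object is parameterized only so the helper is easy to exercise outside a browser):

```javascript
// Sketch: dynamically injecting and cancelling a link rel=prefetch hint.
function addPrefetchHint(url, doc = globalThis.document) {
  const link = doc.createElement('link');
  link.rel = 'prefetch';
  link.href = url;
  doc.head.appendChild(link); // Chrome may start fetching at this point
  return link;                // keep a reference so we can cancel later
}

function cancelPrefetchHint(link) {
  // Removing the element from the DOM cancels the prefetch
  if (link.parentNode) link.parentNode.removeChild(link);
}

// e.g. on an Add To Cart click: addPrefetchHint('/checkout.html');
```

The hint is advisory, so there's no error if the prefetched resource never ends up being used.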
So this is pretty cool stuff.
And I think this is a place where
we can do a lot more in the future as well.
So that's a little bit about kind of the low-level guts
and improvements in Chrome.
Now let's take a look at some of the protocols
that we've been working on.
So back in 2009, roughly, actually four years ago
almost on the dot, we announced our work
on SPDY or initial efforts around SPDY.
And since then we've gone, I think, quite a long way.
We've had several iterations of the protocol itself, so v2,
v3, 3.1.
Now we're working on Version 4.
And that actually became the foundation
of HTTP 2.0, which is pretty exciting.
And HTTP 2.0 work in itself is progressing quite rapidly,
and I'm really excited about that.
So today we actually have both SPDY and HTTP 2.0 support
in Chrome, although HTTP 2.0 is under a flag.
But it is there.
It's something that we're iterating on.
And then once HTTP 2.0-- I know this is a common question.
Once HTTP 2.0 is marked as ready, as a standard,
we'll just switch over to HTTP 2.0.
So think of SPDY as kind of like an experimental ground for us
to try different ideas and feed them back
into the HTTP 2.0 spec, right?
So like it'd be great if we had this feature.
Let's go and try and implement that feature.
We try it, and we discover the rough edges,
and then we kind of feed that back into HTTP 2.0.
So earlier in the year, we actually
deployed SPDY 3.1 across all of our Google servers
and, of course, added support in Chrome.
Firefox also supports SPDY v3.1.
And here's some numbers.
We've never released this before,
but these are the performance numbers
that we see for SPDY across some of the major Google properties,
and these are consistent across all the different Google sites.
So you're kind of looking at the right order of magnitude,
anywhere between 20 to 40 to 50% improvement in latency as
compared to HTTPS.
And in some cases, we're actually-- so even
despite the fact that we have these extra handshake round
trips and all the rest in TLS, oftentimes
we actually end up going faster than just vanilla HTTP
as well, which is, of course, the point
of this whole exercise to begin with.
So this is really exciting.
And I guess the important bit here is also that not only is
it helping the median, which is, of course, what we like to see,
but it's also consistently helping all of our users,
the ones on fast connections, and especially so for the ones
that are on the slow connections or the ones
with the high RTT times, which is especially
relevant for things like mobile, where RTTs are definitely
higher.
So this is really exciting.
This is very promising.
And I hope that this will help kind of drive the HTTP 2.0
adoption as well.
So if you haven't looked at SPDY,
I definitely encourage you to do so.
There are modules for virtually every popular server
out there today that you can just
enable and play with on your site.
And there's also commercial support for it
as well, so F5, Akamai, and others support SPDY.
So that's pretty cool.
And as I mentioned, we also do have HTTP 2.0.
If you're curious, if you want to play with it,
we do have HTTP 2.0 support under a flag.
So you can actually enable that and then
run it against your local server.
I think the only big public site that supports
HTTP 2.0 today is twitter.com.
So in theory, you can test it on that.
But there are also open source servers
that speak HTTP 2.0 today that you can play with.
So SPDY is kind of a production version, if you will.
HTTP 2.0 is coming soon and hopefully, fingers
crossed, sometime in 2014.
So that's SPDY.
You may have caught the wind of some other protocol
that we started working on earlier in the year, which
is QUIC, which is Quick UDP Internet Connections.
And the idea here is actually to kind of take
what we've done with SPDY and go one step beyond.
And this was actually our intent right at the very beginning
when we started thinking of SPDY.
But it was just too much of a leap
to change both kind of the application protocol
and the transport protocol.
So we kind of decoupled those, and QUIC is basically that.
We're trying to go one step further and say, well,
could we build a better transport
for HTTP traffic, period, on top of UDP?
Could we experiment with new ideas?
The core premise of this stuff is it's all about latency.
We're trying to eliminate latency everywhere we can.
So can we eliminate extra round trips
to establish the secure tunnel?
Can we do better congestion control?
What if we do packet pacing?
What if we do forward error correction?
What can we do to innovate in the space
to help reduce the page load times on the web?
And there's a lot of interesting ideas.
If you guys are curious about this kind of stuff,
we posted our design docs.
And it's a very long doc.
I encourage you to read it and give us feedback.
We have a Google group for that.
And this question comes up quite frequently,
which is, like, what's the point?
What are you trying to do here?
And the answer is very simple.
We just want to make faster internet for everybody to use.
And there are two ways that this will happen.
One is we end up building a really awesome protocol
that everybody loves and we take it to the IETF.
And just like with HTTP 2.0 and SPDY,
we work with the community and kind of make that the standard.
That's plausible and maybe that will happen.
The alternative route is, we just experiment with QUIC.
We experiment with different ideas.
And those ideas get adopted, the good ones get adopted
into existing protocol stacks, like TCP and TLS.
And actually we're already seeing
some of that, where based on our experience with the encryption
stuff in QUIC, the TLS working group is looking
at improvements in terms of can we
eliminate some extra round trips.
So in either case, the point is, no matter which one of these
happens, the users will win.
We'll get faster internet.
And that's our intent with QUIC.
So that's pretty awesome.
We don't have any benchmarks for it as of today.
We're still at a point where we want to make sure that it works
and it works correctly before we start
optimizing kind of all the edges around it.
But you can actually play with QUIC today.
We have it deployed on Google servers,
and you can also enable it.
If you go into Chrome flags, you can flip QUIC Support.
And then you can, for example, access YouTube,
and you'll get served-- youtube.com or other Google
service-- over UDP, over QUIC.
And if you're curious, you can dive into Chrome net internals
and kind of look at the actual protocol
and all this other stuff.
So if you're into kind of low-level networking protocols,
definitely a thing you want to check out and play with.
There's lots of interesting ideas in the protocol.
All right, shifting gears.
Linus mentioned Chrome data compression.
This is something that we launched early in the year.
As you heard, it provides roughly 50% data savings.
That's kind of the average number for a lot of users.
It turns out there's a lot of poorly compressed content
on the web.
People still forget to gzip their content, which
is one of the optimizations that we apply for text, like
[INAUDIBLE].
And we also convert all the images
to WebP, which provides a significant savings.
So this is a big benefit to a lot of users.
But one thing that Linus didn't mention
is that there are other secondary benefits to that.
Because we run over SPDY, so between your phone
and the Google server, it's actually a SPDY connection.
It's an encrypted connection.
So I actually use Chrome data compression in part
for the data compression part, but also
partially to secure my browsing.
Because when I enable this, if you're
connecting to your bank, for example, an HTTPS site,
that traffic will go directly to the site.
So that traffic is encrypted.
But if you're trying to connect to some unencrypted site,
it'll just flow basically as it is on the wire.
With Chrome data compression, that
goes through a secure tunnel, so even if you're on a Starbucks
Wi-Fi or whatever, some unencrypted Wi-Fi
and you're browsing around, all of your data is encrypted.
So that's really nice.
And maybe one important thing to highlight with Chrome data
compression is, it is still the full fidelity HTML5 web
experience, right?
We're not doing anything to modify your site.
We're not trying to render it on the server.
Like you have all of the flexibility of JavaScript, CSS,
and all the rest on your phone.
That's where the code gets executed.
So we're just modifying and optimizing some of the assets
as they get delivered.
Some common questions that I get about Chrome data compression,
something you should know, is this is going through a proxy.
So if you're developing a site where you're
relying on GeoIP functionality to customize
content to the user's location or maybe serve relevant ads,
you should be looking for the X-Forwarded-For
header, which is the IP address
of the client as forwarded by the Chrome data proxy.
And similarly, if for whatever reason
you absolutely want to make sure that we don't do anything
to your content, you can actually
opt out on a per-resource basis.
If you add a no-transform header,
it basically tells the Chrome data proxy to just
be hands off with that resource.
So we won't reoptimize that image,
or we won't recompress that text, or other things.
So these are standard kind of proxy directives,
and the Chrome data compression proxy supports them.
So, just an FYI.
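As a sketch of the GeoIP point above (the helper name is mine, the IPs are made up, and header handling is simplified), the original client address is the first entry of X-Forwarded-For:

```javascript
// Sketch: recovering the real client IP behind the Chrome data
// compression proxy. Assumes a Node-style lowercased headers object.
function clientIp(headers) {
  const xff = headers['x-forwarded-for'];
  if (!xff) return null; // request did not come through a proxy
  // The header can list several hops; the original client comes first
  return xff.split(',')[0].trim();
}
```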
Shifting gears, web sockets.
This is really, really exciting.
Do we have any web socket developers in the room?
Yes.
Awesome.
So web socket compression is going
to be live in M32, which is a long overdue feature.
One of the gotchas with web sockets
was that you could transfer binary and text,
but it would always go uncompressed
in both directions.
Now that we have the spec up to date
and we already have the code in Chrome,
you can actually negotiate the deflate compression
to apply in both directions, and the server can selectively
compress any given frame.
And on the client side, as of today, Chrome
will compress every single frame going out
from your mobile device or from your desktop device.
And I'm not going to go into details here,
but the spec also provides
a number of different parameters to customize
how the compression will be done.
For example, the size of the sliding window, so essentially
you can control the resources used on your server
and on your client, plus some other flags.
So this is really, really exciting,
because this has definitely been a sore point for web sockets.
We heard about WebRTC and DataChannel.
The way I think about DataChannel
is basically WebSocket, but over UDP and P2P.
So we can communicate directly between devices.
We don't have to go through an intermediary like a server.
And DataChannel in M31 has now officially switched
to the SCTP protocol.
So previously we were using RTP data channels,
and that was the reason for some of the incompatibilities
with some of the other vendors.
But as of M31, SCTP is the default,
and we will aggressively remove support for RTP data channels.
So if you're using data channels today,
this is something you want to revisit.
And if you're not familiar with data channels,
I encourage you to check out the links.
I'll post the slides later for how this works
and why this is awesome.
Because it allows you to define things like,
fire-and-forget semantics, don't retransmit.
So it's a really nice transport for doing low-latency data
exchange.
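A sketch of what such a channel might look like; the label is made up, and the connection object is passed in so the helper is easy to exercise:

```javascript
// Sketch: an unreliable, unordered DataChannel for low-latency updates.
// `pc` is an RTCPeerConnection (anything exposing createDataChannel works).
function createLossyChannel(pc, label) {
  return pc.createDataChannel(label, {
    ordered: false,    // don't stall on out-of-order packets
    maxRetransmits: 0, // fire-and-forget: lost messages are never resent
  });
}
```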
And then finally, let's talk about measurements, right?
So there's a lot of kind of protocol improvements
that are going on.
But as we know, we need to be able to measure things
in order to improve them.
So, of course, we're all familiar with navigation
timing, or I hope we are.
Most of the people here I expect would be.
You can get detailed low-level stats
about how long each connection took
in terms of DNS time, TCP time, and all the other things.
You can throw that into your analytics solution here.
I'm showing you Google Analytics, which
allows you to segment this data to say, well,
I want to look at my mobile users versus desktop.
You can segment it by any other variable
you define, like has a user clicked the Checkout button,
or have they registered, et cetera.
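Pulling a few of those Navigation Timing deltas might look roughly like this (the summary field names are my own):

```javascript
// Sketch: summarizing Navigation Timing (performance.timing) into
// deltas you could beacon to an analytics backend.
function navigationSummary(t) {
  return {
    dns:  t.domainLookupEnd - t.domainLookupStart,
    tcp:  t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.navigationStart, // time to first byte
    load: t.loadEventStart - t.navigationStart,
  };
}

// In the page: navigationSummary(performance.timing)
```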
This is all great.
One gotcha with this is this is only for the main page, right?
What about the other 85 resources or 100 resources
that you have on your page?
How are those performing?
Well in Chrome, we have support for resource timing, which
gives you that same level of access to all of the network
metadata, or timestamps, I should
say, on a per-resource basis.
So you can see here that you can actually
query for a specific resource, like your JavaScript
file that you're loading.
Maybe you're loading it from a CDN and you're
wondering how well your CDN is performing.
You can get your real user measurement
data for that specific resource and then look up
the time for DNS, TCP connect time, total transfer time,
et cetera.
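A sketch of that kind of query (the CDN URL in the test is hypothetical, and the performance object is parameterized only so the helper is easy to exercise):

```javascript
// Sketch: per-resource network timings via Resource Timing.
function resourceTimings(url, perf = globalThis.performance) {
  const [entry] = perf.getEntriesByName(url);
  if (!entry) return null; // not loaded, or timing data not exposed
  return {
    dns:   entry.domainLookupEnd - entry.domainLookupStart,
    tcp:   entry.connectEnd - entry.connectStart,
    total: entry.responseEnd - entry.startTime,
  };
}
```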
The only thing that you need to be aware of
is that the resource has to manually opt in and allow
the data to be gathered to begin with.
This is done for privacy reasons to make sure
that somebody can't just iterate over your cache and figure out
where you've been in the past, or something like it.
So for your own resources you need to add this header.
And then if you're using third party resources,
if that origin is not already providing this header,
then you should ask them to do so.
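The opt-in is the Timing-Allow-Origin response header; a minimal sketch of setting it, assuming a Node-style response object:

```javascript
// Sketch: opting a response in to cross-origin Resource Timing.
// `res` is any object with setHeader (e.g. a Node http.ServerResponse).
function allowTiming(res, origin = '*') {
  res.setHeader('Timing-Allow-Origin', origin);
  return res;
}
```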
Because here's one example where I have a web font on my site.
Web fonts delay when the text gets painted.
So the question is, how is-- in this case,
this is a Google CDN.
How is Google CDN performing in terms
of serving the actual font?
Is it hurting my users?
Well, now I can actually grab that data from Resource Timing,
just as I showed you a few slides ago.
And we can just pump that into Google Analytics.
Here you can see that I'm tracking the DNS, TCP,
and transfer times.
And it turns out that the fonts coming
from Google CDN, at least for my site,
are being loaded in this case within 150 milliseconds, which
to me was an acceptable time.
And that was fine for me.
But you can now think about using this sort of data
to define third party SLAs.
If you rely on third party widgets, you can say, well,
your widgets must load in x amount of time, et cetera.
You can actually track this with Resource Timing,
which is pretty awesome.
So as a quick recap, we covered a lot of ground.
There's a new DNS resolver in Chrome,
which is a double-digit performance improvement
in actual DNS resolutions.
And the new scheduler is definitely
something we're really excited about.
We've already seen huge improvements there,
10% and 20% improvement in the actual speed
index and page load times.
The Simple Cache stuff is a huge win on mobile,
and I'm really excited to have that out there.
And then moving forward, I'm hoping
that we can make the preresolve and prefetch and the prerender
stuff much, much smarter.
And you saw the SPDY wins, right?
So all of these things are incremental, 10% here,
20% there.
Before you know it, you're actually
saving hundreds of milliseconds, and sometimes seconds,
for the user, which is a huge win.
And some of these things you guys need to optimize for.
These are the things where you need to install SPDY,
you need to configure SPDY, you need
to make sure that your stacks are configured correctly.
And in other cases, it's just also doing
a better job of scheduling this kind of stuff.
And then finally, if you haven't already,
I definitely encourage you to look
at things like Nav Timing, User Timing, and Resource Timing.
So I talked about Resource Timing.
User Timing allows you to measure any chunk of code
and just get high-resolution time stamps for this
is when I started, this is when I ended, and beacon that back
to your server.
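A minimal User Timing sketch (the mark and measure names are made up):

```javascript
// Sketch: wrapping a chunk of work in User Timing marks, then reading
// back the high-resolution measurement to beacon to a server.
performance.mark('work-start');
for (let i = 0; i < 1e5; i++) {} // stand-in for the code being measured
performance.mark('work-end');
performance.measure('work', 'work-start', 'work-end');
const [measure] = performance.getEntriesByName('work');
// measure.duration is the elapsed time in milliseconds
```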
So all of these things are supported in Chrome.
And what you can measure, you can optimize.
So with that, I'll leave you with the link to the slides.
Thank you.