EITAN KONIGSBURG: So hi, everybody.
My name's Eitan Konigsburg.
I'm a frontend software architect here
at the "New York Times".
Today I'm going to be talking about
the very real history of our web performance efforts here
at the "New York Times".
Before I dive in, I wanted to share with you
this tweet by Steve Souders himself.
If you're not familiar with Steve Souders,
he's the place to start when you're
talking about web performance.
He tweeted, "Is performance of 'the Web'
getting better or worse?
My past week was painfully slow, marred
by top sites w/bad WebPerf.
Is it too hard?"
I don't know if the "New York Times" was among the sites
that Steve was having trouble with.
We've known that we've had a little bit of a performance
problem.
I like to think that he's checking our site
and he's actually complaining for real that our site is slow
so that we make it faster for him.
But I'm going to get back to this question
about whether it's too hard.
So while I was preparing for this talk,
I was thinking that this is a very real story.
And it reminded me of a GEICO ad from about 2006
that I just kind of wanted to share with you.
So just give me one sec to play.
[VIDEO PLAYBACK]
-Paula Sala is a real GEICO customer, not an actor.
So to help tell her story, we hired that announcer guy
from the movies.
-When the storm hit, both our cars were totally under water.
-In a world where both of our cars were totally under water.
-We thought it would take forever to get some help.
-But a new wind was about to blow.
-With GEICO, we had our check in two days.
-Payback.
This time, it's for real.
-GEICO.
Real service.
Real savings.
EITAN KONIGSBURG: So I thought about doing this entire talk
in movie announcer voice.
I didn't think I'd be able to keep it up all the way through.
Colt is actually the expert at that.
So you should check out his talk on National Talk
Like A Pirate Day.
That was actually really great.
Kudos.
So, in a world of static pages...
Pages on our website are published basically
the way you would publish a newspaper.
Somebody hits a button, the content goes through templates
and is written to disk.
And we mix it with a proprietary language
that we compile to give a little bit of dynamism
on the server side.
And it was fast.
We've accumulated about 1 million of these static pages since 2004.
And re-publishing them is really not an option for us.
If, hypothetically, it took about five seconds
to re-publish one of these pages,
that's 5 million seconds, which comes out to about two months.
We can't be doing that every time we make a change.
So what ends up happening is it's extremely hard for us
to change markup once it's been published.
And every time we make a change, it's
only present from the day we deployed onward,
which means we have an unknown number of permutations
of these markups to handle and support over time.
So our frontend assets, our JavaScript and CSS,
have to handle all of these permutations.
And in order to be able to change these files,
since we can't change the asset path because the HTML is
static, we have to keep the time to live,
the cache expiration timeout, very low on these files, which
is a performance bad practice.
And in general, we don't really know
which pages on the site call which assets.
So I want to talk about our CSS history.
We didn't really have a CSS framework,
which means we have a lot of repetitive styles.
Files were imported basically as needed.
Style sheets approached the @import limits in IE:
31 total, and maybe up to three
nested levels deep.
The loading style was inconsistent.
We were using link tags in one case, and right after them,
a style tag with an @import directive.
And basically, we also have to deal with web producers
who go through and are able to modify the look
and feel of our page on the fly as the news happens.
Their changes can be dropped into our content system
and appear anywhere on the page.
In the JavaScript department, JavaScript was very ad hoc.
Since most of the dynamism was on the server side,
we basically used JS as
needed to fix cross-browser issues
and to build widgets that respond to events
from the user in the browser.
Most of the variables and functions were global.
Script tags were dropped wherever they were needed.
You would define a new div, leave it empty,
and put a script tag right after it.
We didn't defer execution, another performance no-no.
We didn't have a DOM framework to handle
any cross-browser functionality.
That framework war happened and we still hadn't picked one yet.
Some words on images.
Basically, the UI images that are part of a product
would get uploaded individually to production.
You could easily forget how to do that when
you were deploying, so they would
go missing for a little while.
Our content photography is extremely hard for us
to optimize.
We have a lot of pride in our photography.
There's a lot of resistance in making them smaller
and not featuring them as largely on our pages.
We did some experiments with lossless compression.
Even as color novices, we
noticed some colors changing in whatever these tools were doing.
There's a whole department here that does color correction,
and the red walls don't extend into their area,
so that they don't have a little bit of red dilution
on their screens while they do this color correction.
It's basically a manual process to resize these images.
We have a lot of them.
And in order for them to be fast and have them be smaller,
people have to cut them and make sure they don't crop something
out of the picture.
Because it's manual, it doesn't happen a lot.
Other concerns for the business.
We have these editorial constraints.
Like I mentioned earlier, producers
have a need to adjust based on what's happening in the news.
So we use this freeform module where
they can drop any bit of HTML code.
It doesn't even have to be balanced tags,
so this can easily mess up our page.
But they have the freedom to do that.
And we have to be able to handle styles and scripts occurring
in them.
And ads, of course.
They're top-level objects in our DOM.
They're not treated in any special way.
They bring their own copies of DOM frameworks in.
We get older versions of jQuery.
We saw an ad that tried to encapsulate Prototype
which, if you're familiar with how prototype.js works,
modifies built-in objects in the browser.
You can't actually contain it in a box,
so it didn't really work.
Ads animate as soon as they load.
We get a lot of painting issues when animations
happen while the page is loading.
And basically, they come earlier in the DOM order
as the page is loading than our own logo on the homepage.
If you look at the source, you'll see two ads,
and then you'll see our logo.
So just to caveat all of this history,
it wasn't really entirely all bad.
We do use a CDN.
We've had Akamai for a number of years.
It's been great.
We did turn gzip on in 2007.
The bandwidth savings were actually immediate,
and there were a lot of really happy people
as soon as we did that.
But it wasn't on previously, and it's worth
mentioning that we did turn it on.
So a new wind for our efforts, and what
we were trying to do starting in 2009 to make the site faster.
2009 was the year we were focusing
on tools and libraries.
We basically built a new build system for frontend files.
And this was our first foray into doing automatic builds.
We used to manually untar and uncompress files
in production.
And we were actually pulling changes from version control as
opposed to somebody's home directory.
They would arrange the files the way they wanted them to appear,
we would package them up, and someone would unpackage it.
Now it's coming from version control directly.
There were semi-stateful rollbacks.
There was a rollback mechanism, but you could really
only roll back the latest one.
If you tried to roll back any before that,
you would end up in an unknown state, and that was really bad.
But the really exciting bit of it
was that there were hooks for build scripts.
And we could extend this process basically dynamically.
So we did just that for our CSS.
It was the first system to use those new hooks.
And we wrote a bunch of proprietary scripts
not only to unravel the @import directives to make
one concatenated file, but also
to try to remove some of the white space
and the comments.
This was all downstream from development work, which
is also a really good point.
We didn't want our developers to feel
that they couldn't comment their work adequately.
While we were doing this, we were
working on our CSS framework.
We took the concepts from our design,
and we turned them into these base files
that can be used across the site.
We split code into reusable modules
and actually tried to reuse them in a sane way.
We aim to have a single entry point at the top of the page.
While we're doing this, we were also
trying to lock down our JS development.
We introduced a top-level namespace
called NYTD, and the module pattern,
also called the immediately invoked function expression.
This is a great way to wrap all your variables in a closure,
although thanks to Colt, I'm now afraid of closures.
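A minimal sketch of that pattern, with a hypothetical module name: an immediately invoked function expression keeps helper variables private in a closure, and only the module itself is attached to the single NYTD global.

```javascript
// The module name and internals here are hypothetical illustrations.
window.NYTD = window.NYTD || {};

NYTD.Tooltip = (function () {
  // Private state, invisible outside the closure.
  var visibleCount = 0;

  function show(el) {
    visibleCount += 1;
    el.style.display = 'block';
  }

  function hide(el) {
    visibleCount -= 1;
    el.style.display = 'none';
  }

  // Only this returned object is exposed on the namespace.
  return { show: show, hide: hide };
}());
```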
We introduced prototype.js as a DOM library.
This one bullet point doesn't really
reflect the amount of meetings and discussions
and debates and fights we had over this decision.
But prototype.js won out in the end.
We also tried to write our own proprietary asynchronous
JavaScript loader in the days before RequireJS.
LABjs, any of these loaders, they didn't exist.
We tried to write our own.
It didn't really work.
It became a synchronous JavaScript loader.
So we started with our homepage, and there's
a number of reasons why.
It's published very frequently, so this problem
with legacy markup really doesn't become an issue.
It gets the most traffic, so it has the potential
to save people the most amount of time.
It has an isolated implementation.
Basically, if something went wrong,
nothing would go wrong on other parts of the site.
And it's viewed very heavily internally.
If something were really bad, it'd
get noticed before it got out into production.
So November, 2009, we put our CSS optimizations
on the homepage.
It saved 25 HTTP connections to what were very small CSS files.
The first paint occurred a whole second earlier,
and the homepage really felt faster.
And we were very happy about this.
So in 2010, we wanted to continue these efforts.
The CSS framework basically rolled out
to our section and article pages.
Again, we were focusing on a single entry point.
But we never optimized them.
There were some concerns with these proprietary scripts.
And it turns out later, when we did an audit,
that we were right to be concerned.
There were references to missing files.
There were @import cycles:
A would include B, which would include C, which would include A.
And if you're ever curious about what the browser does
to get out of that, I still don't know the answer.
And there were syntax errors.
There were styles not getting applied
because they were typed improperly.
So we decided to build a new JS build system in 2010.
We forked the codebase.
We wanted to create a clean build.
And only code that was modular and namespaced
would be able to be in this build.
Basically, we had these manifests of files with
an @import-like syntax in a JS comment so that they would be
inert if the build process failed.
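To illustrate, a hypothetical manifest in that style might look like this; the directive syntax and file names are invented, but the point is that everything lives inside ordinary JS comments.

```javascript
/* A hypothetical build manifest in the style described. The build
   system scans comments for @import-like directives and concatenates
   the referenced files in order; because the directives are comments,
   the manifest stays inert, valid JavaScript if the build never runs. */

/* @import "nytd/core/namespace.js" */
/* @import "nytd/modules/navigation.js" */
/* @import "nytd/modules/weather.js" */
```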
We used YUI Compressor for minification.
We no longer wanted to write our own scripts.
This was well supported at the time, so why not?
We were focusing on two build files,
one at the top, one at the bottom.
If you're familiar with the best practices,
they say that you're supposed to put
your JS at the bottom of the page.
So why did we have them at the top?
Remember that editors can put JavaScript on the page.
If we wanted them to use our frameworks,
those had to be there before their code could actually run.
So we put stuff at the top to make sure
that nothing would break.
This build process was executed manually,
so it didn't see widespread usage.
It was basically used almost only on the homepage.
And now we have two code bases and still no automatic
build process.
But when we did launch these JS changes to the homepage,
it was pretty drastic.
We got a 50% speedup, six seconds down from 12.
We were very excited by that.
So 2011 was a bit of a slow year.
As you might remember if you're fans of "The New York Times",
we introduced digital subscriptions in 2011.
So resourcing for this kind of work
was really, really hard to come by.
Oh, hello.
I don't know what I just did.
All right.
There were efforts to optimize our analytics packaging.
We have a lot of analytics packages,
and they were kind of doing some really bad stuff.
Those efforts got aborted, mostly because we
were afraid to break the actual reports.
We wanted to make sure that the business was
able to analyze things correctly,
so we didn't continue.
So we used a series of code reviews and style guides
to make sure that we were enforcing
good standards and best practices
to keep the status quo.
But at the end of 2011, we made the decision.
We decided to replace prototype.js
with jQuery sitewide.
And as you can imagine, this is really, really hard.
We needed both of them to coexist during the transition.
We weren't going to rewrite everything and launch
a new site on a new framework all at once,
so we needed both of them to basically
be around while we did this.
But in 2012, a group of five developers
basically set out to try and excise prototype.js
from article pages.
And we were making really, really excellent progress.
Things were going great.
Hours and hours of work.
And then we found out there were some inline scripts that
made references to prototype.js.
And remember that we have these static pages with markup
that we can't change, so we really have no idea
what's calling prototype.js in inline scripts.
And therefore, we're never able to remove prototype.js
fully from our site.
So now we have two frameworks.
They both download.
They're both still in use.
And this actually causes some really strange bugs
in older IE, where you don't have a native
getElementsByClassName.
Prototype.js would add one, and jQuery
would detect it and use it.
It returns Prototype objects, and jQuery says,
I have no idea what these are, and fails.
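A simplified sketch of that failure mode, assuming prototype.js is on the page; this is not either library's actual source, just the shape of the conflict.

```javascript
// In IE8 there is no native document.getElementsByClassName, so
// Prototype patches one in. Its version returns a plain Array of
// Prototype-extended elements (Element.extend is Prototype's element
// decorator), not a native NodeList.
if (!document.getElementsByClassName) {
  document.getElementsByClassName = function (className) {
    var matches = document.querySelectorAll('.' + className);
    var results = [];
    for (var i = 0; i < matches.length; i++) {
      results.push(Element.extend(matches[i]));
    }
    return results;
  };
}

// jQuery later feature-detects the method, assumes it behaves natively,
// receives Prototype-flavored objects instead, and fails downstream.
var stories = document.getElementsByClassName('story');
```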
So 2012 also marked the end of life for this proprietary
build system.
We basically migrated to a system
that mirrored our entire SVN repositories via Hudson.
We wrote a new script to concatenate these CSS imports.
We needed feature parity with what we had before.
So we had to write something new to make sure
we could still handle the homepage, which already had these optimizations.
It does have all these features, although CSSLint,
which I believe didn't exist until 2012,
was added to the list to catch syntax errors.
It uses YUI Compressor to do minification,
and also has to be run manually.
So again, it's really just limited to the homepage.
So finally, I'm going to talk a bit about some efforts
in modern times of what we're trying
to do to really fix this problem.
So we were really gifted a very rare opportunity
by the business.
In March, 2013, the "New York Times"
announced that it was going to introduce a new article
redesign.
For us, it's more than just a new user interface
and a new user experience.
It's really a technological reboot.
We've changed everything about our systems
in order to deal with some of these legacy issues
that we've had.
So the best thing, and I have an exclamation point here
that really doesn't even capture our excitement about this:
We have dynamic pages now.
We no longer have to deal with an unknown number
of permutations.
We can actually change our asset URLs
on the fly to bypass the cache as soon as we do a deploy.
We pushed a lot of the user customization
into the client side.
We're not doing it on the server,
so we can actually cache our pages better on the backend.
We introduced HTML5 Boilerplate and Modernizr to give us
a nice starting point for these new pages.
So now we have a modern build system using Grunt.
It can be run both in our developer sandboxes
and in Hudson to prepare a build.
And most of these web performance
best practices are actually available as Grunt tasks
already, which made it really easy
to integrate some of this great advice for our code.
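For example, a minimal Gruntfile along these lines (not our actual build; the plugin choices and paths are illustrative) wires two of those best practices, concatenation and minification, into one command.

```javascript
// A minimal Gruntfile sketch using off-the-shelf plugins
// (grunt-contrib-concat and grunt-contrib-uglify); paths are made up.
module.exports = function (grunt) {
  grunt.initConfig({
    // Combine many small files into one to save HTTP requests.
    concat: {
      dist: {
        src: ['src/js/**/*.js'],
        dest: 'build/app.js'
      }
    },
    // Minify the combined file to save bytes.
    uglify: {
      dist: {
        files: { 'build/app.min.js': ['build/app.js'] }
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-concat');
  grunt.loadNpmTasks('grunt-contrib-uglify');

  // The same task runs in a developer sandbox and on the CI server.
  grunt.registerTask('default', ['concat', 'uglify']);
};
```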
We're using RPM packaging for deployments.
RPMs have very good rollback support,
and they're really easy to use.
So for style development, we switched to using LESS.
LESS itself handles concatenation.
Minification is supported by YUI Compressor
through the Grunt task.
So these two wins were really easy to get.
Variables and mixins allow us to do some really interesting
things in CSS that we had to do with images before.
Arrows, buttons, data URIs, and spriting.
So earlier this year, I wrote a Grunt task
that automates the creation of spritesheets.
I don't really know what the accepted terminology
is around sprites and spritesheets.
What do you call the individual images in a spritesheet?
I have no idea.
So I'm making up the terminology.
Basically, the task takes in a bunch of images and exports
not only the combined spritesheet,
but also a bunch of mixins in LESS
with the background coordinates, which makes it not only
really easy to make these combined images (you don't
have to maintain them manually), but also really easy for developers
to use them in the code.
The mixins are named after the image itself,
so you just put the image name and you basically
get the sprite.
And the best part about it is, if the coordinates
change because we've added a new image (say it's alphabetical
and you added a C,
and all the images shift a little bit),
all you have to do is rebuild,
and all the coordinates are updated.
Nobody has to go in and manually figure out and change
anything.
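A hypothetical configuration for a task like that might look as follows; the task name and options are invented, not the actual task's API.

```javascript
// Hypothetical config for a spritesheet-generating Grunt task like the
// one described. Given a folder of small images, it writes one combined
// image plus a LESS file of mixins (one per source image, named after
// it) carrying the background coordinates.
grunt.initConfig({
  sprites: {
    icons: {
      src: ['src/img/icons/*.png'],   // the individual images
      sheet: 'build/img/icons.png',   // the combined spritesheet
      less: 'build/less/icons.less'   // generated coordinate mixins
    }
  }
});

// Rebuilding regenerates both outputs, so when a new image shifts every
// coordinate, developers keep calling the same mixin names and nothing
// has to be fixed by hand.
```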
And we're currently combining, I
think, over 100 small images into one.
That's 99 saved requests for very small files
that we need for our new design.
I'm not showing it here.
You can check out the marketing page to see what it looks like.
Just a heads up.
For JavaScript, we switched to RequireJS.
This handles, again, concatenation and minification
in the Grunt task.
And we also get source maps, which
is something we're really excited to use
for our development.
We no longer need a global namespace.
That's why this slide says that we're
deleting our window.NYTD namespace.
We get asynchronous loading of JavaScript.
And we basically decided to focus on larger
build files, because on mobile the initial handshake
can be really expensive.
Once you've established a connection,
you want to make sure you use it and keep the latency low.
So we keep these large build files around instead
of using a bunch of individual RequireJS loads.
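A minimal RequireJS module, with hypothetical names, shows the shape of this; the r.js optimizer, driven from Grunt, can then trace modules like this one and fold them into the large build files.

```javascript
// A minimal RequireJS module sketch; dependencies are declared up
// front, so no global namespace is needed, and the optimizer can
// concatenate everything into one build file. Names are hypothetical.
define(['jquery', 'underscore'], function ($, _) {
  function init() {
    // Illustrative only: mark each story element as initialized.
    $('.story').addClass('ready');
  }
  return { init: init };
});
```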
Since we moved a lot of the client customization
to the browser, we needed to make
sure we could template properly.
So we use Underscore.js templates
instead of building with the DOM or with HTML strings.
And it's way faster to precompile them.
You're essentially running a JavaScript function,
which is optimized in the browser.
Grunt can automate this compilation as a build task.
And we turn them into a RequireJS module
that gets included as part of our build process.
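Sketched out, with an invented template and module, the idea looks like this.

```javascript
// An Underscore template compiles once into a plain JavaScript
// function, so rendering is just a function call instead of string
// parsing or DOM building. In the build, a Grunt task does this
// compilation ahead of time; here it's inline for illustration.
define(['underscore'], function (_) {
  var headline = _.template('<h2><%= title %></h2>');
  return headline;
});

// Usage elsewhere (hypothetical module path):
//   require(['templates/headline'], function (headline) {
//     el.innerHTML = headline({ title: 'Hello' });
//   });
```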
So we use Backbone to organize everything into modules.
They are reusable and they are shared across applications
if we would like them to be.
But the real win is that they give us
a bunch of inherited functionality.
So we can abstract out touch events, analytics,
and some common element references, so that we
won't have to look them up all the time,
and we can get this functionality and everything
very easily.
We get event delegation, fewer event listeners,
so things run a little more smoothly.
And the modules communicate with each other using events,
so you can publish events or you can
subscribe to another module's events
to find out about changes.
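A small sketch of that pattern, with invented module and event names, using Backbone's built-in Events mixin as the shared channel.

```javascript
// Assumes jQuery, Underscore, and Backbone are loaded. A plain object
// extended with Backbone.Events acts as a pub/sub channel between
// modules.
var channel = _.extend({}, Backbone.Events);

var ShareView = Backbone.View.extend({
  // Event delegation: one listener on the view's root element covers
  // every matching child, so there are fewer listeners overall.
  events: { 'click .share-button': 'onShare' },

  onShare: function () {
    // Publish; interested modules subscribe without direct coupling.
    channel.trigger('story:shared', { id: this.$el.data('story-id') });
  }
});

// Elsewhere, an analytics module listens for the same event.
channel.on('story:shared', function (payload) {
  console.log('share event for story', payload.id);
});
```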
So this is the big topic.
What do we do about ads?
We took ads out of the critical path.
What we do is on DOM ready, we write the ad markup
into a new IFrame.
You just take the frame's document,
and you document.write it into the frame.
And if the ad itself calls document.write,
you don't lose your whole page.
You just lose the content of the iframe, which
is blank to begin with.
So we don't care what they do once they're in the frame,
once they load.
If they decide to bust out of the frame, they can.
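A simplified sketch of the technique; the container id and ad markup are placeholders.

```javascript
// On DOM ready, create a blank same-origin iframe and document.write
// the ad markup into *its* document. If the ad calls document.write
// itself, it can only clobber the iframe's empty content, never the
// host page.
$(function () {
  var adHtml = '<div>third-party ad creative goes here</div>';

  var iframe = document.createElement('iframe');
  document.getElementById('ad-slot').appendChild(iframe);

  var doc = iframe.contentWindow.document;
  doc.open();
  doc.write(adHtml);
  doc.close();
});
```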
But we have this notion of trusted ads that aren't framed.
And trusted basically means that they specifically
don't do things that are bad for page speed.
And also if there's actually a problem, it's very easy for us
to get in touch with them and make the change.
And this has been really, really effective for us.
So this is the results page.
I thought it would be great to show you
a comparative slide of our waterfalls
to show you how dramatic this is.
But we're not quite ready to show you that information yet.
It's still being built.
But to give you an overview:
before starting to render the page,
we used to make over 60 requests.
It's about 10 now, maybe even fewer.
Before we're fully loaded, we had to download 200 assets.
It's about 70 now.
Start render is under one second in Chrome,
and document complete is under three seconds in Chrome.
So this is a huge win for us.
Previously, document complete
would be about nine seconds on our article pages.
So to answer Steve's question about
whether web performance is too hard: no and yes.
It's really not that hard.
The optimization practices are really not
that difficult to begin with.
And there's excellent tooling that exists to make it really,
really, really easy to do.
However, there are some unique challenges.
Not just big websites, but all
companies tend to have their own challenges that nobody else
can solve for them.
Not to mention legacy decisions such as ours, where
we had static pages that were holding us back.
We found ourselves in this negative feedback cycle.
We had this poor setup,
and we tried a proprietary fix to make it better.
And we found ourselves slightly worse off than we were before,
even though we were getting slightly faster speeds.
So here's some stuff that we learned; let me share some advice.
It's really worth it to fight to solve
these big, difficult problems.
They are the 800-pound gorilla in the room for a reason.
Fight to solve them.
Once they're out of your way, it really
opens the path to making this a lot simpler.
We live in a world where you don't
have to rely on proprietary systems anymore.
You might need some proprietary pieces because it's your business,
and you may need to make sure that you
have all your requirements taken care of.
But a lot of the tooling exists to make this very simple.
I always say that you should count HTTP connections
before you count bytes in every file.
Do the big, easy wins first.
It's much easier to combine files together
than it is to figure out what code is old
and needs to be removed from your files.
And automate everything.
As soon as you introduce a manual step into the equation,
it really doesn't get done.
So if you can automate everything,
it's by far going to pave the road and make it really easy.
So that's it.
I think I'm running out of time.
Thank you very much.
[APPLAUSE]
MALE SPEAKER: [INAUDIBLE]?
EITAN KONIGSBURG: I don't think I have time for questions.
But I will be around at dinner, so come find me if you want.
I can talk endlessly about this, so please find me.
I will share all of my knowledge.