Speaker 1: A little wrap-up about what I'm calling DevOps. DevOps is a new category because, like we said, when you were doing shrink-wrapped software, the idea that operations and maintenance would be your problem was generally pretty alien. This has come about largely because Software-as-a-Service has taken off so much that, as a developer, you cannot be completely ignorant of the operational issues that come up in terms of performance and security, and vice versa. What are some standard pitfalls to avoid as you go forward?
Premature optimization is not the root of all evil, but it's pretty close. It's great that speed is a feature users expect, and it's great that we've done all this work on how to speed things up, but remember that until you've actually measured your application, you're pretty much just guessing at what you'd want to speed up. We've looked at systems like New Relic, which is a great way to monitor all kinds of things in your application. (They also write a pretty good blog with helpful advice.) Roughly speaking, if you're not monitoring your app, then either it's broken or you don't know whether it's slow.
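As a concrete reference point, wiring New Relic into a Rails app is mostly a one-line change; a minimal sketch, not an official setup recipe:

```ruby
# Gemfile: the newrelic_rpm gem adds New Relic's Rails instrumentation.
gem 'newrelic_rpm'

# config/newrelic.yml (generated from New Relic's template) holds your
# license key and app name; with the gem in place, response times, slow
# transactions, and slow SQL queries show up in the dashboard without
# further code changes.
```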
Don't think about how to make something run faster unless you've written it pathologically badly. If you've got an N+1 query problem, that's just pathologically bad, and you should fix those things; there's a sketch of that below. But if you're concerned about whether some query is fast or slow compared to some other query, wait until you go to production and see where the problems actually are before you prematurely optimize.
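To make the N+1 case concrete, here is a minimal sketch, assuming hypothetical Movie and Review ActiveRecord models where a movie has_many reviews:

```ruby
# N+1: one query to fetch the movies, then one more query per movie
# as each reviews association is loaded lazily (101 queries in all).
Movie.limit(100).each do |movie|
  puts "#{movie.title}: #{movie.reviews.length} reviews"
end

# Eager loading: two queries total, no matter how many movies.
Movie.includes(:reviews).limit(100).each do |movie|
  puts "#{movie.title}: #{movie.reviews.length} reviews"
end
```

Same output in both cases, but the second version's query count no longer grows with the number of rows, which is exactly the pathological case worth fixing before production.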
There's a standard fallacy, I guess, that because we're doing cloud computing and we have these three-tier apps where you can scale the tiers, that relieves us of having to think about scalability. The message of this chunk of the course is that there are some things you can do to help yourself stay in a tier where other people worry about scalability, up to a certain point. But
we know the database is particularly hard to scale and even if
you do scale it, you still want to get as many expensive
operations as you can out of the way because you care about
response time. All of these techniques both help you get good
performance and stay in the PaaS tier longer. If you find that you outgrow that tier, you're going to be solving these same problems, just on your own: the same concepts, the same kinds of measurements, all of that still applies. You still have to think about caching, and you still have to design your apps so that the unit of cachability matches what's convenient to serve.
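For instance, one cheap way to get an expensive operation out of the request path is Rails's low-level cache. A minimal sketch, in which the top_rated_movies helper, the rating column, and the ten-minute expiry are all made up for illustration:

```ruby
# Serve an expensive aggregation from the cache; only a cache miss
# touches the database, and the expiry keeps the result fresh enough.
def top_rated_movies
  Rails.cache.fetch("movies/top_rated", expires_in: 10.minutes) do
    # to_a forces the query to run now, so the cached value is the
    # actual records rather than a lazy ActiveRecord relation.
    Movie.order(rating: :desc).limit(10).to_a
  end
end
```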
Don't combine, for example, filter and non-filter behaviors in the same controller action, because that doesn't allow you to selectively decide which entire actions to cache; there's a sketch of that split just below. That level of thinking is important whether you're going to be running on the PaaS tier or whether you end up having to rebuild that stuff yourself.
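Here's that split, with a hypothetical MoviesController; note that caches_action now lives in the actionpack-action_caching gem rather than in Rails proper, so treat the details as illustrative:

```ruby
class MoviesController < ApplicationController
  # The unfiltered listing is the same for every visitor, so the
  # whole action is a natural unit of cachability.
  caches_action :index

  def index
    @movies = Movie.all
  end

  # The filtered listing varies per request, so it stays a separate,
  # uncached action instead of a params branch buried inside index.
  def filtered
    @movies = Movie.where(rating: params[:ratings])
    render :index
  end
end
```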
Another standard argument: if we're building a small site with a limited user audience, why would anybody want to come after it? People don't store credit cards there; I have no sensitive information that anybody could monetize. But most of the time the hackers aren't necessarily after
your site or your data, they're after your users. Increasingly
there's a really fascinating emerging trend, disturbing but fascinating, I guess: an entire underground economy in essentially selling access to users' machines. What that means is, if I can get you to visit a site, and by virtue of visiting that site you become the victim of a drive-by download of some malware, I can monetize that. I'm not even the guy who wants to install the malware; I'm just a broker. I'm selling: for X number of dollars, I will sell you 10,000 attack vectors, and you can start harvesting the users who happen to visit those sites. Your site is actually not interesting in and
of itself. It's interesting as a vector for spreading a disease
if you will, and that's the real reason you need to worry about this stuff: stay current with best practices, and stay on top of the notifications that your PaaS provider sends you. PaaS providers by and large do a very good job of dealing with infrastructure issues; even though incidents do happen, when they do, they tend to get addressed very quickly by people who really know what they're doing. That's another reason I think it's on the whole a benefit that most developers can take advantage of services like Heroku and AWS rather than having to manually build all that infrastructure themselves. Prepare for
catastrophe. You have to assume that at some point your site
will be compromised and/or your database will be compromised, and you have to be able to restore it without losing very much data. Depending on whom you're serving, the previous night's backup might be enough, or a few backups a day might be enough, but have a plan, because it's absolutely going to happen. In fact, I think an interesting extreme of this is at
Netflix, which is Amazon's largest customer. Netflix owns almost none of their infrastructure; all of the movie streaming they do is served off of Amazon's compute cloud. Not only do they prepare for catastrophes, they actually induce artificial disasters in their virtual data centers to make sure they are always practicing the recovery procedures. It's like the
equivalent of periodically crashing a plane so that you know how
to recover from crashing the plane. It's not a simulated thing.
They actually create a real emergency. I'm hoping to get a televised interview with Adrian Cockcroft, who is director of engineering there and has a great story about this stuff. Way back in 2007 or 2008, when Amazon Web Services had a very big outage that lasted over a day and affected thousands of companies that were running their sites on AWS, Netflix was
almost unaffected and it was in part because they had built out
this infrastructure that not only had redundancy, but they had
been practicing all the processes that would be used in case
such a disaster ever happened. There's no substitute for that.
Last question to wrap up our discussion of DevOps: users are sporadically complaining that your site is slow, yet you've got New Relic monitoring turned on, and it reports that traffic levels are nothing unusual and CPU utilization is nothing unusual. What is the most likely cause? (a) There are not enough dynos on Heroku, so requests are getting stalled at the front end. (b) Some queries are unusually slow, maybe because other apps share your database, so requests are getting stalled at the back end. (c) Some views are taking unusually long to render, or maybe pages are getting stalled in the browser because of JavaScript. Or (d) there's not enough information to determine.