Speaker 1: A little wrap-up about what I'm calling DevOps. DevOps is a new category because, like we said, when you were doing shrink-wrapped software, the idea that operations and maintenance would be your problem was generally pretty alien. This has come about largely because Software-as-a-Service has taken off so much that, as a developer, you cannot be completely ignorant of the operational issues that come up in terms of performance and security, and vice versa. What are some standard pitfalls to avoid as you go forward?
Premature optimization is not the root of all evil, but it's pretty close. It's great that speed is a feature users expect, and it's great that we've done all this work on how to speed things up, but remember that until you've actually measured your application, you're pretty much just guessing at what you'd want to speed up. We've looked at systems like New Relic, which is a great way to monitor all kinds of things in your application. (They also write a pretty good blog with helpful advice.) Roughly speaking, if you're not monitoring your app, then either it's broken or you don't know whether it's slow.
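As a concrete reference point, wiring New Relic into a Rails app is mostly a one-line change; a minimal sketch, not an official setup recipe:

```ruby
# Gemfile: the newrelic_rpm gem adds New Relic's Rails instrumentation.
gem 'newrelic_rpm'

# config/newrelic.yml (generated from New Relic's template) holds your
# license key and app name; with the gem in place, response times, slow
# transactions, and slow SQL queries show up in the dashboard without
# further code changes.
```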
Don't think about how to make something run faster unless you've written it pathologically badly. If you've got an N+1 query problem, that's just pathologically bad, and you should fix those things; there's a sketch of that below. But if you're concerned about whether some query is fast or slow compared to some other query, wait until you go to production and see where the problems actually are before you prematurely optimize.
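To make the N+1 case concrete, here is a minimal sketch, assuming hypothetical Movie and Review ActiveRecord models where a movie has_many reviews:

```ruby
# N+1: one query to fetch the movies, then one more query per movie
# as each reviews association is loaded lazily (101 queries in all).
Movie.limit(100).each do |movie|
  puts "#{movie.title}: #{movie.reviews.length} reviews"
end

# Eager loading: two queries total, no matter how many movies.
Movie.includes(:reviews).limit(100).each do |movie|
  puts "#{movie.title}: #{movie.reviews.length} reviews"
end
```

Same output in both cases, but the second version's query count no longer grows with the number of rows, which is exactly the pathological case worth fixing before production.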
There's a standard fallacy, I guess, that because we're doing cloud computing and we have these three-tier apps where you can scale the tiers, that relieves us of having to think about scalability. The message of this chunk of the course is that there are some things you can do to help yourself stay in a tier where other people worry about scalability, up to a certain point. But
we know the database is particularly hard to scale and even if
you do scale it, you still want to get as many expensive
operations as you can out of the way because you care about
response time. All of these techniques both help you get good
performance and stay in the PaaS tier longer. If you find that you outgrow that tier, you're going to be solving these same problems, just on your own: the same concepts, the same kinds of measurements, all of that still applies. You still have to think about caching, and you still have to design your apps so that the unit of cachability matches what's convenient to serve.
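For instance, one cheap way to get an expensive operation out of the request path is Rails's low-level cache. A minimal sketch, in which the top_rated_movies helper, the rating column, and the ten-minute expiry are all made up for illustration:

```ruby
# Serve an expensive aggregation from the cache; only a cache miss
# touches the database, and the expiry keeps the result fresh enough.
def top_rated_movies
  Rails.cache.fetch("movies/top_rated", expires_in: 10.minutes) do
    # to_a forces the query to run now, so the cached value is the
    # actual records rather than a lazy ActiveRecord relation.
    Movie.order(rating: :desc).limit(10).to_a
  end
end
```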
Don't combine, for example, filter and non-filter behaviors in the same controller action, because that doesn't allow you to selectively decide which entire actions to cache; there's a sketch of that split just below. That level of thinking is important whether you're going to be running on the PaaS tier or whether you end up having to rebuild that stuff yourself.
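Here's that split, with a hypothetical MoviesController; note that caches_action now lives in the actionpack-action_caching gem rather than in Rails proper, so treat the details as illustrative:

```ruby
class MoviesController < ApplicationController
  # The unfiltered listing is the same for every visitor, so the
  # whole action is a natural unit of cachability.
  caches_action :index

  def index
    @movies = Movie.all
  end

  # The filtered listing varies per request, so it stays a separate,
  # uncached action instead of a params branch buried inside index.
  def filtered
    @movies = Movie.where(rating: params[:ratings])
    render :index
  end
end
```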
Another standard argument: if we're building a small site with a limited user audience, why would anybody want to come after it? People don't store credit cards there; I have no sensitive information that anybody could monetize. But most of the time the hackers aren't necessarily after
your site or your data, they're after your users. Increasingly
there's a really fascinating emerging trend, disturbing but fascinating, I guess: an entire underground economy in essentially selling access to users' machines. What that means is, if I can get you to visit a site, and by virtue of visiting that site you become the victim of a drive-by download of some malware, I can monetize that. I'm not even the guy who wants to install the malware; I'm just a broker. I'm selling: for X number of dollars, I will sell you 10,000 attack vectors, and you can start harvesting the users who happen to visit those sites. Your site is actually not interesting in and
of itself. It's interesting as a vector for spreading a disease
if you will, and that's the real reason you need to worry about this stuff: stay current with best practices, and stay on top of the notifications that your PaaS provider sends you. PaaS providers by and large do a very good job of dealing with infrastructure issues; even though incidents do happen, when they do, they tend to get addressed very quickly by people who really know what they're doing. That's another reason I think it's on the whole a benefit that most developers can take advantage of services like Heroku and AWS rather than having to manually build all that infrastructure themselves. Prepare for
catastrophe. You have to assume that at some point your site
will be compromised and/or your database will be compromised, and you have to be able to restore it without losing very much data. Depending on whom you're serving, the previous night's backup might be enough, or a few backups a day might be enough, but have a plan, because it's absolutely going to happen. In fact, I think an interesting extreme of this is at
Netflix, which is Amazon's largest customer. Netflix owns almost none of their infrastructure; all of the movie streaming they do is served off of Amazon's compute cloud. Not only do they prepare for catastrophes, they actually induce artificial disasters in their virtual data centers to make sure they are always practicing the recovery procedures. It's like the
equivalent of periodically crashing a plane so that you know how
to recover from crashing the plane. It's not a simulated thing.
They actually create a real emergency. I'm hoping to get a televised interview with Adrian Cockcroft, who is director of engineering there and has a great story about this stuff. Way back in 2007 or 2008, when Amazon Web Services had a very big outage that lasted over a day and affected thousands of companies that were running their sites on AWS, Netflix was
almost unaffected and it was in part because they had built out
this infrastructure that not only had redundancy, but they had
been practicing all the processes that would be used in case
such a disaster ever happened. There's no substitute for that.
Last question to wrap up our discussion of DevOps: users are sporadically complaining that your site is slow, yet you've got New Relic monitoring turned on, and it reports that traffic levels are nothing unusual and CPU utilization is nothing unusual. What is the most likely cause? (a) There are not enough dynos on Heroku, so requests are getting stalled at the front end. (b) Some queries are unusually slow, maybe because other apps share your database, so requests are getting stalled at the back end. (c) Some views are taking unusually long to render, or maybe pages are getting stalled in the browser because of JavaScript. Or (d) there's not enough information to determine.