So the other thing you mentioned to me that you're improving is this notion of, like, memcache ejection.
Right. So the memcache client library has the ability to notice that a memcache node has gone down
and decide that it's not going to try to talk to it anymore.
It's kind of a heuristic.
It notices, like, a number of failures happen so it just decides to back off,
and it will basically, at that point, just treat the ring as if there's one fewer node.
And we cannot use that until the locking is gone, but once we've done that, memcache will automatically heal itself.
So we can have self-healing memcache, not the situation where one memcache dies and the whole site gets a little wonky.
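The ejection heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the actual client library: the class name `EjectingClient`, the `max_failures` threshold, and the reset-on-success rule are all assumptions, and it uses simple modulo placement rather than a real hash ring.

```python
import hashlib


class EjectingClient:
    """Hypothetical client-side view of a memcache pool with failure-based ejection."""

    def __init__(self, nodes, max_failures=3):
        self.nodes = list(nodes)
        self.max_failures = max_failures  # assumed threshold, not from the source
        self.failures = {node: 0 for node in self.nodes}

    def live_nodes(self):
        # A node counts as ejected once it has failed max_failures times in a row.
        return [n for n in self.nodes if self.failures[n] < self.max_failures]

    def node_for(self, key):
        # Keys are spread over the live nodes only, as if the pool had fewer nodes.
        live = self.live_nodes()
        if not live:
            raise RuntimeError("no live memcache nodes")
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return live[digest % len(live)]

    def record_failure(self, node):
        self.failures[node] += 1

    def record_success(self, node):
        # Assumption: one successful call brings the node back into the pool.
        self.failures[node] = 0
```

After three recorded failures on one node, `node_for` stops returning it, so requests go to the remaining nodes instead of timing out against the dead one.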
Right. And we probably could do with more memcache right now,
but we don't want to add more because it increases our risk of failure.
Yes. That was something I remember, too. We're always--
Sometimes we wouldn't add a memcache because we didn't want to redistribute the keys.
Right. Just the simple act of redistributing the keys was kind of a scary thought.
Yeah. And right now, even with consistent hashing, if we add one memcache, like in the middle of the night,
the database slaves will actually be kind of unhappy for maybe an hour or two.
And that's just one server.
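To see why adding even one memcache node sends extra load to the database, here is a small consistent-hashing sketch. The names `HashRing` and `moved_fraction`, the node names, and the choice of 100 virtual points per node are assumptions for illustration, not the setup discussed here.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Minimal consistent hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect(self._points, (_hash(key), ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]


def moved_fraction(old_nodes, new_node, num_keys=10000):
    """Fraction of keys whose owner changes when new_node joins the ring."""
    ring = HashRing(old_nodes)
    keys = [f"key{i}" for i in range(num_keys)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add(new_node)
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    return moved / num_keys
```

With n existing nodes, roughly 1/(n+1) of the keys move to the new node. Every one of those keys is a cache miss the first time it is requested, and each miss falls through to the database.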
One thing I used to do, this is kind of a hacky thing.
I can feel people losing respect for me as I describe this.
Whenever we'd bring up a new database slave, I would actually go into the database,
and I would have an app server that would connect only to that machine,
and then I would hit all the most popular pages.
I knew they were the most popular.
I'd do it by hand. I'd go to reddit hot, funny hot, pics hot.
And I would just load all those queries to make sure that the cache,
everything, was all up to date and warm and the database was good,
because when you bring up a new database slave,
it hasn't run any queries yet so nothing's been cached.
One of the things that Postgres does really, really nicely is it manages the kind of disk-memory dichotomy.
You know, some of the data's on disk. No, all the data's on disk but only some of it fits in the memory.
And you need to kind of basically tell Postgres this is the data I want in memory now.
And so we have to run these queries to get these data machines up,
because the first few times we bring up a new read slave, we turn it on
and all of a sudden, it was performing at, like, one-tenth the speed of the master,
and you get monster, like, lag issues. It's a bad situation.
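The by-hand warm-up described above could be scripted along these lines. This is only a sketch: `warm_replica` and its parameters are hypothetical, and `run_query` stands in for whatever function sends one query (or page load) to the new replica before it takes real traffic.

```python
import time


def warm_replica(run_query, popular_queries, passes=2):
    """Run each popular query against a new replica so its data is in memory.

    run_query is a caller-supplied function (hypothetical here) that executes
    one query against the replica. Returns (query, seconds) pairs so the
    caller can watch the timings drop as the cache warms.
    """
    timings = []
    for _ in range(passes):
        for query in popular_queries:
            start = time.perf_counter()
            run_query(query)
            timings.append((query, time.perf_counter() - start))
    return timings
```

If the second pass is much faster than the first, the data those queries touch is now being served from memory rather than disk.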
Yeah, and I mean, heating it up is great because then you don't have to worry about, like, the piling on effect.
Yes. Yeah. We talked in unit 6 about that cache stampede.
That shows up in a lot of different flavors.
You know, if your cache isn't warm, a bunch of people are probably trying to do that query at the same time,
and it may take 1 second for one person, but if ten people ask at the same time, it's not going to take 10 seconds.
It might never finish because they might all be slowing each other down trying to,
like, bring this data in and out of the cache, and things start thrashing. It's nasty.
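The pile-on can be shown with a small simulation. This is an illustrative sketch only: `simulate_cold_cache`, the `"hot_page"` key, and the barrier are assumptions. The barrier forces the worst case, where every client sees the cache miss before anyone has filled it.

```python
import threading


def simulate_cold_cache(num_clients):
    """Return how many times the backend query runs when all clients miss at once."""
    cache = {}
    backend_calls = []
    all_missed = threading.Barrier(num_clients)

    def slow_query():
        backend_calls.append(1)  # list.append is thread-safe in CPython
        return "result"

    def client():
        value = cache.get("hot_page")
        # Wait until every client has checked the (still empty) cache.
        all_missed.wait()
        if value is None:
            cache["hot_page"] = slow_query()

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(backend_calls)
```

With ten clients the query runs ten times instead of once, which is the duplicated work that makes a cold cache so much worse than the single-request cost suggests.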