FRED SAUER: The rate at which you can write to the same entity
group is limited to one write per entity group per second.
And to the developer writing in, that
seems really low.
They were imagining a Facebook application with 200,000 daily
active users, which means something like 20,000
concurrent users on peak.
So they're kind of contrasting this and saying, well, on the
one hand, I have 20,000 concurrent users all making
requests at the very same moment, and you're telling me
one per second?
Where is the disconnect?
MANDY WAITE: I think it depends on what your entity
group was actually developed to represent and how
extensively you've modeled it and the way you've
modeled your data.
If your entity group is really specific to the particular
user that's making the call, then you won't really have to
worry about it.
One write per second would be perfectly adequate.
But if you kind of sprawled it a bit and the entity group
touches multiple users, then you're likely to have some
contention on that entity group.
And so you have multiple users banging away at
it at the same time.
But really, it's best practices at modeling your
entities that will actually avoid that kind of issue.
So if you have 20,000 concurrent requests, you're
likely to be accessing 20,000 different entity groups and
won't have an issue.
FRED SAUER: And that should be fine.
If you want to do 2 million concurrent users on 2 million
different entity groups, that's absolutely fine.
MANDY WAITE: Absolutely.
FRED SAUER: Think about entity groups as, for the most
part, your unit of transactionality.
MANDY WAITE: Yeah.
FRED SAUER: So if you need some data related to a given
user, like a user and their achievements, for example, and
you have those stored in three or four different entities,
you can put them all in one entity group, and then the App
Engine Datastore will make sure that you can only have
one transaction in flight at a time for that entity group.
So generally, what we see is that each user becomes an
entity group or each order in an order entry system, or each
customer in a CRM system becomes an entity group.
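
A minimal sketch of the modeling idea described above, in plain Python rather than the Datastore API. It assumes a key is a path of (kind, id) pairs and that the first pair, the root, identifies the entity group. The names entity_group, writes_per_group, and the sample keys are hypothetical and used only for illustration.

```python
from collections import Counter


def entity_group(key_path):
    """Return the root of a key path; keys sharing a root share an entity group."""
    return key_path[0]


def writes_per_group(key_paths):
    """Count how many writes land on each entity group."""
    return Counter(entity_group(path) for path in key_paths)


# Hypothetical key paths: (kind, id) pairs from the root ancestor down to the entity.
alice_profile = (("User", "alice"),)
alice_badge = (("User", "alice"), ("Achievement", "first-win"))
bob_badge = (("User", "bob"), ("Achievement", "first-win"))

# One write per user in the same second, across 20,000 hypothetical users.
burst = [(("User", f"user-{i}"), ("Achievement", "login")) for i in range(20_000)]
per_group = writes_per_group(burst)
busiest = max(per_group.values())  # writes hitting the most contended group
```

Because each user is the root of their own group, the burst spreads over 20,000 groups and no single group sees more than one write. If all of those keys shared one root instead, every write would land on the same group and run into the per-group limit.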
We actually also have, we didn't have this initially,
but we introduced it about a year ago, something
called cross-entity group transactions or XG
transactions.
And that allows you to transact across up to five different
entity groups in a single transaction.
MANDY WAITE: Oh, OK.
That's awesome.
FRED SAUER: So it used to be the case that before we had XG
transactions, there was a little bit of this trade off
between, I want to make my entity groups bigger because I
want to do transactions, but I need to make them smaller and
have the right throughput.
And that was sometimes a challenge.
There were, in fact, libraries that sprang up that tried to
figure out how to--
the classic example is, I have two bank accounts.
I want to move $10 from this bank account
to that bank account.
I need to do that within a transaction.
And if I deduct $10 here and then add $10 there and
something goes wrong in the middle,
the $10 would disappear.
Or if I add $10 first and then remove $10, I've created $10
out of nothing.
MANDY WAITE: Oh, I like that one.
FRED SAUER: Well, let's do that one.
With cross-entity group transactions,
that's no longer a problem.
You can actually, in a single transaction, make that change.
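
To make the bank account example concrete, here is a small in-memory sketch of the all-or-nothing behavior a transaction provides. It is plain Python, not the Datastore's cross-group transaction API. The transfer function, the InsufficientFunds exception, and the account names are assumptions made up for this illustration.

```python
import copy


class InsufficientFunds(Exception):
    """Raised when a transfer would overdraw the source account."""


def transfer(accounts, source, target, amount):
    """Move amount from source to target, applying both changes or neither."""
    staged = copy.deepcopy(accounts)  # private working copy, like a transaction's view
    staged[source] -= amount
    if staged[source] < 0:
        # Abort before "commit": the caller's accounts dict is left untouched.
        raise InsufficientFunds(source)
    staged[target] += amount
    accounts.update(staged)  # "commit": publish the debit and the credit together


accounts = {"checking": 100, "savings": 50}
transfer(accounts, "checking", "savings", 10)

try:
    transfer(accounts, "checking", "savings", 500)
    failed = False
except InsufficientFunds:
    failed = True  # nothing was deducted and nothing was added
```

After the first call the $10 has moved and the combined balance is unchanged. The second call aborts, so no money disappears and none is created.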
So really, I think this is all about just structuring your
datastore in such a way that you do no more than one write
per second to any one entity group.
Another kind of classic way that people run into this is
they'll do something like they'll create a site counter.
They want to know how many visitors came to the website.
MANDY WAITE: Exactly.
FRED SAUER: Favorite example, right?
And so every time a user comes into the website, they
increment the counter.
And this is what you would do in a SQL database.
You would kind of increment a
particular row in the database.
And the problem is only one person at a time can update
that one record because that one record is on disk
somewhere and there's some server responsible for it, and
it can only touch that record one transaction at a time.
And that's really limiting for the number of
things you can count.
And so a typical strategy that you use for the App Engine
Datastore is to create something called a
sharded counter, where you partition the
counter into multiple counters.
So instead of, say, 1 counter, you split it out and you say,
OK, let's make 5 counters or 50, some number N that's
configurable.
And now, every time someone comes to the website, I'm
going to at random pick a number from one to five and
then I'm going to update that counter.
So let's say it's counter three this time.
The next visitor comes in, oh, it's again counter three.
Oh, now, it's counter two.
And you say, well, that's weird because now your total
page view count is split up around
five different counters.
But that's not a big deal, because you can easily select
five numbers and add them together.
This is a very easy task for a computer to do.
But by doing so, you've just increased the throughput of
your web counter five-fold.
If you need 50 transactions per second, you go 50-fold,
plus a little bit of buffer, so maybe you go 60,
something like that.
But it's very easy to shard your counter out as far as you
need to go.
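
The sharded counter described above can be sketched in a few lines of plain Python. This is an in-memory illustration under stated assumptions, not the Datastore implementation: in a real application each shard would be its own entity in its own entity group, so that each shard has its own write budget. The class ShardedCounter, the helper shards_needed, and the 20 percent headroom default are hypothetical.

```python
import math
import random


class ShardedCounter:
    """A counter split into N independent shards that are summed on read."""

    def __init__(self, num_shards, rng=None):
        self.shards = [0] * num_shards
        self.rng = rng or random.Random()

    def increment(self):
        """Pick one shard at random and bump it; return the shard index used."""
        index = self.rng.randrange(len(self.shards))
        self.shards[index] += 1
        return index

    def total(self):
        """The overall count is just the sum of all the shards."""
        return sum(self.shards)


def shards_needed(writes_per_second, headroom_percent=20):
    """Estimate shard count at roughly one write per shard per second, plus buffer."""
    return math.ceil(writes_per_second * (100 + headroom_percent) / 100)


counter = ShardedCounter(5, rng=random.Random(0))
for _ in range(1000):
    counter.increment()

page_views = counter.total()
suggested = shards_needed(50)
```

With five shards the thousand page views are spread across five counters, and reading the total means adding five numbers. For 50 writes per second with 20 percent headroom, shards_needed suggests 60 shards, which matches the rough figure mentioned in the conversation.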
MANDY WAITE: And there are actually examples on the
website, aren't there?
FRED SAUER: Yeah, it's our article.
MANDY WAITE: In the development documentation.
FRED SAUER: Yep.
So with a little bit of data modeling, you can do as many
concurrent users as you like.
And hopefully you do a lot more than 20,000.
But 20,000 is pretty awesome.
I wish my website had--
MANDY WAITE: 20,000 is pretty good.
Yeah.
Well, I guess the message here really is avoid
shared mutable state.
I love shared mutable state all the same.
Every time you share a state that's mutable, you're going
to run into problems with concurrency.
So just keep it isolated.
FRED SAUER: If you do that, you're going
to have a bad time.
MANDY WAITE: You're going to have a bad time.
Absolutely.