>>Tony Voellm: We are definitely in the final stretch of GTAC 2013. We have two more lightning
talks that I'm going to introduce for you. And then we're going to have an academic talk,
and then we're going to round out and finalize the day with security.
So definitely stick around. I have a couple more of these Droids underneath the podium
here. >>Tony Voellm: So with that, I'm going to
introduce Yvette Nameth and Brendan Dhein. And they're going to be talking to you about
continuous maps data testing. Here you go, Yvette.
>>Yvette Nameth: Hi. I'm Yvette Nameth, and this is Brendan Dhein. And we are both Google
testers on Google Maps. If you didn't see the video earlier, you can take a look, but
I was the person in the lovely video that Tony showed at the beginning of the day.
And this talk is going to give you a hint as to how we actually do the testing
that I was describing there.
So why are we doing this? Well, take a look. Something might be a little
wet in Mexico. Maybe a little global warming going on. Maybe the entire West Coast of Mexico
is entirely flooded. This is what can happen when a maps data bug
actually occurs. And this is not a software bug. This is the raw data
that we're using to build the map images. So let's talk about how maps get rendered.
Well, we have all this data that's in a large repository. It's coming in from all these
different feeds and creates that world data repository that you see on the left.
In the middle, we have a data processing pipeline, sometimes known as our rendering pipeline,
which generates images based on all of the different features that are in the world data.
So things that would be in the world data would be features such as locations, cities,
restaurants, roads, et cetera. Each one of these has an associated geometry which we
then need to create a style for, which would potentially be like a polygon for a park,
containing a fill color, a stroke, and a label. So if that raw data is crap,
obviously, the map coming out is going to kind of look like Mexico did. It was missing
a big chunk of land. We're at a testing conference, so what about
testing this data? We actually want to test every single piece independently. We can't
just test the end product. I am currently primarily focused on testing the end product.
But in order to get there, I had to first test the world data with Brendan.
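To make that world data a bit more concrete, here is a minimal sketch of what a single feature and the rendering style derived from it might look like. This is only an illustration in Python with hypothetical field names, not the actual Geo data schema:

    from dataclasses import dataclass

    @dataclass
    class Feature:
        # One hypothetical world-data feature, e.g. a park.
        feature_id: str
        feature_type: str               # "park", "road", "restaurant", ...
        name: str                       # the label that ends up on the map
        geometry: list                  # (lat, lng) vertices of its polygon

    @dataclass
    class Style:
        # The rendering style the pipeline would derive for that feature.
        fill_color: str
        stroke_color: str
        label_text: str

    park = Feature(
        feature_id="feature/park/123",
        feature_type="park",
        name="Central Park",
        geometry=[(40.7644, -73.9735), (40.8003, -73.9580),
                  (40.7968, -73.9493), (40.7644, -73.9735)],
    )
    park_style = Style(fill_color="#c8e6c9", stroke_color="#388e3c",
                       label_text=park.name)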
>>Brendan Dhein: Cool. So, moving on, there are some patterns and antipatterns we wanted
to share with you. I'm not sure how many of you have seen that
photo of the donkey lying down in the sand. This was actually taken from a Street View
car a while back and made its way around CNN. Just to set the record straight, the donkey
is still alive. It is not dead. It actually was set up there, and it's just taking a nap.
How does this translate into data testing? Well, if you notice, you can test a lot of
things in this picture. You can test whether the donkey is dead or alive. That would be
a big feature. Or you can test if every single grain of sand in the picture still exists.
And you can see this sort of with data testing as well.
Let's say we have New York City. Do you want to test the exact geometry of every ZIP code
in New York City? Do people care if the geometry shifts a bit one way or another? Or let's say, do
you want to test the name of every subway station?
You want to make sure that important features exist, like New York's still labeled. Granted,
for user experience, some things don't really matter. And part of the trick in doing data
testing is determining what matters and what doesn't.
For a naive approach, you could easily do a simple diff test. This sounds great initially,
and you're going to be, like, oh, yeah, I'm going to test every feature.
That won't work. And it doesn't take too much to realize it. But just to throw some numbers
out, let's say you had 100 million different geographic features in your data dump. What
happens if you had a 1% failure rate? That's a million features to look at. Would you be
able to triage that in a reasonable amount of time?
And by "reasonable," we mean in time to make a launch. Things that you really care about.
So moving onward, that's all great, but how
might you come up with a reasonable solution? And this picture here, I picked it out, actually,
from my Christmas vacation to Barcelona. It's the Sagrada Familia. It's been under construction
for ages now. It will probably be under construction until we're all dead and beyond there. You
can still see the cranes there. And they're actively working.
And that's sort of like data testing. You need to start and you need to have some sort
of a plan, but you need to keep building up your corpus as time goes on.
You want to keep up and go through and test and test and test, but you want to do it carefully.
You don't want to do a simple diff test. You want to look at things like statistical analyses.
I mentioned that, like, if every subway station changed in New York City. That's something
you might care about. I mean, there was actually an episode on Google Maps where, for a very
brief period of time, we might have displayed additional subway stations in New York City,
like, by a factor of two and chose some really awkward names for the subways.
[ Laughter ] >>Brendan Dhein: For those of you who don't
know, apparently the New York City subway system was actually made up of two or three
different systems. And when you get data from your third-party provider, they may use the
historic names. So we were displaying some IRT and BMT names. And when you had stations
that, in our view of the world, sort of joined together, well, we stopped doing that.
And that's sort of the type of data you want to avoid.
And you can do that with some structured diff tests that test exactly what you want.
You want to ask, do my names still make sense for these critical features? You would want
to, say, test New York City exists, Washington, D.C., exists. And that's sort of a basic smoke
acceptance test. What you want to do, though, is, assuming
that those pass, you also want to look to see if the state of the world has changed
or not. You want to look, and you want to see, oh, have my oceans changed with -- or
within a given region, has the makeup of the region changed?
Let's say you're doing a data dump and you're processing it for export. Would you find it
odd if, say, a given city doubled its number of aquariums? Might be a bit concerning.
And along those lines, let's say you had a city that suddenly had a lot more airport area.
Maybe an entire airport went wrong and is now covering a city. It can happen.
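As a rough illustration of that kind of statistical check, and only a sketch rather than the actual pipeline, you could compare per-category feature counts for one region between the previous data dump and the new one, and flag any category whose count swings past a threshold. The feature data and the threshold below are made up:

    from collections import Counter

    def suspicious_changes(old_features, new_features, max_ratio=1.5):
        # old_features / new_features: lists of (feature_type, name) tuples for
        # one region, from the previous and the current data dump.
        old_counts = Counter(ftype for ftype, _ in old_features)
        new_counts = Counter(ftype for ftype, _ in new_features)
        flagged = {}
        for ftype in set(old_counts) | set(new_counts):
            old, new = old_counts[ftype], new_counts[ftype]
            # A category that vanished, appeared out of nowhere, or swung past
            # the ratio (say, aquariums doubling) deserves a human look.
            if old == 0 or new == 0 or new / old > max_ratio or old / new > max_ratio:
                flagged[ftype] = (old, new)
        return flagged

    # Made-up example: one extra aquarium shows up in a region's new dump.
    old = [("aquarium", "A"), ("park", "P1"), ("park", "P2")]
    new = [("aquarium", "A"), ("aquarium", "B"), ("park", "P1"), ("park", "P2")]
    print(suspicious_changes(old, new))     # {'aquarium': (1, 2)}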
So if you try to build up tests like this, once you build a statistical corpus and divide
the world into regions, you can make a lot of progress.
What you can do is you can actually take each geographical region, say, a Metro area, or
something you know you care about, something that's small enough to be understandable
by a human but not so large that you're completely overwhelmed by the details of what's
changed, and use that as a basis for analysis. Then you can think, well, if I had a manual
tester, what would my manual tester be doing? My manual tester would probably look through
the area and say, well, do all my roads still exist? Do I still have the lake that's in
the middle of the city? Do I still have parks? These are tests that you typically see a manual
tester do. Now, we have computers, and this is GTAC,
we're trying to automate this. Could we perhaps have a system that goes through and looks
and listens and just tries to interpret what's changed? Has there been a dropout? Has there
been a suspicious gain? Have you gained an entire new set of features?
Now, assuming you can do this on one region and you get parsable, understandable, and
human-scale results, you could probably speed it up even. I mean, we're trying to do this
fast. Each region can actually be executed in parallel.
Now you have a testing architecture where you have locally specific outputs that actually
have quantifiable and reasonable results that you can understand and interpret before you
need to launch. And that's sort of the sweet spot you want to be in for data testing.
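Here is a hedged sketch of what that architecture could look like, using a per-region check like the one sketched earlier and Python's multiprocessing as a stand-in for the real parallel infrastructure. The region names and the load_features() helper are hypothetical, not a real API:

    from multiprocessing import Pool

    def load_features(region_name, dump):
        # Hypothetical helper: a real system would read the region's slice of
        # the world data for the given dump; here it just returns nothing.
        return []

    def analyze_region(region_name):
        # Run the checks for one region and return a small, human-scale report.
        # suspicious_changes() is the per-region check sketched earlier.
        old_features = load_features(region_name, dump="previous")
        new_features = load_features(region_name, dump="current")
        return region_name, suspicious_changes(old_features, new_features)

    def analyze_world(regions, workers=16):
        # Regions are independent, so their checks can run in parallel, and
        # each result stays small enough to triage before launch.
        with Pool(workers) as pool:
            return dict(pool.map(analyze_region, regions))

    # reports = analyze_world(["new-york-metro", "sf-bay-area", "greater-melbourne"])
    # for region, flagged in reports.items():
    #     if flagged:
    #         print(region, flagged)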
And on that note, we move towards frequency. >>Yvette Nameth: So now that you know that
we're going to be doing MapReduces over all these different areas, how often should we
do this? We've got this very, very large repository
that takes a long time to make. And by "a long time," I don't mean minutes. Everything
in the Geo data world, whether it's that rendering pipeline, which can take up to a day, or data
processing prior to that pipeline, takes -- we're talking hours. Some of these processes take
days to do all the correlation. So how often do we want to do this?
Well, we push map tiles on about a monthly basis. But if you're trying to test the data
on a monthly basis, you've got a month's worth of changes that come in from all the different
changes that are happening around us in the geographic world. These are coming in all
the time. We have this product called MapMaker that's pushing data changes to us all the
time. We're getting new feeds updated all the time with batch changes.
So that would be like trying to drink from a geyser. It just would really suck. It would
kick you back on your ***, you'd be bloody, and you'd be picking yourself up and fighting
fires, trying to get some version of the world that was actually a legitimate representation.
So we're not going to do that. And, you know, we're all overachieving testers.
So we think we should test every change. And that actually really, really sucks, too. Because
like I described, things are changing every millisecond in the data, and they might be
very small. And doing this parceling out of testing into MapReduces actually takes a long
time, too, unfortunately. So every change is kind of like drinking from
a waterfall. It's just this never-ending barrage of water. So we're not going to do that.
We kind of came up with this compromise that said daily. Daily gives us a really good signal.
It's something that we can look at and say that on this day, we see that all these parks
disappeared. What happened on that day? Was there a batch change to a feed that might
have parks? Was there, you know, some public data changed about them? Did some user come
in and decide to just delete all the parks in Washington, D.C.?
We don't know. But now we at least have a starting point that sort of gives us a little
bit of a temporal location. So we finally have our happy little water fountain to drink
from. And that's what I'll leave you with.
[ Applause ] >>Tony Voellm: Great.
Great. Thank you Brendan and Yvette. So we have Q&A so you can line up and ask
questions if you like. We can take one live. And, like, in five seconds, I will have the
moderator in front of me. My takeaway? Wow, you get to go to Hawaii
a lot. That's -- >>Yvette Nameth: That's Australia. This is
actually outside of Melbourne. There are baby penguins and little penguins that you can
find under your car. >>Tony Voellm: I think there are two questions
over here. But they're deciding who's going to go first.
Please. >>Yvette Nameth: Don't be afraid.
>>> I'm just curious about -- I assume users will sometimes find errors, since you can't
check everything? But how often would you say that your tests
find errors in the data as opposed to the users finding errors?
>>Yvette Nameth: I think the scale of the error that a user finds versus the scale of
an error that our test finds, we're talking massively different.
Like Brendan said, we're checking for the really big things, you know, the things that
have to be right. Like, the Eiffel Tower has to exist on the map, because a majority of
users would notice that. We're not testing that, like, your parcel of land, like, that
little outline, is exactly correct. Which is what, like, you know, my aunt complains
to me about when I tell her I work on Google Maps. But how am I ever going to know this?
So I would say that we catch a majority of the large ones. I don't know if I would give
a back of envelope percentage for that. And I would say we catch very few of the small
ones unless they are systemic, like every parcel of land is misplaced.
>>> Okay. >>Tony Voellm: Great. Next question up over
here to the right, please. >>> Hi, my name is Igor. I work for Nokia
maps. And my question is, do you compile data? If
yes, do you test it as both raw and compiled data? >>Brendan Dhein: That's an interesting question.
And let's think about how to answer this correctly. So in terms of do we test our third-party
providers coming in and the actual data package that goes out to maps?
>>> Exactly. >>Brendan Dhein: Yes.
>>Yvette Nameth: So, yeah, we test both and can't really say much more than that.
>>> Thank you. >>Yvette Nameth: I really like my NDA and
my job. [ Laughter ]
>>Brendan Dhein: Exactly. >>Tony Voellm: Yeah, I always like these questions.
Great. We have time for just one more. So please.
>>> I'm Dylan (phonetic). I work for Google. High-level question. You talked about data,
and before now, we have mostly been talking about code. Does the situation come up where
you have a data push that's totally valid that suddenly sets off a code bug that maybe
has been in production for months, you know, and other mitigation -- The question is, what's
the mitigation strategy for that? >>Brendan Dhein: Depending upon how bad the
code bug you found is, very fast rollbacks are essentially key to doing a data push.
We want to be able, if we do have a problem in production, to turn it off. But
also canarying your data and just following good release hygiene and trying to simulate
any type of failure mode before it actually hits the user.
>>Tony Voellm: Great. >>Yvette Nameth: And that is the second most
common type of bug after the raw data bug. So....
>>Tony Voellm: Great. And with that, thank you, Yvette. Thank you,
Brendan. [ Applause ]