>> HAWKINS: In this presentation I'm going to go over the rationale for testing, some of the types of tests that we have available to us, and the tools that we use to do testing. The Chrome team has around 200 developers and we average about 100 commits a day. With all that code flowing in and out, we need to make sure that stability remains a high priority, and one way to assure that stability is to test the code that we're running. One of the things that tests do is make sure that the feature you're writing runs the way you expect; you can't have people hand-testing your feature every day. The tests make sure you don't have any regressions that pop up. Another thing that tests do is document what your code is expected to do, so that when other people come and read your code, they understand the rationale and the logic behind what you wrote. Tests also guide the design of the code, in the sense that when you write tests first, you start thinking about, "How am I going to implement this so that these tests pass?" That can modify your design heavily.
A big piece of the infrastructure that we have in Chrome is our trybots. Trybots are a pool of machines that you can send try jobs to; they take your change list and run it through our suite of tests. They mainly run it on the three main platforms, Windows, Linux, and Mac, but it is possible to pass a bot parameter to specify another bot you want to run on. Some of the bots we have available are Valgrind, Chrome OS, and Linux Views, so when you're changing code that affects those platforms it's good to run those tests as well. The main test suite takes about an hour to run on average for the three platforms.
If you want to cut down the running time of your try job, you can pass in the -t parameter, which takes as its argument the test program name, a colon, and the filter that you want to use. So, for example, if I want to run a specific unit test, I'll pass in -t unit_tests and then the name of the test that I want to run, and it will only run that specific test, which greatly cuts down on the testing time.
There's also a concept for trybots known as the Last Known Good Revision. The Last Known Good Revision is set by the official builders on the waterfall, and it's the last green build that we had. Whenever we get a green build on all the platforms, the Last Known Good Revision, or LKGR, is updated. However, the LKGR could be several revisions behind what you want to test, so you can pass in the revision you want using the -r parameter.
Whenever you upload a CL, try jobs used to be started automatically, but that's no longer the case because we have so many CLs uploaded these days, so you need to make sure that before you commit you at least run a try job on your latest patch set. That makes sure that any failures in your CL are caught early: you'll see the failures before you actually make the commit, and that keeps our main tree a lot greener. The link at the bottom of this page is the try-server waterfall. If you don't find your try job on the CL page itself, you can head over to that link, find your LDAP or your username, and see the status of your try job.
There are three main types of tests, and I'm going to start with unit tests. These are our lowest-level tests. They test units of the code, such as a method on an object like Watchdog.ArmAtStartTime. The unit tests for that method would test all of the logic paths in that method. The good thing about unit tests is that they document exactly what's going to happen in that method for all the input parameters you could specify, and you verify that the output is correct. Unit tests are small, so they usually run pretty quickly, and they also don't run with other tests; they're isolated, so you test exactly what you want and you see the results for just that bit of code.
As an example, I'm going to use the AutoFill feature, which has an InfoBar for credit cards. For writing a unit test for this, I would be testing the AutoFillCCInfoBarDelegate class itself, which is the lowest-level implementation of this InfoBar. There are three buttons in this class, a link, and an icon, and the methods on AutoFillCCInfoBarDelegate specify how this InfoBar is supposed to look. So I could run those methods and make sure that I get the three buttons back and the text of those buttons, and make sure the link has the appropriate URL.
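As a rough sketch of what that unit test could look like, using a made-up, simplified stand-in for the delegate rather than the real AutoFillCCInfoBarDelegate interface:

    #include <string>
    #include <gtest/gtest.h>

    // Made-up, simplified stand-in for the InfoBar delegate described above;
    // the real AutoFillCCInfoBarDelegate has a richer interface and constructor.
    class FakeCCInfoBarDelegate {
     public:
      int button_count() const { return 3; }
      std::string save_button_label() const { return "Save"; }
      std::string link_url() const { return "http://example.com/autofill-help"; }
    };

    TEST(AutoFillCCInfoBarDelegateTest, DescribesButtonsAndLink) {
      FakeCCInfoBarDelegate delegate;

      // The delegate should report the three buttons and their labels...
      ASSERT_EQ(3, delegate.button_count());
      EXPECT_EQ("Save", delegate.save_button_label());

      // ...and the link should point at the appropriate URL.
      EXPECT_EQ("http://example.com/autofill-help", delegate.link_url());
    }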
There is a unit_tests target itself, but there are also unit test targets for our other major modules in Chrome, some of which are listed here. There are actually quite a few unit tests, which is important; we need a lot of tests. The next step up in our testing types is browser tests.
Browser tests create a browser object in the test so that you can test how your code interacts with the browser object itself. A browser test is really a specialized unit test, but there's infrastructure set up so that it loads the browser automatically and you have access to browser internals. They run in a different process, and the sandbox is disabled. There is a running message loop, so you have to take that into consideration when you run things on different threads; the default thread is the UI thread. Another thing to take into consideration when you're running the tests yourself is that the browser window is not visible by default, but you can specify that it be visible.
Going back to the AutoFill InfoBar, an example of a browser test for it would be: load up an InfoBar; make sure that the InfoBar is in the tab you specified; make sure that there's only one InfoBar per tab; make sure that you can load up multiple InfoBars, one per tab, so there's more than one overall. You can call the method on the InfoBar class to press the close button and then make sure that, in the browser object, the InfoBar is no longer there.
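A rough sketch of that flow; IN_PROC_BROWSER_TEST_F and InProcessBrowserTest are the real browser-test plumbing, but the InfoBar helpers below are made-up names standing in for the actual Chrome calls:

    // Hypothetical helpers for this sketch only; a real test would use the
    // Browser and tab APIs directly to add and query InfoBars.
    void ShowAutoFillCCInfoBarInTab(Browser* browser, int tab_index);
    int InfoBarCountForTab(Browser* browser, int tab_index);
    void ClickInfoBarCloseButton(Browser* browser, int tab_index);

    class AutoFillInfoBarBrowserTest : public InProcessBrowserTest {};

    IN_PROC_BROWSER_TEST_F(AutoFillInfoBarBrowserTest, OneInfoBarPerTab) {
      // Show the credit-card InfoBar and check it landed in the tab we expect.
      ShowAutoFillCCInfoBarInTab(browser(), 0);
      EXPECT_EQ(1, InfoBarCountForTab(browser(), 0));

      // Showing it again in the same tab should not add a second InfoBar.
      ShowAutoFillCCInfoBarInTab(browser(), 0);
      EXPECT_EQ(1, InfoBarCountForTab(browser(), 0));

      // Pressing the close button removes it from the browser object.
      ClickInfoBarCloseButton(browser(), 0);
      EXPECT_EQ(0, InfoBarCountForTab(browser(), 0));
    }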
UI tests are high-level, end-to-end integration tests. They're an example of black-box testing, in that you don't have access to object internals; you have to implement the test through an automation proxy. There's an Automation API which allows you to control the browser, and there are several different ways we use it. The main one is UI tests, which are tests written against the Automation API. There are also automated UI tests, which use fuzzing: there's a list of commands that are implemented by the Automation API, for example, click a button in the interface, load up a page, load up a new tab, and the fuzzing creates lists of different combinations of these commands, runs them, and mutates the lists. So you're basically taking all these different actions that you can take with the browser, running them, and seeing how they interact.
There's another test type called interactive UI tests, which are necessary for the Windows bots in that these tests interact with the UI by clicking on or moving the browser around, and the bots themselves have to be specially configured to take advantage of this. The last one is PyAuto, which is a Python interface for writing automation tests; it's a really easy way to write UI tests in Python.
So we'd have an API call that loads a web page with the form that we've written. The next API call could fill out that form with data that we specify. You'd have an API call that clicks a button using the actual OS infrastructure for clicking, programming the click to happen on a button, and at that point you check that the InfoBar shows up as you expect. The next call could use the clicking API to simulate a click on the "Save" button or the "Don't Save" button. For the Save button, you want to make sure that the data is saved, so there'd be another API call to ask, "Is there profile data? Does it match what we expected?" And if you click on the "Don't Save" button, you also want to check that the credit card data was not saved.
So whenever you're running a test and it's not working out exactly as you planned, maybe you got a crash or it's just not functioning correctly, you should be able to debug the test. On Mac and Windows, it's pretty easy to debug tests using the visual interface, whether it's Xcode or Visual Studio: you just compile the test target and then debug that test target. On Linux, it's possible to do that, but in most cases you're going to use GDB. You pass GDB the test binary using --args, along with the GTest filter of the test you want to run (for example, something like gdb --args unit_tests --gtest_filter=MyTest.MyCase), and then you should be able to step through the test and figure out why it isn't working the way you think it should.
We have a few frameworks for writing tests that make test writing a lot easier. GTest is the biggest piece of how we write tests, and it does a lot of the work for you. You can check out the link at the bottom; there's a lot of good documentation there. I'll give an example. The high-level way to look at tests in GTest is that you have a test program, which would be the unit tests binary, the browser tests binary, or the UI tests binary; that's your test program. Your test case would be, say, an AutoFillInfoBar test, which has many tests inside of it. Each test is testing a specific piece of that functionality, whether it's a method on the object, interaction with the browser, or a high-level end-to-end UI test.
As an example of the GTest framework, it's really easy to set up a test. You have the TEST macro, the test case name, and then the test name itself. In this one, we're using ASSERT_EQ, which is a macro used to check that something is how you expect. It's an assertion, so if the thing that you're testing is not valid, the test itself will stop. We also have EXPECT_EQ, which will not cause the test to stop if the thing that you're checking is not true.
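A minimal sketch of that structure (the test case name and values here are made up):

    #include <vector>
    #include <gtest/gtest.h>

    TEST(VectorTest, HoldsWhatWePushed) {
      std::vector<int> values;
      values.push_back(4);
      values.push_back(8);

      // ASSERT_EQ is fatal: if the size is wrong, stop here rather than
      // index past the end of the vector below.
      ASSERT_EQ(2u, values.size());

      // EXPECT_EQ is non-fatal: a failure is recorded, but the test keeps going.
      EXPECT_EQ(4, values[0]);
      EXPECT_EQ(8, values[1]);
    }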
Something that you can use to reduce duplication of code is a test fixture. A fixture is data setup that runs for every test in a test case. I'll show you an example of that. QueueTest is the name of the test fixture; we inherit from testing::Test. There are two methods that you can override: SetUp, which you usually want to override and which is run at the beginning of each test, and TearDown. In this SetUp, we are enqueueing some values into several queues. For this test case we don't need anything to be torn down, but if, for example, you weren't using scoped pointers and you had allocated data, TearDown is where you'd want to destroy that data.
Using the same QueueTest fixture, we have two tests here: IsEmptyInitially, where we just expect that the queues are empty, and they are; and DequeueWorks, where we rely on the data loaded in the fixture, the values enqueued into each of the queues in SetUp, and then expect that when we dequeue we get those values back.
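Here's a sketch of that fixture and those two tests, along the lines of the example in the GTest documentation, with std::queue standing in for the Queue class on the slide:

    #include <queue>
    #include <gtest/gtest.h>

    class QueueTest : public testing::Test {
     protected:
      // SetUp runs before each test in this test case.
      virtual void SetUp() {
        q1_.push(1);
        q2_.push(2);
        q2_.push(3);
      }
      // No TearDown needed here; nothing was heap-allocated in SetUp.

      std::queue<int> q0_;
      std::queue<int> q1_;
      std::queue<int> q2_;
    };

    TEST_F(QueueTest, IsEmptyInitially) {
      EXPECT_TRUE(q0_.empty());  // q0_ never had anything enqueued.
    }

    TEST_F(QueueTest, DequeueWorks) {
      ASSERT_FALSE(q1_.empty());  // Fatal: don't touch front() on an empty queue.
      EXPECT_EQ(1, q1_.front());  // The value enqueued in SetUp.
      q1_.pop();
      EXPECT_TRUE(q1_.empty());
    }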
There are a lot of assertions that you can use; this table gives you quite a few of them. The fatal versions are the ASSERTs, as in they will stop the test and the test won't run any further. Usually you want to use these on things that would otherwise cause the test to crash in a way you don't expect; for example, asserting that a vector is a certain size, because you're then going to access all the elements of the vector, and you don't want to access those without asserting first. The EXPECT assertions won't cause the test to stop; they'll just cause a test failure if they don't evaluate to true.
Another piece of the framework that we use is Google Mock. Mock objects allow you to provide an object that you don't want to implement fully; you just want to specify the interaction with that object. For example, here we have a PersonalDataLoadedObserverMock. This is an observer in the messaging sense, and what we have is a mocked-out method, personal data loaded, which will be called by some other class. The action that we expect to take is QuitUIMessageLoop; we can put assertions inside the action along with whatever else we need to do, and here we're going to quit the current message loop in that action. In the test, we create a profile and we set the profile info. Inside SetProfileInfo, the personal data observer is going to be called, and we expect that call to happen. We didn't have to create a whole personal data observer, though; it's just a mock object with a mock method. We don't care about anything else on the object, just that this one method gets called. So we have EXPECT_CALL on the observer object for the call we care about, and then we say WillOnce(QuitUIMessageLoop). WillOnce means we expect it to happen exactly once, and the action to be taken is QuitUIMessageLoop. There's also WillRepeatedly, which means that after the call to SetProfileInfo, the action you're expecting could happen several times; that's a good way to specify that. At the end of this piece of code, we need to run the current message loop, because we've queued up this call in the message loop and we need that message to be delivered. So we run the message loop.
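Roughly, the pattern looks like this; the observer interface and the QuitUIMessageLoop action below are simplified stand-ins, not the exact Chrome classes:

    #include <gmock/gmock.h>
    #include <gtest/gtest.h>

    class PersonalDataObserver {
     public:
      virtual ~PersonalDataObserver() {}
      virtual void OnPersonalDataLoaded() = 0;
    };

    class PersonalDataLoadedObserverMock : public PersonalDataObserver {
     public:
      MOCK_METHOD0(OnPersonalDataLoaded, void());
    };

    // Stand-in for the action in the real test, which quits the UI message loop.
    void QuitUIMessageLoop() {}

    TEST(PersonalDataTest, ObserverIsNotifiedExactlyOnce) {
      PersonalDataLoadedObserverMock observer;

      // Expect exactly one call, and run our action when it happens.
      EXPECT_CALL(observer, OnPersonalDataLoaded())
          .WillOnce(testing::Invoke(QuitUIMessageLoop));

      // In the real test, SetProfileInfo() triggers this notification and the
      // message loop is then run; here we just invoke it directly.
      observer.OnPersonalDataLoaded();
    }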
We have several tools for testing. One of the most useful ones is Valgrind, which is a memory error checker. Memcheck is the tool that checks for things like leaks, uninitialized memory, and using the wrong delete; if you use delete when it should be delete[], it will notify you of that. You can see the command to run it right there. There's also ThreadSanitizer, which detects data races, and the command for that is down there as well. Mostly, though, you don't have to run these yourself, in that we have Valgrind bots on the main waterfall which continuously run these tests; they take about an hour and a half to run, and they will catch these errors for us. Once they do, we file a bug and notify the owner of the code that they have a memory error, and once you're notified, you can run the test yourself under the tool to debug it. An important part of testing is code coverage, something we need to keep in mind.
Code coverage is how many lines of code are actually executed by your tests. Obviously, the higher your code coverage, the more likely it is you're going to catch errors. You can theoretically get to 100% code coverage, but even if you do, on average you only find about half of the bugs, because the number of possible inputs and outputs is just too large to test. There are three pieces of information with code coverage: the number of lines instrumented, which are the lines actually compiled into your test, including testing code and source code, and the number of lines covered, which are the lines that were hit when you ran your test. The piece of information that's missing is code that was part of the source but is not actually compiled in, and this is something you want to take into account for the different bots. For example, on Linux, if there's a Windows-only test that we don't mark as Windows-only and it's not being compiled on Linux, the instrumentation will think those are missing lines, because they are, and that will lower your coverage. We want to have the most accurate coverage we can possibly have, so there's a way to go in and say, "This is a Windows-only test. We don't need to run it on Linux or Mac, so don't count it against us."
Incremental coverage is a way of measuring, for each commit to the tree, how many lines of code are tested in that commit, for the lines it adds: of the plus lines in a diff, how many are covered. Ideally, we want 50% incremental coverage, so in any commit, of all the new lines, you want 50% of them to be tested. For example, if your change adds 40 lines and your tests execute 20 of them, that's 50% incremental coverage. At that point you're at break-even: you're not losing coverage, but you're not gaining any more coverage either. At some point we'd want to bump that up to 75%, but 50% is a tough enough target as it is. We're currently working on adding the ability to track incremental coverage and setting that 50% target, so that there will be a bot on the main waterfall that alerts on any build containing a commit that doesn't reach 50% incremental coverage; the bot will go red and the owner of that commit will be notified that they didn't hit 50% incremental coverage. We're getting pretty close to that. We do have a trybot for coverage out there; there's a Linux bot available right now, and once we get more machine infrastructure, we're going to get a Mac and a Windows bot as well.
The coverage try jobs take about an hour and a half, but they're a good way to find out how much your tests exercise your code, how good your coverage is for the test that you just wrote. There are also three bots up right now that run coverage continuously, even though they don't track incremental coverage. They're on the experimental waterfall, which is the link at the bottom of the page. If you go to that page and search for coverage, you'll see the Mac, Linux, and Windows coverage bots. Those will give you our current coverage analysis, and you can look at any code that you currently work on to see how good your coverage is.
One thing that crops up quite frequently in a project of this size is failing tests. Failing can mean a lot of things: it could be that an expectation or an assertion doesn't pass in the test, or it could be a crashing test or a hanging test. All of these are bad, but some are worse than others. Disabling a test is not a good thing, because a disabled test doesn't run at all, so it's effectively dead code at that point. There are two types of tests that need to be disabled, and those are crashing or hanging tests. We disable those because the rest of the testing infrastructure does not continue to run; for example, with unit_tests, if you crash on the very first test you run, you're not going to run any of the rest of your tests, which is the worst case. So for those, it's okay to disable. To be even more specific, you should use platform-specific defines that say, "only disable on Windows," because it's only crashing on Windows. You want to be as specific as possible when disabling, so we can still have continued coverage on Linux and Mac.
Whenever you do disable a test, or whenever you change the moniker of a test to DISABLED, FAILS, or FLAKY, you want to make sure that you add a comment to the code, file a bug, and then add that bug to the comment, so that whoever comes to look at the test later has that bug for reference; the bug should probably include a log of the failing test. The FLAKY moniker is what we should use for flaky tests. Flaky tests fail spuriously: they might run green a couple of times, then fail, and then go green a couple more times; they just fail on and off. There's also FAILS, which is used for tests that fail continuously. The infrastructure behind FLAKY and FAILS is actually exactly the same. The benefit to developers is that you know right when you look at the code, "Okay, this test is always failing; there's some error that's continuously happening," or, "There's some sort of flake in the system; it's unknown, but we're going to have to look for a flaky issue here."
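In code, the monikers are just prefixes on the test name; the test and bug number below are made up for illustration:

    #include <gtest/gtest.h>

    // This test fails intermittently on the bots.
    // Tracking bug (hypothetical number) with logs: http://crbug.com/12345
    TEST(AutoFillInfoBarTest, FLAKY_ShowsInfoBarAfterFormSubmit) {
      // ... test body unchanged; only the name gets the FLAKY_ prefix ...
    }

    // A test that fails deterministically would get FAILS_ instead, and a
    // crashing or hanging test would get DISABLED_ so it does not run at all.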
There's a way to disable or to change the monikers for several tests at one time. Say you have a test, or several test cases, that are only failing or crashing on Mac OS. You can use a SKIP_MACOS macro, which takes a test name and turns it into the DISABLED_ version of that name, using the two pound marks (token pasting) to splice the test name in. Then, for every test that's failing on that platform, you say SKIP_MACOS and then the name of the test.
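A sketch of the kind of macro being described, assuming OS_MACOSX is the define for Mac builds; the exact SKIP_MACOS definition in the tree may differ:

    #include <gtest/gtest.h>

    #if defined(OS_MACOSX)
    // The two pound marks (token pasting) splice the test name into a
    // DISABLED_ name, but only on Mac builds.
    #define SKIP_MACOS(test_name) DISABLED_##test_name
    #else
    #define SKIP_MACOS(test_name) test_name
    #endif

    // Runs normally on Windows and Linux; disabled on Mac only.
    TEST(AutoFillInfoBarTest, SKIP_MACOS(ShowsOneInfoBarPerTab)) {
      // ... test body ...
    }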
One thing you don't want to do: if there's a Valgrind or ThreadSanitizer error, make sure that you don't disable the test at the code level if at all possible. For example, if you have a leak and the Valgrind bot is red because of your leak, you shouldn't disable your test. You should either add a suppression, which is most likely what you want for a leak, or, if the test only crashes when run under Valgrind, you should go into the Valgrind files, which keep a list of tests that should not be run under Valgrind and ThreadSanitizer, and add your test to that list.
So in summation, we need to write a lot of tests. We need to then write some more tests.
We have a lot of tools that you can use; make the most of them, especially Valgrind and ThreadSanitizer. If you have a test that is failing under one of those tools, use that tool locally to debug it. We need to take care of coverage, to make sure that our features are as covered as possible for the things they need to be covered for. And whenever a test is failing, really think about whether you should be disabling it or marking it as FAILS or FLAKY, so that we keep our coverage numbers up. So that's about it. Do we have any questions?
Okay. Thank you very much.