>>Tony Voellm: Welcome back. Hopefully, you are all rested and ready. We are in the final sprint. We have three more talks here. The first talk that's up is going to be done by Katerina Goseva-Popstojanova. She is from West Virginia University, and she will be speaking to us about empirical investigation of software product line quality. And, you know, the best question, I guess, this time, or maybe even the first question at the end of this talk, is going to earn Baby John-Oh up here. Some people have told me their kids have named these Andy and lots of other very cute little names. So with that, Katerina.
>>Katerina Goseva-Popstojanova: Thank you. First of all, I want to say it's a pleasure to be here. I have been enjoying all the talks yesterday and today.
I am an associate professor of computer science at West Virginia University, and I have been doing research related to software quality assurance, testing, and software reliability for over ten years now. This specific talk is on empirical investigation of software product line quality, and I would like to acknowledge the contributors. Tom Devine is my graduate student at WVU. Robyn Lutz is a faculty member at Iowa State University, together with her Ph.D. student, and Jenny Li is from Avaya Labs. This work is funded by the National Science Foundation.
Before I go into details about what software product lines are and a couple of other things, I felt it would be appropriate to give a little bit more motivation for the talk that I'm giving today. So basically, what this is about is trying to find the patterns in the data that can help us get a better idea of how good the quality of our product is, how good the predictions we can make for the future are, and how we can benefit by making software testing more effective and efficient, as well as making the post release quality of the product better. So it's about data. There is so much data out there from which we can learn different patterns to get these benefits.
The specific data that we have been looking at in this and some other work is defect, or software bug, and change repositories, which everybody who develops software keeps; then the software code repositories, the version control systems that also keep logs of all changes that have been made to the code; code metrics; and many others. So the question is: which are the metrics representative enough to show us the most fault-prone or change-prone parts of our system, and how can we use them to make the testing more efficient and geared towards the most fault-prone parts of the system, as well as to make the (indiscernible) quality of our products better?
Now, the specific topic of this talk and of our project is software product lines. So basically, a software product line is a family of products that tries, in a more systematic way, to use commonalities that are shared among the products and to have a well defined set of variabilities. Cruise control is a good example to illustrate a software product line. You have the basic common functionality of cruise control software no matter what vehicle you are using it on, but then you have specifics for different vehicles that are variabilities. I would think Android is also another good example of a product line, in which you have the core functionality of a mobile operating system, but then you have all these different hardware platforms and many other configuration-related issues that you can define as variabilities. So the whole goal of this more systematic reuse is to reduce the production cost and improve the quality.
And this slide here tries, in a graphical way, to represent what we are looking at. Every software product nowadays goes through multiple releases. Actually, we have been hearing these past couple of days about even more than just well-defined releases every six weeks or every year; it's, in a way, a continuous evolution through different builds and deployments.
Now, software product lines have this other line of evolution, through multiple products, where you introduce a new product into your software product line, and not all products exist from the beginning. Some can come later. There is going to be a next hardware phone by Samsung, for example, and you would need to have an Android that is going to run on that phone.
So the basic motivation is to try to see: does the systematic reuse in software product lines provide some measurable benefits?
Is it really what we expect to see? There are studies that show the benefit of reuse in other contexts, but not in a software product line context, and that's why we're doing this. Basically, our project is evidence-based research, so we try to learn from data to do assessment and predictions. And since it is evidence based, you have to deal with case studies.
One of the case studies is a medium size industrial product line called PolyFlow, and another is a large evolving open source product line, Eclipse. I don't think that Eclipse needs any introduction. Later on, we will see why it fits the concept of a software product line.
Now, the main research questions, and actually I have more research questions along the way, are: first, does software product line development really benefit the quality in the way it's expected to?
And second, do the structural reuse and the high degree of commonality, or shared code, allow us to make better, more accurate predictions of future faults based on previously experienced faults, change metrics, and source code metrics? And I will get into more details about the metrics later on.
Now, just a little introduction about the
first case study. This is work that we presented at last year's ICST, the software testing, verification and validation conference, in Montreal.
PolyFlow is a product line of software testing tools developed by the Avaya corporation. Jenny Li is our collaborator, who was kind enough to provide us the repository of the source code as well as their change and bug-tracking system data. Here, we looked at four products that together consist of 42 components and have around 65,000 lines of code. So it's not really very big.
How is it a product line? This is actually a tool for testing software that they are developing in-house, but it's supposed to work for different operating systems and for different languages. So you will have a parser for Java but a different parser for C++. So there are commonalities and variabilities.
In this specific case study, we only looked at the pre-release faults. By fault I mean a bug that you have in the system; it's either in your code or in your data or in any condition that, when triggered, leads to a failure. In this case, the post release information wasn't available. This is still in a stage of development.
The next slide here gives you a little bit of an idea of these four products. You see the components, and there is quite a lot of shared code there, and the lines of code per product. You also see the timelines of the development of the products. Products one and two started developing in parallel, then product three kicked in after product one was developed, and product four after some more time.
So that's another dimension of evolution: introducing new products that share commonalities with the existing ones, but also have variabilities, code that is specific to them.
Now, the first thing that we actually learned from this study was, in a way, not very surprising to us. Maybe it's not to most of you. Based on the ideas in the books that exist on software product lines, you would say you have a well defined set of commonalities and each product has its own variabilities. It appears that the picture is much more complex than that, and this Venn diagram is probably the best way to represent it.
So the circles are the four products, P1, P2, P3, P4, and in the center you see the components that are shared among all of them. Then in the adjacent areas we see a number of components that are shared between three products, and then we see some that are shared between two products, which we call low-reuse variation components, and others that are used in only one.
So it's much more complex than just commonality and variability. There are different levels of reuse: some are higher reuse, which means used in all or in some but not all, and others are just single use.
Now, these are the metrics for this product line. The metrics are, in a way, predetermined by what has been kept for that specific product.
So what we are seeing here is that we collected two types of metrics: code metrics and change metrics.
I want to make a statement here that a lot of the earlier related work on trying to predict the fault-prone parts of software was based only on collecting code metrics, because they're easy to collect. You just run a tool and you get the metrics. Well, change metrics are somewhat harder, because you have to go through the history and see what has been changed and how it has been changed.
So we see some change metrics here on the right side, such as code churn, which is the number of lines added and the number of lines deleted. File churn is when new files are introduced or files are deleted. As well as improvements and new features: they actually kept good track of which change requests were done for improvements, which were for new features, and which were for fixing faults.
Again, this is only for prerelease faults, the bugs that you find when you test your software. Now, each of these results has two parts.
One is the assessment, knowing what your current state is. And the second is the prediction: can we predict what's going to happen in the future?
So for the assessment part, our first question was: is the number of prerelease faults correlated with any of the gathered metrics?
The point here, and I would like to repeat the comment that Ari made yesterday in his keynote, is that if you can't measure it, you can't control it. If you do automated testing, there are so many different logs that you keep in your systems, and after all, you have your version control, so you can extract what has been changed and when. You have your code, and so on and so forth. But really, what is a good predictor, and what is correlated with how fault prone your software is?
Now, why is this important? Historically, people were trying to come up with recommendations for writing good code, which would say things like: don't write functions that are more than 50 lines of code or with cyclomatic complexity over 100, because they tend to be more fault prone. But in the modeling community, it's just garbage in, garbage out. If your metrics are not good, your predictions may not be good either.
So if we look through more metrics, we can actually see here -- is there any way I can see the slides here on this monitor? Okay, I will try; some are too far for me to read. Actually, yeah, I can use that. The most important part of this slide is the first line, where we see how these different metrics are correlated with the number of prerelease faults, the bugs that were detected while testing this software.
And the very general conclusion is at the bottom, in red. Prerelease faults that you find when you test the software are more highly correlated with change metrics than with source code metrics. So we see that, for example, those components that had a higher new-features metric had more faults. The correlation is 0.76. That's very intuitive, right? The more new things you introduce, the more faults you likely introduce. You see the code churn is also very high: how much the code has been changed, the lines added and the lines deleted. And the lines of code and cyclomatic complexity have a really much lower correlation.
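To make the kind of analysis just described concrete, here is a minimal sketch, not from the study, of correlating per-component metrics with fault counts. Spearman rank correlation is used here as one common choice (the talk does not name the coefficient), and all the numbers are fabricated for illustration:

```python
# Hypothetical sketch: correlate per-component metrics with prerelease
# fault counts. All values are made up; the real study used PolyFlow's
# 42 components.

def ranks(xs):
    # rank positions 1..n (no tie handling, so the sample data avoids ties)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2-1))
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# one value per component (fabricated data)
faults       = [12, 3, 0, 7, 1, 9, 2, 15]   # prerelease faults found in testing
new_features = [10, 2, 0, 6, 1, 8, 3, 14]   # change metric
loc          = [800, 400, 500, 700, 300, 900, 350, 600]  # code metric

print(f"new features vs faults: {spearman(new_features, faults):.2f}")
print(f"lines of code vs faults: {spearman(loc, faults):.2f}")
```

On this toy data the change metric correlates far more strongly with the fault counts than the code metric does, mirroring the pattern the talk reports.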
Now, why is this important? For a long time there was this really very intuitive belief that if your code is more complex -- and cyclomatic complexity is the number of independent paths through your code -- it's likely to be more fault prone, or more buggy. There is a positive correlation, but it's much lower than with the change metrics. And in a way, this actually confirms, in the product line context, some of the results in the related work that started extracting other metrics than only code metrics for the sake of assessment and prediction of fault proneness.
The second result, and I'm sure you have experienced it, is just proving what we all tend to know: that a small set of components contains the majority of prerelease faults. As you can see on this graph, on the X axis we have the percentage of the components, from most fault prone to least fault prone, and you see that in 20% of the components, we find 80% of the faults.
So it's a very skewed distribution. Many things are fine, but you have some 20% that are really very fault prone, and that's some sort of a heavy tail distribution.
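The skew just described is easy to check mechanically. A small sketch, with made-up fault counts rather than the study's data:

```python
# Sketch of the 80/20 check: sort components by fault count and ask what
# share of all faults the most fault-prone 20% of components hold.
# Fault counts are fabricated for illustration.
fault_counts = [40, 25, 1, 0, 2, 0, 5, 0, 1, 3]  # one count per component

total = sum(fault_counts)
k = max(1, len(fault_counts) // 5)               # top 20% of components
top_share = sum(sorted(fault_counts, reverse=True)[:k]) / total
print(f"top 20% of components hold {top_share:.0%} of the faults")
```

On this toy data the two most fault-prone components (the top 20%) hold 84% of all the faults, the kind of heavy-tailed split the talk reports.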
And this is what actually motivates the whole work. If we can predict which these are ahead of time, then we will be much more efficient in allocating our resources. Remember that graph about running all these tests: you quickly get exponential growth in the resources you need.
Now, the next question is: do fault proneness and change proneness change with the level of reuse? Remember, there was another, I would say anecdotal, belief that the commonalities are not going to change much if you design your system as a product line, because that's well designed core functionality that's going to stay stable.
Now, there are a lot of bars on this graph,
but if you look at the rightmost side, you see the code churn per 10,000 lines of code, with the darkest bars for the commonalities, then high reuse, low reuse, and single use. And you actually see that the normalized code churn, or how much things change, appears to be higher for the commonalities than for anything else. Basically, new features are added, and the commonalities also keep (indiscernible) to the rest of the system and to the introduction of the new products.
Now, if you look at the middle graph, at the faults per kilo lines of code, which is the fault density, or how many bugs you see per thousand lines of code, we see for the commonalities part, the middle line, that it's actually fairly low. So even though they keep changing, the fault density remains fairly stable, which in a way indicates that there is a benefit from having this reuse in place.
I will go through this slide much faster.
There are some numbers here that try to show whether we benefit from product lines. And we see, actually, that for the later introduced products, like P3 and P4, a lot of the faults that were previously fixed are in the shared parts, so we actually do benefit from reuse, and we only see very few faults in the newly developed parts of the system.
Now, the prediction part of this study is, I would say, very small, because it's a much smaller study, and we only had a couple of new components to look at. But it was a good motivation for what follows. Basically, we learned our models, using simple linear regression, on the existing products, P1 and P2, and tried to predict the number of faults that were going to be in P3 and P4, and we were successful. The one component in P3 didn't have any faults, and the other one had only one. So this is too small of a sample to really show that this is a good prediction. And I am going to discuss prediction more in the next case study, which is a much larger one.
This is on Eclipse. When you work in academia and you don't own software development data, then open source is almost like a promised land, and Eclipse is open source, with a bug-tracking system and everything available.
So what follows here is actually what we did as a team, and this is a paper that we submitted to a journal; it's currently under review.
Here are some basic facts about Eclipse. It is a very large product line. It's a product line because it has different members; the ones that we looked at are Classic, C/C++, Java, and JavaEE. These four products together have 125,000 files and 20 million lines of code. And we looked at the evolution through seven releases to see what the trends are. Unlike the previous study, our focus here was actually on post release faults, which is what the users found after the products were released. This table here gives you an idea of the timeline.
So in the first couple of releases, there was only one product, Classic. With Europa, which is 3.3, Eclipse starts looking like a product line. Remember that graph that I showed you in the beginning. So we now have Classic, C/C++, Java, and JavaEE, which all keep evolving through releases. And here we look specifically at the yearly big releases of Eclipse.
Now, I'm inclined to point out things that are consistent. Here we also saw that big picture of different degrees of reuse: there is some core functionality shared by all, but there are also, for example, 16 files -- here, these are packages -- that are shared between Classic, Java, and JavaEE, or maybe 75 that are used only in JavaEE.
So this just gives you the picture of the evolution through the releases Europa, Ganymede, Galileo, and Helios for these four products.
The metrics that we collected here: this was not an easy task. I must say that the students put so much effort and so much scripting into it. You have the bug-tracking system, but you have to do your own work to link that to the changes that have been made to the code, in this case from the CVS repository, to get your change metrics.
We collected change metrics six months prior to the release, code metrics on the day of the release, and we predict post release faults: all the bugs that were found by the customers in the six months after the release.
So here are the metrics. There are many more metrics here, because we have more things available. On the left side, you see the code metrics, such as lines of code, number of statements, percentage of branch statements, method calls, and so on.
On the right side you see the change metrics, such as the number of revisions that have been made; this is on a package level. Refactorings that have been made. Bug fixes, the number of times a given package was changed prerelease for fixing bugs. Authors, the number of people that participated in revisions. Number of lines of code added, number of lines of code deleted. Of course, I don't have time to go through all the metrics, but there is also age: how old is it? Was it introduced now, or is it something that has existed for 50 weeks? Its age in weeks, new versus old.
So the whole point here is to go again through the assessment and the prediction parts. There are probably too many slides to cover them all in detail, especially because I went with a little bit longer introduction, but I will try to give you the main ideas here.
Because Eclipse evolves through releases, we are trying to see whether the quality improves as the product line matures from one release to another. The graph on the left side shows the box plots for the post release bugs that were found by customers for each of the releases, and you see that the median decreases, as well as the variability, so there is an indication of improved quality, although new code is added all the time. Especially in Europa there is a huge amount of new code added, because there were three new products introduced.
The graph on the right side shows you that trend, which we actually statistically tested with the Kruskal-Wallis test and then a post hoc test, and which shows that there is a statistically significant decreasing trend in the fault density. So there is new code, but as the product line keeps evolving and maturing, the quality improves, because the customers see fewer faults per thousand lines of code.
Remember that graph showing where the faults are? It actually proves to be true here, too. For releases from 2.0 to Helios, from 66% to 93% of post release faults were, again, in only 20% of the packages, with an average of 81%.
So it's only 20% of your packages that have 80% of all your post release bugs.
Another thing that we explored here is whether the product line really benefits from the reuse, for three of the releases. I'm not showing the code churn graphs, because there's not enough time to go through all the details, but you see here, again, the different levels of reuse, as in the previous study: one means used in only one product, two means used in two products, and four means used in all four. You actually see that although a lot of code is added across the releases, the post release fault density is very stable and very low.
So basically, anything that was preexisting tends to have a stable fault density.
Now, for the newly developed packages that did not exist in the previous release, the situation is really not that clear. You see some bars are much larger than others, so there's a lot of variability. Some, like in the last graph, are used in only one product: in Helios we have 18 different packages and only two faults. But on the other side, those that were reused between Java and JavaEE had a lot of faults.
The general idea here seems to be that things that are shared between two products tend to be more fault prone than those that are used in only one, maybe because of the adjustments needed. But there are not too many new packages here, so the sample is not large enough to really draw conclusions. And again, this is a characteristic of a long tail, heavy tail distribution: you have many good things and then one or two that are really, really very bad.
It's the same as with insurance claims. You have many small insurance claims, but one or two that are really, really very bad. And what we care about here is to be able to predict those that are really bad, because that's how we can allocate our resources efficiently. And more importantly, for this study, what
we are trying to do is use prerelease data. We are building a model on the previous release, a machine learning model -- we specifically used generalized linear regression models -- to predict the number of post release faults in the next release from the prerelease data. So it's pretty much: before you release your software, can you tell which are going to be its most fault prone parts? And if you can, how do you do the testing more efficiently to prevent that?
We built 19 different models, for each product that we had through the releases, and then used them to predict the faults on all the releases.
Now, this is an interesting graph that gives you an idea of what we are doing here. It's called an Alberg diagram. On the X axis you see the percentage of packages in which the faults were found, ordered from most fault prone to least fault prone.
So the ones at the left are the ones where most of the post release bugs are. The full line is the actual bugs, and the dotted line is the prediction we make based on our models trained on the previous release. So we built the model on 2.0 and tried to predict the post release bugs from the prerelease data for 2.1. And the closer these two lines are, the better our predictions are.
The vertical line at 20% is actually that magical 80/20 ratio, because it appears that 20% of your code has 80% of your bugs. Then the question is: can we predict the faults and packages that are going to appear in the 20% of most fault prone packages? That line could be anywhere in this graph, but 20% is the one we chose, based on the 80/20 rule in the data.
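The train-on-one-release, predict-the-next setup just described can be sketched as follows. This is not the authors' code: ordinary least squares stands in for the generalized linear models used in the study, the two metrics are hypothetical, and all the data is synthetic.

```python
# Sketch: fit a regression on the "previous release", predict fault counts
# for the "next release", rank packages, and measure what fraction of the
# actual post-release faults falls in the predicted top 20% of packages.
import numpy as np

rng = np.random.default_rng(0)

def make_release(n_pkgs):
    # two change metrics per package: prerelease bug fixes and code churn
    X = rng.poisson(lam=[4.0, 60.0], size=(n_pkgs, 2)).astype(float)
    # post-release faults loosely driven by the change metrics (synthetic)
    y = 0.5 * X[:, 0] + 0.01 * X[:, 1] + rng.poisson(0.5, n_pkgs)
    return X, y

X_prev, y_prev = make_release(100)   # "previous release": training data
X_next, y_next = make_release(100)   # "next release": what we predict

# fit faults ~ intercept + metrics on the previous release
A_prev = np.column_stack([np.ones(len(X_prev)), X_prev])
coef, *_ = np.linalg.lstsq(A_prev, y_prev, rcond=None)

# predict the next release and rank packages by predicted fault count
A_next = np.column_stack([np.ones(len(X_next)), X_next])
pred = A_next @ coef
k = len(pred) // 5                          # top 20% of packages
top = np.argsort(pred)[::-1][:k]
captured = y_next[top].sum() / y_next.sum()
print(f"predicted top 20% of packages capture {captured:.0%} of actual faults")
```

The captured fraction is what the Alberg diagram reads off at the 20% line; the closer it is to the fraction captured by the true ranking, the better the model.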
This here now shows the predictions. I will probably need a little bit of time to explain what this is about. This is a so-called heat map, in which the darker cells are the better results. At the bottom you see the releases for which we make the predictions: 2.1, 3.0, Europa, and so on. On the left side, C/C++, Classic, Java, and JavaEE are the products for which we make predictions, and at the top you see the products on which we build the models.
Now, for the time being, we don't look for any sort of patterns. What we look at is how good the predictions are, no matter what they are trained on and where we make the predictions. It appeared that these models were able to predict from 76% to 97% of where the faults are in those 20% of most fault-prone packages, which is a fairly good result.
There are two exceptions, and here I would like to note that we wanted to make this as practical as possible, in the sense that we didn't take any outliers out, basically because the outliers that have most of the bugs are actually the most interesting ones that you want to find. Our predictions would have been more accurate without them, but not really very realistic.
So the light areas are where the predictions weren't good, and we went to look in the raw data at why that was the case.
Anything that we trained on C/C++ appeared not to make good predictions, because there was one very highly faulty package in the previous release that, as an outlier, made the generalized regression models not work very well. The other light area is on the rightmost side, for predictions made on Java and JavaEE. That is only because of one package, shared between Java and JavaEE, which was again an outlier with a large number of faults. The models were predicting the numbers fairly accurately, but weren't placing this package among the 20% of topmost ranked packages. So they were missing this package, and that's why the predictions weren't so good.
Now, here I will generalize the question a little bit and ask: can we benefit from the information about additional products when we make predictions?
And as you can see on this graph, if there were no benefit from the additional information for predicting your fault-prone packages, then the values on the diagonal should be the best, and that's almost never the case. If we want to look for some sort of pattern, it appears that whenever we try to make predictions for a smaller product, such as C/C++ and Classic, which mainly consist of common components, those predictions are better.
Now, with respect to where you build and train your models: if you build the models on a larger product, such as Java and JavaEE, then the predictions were also better.
I will just quickly talk about something that's really very interesting. We did feature selection to find out which measures have the best predictive power for where your bugs are.
We used stepwise regression, a machine learning method, for feature selection. And across our models, anywhere between one and 16 features were selected out of the 112 features. Basically, this says that only very few features matter and carry enough information to predict what's going to be fault prone.
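As a rough sketch of how stepwise feature selection works (assumed for illustration; the talk does not detail the exact procedure, and real stepwise regression typically uses F-tests or AIC rather than the simple relative-gain cutoff used here):

```python
# Greedy forward stepwise selection: repeatedly add the feature that most
# reduces the residual sum of squares, stopping when the relative
# improvement drops below a threshold.
import numpy as np

def forward_select(X, y, min_gain=0.05):
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    best_rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while remaining:
        gains = []
        for j in remaining:
            A = np.column_stack([np.ones(n)] + [X[:, c] for c in chosen + [j]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            gains.append((best_rss - float(np.sum((y - A @ coef) ** 2)), j))
        gain, j = max(gains)
        if gain < min_gain * best_rss:
            break
        chosen.append(j)
        remaining.remove(j)
        best_rss -= gain
    return chosen

# fabricated example: metric 0 (think prerelease bug fixes) drives the
# response; metrics 1 and 2 are unrelated noise
rng = np.random.default_rng(1)
X = rng.poisson(4.0, size=(50, 3)).astype(float)
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, 50)
print("selected features:", forward_select(X, y))
```

On this toy data the informative metric is picked first, which is the behavior the talk describes: a handful of features carry almost all of the predictive power.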
This diagram here shows the frequency with which different features were selected by the different models. Look at the features at the top, the ones used in the highest number of models: they are all change metrics. Total bug fixes and maximum bug fixes, total authors, and then total code churn and total revisions.
Out of the first 15 metrics, only four are code metrics.
So this simply says: if you want to predict the most fault-prone parts, you are better off using change metrics than simple code metrics. Code complexity didn't appear to be very good at all; it was chosen by only one model.
We also looked at the correlation between each metric and our response variable, which is post release bugs. But if you want to tell this whole machine learning part as a simple story, it really sounds very intuitive. The best predictors of what's most fault prone, in the number of post release bugs, are packages that have a lot of prerelease bug fixes, that have been handled by many authors, and that have a lot of code churn -- added and deleted lines of code -- as well as many revisions.
Of the static code metrics, only four, such as maximum statements at levels one and four and maximum method call statements, were among the first 15. Lines of code and complexity don't appear to be good predictors of where your bugs are going to be.
And the lessons learned, very fast, so I can allow some time for questions. These are very high-level lessons that are consistent across both studies. There is a wide spectrum of reuse levels: commonalities, but also things that are used in some but not all products, as well as variabilities used in only one product.
Then, we found that both prerelease and post release faults have very skewed distributions, which means a large number of good modules and components, but a very few, 20%, that have most of your bugs, 80%.
Lesson three: although preexisting, old packages, including those shared among products, continually changed, they retained low fault densities, which shows the benefit of reuse.
Both prerelease and post release faults are more highly correlated with change metrics than with static code metrics. So if you want predictions, those are the metrics that you should look at.
Predictions of both prerelease and post release faults can be made accurately from prerelease data. This is the most important question: can I, from the data that I collect during in-house testing, find out how good my product is going to be and predict where the problems will be after the release?
And the predictions benefited from information about additional products. Remember, training on larger products gives better results.
And revisiting those two big questions about systematic reuse in a software product line context: with respect to quality, products benefited from faults that were fixed in reused components and packages, and with respect to fault proneness prediction, predictions also benefited from the additional product line information. So if you have data, you had better use it. Your predictions are going to be better.
I hope I was able to convey the main idea: you do automated testing, but you can also collect a lot of metrics and do automated predictions of where the problems are now, or are going to be after you release the product. I would say that's the one sentence that summarizes the general idea of my talk.
And I will be glad to answer any questions. [ Applause ]
>>Tony Voellm: That was a very fascinating talk. Thank you, Katerina. I learned two very important lessons. Number one, Venn diagrams are beautiful. And garbage in, garbage out.
In all seriousness, one of the ideas I've been exploring for a while is this idea that the behavior around the code is more important than the code itself in terms of predicting failures. And you have scientifically nailed it. So congratulations. That is quite awesome. So thank you.
>>Katerina Goseva-Popstojanova: Thank you.
[ Applause ]
>>Tony Voellm: So with that, I think we're going to go to questions. I think you were up first. So, please, this way. Who you are, where you're from, and your question, please.
>>> Narindra from Audible, slash, Amazon. So did you normalize your data based on the use cases? Certain packages are used more than others; Java is used more than C++, I hope. So did you take that into account?
>>Katerina Goseva-Popstojanova: No. No. But it's a very good point, in the sense that the information about the usage of the products is not available when you deal with open source, and I don't know how hard it would be to get that information. So that is possible. But note that we don't deal with failures here, with how often things fail. We are dealing with faults, which means the underlying reasons for a failure.
Now, you are right in the sense that if you use something more often, you are more likely to hit the place where the bug is. But that's one of the metrics that's almost impossible to collect, especially if you are a third party. For many products -- would you know what the usage of Chrome is across the board? So, no, it would be good to know, but it's a metric that's very hard to get.
>>Tony Voellm: That was a great question. And since you work at Amazon, please put this
on your desk. [ Laughter ]
>>Tony Voellm: With that, we have another question over here. Please.
>>> Erin McGowan from Rackspace. For your case study one, did you include any
of the defects found in developer unit testing before it actually went into traditional test?
>>Katerina Goseva-Popstojanova: For case study one, actually, yes. In that case
study, we had access because our coauthor, Dr. Jenny Li from Avaya, was one of the
developers. We actually even had bugs from the early design stage there. But we only
kept -- and that's explained in the paper in detail -- we only kept code-related
bugs, because that's what we tried to predict here from the code. And, yes, some were from unit
testing, some were from system testing. For Eclipse, we only have bugs that are in the
bug-tracking system -- for the prerelease, those are unlikely to include bugs found in unit tests.
>>Tony Voellm: Thank you. Okay. Sure, next question up, please.
>>> All right. Hi, I'm Dylan Salisbury from Google.
This is a really exciting topic, and it brought to mind the time at a previous company when
I remember really clearly a program that had, like, the one bad module that was the source of
all the bugs, and it had a lot of the characteristics that you described.
And you suggested that -- for another product -- you may be able to use this to decide
where to focus your testing before release. What I remember from my experience is that
this module was very untestable, you know, because of the things you talked about. It
had poor interfaces, it was very core to the product -- there was no way to write small
tests for it because it had been baked throughout the product.
And the way it was finally improved was that a single engineer went in and refactored it,
the whole thing, to make it testable. So I just want to suggest as a takeaway --
something that we kind of intuitively feel but this might provide evidence for --
that even before release, it may be worth targeting one of these modules not just for
more testing, but for refactoring or rewriting to allow more testing.
>>Katerina Goseva-Popstojanova: This is a really very good point, because these metrics
that I discussed, in addition to being good predictors of where the most fault-prone
parts of the system are, give you other ideas as well.
One of the static code metrics was the number of method calls. That was highly correlated
with post-release bugs. So if you see a module that has plenty of those,
then it's a module that may need refactoring. Another thing is to look at things that have
high code churn. Why? You know, CVS keeps very good track of what has
been committed and when. So why something is change prone -- and we have done some
work on this, which there's no time to present here; it's another topic -- why things tend
to change is also important, because we see that change also introduces faults.
So the point that I'm trying to make is, what you are saying is right. Some metrics on their
own can give you an idea of how your code is. And I go back to what Ari said: if you
cannot measure it, you cannot control it. And when you guys automate everything,
keep as much data as you can, which can then be digested with some sort of machine learning
and other methods to really find out more about your code and how to improve it.
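What Katerina describes here -- ranking modules by metrics like method-call counts and code churn, then pointing testing or refactoring effort at the worst offenders -- can be sketched roughly as below. This is a minimal illustration, not the actual models from the studies: the module names, metric values, and fault counts are all invented, and the real work used richer metric sets and proper machine-learning classifiers.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length metric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented per-module data: a static code metric (method calls),
# a change metric (churn), and post-release fault counts.
modules = {
    "parser":  {"method_calls": 210, "churn": 48, "faults": 9},
    "ui":      {"method_calls":  35, "churn":  5, "faults": 1},
    "network": {"method_calls": 150, "churn": 30, "faults": 6},
    "logging": {"method_calls":  20, "churn":  2, "faults": 0},
}

# How strongly does the static metric track post-release faults?
calls = [m["method_calls"] for m in modules.values()]
faults = [m["faults"] for m in modules.values()]
r = pearson(calls, faults)

# Rank modules by a crude combined score; the top of the list is
# where extra testing or refactoring effort would be targeted.
ranked = sorted(modules,
                key=lambda k: modules[k]["method_calls"] + modules[k]["churn"],
                reverse=True)
```

Here `pearson` quantifies how strongly a metric tracks the fault counts, and the combined ranking is a crude stand-in for whatever learned model would actually do the prioritization in practice.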
>>Tony Voellm: Thank you for your question. We have time for one more, and it's about
a 30-second question plus answer. And I think you were up here first. So, please.
>>> Nachum from Google. Did you find any correlation between contributors,
individual contributors, and faults? [ Laughter ]
>>Katerina Goseva-Popstojanova: You want to put the blame for the faults on people?
>>> Absolutely. Always. Yes. >>Katerina Goseva-Popstojanova: We only looked
at the metric that extracted the number of authors that have been involved in refactoring
and all kinds of changes. I know of some other work by companies that
do own the data, and own the record of communication between developers or testers, that try to
figure out other metrics, such as how often people tend to communicate about a specific
bug, module, or component, and how that is reflected. And certainly if you own the data, you can
attribute things to people. And if that's what the goal is, if you want to post somebody
on that blackboard, on a -- >>Tony Voellm: Yeah, attribution is definitely
perilous. I actually did that one time at another company, and I was a popular guy,
but not the kind of popular you want. So thank you, Katerina. That was great.
[ Applause ]