>>Tony Voellm: Welcome back. Hopefully, you are all rested and ready. We are in the final sprint. We have three more talks here. The first talk that's up is going to be done by Katerina Goseva-Popstojanova. She is from West Virginia University, and she will be speaking to us about empirical investigation of software product line quality. And, you know, the best question, I guess, this time, or maybe even the first question at the end of this talk, is going to earn Baby John-Oh up here. Some people have told me their kids have named these Andy and lots of other very cute little names. So with that, Katerina.
>>Katerina Goseva-Popstojanova: Thank you. First of all, I want to say it's a pleasure to be here. I have been enjoying all the talks yesterday and today.
I am an associate professor of computer science at West Virginia University, and I have been doing research related to software quality assurance, testing, and software reliability for over ten years now. This specific talk is on empirical investigation of software product line quality, and I would like to acknowledge the contributors. Tom Devine is my graduate student at WVU. Robyn Lutz is a faculty member at Iowa State University, together with her Ph.D. student, and Jenny Li is from Avaya Labs. This work is funded by the National Science Foundation.
Before I go into details about what software product lines are and a couple of other things, I felt it would be appropriate to give a little bit more motivation for the talk that I'm giving today. So basically, what this is about is trying to find the patterns in the data that can help us get a better idea of how good the quality of our product is, how good the predictions we can make for the future are, and how we can benefit by making software testing more effective and efficient, as well as making the post release quality of the product better. So it's about data. There is so much data out there from which we can learn different patterns to get these benefits.
The specific data that we have been looking at in this and some other work is defect, or software bug, and change repositories, which everybody who develops software keeps; then the software code repositories, the version control systems that also keep logs of all changes that have been made to the code; code metrics; and many others. So the question is: which are the metrics representative enough to show us the most fault-prone or change-prone parts of our system, and how can we use them to make the testing more efficient and geared towards the most fault-prone parts of the system, as well as to make the (indiscernible) quality of our products better?
Now, the specific topic of this talk and of our project is software product lines. So basically, a software product line is a family of products that tries, in a more systematic way, to use commonalities that are shared among the products and to have a well defined set of variabilities. Cruise control is a good example to illustrate a software product line. You have the basic common functionality of cruise control software no matter what vehicle you are using it on, but then you have specifics for different vehicles that are variabilities. I would think Android is also another good example of a product line, in which you have the core functionality of a mobile operating system, but then you have all these different hardware platforms and many other configuration-related issues that you can define as variabilities. So the whole goal of this more systematic reuse is to reduce the production cost and improve the quality.
And this slide here tries, in a graphical way, to represent what we are looking at. Every software product nowadays goes through multiple releases. Actually, we have been hearing these past couple of days about even more than just well-defined releases every six weeks or every year; it's, in a way, a continuous evolution through different builds and deployments.
Now, software product lines have this other line of evolution, through multiple products, where you introduce a new product into your software product line, and not all products exist from the beginning. Some can come later. There is going to be a next hardware phone by Samsung, for example, and you would need to have an Android that is going to run on that phone.
So the basic motivation is to try to see: does the systematic reuse in software product lines provide some measurable benefits?
Is it really what we expect to see? There are studies that show the benefit of reuse in other contexts, but not in a software product line context, and that's why we're doing this. Basically, our project is evidence-based research, so we try to learn from data to do assessment and predictions. And since it is evidence based, you have to deal with case studies.
One of the case studies is a medium size industrial product line called PolyFlow, and another is a large evolving open source product line, Eclipse. I don't think that Eclipse needs any introduction. Later on, we will see why it fits the concept of a software product line.
Now, the main research questions, and actually I have more research questions along the way, are: first, does software product line development really benefit the quality in the way it's expected to?
And second, do the structural reuse and the high degree of commonality, or shared code, allow us to make better, more accurate predictions of future faults based on previously experienced faults, change metrics, and source code metrics? And I will get into more details about the metrics later on.
Now, just a little introduction about the
first case study. This is work that we presented at last year's ICST, the software testing, verification and validation conference, in Montreal.
PolyFlow is a product line of software testing tools developed by the Avaya corporation. Jenny Li is our collaborator, who was kind enough to provide us the repository of the source code as well as their change and bug-tracking system data. Here, we looked at four products that together consist of 42 components and have around 65,000 lines of code. So it's not really very big.
How is it a product line? This is actually a tool for testing software that they are developing in-house, but it's supposed to work for different operating systems and for different languages. So you will have a parser for Java but a different parser for C++. So there are commonalities and variabilities.
In this specific case study, we only looked at the pre-release faults. By fault I mean a bug that you have in the system; it's either in your code or in your data or in any condition that, when triggered, leads to a failure. In this case, the post release information wasn't available. This is still in a stage of development.
The next slide here gives you a little bit of an idea of these four products. You see the components, and there is quite a lot of shared code there, and the lines of code per product. You also see the timelines of the development of the products. Products one and two started developing in parallel, then product three kicked in after product one was developed, and product four after some more time.
So that's another dimension of evolution: introducing new products that share commonalities with the existing ones, but also have variabilities, code that is specific to them.
Now, the first thing that we actually learned from this study was, in a way, not very surprising to us. Maybe it's not to most of you. Based on the ideas in the books that exist on software product lines, you would say you have a well defined set of commonalities and each product has its own variabilities. It appears that the picture is much more complex than that, and this Venn diagram is probably the best way to represent it.
So the circles are the four products, P1, P2, P3, P4, and in the center you see the components that are shared among all of them. Then in the adjacent areas we see a number of components that are shared between three products, and then we see some that are shared between two products, which we call low-reuse variation components, and others that are used in only one.
So it's much more complex than just commonality and variability. There are different levels of reuse: some are higher reuse, which means used in all or in some but not all, and others are just single use.
Now, these are the metrics for this product line. The metrics are, in a way, predetermined by what has been kept for that specific product.
So what we are seeing here is that we collected two types of metrics: code metrics and change metrics.
I want to make a statement here that a lot of the earlier related work on trying to predict the fault-prone parts of software was based only on collecting code metrics, because they're easy to collect. You just run a tool and you get the metrics. Well, change metrics are somewhat harder, because you have to go through the history and see what has been changed and how it has been changed.
So we see some change metrics here on the right side, such as code churn, which is the number of lines added and the number of lines deleted. File churn is when new files are introduced or files are deleted. As well as improvements and new features: they actually kept good track of which change requests were done for improvements, which were for new features, and which were for fixing faults.
Again, this is only for prerelease faults, the bugs that you find when you test your software. Now, each of these results has two parts.
One is the assessment, knowing what your current state is. And the second is the prediction: can we predict what's going to happen in the future?
So for the assessment part, our first question was: is the number of prerelease faults correlated with any of the gathered metrics?
The point here, and I would like to repeat the comment that Ari made yesterday in his keynote, is that if you can't measure it, you can't control it. If you do automated testing, there are so many different logs that you keep in your systems, and after all, you have your version control, so you can extract what has been changed and when. You have your code, and so on and so forth. But really, what is a good predictor, and what is correlated with how fault prone your software is?
Now, why is this important? Historically, people were trying to come up with recommendations for writing good code, which would say things like: don't write functions that are more than 50 lines of code or with cyclomatic complexity over 100, because they tend to be more fault prone. But in the modeling community, it's just garbage in, garbage out. If your metrics are not good, your predictions may not be good either.
So if we look through more metrics, we can actually see here -- is there any way I can see the slides here on this monitor? Okay, I will try; some are too far for me to read. Actually, yeah, I can use that. The most important part of this slide is the first line, where we see how these different metrics are correlated with the number of prerelease faults, the bugs that were detected while testing this software.
And the very general conclusion is at the bottom, in red. Prerelease faults that you find when you test the software are more highly correlated with change metrics than with source code metrics. So we see that, for example, those components that had a higher new-features metric had more faults. The correlation is 0.76. That's very intuitive, right? The more new things you introduce, the more faults you likely introduce. You see the code churn is also very high: how much the code has been changed, the lines added and the lines deleted. And the lines of code and cyclomatic complexity have a really much lower correlation.
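To make the kind of analysis just described concrete, here is a minimal sketch, not from the study, of correlating per-component metrics with fault counts. Spearman rank correlation is used here as one common choice (the talk does not name the coefficient), and all the numbers are fabricated for illustration:

```python
# Hypothetical sketch: correlate per-component metrics with prerelease
# fault counts. All values are made up; the real study used PolyFlow's
# 42 components.

def ranks(xs):
    # rank positions 1..n (no tie handling, so the sample data avoids ties)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2-1))
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# one value per component (fabricated data)
faults       = [12, 3, 0, 7, 1, 9, 2, 15]   # prerelease faults found in testing
new_features = [10, 2, 0, 6, 1, 8, 3, 14]   # change metric
loc          = [800, 400, 500, 700, 300, 900, 350, 600]  # code metric

print(f"new features vs faults: {spearman(new_features, faults):.2f}")
print(f"lines of code vs faults: {spearman(loc, faults):.2f}")
```

On this toy data the change metric correlates far more strongly with the fault counts than the code metric does, mirroring the pattern the talk reports.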
Now, why is this important? For a long time there was this really very intuitive belief that if your code is more complex -- and cyclomatic complexity is the number of independent paths through your code -- it's likely to be more fault prone, or more buggy. There is a positive correlation, but it's much lower than with the change metrics. And in a way, this actually confirms, in the product line context, some of the results in the related work that started extracting other metrics than only code metrics for the sake of assessment and prediction of fault proneness.
The second result, and I'm sure you have experienced it, is just proving what we all tend to know: that a small set of components contains the majority of prerelease faults. As you can see on this graph, on the X axis we have the percentage of the components, from most fault prone to least fault prone, and you see that in 20% of the components, we find 80% of the faults.
So it's a very skewed distribution. Many things are fine, but you have some 20% that are really very fault prone, and that's some sort of a heavy tail distribution.
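The skew just described is easy to check mechanically. A small sketch, with made-up fault counts rather than the study's data:

```python
# Sketch of the 80/20 check: sort components by fault count and ask what
# share of all faults the most fault-prone 20% of components hold.
# Fault counts are fabricated for illustration.
fault_counts = [40, 25, 1, 0, 2, 0, 5, 0, 1, 3]  # one count per component

total = sum(fault_counts)
k = max(1, len(fault_counts) // 5)               # top 20% of components
top_share = sum(sorted(fault_counts, reverse=True)[:k]) / total
print(f"top 20% of components hold {top_share:.0%} of the faults")
```

On this toy data the two most fault-prone components (the top 20%) hold 84% of all the faults, the kind of heavy-tailed split the talk reports.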
And this is what actually motivates the whole work. If we can predict which these are ahead of time, then we will be much more efficient in allocating our resources. Remember that graph about running all these tests: you quickly get exponential growth in the resources you need.
Now, the next question is: do fault proneness and change proneness change with the level of reuse? Remember, there was another, I would say anecdotal, belief that the commonalities are not going to change much if you design your system as a product line, because that's well designed core functionality that's going to stay stable.
Now, there are a lot of bars on this graph,
but if you look at the rightmost side, you see the code churn per 10,000 lines of code, with the darkest bars for the commonalities, then high reuse, low reuse, and single use. And you actually see that the normalized code churn, or how much things change, appears to be higher for the commonalities than for anything else. Basically, new features are added, and the commonalities also keep (indiscernible) to the rest of the system and to the introduction of the new products.
Now, if you look at the middle graph, at the faults per kilo lines of code, which is the fault density, or how many bugs you see per thousand lines of code, we see for the commonalities part, the middle line, that it's actually fairly low. So even though they keep changing, the fault density remains fairly stable, which in a way indicates that there is a benefit from having this reuse in place.
I will go through this slide much faster.
There are some numbers here that try to show whether we benefit from product lines. And we see, actually, that for the later introduced products, like P3 and P4, a lot of the faults that were previously fixed are in the shared parts, so we actually do benefit from reuse, and we only see very few faults in the newly developed parts of the system.
Now, the prediction part of this study is, I would say, very small, because it's a much smaller study, and we only had a couple of new components to look at. But it was a good motivation for what follows. Basically, we learned our models, using simple linear regression, on the existing products, P1 and P2, and tried to predict the number of faults that were going to be in P3 and P4, and we were successful. The one component in P3 didn't have any faults, and the other one had only one. So this is too small of a sample to really show that this is a good prediction. And I am going to discuss prediction more in the next case study, which is a much larger one.
This is on Eclipse. When you work in academia and you don't own software development data, then open source is almost like a promised land, and Eclipse is open source, with a bug-tracking system and everything available.
So what follows here is actually what we did as a team, and this is a paper that we submitted to a journal; it's currently under review.
Here are some basic facts about Eclipse. It is a very large product line. It's a product line because it has different members; the ones that we looked at are Classic, C/C++, Java, and JavaEE. These four products together have 125,000 files and 20 million lines of code. And we looked at the evolution through seven releases to see what the trends are. Unlike the previous study, our focus here was actually on post release faults, which is what the users found after the products were released. This table here gives you an idea of the timeline.
So in the first couple of releases, there was only one product, Classic. With Europa, which is 3.3, Eclipse starts looking like a product line. Remember that graph that I showed you in the beginning. So we now have Classic, C/C++, Java, and JavaEE, which all keep evolving through releases. And here we look specifically at the yearly big releases of Eclipse.
Now, I'm inclined to point out things that are consistent. Here we also saw that big picture of different degrees of reuse: there is some core functionality shared by all, but there are also, for example, 16 files -- here, these are packages -- that are shared between Classic, Java, and JavaEE, or maybe 75 that are used only in JavaEE.
So this just gives you the picture of the evolution through the releases Europa, Ganymede, Galileo, and Helios for these four products.
The metrics that we collected here: this was not an easy task. I must say that the students put so much effort and so much scripting into it. You have the bug-tracking system, but you have to do your own work to link that to the changes that have been made to the code, in this case from the CVS repository, to get your change metrics.
We collected change metrics six months prior to the release, code metrics on the day of the release, and we predict post release faults: all the bugs that were found by the customers in the six months after the release.
So here are the metrics. There are many more metrics here, because we have more things available. On the left side, you see the code metrics, such as lines of code, number of statements, percentage of branch statements, method calls, and so on.
On the right side you see the change metrics, such as the number of revisions that have been made; this is on a package level. Refactorings that have been made. Bug fixes, the number of times a given package was changed prerelease for fixing bugs. Authors, the number of people that participated in revisions. Number of lines of code added, number of lines of code deleted. Of course, I don't have time to go through all the metrics, but there is also age: how old is it? Was it introduced now, or is it something that has existed for 50 weeks? Its age in weeks, new versus old.
So the whole point here is to go again through the assessment and the prediction parts. There are probably too many slides to cover them all in detail, especially because I went with a little bit longer introduction, but I will try to give you the main ideas here.
Because Eclipse evolves through releases, we are trying to see whether the quality improves as the product line matures from one release to another. The graph on the left side shows the box plots for the post release bugs that were found by customers for each of the releases, and you see that the median decreases, as well as the variability, so there is an indication of improved quality, although new code is added all the time. Especially in Europa there is a huge amount of new code added, because there were three new products introduced.
The graph on the right side shows you that trend, which we actually statistically tested with the Kruskal-Wallis test and then a post hoc test, and which shows that there is a statistically significant decreasing trend in the fault density. So there is new code, but as the product line keeps evolving and maturing, the quality improves, because the customers see fewer faults per thousand lines of code.
Remember that graph showing where the faults are? It actually proves to be true here, too. For releases from 2.0 to Helios, from 66% to 93% of post release faults were, again, in only 20% of the packages, with an average of 81%.
So it's only 20% of your packages that have 80% of all your post release bugs.
Another thing that we explored here is whether the product line really benefits from the reuse, for three of the releases. I'm not showing the code churn graphs, because there's not enough time to go through all the details, but you see here, again, the different levels of reuse, as in the previous study: one means used in only one product, two means used in two products, and four means used in all four. You actually see that although a lot of code is added across the releases, the post release fault density is very stable and very low.
So basically, anything that was preexisting tends to have a stable fault density.
Now, for the newly developed packages that did not exist in the previous release, the situation is really not that clear. You see some bars are much larger than others, so there's a lot of variability. Some, like in the last graph, are used in only one product: in Helios we have 18 different packages and only two faults. But on the other side, those that were reused between Java and JavaEE had a lot of faults.
The general idea here seems to be that things that are shared between two products tend to be more fault prone than those that are used in only one, maybe because of the adjustments needed. But there are not too many new packages here, so the sample is not large enough to really draw conclusions. And again, this is a characteristic of a long tail, heavy tail distribution: you have many good things and then one or two that are really, really very bad.
It's the same as with insurance claims. You have many small insurance claims, but one or two that are really, really very bad. And what we care about here is to be able to predict those that are really bad, because that's how we can allocate our resources efficiently. And more importantly, for this study, what
we are trying to do is use prerelease data. We are building a model on the previous release, a machine learning model -- we specifically used generalized linear regression models -- to predict the number of post release faults in the next release from the prerelease data. So it's pretty much: before you release your software, can you tell which are going to be its most fault prone parts? And if you can, how do you do the testing more efficiently to prevent that?
We built 19 different models, for each product that we had through the releases, and then used them to predict the faults on all the releases.
Now, this is an interesting graph that gives you an idea of what we are doing here. It's called an Alberg diagram. On the X axis you see the percentage of packages in which the faults were found, ordered from most fault prone to least fault prone.
So the ones at the left are the ones where most of the post release bugs are. The full line is the actual bugs, and the dotted line is the prediction we make based on our models trained on the previous release. So we built the model on 2.0 and tried to predict the post release bugs from the prerelease data for 2.1. And the closer these two lines are, the better our predictions are.
The vertical line at 20% is actually that magical 80/20 ratio, because it appears that 20% of your code has 80% of your bugs. Then the question is: can we predict the faults and packages that are going to appear in the 20% of most fault prone packages? That line could be anywhere in this graph, but 20% is the one we chose, based on the 80/20 rule in the data.
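The train-on-one-release, predict-the-next setup just described can be sketched as follows. This is not the authors' code: ordinary least squares stands in for the generalized linear models used in the study, the two metrics are hypothetical, and all the data is synthetic.

```python
# Sketch: fit a regression on the "previous release", predict fault counts
# for the "next release", rank packages, and measure what fraction of the
# actual post-release faults falls in the predicted top 20% of packages.
import numpy as np

rng = np.random.default_rng(0)

def make_release(n_pkgs):
    # two change metrics per package: prerelease bug fixes and code churn
    X = rng.poisson(lam=[4.0, 60.0], size=(n_pkgs, 2)).astype(float)
    # post-release faults loosely driven by the change metrics (synthetic)
    y = 0.5 * X[:, 0] + 0.01 * X[:, 1] + rng.poisson(0.5, n_pkgs)
    return X, y

X_prev, y_prev = make_release(100)   # "previous release": training data
X_next, y_next = make_release(100)   # "next release": what we predict

# fit faults ~ intercept + metrics on the previous release
A_prev = np.column_stack([np.ones(len(X_prev)), X_prev])
coef, *_ = np.linalg.lstsq(A_prev, y_prev, rcond=None)

# predict the next release and rank packages by predicted fault count
A_next = np.column_stack([np.ones(len(X_next)), X_next])
pred = A_next @ coef
k = len(pred) // 5                          # top 20% of packages
top = np.argsort(pred)[::-1][:k]
captured = y_next[top].sum() / y_next.sum()
print(f"predicted top 20% of packages capture {captured:.0%} of actual faults")
```

The captured fraction is what the Alberg diagram reads off at the 20% line; the closer it is to the fraction captured by the true ranking, the better the model.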
This here now shows the predictions. I will probably need a little bit of time to explain what this is about. This is a so-called heat map, in which the darker cells are the better results. At the bottom you see the releases for which we make the predictions: 2.1, 3.0, Europa, and so on. On the left side, C/C++, Classic, Java, and JavaEE are the products for which we make predictions, and at the top you see the products on which we build the models.
Now, for the time being, we don't look for any sort of patterns. What we look at is how good the predictions are, no matter what they are trained on and where we make the predictions. It appeared that these models were able to predict from 76% to 97% of where the faults are in those 20% of most fault-prone packages, which is a fairly good result.
There are two exceptions, and here I would like to note that we wanted to make this as practical as possible, in the sense that we didn't take any outliers out, basically because the outliers that have most of the bugs are actually the most interesting ones that you want to find. Our predictions would have been more accurate without them, but not really very realistic.
So the light areas are where the predictions weren't good, and we went to look in the raw data at why that was the case.
Anything that we trained on C/C++ appeared not to make good predictions, because there was one very highly faulty package in the previous release that, as an outlier, made the generalized regression models not work very well. The other light area is on the rightmost side, for predictions made on Java and JavaEE. That is only because of one package, shared between Java and JavaEE, which was again an outlier with a large number of faults. The models were predicting the numbers fairly accurately, but weren't placing this package among the 20% of topmost ranked packages. So they were missing this package, and that's why the predictions weren't so good.
Now, here I will generalize the question a little bit and ask: can we benefit from the information about additional products when we make predictions?
And as you can see on this graph, if there were no benefit from the additional information for predicting your fault-prone packages, then the values on the diagonal should be the best, and that's almost never the case. If we want to look for some sort of pattern, it appears that whenever we try to make predictions for a smaller product, such as C/C++ and Classic, which mainly consist of common components, those predictions are better.
Now, with respect to where you build and train your models: if you build the models on a larger product, such as Java and JavaEE, then the predictions were also better.
I will just quickly talk about something that's really very interesting. We did feature selection to find out which measures have the best predictive power for where your bugs are.
We used stepwise regression, a machine learning method, for feature selection. And across our models, anywhere between one and 16 features were selected out of the 112 features. Basically, this says that only very few features matter and carry enough information to predict what's going to be fault prone.
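As a rough sketch of how stepwise feature selection works (assumed for illustration; the talk does not detail the exact procedure, and real stepwise regression typically uses F-tests or AIC rather than the simple relative-gain cutoff used here):

```python
# Greedy forward stepwise selection: repeatedly add the feature that most
# reduces the residual sum of squares, stopping when the relative
# improvement drops below a threshold.
import numpy as np

def forward_select(X, y, min_gain=0.05):
    n, p = X.shape
    chosen, remaining = [], list(range(p))
    best_rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only model
    while remaining:
        gains = []
        for j in remaining:
            A = np.column_stack([np.ones(n)] + [X[:, c] for c in chosen + [j]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            gains.append((best_rss - float(np.sum((y - A @ coef) ** 2)), j))
        gain, j = max(gains)
        if gain < min_gain * best_rss:
            break
        chosen.append(j)
        remaining.remove(j)
        best_rss -= gain
    return chosen

# fabricated example: metric 0 (think prerelease bug fixes) drives the
# response; metrics 1 and 2 are unrelated noise
rng = np.random.default_rng(1)
X = rng.poisson(4.0, size=(50, 3)).astype(float)
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, 50)
print("selected features:", forward_select(X, y))
```

On this toy data the informative metric is picked first, which is the behavior the talk describes: a handful of features carry almost all of the predictive power.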
This diagram here shows the frequency with which different features were selected by the different models. Look at the features at the top, the ones used in the highest number of models: they are all change metrics. Total bug fixes and maximum bug fixes, total authors, and then total code churn and total revisions.
Out of the first 15 metrics, only four are code metrics.
So this simply says: if you want to predict the most fault-prone parts, you are better off using change metrics than simple code metrics. Code complexity didn't appear to be very good at all; it was chosen by only one model.
We also looked at the correlation between each metric and our response variable, which is post release bugs. But if you want to tell this whole machine learning part as a simple story, it really sounds very intuitive. The best predictors of what's most fault prone, in the number of post release bugs, are packages that have a lot of prerelease bug fixes, that have been handled by many authors, and that have a lot of code churn -- added and deleted lines of code -- as well as many revisions.
Of the static code metrics, only four, such as maximum statements at levels one and four and maximum method call statements, were among the first 15. Lines of code and complexity don't appear to be good predictors of where your bugs are going to be.
And the lessons learned, very fast, so I can allow some time for questions. These are very high-level lessons that are consistent across both studies. There is a wide spectrum of reuse levels: commonalities, but also things that are used in some but not all products, as well as variabilities used in only one product.
Then, we found that both prerelease and post release faults have very skewed distributions, which means a large number of good modules and components, but a very few, 20%, that have most of your bugs, 80%.
Lesson three: although preexisting, old packages, including those shared among products, continually changed, they retained low fault densities, which shows the benefit of reuse.
Both prerelease and post release faults are more highly correlated with change metrics than with static code metrics. So if you want predictions, those are the metrics that you should look at.
Predictions of both prerelease and post release faults can be made accurately from prerelease data. This is the most important question: can I, from the data that I collect during in-house testing, find out how good my product is going to be and predict where the problems will be after the release?
And the predictions benefited from information about additional products. Remember, training on larger products gives better results.
And revisiting those two big questions about systematic reuse in a software product line context: with respect to quality, products benefited from faults that were fixed in reused components and packages, and with respect to fault proneness prediction, predictions also benefited from the additional product line information. So if you have data, you had better use it. Your predictions are going to be better.
I hope I was able to convey the main idea: you do automated testing, but you can also collect a lot of metrics and do automated predictions of where the problems are now, or are going to be after you release the product. I would say that's the one sentence that summarizes the general idea of my talk.
And I will be glad to answer any questions. [ Applause ]
>>Tony Voellm: That was a very fascinating talk. Thank you, Katerina. I learned two very important lessons. Number one, Venn diagrams are beautiful. And garbage in, garbage out.
In all seriousness, one of the ideas I've been exploring for a while is this idea that the behavior around the code is more important than the code itself in terms of predicting failures. And you have scientifically nailed it. So congratulations. That is quite awesome. So thank you.
>>Katerina Goseva-Popstojanova: Thank you.
[ Applause ]
>>Tony Voellm: So with that, I think we're going to go to questions. I think you were up first. So, please, this way. Who you are, where you're from, and your question, please.
>>> Narindra from Audible, slash, Amazon. So did you normalize your data based on the use cases? Certain packages are used more than others; Java is used more than C++, I hope. So did you take that into account?
>>Katerina Goseva-Popstojanova: No. No. But it's a very good point, in the sense that the information about the usage of the products is not available when you deal with open source, and I don't know how hard it would be to get that information. So that is possible. But note that we don't deal with failures here, with how often things fail. We are dealing with faults, which means the underlying reasons for a failure.
Now, you are right in the sense that if you use something more often, you are more likely to hit the place where the bug is. But that's one of the metrics that's almost impossible to collect, especially if you are a third party. For many products -- would you know what the usage of Chrome is across the board? So, no, it would be good to know, but it's a metric that's very hard to get.
>>Tony Voellm: That was a great question. And since you work at Amazon, please put this
on your desk. [ Laughter ]
>>Tony Voellm: With that, we have another question over here. Please.
>>> Erin McGowan from Rackspace. For your case study one, did you include any
of the defects found in developer unit testing before it actually went into traditional test?
>>Katerina Goseva-Popstojanova: For case study one, actually, yes. In that case
study, we had access because our coauthor, Dr. Jenny Li from Avaya, was one of the
developers. We actually even had bugs from the early design stage there. But we only
kept -- and that's explained in the paper in detail -- we only kept code-related
bugs, because that's what we tried to predict here from the code. And, yes, some were from unit
testing, some were from system testing. For Eclipse, we only have bugs that are in the
bug-tracking system -- for the prerelease, those are unlikely to include bugs found in unit tests.
>>Tony Voellm: Thank you. Okay. Sure, next question up, please.
>>> All right. Hi, I'm Dylan Salisbury from Google.
This is a really exciting topic, and it brought to mind the time at a previous company when
I remember really clearly a program that had, like, the one bad module that was the source of
all the bugs, and it had a lot of the characteristics that you described.
And you suggested that -- for another product -- you may be able to use this to decide
where to focus your testing before release. What I remember from my experience is that
this module was very untestable, you know, because of the things you talked about. It
had poor interfaces, it was very core to the product -- there was no way to write small
tests for it because it had been baked throughout the product.
And the way it was finally improved was that a single engineer went in and refactored it,
the whole thing, to make it testable. So I just want to suggest as a takeaway --
something that we kind of intuitively feel but this might provide evidence for --
that even before release, it may be worth targeting one of these modules not just for
more testing, but for refactoring or rewriting to allow more testing.
>>Katerina Goseva-Popstojanova: This is a really very good point, because these metrics
that I discussed, in addition to being good predictors of where the most fault-prone
parts of the system are, give you other ideas as well.
One of the static code metrics was the number of method calls. That was highly correlated
with post-release bugs. So if you see a module that has plenty of those,
then it's a module that may need refactoring. Another thing is to look at things that have
high code churn. Why? You know, CVS keeps very good track of what has
been committed and when. So why something is change prone -- and we have done some
work on this, which there's no time to present here; it's another topic -- why things tend
to change is also important, because we see that change also introduces faults.
So the point that I'm trying to make is, what you are saying is right. Some metrics on their
own can give you an idea of how your code is. And I go back to what Ari said: if you
cannot measure it, you cannot control it. And when you guys automate everything,
keep as much data as you can, which can then be digested with some sort of machine learning
and other methods to really find out more about your code and how to improve it.
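What Katerina describes here -- ranking modules by metrics like method-call counts and code churn, then pointing testing or refactoring effort at the worst offenders -- can be sketched roughly as below. This is a minimal illustration, not the actual models from the studies: the module names, metric values, and fault counts are all invented, and the real work used richer metric sets and proper machine-learning classifiers.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length metric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented per-module data: a static code metric (method calls),
# a change metric (churn), and post-release fault counts.
modules = {
    "parser":  {"method_calls": 210, "churn": 48, "faults": 9},
    "ui":      {"method_calls":  35, "churn":  5, "faults": 1},
    "network": {"method_calls": 150, "churn": 30, "faults": 6},
    "logging": {"method_calls":  20, "churn":  2, "faults": 0},
}

# How strongly does the static metric track post-release faults?
calls = [m["method_calls"] for m in modules.values()]
faults = [m["faults"] for m in modules.values()]
r = pearson(calls, faults)

# Rank modules by a crude combined score; the top of the list is
# where extra testing or refactoring effort would be targeted.
ranked = sorted(modules,
                key=lambda k: modules[k]["method_calls"] + modules[k]["churn"],
                reverse=True)
```

Here `pearson` quantifies how strongly a metric tracks the fault counts, and the combined ranking is a crude stand-in for whatever learned model would actually do the prioritization in practice.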
>>Tony Voellm: Thank you for your question. We have time for one more, and it's about
a 30-second question plus answer. And I think you were up here first. So, please.
>>> Nachum from Google. Did you find any correlation between contributors,
individual contributors, and faults? [ Laughter ]
>>Katerina Goseva-Popstojanova: You want to put the blame for the faults on people?
>>> Absolutely. Always. Yes. >>Katerina Goseva-Popstojanova: We only looked
at the metric that extracted the number of authors that have been involved in refactoring
and all kinds of changes. I know of some other work by companies that
do own the data, and own the record of communication between developers or testers, that try to
figure out other metrics, such as how often people tend to communicate about a specific
bug, module, or component, and how that is reflected. And certainly if you own the data, you can
attribute things to people. And if that's what the goal is, if you want to post somebody
on that blackboard, on a -- >>Tony Voellm: Yeah, attribution is definitely
perilous. I actually did that one time at another company, and I was a popular guy,
but not the kind of popular you want. So thank you, Katerina. That was great.
[ Applause ]