C# was invented by Microsoft, specifically by Anders Hejlsberg, who has been the chief architect of the language the entire time, now more than a decade. The first version was released in about 2002, and it had been under development for quite some time before that. The first line of the specification says something like: C# is a simple, object-oriented language that will be familiar to users of C, C++, and Java. That line has always struck me as interesting, particularly the word "simple," because though C# started as a simple language ten years ago, the specification is now over 800 pages long. I think it is a stretch to call it simple, but that was the aim from the very beginning. The idea was to make a language that was like C and
C++ but was much safer to use.

The C# developer can definitely get lulled into a false sense of security by the protections that are afforded by the C# language. Let me give you an example. In C and C++, the consequences of accessing an array outside its bounds are dire: literally anything can happen. Doing that can cause data loss, it can cause arbitrary code to run, and there have been a number of attacks where hackers used an out-of-bounds array access to make code do their bidding, essentially. None of this is the case with C#. In C#, an out-of-bounds array access does exactly one thing: it throws an exception that says you accessed this array out of bounds. In one sense that is much safer, because you know that if you access an array outside its bounds, the program is going to stop; it is not going to be used as an attack vector. On the other hand, your program just crashed, and that is still not good. You can't use the fact that the system is much more reliable, in the sense that there is less undefined behavior, as an excuse for not writing the code correctly in the first place.
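As a minimal sketch of that behavior (the array and values here are just illustrative), an out-of-bounds access in C# deterministically throws IndexOutOfRangeException instead of corrupting memory:

```csharp
using System;

class BoundsDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3 };
        try
        {
            // Index 3 is past the end of a three-element array.
            // In C and C++ this would be undefined behavior; in C#
            // it reliably throws IndexOutOfRangeException.
            Console.WriteLine(numbers[3]);
        }
        catch (IndexOutOfRangeException e)
        {
            Console.WriteLine($"Caught: {e.Message}");
        }
    }
}
```

Catching the exception here is only to demonstrate the deterministic behavior; in real code an IndexOutOfRangeException almost always indicates a bug to fix, not a condition to handle.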
The language has definitely become more complex, and this is a double-edged sword. On the one hand, these sorts of very high-level languages allow the developer to express an idea that is much closer to the business domain than to the mechanism domain of the program, and that lowers the number of defects you would find in the program, because it makes your program more obviously correct in that it implements the business-domain semantics that you want. On the other hand, the language, and the interactions between all of its features and object models, are very complex, have been growing in complexity, and are more opportunity for things to go terribly wrong. That is the idea behind a good static analyzer, one in addition to the static analyzer that is already in the compiler, which is pretty good: we want to be able to understand the meaning of a program, the intention of the program, and then try to figure out where it has possibly gone wrong. There are definitely challenges there, both for developers and for people like me who are designing the tools: to take in code that is possibly subtly broken, understand how it is broken, and then, and this is the really hard part, explain that brokenness to the customer.
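To make that concrete, here is a small, hypothetical example of the kind of subtly broken code a static analyzer looks for: a variable is checked for null in one place, which tells us it might be null, and then dereferenced without a check in another:

```csharp
using System;

class NullDemo
{
    // Hypothetical sketch of a defect pattern: 'name' is checked for
    // null on one path, but dereferenced unconditionally afterward.
    static int DescribeLength(string name)
    {
        if (name == null)
        {
            Console.WriteLine("No name supplied.");
        }
        // Possible NullReferenceException: 'name' may still be null here.
        return name.Length;
    }

    static void Main()
    {
        Console.WriteLine(DescribeLength("sample"));
        try
        {
            DescribeLength(null);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("Crashed, as an analyzer would warn.");
        }
    }
}
```

The check on one path is evidence the author believed the variable could be null, which is what makes the unguarded dereference on the other path suspicious.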
As for the large classifications, I can break it down into three things. There are issues with null references, where a potentially null variable is checked for null in one place and then dereferenced in another place without a check. There are resource issues, where a resource, say a file handle, is allocated, a file is opened, and it is never closed; the garbage collector and the finalizer will eventually take care of that, but that is kind of rude. You want your programs to clean up their resources as early as possible so they can be used by other components in the system. Finally, there are threading issues. We have a number of checkers that look for multithreaded issues, like accessing a field under a lock: if a field is accessed under a lock nine times out of ten, odds are good that the tenth access, the one outside the lock, was a defect.

There are two main workflows I use a lot when
I use the product. The first is a batch build-capture workflow, where you kick off a build on the command line using make or MSBuild or whatever build technology you use. You start the build using Coverity's build-capture tool, and while the build is running, it records information about what that build is doing: when it calls the C# compiler, what arguments it passes, and what source code it compiles. It records all of that information, and then it has enough information to do an analysis of the source code that was actually built. There is also a workflow that is more familiar to what C# developers do in their day-to-day lives: a plugin in Visual Studio that surfaces information to the developer about what defects have been found in the code in the past, where they are, and what the analysis showed the defect was like. These have been designed to be complementary and to fit well within the two major paradigms for how code gets developed: the traditional batch process and the integrated development environment process.

If you are interested in C# design issues,
I write a blog about that at EricLippert.com; it's called Fabulous Adventures in Coding, and I have been writing it for over ten years now. If you want to know what's happening with Coverity, Coverity runs a blog as well; if you do a web search for the Coverity Development Testing blog you will find it, and it has a lot of good information. I also write a column for the Coverity blog called Ask the Bug Guys. If you have a question about a bug you found in C, C++, Java, or C#, there is an email address you can look up on the blog and send an email to, and we'll have one of our experts take a look at it. We can't promise to answer every question we get, but we'll take the best of the ones we actually know the answer to and do an article on those every couple of weeks. And of course Coverity has a website, and there is a lot of good information there.