>>
It's a pleasure for me to introduce the next speaker. I want to take you on a journey back, think back 40-ish years, and think about just a couple of highlights that probably all of you remember, as we go through languages, through very early machines, through concepts that have made their imprint on our industry. We are talking about a Fellow of the ACM, a recipient of the ACM Turing Award. I'm not sure whether you recognize this person, but maybe you recognize that person. So it's a pleasure for me to welcome Dr. Wirth to share with us some insights into, and an overview of, the history of the lost time and how we got here. I'm sure it will be a very interesting opening hour. Please.
>> WIRTH: Thank you very much.
In spite of this very nice introduction, you might be surprised to see me here as a speaker. I actually still am surprised, and I wonder why I'm here. Some time ago I was asked whether I would like to come down; down, that means from the other side of town, to give a talk. And I said, "Well, why not." And now I'm surprised to see myself as a speaker, not in a little group but at a conference. Anyway, bear with me. Of course, I must confess that I'm not an expert in program testing, so I hope you don't expect big news and technical breakthroughs about program testing from me. What I'd rather do is give you, perhaps, some perspective on the history of programming and program testing, which are intimately intertwined, and perhaps look a bit through the development in this subject over the last five decades.
I first got in touch with the field of programming in 1959. As a matter of fact, I had immigrated to Canada and by chance ended up in a just-founded [INDISTINCT] department. We were five students: one from France, one from Brazil, one from Pakistan, one from China, and one from Switzerland. I got a master's degree there and attended the course in numerical analysis. That was kind of attractive because they had a computer, and one would do the exercises with the aid of this computer, an Alwac. It was a big thing, a big, huge thing. It turned out, however, that whenever we had finished an exercise, programmed it, written the program, the computer was in service, in maintenance; it was down, as people say now. So we went through this whole course without once even touching the machine, and there were no failing grades. Yeah, it was kind of a letdown in spite of it all, but never mind.
Then I moved on to Berkeley, where I entered the PhD program; that was in 1960. And once there, I looked whether there were any openings for assistantships in the computer-related area. After a while one turned up, and I got in touch with a Bendix G-15 machine, which was, like the [INDISTINCT], a drum machine; that means the main program and the data were on the drum, arranged in tracks of, I think, 128 words of 27 bits or something like that. And there I really got into programming. Programming, of course, in hexadecimal machine code, and online: you had to reserve an hour or whatever on that machine for yourself, maybe once or twice or three times a week, and there you struggled to get simple, very simple programs running. So testing was already more predominant than programming. The main debugging tool was a bell. That means when your program finished, or somehow got stuck or whatever, you had to insert the command "ring bell"; a very important command. So that was debugging in 1960.
Well, the computation center had a bigger machine, an IBM 704 with 32,000 words of store, and that machine was programmed remotely. That means you never touched the machine, you never even saw it. You punched your program on punch cards, deposited them in a tray at a desk, and either 8 or 24 hours later you could go and see whether the output was ready. So how was debugging done there? I forget. But somehow you had to inspect: if the results didn't come out right, you had to produce octal dumps, that means images of the core memory, which was called core because it consisted of magnetic cores. And then you had to read through these myriads of octal numbers. Of course, the computer wasn't that big; as I said, at most 32,000 words of store, and your program occupied only a small part of it. We weren't in the age of gigabytes yet. How would the octal dumps look nowadays? I can't imagine. Well, there were two things, of course. One is that programming and debugging took a lot of time; turnaround time was the key issue. If you made a mistake, a small mistake, you essentially spent another day waiting, or programming, I don't know. So, in a way, a very important innovation at that time was the programming language.
At that time it was Fortran, and it was just the advent of Algol 60. Fortran, at least at that time, was a simple language, and one could even imagine how statements were translated into machine instructions. So the first milestone, of the ones I'm going to discuss, was the introduction of high-level programming languages. Of course, this brought me into the field of compiler design, for various reasons. I discovered that there was a group in one corner of the computer center building who worked on something that is now called a compiler. And the challenging thing was that the whole compiler was formulated in the very language it translated, a kind of recursive situation. I joined that group and managed to convince my professor that this was something worthwhile pursuing. And after a while I got some understanding of what was going on. It was simply a true mess. There was only one person who had, what I would call, some understanding of the whole thing, and whatever you wanted to do, you had to go and consult with her. It was a program maybe 100 pages long, which was awfully long at that time, full of Algol statements intertwined and mixed up with machine code instructions, [INDISTINCT] numbers. So the advantage of using
a high-level language was rather marginal. And then, of course, comes the question of how you debug. Well, again the famous, or infamous, octal dump comes into play, apart from manipulating your program, inserting additional print statements and so on, which even nowadays is actually not a bad way. But the key point is that you had to know how the compiler worked, how it translated into machine code, because the compiler was a step between you and the running program, and there was no feedback. In order to know what was going on, you had to understand the machine code and the machine operations, and tediously find out how far the program had progressed and where it got stuck, where the [INDISTINCT] occurred. No wonder that debugging, and testing, which is intimately connected with it (testing is debugging until it runs), was a very tedious process, and you had to learn to think in terms of the machine, not only in terms of the programming language. I will come back to this topic. Well,
as I said, I got into this field of compiler design. I was sure that one of the primary duties in Computer Science, the one thing that might bring us a step ahead, would be to bring some order into the chaotic mess; perhaps to find a good principle according to which a compiler could be written, so that you could systematically build it up from scratch. Of course, this had everything to do with syntax analysis, a field very much active in the 1960s. Syntax made it possible to specify rigorously what the language was, which sentences were acceptable and which were not. Now, describing the meaning of the admissible sentences is still another thing; that is the definition of semantics. And we didn't get very far in that subject. Anyway, subsequent
compilers, which I wrote first for Algol and then for Algol-W, a derivative of Algol 60, were a lot more systematically built than the one for NELIAC. And they could be explained; they could be introduced rather quickly. But so what? There were still lots of problems at hand. Nevertheless, the systematic high-level language allowed progress on what I consider a further milestone, namely symbolic debugging.
That means it replaced the necessity of understanding the translation and its outcome in terms of the machine architecture; it let you think in terms of the high-level language only. At least, ideally it should be that way. That means, in the case of a failure of your program, you could obtain the state of the computation not in terms of things pertinent to the machine, a hexadecimal dump and the PC location where it failed; instead, you could get the state at the point of failure in terms of your program. You could get a list of the variables active at that moment and their values, as decimal numbers or character strings or Boolean values. And this was, I would say, quite a breakthrough, because it relieved the programmer from knowing anything behind the scenes of the high-level language. Nevertheless, let's go back to the
topic of testing. Let's assume that your debugging aids gave you the impression that your program was correct. How did you make sure that it was indeed correct, instead of merely believing it? And how do you convince other people that it is correct and free of mistakes? That is, of course, a key question, which was important then and, I think, is important now. This is certainly something I need not tell you who are in the field of testing. I mean, it's all a business of convincing other people that your product is right. Well,
of course, there is the idea that you test all possible cases. At least that was an idea, and it is a misconception. It was Dijkstra who pointed out that even very, very simple computations cannot be tested exhaustively. The example he gave was that of adding two 32-bit numbers. In this case you have 2^64 possible cases to test. Now, let's assume that we are so naive as to try to test all 2^64 cases, and that every test takes one microsecond; perhaps nowadays I should say one picosecond or whatever, but never mind, let's assume one microsecond. Then you have 2^64 cases, that's about 2 times 10^19, and you divide this by the number of microseconds you have per year, and you get something like 10^5 or 10^6 years to do all the test cases; still more than we would care to wait for.
The point is that it is obviously mistaken to try exhaustive testing, even for very simple cases. And the programs we use nowadays are very, very complex. So obviously we must find some method of obtaining the necessary conviction of correctness by other means. There are two ways to do that. One would, of course, be to reduce the number of test cases, and the other would be to use faster computers. And I'm afraid that the world at large takes some mixture of both, but relies very heavily on the increased power of computers. That is certainly not the academic way, and I think this is a place where we can say we are still in misery: even the fastest computers cannot really do the testing job properly if the programming wasn't done right in the first place. The ideal would, of course, be to have not so many test cases but zero test cases. I would like
to remind you of what Dijkstra said back in 1970: that testing can show the presence of errors, but never prove their absence. Now, this is the death certificate for testing, the death sentence. I don't think you should be very happy about this, and therefore I also hesitated to show you this sentence, but it is so true that I cannot withhold it from you. So the goal must be a dramatic reduction
of the test cases, if at all feasible. The best way is, of course, to be radical and reduce the number to zero. And the way of doing that is called analytic program analysis, or program verification, if you want. Now, this is certainly a field that has seen very much research activity, mostly at universities, and it is still being actively pursued. And you might wonder why it is not more widely applied and why it hasn't been much more successful than it has so far been. First of all, it's
not easy. You need the capability of mathematical thinking, of proving; essentially, as Dijkstra had already shown, programming is an activity very close to mathematical proving. And not everybody is well trained in this, so we shouldn't expect to recruit thousands and tens of thousands of programmers who are brilliant mathematical provers. So that's one side of it, the human factor. The other side is, of course, that when you want to prove something, you have to have a sound basis from which you can start your proofs by applying a whole sequence of deductions. The deduction rules must be defined very precisely; they are the basis on which your universe
rests. Now, what is the basis? We prove programs, which are texts written in a certain language. And in order to do anything rigorous with such programs, your language must be rigorously defined. As a matter of fact, people look at a programming language as something in which you jot down programs and then feed them to the computer. But actually, a good programming language is for human consumption. It should be the tool in which you precisely formulate your programs, which are then subjected to mathematical reasoning, so to say. That was already said in the 1970s; Dijkstra was a very strong proponent of all this. And here is the key of the matter: a programming language must be rigorously defined, and you must be able to do mathematical reasoning with its texts.
And if you look at today's programming languages, they are [INDISTINCT] than they were 50 years ago. I'd rather have mathematical reasoning conducted on Algol programs than on C programs, or, you name it, Java, whatever. Our languages have become so huge, so complicated, that there is no hope of ever putting them into a rigorous mathematical framework as a whole. And furthermore, programming has changed. It's not just concocting programs in a language according to given rules; you rely very heavily on library routines, and most of your time is actually spent finding the right library routines and understanding them. So, I'm afraid
I'm actually drawing a fairly negative picture of our so-called science. And the conclusion we can safely draw is that testing is going to stay with us forever; maybe that's a satisfaction for you. It's not a very brilliant testimonial for academia in Computer Science, and I don't hesitate to say that I actually think it failed pretty badly. And I don't even have to go to prehistory to make this point. Prehistory: well, I think that the era of computing started about 1975, with the advent of the personal computer, particularly the Alto at the Xerox lab, which brought computing into homes and into schools, out of those very few closed computation centers inhabited by the so-called specialists. So everything I was telling you about pre-1975 is prehistory, prehistoric events. We should consider high-level programming
languages as formal systems; not only the users should consider them as such, but also the designers. High-level languages must be rigorously defined, as I said, and the point is: without reference to any underlying mechanism for their interpretation. Because if you do not satisfy this requirement, any programmer who finds a mistake in his program, or any tester, must obviously also know about the underlying computer; he cannot rely on the definition, on the rules of the language, alone. Now, it is always said that these languages are high-level. What does that mean? I think it just means: on a higher level of abstraction. Abstraction means that you can forget about certain things, in this case about the structure, the instruction set and so on of the underlying computer. And you should be able to use the language on this or that computer, and it should always behave equally well. And if you do not deliver a definition of the language without reference to that computer, then you are in
bad shape. I have struggled, I think, all my life to create a language that would be explainable without reference to a computer. My first one was Euler; then the Algol-W that was mentioned; then Pascal. I had this in mind, but of course I failed: I had to produce a language in due time that was usable for me and for our students, so I couldn't wait until I had found the ultimate solution. Pascal had failed in this respect; there are certain things where you have to understand the underlying machine to understand mistakes. Then came Modula, and then Oberon. And each time, I think, there
was definite progress made toward this goal. In Modula, the idea of encapsulation was brought in; very important. I think I even mentioned it in the milestones: modules and encapsulation, that was in 1980. In Ada, they were afterwards called packages; Ada was, let us say, influenced not only by Pascal but by Modula. Encapsulation and the module concept are important insofar as they guarantee that a mistake occurring at one place can be analyzed by looking only at its immediate environment. You can exclude the objects, the variables and procedures, defined in other modules, in other parts. So instead of having the whole world to analyze, you can restrict your scope to a certain module; already a very big step, of which people are making use even nowadays. It was certainly a great help in reducing the number of cases to analyze.
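A minimal sketch of the idea in Oberon, as an added illustration (the module and its names are invented for this example): the variable is hidden inside the module, so any wrong value of it can only have been produced by the few lines on this page.

    MODULE Counter;
      (* "count" is not exported: no other module can read or change it,
         so a mistake involving it must originate in this module *)
      VAR count: INTEGER;

      PROCEDURE Increment*;  (* the asterisk marks an exported name *)
      BEGIN
        count := count + 1
      END Increment;

      PROCEDURE Value*(): INTEGER;
      BEGIN
        RETURN count
      END Value;

    BEGIN
      count := 0  (* executed once, when the module is loaded *)
    END Counter.

A client can call Counter.Increment and Counter.Value, but it cannot touch count itself; that is exactly the reduction of the space of suspects.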
Of course, as I said, it would be ideal to reduce the test cases to zero, that means, by straight mathematical analysis. And for that you need an axiomatic definition as a basis for proving program correctness. This idea came up between 30 and 40 years ago, and a lot of activity about proving program correctness by analytical methods started immediately. I'm afraid not very much has happened so far, at least not in practice; industry practically ignores all these efforts. And I can also understand why, because practitioners in industry build on top of the [INDISTINCT] systems available at large. And those are huge, and they were not built with these methods in the first place. If you build a house, you can prove it is stable as much as you want; if you don't know how stable the ground is, you don't get very far. Building on sand is not the same as building on rock, and whatever you prove about the house cannot be better
than what the ground promises. Well, here of course we come to a topic that is very close to my heart. The problem lies in the complexity of our systems, and we should strive every day, when we program, to reduce the complexity, or rather to keep it within manageable bounds. The trouble is that whatever we build is built on many, many other systems that are a hundred or a thousand times more complex than what we build ourselves. And so we can only promise anything with the reservation that the environment holds to its promises. In a way, we are very much victims of our own love of, and boundless exercise in, creating more and more complexity. A telling tale, of course, are the programming languages, which have become more and more complex. And there are the programming libraries, which have become huge and ever huger. They are given specifications, but the specifications are typically not complete. Yes, I have talked about the past. Let me add at least five minutes
to talk about the present. I have opted out of this world of huge programming systems, by and large. But I'm still participating in a project and have to program myself. I program software in a language of our own making, this Oberon. And it is really a pleasure, because there one understands what's going on. Of course, having built it mostly myself, I should understand. So that is one thing. The other thing is that, in the same project, I am designing hardware: essentially a set of identical processors connected by a network, and all of this is done on a programmable device, an FPGA. And there, of course, nowadays, you
do it quite differently. Ten or twenty or thirty years ago it was all circuit diagrams; proving the correctness of a circuit, how did you ever do that? It was an unknown concept. Nowadays you don't draw circuit diagrams anymore; you describe them in terms of a circuit description language, which is also called a programming language. The one I'm using is called Verilog, a commercial product. First of all, it is also a language which is hugely complex, and it has one excuse, namely that it has to introduce software concepts into hardware. Sometimes you don't know whether you are writing a sequential program, because the statements look very much like that; but then you realize that you are actually describing a circuit, where all the gates operate at the same time, in parallel. So we are even further away than we were with C or C++ or C# from any kind of theoretical basis which would allow program proving. But then, this language is also surrounded
by a lot of tools, and they are the real misery. I'm really struggling to find my way through these tools. Often the concepts are not clear, and the connections are not clear. And sometimes I just don't know whether there is a bug in the tools or a bug in my thinking. Narrowing down the mistake is, of course, a primary principle in finding mistakes. And sometimes you just stare at the place and you think it must be right as written. Then you call for an expert, and after a few days they find out that this and this was the reason: something had been missing at a completely different place, in the system, or the tool settings, or the configuration specification, or, or, or, or. A configuration file, and all kinds of things where something might go wrong, and which are of course not described in Verilog. So understanding Verilog is by far not enough; you must understand a lot more, down to the level of the gates. So,
if you are unhappy with the tools of your software world, then I urge you to dive into the hardware world for a few days, and then come back happily. No, I sometimes feel: this is such a highly praised field, information technology, Computer Science. And I sometimes wonder, where actually is the progress that we have made? We are still struggling with problems similar to those of 50 years ago. I told you about the turnaround times. We have long, long compilation times, even for small programs. My hardware program is only a few pages long, and when I compile it, I have to wait half an hour. Then I spend an hour finding what went wrong; if I'm lucky, I find out. Usually I do not, but I get some idea of what might be a possible remedy, I make one or two little changes, and then I wait another half hour. It reminds me totally of the '60s. Then at least I had the time to go home, not only to drink a cup of coffee. But still, it is the same basic problems that we are fighting with. And it shouldn't be that way. I
believe we could increase the productivity of our engineers by a factor of a hundred and more, if we had proper tools. Our dilemma is, of course, that we are stuck with these tools. The huge companies who produce them are stuck too, and sometimes they admit that they suffer from them the most. I don't know the way out; don't expect that I have a panacea to get out of this situation. We have moved in far too deeply to trace our way back. Too bad, and I am reluctant to leave you on such a sour note. So before we start the discussion, and I have a bulletproof vest on, by the way, let me close with a few perhaps more positive thoughts. Whereas we acknowledge testing as
indispensable, strong efforts must be made to eliminate errors in the first place: instead of detecting errors, avoid them. This, of course, means that testing and programming must be intimately intertwined; they shouldn't be separate activities. As we have seen, in order to reduce this enormous number of test cases, you have to know the structure of your program; you have to know how it was developed. Then you can test the [INDISTINCT] cases and infer the things in between, hopefully knowing that there is a linear, or whatever, behavior between the two borders. If the array has a hundred elements, and you test the 0th element and the 99th element, you can safely assume that the ones in between will probably work too. So you cannot separate testing from development.
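A small sketch of that border-case idea, again in Oberon and again my own example (it assumes the predeclared ASSERT of Oberon-07): because the loop body treats every element uniformly, checking the two borders covers the cases in between.

    MODULE TestBounds;
      CONST N = 100;
      VAR a: ARRAY N OF INTEGER;

      PROCEDURE Fill;  (* the procedure under test: one uniform loop *)
        VAR i: INTEGER;
      BEGIN
        i := 0;
        WHILE i < N DO a[i] := i; INC(i) END
      END Fill;

    BEGIN
      Fill;
      (* border cases only: elements 1 to 98 are produced by the very same
         loop body, so if the 0th and the 99th are right, those in between
         can be assumed to be right as well *)
      ASSERT(a[0] = 0);
      ASSERT(a[N - 1] = N - 1)
    END TestBounds.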
Now, whereas analytic methods are needed, they are neither easy nor infallible. I'd like to stress that too, because program proving is a very subtle business; it needs very good logical skills. And since systems are huge, we have a tendency to automate the proving, like the testing: if something is too big to understand, it is automated. Computers are fast enough nowadays, so we'll manage somehow. Whether this increases the trustworthiness is, of course, another question. It is not infallible, and proofs also contain errors. So you may make an error proving an erroneous system, and this leads to the belief that the whole thing is correct. More effective, because available now, would be simpler, less baroque languages, less baroque formal systems, together with a disciplined approach to program design. I think, no, I don't think,
I know it is so: programming is at the base, at the root of the whole matter, and it is not given proper attention. This may be a strong statement, but it is true. At universities, programming is neglected; teaching the skill of programming is neglected, because it is difficult and it is considered below the rank of academic activities. Programming, you are assumed to pick up by osmosis somehow: you are a clever fellow, you can throw this together. And this is not so. For the simple things, yes; but as soon as things become more complex, it
is not. Tony Hoare once said, "We should look at programming like learning to play the piano. You very quickly learn to play the first simple tunes with two fingers. But in order to become a master, you must learn to play with ten fingers." And very often we are trained, or rather not trained, but asked to keep playing with two fingers; and then, later on, the difficulties are overwhelming. Universities are not very good places to learn programming. There are
professors who teach programming but who don't program themselves anymore, and who therefore are not familiar with the difficulties present in the languages and the systems they choose. And then they leave their troubles to their assistants. What can we do? I don't know; maybe you have a solution. That is really what I wanted to say, and I hope we can have a few questions now. Thank you very much for your attention.