In our last lecture we introduced the concept of a zero of a nonlinear function. Let us briefly review what we have done on this topic and then proceed from there.
We are considering an equation in one variable, f(x) = 0. The function is either a polynomial or a transcendental function, that is, some combination of polynomials, exponential functions, trigonometric functions and logarithmic functions. We are interested in finding the zeros of such a polynomial or transcendental function.
We have defined a zero or a root of a function f(x) as any value x = ξ which satisfies the equation identically, so that f(ξ) = 0. Now I would like to distinguish between a simple root and a multiple root of a given polynomial or transcendental function. If x = ξ is a simple root, then f(ξ) = 0 but the derivative f'(ξ) is not zero. For example, x = 1 and x = 2 are simple roots of the polynomial equation which can be factorized as (x - 1)(x - 2) = 0.
Similarly, if ξ is a root of multiplicity m, then f and its derivatives up to order m - 1 vanish at ξ while the m-th derivative at ξ is not zero, and we say that ξ is a root of multiplicity m. If you set m = 1 you go back to the simple root: f(ξ) = 0, f'(ξ) ≠ 0. Now
if ξ is a root of multiplicity m, theoretically it is possible for us to take the factor out, which means I can write f(x) = (x - ξ)^m g(x), with g(ξ) ≠ 0. For a transcendental function it may not always be possible to actually extract this common factor, but for a polynomial, yes, we can do it; and theoretically f(x) can always be written in this form. We also gave an alternative characterization of a multiple root, so that if you have a method to obtain a simple root, the same method can be used once you construct a new function which has ξ as a simple root. To use that idea, note that if f(x) has a root ξ of multiplicity m, then its derivative f'(x) has the same root ξ, but of multiplicity m - 1.
So if I now consider the ratio of these two functions, h(x) = f(x)/f'(x), then h(x) has a simple root at ξ. Now as I said earlier, if I am able to construct methods which find a simple root easily, then I can use the same methods to find multiple roots as well: I apply the method to the function h(x), obtain the simple root of h(x), and hence the multiple root of f(x) = 0. As an example we took a root of multiplicity three, where we are able to factorize and take the factor (x - 2)^3 out. Now what is it that we are really looking for? We are talking of a zero or a root of an equation, and what we mean is this: if I draw the graph of the function y = f(x), the graph cuts the x axis in one way or another, and we are interested in finding the point of intersection of the x axis with the graph of y = f(x).
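As a quick numerical check of the multiple-root device above — a sketch, with f(x) = (x - 2)^3 assumed for illustration — we can verify that f and f' both vanish at ξ = 2, while h(x) = f(x)/f'(x) reduces to (x - 2)/3, which has only a simple root there:

```python
def f(x):
    # Root of multiplicity three at xi = 2
    return (x - 2) ** 3

def f_prime(x):
    # The derivative has the same root xi = 2, but of multiplicity two
    return 3 * (x - 2) ** 2

def h(x):
    # h(x) = f(x) / f'(x) simplifies to (x - 2) / 3: a simple root at 2
    return f(x) / f_prime(x)
```

Any method that handles simple roots can then be applied to h instead of f.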
Now if you want to construct methods for this, there are two types of methods: direct methods and iterative methods. In a direct method we can produce the solution in a finite number of steps. It means we will be able to count the total number of operations in a particular method and say you need this many multiplications, this many divisions, this many additions and subtractions. Usually, when giving an operation count, we give the major operations, which are the multiplications and divisions; the minor operations are the additions and subtractions. We know that additions and subtractions take much less time than multiplications and divisions. Sometimes we include all four, but very often the operation count means the major operations only.
Now in an iterative method we use the idea of successive approximation to a root. We start with an initial approximation, obtained in some other way, and starting from it we generate a sequence of approximations x1, x2, x3, ..., which we hope will converge to the exact root. We will have a look later at when it will converge and when it will not converge.
Now in the example that we have just given — the polynomial equation a0 x^2 + a1 x + a2 = 0 — if I straight away solve it and write the roots in closed form, this will be called a direct method: I am able to write down the solution of the problem in just one step, and if I want both roots I can write it in two steps. Now, as I said last time, we will be able to count the total number of multiplications. In the discriminant a1^2 - 4 a0 a2 there are three multiplications (a1^2 can be taken as a1 times a1); there is one division, there are two additions, and there is one square root. So I am able to give the total operation count, and hence tell what the total amount of compute time will be: if I am given the time factor for a multiplication and a division, I can multiply suitably and say this is the total amount of compute time that this method will take for solving this particular problem.
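As a sketch of this direct method (assuming a0 ≠ 0 and a non-negative discriminant), the operations being counted can be read off directly from the code:

```python
import math

def quadratic_roots(a0, a1, a2):
    """Direct method for a0*x**2 + a1*x + a2 = 0: the solution comes out
    in a fixed, countable number of operations (assumes a0 != 0 and a
    real discriminant)."""
    disc = a1 * a1 - 4 * a0 * a2       # three multiplications
    s = math.sqrt(disc)                # one square root
    return (-a1 + s) / (2 * a0), (-a1 - s) / (2 * a0)
```

For example, quadratic_roots(1, -3, 2) returns the two simple roots 2 and 1 of (x - 1)(x - 2) = 0.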
Now if I take the same example, a0 x^2 + a1 x + a2 = 0, I can write down a number of iterative methods. Which of them would converge is not known in advance; we have to do the convergence analysis and see which would converge and which would diverge. For example, in the first case I take the two terms a1 x + a2 to the right hand side and write a0 x^2 as (a0 x) times x; dividing through by a0 x then gives the iteration x_{k+1} = -(a1 x_k + a2)/(a0 x_k), with the new approximation on the left and the old approximation on the right. Similarly, in the second case we keep the middle term on the left hand side, take the first and third terms to the right hand side, and write the iterative method x_{k+1} = -(a0 x_k^2 + a2)/a1. In the third case we take x common from the first two terms and take the remaining quantity to the denominator, giving x_{k+1} = -a2/(a0 x_k + a1). So this describes three iterative methods for solving the same problem, and all of them are of the form x_{k+1} = g(x_k): the new approximation here and the old approximation over there. We will see later which of these will converge.
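As a sketch of the fixed-point idea — the rearrangement below, x = -a2/(a0 x + a1) applied to x^2 + x - 1 = 0 so that g(x) = 1/(x + 1), is my illustrative reading of the third case — one of the three iterations does converge:

```python
def fixed_point(g, x0, n):
    # Iterate x_{k+1} = g(x_k) a fixed number of times.
    x = x0
    for _ in range(n):
        x = g(x)
    return x

# Third rearrangement of x**2 + x - 1 = 0:  x = 1 / (x + 1)
g3 = lambda x: 1.0 / (x + 1.0)
root = fixed_point(g3, 0.5, 50)   # approaches (-1 + sqrt(5)) / 2 = 0.6180...
```

The same driver with the other two rearrangements as g would diverge from this starting point, which is exactly why the convergence analysis is needed.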
We have just now mentioned that an iterative method gives us an infinite sequence of approximations. Now it is very important for us to know: does the sequence converge at all? What is the convergence criterion, and how are we going to impose it? Secondly, since it is an infinite sequence of approximations, when do we stop the iteration? Theoretically we are doing infinite cycles of computation, but we will have to terminate at some stage. The termination criterion could be a particular accuracy, or something else that is given, so that the iteration can be terminated at some stage.
Now let us define convergence. By convergence we mean that the sequence of iterates x_k converges to the exact root ξ: the limit of |x_k - ξ| as k tends to infinity is zero. I can define x_k - ξ as the error at any iteration, because x_k is the value we obtain at that particular iteration and ξ is the exact root. During the iteration I can find the magnitude of this error, and if the error goes to zero as k tends to infinity, we get limit x_k = ξ, which means we are achieving convergence. Even though this is the test, we shall see how we are going to apply it.
Now the second point is when we should stop the iteration. The stopping criterion — the criterion that we have achieved a particular accuracy — is this: let an accuracy ε be prescribed. We can also use the word tolerance for accuracy. Once you prescribe the accuracy that you want in a problem, say five-place or six-place accuracy of the solution, any one of the following criteria can be used. First, we can find the difference between two successive iterates. Starting with x0 we got x1, then x2, then x3; after a few iterations we can start testing |x3 - x2|, |x4 - x3|, and so on. If the magnitude of the difference between two successive iterates is less than the tolerance that is given to us, I can stop the iteration and give the required output. Alternatively, we can stop the iteration procedure when the equation itself is satisfied to a certain accuracy: our equation is f(x) = 0, so at a particular iterate x_{k+1} I substitute and test whether |f(x_{k+1})| is less than the prescribed accuracy, say 10^-6. Once the accuracy is prescribed, I can look at the equation itself and see whether it has been satisfied to that accuracy or not. Therefore I can have this particular criterion also.
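The two stopping tests can be sketched together as a small helper (illustrative, not from the lecture; using both at once guards against the failure cases discussed below):

```python
def converged(x_new, x_old, f, eps):
    # Test 1: two successive iterates agree to within eps (abscissa test).
    # Test 2: the equation itself is satisfied to within eps (ordinate test).
    return abs(x_new - x_old) < eps and abs(f(x_new)) < eps
```

An iteration loop would call converged(x_new, x_old, f, eps) after each step and stop as soon as it returns True.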
Now let us just have a look at what this really means graphically. In the first case I have taken the x axis with the graph of f(x); ξ is the exact root, and x_k is a particular approximation. The successive approximations move towards the exact root, so I have x_{k+1} as a new approximation and I am testing |x_{k+1} - x_k|. As we proceed through successive iterations, x_{k+1} keeps moving towards ξ, and the difference between two successive iterates eventually becomes less than ε. In the second case we are finding the value of f(x_{k+1}), which is nothing but an ordinate of the graph; if I take a particular approximation x_{k+1}, I am testing the value of this ordinate. As we move towards the root, f(x_{k+1}) goes to zero, and finally f(ξ) is exactly zero. So in the second case we are testing the ordinate, whereas in the first case we are testing the abscissa, i.e. whether the difference between two successive approximations is less than ε or not. Now, sometimes you may have to use both, the reason being that either of these criteria may fail.
Let us look at this carefully. The first criterion may fail when the graph of f(x) cuts the x axis almost vertically. Let us see what happens: draw the graph so that it crosses almost vertically, and take two successive points, x_k on the left and x_{k+1} on the right. Even though they are very close to each other, since the graph is almost vertical, f(x_k) and f(x_{k+1}) will be very large; that means the equation f(x) = 0 has not been satisfied to the required accuracy. So when the graph is almost vertical, the first condition, |x_{k+1} - x_k| < ε, can hold while we are still far from satisfying the equation, and as a stopping criterion it fails. Similarly, the second criterion would fail if the graph cuts the x axis almost horizontally — the graph comes in almost tangentially and then crosses. You can see the reason why: if I take two successive approximations x_k and x_{k+1} when the graph is cutting horizontally, then f(x_{k+1}) is almost zero, but the difference between x_k and x_{k+1} can still be large. So the value of the ordinate is less than ε, but we have not got the root, because we may still be far away from it. Of course, if you test both of them together there is no question of failure in either case. In most cases we may not use both, hoping the problem is not one of these two typical cases; but failure is possible, which means that when you are solving a particular problem and you have difficulty, you may have to suspect that the graph of the function is coming in either almost horizontally or almost vertically. Therefore we can use both conditions together to avoid this particular pitfall.
Now for the next step: I want an iterative method, but I also need a starting approximation. Obtaining a starting approximation is very important. Suppose an equation has a root at -10, and I start with an approximation at, say, +5 or +10, which is far away from the root; it may not be possible for the iterative method to converge to that root, because the start is too far away. Therefore the starting approximation should be in the neighbourhood of the root. Neighbourhood does not mean a small ε-neighbourhood; it can be reasonably large. Say the root lies at -10: you may say the root lies between -15 and -8, or between -15 and -5. A reasonably large interval is allowed, but we do need a suitable initial approximation. In fact, in many cases the situation becomes much worse if you have a system of nonlinear equations. Here we have talked of a one-variable problem, f(x) = 0; if I take a two-variable problem, f(x, y) = 0, g(x, y) = 0, then it is a nonlinear system in two variables, and I need a very good approximation for both variables x and y. So this situation becomes even more important for systems of nonlinear equations, which are very commonly encountered in almost all applications.
Now there are two ways of obtaining the initial approximation. One is the graphical method; the other is the use of the intermediate value theorem. Let us see what we mean by the graphical method. If I am able to draw the graph of y = f(x) approximately (we are not talking of drawing the graph exactly), I can use it to get an initial approximation: let the graph cut the x axis at a particular point, and take that point as the required initial approximation. So what we are saying is: take your axes, and since f(x) could be a very complicated function, substitute a few values of x, get the function values, plot a rough graph, and wherever it intersects the x axis take that point as your approximation x0. We shall take that as the initial approximation for starting the iterative method. Once I have an initial approximation I do not need anything else; the method itself will give me the successive approximations.
Now if we are not able to plot the graph of y = f(x) directly, we can think of breaking the equation f(x) = 0 into two parts, writing it in a convenient form as f1(x) = f2(x). For example, if I want an approximation to the root of x - sin x = 0, I can write this as x = sin x and take f1(x) = x, f2(x) = sin x. I know the graph of y = x, and I know the graph of y = sin x; wherever they approximately intersect, I take that as my initial approximation. So this is the idea: write f(x) = 0 in the form f1(x) = f2(x), draw the graph of y = f1(x) and the graph of y = f2(x); the point of intersection is a root, and any value close to this point is an approximation. So we draw the graphs approximately, get the point of intersection, and take that as the approximation. But of course this is not a very practical method; we are not going to use it much in our applications, because it is difficult to draw the graph and read off the initial approximation.
Now let us illustrate how we are going to get the initial approximation. I take the simple example f(x) = x^2 + x - 1 = 0, and I want an initial approximation for it. It is a quadratic equation; I know it represents a parabola, and I will try to draw the graph of the parabola approximately. I write y = x^2 + x - 1 and complete the square: add and subtract 1/4 to get y = (x + 1/2)^2 - 5/4, and taking the 5/4 to the left hand side, y + 5/4 = (x + 1/2)^2. From this I can recognize that it is a parabola with vertex at (-1/2, -5/4), opening upwards, since the coefficient of x^2 is positive. Now I take the x axis and y axis, mark the vertex, take one or two more points, draw the graph approximately, and I get the two crossing points a and b. Depending on whether the positive root or the negative root is required, I can take the approximation accordingly: one value for the positive root and the other value for the negative root. So I can draw the graph of y = f(x) directly in this form, or, as in the example given earlier, take the graphs of y = f1(x) and y = f2(x).
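For reference — these exact values are not worked out in the lecture — the crossing points a and b of this parabola follow from the quadratic formula:

```python
import math

# Crossings of y = x**2 + x - 1 with the x axis: x = (-1 +/- sqrt(5)) / 2
pos_root = (-1 + math.sqrt(5)) / 2   # approximately  0.618, the positive root
neg_root = (-1 - math.sqrt(5)) / 2   # approximately -1.618, the negative root
```

Any rough reading of the sketch near these two points would serve as the initial approximation x0.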
Now let us discuss the other way of finding the initial approximation, the one which is most useful and which we shall use often in our analysis: the intermediate value theorem. We assume, of course, that f(x) is a continuous function on (a, b) and that f(a) f(b) < 0. Then f(x) has at least one real root — in fact an odd number of real roots — in (a, b). What we are really saying here is this: f(a) f(b) negative means one of the ordinates is negative and the other is positive. Either f(a) is negative and f(b) is positive, or f(a) is positive and f(b) is negative; it does not matter which way round the graph cuts the axis.
When the product of these two ordinates is negative, the graph has to cross the x axis, and since the graph crosses the x axis we have located a root in that particular interval. Obviously I have drawn a nice graph with a single crossing, but it is possible that the graph cuts the x axis more than once, in which case we will have located more than one real root; therefore f(x) has one real root or an odd number of real roots in (a, b). This happens particularly when the roots are very close to each other. Say 1, 1.1 and 1.4 are the roots of a particular equation and you have picked the interval 0 to 2; the sign test will only tell you that there is at least one root — one or an odd number of roots — in the interval. So in this way I can locate a root, and once I have located it the initial approximation will be good, that is, close to the exact solution.
Therefore, if necessary, I can refine the interval. Suppose you find that a root lies between 0 and 50; this interval is too large. I can reduce the interval further by testing at 25: does the root lie between 0 and 25, or between 25 and 50? By using the intermediate value theorem repeatedly it is possible to reduce the length of the interval to any length I want, and then I can go for a numerical method and construct the numerical method.
Let us take the same simple example as earlier, f(x) = x^2 + x - 1 = 0, and test a few values. Substituting x = 0 I get f(0) = -1; substituting x = 1 I get f(1) = 1; substituting x = -1 I get f(-1) = 1 - 1 - 1 = -1; substituting x = -2 I get f(-2) = 4 - 2 - 1 = 1.
Now I can see that f(0) is negative and f(1) is positive, so f(0) f(1) is negative; again, f(-2) is positive and f(-1) is negative, so their product is negative. Therefore I have located both the roots of this equation: one root lies between -2 and -1, and the other root lies between 0 and 1. Once I have this, I can immediately go to any iterative method and start the iteration procedure from there. Usually we form a table of values to find the intervals in which the roots lie. This is what we do when we need more than one root: if, say, a polynomial of degree five is given and you want to find out where the roots are and whether all five roots are real, all that you would be able to find by constructing a table of values.
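The table-of-values idea can be sketched as a small grid scan (an illustrative helper, not the lecture's code; it assumes the step is small enough to separate neighbouring roots, and it will miss roots of even multiplicity, where the sign does not change):

```python
def locate_roots(f, a, b, step):
    # Tabulate f on a grid and keep every subinterval across which f
    # changes sign; by the intermediate value theorem each such
    # subinterval brackets at least one root.
    intervals = []
    x = a
    while x < b:
        if f(x) * f(x + step) < 0:
            intervals.append((x, x + step))
        x += step
    return intervals
```

For the example above, locate_roots(lambda x: x*x + x - 1, -3, 3, 1) reports the two bracketing intervals (-2, -1) and (0, 1).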
Now once we locate that a root lies in a particular interval (a, b), I would go for iterating it further. See, there is a root between -2 and -1 and a root between 0 and 1. You could also say a root lies between -2 and 0; trivially that is true, but what is the length of the interval -2 to 0? The length is two, whereas the length of the interval -2 to -1 is one. Therefore (-2, -1) is a more refined interval than (-2, 0), in which the root lies. If you want to iterate, it is also okay to take -2 to 0 and proceed, except that it may take one more iteration, an extra iteration, to get the required accuracy.
Now the simplest and most trivial method is the bisection method. The bisection method is trivial because we apply the intermediate value theorem repeatedly, each time bisecting the interval. We ask: does the root lie between -2 and 0? Yes, the root lies between -2 and 0. Let us bisect it: the midpoint of (-2, 0) is -1. Now I test whether the root lies between -2 and -1, or between -1 and 0, and I find that the root lies between -2 and -1. So, starting with a particular length of interval, each time I bisect the interval and go on repeatedly applying the intermediate value theorem; we shall call this the bisection method.
Now what we really mean is this: the root lies between a0 and b0, therefore the product of the ordinates f(a0) f(b0) is less than zero. Now I bisect the interval and write the midpoint as a1 = a0 + (b0 - a0)/2. Here I may make a comment: I have written the middle point of (a0, b0) not as (a0 + b0)/2 but as a0 + (b0 - a0)/2. Of course, if I simplify in the ordinary way both are the same; but if you are computing with finite word length, you can take the simple example of four-digit numbers and work out (a0 + b0)/2 and a0 + (b0 - a0)/2: the second value is going to be much more accurate. So when you are doing finite arithmetic where the word length is very small, (a0 + b0)/2 is an inferior value compared to a0 + (b0 - a0)/2. Even though mathematically both are equivalent, for computational purposes I would take the middle point of the interval (a0, b0) by this particular formula.
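Here is a small simulation of that point, assuming a hypothetical machine that rounds every result to four significant decimal digits; on such a machine (a0 + b0)/2 can even land outside the interval, while a0 + (b0 - a0)/2 stays inside:

```python
def fl(x, digits=4):
    # Round x to `digits` significant decimal digits, imitating a
    # short-word-length machine (an illustrative assumption, not the
    # lecture's exact example).
    return float(f"{x:.{digits}g}")

a, b = 0.6666, 0.6668

naive = fl(fl(a + b) / 2)           # fl(1.333 / 2) = 0.6665, outside [a, b]!
safe = fl(a + fl(fl(b - a) / 2))    # fl(0.6666 + 0.0001) = 0.6667, inside
```

In exact arithmetic both formulas give 0.6667, but the rounded sum a + b has already lost the digit that the difference b - a preserves.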
Now once I bisect I get two intervals, (a0, a1) and (a1, b0), provided f(a1) is not equal to zero. Of course, if f(a1) is zero then a1 is a root, by definition. So if f(a1) is not zero, I will determine whether the root lies in the interval (a0, a1) or in (a1, b0). We can test that immediately, again by the intermediate value theorem: I test whether f(a0) f(a1) < 0; if it is so, then the root lies in (a0, a1), and otherwise it lies in (a1, b0). We do not have to test the other product, because there are only two intervals: if the root does not lie in one, then it lies in the other. So I will test only the first one. Now we shall repeat this procedure until the required accuracy is obtained.
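Putting the pieces together, a minimal bisection sketch, assuming f is continuous with f(a) f(b) < 0 on entry:

```python
def bisect(f, a, b, eps):
    # Repeatedly halve [a, b], keeping the half across which f changes
    # sign; the midpoint is written a + (b - a)/2 as in the lecture.
    while b - a > eps:
        m = a + (b - a) / 2
        if f(m) == 0:
            return m, m               # landed exactly on a root
        if f(a) * f(m) < 0:
            b = m
        else:
            a = m
    return a, b
```

Called as bisect(lambda x: x**3 - 5, 1.0, 2.0, 1e-9), it returns a bracket of width at most 1e-9 around the cube root of 5.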
Now there is something very interesting in this bisection method: each bisection reduces the length of the interval by a factor of two, so after n bisections the length of the interval is reduced by a factor of 2^n. That means if I start with the interval 0 to 10, and let us suppose the root is somewhere near zero, then the next interval is 0 to 5 — a factor of two. Dividing by a factor of two again gives 0 to 2.5, so it is now reduced by a factor of 2^2. Each bisection reduces the length by a factor of two, so that after n bisections it is reduced by a factor of 2^n.
Now let us say that an error tolerance is given, i.e. an accuracy is prescribed. If I prescribe a tolerance of, say, six-place or eight-place accuracy, can you tell the total number of bisections that would be required? It is possible for us to do that in this case.
After n bisections, the length of the interval is reduced by the factor 2^n; therefore the current interval has length (b0 - a0)/2^n, and I want this length to be less than ε. Suppose finally I say that the root lies between 0.156 and 0.156525; these two agree to three decimal places. So we are finally going to locate the root in an interval whose endpoints differ by less than the given accuracy ε: the length of the final interval, (b0 - a0)/2^n, is less than ε. Once this is given — b0 is known, a0 is known, and ε is known — I can determine n from here. I take the logarithm on both sides: log(b0 - a0) - n log 2 < log ε. Then I solve for n: bringing log ε to one side and n log 2 to the other, n > (1/log 2)[log(b0 - a0) - log ε]. So n is the number of iterations required; this states that I need a minimum of that many iterations. I round up to the next integer and say that this is the number of bisection iterations that will be required to get the accuracy ε.
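This bound is easy to evaluate; here is a sketch, rounding the right hand side up to the next integer as described (in the rare case where the ratio is exactly an integer, the strict inequality would ask for one more step):

```python
import math

def bisections_needed(a0, b0, eps):
    # Smallest integer n with (b0 - a0) / 2**n < eps, from the bound
    # n > (log(b0 - a0) - log(eps)) / log 2.
    return math.ceil((math.log(b0 - a0) - math.log(eps)) / math.log(2))
```

For the interval (0, 1) this gives 7 bisections for ε = 10^-2, 10 for 10^-3 and 14 for 10^-4, matching the counts worked out below.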
Now there is something interesting here. The first thing is that (on the previous slide) the right hand side gives the minimum number of iterations required, because we are saying n is greater than that quantity. But usually it is the exact number of iterations as well: if you round the right hand side off to the next integer, that will be the number required.
Now the most important observation is this: in the bisection test we are testing the sign of f(a0) f(b0); we are not actually using the value. Say the value of f is -256.4; we take the sign as negative. If we take another function f(x) whose value is -3, again we take the sign as negative. Therefore whatever we have discussed does not depend on the value of f(x); it depends only on whether the sign is positive or negative. All computations use the sign of f(x) at a point and not its value, hence the result is true for every equation f(x) = 0; as long as the sign pattern is the same, the result is the same, independent of the given function. Therefore two things happen. One: the result that we have just noted is true for all f(x) — not for a single function, but for all functions, whatever functions we are considering. Secondly: since we are not using the value of the function, only the signs, and we are always keeping the root located in an interval (a0, b0), the method can never fail; in the bisection method the question of failure does not arise. But if that is so, do we need any other method at all? The trouble is that the method is very slow, in the sense that if you start with a big interval, say 0 to 50, you may finally have to do forty or fifty iterations, or maybe more, to get an accuracy of six decimal places. It is for this reason that we need a better numerical method which gives the solution much faster, even though bisection itself never fails. Therefore, initially, when you are starting the iteration procedure, one can adopt the bisection method so that we are sure of getting an interval in which the root lies; then we can jump from the bisection method to any other method which converges very fast compared to the bisection method.
Now let us see, for a typical example, what will be the number of iterations required. Let us say that the root lies between zero and one. I have taken a simple interval because in this case one term becomes zero and the computation is easy. Substituting a0 = 0 and b0 = 1 in the criterion, we get n greater than (log(b0 minus a0) minus log epsilon) divided by log 2; since log(1 minus 0) is log 1 = 0, that term goes off and we are left with n greater than minus log epsilon divided by log 2. Now let us give values for epsilon, the accuracy that we want in a particular problem. If epsilon is ten to the power of minus two, I can take the minus two outside and write n greater than two times log 10 divided by log 2, which is about 6.64; rounding up to the next integer, I need seven iterations. If I take ten to the power of minus three I get ten iterations, and if I take ten to the power of minus four I get fourteen iterations. Similarly I can compute the count for any other given epsilon. Now if the interval were not 0, 1 but, say, zero to twenty or zero to fifteen, the number of iterations would increase further because log(b0 minus a0) would be much larger. Due to this reason, even though it never fails, bisection is still a very slow iterative method.
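As a small sketch, the iteration count above can be computed directly from the stopping criterion (b0 minus a0)/2^n < epsilon; the function name below is my own, not from the lecture.

```python
import math

def bisection_iterations(a0, b0, eps):
    """Smallest integer n with (b0 - a0) / 2**n < eps,
    i.e. n > (log(b0 - a0) - log(eps)) / log(2)."""
    return math.ceil((math.log(b0 - a0) - math.log(eps)) / math.log(2))

# Interval (0, 1): log(1 - 0) = 0, so n > -log(eps) / log(2)
for eps in (1e-2, 1e-3, 1e-4):
    print(eps, bisection_iterations(0, 1, eps))   # 7, 10 and 14 iterations
```

Note that for a wider interval such as (0, 20), the extra log(b0 minus a0) term pushes the count up by a few more iterations, as described above.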
Now let us illustrate with a simple example: find the real root of x cubed minus five equal to zero.
First of all we locate an interval of length unity. We have f(0) = minus 5, f(1) = minus 4, f(2) = 3. Since f(1) and f(2) are of opposite signs, the product f(1) f(2) is negative, and therefore the root lies in the interval 1, 2. Once I have located this interval, I can go for bisection and find the next interval in which the root lies.
The midpoint of one and two is three by two, and f(3/2) = 27/8 minus 5, which is less than zero.
Now I test whether the root lies in 1, 3/2 or in 3/2, 2, since we had taken the interval as one and two. I find the product f(3/2) f(2) to be negative, therefore the root lies in 3/2, 2. Taking the midpoint of three by two and two, I get seven by four. Again I evaluate f(7/4); for x cubed minus five this is 343/64 minus 5 = 23/64, which is greater than zero. The product f(3/2) f(7/4) is less than zero, therefore the root lies in 3/2, 7/4. So I can proceed, each time bisecting and getting a smaller interval. The procedure stops when the length of the interval is less than the given epsilon; until then the iteration proceeds.
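The bisection steps just described can be sketched in code as follows; this is a minimal version under my own naming, keeping one function evaluation per iteration as discussed below.

```python
def bisect(f, a, b, eps=1e-6):
    """Bisection method: requires f(a) and f(b) of opposite signs."""
    fa = f(a)
    if fa * f(b) >= 0:
        raise ValueError("f(a) and f(b) must differ in sign")
    while b - a > eps:
        m = (a + b) / 2
        fm = f(m)             # the one new function evaluation per iteration
        if fa * fm < 0:       # sign change in (a, m): root lies there
            b = m
        else:                 # otherwise the root lies in (m, b)
            a, fa = m, fm
    return (a + b) / 2

print(bisect(lambda x: x**3 - 5, 1, 2))   # approaches 5**(1/3) = 1.70997...
```

Starting from (1, 2), the first two midpoints are 3/2 and 7/4, exactly as in the worked example.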
When we discuss the other methods, we would like to compare the computational cost of the different methods and say which takes less computing time. The maximum time taken here is in the computation of f(x), because f(x) may be a complicated function. So the major cost is the computation of f(x); it is not in the one extra division or extra addition that you are doing. The whole cost is based on the evaluation of f(x). Therefore we can say the computational cost of bisection is one evaluation of f, because each iteration requires the evaluation of the function at one new point, the midpoint.
Now suppose that there are three roots in an interval, say the interval 1, 2; since an odd number of simple roots forces a sign change, f(1) f(2) is less than zero.
Let us assume that the roots are 1.1, 1.2 and 1.3. If we bisect the interval 1, 2 we get the intervals 1, 1.5 and 1.5, 2. We find f(1) f(1.5) less than zero; therefore there is one root or three roots in the interval 1, 1.5, and in fact all three lie there. There is no root in 1.5, 2, since f(1.5) f(2) is greater than zero. Now bisect the interval 1, 1.5: we get the intervals 1, 1.25 and 1.25, 1.5. Since f(1.25) f(1.5) is less than zero there is at least one root in 1.25, 1.5; the root 1.3 lies in it. But f(1) f(1.25) will be greater than zero, since an even number of roots cannot be captured by the intermediate value theorem; here the two roots 1.1 and 1.2 both lie in 1, 1.25. However, if we bisect the interval 1, 1.25 then we get the intervals 1, 1.125 and 1.125, 1.25. Now f(1) f(1.125) is less than zero, hence at least one root lies in 1, 1.125; in the present case the root 1.1 lies in it. Again f(1.125) f(1.25) is less than zero, hence there is a root in the interval 1.125, 1.25; the root 1.2 lies in it.
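We can check these sign patterns numerically with a polynomial having exactly these roots; the cubic below is constructed purely for illustration.

```python
# Cubic with the three roots 1.1, 1.2 and 1.3 used in the example
f = lambda x: (x - 1.1) * (x - 1.2) * (x - 1.3)

print(f(1) * f(2) < 0)         # True: odd number of roots in (1, 2)
print(f(1) * f(1.5) < 0)       # True: all three roots lie in (1, 1.5)
print(f(1.5) * f(2) > 0)       # True: no root in (1.5, 2)
print(f(1.25) * f(1.5) < 0)    # True: the root 1.3 lies in (1.25, 1.5)
print(f(1) * f(1.25) > 0)      # True: the even pair 1.1, 1.2 is hidden
print(f(1) * f(1.125) < 0)     # True: the root 1.1 lies in (1, 1.125)
print(f(1.125) * f(1.25) < 0)  # True: the root 1.2 lies in (1.125, 1.25)
```

The fifth line is the key point: a sign test cannot see an even number of roots in a subinterval, which is why further bisection is needed to separate 1.1 and 1.2.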
Now we would like to move on to methods better than the bisection method. One family is the methods based on a first degree equation to solve f(x) = 0. Let us look at the concept behind this. Consider the graph of y = f(x); the point where it crosses the x axis is the root that is required. We know that if we are given any curve and take any point on it, then in the neighborhood of that point the curve can always be approximated by a straight line: an infinitesimal segment of a curve can be replaced by a straight line. Since we may not be able to work with an infinitesimal length of the curve, what I do is take a straight line as an approximation to the curve in the neighborhood of the root and then successively improve this approximation so that it becomes exact at the root xi. Therefore, in the neighborhood of an exact root we approximate the curve by a straight line of the form f(x) = a0 x + a1 = 0. Now, the root required is the intersection of the curve with the x axis; if I approximate the curve by the straight line, the approximation to the root is the intersection of this line with the x axis. That means, setting y equal to zero, I get x = minus a1 upon a0 as the approximation to the root.
Now we have approximated the curve by this straight line. How do we approximate it, that is, how do we get the values of a0 and a1? Different procedures for finding the constants a0 and a1 give us a number of methods.
The first method that we shall take up is the secant or chord method. What is the secant method? Let x_(k-1) and x_k be any two approximations to the root. We are not saying that the root lies between x_(k-1) and x_k; if the root is 1.2, I may take 0.58 as an approximation to the root. So these are any two approximations to the given root. Then what I do is take the corresponding points on the curve. We denote f(x_(k-1)) by f_(k-1) and f(x_k) by f_k. Now (x_(k-1), f_(k-1)) is a point on the curve and (x_k, f_k) is also a point on the curve. But we have just approximated f(x) by the straight line a0 x + a1, which means these points are taken to lie on the line, and therefore they satisfy its equation. So I can substitute the two points in the equation of the line to get f_(k-1) = a0 x_(k-1) + a1 and f_k = a0 x_k + a1. I have got two linear equations for the two unknowns a0 and a1. I can subtract the two equations to get a0 and then substitute back to get a1.
On subtracting, a1 cancels and we get f_k minus f_(k-1) = a0 (x_k minus x_(k-1)), hence a0 = (f_k minus f_(k-1)) / (x_k minus x_(k-1)). Once a0 is solved, I can determine a1 from either of the two equations. Taking the second one, a1 = f_k minus a0 x_k. Now I have determined a0 and a1, and we have said the approximation to the root is x = minus a1 upon a0. So I substitute the values of a1 and a0 and write down the next approximation: x_(k+1) = minus a1 / a0 = x_k minus f_k / a0; substituting the value of a0, this gives x_(k+1) = x_k minus f_k (x_k minus x_(k-1)) / (f_k minus f_(k-1)).
Now this is the required formula, which we shall call the secant method or the chord method. However, we can also simplify it further by putting everything over the common denominator: x_(k+1) = (x_k (f_k minus f_(k-1)) minus f_k (x_k minus x_(k-1))) / (f_k minus f_(k-1)). The two terms x_k f_k cancel, and I get x_(k+1) = (x_(k-1) f_k minus x_k f_(k-1)) / (f_k minus f_(k-1)). So this is called the secant method, and we can use either of these two forms, form two or form three, to get the next iterates starting from an initial value x0. As I said, this x0 is obtained either from the intermediate value theorem or through the bisection procedure, but we can use either form. Sometimes one form is computationally convenient, and for error analysis form two is very convenient.
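Form two of the iteration can be sketched as follows; the function name and stopping rule are my own choices, and the code reuses the previous function value so that each step costs only one new evaluation, matching the cost discussion below.

```python
def secant(f, x0, x1, eps=1e-10, max_iter=100):
    """Secant method, form two:
    x_{k+1} = x_k - f_k * (x_k - x_{k-1}) / (f_k - f_{k-1})."""
    f0 = f(x0)
    for _ in range(max_iter):
        f1 = f(x1)                 # the one new function evaluation per step
        if f1 == f0:
            break                  # flat chord: cannot divide, stop here
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < eps:
            return x2
        x0, f0, x1 = x1, f1, x2
    return x1

print(secant(lambda x: x**3 - 5, 1.0, 2.0))   # converges to 5**(1/3)
```

Note that the starting values 1.0 and 2.0 happen to bracket the root, but the method does not require that; any two reasonable approximations will do.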
We shall see in the next lecture how these two forms are useful in this procedure. Now a very important thing to note is this. At the start we said that x_(k-1) and x_k are any two approximations to the root; we do not ask or require that the root should lie between x_(k-1) and x_k. The method gives convergence to the root from one side of the root. Suppose the root is 1.5 and it lies between zero and three: the iteration will take one or two iterations to jump to one side of the root, going either to the side from 1.5 to 3 or to the side from 0 to 1.5, and then start converging from that side only. That means if you are using the secant method and you start with the root lying inside an interval, you may lose one or two iterations before the convergence settles on one side. The convergence is always from one side for the secant method.
Now the cost of the computation: at each iteration we need one new function evaluation, f(x_(k+1)), which is the major cost in applying this method. Again the cost is the same as bisection; here we have one function evaluation per iteration, and there also we had one function evaluation. Therefore the cost is the same but the procedure is different. Of course, here we can also count, for each iteration, two multiplications, two subtractions and one division, so there are other operations as well, and the operation count per iteration can be stated in this way.