CS 161 - Strings 1

Colin Goble: Hello. This is Colin Goble. In this video, I would like to explain how old-style strings work in C++. The most important thing to remember is that strings were handled this way for many years, from the original beginnings of C++ and the C programming language. It has only been in recent years that the string data type that we have been using in this class so far has been in common use. Before that, and in C programs even to this day, strings were handled quite differently. Strings were quite simply arrays of type 'char,' or arrays of characters, if you will. The first line of this program has a string whose name is 'string1.' The declaration shows it as being an array of 12 chars. The characters therefore will be numbered 0 - 11. We are initializing them here with a list of characters. Every one is enclosed in single quotes [' '] because it is of type char. If we count these up: 1, 2, 3, 4, 5, 6 (that one is the space), 7, 8, 9, 10, 11. We will see that in the string itself, there are in fact 11 characters. Now in old-style C-strings, the most important thing to always remember is that strings are always terminated by a byte, all of whose bits are set to 0. This is often called the 'null terminating byte.' It is a character, each one of whose bits is set to 0, the 0 byte [\0]. The way we show that is by defining a constant with a special escape sequence [\0] [pronounced: backslash zero]. So the string Hello World, which really contains 11 characters, requires a 12-byte array, or a 12-char array, to store it, because we have to add one byte at the end, the [\0] or the 'null terminating 0 byte.' Now of course, as with all arrays, if we move down to the next line, if we don't specify the length of the array, C++ will infer it for us. So the only thing that is different in this line is that I haven't specified the size of the array.
When the size is left out, it is calculated automatically, and the length in this case would be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 bytes. The length of the string is really 15 bytes and we add 1 for the 'null terminating byte' [\0] at the end. Now of course, most people don't initialize their arrays like this. So in C++, instead of listing the individual chars, we can use the double quotes ["]. Right here we see string3. It has been initialized in this case to the array of chars "This is string3". The double quotes ["] merely cause C++ to generate an array of characters, and the null terminating byte [\0] is inserted automatically. So this declaration, "This is string3" with a ["] at the beginning and a ["] at the end, is identical in form to the declaration up here. Well, that one is string2, but otherwise it is the same. In that case, we explicitly have the null terminating byte [\0]. In this case, when we put the string between double quotes ["], the null terminating byte [\0] is automatically inserted by C++, or the C compiler if you are using C. Okay. Now, in order to assign one string to another, or to print a string, or to compare two strings for equality, if we are using old-style C-strings, we have to do the work explicitly. And I have prepared some functions here to do that. The first function is called 'printString.' It prints a string of any size. So you might notice here that in the formal parameter, I have specified that 's' is an array of char, but we don't specify the size of the array. As always, our program will find the end of the string by looking for that [\0] null terminating 0 byte. 'copyString' is a function which copies a source string to a destination string. If you like, it is an assignment of the source string to the destination, but we will do it by copying the string from one place to another. Again, it should work with strings of arbitrary size. So we don't specify the size of the arrays in the formal parameters.
And similarly, 'compareString' will compare two strings and return true or false, depending on whether the strings are identical or not. If we proceed further down in the program, then, we can use the function 'printString' to print string3. We will print out the contents of that string. The next section of the program here copies string3 into string4. Now if we are going to copy the bytes from string3 to string4, we must have reserved enough space in computer memory to hold those bytes in string4. And in this case, we must specify the size of the array. So I have declared that string4 is an array of 100 chars, that is, 100 bytes in the computer memory. How did we know to reserve 100 bytes? Well in C, that is a little bit difficult to do. What you have to do is determine the length of the longest possible string that might get copied into string4 and allocate enough space for that purpose. Perhaps you see now why the string type in C++ is so much more powerful than the old way of doing things. But nevertheless, in the old way, if you want to copy a string from one place to another, in the destination area of memory you must reserve enough bytes to hold the longest possible string that might get copied in there. If for some reason, therefore, we know that the longest string that might get copied into string4 is 100 bytes, we will declare string4 to be an array of 100 bytes. If it turns out we copy in fewer bytes than that, it won't make any difference, because the bytes we actually stored in that string will be terminated with a null zero byte [\0]. So we will know how many bytes are in reality stored in that 100-byte array. Remember of course that in the 100-byte array, you always have to leave one byte for the null terminating byte [\0]. You will always need to do that. So we copy string3 into string4 and we output: "This is a string4. It is a copy of string3."
Just to prove it worked okay, we will output string4. And then finally, we will compare string3 with string4. If result is true, we will output the string: "The strings are the same." Notice, by the way, the use of the [?:] [question-mark colon] operator here. If result has a value of true, the value of the expression is the string, "The strings are the same." If result is false, we will output the string, "The strings are different." Now, next we will compare the string 1, 2, 3, 4, 5 with the string 1, 2, 3, 4, 5, 6. Of course we would expect those strings to be different, so we will expect to see the result output, "But these strings are different." Okay. Let's run the program and see if it works as expected. So we will Compile and Run. There is string2, there is string3. We printed string3 using the printString function. And it correctly says it is string3. String4 of course is a copy of string3. So that also says this is string3, because we copied string3 into string4. Of course string3 is equal to string4, so the compareString function tells us that the strings are the same. And of course the string 1, 2, 3, 4, 5 was not the same as the string 1, 2, 3, 4, 5, 6. So we see that in that case, the strings are marked as being different. Okay. So far so good. Let's take a quick look at how the printString function works. It is right here. We notice again that it takes an array of characters as input, the characters to be printed. It should work with an array of any size, so we don't specify the size of the array. We leave that blank. And we run through a loop outputting each character in turn, right here, where we output s[i] [cout