Unix Shells, Environments

[Seminar - Unix Shells, Environments] [Douglas Kline - Harvard University] [This is CS50. - CS50.TV] Today's topic is the Unix shell. I'm Douglas Kline, expert, or at least reasonably competent user, of the shell. A shell is the interface for the user to the computer's operating system. The name is misleading as, unlike an animal's shell, which is hard and protective, the computer shell allows for communication. So porous membrane would probably be a better metaphor. The original shell for Unix is the Bourne shell. Bourne is spelled B-O-U-R-N-E. Bourne was one of the original authors of Unix, and so the shell is named after him. The name of that shell as a command is just simply sh. That's the command you can execute. The shell starts at login. When you log in to the computer, the shell just starts running for you, and that's what takes your commands. It can start at other times also. If you bring up a window with no other indication, it will start a shell for you. That's how it is that you can go to a window and start typing commands and so forth there even though you didn't log in to that window. In addition, if you do a remote login, then it will start a shell on the remote computer. And it's possible to run commands without an interactive shell. That can mean within your current operation, and it can also mean a remote operation. You could send a command to another computer, which includes starting up a shell there. In fact, it has to include starting up a shell there even if that isn't your final purpose. When something starts up like this, it doesn't necessarily start a new shell. If you bring up a new window, it's possible to tell it to bring up an editor or some other command. In that case, the editor will start from scratch. When the editor ends, the window ends. This is a little unusual but it can be done. In those cases, it won't be a shell. So it's not necessarily the case that a window or some such application will bring up a shell. 
Shell parses commands. Parsing means identifying the various elements and classifying them. Within a command, the complete string that you type, there will be 1 or more single commands to be executed. Other elements can be arguments. There can also be special characters which affect the execution of a command. They can send the output somewhere other than the screen if the command would ordinarily send it to the screen. It can redirect input; it can do other things also. There are various other symbols, characters, and so forth. Parsing involves detecting and interpreting those things. Now if there are no more questions, which is rather likely since there are no more people, we will go on to my next page here. I said earlier that the Bourne shell is the initial shell. There are others. One is the C-shell. The command is csh. The name C-shell is just a play on words. This shell was introduced with Berkeley Unix in the mid-1970s. Berkeley Unix was a seminal event in the development of Unix. It was a huge revolution and included the introduction of this shell. The reason for that play on words, C-shell, is that the C-shell has some characteristics in it which resemble the C language, which the Bourne shell does not have-- or it did not have at that time. There's also the TC-shell. This is a superset of the C-shell. It has additional features, many of which are useful for interactive use, such as recalling commands in the history mechanism, which I'll describe somewhat later-- in a simple manner, modeled after an editor. It also has bindings which allow you to bind a short key string to a longer command. We're not going to be getting into that today. It has some features that are useful for programming. However, the C-shell is not often used for shell programming. Shell programs, if you didn't already know, are programs that consist of shell characteristics. You could run these as programs. You write a bunch of shell commands into a file and execute the file. 
You don't need to compile it. This is an interpretive language. The phrase C-shell is now ambiguous since it might refer only to the original C-shell, csh, or to all C-shells, including tcsh. It's a little ambiguous. A later shell is the Korn shell, ksh, named after the programmer, Korn. This shell attempted to incorporate into 1 shell the advantages of the C-shell for interactive use and the Bourne shell for programming. It has been used as an interactive shell by some people--a minority. Later though, there was another introduction, the Bash shell, B-A-S-H, again a play on words, Bourne-again shell. It's an extension of the Bourne shell. Korn shell is also. Both of them are. It has the same objectives of the Korn shell of amalgamating the C-shell's and Bourne shell's advantages in 1 shell. Many of the enhancements of the Korn shell are also included in Bash. Bash, however, has more and is therefore preferable. The Bourne-again shell and the Korn shell are called Bourne-type shells because they include the Bourne shell's characteristics, which are incompatible in some respects with C-shells. There are other shells besides those, some intended for restricted use, maybe limited to some commands, maybe specialized purposes, not often used. Okay. Next item here. The Bash shell has become associated with various forms of Linux. I'm not sure if that's true of every form. There are many forms out there and I haven't used them all, but in those that I have used it has become associated with it. So far as I know, there is nothing about Bash which makes it any more compatible with Linux than any other combination of shell and operating system. I think this probably just reflects the inclinations of the programmers. That it has become associated with Linux is another reason to prefer Bash to ksh since things are likely to be written in it and it's likely to spread. I'll give you other reasons for that later on. Bourne shell scripts should run under the Korn shell or Bash. 
If you write something for the Bourne shell, you can probably execute it under ksh or bash. Korn shell scripts will probably run under Bash, but I can't guarantee that. Later on here, C-shell scripts should run under the TC-shell. The C-shell was actually never extensively used for scripting since the Bourne shell and later the Bourne-type shells were preferable for that purpose. So that really isn't all that important. There are quite a lot of Bourne shell scripts which were written long ago, before the Korn shell or the Bourne-again shell were introduced. Those are still in use, part of the operating systems, and so you will find them if you look into the operating system or some old programming packages. Bash is to some extent becoming a kind of lingua franca for operating systems. It's already been extended to Windows and to VMS. VMS, in case you don't know, is a proprietary operating system of Digital Equipment Corporation which is still in use, largely behind the scenes. And if Bash is going to be running on several different operating systems, people will likely tend to shift to it. But this development is relatively recent. It's just beginning, so I can't predict if this will turn out to really be that kind of lingua franca. Also, because file pathnames and libraries differ between these different operating systems, you might not be able to write a Bash script on one operating system and then run it on another one. You should be able to move it between different Unix, Linux, and Mac OS operating systems but not necessarily to Windows or VMS. You might have to change file pathname descriptions, and some libraries might be different, which may affect the way that some commands work or how they process arguments and the like. In addition to that, another caution here is that there is no guarantee that all the different shells I've mentioned--Bourne shell, C-shell, TC-shell, Korn shell, Bourne-again shell--will be available on any given Unix or Linux or Mac OS computer. 
They simply might not be there. That's one of the cautions here. It's an unfortunate limitation here since you'd like things to work everywhere, but unfortunately, you can't rely on that. Okay. Next one here. Let's say that you want to write a shell script, a program consisting of shell commands. You write your commands, put them in a file, and execute the file. What if you want to include arguments? In the case of shell operations, arguments are called parameters or positional parameters and they'll be called by a dollar sign and numeral, $1, $2. So if the script has this name, my first argument might be argument 1 and my second might be argument 2, and inside my script if I want to refer to these things-- let's erase this since I'm not really going to run it-- inside my script I might have $1 to refer to arg1, $2, which will come out that way, arg2. So those symbols are available to refer to arguments, and those apply to all of the shells. In addition, there are other characters. $* refers to the entire argument list, all of them. $# refers to the number of arguments. Again, this applies to all the shells. Those symbols, * and #, can be used with those meanings in other places also. We won't be getting into that. Shell specifier line. What's that for? Let's say you've written a script and it's for a particular shell and you want to run it. How do you know what shell your operating system will use to run your script? At one point you could assume that it would run it in the Bourne shell if you didn't say otherwise, but people aren't writing scripts in the Bourne shell that much anymore and you can't even rely on that anymore. So here we have a shell specifier line right here. That specifies Bash. Note that it specifies it in the pathname, /bin/bash. If a computer has the Bash shell but not in the bin directory, /bin, this won't work. That's another qualifier, another caution here. The pound sign is the comment line character. That applies to all shells. 
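
As a rough illustration of the positional parameters just described, here is a minimal Bash sketch. The function name show_args and the sample arguments arg1 and arg2 are made up for the example; a function is used only to keep the sketch self-contained, and the same symbols work at the top level of a script.

```shell
#!/bin/bash
# show_args is a hypothetical helper. Inside a function, $1, $2, $* and $#
# refer to the function's arguments the same way they refer to a script's
# arguments at the top level of the script.
show_args() {
    echo "first: $1"     # first positional parameter
    echo "second: $2"    # second positional parameter
    echo "all: $*"       # the entire argument list
    echo "count: $#"     # the number of arguments
}

show_args arg1 arg2
```

Called with arg1 and arg2, the four lines report arg1, arg2, the full list "arg1 arg2", and a count of 2.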
The particular case here, #! at the beginning of a script, is a special case. That specifies the shell in which to run the script. As I was saying, the shell might not be in the same place, /bin. In addition, there's another thing here. If you just use the pound sign with no exclamation point and pathname, that should indicate a C-shell. However, I don't recommend doing that because I'm not able to guarantee that that will always work. If you want a C-shell, it would be better to say so. Then there's something rather confusing here. If you use a shell specifier line such as /bin/bash and that shell is not available there, that is, there's no such thing as /bin/bash on that particular computer, either because it doesn't have Bash or because it's in a different location, you'll get an error telling you that the script you ran doesn't exist. And of course your script exists, so that error message is confusing. The reason that the operating system gives you that error or, more accurately, that your interactive shell in which you are running this gives that error, is that it reports the command you used, which is the name of the script. That command effectively called the shell by the name of the script. That's where you get that confusing error message. Another way to call a shell script is by specifying the shell on the command line, as here. This is a command. This says run Bash and then run my script in Bash. That will take precedence over a specifier line, and this has the feature of allowing you to provide for varying pathnames. If you just give a command, the operating system will look for that command in various places. If it's available, it should find it. The computer will find Bash wherever it's located and run it, so you don't need then to be concerned about where it finds it. There are potentially other concerns here, such as if there's more than 1 version of Bash, which is possible although unlikely. So that's another way to deal with these things. Specifier lines can call any shell. 
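
The two ways of running a script described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the script name myscript and the deliberately wrong pathname /no/such/dir/bash are invented, and the exact wording of the error from the direct run varies by shell, so it is discarded here.

```shell
# Build a throwaway script whose specifier line names a shell that does not
# exist at that pathname (hypothetical path, chosen to fail on purpose).
demo_dir=$(mktemp -d)
script="$demo_dir/myscript"
printf '%s\n' '#!/no/such/dir/bash' 'echo "ran with $# argument(s)"' > "$script"
chmod +x "$script"

# Running the script by name fails, because the system tries to start the
# shell named on the specifier line and cannot find it.
"$script" one two 2>/dev/null || echo "direct run failed"

# Naming the shell on the command line works: bash is found through the
# usual command search, and the specifier line is treated as a comment.
bash "$script" one two
```

The second form is the one that tolerates Bash living in different directories on different computers.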
They can also call things other than shells. Examples I have here are sed, which is the stream editor; awk, which is a pattern processing language; and perl, a very highly developed scripting language. If you put a specifier line indicating one of those programs at the beginning, it will go directly into that program rather than starting a shell. Those programs have limits to their abilities. Perl is very capable. Sed is an editor. It can do things beyond simply editing. But it can be difficult to program that. In addition, passing arguments and stuff to script is either impossible or confusing. So in those cases, with awk or sed, it's, at least in my experience, preferable to write a shell script and call awk or sed from the shell script rather than calling awk or sed as the script specifier line. Perl is a highly diversified language, as I said. You can't run interactive commands in perl, which means that you can't test parts of scripts that you're developing by running them interactively. However, it's an extremely capable language and has developed into a very widely used tool. That's just a little bit of a parenthetical remark about the specifier lines. In all or most forms of Linux--again, I can't be certain that's all-- and in Mac OS, if you type csh you get tcsh, and if you type sh you get bash. They were trying there to give you the more advanced versions of these shells, but this can be confusing. If you write a script using tcsh or Bash features while calling csh or sh and then try to run it on a computer which doesn't have tcsh or Bash, you might get some errors if there are commands in there which those shells don't recognize. In addition, you may have called up your shell on your local computer calling it as sh or csh and then getting the more advanced shells. You may not even think of the fact that you're using the more advanced shell. So this is a potential pitfall. 
How is it established that if you type sh you get Bash, if you type csh you get tcsh? There are things in these computers called links which can connect 2 file names so that they refer to the same thing. It can either be 2 names for the same file or a file whose purpose is to refer to another file. They're called hard and symbolic links. We won't be going into that any more today. There may also be separate files--1 file sh, 1 file bash-- but they both run Bash. Then there's another qualifier here. If you're calling one of these shells by one name, you might think you'd get the same functionality as calling it by another name. Well, that actually isn't necessarily true. These commands can examine the name by which they were called and they can, on the basis of that name, behave differently. There may be issues of trying to conform to a standard. Some of you may have heard of the POSIX standard. There may be other features involved as well. This can be selected sometimes by command line arguments or by setting shell variables. Calling it as sh or bash may actually lead to a different execution even if it's the same file that you're executing. Another thing to consider is that even if another computer has tcsh or Bash, if they aren't linked as they are on your local computer if you have a Linux or Mac OS local computer, then you'll get the shell that you called, sh or csh, not the one that you might prefer. The current Bourne shell has enhancements lesser than those in Bash but past those in the original Bourne shell. As a result of that, even the current Bourne shell, sh, even when it's not Bash, resembles the C language more than the C-shell does. That wasn't true when the C-shell was first created, but it has developed that way. You might notice here that all these shell names except for the Bourne shell have something to indicate which shell they are--csh, bash-- but the Bourne shell is just sh. Why? That was the original shell. 
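
To make the point about names concrete, here is a small sketch, assuming Bash is installed and on the search path. It creates a temporary symbolic link to the same Bash program under a chosen name and asks the shell whether its POSIX mode is on; Bash documents that it enables POSIX mode when invoked under the name sh. The helper name posix_mode_when_called_as is invented for the example.

```shell
# Hypothetical helper: run the same Bash binary under the name given in $1
# (through a temporary symbolic link) and report whether POSIX mode is on.
posix_mode_when_called_as() {
    local dir
    dir=$(mktemp -d)
    ln -s "$(command -v bash)" "$dir/$1"    # second name for the same file
    "$dir/$1" -c 'shopt -qo posix && echo on || echo off'
    rm -rf "$dir"
}

posix_mode_when_called_as bash
posix_mode_when_called_as sh
```

The same file is executed both times; only the name by which it is called differs.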
It was THE shell then, not A shell, and since it was THE shell, there was no reason to distinguish it from another shell. So that's why it has that name and still does. This top here is a line from a password database for an account I have there on another computer. I'm going to try to get that name so you can see that part at the end, the shell. The password database holds the login characteristics for all the users. At the beginning is the username, which you can see the last 2 letters of mine now. The fields here are separated by colons. The last field, as you can see, is bin/tcsh, the shell. That's the shell specifier. There's something interesting here. When Unix was first developed, there was only 1 shell, so there was no choice there. So why did they allow a field in the password database to specify a shell? I don't know, but it's fortunate that they did. It's rather difficult to make changes in the password database format because many programs refer to its format and would have to be rewritten. It's a felicitous or fortuitous development that they included that field. That kind of a password file line is used on all Unix and Linux computers so far as I know. The Mac has its own system. It actually has a password file with the lines in that format, but that isn't where the user characteristics are defined. Another parenthetical remark there. If you're calling a shell, you can call it as a sub-shell of your existing shells. So if I go here, let's get rid of these things. Here I am in the C-shell. That variable, which accurately identifies my shell, actually isn't always a reliable way of determining what shell you're running, but in this case it is. What if I just type-- Now I'm in Bash. Some things are going to be the same. ls tells me my commands. If I do a suspend back to my C-shell, ls, same. Right? fg, foreground, back to my Bash shell. pwd, current directory, back to the C-shell. 
pwd, different directory--actually not a different directory in this case. It's the same directory. Let's say I want to call a command here: where ls. What does that do? It tells me where the ls command, the one that gives me a directory listing, is located. Let's go back to Bash shell. Let's try the same thing. Hmm, interesting there, where: command not found. Why is that? The where command is built in to the C-shell. This isn't a command that has to be read in to memory from somewhere else and executed. The C-shell runs it by transferring execution to part of its own code and it's not in the Bash shell. So Bash, not having such a built-in command, looks for it, doesn't find it, and we get an error. So there we have a Bash shell running under a C-shell, and we call that a sub-shell. And just in case you're curious, Bash shell has its own way of locating commands. hashed refers to the fact that it can be executed more rapidly, being found more rapidly. That's one of the enhancements built in to some of these shells. Bourne-type shells are preferred for programming. They have control structures like loops, conditional statements, the sort of commands that you might use in programming languages like C or whatever language. Maybe you're programming in Java or whatever. Shells have those too. The Bourne-type shells, particularly Bash, have more and they are designed with greater flexibility. The Bash shell has arrays. The original Bourne shell doesn't. So that can be considerably advantageous for programming. The C-shell actually does have arrays but doesn't have a lot of these other features. The Bourne-type shells will execute faster if they don't have the features intended for interactive use. You load things down for one purpose; this loads them down for another purpose. There's that trade-off there. Those features which are intended for interactive use really are of little or no use for scripting. 
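
A minimal Bash sketch of the programming features mentioned above: an array, a loop, and a conditional. The array name shells and the function describe are invented for illustration. The last line uses type, a Bash built-in that reports how a command name would be found; that it is the command shown in the demonstration is an assumption.

```shell
# A Bash array (the original Bourne shell has no arrays).
shells=(sh csh tcsh ksh bash)

# Hypothetical helper: classify a shell name with a conditional construct.
describe() {
    case "$1" in
        sh|ksh|bash) echo "$1: Bourne-type" ;;
        csh|tcsh)    echo "$1: C-shell family" ;;
        *)           echo "$1: unknown" ;;
    esac
}

# A loop over every element of the array.
for s in "${shells[@]}"; do
    describe "$s"
done
echo "${#shells[@]} shells listed"

# Bash's own way of locating a command; the wording of the report depends
# on the system and on whether the command has been run before.
type ls
```

Each of these lines can also be typed at an interactive Bash prompt, which is a convenient way to try pieces of a script before putting them in a file.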
It's possible to use an interactive sub-shell just like the one I started there to test out commands which you intend to use in a script. That's what you can't do with perl. You can do it with the shells. Even the structures like for loops and so forth can be run interactively. They are occasionally useful to run interactively, but more likely you're using them to develop a script. Aliases. This is going to be about the C-shell. History mechanism where you get back to earlier commands or parts of them that you've already run. Again, about the C-shell, the Bourne shell and the Korn shell have these things, but I'm not going to get into them. So here are some useful aliases that I have. Instead of typing ls--it's a common command-- just type l and save yourself 1 character. ls with various options, all those work. Note that those definitions have quotes around them. In these cases, the quotes aren't necessary. If you can define those aliases without the quotes, it would still work. They are recommended. There are situations in which you can't use the quote because you want something to happen which the quote would prevent. Sometimes you can quote part of the definition but not all of it. It's also generally recommended to use single quotes rather than double quotes. Double quotes have effects on variable definitions, particularly causing them to be evaluated rather than stopping it. Why would we want to stop the evaluation? And how do quotes do that for us? Here is a command which you might find interesting. 'ls g*' g*, as you probably know, is a wildcard expression for all the file names beginning with g. If I just write in a command ls g*, I'll get a list of all those names in my current directory. If I define that alias as it is here with the quotes, it will run that command in your current directory where you're running it. But if you run the alias definition without the quotes, it will evaluate the wildcard g* when it runs this defining command. 
So the definition of the alias will be ls followed by the list of files in the directory in which the alias command is executed, regardless of where you actually intend to run the command. This isn't of much use, and the single quotes prevent the evaluation of the asterisk. So you just get the definition being ls g*. Then when you run the alias, lgs, it then puts that out. Now there are no quotes, and it will evaluate the asterisk when you run the alias command. So that's one thing. Double quotes would have that same effect here, but there are other cases in which double quotes wouldn't work so well. Here is another one. You might know the grep command. The grep command can be used to scan a file for lines which have certain strings. So let's go over here and I'll exit from my Bourne shell. Okay. Here's a file. Let's say it's grep abc strings. There it is. If I do grep zddd, I get nothing. Okay. So it finds a string, it reports; it doesn't find, it doesn't report it. It outputs any line which has that string on it. There are all sorts of options here which you can find in the documentation. Here's one way to do it. What about this one, alias grabc 'grep abc'? That's going to include 1 argument when the alias is defined. So if I do that here, now if I do grabc, now the alias includes more than the simple command. It also has the argument. So far that works. I have another command here, this one, so those are different strings in there and show that this doesn't find anything there since it doesn't match. What if I want to include in the alias definition the file that I'm going to search and I want to give as an argument to the alias the string that I'm looking for? I might want to say abc as the argument to my alias, but the alias already determined the file. And that's where this expression comes in. Notice here we have grep just like before. We have the file here, strings. \!^, kind of an odd expression, I suppose, if you haven't seen this before. 
Exclamation point is part of the C-shell history mechanism. It can recall earlier commands, it can recall arguments to those commands and so forth. The history mechanism is used as part of aliasing. If you specify a line after the exclamation point, it will refer to that line in the history list, which we won't be getting into now since it's a whole other topic. It is possible to specify part of a line. So !3:2 would be the second argument of command number 3. The caret here in this expression stands for the first argument. If you don't give it an indication of which command you're referring to, it refers to the immediately previous command, and the caret is a symbol for the first argument. Because it's the caret and not the number, you don't need to use the colon, so !^ means the first argument to the previous command. A little mixed up here. In this case, when you use this as an alias definition, the history reference refers back to the commands in which the alias is used. So this is going back 1 command as a history operation, but as an alias operation it refers to the command in which you would type, say, grstrings_file. We have the quotes here in it. What's the backslash for? In this case, as elsewhere, we don't want to execute the history mechanism while defining the alias. If we didn't have the backslash there, the shell would pull in the first argument of the command right before it ran this alias command, which we don't want. We want this to be built in to the alias command to call in an argument later. Single quotes don't escape an exclamation point, the history reference. Maybe you know the expression escape means to change the meaning of something. In this case, it means to stop something from having a special meaning. Exclamation point's special meaning is history. Escape and it doesn't have that meaning. Quotes don't do that; backslash does. So we're actually using 2 levels of escaping here. 
I'm going to move this command into the other window without typing it by using these editing operations, which you may find useful. Something else here I'll show you. If you just type alias with no arguments, it tells you all your aliases. This is a bunch of aliases I already had here besides those that I have been using here today. But if I type alias with just the name of an alias, it tells me what it means. Notice that the quotes are gone and the backslash is gone. This string here is the result of that alias definition, and now it has just !^ in it. This is going to look in the file strings for anything. So if I do grstrings_file strings, I didn't give it anything to look for there, but it's looking in strings. It didn't find the word strings in the file strings, but it does find abc. And it doesn't find that. So here we are giving an argument that fits into the definition of the alias, that is inserted into it. That's where this expression comes in. You can use more than 1. The caret is a symbol for the first argument. If you wanted to use a second argument, you would then say :2. There's no special symbol for the second argument. And because you're using a numeral, you would have to use the colon. There is, however, another choice here. The dollar sign stands for the last argument. And because this is a symbol, you can omit the colon. So it would be the last argument in the list. And there's also that one. Asterisk means all of, so this is the complete argument list, and again, you can omit the colon because it's not a numeral. I hope you're all observing all this. The history mechanism can go back to earlier lines in the history list. You could do this in an alias definition. I've never seen this done. It would have the effect of pulling out earlier commands from the history list when you execute the alias, which could be different commands depending on when and where you execute it. 
Conceivably you might want to pull out such a reference just to know what an earlier command was. I've never seen this happen. I suppose somebody might want to, but this is very unlikely. There is another thing here. If you use that history-type reference, then only the arguments to which there is such a reference are used. If you have an alias definition which doesn't use a history-type reference, if it just becomes the beginning of the command and you have further arguments, then anything you type after that will be added to the command. In this case, the example I just gave there, we used the first argument; we didn't use any others. If other arguments had been given on the command line, they would not be used. So if you use the history reference at all, then you must use it to get any argument. There's another thing here I just want to mention, partly parenthetically, namely that this history mechanism with the exclamation point goes back to the original C-shell. The tcsh introduced history operations which use the sorts of commands and strings from the editors, either Emacs or vi. My personal opinion is Emacs is much easier to use for this purpose even if you use vi for your regular editing. There are various Emacs commands which are now adapted for history. Control P gets the previous line in the history list. Another Control P will get you the one before that. The up arrow does the same thing. Control N gets the next command if you've already scrolled back some ways. Down arrow does that too. You can move left to right with the arrows and various other things. This can make use of the history mechanism much easier than using the exclamation point syntax, but you wouldn't use that in an alias definition. We'll go over that some other time. Variables. You know what variables are in programming languages. The shells have them also. 
The C-shell uses the command set to assign variables, so that sets the variable a to the value of b-- as I said, a useless definition but an illustration of how this is used. The set command will create a variable if it doesn't already exist. The positional parameters for shell scripts can be considered variables, but the use of them and the rules for them are somewhat different. You can't assign a value to $1 in the course of a script. You would have to define a new variable for that purpose if some of you wanted to. Type set with no arguments and you get a list of all the currently defined variables. And let's get over to my other shell here and see what we get if we do that. Quite a long list there, right? Scroll up a little bit. Look at all that. Some of these things are defined automatically by the shell. The shell creates the variable and gives it a value. Some of them are defined by the shell but then redefined by the user according to his preferences. And some of them are created by the user depending on what he's doing that day. That's just set with no arguments. There's an odd feature here of this thing. There have to be either no spaces between the equals sign and the variable name and the value or spaces on both sides of the equals sign, as in this one. This won't work, and this actually is a valid command but it won't do what you intend. That command will work because if you just say set and a variable name with no equals sign or set and a variable name with an equals sign and no value, it will set the variable to a null value. So set a= is a valid command. The set command can define more than 1 variable on the same line. So this command here has the effect of defining both a and b to null values. Probably not what you want. This one here, mentioned earlier, will lead to an error because =b is not a valid expression. A variable name can't begin with the equals sign. And there are these further things here. 
The colons were used to select arguments from history lines, and they can be used--and I didn't go into before--to modify those things. They can also be used to modify shell variables. This one here, $a, has a value. :r will take off an extension. An extension will be anything following a dot, a dot and anything following it at the end of a file, only at the end of the list after the last slash. So I have it here. a is that. It will drop the .o. If there's no extension, only the pathnames after the last slash, it will have no effect. a:h, that variable expression, will take off the last element of a directory list, again, only after the last slash. So /a/b/c becomes /a/b, but this one is changed because the element after the list is null. Here there is something which also I want to emphasize. These qualifiers don't search for the existence of these files. They just look for strings. These are intended to manipulate file names, pathnames, but they can be used on any string even if it's not a file name. And they don't look for the existence, so if there's no such file, /a/b/c, this will still work. Whether it's of any use is another question, but it will still work. Variables are different in the Bourne shells. We'll get to that later. Dollar sign can be escaped just like the exclamation point and the asterisk. Dollar sign can be escaped with a backslash or the single quotes. Double quotes have the odd effect in all shells of forcing the evaluation of a dollar sign variable expression. So if it's being escaped one way, the double quotes can have the effect of causing it to be evaluated anyway. This is a little confusing. If there are multiple levels of escaping, such as single quotes inside double quotes or double quotes inside single quotes, you should test to see what will happen to a variable if you're using one. Those 2 situations--double inside of single, single inside of double-- don't necessarily give you the same result. 
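
The quoting rules for the dollar sign described above can be checked with a few lines. This sketch is written for Bash, with an invented value for the variable a; it is not the set syntax of the C-shell, and it covers only these simple cases.

```shell
a=/tmp/demo/file.o    # hypothetical value, Bourne-type assignment syntax

echo "$a"     # double quotes: the variable is evaluated
echo '$a'     # single quotes: the dollar sign is escaped, text stays literal
echo \$a      # backslash: the dollar sign is escaped as well

# Two levels of quoting, in both orders, give different results.
single_in_double="'$a'"    # outer double quotes win: the variable is evaluated
double_in_single='"$a"'    # outer single quotes win: the text stays literal
echo "$single_in_double"
echo "$double_in_single"
```

In these cases it is the outermost quotes that decide whether the variable is evaluated, which is why the two nested forms differ. More complicated combinations are still worth testing rather than assuming.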
Environment variables, bound C-shell variables. Environment variables are also variables in the C-shell, and they are also variables in other shells too. In the C-shell, they are distinct sets. The things I was saying before are about shell variables. Environment variables are a distinct set of variables with the exception of several variables which we call bound variables, which are very important and we'll get into those later. Environment variables are automatically passed on to shells or commands that are run from your shell. The other things aren't. The shell variables, the aliases aren't. Environment variables are. That's why we call them environment variables, the idea being that the environment extends past just your current shell. They can be used to define things for commands. Here is an example. PRINTER, LPDEST. Both of those variables can define a printer that a command will use to print things. If you have multiple printers around, you might want to put the one you like. The reason we have 2 variables is that different sets of commands were written using these different variables. You might give them different values. Most likely you'll give them both the same value. Those things work because the commands that do printing were programmed to examine the values of these variables. If a program were not written that way, if it were written to do something else, the variable would be irrelevant. So the operating system isn't looking for these variables every time you refer to a printer. A command that does printing is looking for these variables if it is programmed that way. These variables are often defined in your initialization files but not necessarily. You can define them on the command line. They may be defined in a command. A command that runs something might have its own selection of variables-- variables that are unique to a particular software package, for example. They will be defined when you run that package. 
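The difference between what is passed on and what is not can be seen with a small experiment. This sketch uses Bourne-type syntax rather than the C-shell, and the names shell_only and lab_laser are invented for the illustration.

```shell
shell_only=private_value   # ordinary shell variable (hypothetical name)
PRINTER=lab_laser          # hypothetical printer name
export PRINTER             # now part of the environment

# Start a child shell and ask it what it can see.
child_printer=$(sh -c 'echo "${PRINTER:-not set}"')
child_private=$(sh -c 'echo "${shell_only:-not set}"')

echo "child sees PRINTER:    $child_printer"
echo "child sees shell_only: $child_private"
```

A printing command started from this shell would receive PRINTER in the same way the child shell does, and it is then up to that command whether it looks at the value.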
How are these variables passed to a sub-shell? When a sub-shell is written, it doesn't write into that area. The area of the sub-shell that is devoted to environment variables isn't written by the sub-shell; it's written by copying. When you run an ordinary command, such as these commands to print or whatever, they start off by creating a new shell. The shell creates a shell and then overwrites part of it with the command that you're running, which is a little confusing, but that's how these commands get the environment variables that they then refer to later on. The command here for defining the variable is setenv. That's how you define it. It's 3 elements: setenv, variable, value. If you just do setenv with no arguments, what do you get? A list of all of those variables. Again, it's a nice long list and in this case, as in the others, these variables are defined largely by my login operation by the shell itself rather than by anything I did. There's another command here, printenv. That also prints out the environment. Notice this last thing here, EDITOR=vi. That says that if I'm using something that calls an editor and I don't specify an editor and it allows me the choice, it may give me vi. What if I do printenv EDITOR? It tells me what it is. Right before that, there was a variable, LESS. These are the default options used when I run the less command, which displays files. So if I do that, printenv can take 1 argument or 0 arguments, not more than 1. There are other commands also, but we're not going to get into all that today. Remember there were the modifiers for the shell variables like :h, which will drop the last element of a pathname, or :r, which will drop an extension. Those now apply to the environment variables too. They didn't used to. It used to be they couldn't be modified. Now they can be. It's one of the advances with the developments of the shells over the years. 
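Because the environment is handed down by copying, a change made in the child never travels back to the parent. A minimal sketch of that, in Bourne-type syntax and assuming the value vi for EDITOR as in the demonstration:

```shell
EDITOR=vi
export EDITOR

# The child changes its own copy and reports it.
child_view=$(sh -c 'EDITOR=emacs; echo "$EDITOR"')

# The parent's value is untouched; printenv takes one variable name.
parent_view=$(printenv EDITOR)

echo "child:  $child_view"
echo "parent: $parent_view"
```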
I was saying that environment variables and shell variables in the C-shell are, with some exceptions, distinct sets. You can establish an environment variable and a shell variable with the same name. They will be different variables; they can have different values. Changing the value of one won't change the value of the other. These variables are all evaluated with the dollar sign--$a, $whatever. So what if you have this? Do you know which one you get? In my tests I got the shell variable, but this isn't documented and you can't rely on that. So I ask you, is creating shell and environment variables with the same names a good idea? No. Okay. What are those major exceptions in which the environment and shell variables are linked to each other? There are these 4. Capital letter TERM environment variable, shell variable term in small letters, type of terminal emulation. I'm just going to go over here and I'm going to do echo, a useful command here, $TERM $term. And there. xterm is a terminal type for windows displayed in the X Window System. xterm-color is a variation of that that allows different colors. Why do we define these? What is this good for? Commands that rearrange the screen like the editor send particular sequences, called escape sequences, to a terminal or a window to rearrange it and so forth. Those sequences are different for different types of terminals. This tells it which ones to use. Sometimes there are issues there. You might want to change that. If things aren't working, sometimes the terminal type is set wrong, you may be able to fix it by redefining the term variable. In these cases, changing one variable, the environment variable or the shell variable, should change the other one. I've discovered through experience that changing TERM in capital letters doesn't always change shell variable term in small letters. This is a bug. I don't know if that's always true. Most of the time it isn't true, but it can be. 
So if you make a change, just check that out. It's not often that you need to change that value, but once in a while you do. Environment variable USER. Again, environment variable in capital letters, shell variable in small letters. This is your username. It's only under very exceptional circumstances that you would want to change that. If your username is someone else, it can throw all sorts of things off. Home directory, user's home directory. Again, you wouldn't want to change that. Notice in all of these cases and the one that we're about to cover, the path variable, environment variable is in capital letters and the bound shell variable is in small letters. If you change one, the other should change too. This kind of binding cannot be established by the user: you can't bind 2 variables other than these 4, and the binding in these variables can't be undone; you can't separate them. So these 4 pairs of variables are bound. They always will be. None others will be. In addition, it would be possible to create variables with the same names of the opposite types. You could make a shell variable TERM in capital letters or an environment variable term in small letters. Those variables would be independent of these paired variables and they would be independent of each other. I can't imagine why you would do that unless you want to confuse people. This one here, path variable, this is a really important one. Another thing here is that there can be cases of variables with similar paired names which aren't bound to each other. There can be variables, SHELL and shell, in capital and small letters. Based on that name, you don't know if that variable is a shell variable or an environment variable, and they're not bound to each other. So that kind of paired names doesn't imply bound variables. The path variable, which I was showing before, is a list of pathnames in which the shell looks for commands. 
Let's get over to this window here and we'll do echo $PATH, capital letters-- environment variable--echo $path, small letters--shell variable. Notice that the list of directories is the same. These are bound. Change one, you change the other. In the environment variable the elements are separated by colons. Notice that. The shell variables are separated by spaces. This environment variable is a single string. The shell variable is an array. The Bourne shell didn't have arrays. Bash does, but this is already a fixed part of the shell. This is a single string and not an array. The C-shell always had arrays. The arrays are much easier to work with. You can refer to parts of it. So echo $path[1] and I get /usr/bin, the first element. Again, remember dollar sign stands for the last element of the history list. What happens there? It tried to find dollar sign as a variable symbol. I escape it. Oops. It wouldn't take that either. Some of these things don't work so well. Maybe we'll just leave that out. Asterisk refers to the whole thing, but that's what you get if you don't specify an element. Another way that array variables can be manipulated, number of elements there, 7 elements. Here we put the pound sign before the variable name. Here's another one. Put a question mark there. That is a logical value. That indicates that the variable exists. It's another way of working with variables. That, by the way, doesn't have to be an array variable. That could be any variable. And if I do, there's no such variable and I get a 0. Another little thing there about variable evaluations. Back to this one here, if for some reason you wanted to work with this rather than working with the array, the shell variable, there are commands that can separate these things based on the colon. In fact, if you're going to be doing this in the Bash shell possibly, some kind of a script, that would be probably how you would do it. But in the C-shell it's much easier to use the array. 
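For the Bourne-type case, where the path is a single colon-separated string, one way to pick it apart is to let the shell split on the colon. The following is a sketch only; path_element and path_count are hypothetical helper names, and the sample path is invented rather than read from a real login.

```shell
# Print element number $1 (counting from 1) of the colon-separated list $2.
# The body runs in a subshell, so changing IFS and turning off globbing
# does not leak into the calling shell.
path_element() (
    index=$1
    IFS=:
    set -f                  # do not treat * or ? in a directory name as wildcards
    set -- $2               # split the list into positional parameters
    shift $((index - 1))
    printf '%s\n' "$1"
)

# Print the number of elements in the colon-separated list $1.
path_count() (
    IFS=:
    set -f
    set -- $1
    printf '%s\n' "$#"
)

demo_path=/usr/bin:/bin:/usr/sbin     # made-up example value
first=$(path_element 1 "$demo_path")
count=$(path_count "$demo_path")
echo "first element: $first, number of elements: $count"
```

This plays roughly the role that $path[1] and $#path play in the C-shell, with noticeably more typing.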
In the Bourne shell, variables are assigned by a single expression like this, like the way you might assign a variable in a programming language, and here there must be no spaces. It's necessary that it be just 1 string. In the Bourne-type shells, all variables are shell variables. Environment variables are a subset of the shell variables. They are distinguished from the non-environment variables by exporting. The command to do that is export, like export PRINTER. If we were to define such a variable, if we wanted a printing command to find it, it would have to be an environment variable, and that's how we make it one. Here there's something kind of confusing. This expression, export to the environment, derives from this Bourne shell concept, and yet that expression is used in descriptions of the C-shell, where there is no such command as export. If you just say export by itself, you get a list of exported-- So if I just do export here, no such thing. Okay, there we go. These things, by the way, are also defined by the shell. I didn't define any of these by myself. The shell does all sorts of things by itself. It should do things automatically. In Bash or Korn shell, you can run a command like this, which will both give a variable a value and export it in 1 command. In the Bourne shell they have to be separate commands like export a. Here is another aspect that's confusing. The set command in the C-shell defines variables and with no arguments tells you what the variables' values are. In the Bash shell, the set command with no arguments does the same thing, but with arguments it does something quite different. So these are the various arguments here. Some of these are environment variables, some of them are shell variables. All of them are shell variables really. Some of those are environment variables. The set command with arguments can be used to operate on the positional parameters to a script, which is a way of getting them all at once. 
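The Bourne-type forms mentioned here can be put side by side in a short sketch. The values b and lab_laser and the words one, two, three are invented for the illustration.

```shell
a=b                        # assignment: one string, no spaces around =
export a                   # original Bourne style: export as a separate step

export PRINTER=lab_laser   # Bash and Korn shell style: assign and export together

# set with arguments replaces the positional parameters ($1, $2, ...).
set -- one two three
param_count=$#
second_param=$2

echo "a=$a PRINTER=$PRINTER count=$param_count second=$second_param"
```

Written with spaces, as in `a = b`, the shell would instead look for a command named a and pass it the arguments = and b.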
We can't really go into that today. It can also be used to change shell behavior. Particularly in Bash there are variables which will determine how the shell behaves. Then also just this one command that you might see, this command. Typeset followed by variables and variable types is used in the Korn and Bash shells. It's not mandatory but it can be used to restrict the values of variables, which can be useful to prevent errors, and it's fairly common. So I'm just mentioning that in case you see it somewhere. The where command. Remember I mentioned earlier the where command in the C-shell, which can tell you the location of a command pathname. Here is command substitution. You should find on your keyboard somewhere a character that looks like this. The location on the keyboard is going to vary. We've called it backquote. It's about the size of a quote. It goes from upper left to lower right. Here on my Mac keyboard it's in the upper left-hand corner. That character can be used to execute a command within a command. If you have an expression inside backquotes, that expression is a command, it's run. The output of that command is then substituted for the whole backquote expression inside a longer command which then runs with that output as part of its string of arguments and so forth. Here is a command which uses that. Let's demonstrate the operation here. Let's go up here, take out the backquotes. Control A gets me to the beginning of the line with the Emacs editing syntax. So far the pathnames is what where does, but when I do it like this, it then plugs in that list of pathnames in place of this whole backquote expression and runs ls -l on them. Kind of convenient, huh? So that's one neat thing. That's how backquotes work. Now let's go down a little further. These are aliases. I actually use these. I'll try to get this in with 1 editing operation. Okay. Now let's see how those definitions came out. alias lwh telling me how it's defined. 
Notice it's just this, but the outer quotes have been taken off and the exclamation point is taken off. !*, complete list of all the arguments. In an alias definition it will apply back to where I use this. lwh ksh bash. Okay. See how that works? It saves me some typing. Let's go up a little bit just to mention something else here. Notice here these different shells. I should have mentioned this before. The csh has a 2 over here and so does /bin/tcsh. We could establish by other means that those are actually the same file. Remember I was saying if you type sh you get bash. Type this and you get this. But those aren't linked. Those have single ones there. And this isn't the kind of file which can call another one. So those are separate files; the C-shell ones are the same file. Back down here, the other one here, this alias, note that's running this command, file. That alias runs that. File tells you the type of a file. So fwh ksh bash. Okay. That's the output of the file command. I don't know if you know what this means here, Mach-O universal binary with 2 architectures. There are 2 possible processor types in Mac, and some programs were written to be able to run with both, and the file command can determine that, so that's what this means. Both of these files were written that way. So we see how the alias works, we see how the backquote works, we see how the actual file ls or file works. This might not work. Try "where where" and "lwh where". Okay, let's try that. where where. where is a shell built-in. Remember earlier we showed that Bash didn't have where. If you type where in the Bash shell, you get an error message. It's just part of the shell rather than being a separate command. What happens if I type lwh looking for where? See what happens there. Ran where where, got this output, and then tried to run ls -l on where is a shell built-in. where is there, but the other ones don't exist. None of these exist, actually. 
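In a Bourne-type shell the same convenience is usually written as a function rather than an alias. The sketch below reuses the name lwh only by analogy; it is not the alias from the demonstration. It assumes command -v as the locating command, which reports only the first match rather than every match the way where does, and it sidesteps the built-in problem by running ls only when the answer looks like a pathname.

```shell
# Hypothetical Bourne-type counterpart of the lwh alias.
lwh() {
    for name in "$@"; do
        # Backquotes: run the inner command and substitute its output.
        location=`command -v "$name" || true`
        case $location in
            /*) ls -l "$location" ;;                        # a file in the path
            *)  echo "$name: no file found in the path" ;;  # built-in, alias, or missing
        esac
    done
}

lwh sh cd
```

For a built-in such as cd, command -v prints just the name, so the function reports it instead of handing ls something that is not a file.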
So that doesn't always work, and it also illustrates how some things don't do quite what you might have thought. Let's go down a little further here. This here is in Bash. That is also command substitution like the backquote. But unlike backquote, it uses this variable style. There are a number of expressions which begin with a dollar sign, and while these aren't variables, they borrowed the use of the dollar sign to indicate an expression of some kind. That can be surrounded by parentheses or brackets or double parentheses, which has a different purpose. Single parentheses here are a command substitution just like the backquotes. Double parentheses is actually an arithmetic operation. There are other syntaxes, other operations. Backquote syntax is available in Bash. However, this one is preferable. It's much easier to read and it allows nesting. You can have inside $(command) another command, something like-- I get a list there. That would work if I had the backquote also. What if I want to do something like-- You probably wouldn't actually use this command, but this internal command substitution echoes the names of all files beginning with a, then this one runs ls -l on those files, and then this one just echoes the output. You probably wouldn't do this; you'd just do the echo or ls, but this illustrates how the nesting of commands works. So just another feature here. I mentioned this earlier, that when you have where in the C-shell, type works in the Bourne-type shells for locating commands. Built-in commands, just what I was saying there. Commands are part of the shell, like where. When the shell executes a command like ls, it locates it through the path, finds it in some directory somewhere, reads that into memory, creates a new shell, reads the command ls or whatever into the shell where the environment variables are already located, and then it transfers execution to it. 
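The difference in readability shows up as soon as one substitution sits inside another. A small sketch, assuming only that sh can be found in the path; with backquotes the inner pair has to be escaped, while the $( ) form nests as written.

```shell
# Nested command substitution with $( ): find sh, then take its directory.
shell_dir=$(dirname "$(command -v sh)")

# The same thing with backquotes needs backslashes on the inner pair.
shell_dir_bq=`dirname \`command -v sh\``

# Double parentheses are arithmetic, not command substitution.
sum=$((2 + 3))

echo "$shell_dir $shell_dir_bq $sum"
```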
Built-in command, the code for that command is inside the shell, so the shell just starts executing part of its own code. where is such a command. It actually gets faster. It doesn't have to read anything in memory; it's already in memory. Built-in commands always take precedence over commands with the same name. Commands that are in directories in the path may have the same name, commands in different directories, files in different directories. The one that occurs earlier in the path is the one you'll get. If there is a built-in command, you always get it. There's no way to give it a lower precedence than a command in the path. If you want to get that path command, you can type the full pathname. If there were a command where in the path somewhere, you could type /bin/where and you'd get it. If you don't want to type the whole pathname, you could define an alias. In fact, if you gave the alias the same name as the built-in command, it would work because the alias definition is evaluated before the shell determines that it's a built-in command which should be executed. Then this gets a little more complicated with some commands here. Some commands are both built-in commands and in the path. One of them is echo, the command I just used a little while ago in those examples. Echo is a command in the path and it's in every shell. They don't necessarily all behave the same way. It was originally a command only in the path. It was built in to the shells later. Because there are options which depend on the environment and the command line options, the built-in commands were written to function the same as the command that had been in the path. It's unlikely they would have been written that way if the command hadn't already been written for the path. So this has side effects. Its history has effects here. There are options there. There's also an option defined by a variable in the tcsh called echo_style. 
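The two versions of echo can be reached separately, as a quick sketch shows. It assumes the file version lives at /bin/echo, which is common but not guaranteed on every system.

```shell
# Ask the shell how it would resolve the name; for a built-in this
# typically prints just the name rather than a pathname.
how_resolved=$(command -v echo)

from_builtin=$(echo hello)        # plain name: the built-in takes precedence
from_path=$(/bin/echo hello)      # full pathname: the file in the path

echo "resolved as: $how_resolved"
echo "built-in says: $from_builtin, /bin/echo says: $from_path"
```

For a plain word like this the two agree; where they can differ is in how options and backslash sequences are treated.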
That's one of these variables that can change the way that echo works. There are other cases in which you can assign a variable that changes the way that the shell operation, including a built-in command, works. It wouldn't affect anything else since other commands don't have access to the shell variables, only the environment variables. But shell operations can read the shell variables. That won't work for csh. That's only tcsh. That's one of the enhancements. Parsing has sequences when it evaluates metacharacters, when it evaluates variables, aliases, history references. There's a particular sequence for these things. If it does things in a particular sequence and gets to something that's an expression of a sort which has already been evaluated, it won't evaluate it again. If it gets it, then it will just pass on the characters. So if evaluation of some expressions like command substitution or variable or whatever gives rise to an expression which you would want to be evaluated, that will work only if that evaluation occurs later in the sequence. I hope I'm being clear there. That parsing sequence, an operation in the C-shell, isn't the same for built-in commands as it is for non-built-in commands. I'm not sure about Bash there. For example, if a shell variable produced a history reference, it probably would not go back in the history. It would just get the exclamation point. In fact, we can just try that out right now. set a= and we'll have to put this in there. Oh, wait. Sorry. I did this in the Bash. I wanted to do it here. See, so it didn't evaluate that history reference because it was already past the point of evaluating history expressions when it evaluated the variable. So that's 1 effect of parsing. And again, built-in commands aren't done the same way. All right. Let's go to the next one here. This is intended to be 1 line, but it's making it easier to read. What does that do? 
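The same principle can be seen in a Bourne-type shell, where the order of the steps is different but the rule is the same: the result of an evaluation is only evaluated again by steps that come later. In this sketch, with invented variable and file names, a variable whose value looks like a variable reference is not evaluated a second time, whereas a variable whose value is a wildcard is expanded, because wildcard matching comes after variable evaluation.

```shell
target=inner_value
text='$target'              # the value is the literal characters $target
once=$(echo $text)          # variable evaluation is already done: no second pass

# Wildcards are matched after variables are evaluated, so this one does expand.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
pattern='*.txt'
matches=$(cd "$demo_dir" && echo $pattern)
rm -r "$demo_dir"

echo "once:    $once"
echo "matches: $matches"
```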
You may recall that we can evaluate asterisks as filename wildcards, and there are other filename wildcards like the question mark and bracket expressions. That kind of evaluation is called globbing. set noglob at the beginning of this command says don't do that. unset noglob says go back to doing that. Note that set glob would not have that effect. In ordinary language, set glob or unset noglob would seem to be equivalent, but here it isn't. It's unset noglob. Now tset. tset stood for terminal set. It's not used that often now, but before windowing systems became available and you had a single terminal, you might have to determine the type. And if something was coming over an Ethernet or from the network, you might want to say it's a vt100. VT100 is kind of a standard in the terminal business. It comes from the DEC terminal. If you just do dialup--notice that? This goes back a ways, huh? So if we just do tset over here, if I just do tset, it's resetting my terminal, but you didn't see anything. It didn't really change anything. -s Okay. setenv TERM xterm-color. We already know that the term was set that way, so that didn't change. That's the way we'd want to do it. But notice that this command, tset -s, just output these commands. It didn't run them. It didn't run these commands; it output them. So this is intended to produce commands which will then be run. You remember the command in that file I just showed you had a Q in it. So let's do that. The Q suppresses some output, but that doesn't matter here, as you can see. I'm just doing that to show you that it didn't matter. This is in backquote syntax. Note the backquote here, backquote here. I'm omitting these things here. These are cases of telling it what to do in the case of particular types of terminals-- Ethernet, network, dialup, what have you. It doesn't matter here because we're not actually doing any of these things. I'm just illustrating the command. 
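Bourne-type shells have a comparable switch, spelled as an option to set rather than as a variable: set -f turns wildcard matching off and set +f turns it back on. A minimal sketch, using the root directory only because it is certain to contain something:

```shell
set -f                  # like set noglob: wildcards are left alone
literal=$(echo /*)
set +f                  # like unset noglob: back to normal
expanded=$(echo /*)

echo "with globbing off: $literal"
echo "with globbing on:  $expanded"
```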
If I do this with the backquote, what am I going to get? Also notice here that this included the set noglob and the unset noglob, so those are now redundant in the definition. That wasn't always true, but now they're included in this command. But let's see what happens if I do that and go to the beginning of the line with Control A and I do that. Okay, set: Command not found. That's kind of odd, isn't it? set is a well-known command. It's part of the shell. set: Command not found? Why is that? Hmm. Well, let's think about this. It's running a backquote command substitution, and that occurs at a certain part of the sequence of parsing the command. set is a built-in command. So by the time it does that command substitution, it's already gotten past the point of identifying built-in commands. So it treats set as if it were a command in the path. Needless to say, it doesn't find it and you get an error. Well. There's an example of parsing sequence. And what do we do about that? Notice this very interesting command here, eval. I wonder what that does. If you look at the manual--and let's just do that to show how confusing these manuals are-- man tcsh, confused manual, finding things here is not easy either. Here we go, eval arg, so we can have 1 or more arguments and there's a list of things there. Treats the arguments as inputs to the shell and executes the resulting commands in the context of the current shell. This is usually used to execute commands generated as the result of command or variable substitution because parsing occurs before these substitutions. Very good. And here they even refer to the tset command for a sample use like the one I just showed you. Now I have to get the window back to a useful place. Let's get over here and we'll see that eval is used just before that. So let's see what happens if we put--here we go up with the arrows to that command and Control A to the beginning, eval. Okay, so it works. 
When you do eval, it takes what comes after it and makes it a command. This enables you to essentially parse it twice. The section here runs this command inside the backquotes, gets the output. Output is supposed to be run as those commands here like these at this one and this one. So those commands are now here in this sequence, but these are built-in commands and it can't get them right away. So we go to eval, eval picks that up, starts the whole thing all over again, and it works. An example both of backquoting, eval, parsing, consequences of parsing, and a command which is probably of very little use to you nowadays. Okay. All right, umask. Let's look at this command here, umask 022. I wonder what that does. Let's just type umask with nothing after it. 22. Okay. 022 and do it again. As you might have guessed, umask with no arguments tells you the current mask; umask with arguments makes it that, but that was the one I already had. What does 022 mean? These are here the protections for a file. They determine who is allowed to read or write or execute the file. Protections are also called permissions. The r stands for read, the w for write, and the x, which isn't present there, stands for execute. There are 3 categories there. The last 3 elements are in the category of user. Those apply to me, the user. These 3 here apply to the group. The file belongs to 1 group, user may belong to several groups, but if the user is in the group to which this file belongs, then these protections will apply to him if he's not the user. And this one is everyone else. These categories are mutually exclusive. The user protections apply to him, the group protections apply to members of the group other than the user, and the other protections only apply to people other than the user and the group members. If there's an r or a w or an x, it means that protection is granted. If there's a hyphen, it means it isn't. 
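The pattern of a command that prints other commands, which eval then runs, can be reproduced without tset. In this sketch emit_settings is a made-up stand-in for tset -s, DEMO_TERM is an invented variable name, and the syntax is Bourne-type rather than C-shell.

```shell
# Stand-in for "tset -s": it only prints commands, it does not run them.
emit_settings() {
    echo 'DEMO_TERM=vt100; export DEMO_TERM;'
}

printed=`emit_settings`     # just text at this point
echo "the command printed: $printed"

# eval takes that text and parses it again from the beginning, so the
# assignment and the export are recognized and executed in this shell.
eval `emit_settings`

echo "DEMO_TERM is now: $DEMO_TERM"
```

Putting the backquote expression on a line by itself, without eval, fails in a similar way to the demonstration: the shell has already passed the point of recognizing an assignment, so it looks for a command with that odd name and does not find one.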
There actually are other things that can be put in here besides these, which I won't get into now. The umask defines a default for files that you create. And as a mask, basically it says the bits that you don't set. How has this become bits? If you think of each of these as an octal number, this is the 1s bit, this is the 2s, this is the 4s. So 0 through 7 will describe what combination of r's, w's, and x's you have for these 3 and then a similar number for these and then for these. So 022 means 0 for other, 2 for the group, 2 for the user. But this is a mask. The mask is what you don't have. I'm sorry. I just gave you things in the wrong order. It's the first 3. These 3 are the user, these 3 are the group, these 3 are the other. Sorry I gave you these in the wrong order. The 0, which is the first of those, doesn't display the value, but if a number isn't there, it's a 0. That means all 3 of these would be allowed. Notice that in this particular one the x isn't allowed. The reason is that the shell is capable of determining whether a file should be executed or not. Since this is not an executable file, it didn't set the x. The 2 means that write permission, the second category here, the one in the middle, is denied. So again, these are the things that it denied. Well, x is allowed but it's not here because it's not executable and similarly for the others. So that's a common umask. Another common one is 077--give yourself everything and no one else anything. And there are other possibilities. I'll go back to that. Using the history I can search back for that, lwh to there. Okay. So here, these are the shells. Bash, the owner who is system account, can do everything. Group and everyone else can do read or execute but not write. This one doesn't even allow the owner to write to it. If the owner wanted to write to it, the system account, he would have to change the protection first. 
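The masking can be worked out as arithmetic and then checked against real files. This is a sketch under the usual assumption that a new ordinary file is requested with 666 and a new directory with 777 before the mask is applied; the file and directory names are invented, and the umask is changed only inside a subshell so the calling shell keeps its own setting.

```shell
# Bits requested, with the bits in the mask removed.
file_bits=$(printf '%o' $((0666 & ~022)))   # 644
dir_bits=$(printf '%o' $((0777 & ~022)))    # 755

demo_dir=$(mktemp -d)
(
    umask 022
    touch "$demo_dir/plain"
    mkdir "$demo_dir/sub"
)
# The first ten characters of ls -l are the type and the three rwx groups.
file_perms=$(ls -ld "$demo_dir/plain" | cut -c1-10)
dir_perms=$(ls -ld "$demo_dir/sub" | cut -c1-10)
rm -r "$demo_dir"

echo "file: $file_bits $file_perms"
echo "dir:  $dir_bits $dir_perms"
```

The directory gets its x bits and the ordinary file does not, even though the mask is the same for both.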
But again, the umask sets the default by masking it, by indicating the bits that will not be set. This is typically in one of your initialization files, which is the .cshrc for the C-shell or the .profile for the Bourne-type shells. It can be elsewhere also if there are other initialization files on the system. Anyway, that's umask. There's something kind of odd here, and that is, why is there a single command for this? If I were writing this, I would make it a variable, umask = some value. Why is there a whole command just for this purpose? The reason is this just goes back to the origins of Unix. Unix was just some programming project at Bell Labs in the early 1970s. People just got together to program. They never intended it to become a worldwide operating system. Different people wrote different parts without thinking very much of how they were going to be used--rather sketchy. And it came together like that, and it's still like that in some respects. So that reflects the history, and there are still these inconsistencies and odd elements of it. Okay. Next one here. As I wrote earlier, the C-shell isn't really used very much for programming, although it can be. It executes more slowly, again the trade-off between interactive use, which has more processing involved than speed, which can do without the processing. The extra features added to the Bourne shell by the Korn and Bourne-again shells don't seem to slow them down, and I don't know why that is. It might just be better programming, but I'm not in a position to know. Speed here actually isn't such a big deal, although it is mentioned. The reason is that shell scripts actually get fairly fast. If there is a lot of commands like in a calculational program, you probably wouldn't do it in a shell script. The operations there are fairly simple and straightforward. The ones that I've experienced that are too slow involve repeated applications of slow commands. Earlier I mentioned the stream editor sed. 
That command is slow. If you execute sed many times, you'll get a slow script, but it's not the shell that's slow. Running it in the Bourne shell won't be much faster than running it in the C-shell, although there's maybe some advantages there. The additional programming capabilities, on the other hand, are significant reasons why you would use the Bourne-type shells. C-shell has odd features to it-- the fact that you don't know if a variable is a shell variable or an environment variable. It can be very confusing. It's not so easy to write just based on your experience of programming in other languages. I think you may find the Bourne-type shells more consistent with your experience. Some scripts, though, can be thousands of lines in length. Those that I've seen are used for patching operating systems. Those can execute very slowly, but you don't run those very often. It's only when you're doing patching, and it's only the system manager who does those things, so it's not really much of an issue. Those that are hundreds of lines long actually execute fairly quickly. Mentioning this here, what are those enhancements? I've already mentioned a few of them--arrays, calculations, the $(( )) expression for calculations in the Bash shell, the other kind of command substitution. There are different kinds of testing commands by which you can do conditional tests on the existence of a file or other things. Last here, this command here. What does this do, and why would anybody use it? printenv variablename. We know what printenv does. It tells us the value of a variable. And printenv variablename won't tell us very much because there's no such variable. Blank. But let's give it something meaningful. That's not there either. Okay. I guess I never defined that. Let's just check my environment. This is another command by which you can inspect your environment. There is good old EDITOR, the one we saw before. What does that do? Here we have a backquote expression. 
Remember this is the C-shell. So printenv EDITOR will give us a value of EDITOR. It's vi. And then it will set that value to variable a, the set command. So now if I do echo $a, I get vi. That doesn't seem terribly useful. However, it actually does have a purpose. Since we don't know whether a variable is a shell variable or an environment variable by using the dollar sign evaluation syntax, we can use printenv to make sure that it's an environment variable. So if there were a shell variable editor, this wouldn't have gotten it. This works only with the environment variable. If there were a shell variable and I wanted its value, I'd have to find some other way to do it. One way to do that would be by doing set and piping. This is one of the metacharacters, special characters. It sends the output of set to something else. Let's see what we might find there. Nothing. Okay. Let's just see what's in there all together. It was echo_style, the one I mentioned before. Okay, let's do that. Remember I mentioned before, echo_style determines the way the echo command will run. bsd stands for Berkeley Software Distribution. This is the Berkeley Unix from the 1970s. That's one of the ways that echo can run. Setting echo_style to that value in the TC-shell will cause echo to behave that way. So set does that, but set only gets shell variables. It wouldn't find EDITOR, which is not a shell variable. Nothing. So that's one way of distinguishing them. But the fact that you have to go through some strange command like that to distinguish between shell variables and environment variables shows the kind of impractical nature of the C-shell for some purposes. And now, last and maybe least, there are the man pages. As some of you may know, man is the command, short for manual. The man pages for the shells are hard to read. They're very long. They're organized in a way that may make it difficult to find what you're looking for. 
So if you're looking for something with a purpose, you may not know if that purpose is a shell variable or something else, so you may not know where to look for it. You can look for various strings, but the strings are often repeated. So it's generally hard to read. We just looked at the TC-shell man page a little before to find the eval command. Some things go faster. One approach is to search for a string. You can use the pager. The pager has the slash to look for a command or a string inside a pager operation. Man by default will use a pager, either more or less. I don't know if you're familiar with those, but those can show files bit by bit. I've been using less to display these particular files we've got here. You can search inside there. You can try using different search strings. Also man pages in different operating systems may not be the same. There can be separate pages for csh and tcsh. There aren't on the Mac, but there might be if those are separate commands. If sh doesn't really call Bash, there probably would be a separate man page. Some systems have separate man pages just for the C-shell built-in commands. Sometimes if you want to read a description of a built-in command that's also in the path, like echo, you need to read the man page on that command, on echo, to determine how it will work as a built-in command even if you're not calling the built-in command. That's a drawback of the operating system in general, not only for the shells, although for the shells in particular the man pages are quite long, partly because they've added useful features to them, which may be a positive. Okay. Are there any questions? Any topics you want to bring up? Anything relevant here? Well, it's been very nice talking to you all. I hope you got something out of this seminar that will be useful for you in your future endeavors. [CS50.TV]