I/O Bootcamp 2011 - Real World Go

>> GERRAND: My name is Andrew Gerrand, and I work at Google Sydney on the Go team, the Go programming language. Just before I get started, who here has heard of Go before? And keep your hand up if you've actually seen some Go code or if you've written some Go code. My colleague there is feverishly waving his hand. Cool. And other languages that you guys use: who writes Java code, typically? And Python? JavaScript? Anything else? Just yell something out. >> Scala. >> GERRAND: Scala. Cool. >> Common Lisp. >> GERRAND: Common Lisp. We got one real nerd here. That's right. Sorry.

So in this talk today, called Real World Go, I'm going to introduce Go, talk about some of Go's interesting language features, and also discuss some real-world applications in which Go is being used outside of Google. We do use it increasingly extensively within Google, but there are a lot of interesting projects happening out there in the world as well. There is a system called SpeakerMeter which we're using at Bootcamp at I/O to get feedback from people. That's the QR code you can scan now, but I'm going to take it off the screen probably before you get a chance to do so. I'll show it again at the end. It's just a good way of getting feedback back to me so I can improve my speaking skills.

So I'll just start with some background as to the origins of Go, the why and what. So, why Go? It's a question I get asked pretty often. Why create a new programming language? We have so many. Surely there's no need for another one. Well, this was born basically out of frustration with the existing languages, the mainstream languages that we had at our disposal. The statically typed languages, the dominant ones like C++ and Java, tend to be very efficient when they compile and run, and they give you a lot of control. But they're typically very bureaucratic. They can be verbose in use and also overly complex.
It can be difficult to understand exactly how the code you've written will behave, and sometimes these languages can be very, very picky. On the other hand, we have these dynamic languages and scripting languages that are really easy to use. But because they're dynamically typed, they can be very error-prone. A lot of programmer errors that would be compiler errors in a statically typed language become runtime errors in a dynamic language. This means you don't notice until you've deployed your program and some obscure corner case triggers a colossal bug. They can also be inefficient and slow. And when you start writing code at scale, particularly at a kind of Google scale with many tens of thousands of programmers, they really start to break down in terms of their ability to componentize and abstract things. And finally, we live in this concurrent world where we have many, many networked machines and multi-core machines. But the traditional approach to writing concurrent software is tricky. It involves threads and locks, and you have to be very precise in your reasoning about those programs to write correct concurrent programs. So, generally, the landscape looks like you had speed, reliability, or simplicity, and you had to pick two. And sometimes we only got to pick one, and we thought, "Well, can't we do better than this?"

So, Go is a modern general-purpose language. It compiles to native machine code on a variety of architectures. It's statically typed, so you get the reliability and efficiency benefits of a statically typed language, but it has a lightweight syntax. Using type inference, we can infer a lot of the type information, so you remove that repetition from defining types and such. It has a simple type system. It's not a classical OO kind of model, and so it's very easy to understand what's going on and understand what your code does.
And finally, it has some novel concurrency primitives that make reasoning about and writing concurrent code a lot more straightforward.

So when we designed Go, the tenets of that design were, first and foremost, simplicity. Each language feature should be easy to understand in and of itself. You should be able to understand all the rules associated with a particular feature and easily make valid decisions about how it should behave. Going hand in hand with that is orthogonality. Each of Go's features, when it interacts with the other features, should interact in a predictable and consistent way, a way that's easy to understand and reason about. And finally, readability. It's very important when working in a collaborative environment that when you look at a piece of code, you should be able to understand what that code does without having to hold a huge amount of external context in your mind in order to make sense of it. And driving all of this was a consensus-driven design process. Rob Pike said that nothing went into the Go language until Ken Thompson, Robert Griesemer, and Rob all agreed that it was right and it was correct. As a result, there were some language features that were in discussion for over a year before they actually made it into the language proper. Each of those three has different aesthetics and very different opinions about languages. And so, in the end, the compromise is something that's really solid and very, very well thought through.

So this, one of the simplest possible Go programs, is just "Hello, world." It has a package statement saying which package the program belongs to. Every Go program starts in package main, in the function main. We have an import statement. The fmt package is a string formatting package. And our main function simply calls the Println function from the fmt package to print "Hello, world."
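The slide itself isn't reproduced in the transcript, so the following is only a minimal sketch of the program being described. The exact greeting string and the small `greeting` helper (added so the text can be checked on its own) are assumptions, not taken from the slide.

```go
package main

import "fmt"

// greeting is a hypothetical helper, not part of the slide; it returns
// the text that main prints.
func greeting() string {
	return "Hello, 世界"
}

// Every Go program starts running in package main, function main.
func main() {
	fmt.Println(greeting())
}
```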
And that, I think those are the Chinese characters for "world." We use those characters to underscore that all Go source files are UTF-8 Unicode files, so you can use Unicode in string literals and you can also use it in identifiers. So, if you wrote a mathematical formula, for example, you could use a sigma character to signify some value if you like.

And this is Hello, world 2.0. If you run this program, it starts a web server listening on port 8080, and if you visited that URL, you would see the string "Hello, world." The difference between this and the previous slide is that we've added the HTTP package to the imports. And now we have defined a second function, handler, which is an HTTP handler that simply writes the string "Hello" followed by the path component of the URL as the HTTP response. In the main function, we register that handler at the web root and start the web server. So this is just to give an idea of the succinctness of Go code, how straightforward it is to accomplish something that's actually quite complex under the hood.

So let's take a look at Go's type system. I mentioned Go is a statically typed language, but the type inference system that we have saves a lot of repetition. Now this is a bit of a contrived example for Java, but in Java or in C or C++, you need to say which type your variable is going to be, and then you assign something of that type to the variable. So in Java's case, you say "Integer i = new Integer(1)" or "foo = new foo." In C or C++, you have to say "int i = 1." But it's obvious in the code that most of the time you just want an integer. So in Go, we can just say "i := 1" to declare a new i which is an integer equal to one. If we use a floating-point initializer, then it'll be a float64 type, and if we use a string, it'll be a string. And in this final example, I've actually created a function value, mul, which multiplies two integers and returns an integer.
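As a rough sketch of the declarations just described (the slide is not in the transcript, so the particular initial values and the `describe` helper are assumptions), the inferred types can be printed with the `%T` verb:

```go
package main

import "fmt"

// describe is a hypothetical helper that reports the types the compiler
// inferred for each short variable declaration.
func describe() string {
	i := 1                                     // inferred as int
	f := 1.5                                   // inferred as float64
	s := "hello"                               // inferred as string
	mul := func(x, y int) int { return x * y } // inferred as func(int, int) int
	return fmt.Sprintf("%T %T %T %T", i, f, s, mul)
}

func main() {
	fmt.Println(describe())
}
```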
And the type will be inferred to be a function type that takes two integers and returns an integer. So, this gives you an idea of the kind of shorthand that you start to develop writing Go code.

In Go, we define methods on types, and methods can be defined on any user-defined type. In this case, I've defined a type called Point, which is a struct containing two values, x and y, both floats. Then I can declare this method Abs on Point, which returns a float64, and all that does is return the absolute value of that point. To create a Point value, I have to say "p := Point" with some initial values. And then I can call Abs on that p in the same way you would in Java or other languages like that. So it's kind of familiar. But when I say they can be defined on any type, I mean any type. It doesn't have to be a struct. A struct kind of looks like an object, but this underscores the point: if I have MyFloat, which is just a float64, I can define an Abs method on that type, and it just returns the absolute value of that floating-point value. Then I can create a MyFloat in the same way I did with Point and call Abs on that. So what I'm trying to show here is that Go objects are just values. There's no sort of box. It doesn't have to be a class. You just define methods on bits of data. So it's a nice way of associating logic with data, but without the overhead of establishing a class and building that kind of hierarchy.

So, in order to generalize Go types, in order to make them interact nicely with each other, we have Go's concept of interfaces. Interfaces simply specify the behaviors of types. When you define an interface type, you define a set of methods, and often it's just one or two methods. I think the largest Go interface I have seen has maybe four or five. And any type that implements those methods will implement that interface implicitly.
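A minimal sketch of what those slides might look like follows. Treating a point's "absolute value" as its distance from the origin is an assumption, as are the initial values and the shape of the Abser interface; only the names Point, MyFloat, and Abs come from the talk.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a struct with two float64 fields.
type Point struct{ x, y float64 }

// Abs is a method on Point. Assumption: it returns the distance from the origin.
func (p Point) Abs() float64 { return math.Sqrt(p.x*p.x + p.y*p.y) }

// MyFloat is not a struct at all, just a named float64, and it can still have methods.
type MyFloat float64

// Abs returns the absolute value of the underlying float.
func (f MyFloat) Abs() float64 { return math.Abs(float64(f)) }

// Abser is satisfied implicitly by any type with an Abs() float64 method.
type Abser interface {
	Abs() float64
}

func main() {
	p := Point{3, 4}
	f := MyFloat(-2.5)
	// Neither type declares that it implements Abser; both simply do.
	for _, a := range []Abser{p, f} {
		fmt.Println(a.Abs())
	}
}
```

Neither Point nor MyFloat mentions Abser anywhere, which is the decoupling the talk is pointing at.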
So if I have some function PrintAbs which takes an Abser, which is any type that implements Abs, I can pass in a value of type MyFloat or a Point to that function. And at no point did either of those types have to be aware of the Abser interface. They don't have to declare that they implement the Abser interface. They just do, simply because they declare the Abs method.

To give you a concrete example of this: in the io package in the standard library, we declare an interface type called Writer. Writers are used to write streams of binary data to things, and it's just one method called Write, which takes a buffer, a slice of bytes, and returns the number of bytes written and an error value. There are many, many writers throughout the standard library and in other Go code, and you can use those writers anywhere a Writer is expected. We've already seen an example of this in the Hello World 2.0 example. In our HTTP handler, we have an Fprint statement, and Fprint writes the string to w. w is a ResponseWriter; it's our HTTP response. The reason this works is that Fprint expects an io.Writer, and the HTTP ResponseWriter implements the Write method. Now, at no point does the fmt package need to be aware of the HTTP package. It just works because that interface is implemented implicitly, and there are examples of this everywhere. You can just as easily, say, encode an image to a writer. Or, internally in the HTTP handler, if you're writing a gzipped HTTP response, you can just chain these writers to write from this response writer to a gzip writer to the network connection itself that sends the response to the user. So this becomes very, very powerful once you're coding at scale, and it really helps to decouple pieces of code and make them independent of each other and very easy to mix and match.

And now a bit about Go's concurrency features.
Who is familiar with UNIX environments? It's the world, isn't it? In UNIX we think about processes that are connected by pipes. If I wanted to find the number of lines of test code in the Go standard library, I could issue a find to list all the file names in the Go package tree, grep them for files ending in test, and then pipe that into a word count and get the number of lines. Each of these tools is a simple tool designed to do one thing and to do it well. This is the UNIX philosophy, right? We have a standard interface between each of those tools, and so they can just be connected indiscriminately. The analogue in Go is that you have goroutines that are connected by channels.

So what's a goroutine? A goroutine is like a thread. Goroutines share memory like a typical threading environment, but they're much cheaper than threads. Goroutines have segmented stacks, which means that they can be created in only a few kilobytes, so they're very, very cheap to create and destroy. And the Go runtime typically schedules many of these goroutines across a smaller number of operating system threads, often only a handful. It's possible to have thousands or tens of thousands of goroutines running in a program that is essentially single-threaded. The syntax to launch a goroutine uses the go keyword. If I had some function Sort that takes a list and sorts it, I could choose a pivot i and then sort each part of that list in separate goroutines by saying "go Sort," which calls the Sort function in a new goroutine.

The other part of Go's concurrency model is channels. Channels are a typed conduit for synchronization and communication. A channel can send or receive any Go type. So unlike UNIX pipes, which simply carry binary data, channels are typed. And the way you use a channel is with the channel operator which is a "