PHILIP: Thank you everyone for coming.
My name is Philip.
I'm a software engineer here, and I am very happy that Leo
is coming to speak.
So Leo and I were in grad school around the same time.
And we were actually officemates one summer at
another company.
And we just had a really good time that summer.
And Leo has really impressed me by both the breadth of
stuff that he's really interested in, in terms of
research engineering, and also the depth in which he goes
into stuff.
So he's one of these rare individuals that has a lot of
diversity and breadth, and also [INAUDIBLE].
This is one of his several projects that he goes in with
a good amount of depth.
So I'm really excited for this talk.
I hope everyone else is as well.
So go for it.
LEO A. MEYEROVICH: Philip didn't say
he's also like that.
Hi, I'm Leo.
I'm from over in the East Bay, at UC Berkeley.
And this is work I've been doing with Ari Rabkin who was
at Berkeley and now is at Princeton.
And we've been looking at programming language adoption.
I'm going to be talking about two different ways we've been
looking at it.
One is about quantitative analysis, and the other was
looking at what the sociologists might say to us,
and seeing if we can cherry pick theories that apply to
our domain.
So before I get into what that really means, I want to ask
why do we care?
How is this interesting to us?
And so for that, there's a really cool paper by Erik
Meijer, where he found this principle
called the change function.
This is by Pip Coburn, and the sociologists--
they actually call it something called a switching
cost, but neither of them appear to have
known that at the time.
But basically, what it says is if you're looking at some sort
of new technology, it's going to have some benefit.
And when that benefit is greater than the cost and that
pain of going through that adoption process, then it's a
rational choice to go forward.
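The benefit-versus-cost idea can be written as a toy sketch. This is just our paraphrase of the change function, not Coburn's exact formulation, and all of the numbers are invented for illustration:

```python
# A toy reading of the change function described above: adoption is a
# rational choice when the perceived benefit of a new technology
# outweighs the perceived pain (the "switching cost") of adopting it.
def rational_to_adopt(perceived_benefit, perceived_pain):
    return perceived_benefit > perceived_pain

# Haskell in this story: lots of benefit, but lots of adoption pain.
print(rational_to_adopt(perceived_benefit=8.0, perceived_pain=9.0))  # False

# Meijer's move: keep the benefit, drive the pain toward zero.
print(rational_to_adopt(perceived_benefit=8.0, perceived_pain=0.5))  # True
```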
And you might not have known the guy I was talking about--
Erik Meijer--
but you probably know the language that he was one of
the key designers of, which is Haskell.
And when he looked at this change function here, he
realized Haskell has all this functional programming
goodness, but you also have to pick this new language.
And there's a lot of pain involved there.
And what he decided after that was that, from then on, his goal
in life would be to drive that denominator down to zero.
And what he meant here is that instead of doing all this cool
functional programming stuff and designing these new
features for Haskell, he'd actually join the Visual Basic
team, do all this cool new functional programming
research there, but do it in that same language that is
easy for developers to already use and lower
that cost of adoption.
So what I want to talk about today is two different ways
I've been looking at how adoption goes and how we
should think about adoption.
And by we, I mean language designers and language
researchers.
So we've been looking at two ways.
One is doing large scale data analysis.
We've been going out into the wild and seeing how the stuff
actually happens.
And instead of just doing random safari, just looking at
random facts, we've been trying to keep all of our work
informed by how actual sociologists look at adoption
in general and in particular fields that are somewhat
related to languages.
So let's first talk about the fun numbers, and then we'll
get to the models behind it in a bit.
So when I'm talking about adoption, I'm interested in
two things.
One is a language like Haskell.
But also another thing is a feature like functional
programming.
And throughout this talk, I'm going to switch in and out of
those two different types of adoptions.
And we've done a lot of cool quantitative analysis.
Here, I just want to talk about three particular types.
One was we want to look at how people pick domain specific
languages over general purpose languages, which matters when
we're trying to pick what type of language to design.
Another thing is in the small-- when a programmer
makes one small decision, how did they actually make it?
And then finally, as an educator or somebody who wants
people to know languages, we're interested in what
actually influences people's ability to do so and what's
important there.
So let's talk about
domain-specific languages first.
So when I say domain-specific language, I mean something
like Excel, which is good for, maybe,
doing accounting formulas.
And then, there's this spectrum-- that you can make
it more and more general, which is something called
general purpose languages.
And oftentimes, we might say this is something like a
Turing complete language, where you can compute
anything you want.
So maybe there's like a library for
everything in Perl.
You just have to look for [INAUDIBLE]
and find it.
And then there are--
and this is a spectrum.
Maybe there's something in between, like MapReduce here
at Google, where you can plug in whatever function you want
into your map and reduce.
And it will just run it on the cluster.
But this might not be a good way to think about this stuff.
So what you're seeing here is one of those experiences that
scarred me as a youth.
I went to a soap factory to see how they make soap.
And apparently, they run the soap machines with Excel macros.
It's this very domain specific thing, right?
So maybe we don't actually even know what we mean when we
say domain specific.
And so we took a look at about 200,000 projects in
SourceForge and tried to understand what does it mean
to be domain specific.
And for each project, we got out a few pieces of
information.
So here, we're looking at the Squirrel SQL client.
On the bottom left, we're seeing that it's a client.
It's a front end.
So it's the category of front ends.
And then on the bottom right, we're seeing the programming
language is Java.
So you're going to write this type of front end in Java.
And let's see.
What else do people use Java for in SourceForge?
And so the chart you're seeing here is-- on the x-axis is
different categories.
So one of those dots means blogging.
Another one of those dots means you're writing some sort
of search program in Java.
And what the y-axis means is that higher means it's
more popular for that particular category.
So if somebody's going to write a search client-- it's
actually about 40% chance it'll be in Java.
But if they're doing blogging, it'll only be a 10% chance
that it's in Java.
And so this is in just the SourceForge [INAUDIBLE].
If you're one of those 200,000 programmers, this
is specific to that.
What's cool is I actually started life as a Schemer, and
it's fun to look at the adoption--
how niches work there.
And there we see, OK, apparently build tools is
something people use Scheme for--
if you're going to use Scheme, it's going to
be for a build tool.
And if you notice, though, the y-axis is a little different.
It's smaller.
And the really interesting thing here is that
you don't have this nice spread of Scheme across
different categories, right?
A few things pop out.
And so what I did next is--
or Ari and I--
we ordered the languages by popularity.
So Java is there on your left.
And note, that's the popular one.
And then Scheme is on the right.
And maybe it's not unpopular, but underappreciated.
[LAUGHTER]
LEO A. MEYEROVICH: And so here we are seeing just general
popularity.
And then what we added in was the standard deviation across
different categories.
And notice that the y-axis here is on a log plot.
And so what that means is even though they all kind of look
the same size, because the standard deviation is kind of
at a lower point on the bottom right, that means it's much
smaller than the ones on the other side.
And so, for example, if I divide standard deviation by
the average, you actually see--
or maybe another way of looking at it, if you look at
those two slopes, they're changing at a different rate.
And so the interpretation here is basically that as a language
gets more unpopular, it only shows up in certain
niches-- that's what we saw for Scheme versus Java.
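The niche measure just described--standard deviation of a language's per-category share, divided by its average share--can be sketched in a few lines. The category names and numbers below are made up for illustration, not the SourceForge data:

```python
# Sketch of the niche measure: for each language, take its share of
# projects in every category, then divide the standard deviation of
# those shares by the mean share (the coefficient of variation).  A
# high ratio means the language only shows up in a few niches.
from statistics import mean, pstdev

# Share of each category's projects written in the language (invented).
shares = {
    "java":   {"search": 0.40, "blogging": 0.10, "frontends": 0.30, "build": 0.20},
    "scheme": {"search": 0.01, "blogging": 0.00, "frontends": 0.01, "build": 0.08},
}

for lang, by_category in shares.items():
    values = list(by_category.values())
    cv = pstdev(values) / mean(values)  # spread relative to popularity
    print(lang, round(cv, 2))
```

With these toy numbers, the unpopular language ends up with the larger ratio, matching the Scheme-versus-Java observation.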
And so that kind of leads to a certain type of thinking.
For example, when you're going to talk about language
adoption, you're not going to say that it's getting
generally more popular.
It's getting more popular niche by niche.
You're going to see more of those popping up.
All of them are going to go up, but also the cool
phenomenon there is you start seeing this pop.
And I'll get back into that later.
But now, this already leads into a whole new line of
reasoning about how languages work.
So this is all very high level.
So let's actually zoom in a bit.
So we can ask, well, how do programmers
actually pick languages?
And here we see a picture of a bunch of dogs where they say,
no matter what it is, we want it.
I think better of programmers, and so I was curious what they
actually do.
So here again, we're looking at SourceForge, the same
200,000 projects.
And what you're seeing on the right is just a project--
the second most recent project they wrote.
And then what you're seeing on the bottom axis is,
given that project, the likelihood of them picking
some next language.
So given that you use one language for a project, what's
the likelihood that you're going to use some other ones?
So as an example of what's going on here, no matter what
language you picked on that right axis--
on the y-axis--
you see these vertical strips that good chances are, you're
going to pick any one of those six languages.
So the probability is independent of what
language you use.
You're going to probably use one of those six bars.
And so that means programmers are creatures of habit.
You're going to use a popular language.
What's also cool, if you notice, is there's another
phenomenon going on here, which is we have this very
strong diagonal in this matrix.
And what that means is you're going to
use that same language.
You use one language, there's a good chance the next
language you use is going to be the same exact language.
So now, it's more strongly instilling that programmers
are creatures of habit in two different ways.
And this characterizes most of the projects in SourceForge.
Programmers use the same language, either that's a
popular one or that they used before.
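The habit pattern just described can be sketched as a transition matrix: estimate P(next language | previous language) from project histories and look for the strong diagonal. The project histories below are invented for illustration:

```python
# Minimal sketch of the transition matrix: from each programmer's
# project history, count how often a project in one language is
# followed by a project in another, then normalize each row into
# P(next language | previous language).
from collections import Counter, defaultdict

histories = [
    ["java", "java", "java"],
    ["python", "python", "java"],
    ["c", "c"],
    ["php", "php", "python"],
]

counts = defaultdict(Counter)
for projects in histories:
    for prev, nxt in zip(projects, projects[1:]):
        counts[prev][nxt] += 1

matrix = {
    prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
    for prev, row in counts.items()
}

# The diagonal entries capture "creatures of habit": staying with the
# same language from one project to the next.
print(matrix["java"]["java"])
```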
So this led us to a question of why.
I'll explain this in a second.
But why are they a creature of habit?
What actually led to those decisions?
Is it just they like popular things, or is there something
else going on?
And so we launched a bunch of visualizations of this stuff.
And we realized that, oh, this might be a nice opportunity to
get our work in the eyes of lots of other people.
In this case, you see we got Slashdotted.
And as soon as we put up those visualizations, we were
talking to the press and it was all really fun.
But what was really going on here was something different.
It was much more insidious.
We wanted to run a survey.
And so the reason we did this viral campaign is we wanted to
see if we could ask programmers how they actually picked
the language for their most recent project.
And so in about a period of two days after this campaign,
we had about 1,600 responses from people on websites like
Slashdot and Wired.
And so what we're seeing on this graph is what they said.
So this is a little noisy graph.
But basically, what we're seeing here on the bottom axis
is different types of reasons people picked their language for
their last project.
The very strong influences are those bars
under the green arrows.
So for example, on the leftmost--
actually, open source libraries were from a strong
to medium influence--
so everything above that horizontal black bar.
And then also something like group legacy which is sort of
what we're seeing in the SourceForge case.
The group was already writing code in this language, so the
next project will use that same language.
And what was really cool was that as we go through all
these green bars-- for example, personal
familiarity, team familiarity, open source libraries-- these
are all about the social properties of a language.
This is how other people use the language.
It's not just about how fast the language is intrinsically.
And as a languages designer, then I started asking, OK,
well, what about the others?
What had slight influence on what languages were picked?
So on the leftmost, you see something like correctness.
That would be like type safety or something-- one of those
very common properties.
And more dear to my heart is something like developer speed,
or productivity-- the inherent productivity of a language.
That, actually, was not a strong influence for picking a
language, when you actually get down to
the concrete decision.
So the programmer is an interesting social beast.
So this is just me asking all the
programmers and showing that.
So the question is what happens when we start picking
what programmer we look at.
And so if social properties are very important, we can
start asking what happens when we look at different types of
programmers and from different types of social organizations.
And so what this chart is showing is those same axes on
the bottom, those same properties.
But now, I'm slicing the programmer data based on what
size company does that programmer work at.
And so the leftmost--
that dark blue-- would be you're working for yourself.
It's a one man team.
And the rightmost would be you're working for an
organization with 500 programmers or more.
And just as a quick surface representation, I asked what's
the slope of the curve across these.
So for example, if we look at correctness on the left, I
wrote a green plus.
And that means that the bigger the organization--
except for one of those bars--
basically means that the bigger the organization, the
more that correctness is a concern.
Well, if we look at something like open source libraries--
the first red minus--
that's a negative slope.
And what you see is, the bigger the organization, the
less that open source matters.
And presumably, they're building their own software.
And these are actually significant changes in
adoption habits, because this is essentially going from
medium influence to a slight influence.
This is a full one point drop.
So in summary, larger companies care more about the
social properties of their language, or how
the language is used.
So the size of the company is just one way of bucketing
programmers.
And so there are lots of different types of programmers
out there and lots of different ways of looking.
And so one thing we've been looking at, actually, was
partially inspired by Philip.
It was actually education.
And so we were wondering, how does education and age and
things like that play into the languages you know?
So as one example, we did another survey, this time on
something called a MOOC, a massive open online course.
And essentially, you can think of it as a programmer who's in
the workforce but wants to take an online course.
So this is an educated programmer-- somebody
interested in languages.
And we were able to get about 1,000 to 2,000 programmers
this way and ask them about how they learned languages.
And one interesting phenomenon we found is that the number
of languages a programmer knows
stagnates once you're 20 or older.
You'll say that, oh, I know--
those red bars we're seeing, the x-axis is age.
So as we go from left to right, the red bar is kind of
constant, right?
We have this nice line in the middle.
The number of languages you've used actually stays kind of
constant, independent of your age.
And the number of languages you know well
is the green bars.
And that also stays fairly invariant.
And again, these are educated people who are working
programmers.
So if we are going to do something about language
education, it sounds like what's going on after you're
in the workforce today is rather stagnant.
So we took a look at what happens in school, like whether what
we do there matters at all.
And so what we asked is if you look at the labels on the left
column, there we asked them for different categories of
programming, like functional languages such as Lisp and
Scheme, or dynamic languages like Perl and Python, or maybe
specialized systems like Assembly or Matlab and
Mathematica.
For each one of those categories, we wanted to see
how what you did in school influenced what languages you
know today.
And there, the first thing we looked at is whether you're a
CS major or not.
Is CS education actually doing anything at large?
And there what we found is that when you ask CS majors
versus non-CS majors about the languages they know in these
different categories, that actually
doesn't really matter.
For example, non-majors will know about
functional programming 19% of the time, and CS majors will
know 24% of the time.
So there's a small 5% jump.
So as somebody interested in education,
this is a small jump.
However, if we look at whether a particular language was
taught in an individual course, like in one of these
families, now we see significant changes in the
statistics.
For example, for functional programming-- that top row--
if you were taught one of these languages, you'll say
that you know it 40% of the time.
But if you weren't taught it, you'll only have picked it up
after school about 15% of the time.
So the takeaway here is that, if you don't know anything else
about a person, it actually doesn't matter to a large extent
whether they were a CS major or not.
But if you ask what was taught in those particular courses,
now that becomes significant.
And when you define a CS curriculum, the languages we
teach actually matters, because otherwise, apparently,
people won't learn them.
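The taught-versus-not-taught comparison above is a pair of conditional probabilities. Here is a tiny sketch with made-up survey rows, shaped to roughly match the proportions quoted in the talk (about 40% versus 15%):

```python
# Made-up survey responses: each pair is
# (was taught a functional language in a course,
#  says they know one today).
responses = (
    [(True, True)] * 2 + [(True, False)] * 3       # taught: 2 of 5 know FP
    + [(False, True)] * 3 + [(False, False)] * 17  # not taught: 3 of 20 know FP
)

def p_knows(taught):
    """P(knows a functional language | was/wasn't taught one)."""
    group = [knows for t, knows in responses if t is taught]
    return sum(group) / len(group)

print(p_knows(True))   # 0.4  -- taught in a course
print(p_knows(False))  # 0.15 -- picked it up on their own
```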
So I actually have lots and lots of statistics.
If you go to my website--
the URL is here--
we actually have a few interactive visualizations,
and this is the thing that launched our viral campaign.
Also, our raw data is up.
If anybody knows statistics much better than me, I would
be curious to see what you have to say.
And with that, I wanted to move on to the social
principles, the more theoretical stuff.
But before then, I think this might be a good stopping point
if anybody wants to ask questions about this more
quantitative analysis.
AUDIENCE: The previous slide here, [INAUDIBLE]
LEO A. MEYEROVICH: Yes.
AUDIENCE: [INAUDIBLE]?
LEO A. MEYEROVICH: So we have that data, but I suggest you
come up to me after and I can pull it up.
This is just a snippet.
Also, I think for those, people use those in practice,
so I don't think it's as surprising.
Mark?
AUDIENCE: Well, the number of languages versus age--
there was never an upward slope, even at the beginning
of the graph.
So that doesn't seem to support the hypothesis that
people tend to learn languages before they [INAUDIBLE].
LEO A. MEYEROVICH: Yes.
AUDIENCE: It seems like as soon as they're qualified to
be on the graph at all that they're already [INAUDIBLE].
LEO A. MEYEROVICH: Yeah.
So our first guess here was that we had some
sample bias going on.
And actually, the first time we did it was actually for the
Slashdot survey.
So we figured we were just asking a bunch of nerds about
nerdy things.
And that's actually why we went back to the online
course, because there, we had much wider demographics.
And so here, I think this really is what's going on.
But maybe what's going on is somehow--
for example, how we asked the question, that maybe somebody
isn't remembering languages.
But one of the things we did was actually give reminders
of different types of languages.
And then we tried different ways of asking the question,
like can you enumerate your answers and can you actually
just give a number?
In both cases, we kind of saw the same stagnant--
so this seems close.
AUDIENCE: Could this be because companies pay
employees to [INAUDIBLE]?
LEO A. MEYEROVICH: Yeah.
So maybe a good thing to do would be to do some more of
the demographic slicing.
For example, we had a lot of international students in this
particular survey.
So that would be interesting to control for.
AUDIENCE: Also, you're assuming that a snapshot of
ages at a given point in time is a good indicator of what
would happen if you followed the programmer over the ages
of the same programmer.
But it might be that the conditions for being 20 now
are different than the conditions for
being 20 twenty years ago.
LEO A. MEYEROVICH: Yeah.
So the computing industry could change, and that will--
so maybe these are statistics of the day.
These are cross sectional, not longitudinal.
AUDIENCE: Yeah.
LEO A. MEYEROVICH: I've actually found indicators that
that's not the case.
But I think we should talk offline about that.
But basically, essentially, as soon as personal computing
happened, then ages became invariant.
But we can talk about that offline.
I think in the back.
AUDIENCE: Are there any [? indications ?] that the set
of languages that a 20-year-old programmer knows
well [INAUDIBLE]
know well [INAUDIBLE]?
LEO A. MEYEROVICH: So this is cross sectional, meaning we
took it at a snapshot in time.
So I really don't know.
But what I will say is that we actually looked at different
age groups to see what languages they know.
Then for example, if you're in college and you haven't left
yet, this is the time to learn Ruby.
You're a Ruby programmer.
But if you've already left, you've missed the curve, and
you probably won't know Ruby, even if you're
just two years older.
So there definitely are cool age-specific phenomena.
John?
AUDIENCE: [INAUDIBLE].
So you would predict that the age or [INAUDIBLE]
programmers [INAUDIBLE]?
LEO A. MEYEROVICH: Yeah.
So the question is are languages somehow
generational?
When Java came out, you have the Java generation
programmers.
That I don't know.
That's kind of what the Ruby comment is, that there are
these generational blips, especially for the less
popular languages.
Something like Java's a little tricky, where it's an old
language, relatively speaking, yet it's still the number one
language for a lot of statistics.
That's a good question.
Unless there's like another really burning question, I do
want to move on to the really, really crazy stuff.
OK.
So now we have a bit of an idea of the numbers.
But we weren't actually doing a random number hunt.
A lot of this was informed by ways that we saw that
sociologists might think about these things.
And it was opportunistic, but there was some picking here.
And so the other side of our research-- we've started
thinking about adoption in a little bit of a different way,
where we realized that it's not just about getting our
language out there.
We have these longstanding challenges and questions in
programming language research, and so the question is, by
looking at them differently in terms of social
theories, could we have alternate explanations for what's
going on, or in some cases, explanations
for the first time?
And I want to talk about three particular cases.
The first one is I think what a lot of people would expect
me to talk about.
And it's very practical, which is, if you are building a
language or any tool, how do we market it so
people will adopt it?
And so there's something called diffusion of
innovation.
That gives us a nice recipe.
The next thing--
that was a bit of a narrow minded view of
how adoption works.
For example, one thing I've come to appreciate through
theories like reinvention is that, basically, the
more people use your system, the
better your system becomes.
And I'll talk about one particular case of how we
might try to harness that.
And then finally, I'm going to argue--
pulling, again, on arguments from sociology--
that we shouldn't just be looking at the technical side
of what's inside your language, that basically, a
lot of the knowledge about what it means to be a language
is actually coming from how people use it.
And when I say people, I mean groups of people.
So let's kick this off.
So I'm going to say something really bizarre here.
So to a sociologist, if I told them, yeah, I think safe sex
is like this thing called a type system, I don't think
they'd laugh me out of the room.
And before we get to safe sex let's talk about corn.
That's a little easier.
And so basically, what happened in 1943 was near the
beginning of this revolution in sociology.
And what happened in one particular case study was this
guy named Ryan, who went out from farm to farm.
And he looked at how people were adopting corn.
And the reason he was doing this is--
the corn you're seeing here is genetically modified, more
in the tame 1940s sense, not the way we're doing it today.
And the reason they were doing this is because they were
concerned about things like, say, world hunger.
And what Ryan found was that it took about 12 years from a
farmer hearing about corn that would increase his yield to the
farmer actually using it on his farm.
And all the farmers eventually did this.
So the question is why did it take 12 years?
So over the next about 20 years, there are a lot of
other studies, some about corn, other things about how
children buy toys and how companies buy microscopes.
About 500 quantitative studies later, somebody named Everett
Rogers came into the picture and said, wait, there are lots
of patterns going on.
I could build a model of how adoption works.
And then for about the next 50 years, this became a really
big, widely studied field, cited more than any computer science
paper I know of.
So I was impressed.
So this is a very general thing, so I think you guys
will enjoy hearing about this.
So how adoption works, Everett Rogers found, is that it
goes through a pipeline.
The first step-- somebody has to hear about your innovation.
In the case of corn, that's very easy.
A door to door salesman went around trying to sell corn.
But that doesn't really work.
Salesmen aren't very compelling.
And so what happened is it progressed to the next step,
which is something called persuasion.
And this is more of an information seeking thing.
You can imagine going through your social network, talk to
your friends about it, go onto the internet, and
try to learn more.
And this is more of an active process.
And eventually, you're going to go through this coin flip.
You're going to have to make a decision.
And this is a very short process, because eventually
you're like yes, I'm going to do this.
However, you're not just going to say yes, I'm going to do
this, and replant your entire farm.
You're actually going to go through a trial period.
So here, I'm showing that you'll maybe plant one piece
of corn, see how that works.
If it works for you, maybe you'll try to deploy it on
your entire farm.
If you're in Alaska, obviously, this won't work.
So it has to be compatible with how things worked.
And so this is basically what Rogers found for all these
innovations.
He was actually very easily able to characterize the
processes they went through.
Now, at the same time that a person is going through this
adoption process, there are catalysts that
influence how it goes.
So for example, is the innovation simple?
Oh, this is magic corn that's resistant to bugs.
It's bred to be resistant from bugs.
I understand what that does, and well, wait, that there's a
relative advantage to this.
So because it's resistant to the bugs, no bugs will eat my
corn and I can sell more corn.
And at the same time, you might not believe that this is
actually the case, like this is a salesman trying to sell
you something.
So you actually want to try it out and see that, maybe, for
example, does my soil type work with this type of corn,
and see if it really works.
And then for the final step--
I just described the final step of compatibility.
Does this innovation actually work for
your particular domain?
Now the reason I wanted to walk through this, first of
all, I think this is one of the most important
things in this talk.
So I didn't even come up with it.
But it's still really cool.
But the thing is now, we can start analyzing innovations
and technologies using those processes and catalysts.
So for example, let's look at safe sex and how that relates
to type systems.
So if we look at the process--
actually, let me backpedal a bit.
So the reason I'm talking about safe sex--
I actually mean a very particular time period, about
early '90s safe sex, which is essentially when the
whole AIDS epidemic was a really serious concern.
And so if we look at, in terms of the process of why safe sex
was becoming an issue of the time, most people knew that
safe sex would prevent problems.
And they might actually talk to their friends to say yes,
do you really believe this is going on?
But somehow somewhere between decision or trying to see if
it works for them, going forward, it fell down.
And the interesting thing is we look at type systems--
most people know about type systems.
A lot of people might have even tried to read up a blog
entry or something about it.
But somehow, it falls down.
And so if we take a look at the catalyst, it actually
seems very similar as well of what the challenges are.
So the relative advantage of both safe sex and type
systems-- oh, we prevent bad things.
You will not die.
Your space shuttle might not explode.
That sounds really good.
And it's pretty simple how they both work.
I don't think I need to belabor the point at somewhere
like Google.
I'm not even talking about crazy type systems, just
simple things.
However, they also fell down with something like
observability.
Did it actually prevent the problem it was
advertised to prevent?
I don't know.
If something didn't happen, why didn't it happen?
It's hard to see causality, right?
And at the same time, through trialability--
you can't just use a type system in isolation.
You have to work with your co-worker.
Same thing for safe sex.
It takes two to tango.
And so as you go through these things, you see that this
actually is the same type of technology
at its essence.
And so the question is, can we learn from the kind of studies
done in other domains for similar things?
So apparently, in two weekends, you can
save a lot of lives.
So I'm going to teach you how to do that.
So some sociologists took a look at how could
we spread safe sex?
How do we get this innovation into people's
hands and make it active?
And so weekend number one, they hung out at a bunch of
gay bars in three different cities.
And they isolated local opinion
leaders in the community.
They said, here are the people that people at the bars
apparently listen to.
What that means, I don't know, but apparently--
why that is--
these are the people of interest.
And once they found those opinion leaders, they came
back the next weekend and said, come to our workshop.
Let us teach you about what this thing is and let us teach
you how to explain it to other people.
And then finally, a cute thing they did, also, was they gave
them a little token.
So they actually gave them a badge with a traffic light.
So basically what happened is somebody would say, oh, why do
you have a traffic light on your lapel?
And he says, well, let me tell you about safe sex.
And that was actually very simple, but it got the
conversation rolling.
I could claim that it worked well, but I could also just
show you that it worked well.
And I don't want to get into the methodology of how they
measured this.
But it's actually much more rigorous than what we normally
see in computer science.
But essentially, what comes out is that when they came
back three months later and also three years later, they
asked, all right, if you had sex maybe several times or
whatever, did you have safe sex?
And more people would say, yes, I did after the
interventions.
And likewise, they asked, did you have unsafe sex, which
is the thing that we're really interested in.
And that's where we see it went down.
So we have this nice gap.
So this was, I think, a very successful case study.
So now the question is, can we go back to languages and see
does the diffusion process mesh with
how things work here?
So we actually have some cool success stories in the
language community.
And I think I could describe them in terms of this
diffusion process.
So that's two very simple examples.
One is observability.
Do you see a benefit of this technology?
So type systems are supposed to be a program analysis that
finds bugs.
We can't even get people to do this for free, right?
But that's the Haskell world.
Now, there's a Stanford startup called Coverity that runs
program analysis.
Again, it's sort of in a sense similar technology, but in
this case, not only do they get people to use it, they get
people to pay them to use it.
And basically what's going on, the analysis runs and then you
see that this long standing bug or this really scary
looking bug actually gets characterized.
So that's an observable result of your tool.
That's something you want.
So the question is, can we do this for type systems?
How would you do that?
Can you give some accountability?
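To make the observability contrast concrete, here's a minimal sketch (my own illustration, not from the talk) of the kind of bug a type checker surfaces before the program ever runs, using Python's optional annotations:

```python
def average(values: list[float]) -> float:
    # The annotations document intent; a checker like mypy
    # verifies annotated call sites before the code runs.
    return sum(values) / len(values)

# Statically, mypy rejects the bad call below with an
# incompatible-argument-type error. Dynamically, the same
# mistake only surfaces as a crash deep inside the function:
try:
    average("1234")  # type: ignore[arg-type]
except TypeError:
    print("caught only at runtime")
```

The observable payoff is the same as Coverity's: the tool points at a concrete, scary-looking bug before anyone hits it.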
Another case here-- and actually, probably really big
at Google--
is something like relative advantage.
If I use this tool, what do I expect to see in terms of a
substantial change to myself or to my organization?
So something like Hadoop or EC2, all of a sudden, oh, I'm
going to scale out and be able to handle more
users, things like that.
And so this targets a particular need.
And it's a very quantifiable need.
And so relative advantage is something you should be
thinking about when you're trying to design your
technology.
You want people to actually use it.
So I can go through all the other catalysts and all the
other processes and do the same thing for other popular
technologies of the day.
I suggest you think about it.
And then if you really are stumped, then maybe look at
the slides, but I think you should just think about it.
I just described very technical solutions.
What's really cool to me about the safe sex advocacy was that
it was very non-technical.
They just went to a bar for a weekend.
We actually have that in the computer world.
For example--
this, luckily, is no longer true.
We dropped the URLs now.
But you actually could look at a website and see what
technology they use, and this is a ringing endorsement for
your technology.
People know about it.
They get persuaded.
So I think a lot of simple solutions will work.
You have to think about them and know how
to think about them.
This was about how to get people to use your
technologies and ways of thinking about it.
As a language designer or as an academic, that sounds fun,
but that's not the only thing I'm interested in.
I just want to make better and cooler things.
And so I'm going to argue that adoption lets you make better
and cooler things.
So I'm going to look at one particular case.
It's something called reinvention.
And I'm going to look at something in the context of
something called the living will law.
This is a very politically charged topic, so instead of
getting into that, what I do want to say is that both
Republicans and Democrats in the US think this is an
important thing.
And so both do lots of legislation on this.
And there are a lot of points where they agree that,
essentially, you need to have laws about what happens when
somebody's in a coma.
What are their legal rights?
So as a Californian, I was very excited.
California wrote one of the first living will laws.
That's good.
And then very soon after, Nevada wrote
another living will law.
And they expressly said, the goal of this legislation is to
become in accordance with recent California legislation,
a living will law.
So they had no innovation in this law.
It was the same thing.
It was just copy and paste.
But the really surprising thing is about 10 years later,
Arkansas, which is not liberal hippy California, made its
own living will legislation.
And what's more, it was actually better than the
legislation that California had in 1976.
So the question is, what happens?
And what's cool is this isn't a singular event.
If you look at something like school policy--
or maybe if you don't like welfare, but you want people
to go off welfare and get jobs, that's something called
welfare reform.
That is actually the same curve that happened for how,
as policy spread throughout the country, it got better.
AUDIENCE: What is your vertical axis?
How are you measuring [INAUDIBLE]?
LEO A. MEYEROVICH: It's very law-specific.
In the case of living will law, the question is does it handle
more scenarios that come up, and independent of whether you
can or cannot do something, how easy is it to exercise?
So think of this like a flexible type system.
Maybe it does what you want or not, but it's flexible.
It gets you there really quickly.
I'm going to stop making fun of type systems.
I've written a paper about them, so don't think I'm
against them.
So what's really going on here is two very cool phenomena
that I think we should be looking at.
One is something called social learning, where basically this
is Arkansas looking at somebody who did a full
deployment of the idea before.
And you can see how that worked for them.
Can we copy the good things, change and fix the bad?
And related to that is something called adaptation,
where you're going to be in a different context than the
person using the innovation before.
So in this case, for example, if the legislation was made
for an urban area, because you're in a rural area, you're
going to need different legislation.
And so then you're going to have to adapt it
based on your context.
So both of these phenomena--
you can learn from them and see how people
work in those scenarios.
This is very hard to do in a lab room environment.
So if you're designing a language, I think it's a very
good question--
how can we harness this reinvention of the community
to be part of the language of design process
or the feature process?
And more in general, if you're making technology, you could
pretend you can invent it all, but I'm going to say history
is against you.
So to be very concrete about how this shows up-- as me, Leo the
language designer, I'll come up with an idea.
I'll maybe prototype something three to nine months.
Maybe I'll send it out for three to six months to people
for feedback, to see if they like
this language feature.
Then maybe I'll have to iterate again.
And so now we're entering this year long period.
And then heaven forbid, I decide to publish a paper
about this thing.
That's an extra year long lead I'm working on this feature.
This is where we are in the language
design community today.
Even an industrial setting--
it's a little faster, but it's not significantly faster.
And so the question is how do we streamline language
evolution so you can involve the community to actually
improve your technology?
So this is a thought experiment.
I want to be clear that I'm not doing this, but this pulls
onto a lot of ideas that people are doing.
So again, let's say we start at the top with an idea.
Now, then you're going to want to design your language
feature, but you're not really interested in all the heavy,
gunky, low level details.
So notions like language as a library, where you can do an
interpreter level or very high level implementation of a
feature, that actually streamlines the process.
And then from there, you can go and get it out into the
community right away.
However, today you can put it on a blog, but who's
going to use it?
So the question is if you're working with a language
community, how do you actually engage with
the language community?
And so for example, we have Mechanical Turk, which will
give you people who work at a call center.
That's not very interesting.
But you can ask is there a Mechanical Turk equivalent for
trying out language features?
And as far as I can tell, there is not.
So if you want to build one, please let me know about it.
But that's not enough.
Then you have to get the data back from these experiments if
you want to learn how the community works.
And so there, you need to get analytics from your compiler.
Maybe, as you saw in the beginning of the talk, you
have to survey people to see what's going on.
Not everything is just in the numbers.
There's explanations for the numbers.
And then finally, you realize that we're iterating.
And so you want to put this all into a central place, save
that data for somebody else, fork, and move
on to the next pipeline.
And so I'm not saying this is necessarily the way to do it,
but hopefully, by the principle of social learning, you
appreciate, oh, there is this untapped resource that we
don't know how to take advantage of today, but that works
for a lot of other people.
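As a toy illustration of the "language as a library" step (my example, not something from the talk): a hypothetical pipeline-style composition feature can be prototyped as an ordinary Python class, so a community can try it out without anyone touching a compiler:

```python
class Pipe:
    # Prototype of a hypothetical 'pipeline' language feature,
    # shipped as a plain library instead of a compiler change.
    def __init__(self, value):
        self.value = value

    def __or__(self, fn):
        # x | f applies f to the wrapped value, so chains of
        # transformations read left to right.
        return Pipe(fn(self.value))

result = (Pipe(3) | (lambda x: x + 1) | (lambda x: x * 10)).value
print(result)  # 40
```

If the feature catches on, usage data from a library like this is exactly the kind of feedback the iteration loop above wants, long before the feature is frozen into a language.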
Mark is making eyes at me.
AUDIENCE: Yeah.
Back up just a little.
So many languages have active communities that seem to me to
already be providing the expert social feedback function
you're describing.
[INAUDIBLE] the Python community
has the PEP process.
When I was [INAUDIBLE], there was this very active
[INAUDIBLE] list, and we were constantly discussing language
features today.
ECMAScript has a very active community.
How do these communities, discussing and proposing
prototypes of features and providing feedback, iterating
fairly quickly, differ from those [INAUDIBLE]?
LEO A. MEYEROVICH: That's a very excellent question, which
is basically we already have involved communities.
Do we want to involve them in a different way?
Is this the type of the involvement I'm talking about?
And so for there, I actually want to draw on an example.
I was talking to one of the Scala developers, and in
particular, a concern that they've been having, which is
Scala is this language that's going under rapid
evolution as we speak.
What basically happens is somebody like Martin Odersky
will have an idea, and they'll put out a patch or talk about
it on a mailing list.
But what they realized pretty quickly on was that
essentially, it's an echo chamber--
that you're talking to this very special demographic, and
that this does not necessarily relate-- it's unclear how this
demographic relates to the demographic you care about.
And so when I'm talking about social learning, I'm really
saying get this out to people, the community or a
representative of the community who really are
working there.
And so I think ECMAScript is doing a really good job of,
for example, getting high level developers at Google and
Microsoft to say what they want in the language.
But unfortunately, those aren't the only people using
the language.
And I say those are the minority.
And so I don't actually have a solution for you.
But I do have the problem for you.
Or I have a proposal for a solution.
This is totally untested.
So yeah, that's a great question.
OK.
So now, I want to go to a super high level.
I talked about different ways of using adoption for some
particular tasks.
Now, I'm going to argue that it shouldn't just be Leo
coming here and talking to you about it.
I think other people should be looking at it.
And I'm going to give two examples of why I found this
interesting.
So here, we have something called an ecological theory.
And somebody named Mark, about, I think 15 years ago,
made an interesting observation that discussing
music with your friends is fun.
And so what led from that is well, does this somehow drive
how we pick music and how music genres emerge?
And what he observed is that individuals have time
constraints.
You can't just listen to all the music or talk to everybody
about your particular music and seek them out.
And what he realized is that somehow, music moves along
demographic lines in a pretty interesting way.
And so what you're seeing on that chart on the left--
on the x-axis is people's age, and on the y-axis, people's
education level.
For example, if you look at person A on the top, they're
somewhere in that middle age group, but
they're very, very educated.
And what's cool is you can actually start drawing niches
around which people listen to what music.
So even if you don't know anything about person A,
there's a good chance, according to this graph, they
like new age music.
If you look at somebody older, person B on the right, they're
part of a much bigger demographic, a big age
demographic.
And country music--
I was surprised by this-- is actually very popular in the
US, even before the recent bluegrass stuff.
And part of the appeal, according to this reasoning,
is that a lot of people could talk to each other about
country music and know what they're talking about.
And so the realization behind this chart I'm showing
you here is that when some sort of innovation, whether
music, technology, whatever, is competing out there in the
market, it's actually not competing for individuals, but
competing for social networks.
So the case of country music has won the demographic of
people who are older, while heavy metal has this nice
niche of younger people.
And so the interesting thing is if I go back to that earlier
graphic I showed in the beginning of how languages
spread throughout SourceForge, I said that DSLs are going
niche by niche.
I'd make a much stronger claim-- that according, at
least, to the ecological theory, if you're a language
designer, you're not just targeting the technology
constraints of the niche-- more generally, you're a
community builder.
You might not necessarily have even fixed any technical
problems in that niche.
According to ecological theory, you have just somehow
spread into that particular social network of the domain.
So when I say domain specific language, I mean community
specific building or something like that.
So this changes exactly how we evaluate
or understand languages.
And so now, I want to finish on one last example.
I grew up in New England, and we liked to
make snowmen there.
And I don't know if you ever made one, but
you roll the snow.
Every time you roll the ball, it gets bigger and bigger.
And technology is like that.
In particular, let's say you make one roll and you add in
some technology on top of the existing technology.
What this is going to do is enable new types of social
interactions.
For example, we added a Twitter wall or something or a
Facebook wall.
Now, all of a sudden, people standing in line at the market
can tweet on it.
And that means they have new social interactions driven by
the technology.
But at the same time, if we turn the snowball again, now
we're going to have, based on those social interactions, new
types of technology emerging.
And for there, I might say, for example, before we had
Twitter, we had Facebook.
And because people were using Facebook, then we were able to
advance to Twitter.
And the relevance here is that if you want to talk about
designing a language, building some new technology, that's
only half of the ball roll.
The other half of the ball roll is understanding how
people are using it.
And that tells you how the next iteration of the
technology works.
And this is good news, bad news.
Good news is that we have this understanding where I claim
there's this relationship.
The bad news is twofold.
One is this relationship is a moving target, because they're
co-dependent.
It's human specific.
It's society specific.
And the really bad news here is I don't think sociologists
are going to do the work of understanding the other side
of the ball roll for us.
So if you really want to understand this, I think I
have to keep doing this.
Other people have to keep doing this.
And we have to be a little more, I would claim,
scientific about it-- or at least start looking at it somehow.
So in conclusion, I showed you two things.
I don't think this is a conclusion.
I think this is the start of a lot of cool stuff.
The first thing is I think we are in a data
era of language research.
It's not just software engineering or analytics.
I think it's how we get data to understand
how languages work.
It'll hopefully be not as much of an art going forward.
And the second thing is, I think when I talk about
principles of programming languages or language
foundations, my argument here is that social theories are
one of the big foundations: for a lot of the things we
care about in languages, the social theories actually
inform whether they work or not and how they should work,
and they give us explanations.
So with that, I'm going to say, if you thought this was
cool, go to my website.
I have data, papers, and an email link probably somewhere
there if you want to talk more.
John?
AUDIENCE: [INAUDIBLE].
LEO A. MEYEROVICH: OK.
Then let's hearken back to the safe sex example.
Apparently, people on their own aren't going to get pushed
to do safe sex.
But somehow, if you hijack the social process and interject
on it, you can change it.
So there's really bad news here, which is one of the
earlier results from the modern school of sociology was
that a lot of these social processes are not
automatically self-sustaining, that even if you do an
intervention--
to design an intervention that keeps working is very hard.
So good news, bad news.
Good news is interventions work.
The bad news is it's hard to do them.
One last thing here.
I was very impressed by the age invariance results,
because what it told me is that older programmers have a
very long shelf life.
A 60-year-old knew all the popular languages.
Statistically, it's fine.
Maybe there's some sample bias here, but I thought that was
a very promising thing.
Hi.
AUDIENCE: On that age thing, how much have you considered
the fact that people shift the set of
languages that they know?
When I was 15, I knew Basic very well.
But if someone asked me today, I probably wouldn't
[INAUDIBLE]
Basic at all.
LEO A. MEYEROVICH: Yes.
So the question is how do we distinguish languages we used
before from languages that we are using actively today?
There, we had two different questions about that.
And we put them in a particular order to help.
The first question was what languages have you ever used?
And even if before-- and we're careful with the phrasing to
get at that.
And then the second question is what languages do
you know well now?
So I'm totally with you on that.
I think there's still problems with the phrasing.
Mark and John--
AUDIENCE: [INAUDIBLE]
LEO A. MEYEROVICH: Yeah.
So I think that's a very good observation, which is that
basically, languages may not necessarily be an entirely
technical innovation, that there's still some perceptions
and beliefs involved.
And I found two very compelling areas of research
that helped align my thinking there.
The first one was historical linguistics, which is asking
questions like, well, why was Italian so
popular after the Romans?
And it's not because Italian is a better language, but it's
because if you worked in the Roman army and learned the
language, you'd become a citizen.
So that was an issue of prestige and other things.
The other community which is a smaller body of research, but
also very interesting, is something called the economics
of religion, where you get a statistician to ask how a
religion works.
And there, you get funny results like, for example, if
you don't care if your language is adopted in wide
scale, but you are interested in if people keep using it--
for example, you could do horrible experiments on them,
which actually, I think, is a great model for academia--
you actually could be a very strict or very polarizing
religion or language.
And if you make it hard for people to get in, but once
they're in, it's hard for them to leave because all your
libraries don't inter-operate with anything
else, people will stay.
And then you can try things on them.
So I totally agree with you that there's a lot of
non-technical stuff going on, or non-utilitarian.
In the middle?
AUDIENCE: It feels like the greatest era of language
exploration was back in the '60s, perhaps through the '70s.
And everything we're seeing now is, oh, well, let's take
something from the '60s and repeat [INAUDIBLE] syntax or
dumb it down so that people [INAUDIBLE].
Is that a reasonable observation?
AUDIENCE: [INAUDIBLE].
[LAUGHTER]
AUDIENCE: [INAUDIBLE]
But seriously, it's like you're arguing over the
details in a sense.
It's like the difference between sect A of religion and
sect B of religion, but way over here is something that's
a totally different view of the world and you're totally
ignoring it.
LEO A. MEYEROVICH: Right.
I think that's what motivated a lot of this work.
I feel like it's not quite the same problem that physics has
today, but we're getting there-- where our ideas are
way past where people can do or what people will use.
But on the other hand, when I look at numbers, the
programmers today use many more languages
than they used before.
Whether those languages are actually very different from
the ones they used before, that's a good question.
For example, the research I do for the language features I
do, those aren't going to show up in your language
in a very long time.
That's also why I think we should understand this.
I don't want that to keep happening.
Mark?
AUDIENCE: So on your diffusion of innovation enumeration and
the characteristics of [INAUDIBLE], there was all
this stuff about compatibility and [INAUDIBLE].
And then, you were focusing on type systems.
I would have expected that, if the innovation you're
imagining one is trying to advance is to promote type
systems to programmers that currently are in dynamically
typed languages, then gradual typing approaches and
optional typing approaches--
things that allow the types to be adopted incrementally and
allow them to be used without having to completely change
what these programmers [INAUDIBLE] use--
would have all followed from your basic principles.
LEO A. MEYEROVICH: Yes.
So the question is something like gradual typing, which
lets you mix in static types into your dynamic language--
let's make this the last question.
So the question is how do gradual types, which are
supposed to be an adoption-oriented approach to
static types, mixing those into popular dynamic
languages--
do they work or not?
I think, in many cases, it does address a lot of issues.
But for example, can you imagine working with three
people-- one of them doesn't know anything about static
types and the other two do use the gradual types.
Could the person who doesn't understand static types very
well write programs that inter-op with them?
My claim is with modern gradual type systems, the
answer is no.
That's my experience with something similar at Adobe.
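For a concrete picture of the interop question (my sketch, not the speaker's; Python's optional annotations, checked externally by a tool like mypy, stand in for a gradual type system here):

```python
def typed_discount(price: float, rate: float) -> float:
    # Fully annotated: a static checker can verify every
    # annotated call site against this signature.
    return price * (1.0 - rate)

def untyped_caller(p, r):
    # Unannotated: the checker treats p and r as 'Any', so a
    # teammate who ignores types can still call in -- but their
    # mistakes (say, swapped arguments) go unchecked statically.
    return typed_discount(p, r)

print(untyped_caller(100.0, 0.25))  # 75.0
```

The two styles interoperate at runtime, but the static guarantees stop at the annotation boundary, which is the mixed-team friction being described.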
AUDIENCE: I think the reason the answer to that is yes is
because of the very common observation that the main purpose
of declared types is documentation.
LEO A. MEYEROVICH: So that's like a
slightly different issue.
And that is what's the purpose of these types?
And there, I agree with you.
AUDIENCE: I think that's the bridge for the person who
doesn't think in terms of type checking.
That bridge will enable him to work well and will enable the
incremental learning without having to explain concepts.
LEO A. MEYEROVICH: Yeah, and actually, our
statistics agree with that.
When we asked people why they thought static types are good,
they didn't think static types were good for bug finding.
They thought unit tests, generally, were better.
But they did think static types were good
for explaining things.
So maybe that's what the static type community should
look at a bit more strongly.
PHILIP: Well, thank you so much, Leo.
[APPLAUSE]