Visualization Culture - Data literacy for the rest of us

PRESENTER: Thank you all for coming. We really do appreciate it. We're very excited to have so many different groups represented here tonight. We have a lot of partnerships throughout the year, so hopefully we can have events that cater to every organization. Moving on to our next topic, we're going to hear from a Google engineer. I'd like to give you a bit of information about her, and I apologize for reading this, but I don't want to get anything wrong. Our speaker is Fernanda Viegas. She's a computational designer whose work focuses on the social, collaborative, and artistic aspects of information visualization. She is co-leader, with Martin Wattenberg, of Google's Big Picture Data Visualization group here in Cambridge. Before joining Google, she and Wattenberg founded Flowing Media, Inc., a visualization studio focused on media and consumer-oriented projects. Prior to that, they led IBM's Visual Communication Lab, where they created a groundbreaking public visualization platform called Many Eyes, an experiment in open public data visualization and analysis. Before joining IBM, Viegas's research at the MIT Media Lab focused on the visualization of online communities. She is known for her pioneering work on depicting chat histories, email archives, and Wikipedia activity. Her interest in the stories that people tell about these activities led to a series of visualizations of personal, emotionally charged data. Her artistic visualizations have been exhibited in venues such as the Museum of Modern Art in New York, the Institute of Contemporary Art in Boston, and the Whitney Museum of American Art. Viegas holds a PhD and an MS from the MIT Media Lab. She is Brazilian, and she misses the year-round warm weather of Rio de Janeiro, where she grew up. So please welcome Fernanda Viegas. [APPLAUSE] FERNANDA VIEGAS: OK. Let's hope it works.
I have a lot of live demos, so there's plenty of opportunity for things to go wrong tonight. [LAUGHTER] FERNANDA VIEGAS: Anyway, it's an honor to be here talking to you. It's always an honor to speak to an audience with so many women, I have to say, which is quite unusual. It's a nice change. I'm going to talk about something that I think is starting to affect all of us: visualization culture. What do I mean by that? When I started working in this field about 10 years ago, visualization was something that scientists did, something serious people with a lot of expertise did, like businesspeople and scientists. And it used to look something like this. It was mainly done by men. It looked very colorful and very spatially interesting. And it was done mainly for scientific or business insight. But the cool thing is that scientists don't have a monopoly over this kind of technology anymore, and that's the change I want to talk about tonight. Artists, for instance, are starting to use this technology in very interesting ways. What I'm showing you here is actually not new. It's an art piece from the '90s by an artist named Jason Salavon, called "Homes for Sale." He collected 100 pictures of homes for sale in six different cities in the US, so you have Seattle at the top left and Miami at the bottom right. Then he superimposed all of these pictures and averaged the color of each pixel. What you end up with are these ghostly images. You don't see any individual house, but you see something like the average house for sale in that city. You can see that at the top left, Seattle is awash in a sea of greys. At the bottom middle, Dallas boasts the greenest lawns. And Miami has the bluest skies.
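The compositing trick behind "Homes for Sale" can be sketched as a per-pixel mean. This is a minimal illustration, not the artist's actual process: it assumes every photo has already been resized to the same dimensions and is stored as rows of (R, G, B) tuples, and the function name `average_images` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Image = List[List[Pixel]]      # rows of pixels

def average_images(images: List[Image]) -> Image:
    """Average the color at each pixel position across equally sized images."""
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same dimensions")
    n = len(images)
    result: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sum each channel at (x, y) over every image, then divide by n.
            r = sum(img[y][x][0] for img in images)
            g = sum(img[y][x][1] for img in images)
            b = sum(img[y][x][2] for img in images)
            row.append((round(r / n), round(g / n), round(b / n)))
        result.append(row)
    return result
```

With 100 photos of one city, any feature that recurs in the same place (grey sky, green lawn, blue sky) survives the averaging, while individual houses blur away.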
He then took the same technique and aimed it at something very different: yearbook photos. You have the class of 1988 at the top and the class of 1967 at the bottom. So this is a study in bad hairdos. [LAUGHTER] FERNANDA VIEGAS: And you can get pretty racy with stuff like that. What he did next was show every "Playboy" centerfold by decade. You can see the '60s, '70s, '80s, and '90s there. And you get a sense that Playboy Bunnies are getting lighter-skinned and blonder as time goes by. Whatever that says about our culture, it means that this artist was, back in the '90s, using some pretty cutting-edge technology to look at rites of passage, cultural icons, and so forth. Fast forward to 2006. This is a piece called "We Feel Fine," by Jonathan Harris and Sep Kamvar. They were mining the blogosphere to understand how people were feeling. They looked for phrases like "I feel," "we feel," or "I felt," and then counted what people said afterwards: I feel happy. I feel lucky. I feel sad. Then they started visualizing it, and even breaking it down by demographics. These are women in their 20s in cold, cold places, and this is how they're feeling right now. And then, finally, web visualization becomes news. I'm sure most of you have come across some of the "New York Times" interactive visualizations online, right? They are really pushing the envelope on what visualization can be in this day and age. This, for instance, is a visualization of Obama's 2013 budget proposal, where each circle represents a specific program. The area of the circle is how many dollars are going there. Green means that program is getting more money; red means it is getting less. It's a very different way of looking at tons of numbers at the same time.
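One detail worth making explicit: when dollars drive a circle's area, the radius must grow with the square root of the amount, otherwise large programs look quadratically bigger than they are. This is a minimal sketch assuming each program has a current and a previous dollar figure; the function name `bubble`, the 50-unit maximum radius, and the gray color for unchanged programs are illustrative choices, not details of the Times graphic.

```python
import math
from typing import Tuple

def bubble(dollars: float, previous: float, max_dollars: float,
           max_radius: float = 50.0) -> Tuple[float, str]:
    """Return (radius, color) for one program's circle.

    Area is proportional to dollars, so radius scales with the square root:
    a program with 1/4 of the largest budget gets 1/2 of the largest radius.
    """
    if dollars < 0 or max_dollars <= 0:
        raise ValueError("dollars must be >= 0 and max_dollars > 0")
    radius = max_radius * math.sqrt(dollars / max_dollars)
    # Green for programs gaining money, red for programs losing it.
    # Gray for unchanged programs is an assumption; the talk only mentions green and red.
    if dollars > previous:
        color = "green"
    elif dollars < previous:
        color = "red"
    else:
        color = "gray"
    return radius, color
```

For example, `bubble(25.0, 20.0, 100.0)` gives a radius of 25.0 (half the maximum, for a quarter of the money) and the color `"green"`.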
And keep in mind, this is not just for experts. This is for anyone, right? It's for a "New York Times" reader on the web. It's sort of like, here, go and make sense of this really interesting and sophisticated graph. This is another piece they did this past year. Because the election was so close, they visualized all the different paths that could take each candidate to the presidency. So if Obama wins Florida, then these are the remaining paths, right? If Romney wins Florida, then these. So it's sort of like a simulation. I remember friends, on Election Night, sitting in front of this visualization, playing with it back and forth to try to understand: oh my god, OK, if this state goes to Romney, what happens next? If this other state goes to Obama, what happens next? So it's really a simulation of a whole bunch of different scenarios. But you really know that visualization has made it big when it becomes part of a comic strip. [LAUGHTER] FERNANDA VIEGAS: I'm sure a lot of you are familiar with "XKCD." Here, he created these narrative charts. At the top, he was looking at "Lord of the Rings." The horizontal axis is time, and he plotted when different characters came together, split apart, and then came together again. There are a lot of plots and subplots happening at the same time. In the middle here is "Star Wars." So you get the idea. The cool thing for me, as a visualization researcher, was that this very chart inspired at least a couple of submissions to the top information visualization conference in the world. So here are scientists being inspired by a comic strip to visualize narrative. How cool is that? We're reversing gears here: it's not just visualization coming out of the ivory tower.
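The "New York Times" election piece described above is, underneath, an exhaustive enumeration: every undecided state can go either way, so n states give 2**n scenarios (nine states give 512), and pinning a state, as in "if Obama wins Florida," halves the space. This is a minimal sketch under that assumption; `count_paths`, its parameters, and the toy electoral-vote numbers in the example are hypothetical, not the Times' data or code.

```python
from itertools import product
from typing import Dict, Optional, Tuple

def count_paths(swing: Dict[str, int], base_a: int, base_b: int,
                fixed: Optional[Dict[str, bool]] = None,
                needed: int = 270) -> Tuple[int, int, int]:
    """Enumerate every outcome of the undecided swing states.

    swing: state -> electoral votes. base_a / base_b: votes each candidate
    already holds from states assumed safe. fixed: states the user has pinned
    (True = won by A, False = won by B).
    Returns (paths won by A, paths won by B, paths where neither reaches `needed`).
    """
    fixed = fixed or {}
    open_states = [s for s in swing if s not in fixed]
    a_paths = b_paths = neither = 0
    # Each open state can go either way: 2 ** len(open_states) scenarios.
    for outcome in product((True, False), repeat=len(open_states)):
        result = {**fixed, **dict(zip(open_states, outcome))}
        a = base_a + sum(ev for s, ev in swing.items() if result[s])
        b = base_b + sum(ev for s, ev in swing.items() if not result[s])
        if a >= needed:
            a_paths += 1
        elif b >= needed:
            b_paths += 1
        else:
            neither += 1  # e.g. a 269-269 tie
    return a_paths, b_paths, neither
```

With two hypothetical states, `count_paths({"X": 10, "Y": 5}, base_a=260, base_b=263)` returns `(2, 2, 0)`; pinning `fixed={"X": True}` leaves only A's two paths.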
It's more like: ooh, look at all this interesting, inspiring stuff around us that artists, writers, and comic strip writers are doing. What can we take from that? So how did we get here? This is where I get to talk about my work. Hopefully, these projects will give you a sense of my own path in understanding the many uses that visualization can have, and how it becomes more powerful the more mainstream it is. Back in 2003, I started working with Martin Wattenberg. By the way, all the work I'm showing tonight is work I did with Martin, so you're going to hear a lot of "we." It's Martin and me. We looked at Wikipedia. This is 2003, so remember that this is a time when nobody knew what Wikipedia was (it was the beginning of Wikipedia), let alone what a "wiki" meant, right? So what is this thing? But we happened to look at it, and we thought, wow, isn't it interesting that people are coming together and creating these pretty sophisticated-- are you guys hearing me? AUDIENCE: Yes. FERNANDA VIEGAS: Pretty sophisticated articles, with pictures and tables of contents. How are they doing this? We noticed that at the bottom of each page there was an Edit This Page link. But we also realized that Wikipedia makes every edit to every article public on its website. So this is the history of edits to the article we were just looking at. It's basically a log of everything that happens to that article: you can see when each edit happened, who made it, what they did, and whether they left a little comment describing it. We looked at this and thought, wow, this is it. This is what we need to visualize to understand how people are doing what they're doing. Nobody had visualized Wikipedia before. We had no idea if we were going to find anything interesting or even meaningful, but we set out to try. So this is how the visualization works.
Imagine you have three people working together on a document, in this case a Wikipedia article: Mary, Suzanne, and Martin. I'm going to give each of them a different color. Version 1 is all written by Mary; she puts together a little outline or something. So we draw a line that represents the length of the article, and it's all in her color because she's the one who wrote it. Then, in version 2, Suzanne comes along and says, this all looks nice, but I want to add a little paragraph at the end. Her little paragraph shows up in her color, blue, as a small snippet. And it goes on and on; if someone deletes text, you see the line shrink, and so forth. What we did then is connect pieces of text that stay the same over time. So you start to get these colorful shapes. And you start to get things like holes, which mean that some content was either added or deleted. Then we can start to play some interesting games with this. What I'm showing you here is each version spaced equally from the next. But I can also show the same data in real time, and then we start to get a sense of rhythm. You can see that between version 1 and 2 a lot of time passed, but version 3 came out really fast. I can also highlight each version and see its text on the right, colored by author. So let me show you the first demo: History Flow, if I can. Remember, this is 2003 data, from when we were working on this project. This is the history of edits for the article on design. I have this wand here that I'm moving, and it shows me each version of the article. OK, not a very interesting article. Let me bring up a more interesting one. This is the article on cats. So you have here on the left-- obviously, cats are always big.
[LAUGHTER] FERNANDA VIEGAS: You have the list of authors on the left, and the article itself on the right. I can start browsing through the article. It's a pretty long article. But what I want to call attention to is this white spike at the bottom, because it's sort of hanging out there by itself. What happened here? If I bring my wand here, I can see that someone added a whole bunch of paragraphs, and they are all about the Unix command 'cat.' [LAUGHTER] FERNANDA VIEGAS: I knew this audience would understand this. So what happens then? Does someone just go ahead and delete the whole thing? If I look at the next version, I see that no, instead of just deleting it, they created a new page called "cat (Unix)" and moved that entire piece of content there. So this is one of the dynamics we started uncovering: how do you negotiate what content does or doesn't fit within a given article? Now let me show you the article on abortion. Does anything strike you as different or weird here? What? AUDIENCE: [INAUDIBLE] FERNANDA VIEGAS: Somebody's saying it's been deleted. Exactly. These black gashes are mass deletions. As you can imagine, this is a huge article on abortion, and it's beautifully written. Then someone comes in and deletes the whole thing. It gets restored, and then someone over here deletes it and writes, "Abortion is great." Then someone comes back and writes, "Abortion is good." And then it gets fixed. OK, so we knew there was vandalism in Wikipedia, but it's interesting to actually see it. The interesting thing to us, though, was that when I highlight this deleted version, at the very bottom there is a time stamp. It says the deletion happened on the 17th of December at 4:06. And if I look at when it got fixed, it's the same day at 4:07. A minute later. And we were like, how are these guys doing this?
And this is something we started seeing on a bunch of pages: one minute, two minutes, and the page would get fixed. How were they doing this? So we talked to Wikipedians, and they told us about something called a watch list: whenever you edit an article, you can sign up to watch it, and whenever that article gets touched, you get a notification that your article has been touched. So we asked, OK, do you watch one or two articles? And they said, oh, no, we watch hundreds of them, sometimes thousands. I'm like, OK, rewind again. How are you doing this? They explained that sometimes a whole community will watch a single page. The other strategy is that when you get the notification, if you recognize the name of the person who edited and you trust them, you don't even bother looking at it. But if it's an unknown IP address or a new user, you might go check and make sure they are doing the right thing. In fact, if I showed you this same data spaced out by real time, the fixes were so fast that you wouldn't even see the gashes anymore. The last page I want to show you for Wikipedia is one of my favorites: chocolate. It's very pink, but other than that, not very interesting. Except that when I space it out by versions, I get this interesting zig-zag pattern. I always say that when I saw it, I thought, I want a scarf that looks like this. But does anyone have any idea what the zig-zag might mean? AUDIENCE: Is this an argument? Someone deletes, someone adds back? FERNANDA VIEGAS: Exactly. It's an edit war. [LAUGHTER] FERNANDA VIEGAS: Someone puts something in and it gets deleted. It gets put back, and it gets deleted again. I can show you what it is. Back here, this little piece of white text was inserted.
It's this small paragraph that says, "Extremely rarely, melted chocolate has been used to make a kind of surrealist sculpture called coulage." That piece survives for a long time. Oh, and it was added by someone called Daniel C. Boyer. Until someone here says, "Removing Boyer invention." Well, Daniel comes back and says, "Reverting. Coulage is not a Boyer invention." Someone says, "Google search for chocolate coulage finds only Boyer. Reverting. Leave your humbug out." "Reverting." And so on and so forth. [LAUGHTER] FERNANDA VIEGAS: Until Daniel C. Boyer sort of gives up. Which is really too bad, because Martin and I did a search afterwards, and chocolate coulage does exist. The other thing we did was get rid of all the author information: instead of coloring by author, we made the text darker the older it got. Why? Because in a place like Wikipedia, old, untouched text could be a proxy for high-quality text that the community doesn't feel needs editing. If I scroll down this article, you can see that there's quite a lot of dark text there that people are working around. It was really nice to see this. So what happened next? We did this visualization and started to see these patterns, like the very, very fast fixing of mass deletions, and we wanted to know how generalizable they were. So we downloaded all of Wikipedia and ran a statistical analysis to understand how fast, on average, those fixes were happening. Across the entire site, they were happening within a couple of minutes on average, which was really interesting to see. So now, let's go back. Yes, yes. OK. So from Wikipedia, we started realizing the importance of the social side.
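The History Flow views described above (color by author, and text darkening with age) both need the same bookkeeping: for every word in the latest version, who introduced it and in which version. This is a minimal sketch that diffs consecutive revisions word by word with Python's `difflib`; the real tool's matching granularity and algorithm may differ, and `Token`, `attribute`, and `shade` are hypothetical names.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple

@dataclass(frozen=True)
class Token:
    word: str
    author: str  # who introduced this word
    born: int    # index of the version where it first appeared

def attribute(versions: List[Tuple[str, str]]) -> List[List[Token]]:
    """Track word-level authorship across (author, full_text) revisions."""
    history: List[List[Token]] = []
    prev: List[Token] = []
    for v, (author, text) in enumerate(versions):
        words = text.split()
        matcher = SequenceMatcher(a=[t.word for t in prev], b=words, autojunk=False)
        current: List[Token] = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged text keeps its original author and age; these runs
                # are what gets drawn as bands connecting consecutive versions.
                current.extend(prev[i1:i2])
            elif tag in ("insert", "replace"):
                # New or rewritten words are credited to this revision's author.
                current.extend(Token(w, author, v) for w in words[j1:j2])
            # "delete": words missing from the new text are simply dropped,
            # which appears as a hole or gash in the drawing.
        history.append(current)
        prev = current
    return history

def shade(token: Token, version: int, max_age: int = 10) -> int:
    """Gray level for the age view: 255 (light) for brand-new text, falling
    to 0 (black) for text untouched for max_age or more versions."""
    age = min(version - token.born, max_age)
    return 255 * (max_age - age) // max_age
```

Feeding in Mary's "the cat sat" followed by Suzanne's "the cat sat on the mat" credits the first three words to Mary and the last three to Suzanne. A mass deletion followed by a revert would show up as an empty version followed by text whose `born` index resets, which is one reason a real tool would need smarter matching than consecutive diffs.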
We saw that people were starting to look at visualizations not only as tools to use in isolation, but really as part of bigger conversations and interesting back-and-forths with other people. So we thought, how can we design an environment where communication is part of visualization from the ground up? And that's what Many Eyes is. Many Eyes launched in 2007, and it's still running. It's a free public site where anyone can upload data, visualize it, and share those visualizations with others. Let me see if I can actually-- I'm going to have to do a little back-and-forth here, maybe? Yes. Let's hope this works. All right. Let me just see. Yes, OK. So this is Many Eyes as it stands today. It's sort of like a media site: you go there and there's always some interesting visualization being highlighted on the front page. And you can see that people are using a bunch of different visualization techniques to look at very different data sets. Now I want to do a little exercise with you. I'm about to show you a world map of alcohol consumption per capita, by country. What country do you think is going to be the champion, the one with the highest alcohol consumption in the world? [MURMURS FROM AUDIENCE] FERNANDA VIEGAS: Russia. OK, Russia. Any other guesses? [MURMURS FROM AUDIENCE] FERNANDA VIEGAS: Germany? [MURMURS FROM AUDIENCE] AUDIENCE: France. Belgium. FERNANDA VIEGAS: France? Belgium? OK. All good guesses. So-- oh, I was afraid of this. You know, guys, we're going to have to quit out of this and start again. Isn't that fun? I told you there were a lot of opportunities for things to go wrong. Oh no. I know. Let's see. Let's see if I have everything I need here. AUDIENCE: [INAUDIBLE] windows from last session. FERNANDA VIEGAS: Where?
Where did you see that? I didn't see that. Oh, here. Awesome. Oh, you guys are good. That's why it's a good audience. OK, let's hope this comes up. It's coming up. Good. Awesome. OK. So this is the world map I was talking about. Russia consumes 10.3 liters of alcohol per capita. So 10.3. But our scale goes up to 17.6, so Russia is not the champion. Someone said England: 11.8. Better, but not quite. France, 11.4. And someone said Ireland? 13.7. You would think the champion would be in this vicinity, but it's not. The champion is here, in Africa. It's Uganda, at 17.6. When we first saw this, we thought, wait, is this right? Because we had sort of the same guesses as you did. Either the data was wrong or something was going on that we didn't know about. The cool thing about Many Eyes is that every visualization is not only a visualization but an opportunity for conversation. At the bottom, people started a really interesting conversation about what was going on. This person here, for instance, said, "For the story on Ugandese alcohol consumption, check this link. It seems like a national epidemic." It was actually pretty sad. It had something to do with AIDS and men drinking a lot. So it was not good. But the data was true. The other interesting thing about this map is the whole swath of white countries there. White means they consume almost no alcohol. That also became a point of conversation. Any guesses about what that might be? AUDIENCE: Religious? AUDIENCE: Religion. FERNANDA VIEGAS: Religion. Yes. Another cool thing someone did was link the conversation to this other Many Eyes map, which shows the percentage of Muslims in each country's population. You can see that it's almost the inverse of the map we had before. This was exactly the kind of conversation we wanted to see: one based on data.
People were really discussing hypotheses and trying to explain what they were looking at. Let's wait for this to-- OK. I think I am where I want to be. This is a very different visualization, called a tree map. A tree map is a visualization of hierarchical data. We're looking at cars here, different car models, and how their fuel consumption compares in city versus highway driving. Each of these little rectangles is a car model. This one is an Audi. This one is a Mercedes-Benz. This one is a BMW. This one's a Kia Optima, and so forth. The color of each rectangle shows how highway mileage compares with city mileage: the bigger the advantage on the highway, the bluer the rectangle. That makes sense, and most of these cars are blue. Except that we have some very orange ones. These are cars that do much better in the city than on the highway. So I can start highlighting them. OK, this is a Toyota Prius. This is an Escape Hybrid. This is a Mariner Hybrid. You start to get a sense of the pattern. But I can also change the hierarchy I use to group things. Right now, I'm grouping cars by class: midsize cars are here, compact cars are here. If I change this and group by manufacturer, I can see that Toyota has a lot of orange. Lexus has a couple. But if I bring up transmission as the grouping principle, all of a sudden all my outliers are here. This is interesting, because it tells me that we're not just talking about hybrid cars; we're talking about hybrid cars with a specific kind of transmission, and those are the ones that do much, much better in the city than on the highway. That was another data set with an unexpected result. But then, imagine this: this is a public website. People are uploading whatever data they want to visualize.
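The regrouping in the car demo amounts to recomputing a two-level layout with a different grouping key. This minimal sketch uses slice-and-dice, the simplest treemap layout; Many Eyes may well have used a different one, such as a squarified layout. It assumes records are dicts and that every car gets the same size (the talk doesn't say what rectangle size encodes); `Rect`, `slice_layout`, `treemap`, and `mileage_color` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Rect:
    label: str
    x: float
    y: float
    w: float
    h: float

def slice_layout(items: List[Tuple[str, float]], x: float, y: float,
                 w: float, h: float, vertical: bool) -> List[Rect]:
    """Cut one rectangle into strips whose areas are proportional to item sizes."""
    total = sum(size for _, size in items)
    rects: List[Rect] = []
    offset = 0.0  # fraction of the parent already used
    for label, size in items:
        frac = size / total
        if vertical:  # side-by-side columns
            rects.append(Rect(label, x + offset * w, y, frac * w, h))
        else:         # stacked rows
            rects.append(Rect(label, x, y + offset * h, w, frac * h))
        offset += frac
    return rects

def treemap(records: List[dict], group_key: str, label_key: str,
            size_key: str, w: float = 100.0, h: float = 100.0) -> List[Rect]:
    """Two-level slice-and-dice treemap: one column per group, one row per record."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    columns = [(g, sum(r[size_key] for r in rs)) for g, rs in groups.items()]
    leaves: List[Rect] = []
    for col in slice_layout(columns, 0.0, 0.0, w, h, vertical=True):
        members = [(r[label_key], r[size_key]) for r in groups[col.label]]
        leaves.extend(slice_layout(members, col.x, col.y, col.w, col.h, vertical=False))
    return leaves

def mileage_color(city_mpg: float, highway_mpg: float) -> str:
    """Blue when highway mileage is at least city mileage, orange otherwise."""
    return "blue" if highway_mpg >= city_mpg else "orange"
```

Switching `group_key` from a class field to a manufacturer or transmission field is all it takes to regroup; cars that share an unusual transmission then land in one column, which is how the orange outliers end up side by side.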
Everything here is public, and we say that very clearly. One of the nice things was the surprises we would get. For instance, someone used the exact same technique, which is quite sophisticated, to visualize their wedding. This is a tree map of someone's wedding, where each rectangle is a guest. All the guests who said yes, we're coming, are in this warm color here. All the guests who said no are here, in the greenish color. Then there are people who said maybe, and people who said drinks only. So it's a very sophisticated kind of wedding list. And then you have all these hierarchies at the top. If I bring up detailed category, I can see these are aunts and uncles. These are the parents' friends. These are the cousins. These are the bride's friends, and so forth. I mean, this is really serious. [LAUGHTER] FERNANDA VIEGAS: And if I bring up country, I can see that it's the US and the UK, so it's an international wedding. How interesting. There are people coming from Italy, Austria, Ireland. And then my favorite one: if I bring up age, all the way at the top, I have young and old. [LAUGHTER] FERNANDA VIEGAS: I have no idea what the threshold is or why there aren't any people in the middle. But you are either young or old at this wedding. The cool thing is that here's someone looking at their personal data, quite personal, using a pretty sophisticated visualization technique. And they did it so well that I have no idea who they are. The data is very personal, but it's also anonymous, which is perfect for something like Many Eyes. The other thing we saw, as soon as we launched Many Eyes, was this: we had a dozen different visualization techniques, all aimed at visualizing numbers, but people were uploading text. They wanted to visualize text.
So right away, we did a word cloud, which is what everybody does. But we wanted to do more than that. The problem with word clouds is that they only show frequency, and they decontextualize everything. OK, so this word occurs a lot. Is that good or bad? What's happening with this word? We wanted to show frequency, but in context. So we created something called the word tree. This is what the word tree looks like. We're looking at the entire King James Bible here as a tree. What you do is search for things. Here I searched for "and God." The word tree shows me everything in the Bible that comes after that phrase, "and God": "and God said," "and God saw," "and God made." Blessed, spake, hath. Then I can look further. I can ask, well, what did God say? Oh, he said, let there be light. Let there be a firmament, and so forth. And I can interact very quickly with this big corpus of text. So what did God see? I can get to specific verses of the Bible very quickly. I can also play games, such as putting a question mark at the end. Now I have all the questions in the Bible at once. And I can ask, what are questions being asked of me? What are things being done unto me? And so forth. Let me go back. And you can just play with this and search for-- oops. Let me go up. How are we-- oh, no. You can just play with the text and see both macro patterns and micro patterns. And again, I want to contrast this very serious text with another data source that someone put up, which was this one. These are personal ads by men who say they're married. So these are all married men who are looking for companionship. And I love the difference in philosophy, right? It's like, I am married and looking, or I am married but looking. What does that mean? [LAUGHTER] FERNANDA VIEGAS: So I am married and plan on staying that way.
I am married and looking for someone discreet. Or I am married but looking. I'm married-- it gets sad. [LAUGHTER] FERNANDA VIEGAS: I'm going to stop here. But you get the point: the data sets were incredibly diverse. So I'm going to click out of this and go back to our presentation. The last visualization I want to tell you about on Many Eyes is called the Phrase Net. The Phrase Net came about because we wanted to create a concept map of a large body of text, so you could ask, OK, what are the concepts being talked about here? One thing people have been doing is semantic analysis of text, which takes a lot of time and a lot of computing power. We wanted to do the dumb stuff. We asked, what can we get away with? What's the really low-hanging fruit? How far can we get? We decided to look at repetition. Once again, repetition is always a visualization's friend. We asked: if you have these templates and you just point them at text, do you get repetition meaningful enough to give you a sense of the concepts, and how those concepts are connected in a piece of text? So this is what we created: a template with one word on each side and a connecting word in the middle. We chose "and" here, but "and" is not a Boolean; the connector could be anything. It could be over, under, yellow, green, whatever word. Then we run a piece of text through it. For instance, if I were looking at Jane Austen's "Pride and Prejudice," I would come to the phrase "pride and impertinence." That matches my template, so it becomes an edge in my visualization: pride, edge, impertinence, where the edge is "and." If I run the entire book through this, I get this interesting network.
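The word tree and the Phrase Net described above both rest on counting repeated word patterns, so both can be sketched with plain regular expressions. These are minimal illustrations, not the Many Eyes implementations: `word_tree` builds nested counts of the words that follow a search phrase, and `phrase_net` counts "X connector Y" edges. The function names, the `depth` limit, and the optional stopword filter are assumptions.

```python
import re
from collections import Counter
from typing import Dict, FrozenSet

def word_tree(text: str, phrase: str, depth: int = 3) -> Dict[str, dict]:
    """Nested counts of the words that follow `phrase` (the word tree's branches)."""
    words = re.findall(r"\w+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    root: Dict[str, dict] = {}
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            node = root
            # Walk up to `depth` following words, adding one to each branch.
            for w in words[i + n:i + n + depth]:
                branch = node.setdefault(w, {"count": 0, "next": {}})
                branch["count"] += 1
                node = branch["next"]
    return root

def phrase_net(text: str, connector: str = "and",
               stopwords: FrozenSet[str] = frozenset()) -> Counter:
    """Count edges (x, y) for every occurrence of the template 'x <connector> y'."""
    # The lookahead leaves y unconsumed, so in "pride and impertinence and folly"
    # "impertinence" can end one edge and start the next.
    pattern = re.compile(r"\b(\w+)\s+" + re.escape(connector) + r"\s+(?=(\w+)\b)",
                         re.IGNORECASE)
    edges: Counter = Counter()
    for m in pattern.finditer(text):
        x, y = m.group(1).lower(), m.group(2).lower()
        if x not in stopwords and y not in stopwords:
            edges[(x, y)] += 1
    return edges
```

Changing the template is just a different `connector`: `phrase_net(text, "at")` would collect place edges and `phrase_net(text, "begat")` genealogy edges. Edge counts can then drive link thickness, and frequently linked words end up in the same cluster.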
What's interesting about this network is that at the bottom, I have pretty much the social network of the book: the father, Elizabeth, Jane, the mother, and so forth. And then I've got other interesting concepts. Let's zoom in. I get a cluster that says pride, impertinence, conceit, folly, vanity, ignorance. So all these negative things are happening together as one cluster. I also have other clusters: amiable, pleasing, affectionate; or wishes, affection, hope, solicitude. So all these positive things are also happening together. All of a sudden we have these clusters that we got for free, just by looking at repetition. If we change our template to "X at Y" instead of "X and Y," we get a network of places in the book: Longbourn, Netherfield, Pemberley. And I start to see things that are happening at these different places. For instance, at Longbourn, I have people assembling, reappearing. I have visitors, and so forth. If we look at a different text, the Bible again, and use "X begat Y," I get the family tree, right? These are the families. If I zoom in to that circle there, I see who begat whom. And I get these interesting circular patterns when the son has the same name as the father or the grandfather, and so forth. And again, looking at the Bible, if I do "X of Y": this is the Old Testament, and this is the New Testament. Children of Israel. King of Israel. And here, kingdom of God, son of God, and so forth. So again, it's very low-hanging fruit for visualizing a conceptual map of text. Many Eyes has many more visualization techniques than I have time to show here. But if you haven't played with it and you have data that can be public, it's a fun place. So what were people doing with it? Two days after we launched, someone named Crossway uploaded a co-occurrence data set for biblical verses.
In other words, whenever two people show up in the same verse of the Bible, that becomes a data point. That person then created a social-network visualization. This is the New Testament. You can imagine who the big dot in the middle is, right? He then blogged about it, and immediately he started getting hundreds of comments. It was really impressive. Within a couple of days, this got out of the Christian blogosphere and spread to places like Boing Boing and YouTube. This is a screenshot of a YouTube video where the person is playing with the visualization and exploring the different hypotheses and questions that the visualization lets him raise. In turn, people saw these visualizations and started uploading their own biblical data sets to Many Eyes, creating visualizations, and then going back to the conversations they had been having with their communities. This is something we could only dream of when we launched Many Eyes: a community with data that's interesting to them, using that data to create visualizations and drive the conversation they want to have. Even though we designed Many Eyes for lay users (there's no database, the technical parts are very simple, you just copy and paste your data, and that's about it), we got a lot of scientists using Many Eyes. For instance, this is a historian who was looking at the Canadian Parliament and created a bunch of different visualizations. He was finishing his PhD, and he wrote to us and said, I just found out new things about my data set. Can I talk about this in my thesis? We said, yeah, of course. Score for you. This is a linguist who was looking at language formality in corporate blogs. As you can see, he used a bunch of different kinds of visualizations.
And then he would create these visualizations, go back to his blog-- this was during his master's-- and talk about it, and then the rest of the community would chime in. This is a geneticist who was looking at microarray data. And he created maybe hundreds of these. We thought they all looked beautiful. We had no idea what they meant. And then he wrote to us and said, I just found out something really interesting. I want to publish a paper. Can I show the visualizations? We're like, yes, of course. This is why we built Many Eyes, for people to use it. As long as you credit Many Eyes, that's cool. So this was a really interesting and counterintuitive thing: here we were, designing for the regular person, thinking scientists already have access to tools like this. And yet, we had a lot of interest from experts and scientists. So going from a place that was designed to-- I have five minutes. OK. Very quickly. I'm going to talk about Ripples, Google+ Ripples. So this is a case where we created a visualization inside a product. So this is the idea where-- [INAUDIBLE] --have to go outside the place where you are having your conversation to look at a visualization. So the question here is, how do things flow and get shared on Google+? Google+ being, obviously, the social network that Google launched. And so this is the sharing tree for an ad that Volkswagen did for the Super Bowl last year. It was called "The Bark Side." It was a video. It was all of these little dogs barking the "Star Wars" theme song. So it went viral. So what's happening here is that Volkswagen posted this on Google+. And then we create an edge whenever any of Volkswagen's followers reshares that link with someone. But this doesn't look very viral to me, because it's not very big. But this is just the action around Volkswagen. Other people saw this video and were resharing this video outside of Volkswagen, right? So this is what the entire sharing graph actually looks like. So let's take a look at this.
Let me see. Let me see if I can get this really quickly. Yes. Yeah. One more. OK. So this is the live visualization. And I can see that this person, Chris Pirillo, was hugely influential. So the more followers who reshare from you, the bigger your circle becomes. So this person reshared from Chris. I can zoom in. Josh Armor, then Sarah Ling reshared. I can zoom in to her. I can go back. I can actually look for Volkswagen here. And Volkswagen is right here. So you can see how small their circle is, right? I can zoom into their circle and see that it's the same tree we were looking at before. So what is really interesting to see is that individuals sometimes have a whole lot more influence and power on these social networks than the official channels that actually created the content-- in this case, Volkswagen. And you can start to see the different communities who are all paying attention to this. And the thing I wanted to say about that is that we started seeing different patterns that were interesting. So this is what we're calling a celebrity pattern. So Felicia Day happens to be somewhat of a celebrity and has tons of followers. And then she shared a link to a video, and then tons of her followers reshared that immediately. And then that's it. It fizzles out. So it's a very broad tree, but it's a very shallow one. Very different from this sharing pattern. This is sharing a link to a petition to the White House to open up scientific journals. So you can see different communities with deep sharing chains. These are people from outside the US, who are also helping to boost this up. This is one of our usual suspects, Tim O'Reilly. The other thing is that at the bottom here, you have a timeline of activity. You can see that this gets a lot of activity, and then it sort of dies. And then Tim O'Reilly comes up right here, and it gets another boost of activity.
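The reshare structure behind Ripples can be sketched as a tree. The Python below is a hypothetical illustration, not Google's code: it assumes each reshare is recorded as a `(resharer, reshared_from)` pair, that the pairs form a tree with no cycles, and it sizes each circle by the number of direct reshares, following the talk's description. Measuring depth and the widest level separates the broad-but-shallow celebrity pattern from deep sharing chains like the petition's.

```python
from collections import defaultdict

def build_tree(reshares):
    """Map each poster to the people who reshared directly from them.

    reshares: iterable of (resharer, reshared_from) pairs, one per reshare.
    """
    children = defaultdict(list)
    for resharer, source in reshares:
        children[source].append(resharer)
    return children

def circle_sizes(children):
    """More direct reshares from you means a bigger circle."""
    return {node: len(kids) for node, kids in children.items()}

def tree_shape(children, root):
    """Return (depth, widest_level) of the sharing tree, found level by level."""
    depth, widest = 0, 1
    level = [root]
    while True:
        # .get avoids inserting empty entries into the defaultdict.
        level = [kid for node in level for kid in children.get(node, [])]
        if not level:
            return depth, widest
        depth += 1
        widest = max(widest, len(level))
```

For a celebrity post with five direct reshares and nothing further, `tree_shape` returns depth 1 and width 5; a chain of three successive reshares returns depth 3 and width 1.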
And you have different patterns like this, where you do have sort of a celebrity thing going on, but they are not the only ones sharing the content here. And then this one, which is going to take me to my last project, is one where Martin and I were using Ripples as a mirror. This is a visualization of how one of our projects got reshared on Google+. The project is a visualization we did of wind. It was a map of the wind in the US. And the cool thing for us is that we started seeing-- so I'm just a little dot here. I have no impact or influence or whatever. But these other people have tons of influence in resharing our content. And you know, this is a very high-up executive at Google. This is a mathematician. This is an author. There's someone else here who's from the "New York Times." So we started seeing all of these different communities who were paying attention to this visualization, and people we had no idea liked visualizations. So it's really nice to be able to see how your own content sort of does. So to finish off, I want to show one last project. This is the Wind Map. So the Wind Map-- let's see how it does. So this is the wind right now in the US. And we can actually zoom in to Boston and see how we're doing. We're actually doing really well. The wind is pretty low, 2.4 or so around our area. But the thing I want to show you is that you might look at this and be like, oh, what's so interesting about this? It's how we got there. You would think that visualizing something like the wind would be pretty straightforward. But it's not. So I'm going to show you how we actually do stuff and how crazy it gets. So we got the data for the wind, and we were like, how are we going to visualize this? So this is actually just the vector field of the wind around the world. Looks horrible. That's the purpose.
We're like, we're going to do something just horrible in the beginning, just to remind ourselves, we don't want anything that looks like this. Then we thought, oh, we know what we're going to do. Wind is nothing more than particles floating around. It's just the air floating around. So let's show it as a particle field, as just particles. And this is what we get when we show particles, the wind as particles. So this is the world. Can you see the pattern? I can't. [LAUGHTER] FERNANDA VIEGAS: It looks just like a bunch of little ants. So we decided, what? Particles don't work? What is that supposed to mean? Maybe if we have a very specific shape that we're all familiar with. So say the US, and we do particles on the US. Maybe that will work. And so we tried that next. So this is better, right? You can see the US. But if I ask you, what part of the US has the fastest wind or the slowest wind, can you guess that? No, right? Isn't that amazing, that particles just don't do it? How is that possible? So we're like, OK, we're thinking about this in a completely different-- like, this is just wrong. We don't need movement, or-- actually, we need color. We're like, let's do something where we do areas that are very, very colorful, and they just flow in the way that the wind is going, and they just take the speed of the wind with them. That's it. That's how we're going to solve this. So we went with the colored areas and decided to flow them. [LAUGHTER] FERNANDA VIEGAS: Exactly, right? So it sort of just melts your eyes, and you don't get anything meaningful. But that's the challenge with data visualization: nothing is obvious. So then we came to the idea that actually worked. We're going to go back to our particles, even though these seemed horrible. And what we're going to do is add some transparency to them, so that they leave traces as they move along. And this was one of our last attempts. So by doing that, we start to see stuff.
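The fading-trail idea can be sketched as particle advection. In the Python below, each frame first dims every cell slightly (standing in for the transparency described above), then moves each particle one forward Euler step along the local wind vector and draws it at full brightness, so recent positions form fading traces. The vortex-shaped `wind` function, the grid canvas, and the `fade` and `max_age` values are hypothetical stand-ins for real wind data and rendering; this is not the Wind Map's actual code, and the default vortex center assumes a 100-by-100 grid.

```python
import random
from dataclasses import dataclass

@dataclass
class Particle:
    x: float
    y: float
    age: int = 0

def wind(x, y, cx=50.0, cy=50.0):
    """Hypothetical stand-in for real wind data: a steady vortex around (cx, cy)."""
    return -(y - cy) * 0.05, (x - cx) * 0.05

def spawn(size, rng):
    """Place a new particle uniformly at random on a size-by-size grid."""
    return Particle(rng.random() * size, rng.random() * size)

def step(canvas, particles, size, rng, fade=0.92, dt=1.0, max_age=60):
    """Advance one frame: fade old trails, move each particle, draw its new position."""
    # Dimming every cell a little each frame plays the role of transparency:
    # old positions linger as fading traces instead of vanishing at once.
    for row in canvas:
        for i in range(size):
            row[i] *= fade
    for p in particles:
        u, v = wind(p.x, p.y)
        p.x += u * dt  # forward Euler step along the local wind vector
        p.y += v * dt
        p.age += 1
        if not (0 <= p.x < size and 0 <= p.y < size) or p.age > max_age:
            fresh = spawn(size, rng)  # respawn so the field stays covered
            p.x, p.y, p.age = fresh.x, fresh.y, 0
        # The clamp guards against float rounding right at the upper edge.
        canvas[min(int(p.y), size - 1)][min(int(p.x), size - 1)] = 1.0
```

Running `step` repeatedly on a 100-by-100 canvas of zeros with a few hundred particles from `spawn` leaves arcs around the center; a lower `fade` gives shorter trails.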
We start to see patterns. And this is not what the Wind Map-- it's not exactly the Wind Map, but it's pretty close. By then, we were like, OK, we have something. We just need to polish it up. And so that's what we did. And so the Wind Map shows the wind live, always. And then the cool thing is that we also have this gallery that shows you how different things can look from one day to another. So this is actually Hurricane Isaac. And it was interesting, because as it was happening, as it made landfall, we started getting emails from people who were in New Orleans, saying, I'm sitting here, I'm looking at your map, and just hoping that this thing passes through. And we didn't really-- we were like, we're right there with you. But it wasn't until we got this guy here that we really understood what it feels like to be in the path of something. So this is Sandy. This is after it had become a tropical storm. It wasn't a hurricane anymore. But just to give you a sense of all the different-- the wind can do crazy things from one day to another. And it can go in very, very different ways. This is not a hurricane at all. This was a week when we had a lot of tornadoes in the Midwest. And we got a nice email from a teacher, saying, I was sitting with my students looking at this, and we predicted that we would have tornadoes. And we did have tornadoes. And the final thing I want to say is about the impact of something like the Wind Map-- I'm going to skip all of this. We have no time. The impact that it had was completely unexpected to us. We did the Wind Map and we thought, this is going to be a geeky project. Who cares? Everybody has access to weather maps and wind maps. They've been around forever. Who cares? It turns out a lot of people care. So this is the project that we've gotten the most email about, tons of email, to this day. So bird watchers love to look at this.
We got email from a community of butterfly watchers who were using the Wind Map to help track migration. Professionals-- this is actually a professional meteorologist. We have someone who said, I've been working with this data professionally for 28 years, and your visualization gave me a new intuition for the data. And then we have tons of pilots, surfers, sailors, you name it, who write to us. And pilots are writing to us, and we're thinking, please don't use the Wind Map. Please. [LAUGHTER] FERNANDA VIEGAS: It's like, [INAUDIBLE]. We visualize surface wind data, so that's from the ground up to, like, 10 meters. You don't want to be looking at this. And we got such a response-- we got a response from wildfire firefighters. And we had to add a disclaimer. It was the first project we ever had to add a disclaimer to, saying, please do not use the map or its data to fly planes, sail a boat, or fight wildfires. And we got an email, even after that, saying, please respect the power of the visualization. So I was like, all right. But just saying. It's unofficial. [LAUGHTER] FERNANDA VIEGAS: But anyway, that's the thought I want to leave you with tonight: we are in a new moment. We are in a new and, I think, very exciting moment, where data is not only something that the government has. It's something that we're dealing with every day, and it's becoming part of our story. [INAUDIBLE] part of how we tell our stories. It's becoming part of how we understand the world. And so it's really exciting to see people using visualization to understand this new reality and to interface with things like statistics or math or whatever in a more natural way, for a lot of people. And so I hope that you, as engineers and as scientists, are part of this. I think it's a new culture. So thank you. [APPLAUSE] PRESENTER: So we do have a few minutes, if anyone has any questions for Fernanda. We can take those now. AUDIENCE: Hi.
You kind of touched on this in your ending, as far as not doing government-based work. But I know the [INAUDIBLE] and a lot of government agencies, they have huge, big data issues, right? They're trying to get from data to decision as quickly and efficiently as possible. And this seems like a great method of getting to that point. So I'm wondering if you have worked with them or gotten in touch with such agencies. FERNANDA VIEGAS: So we personally have not. But it's interesting. So the Wind Map, for instance, is all based on government data. And thank god for government data. So it's all based on NOAA data. And what was cool was that NOAA, after we launched the Wind Map, got in touch with us and said, hey, can we use your algorithms to actually visualize the flow of currents in the Great Lakes? And we're like, sure. And so they did. They used our algorithm to visualize the flow of currents in the Great Lakes, at different depths, which was really cool. But yes, you are absolutely correct that all different kinds of government agencies are interested in this. A big part of their interest, also, is in things like text visualization. They have a lot of text. Text is very, very hard to get a grip on. So text visualization is another area that is of interest, yes. PRESENTER: Any other questions? AUDIENCE: I was just wondering, is Google Ripples an in-house term? Or is it-- FERNANDA VIEGAS: No, Google Ripples is available on Google+. So all you do is-- let me see if I have it open here. Yes. Whenever you are in your stream-- this is my stream-- and you see something that has been shared-- so this, for instance, has been shared. You go to the menu here on the post, and you say, View Ripples of that thing. This Ripple is going to be very small, because it only got shared 11 times. But yeah, it's a public-- oh. Where is it going? Anyway. It should-- I'm having trouble. But that's-- it what? AUDIENCE: [INAUDIBLE]. FERNANDA VIEGAS: Did it go somewhere else that I'm not seeing?
Anyway, just go to any public post. It only shows public posts. So even if you're sharing a lot of things privately, we don't show any of that. AUDIENCE: Probably wouldn't have been shared many times that specifically, but the Ripple itself is showing the whole thing. FERNANDA VIEGAS: It should be coming out, though. No. Yeah. Not for me. AUDIENCE: How do you get, I guess, the ideas for what you're going to visualize? And what are some things you'd like to visualize in the future? FERNANDA VIEGAS: So it's a very interactive process, in which we usually get the data first, and then we're like, oh, how can we visualize this? So it's sort of like with the Wikipedia thing. We had no idea how it would work-- or even what it meant to visualize Wikipedia. And it wasn't until we saw that log of edits that we were like, OK, this is data we can really grab onto and try to visualize. So it's sort of a give-and-take with the data set itself. Things I would like to visualize that I haven't visualized yet-- I think different kinds of media. So things like video. We have tons of video. How can we visualize video in an interesting way? Nobody has done that. So maybe you can do it. AUDIENCE: I was just wondering, are you familiar with the research that Deb Roy and Michael Fleischman have been doing at Bluefin? FERNANDA VIEGAS: Oh, yeah. AUDIENCE: What do you think of that? Is it cool? Hype? FERNANDA VIEGAS: So Deb Roy is a professor at the Media Lab, just to catch everybody up. And if I am thinking about the same thing you were thinking, he is well known for having had-- I don't know if he still does this, but he had a whole set-up where he had cameras all over his house, and he would take video footage of his little kid, ever since he was born, until I don't know, until maybe he was like four years old or something. And he was trying to understand how his kid got language, how he would pick up words.
And if you had access to pretty much every single moment of this kid's life, what could you understand from how language is formed in a kid's mind? And I think you're right. They did try to do visualizations of video for that. Because they had just so much video footage. The visualizations I'm familiar with, that they did, I don't think they were really the tools they were using to explore that data set. I think they were more sort of like, oh, how can we show the kinds of things we know in an easy way? In a visual way? So it's sort of like almost after the fact, instead of an exploratory visualization, let's find the patterns here. AUDIENCE: So some of these visualizations that you're showing are pretty complicated and have a lot of-- they're kind of unusual, not things that people would have seen before. I'm just wondering how do you determine which visualizations are going to make sense to people? Do you do a lot of usability testing? What kind of process do you go through for that? FERNANDA VIEGAS: So we do some testing. But it's not like we have focus groups that come in. Basically, we're always testing with people who are not familiar with the project we're doing, and try to get a sense of how well they would respond to it. Having said that, there are two things that I think are interesting about that, that question specifically. One is a lot of times, I found out we tend to underestimate users in the following sense. When we launched Many Eyes, for instance, we knew that we wanted to have all of the visualization techniques, all of the graphs and charts that people are very used to, from things like Excel. So we wanted to have a bar chart, a line chart, a pie chart, a scatter plot, things of that nature. And then we were like, ooh, should we have a tree map? Which was the thing I was showing you. I don't know. Tree maps come from academia. Nobody really uses them. Nobody really understands them. But we're like, oh, what the heck. 
Let's put a tree map there, and if people don't understand it, they won't use it. And then we saw it be used over and over again. So that was really interesting and unexpected. The other thing is when you look at things like the "New York Times," where they are pushing the limit and they are showing things that are very sophisticated, I think visual literacy is going up. Not necessarily because we are having traditional classes on what it means to use this technique or that other technique. And that's a whole other thing. I think we should have more of that in regular education. But because we're starting to be surrounded by these things. You're just exposed to them all the time. And not only that, they're fun to play with, right? They do things. You click on things, things happen. Oh, interesting. How does this one work? So maybe you get people that way, too, by how appealing it is. Nothing I'm saying here is to say user studies are of no interest. They definitely are. And we always try our things with people. But it's not like we have a random sample of people or a really strict methodology. We basically just try it out. And the last thing I want to say about that is that you always know when you have a hit, when you've hit a sweet spot for a visualization. It's when you give it to a friend of yours or to anyone, and they just can't stop playing with it. Oh, what about this? Oh, what about, oh! You're like, OK, I'm done. My work here is done. OK, next? So it sort of speaks for itself a lot of times. PRESENTER: I think we've got time for about two more questions. AUDIENCE: So in a couple of answers, you mentioned either getting access to the data and playing with it and seeing what visualization works best, and putting things out there and letting users toy with it. Do you see visualization as more of a young science or an advanced art? FERNANDA VIEGAS: I think it's a mix of those things.
I think it's definitely a young science, in the sense that a lot of the techniques come from academia. And in the sense that when it first started, visualization was something that you could really only do on very expensive machines that universities and industry labs had access to. So historically it could only happen in these places. So it was very scientific. But more and more, you don't need anything special to do visualization. Visualization on the web is a thing that exists today, and just about anyone can do it. So in that sense, I think it's a young-ish science, but it's becoming less of a science. At the same time, you have things like scientific visualization, in which you need things like volume rendering and 3D techniques. And those are still quite sophisticated and [INAUDIBLE] and resource-consuming enough that you have to have a little bit more infrastructure. But to me, the interesting thing is that these two worlds are coming together. It's the scientists doing their things and their explorations, but really also the artists, the journalists, the authors using these techniques in a completely different and interesting, refreshing way. So to me, I think it's a little bit of both. AUDIENCE: So my question is a little bit out of curiosity. So I'm wondering, if you had a chance to start from scratch, what would you do differently? You actually talked about how the ideas became popular really quickly. So what would you do differently if you started from scratch? FERNANDA VIEGAS: Oh. You're saying if I started from scratch today? AUDIENCE: Yeah. Like developing this idea. FERNANDA VIEGAS: I think I would have all of my projects online and publicly available for people to use. So take the project I started with, history flow, that I showed you: it was done in a lab. And it was done as a tool for me and Martin, really. We were looking at Wikipedia.
And it wasn't until much, much later that people started writing to us and saying, wait a second, I am doing source code revisions. I want to look at my code and the code that I work on with my colleagues. Can you share this technique? Or I'm working on this other project, where I'm looking at how bills in Congress get edited. Can you share this technique? So all of a sudden, it started to dawn on us that these things need to get out of labs. They need to be available on the web. They need to be freely available for a bunch of different communities. So I think that would be one thing I would have done differently. And then taken more statistics classes-- [LAUGHTER] FERNANDA VIEGAS: I think that would have helped, too. PRESENTER: Great. I think that's all we have time for. Can you all join me in thanking Fernanda for being here? [APPLAUSE]