PRESENTER: Thank you all for coming.
We really do appreciate it.
We're very excited to have so many different groups
represented here tonight.
We do have a lot of partnerships that we have
throughout the year.
So hopefully we can have some events that kind of cater to
every different organization.
Moving into our next topic, we actually are going to hear
from a Google engineer.
I'd like to give you a bit of information about her.
And I apologize for reading this, but I don't want to get
anything wrong.
So our speaker is Fernanda Viegas.
She's a computational designer whose work focuses on the
social, collaborative, and artistic aspects of
information visualization.
She is a co-leader, with Martin Wattenberg, of Google's
Big Picture Data Visualization group in Cambridge, here.
Before joining Google, she and Wattenberg founded Flowing
Media, Incorporated.
It's a visualization studio focused on media,
consumer-oriented projects.
And prior to that, they led IBM's Visual Communication
Lab, where they created a groundbreaking public
visualization platform called Many Eyes, an experiment in
open public data visualization and analysis.
Before joining IBM, Viegas's research at the MIT Media Lab
focused on the visualization of online communities.
She is known for her pioneering work on depicting
chat histories, email archives,
and Wikipedia activity.
Viegas's interest in the stories that people tell about
these activities led to a series of visualizations of
personal, emotionally-charged data.
Her artistic visualizations have been exhibited in venues
such as the New York Museum of Modern Art, the Boston
Institute of Contemporary Art, and the Whitney Museum of
American Art.
Viegas holds a PhD and MS from the Media Lab at MIT.
She is Brazilian, and she misses the year-round warm
weather she had in Rio de Janeiro, where she grew up.
So I'd like you all to welcome Fernanda Viegas.
[APPLAUSE]
FERNANDA VIEGAS: OK.
Let's hope it works.
I have a lot of live demos, so there's a lot of opportunity
for things to go wrong here tonight.
[LAUGHTER]
FERNANDA VIEGAS: So anyway, it's an honor to be here
talking to you guys.
It's always an honor to be talking to an audience with so
many women, I have to say, which is quite different.
But it's a nice change.
So I'm going to talk about something that I think is
starting to impact all of us, which is
visualization culture.
And what do I mean by this?
I mean that when I first started working in this field
about 10 years ago, visualization was something
that scientists did, and serious people with a lot of
expertise did, like businesspeople and scientists.
And it used to look something like this.
It was mainly done by men.
And it looked very colorful and very spatially
interesting.
And it was something that, again, was done for scientific
insight, mainly, or for business insight.
But the cool thing is that scientists don't have a
monopoly over this kind of technology anymore.
And in fact, this is the change I want
to talk about tonight.
Artists, for instance, are starting to use this
technology in very interesting ways.
So what I'm showing you here is actually
something that is not new.
This piece is an art piece from the '90s.
It's by an artist named Jason Salavon, and it's called
"Homes For Sale."
What he did here is he collected 100 pictures of
homes for sale in six different cities in the US.
So you have Seattle at the top left here and Miami at the
bottom right.
And then he superimposed all of these pictures and averaged
the color of each pixel.
And so what you end up with are these ghostly images.
You don't see any individual house, but you see something
of the overall median house for sale in that city.
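The superimpose-and-average step can be sketched as follows. This is a minimal illustration, not the artist's actual process: it assumes every image has the same dimensions and is stored as rows of (r, g, b) tuples, and the name `average_images` is made up for this example. It computes the per-pixel mean of each color channel.

```python
def average_images(images):
    """Average same-sized RGB images pixel by pixel.

    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    count = len(images)
    averaged = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sum each color channel across all images, then divide by the count.
            totals = [0, 0, 0]
            for image in images:
                for channel in range(3):
                    totals[channel] += image[y][x][channel]
            row.append(tuple(round(total / count) for total in totals))
        averaged.append(row)
    return averaged
```

With 100 photographs per city, no single house survives the averaging, which is why only broad features such as sky and lawn color remain visible.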
So you can see that at the top there, on the left, Seattle is
awash in a sea of greys.
At the bottom middle here, Dallas boasts
the greenest lawns.
And Miami has the bluest skies.
He then took this same technique and aimed it at
something very different, which are photo books.
So you have the class of 1988 at the top and the class of
1967 at the bottom.
So this is a study--
bad hairdos.
[LAUGHTER]
FERNANDA VIEGAS: And you can get pretty racy
with stuff like that.
What he did next is he took this and showed every
"Playboy" centerfold by decade.
So you can see, that's the '60s at the end, there, '60s,
'70s, '80s, and '90s.
And you can get a sense that Playboy Bunnies are getting
lighter-skinned and blonder as time goes by.
So whatever that says about our culture, what it means is
that this artist was, back in the '90s, taking some pretty
cutting-edge technology to look at rites of passage, at
cultural icons, and so forth.
Fast forward to 2006.
This is a piece called "We Feel Fine," by Jonathan Harris
and Sep Kamvar.
And what they were interested in, here, was they were mining
the blogosphere to understand how people were feeling.
So they were looking for sentences like "I feel," or
"we feel," or "I felt."
And then they would start counting.
What is it that people said afterwards?
I feel happy.
I feel lucky.
I feel sad.
And then they started visualizing it, and even
breaking it down by demographics.
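The counting step might look roughly like this. It is a sketch under stated assumptions, not the "We Feel Fine" implementation: the regular expression, the function name `count_feelings`, and the choice to keep only the single word after "feel" or "felt" are all illustrative.

```python
import re
from collections import Counter

# Matches "I feel X", "we feel X", "I felt X", "we felt X" and captures X.
FEELING_PATTERN = re.compile(r"\b(?:i|we)\s+(?:feel|felt)\s+(\w+)", re.IGNORECASE)

def count_feelings(sentences):
    """Count the word that follows 'I feel', 'we feel', or 'I felt'."""
    counts = Counter()
    for sentence in sentences:
        for word in FEELING_PATTERN.findall(sentence):
            counts[word.lower()] += 1
    return counts
```

Breaking the counts down by demographics would then mean keeping one such counter per group of authors.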
So those are women in their 20s, in cold, cold places.
And this is what they're feeling like right now.
And then, finally, web visualization becomes news.
So I am sure that most of you must have come across some of
the "New York Times" visualizations, interactive
visualizations online, right?
And they are really pushing the envelope on what
visualization can be in this day and age.
So this, for instance, is a visualization of Obama's 2013
budget proposal, where each circle has to do with a
specific program.
The area of the circle is how many dollars are going there.
Green means that program is getting more dollars.
Red means that program is getting less dollars.
So it's a very different way of looking at tons of numbers
at the same time.
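The encoding described here, area for dollars and color for direction of change, can be sketched as below. The function name `bubble`, the `dollars_per_unit_area` scale, and the gray color for an unchanged program are assumptions for illustration, not details of the "New York Times" graphic.

```python
import math

def bubble(amount, previous_amount, dollars_per_unit_area=1.0):
    """Return (radius, color) for one program's circle.

    The circle's AREA is proportional to the dollar amount, so the
    radius grows with the square root of the amount.
    """
    area = amount / dollars_per_unit_area
    radius = math.sqrt(area / math.pi)
    if amount > previous_amount:
        color = "green"   # program is getting more dollars
    elif amount < previous_amount:
        color = "red"     # program is getting fewer dollars
    else:
        color = "gray"    # unchanged (assumed; not specified in the talk)
    return radius, color
```

Because area, not radius, carries the value, a program with four times the budget gets a circle only twice as wide.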
And keep in mind, this is not just for experts.
This is for anyone, right?
It's a "New York Times" reader on the web.
It's sort of like, here, go and make sense of this really
interesting and sophisticated graph.
This is another piece they did this past year, because the
election was so close that what they visualized were all
the different paths that could take each one of the
candidates to the presidency.
So if Obama wins Florida, then these are the
other paths, right?
If Romney wins Florida, then these.
So this is sort of like a simulation.
I remember hearing friends, on Election Night, who were sort
of sitting in front of this visualization, playing with it
back and forth to try and understand, oh my god, OK, so
if this other state goes to Romney, what happens next?
If this other state goes to Obama, what happens next?
So this is really just a simulation of a whole bunch of
different scenarios that could happen.
But you really know that visualization has made it big
when it becomes part of a comic strip.
[LAUGHTER]
FERNANDA VIEGAS: I am sure a lot of you must be familiar
with "XKCD." And what he did here, he was creating these
narrative plots, where he was looking, at the top here, he
was looking at "Lord of the Rings." So the
horizontal axis is time.
And then he was plotting when different characters were
coming together, or setting apart
and then coming together.
And there are a lot of sort of plots and subplots happening
at the same time.
In the middle here is "Star Wars." So you get the idea.
But the cool thing, for me, as a visualization researcher,
was that this very chart actually inspired at least a
couple of submissions to the top information visualization
conference in the world.
So here are scientists being inspired by a comic strip
trying to visualize narrative.
So how cool is that, that we're sort of reversing gears
here, and it's not just visualization coming out of
the ivory tower.
It's sort of like, ooh, look at all this interesting,
inspiring stuff around us that artists, writers, comic strip
writers are doing.
What can we take from that?
So how did we get there?
And this is where I get to talk about my work.
And hopefully, some of these projects will give you a sense
of my own personal path in understanding all the
different uses that visualization can have, and
how it becomes more powerful the more mainstream it is.
So back in 2003, I started working with Martin
Wattenberg.
And by the way, all the work that I'm showing here tonight
is actually work I did with Martin.
So you're going to hear a lot of "we." It's Martin and me.
We looked at Wikipedia.
This is 2003.
So you have to remember that this is a time when nobody
knew what Wikipedia was-- it was the beginning of
Wikipedia--
let alone what a "wiki" meant, right?
So what is this thing?
But we happened to look at it, and we thought, wow, isn't it
interesting that people are coming together and creating
these pretty sophisticated--
are you guys hearing me?
AUDIENCE: Yes.
FERNANDA VIEGAS: Pretty sophisticated articles that
have pictures.
They have tables of contents.
How are they doing this?
So we realized at the bottom of each page, there was an
Edit This Page link.
But then we also realized that Wikipedia makes every edit to
any article public on its website.
So this is the history of edits to that article that we
were just looking at.
And it's basically a log of everything that happens to
that article.
So you can see when the edit happened, who edited it, and
what they did, and if they left a little comment saying
what it is that they did.
So we looked at this, and we're like, wow, this is it.
This is what we need to visualize to try and
understand how people are doing what it is
that they are doing.
And nobody had visualized Wikipedia before.
We had no idea if we were going to find anything
interesting or even meaningful.
But we set out to try it.
So this is how the visualization works.
Imagine you have three people who are working
together on a document--
in this case, a Wikipedia article.
Mary, Suzanne, and Martin.
And I'm going to color them in different colors.
And version 1 is all written by Mary.
She puts together a little outline or something.
And so we're creating a line that represents the length of
the article, and it's all colored by her color because
she's the one who authored it.
Then, in version 2, Suzanne comes along and says, this all
looks nice, but I want to add a little paragraph to the end.
So her little paragraph, in blue, shows up in her color,
and it's just a small snippet.
And then it goes on and on, and if someone deletes text,
you see that piece of text shrink, the line there shrink,
and so forth.
What we did then is we started connecting pieces of text that
stay the same over time.
So you start to get these shapes, colorful shapes.
And you start to get things like holes, which means that
either some content was added or it was deleted.
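One way to sketch this bookkeeping is to track, for every word in every version, which author first wrote it. The code below is an assumption-laden illustration, not the history flow implementation: it uses a word-level diff from Python's `difflib`, and the names `attribute_words` and `author_segments` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import groupby

def attribute_words(versions):
    """versions: list of (author, text) in chronological order.

    Returns one list per version of (word, original_author) pairs.
    Words that survive from the previous version keep their author.
    """
    history = []
    prev_words, prev_authors = [], []
    for author, text in versions:
        words = text.split()
        authors = [author] * len(words)  # default: new text by this editor
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                # Unchanged text stays connected to whoever wrote it first.
                authors[block.b + k] = prev_authors[block.a + k]
        history.append(list(zip(words, authors)))
        prev_words, prev_authors = words, authors
    return history

def author_segments(attributed_version):
    """Collapse one version into colored runs: [(author, word_count), ...]."""
    return [(author, len(list(run)))
            for author, run in groupby(a for _, a in attributed_version)]
```

Drawing each version's segments as a vertical stack, and joining segments that match across neighboring versions, would give the colored bands; a deletion shows up as a band that ends.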
And then we can start to play some
interesting games with this.
What I'm showing you here is each version equally distanced
from each other.
But I can also show the same data in real time, and then we
can start to get a sense of rhythm.
So you can see that between version 1 and 2, a lot of time
passed, but then version 3 came out really fast.
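The two spacing modes can be sketched as a choice of x-coordinate for each version. The function name `version_x_positions` and its parameters are hypothetical; timestamps are assumed to be numbers in chronological order.

```python
def version_x_positions(timestamps, width, by_time=False):
    """Return the horizontal position of each version.

    by_time=False: versions are equally spaced.
    by_time=True:  spacing is proportional to elapsed time.
    """
    n = len(timestamps)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    if not by_time:
        step = width / (n - 1)
        return [i * step for i in range(n)]
    start = timestamps[0]
    span = timestamps[-1] - start
    if span == 0:
        return [0.0] * n
    return [(t - start) * width / span for t in timestamps]
```

In time-spaced mode, a long gap between versions 1 and 2 followed by a quick version 3 puts the last two versions almost on top of each other, which is the "rhythm" effect.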
And then I could start to highlight each version and see
the text on the right with the different
colors of the authors.
So let me show you the first demo.
Let me show you history flow, if I can.
So I am showing--
remember, this is 2003 data.
This is when we were working on this project.
This is the history of edits for the article on design.
So I have this wand here that I'm moving, and it's showing
me every different version of this article.
So OK, not a very interesting article.
Let me bring up a more interesting article.
So this is the article on cats.
So you have here on the left--
obviously, cats are always big.
[LAUGHTER]
FERNANDA VIEGAS: You have on the left the list of authors,
and on the right, you have the article itself.
I can start browsing through the article.
It's a pretty long article.
But the thing I want to call attention to is this white
spike here at the bottom.
Because it's sort of hanging out there by itself.
So what happened here?
So if I bring my wand here, I can see that someone added a
whole bunch of paragraphs.
And they are all talking about the Unix
command 'cat.'
[LAUGHTER]
FERNANDA VIEGAS: I knew this audience
would understand this.
So what happens then, right?
Does someone just go ahead and delete the whole thing?
So if I look at the next version, I see that no,
instead of just deleting, they created a new page called "cat
(Unix)" and redirected that entire piece of content there.
So this is some of the dynamics that we started
uncovering, is how do you negotiate what piece of
content fits or doesn't fit within a given article.
So now let me show you the article on abortion.
So does anything strike you as different or weird here?
What?
AUDIENCE: [INAUDIBLE]
FERNANDA VIEGAS: Somebody's saying it's been deleted.
Exactly.
So these black gashes here are mass deletions.
This is a huge article, as you can imagine, on abortion.
It's beautifully written.
And then someone comes in and deletes the whole thing.
It gets restored, and then someone over here deletes it
and says, "Abortion is great." And then they come back and
they say, "Abortion is good." And then it gets fixed.
So OK.
So we know there was vandalism in Wikipedia.
It's interesting to actually see it.
But the cool thing, the interesting thing to us, was
that when I highlight this deleted version, at the very
bottom, there is a time stamp.
And it says that the deletion happened on the 17th of
December at 4:06.
And then if I look at when it got fixed, it's
the same day at 4:07.
It's a minute later.
And we're like, how are these guys doing this?
And this is something we started seeing
in a bunch of pages.
It's just like, one minute, two minutes, and then
it would get fixed.
We're like, how are you guys doing this?
So we talked to Wikipedians, and they told us about
something called a watch list, which is whenever you edit an
article, you can sign up to watch it.
And whenever that article gets touched, you get a
notification that says, you know, your article's
been touched.
And then we were like, OK, so do you watch,
like, one or two articles?
They're like, oh, no, we watch hundreds of
them, sometimes thousands.
I'm like, how--
OK, rewind again.
How are you doing this?
And so they explained that a whole community will watch,
sometimes, a single page.
And then the other strategy is when you get the notification,
if you recognize the name of the person who edited, and you
trust them, you sort of don't even bother looking at it.
But if it's an unknown IP address or a new user, you
might want to go check and make sure they are doing the
right thing.
In fact, if I showed you this same data spaced out by real
time, those things were so fast that you don't even see
the gashes anymore.
And then the last page I want to show you for Wikipedia is
one of my favorites.
It's chocolate.
So it's very pink, but other than that, not very
interesting.
Except that when I space it out by versions, I get this
interesting pattern, the zig-zag.
And I always say that when I saw that, I was like, I want a
scarf that looks like this.
But does anyone have any idea of what the
zig-zag might mean?
AUDIENCE: Is this an argument?
Someone deletes, someone adds back?
FERNANDA VIEGAS: Exactly.
It's an edit war.
[LAUGHTER]
FERNANDA VIEGAS: Someone puts something and it gets deleted.
And it gets put back, and gets deleted.
I can show you what it is.
So back here, this little piece of
white text was inserted.
And it's this small paragraph that says, "Extremely rarely,
melted chocolate has been used to make a kind of surrealist
sculpture called coulage." So that piece
survives for a long time.
Oh, and it was put by someone called Daniel C. Boyer.
Until someone here says, "Removing Boyer invention."
Well, Daniel comes back and says, "Reverting.
Coulage is not a Boyer invention."
Someone says, "Google search for chocolate coulage finds
only Boyer.
Reverting.
Leave your humbug out." "Reverting." And
so on and so forth.
[LAUGHTER]
FERNANDA VIEGAS: Until Daniel C. Boyer sort of gives up.
Which is really bad, because Martin and I did a search
afterwards, and chocolate coulage does exist.
The other thing we did was to get rid of all the author
information.
So not have colors by author, but just get the text darker
the older it got.
So why are we doing this?
Well, we're doing this because in a place like Wikipedia,
old, untouched text could be a proxy for high-quality text
that the community doesn't feel needs to be edited.
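The age-based coloring can be sketched as a mapping from how long a piece of text has survived to a gray level. The linear ramp, the `lightest` value, and the name `age_to_gray` are assumptions for illustration; the talk only says that older text is drawn darker.

```python
def age_to_gray(birth_version, current_version, oldest_version=0, lightest=200):
    """Return a gray level from 0 (black, oldest text) to `lightest` (newest text)."""
    span = current_version - oldest_version
    if span <= 0:
        return lightest
    age = current_version - birth_version
    # Clamp so out-of-range inputs still give a valid gray level.
    fraction = min(max(age / span, 0.0), 1.0)
    return round(lightest * (1.0 - fraction))
```

Text that has been present since the first version comes out black, and text added in the current version comes out light gray.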
And if I scroll down this article, you can see that
there's quite a lot of dark text there that people are
working around.
So it was really nice to see this.
So what happened with this?
We did this visualization.
We started to see these patterns.
And we wanted to know if things like the very, very
fast fixing of mass deletions, how generalizable that was.
So given that we were seeing these patterns, what we did
then is we downloaded all of Wikipedia and ran a
statistical analysis to understand how fast, on
average, those fixes were happening.
And they were happening within a couple of minutes for the
entire site, so that was average.
Which was really interesting to see.
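A rough sketch of that kind of measurement over one article's revision log follows. The thresholds are assumptions, not the criteria used in the actual analysis: a "mass deletion" is taken to be a revision that shrinks the article below `drop_ratio` of its previous length, and a "fix" is the first later revision that brings it back to at least `restore_ratio` of that length. The function name is hypothetical.

```python
def deletion_repair_times(revisions, drop_ratio=0.1, restore_ratio=0.9):
    """revisions: chronological list of (timestamp_seconds, article_length).

    Returns the number of seconds each mass deletion survived before repair.
    """
    repair_times = []
    i = 1
    while i < len(revisions):
        _, prev_len = revisions[i - 1]
        time_deleted, length = revisions[i]
        if prev_len > 0 and length < drop_ratio * prev_len:
            # Scan forward for the revision that restores the content.
            for j in range(i + 1, len(revisions)):
                time_fixed, fixed_len = revisions[j]
                if fixed_len >= restore_ratio * prev_len:
                    repair_times.append(time_fixed - time_deleted)
                    i = j
                    break
        i += 1
    return repair_times
```

Running this over every article and averaging the results would give a site-wide figure of the kind described.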
So now, let's go back.
Yes, yes.
OK.
So from Wikipedia, we started realizing the
importance of social.
And the fact that people were starting to look at
visualizations not only as tools to use in isolation, but
really as part of bigger conversations and interesting
back-and-forths that they were having with other people.
So we thought, how can we design an environment where
communication is part of visualization
from the ground up?
And that's what Many Eyes is.
So Many Eyes was launched in 2007.
And it was--
it is-- it's still running.
It's a public free site where anyone can upload data,
visualize that data, and share those
visualizations with others.
So let me see if I can actually--
I'm going to have to do a little
back-and-forth here, maybe?
Yes.
Let's hope this works.
All right.
Let me just see.
Yes, OK.
So this is Many Eyes as it stands today.
And basically, it's sort of like a media site, where you
go there and there's always some interesting visualization
being highlighted on the front page.
And you can see that people are using a bunch of different
visualization techniques and they're all looking at very
different data sets.
So the exercise-- the little thing I want to do with you
guys now is do a little exercise.
I'm about to show you a map, a world map, of alcohol
consumption per capita, per country.
So when I do this, what country do you think is going
to be the champion?
What country do you think is going to be the one with the
highest alcohol consumption in the world?
[MURMURS FROM AUDIENCE]
FERNANDA VIEGAS: Russia.
OK, Russia.
Any other guesses?
[MURMURS FROM AUDIENCE]
FERNANDA VIEGAS: Germany?
[MURMURS FROM AUDIENCE]
AUDIENCE: France.
Belgium.
FERNANDA VIEGAS: France?
Belgium?
OK.
All good guesses.
So--
oh, I was afraid of this.
You know, guys, we're going to have to quit out of this guy
and start again.
So isn't that fun?
I told you there were a lot of opportunities for
things to go wrong.
Oh no.
I know.
Let's see.
Oh, let's see if I have everything I need here.
AUDIENCE: [INAUDIBLE] windows from last session.
FERNANDA VIEGAS: Where?
Where did you see that?
I didn't see that.
Oh, here.
Awesome.
Oh, you guys are good.
That's why it's a good audience.
OK.
So let's hope this comes up?
It's coming up.
Good.
Awesome.
OK.
So this is our world map that I was talking about.
So Russia.
Russia consumes 10.3 liters of alcohol per capita.
So 10.3. But our scale goes to 17.6.
So Russia is not the champion.
Someone said England.
11.8.
Better, but not quite.
France, 11.4.
And someone said Ireland?
13.7.
Anyway, you would think it would be in this
vicinity, but it's not.
The champion is here in Africa.
It's Uganda--
17.6.
So when we first saw this, we were like,
wait, is this right?
Because we had sort of the same guesses as you did.
And we're like, either the data is wrong or something is
going on that we didn't know.
So the cool thing about Many Eyes is that every
visualization is not only a visualization but it's an
opportunity for conversation.
So at the bottom, people started a really interesting
conversation about what was going on.
And in fact, this person here, for instance, said, "For the
story on Ugandese alcohol consumption, check this link.
It seems like a national epidemic."
Actually, it was pretty sad.
It had something to do with AIDS and men drinking a lot.
So it was not good.
But the data was true.
The other thing that's interesting about this map,
though, is the fact that there's a whole swath of white
countries there.
White meaning they almost don't consume alcohol.
So this also became a point of conversation.
And any guesses about what that might be?
AUDIENCE: Religious?
AUDIENCE: Religion.
FERNANDA VIEGAS: Religion.
Yes.
So another cool thing that someone did is to link the
conversation to this other Many Eyes map that shows the
percentage of Muslims in the world population.
And you can see that it's almost the opposite map of the
map we had before.
So this was exactly the kind of conversation that we wanted
to see, which was based on data.
And where people were really discussing hypotheses and
trying to explain what they were looking at.
Let's wait for this guy to--
OK.
I think I am where I want to be here.
So this is a very different visualization.
This is called a tree map.
A tree map is a visualization of hierarchical data.
Each one-- we're looking at cars here, different car
models, and how they do in terms of consumption, in the
city versus highway mileage that they get.
Each one of these little rectangles
here is a car model.
So this one is an Audi.
This one is a Mercedes-Benz.
This one is a BMW.
This one's Kia Optima, and so forth.
The color of the rectangle has to do with how well they do in
the highway versus city.
So the better they do, the difference between highway
mileage and city mileage, the bluer they are.
Which would make sense.
It makes sense that most of these cars are blue.
Except that we have very orange ones.
This means that these cars are really much better in the city
than they are on the highway.
So I can start highlighting these.
OK, so this is a Toyota Prius.
This is an Escape Hybrid.
This is a Mariner Hybrid.
You start to get a sense of what the pattern is.
But I can also start changing the hierarchy of the way I'm
grouping things.
Right now, I'm grouping cars by their class.
Midsize cars are here.
Compact cars are here.
If I change this and I do by manufacturer, I can see that
Toyota has a lot of oranges.
Lexus has a couple.
But then if I bring up transmission as the grouping
principle, I can see that all of a sudden, all my
outliers are here.
So this is interesting, because it tells me that not
only are we talking about hybrid cars, we're talking
about hybrid cars with a specific kind of transmission,
and those are the ones who do much, much better in the city
versus on the highway.
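The regrouping step can be sketched as counting, per group, how many cars do better in the city than on the highway. Everything here is illustrative: the function name, the field names, and the sample rows (placeholder models and made-up mileage numbers) are not from the Many Eyes data set.

```python
from collections import defaultdict

def count_outliers_by(cars, attribute):
    """Group cars by `attribute`; return {group: (city_better_count, total)}."""
    totals = defaultdict(lambda: [0, 0])
    for car in cars:
        group = totals[car[attribute]]
        group[1] += 1
        if car["city_mpg"] > car["highway_mpg"]:
            group[0] += 1  # an "orange" rectangle in the tree map
    return {name: tuple(counts) for name, counts in totals.items()}

# Made-up sample rows, for illustration only.
SAMPLE_CARS = [
    {"model": "A", "car_class": "midsize", "transmission": "T1", "city_mpg": 48, "highway_mpg": 45},
    {"model": "B", "car_class": "compact", "transmission": "T1", "city_mpg": 34, "highway_mpg": 30},
    {"model": "C", "car_class": "midsize", "transmission": "T2", "city_mpg": 20, "highway_mpg": 29},
    {"model": "D", "car_class": "compact", "transmission": "T2", "city_mpg": 24, "highway_mpg": 33},
]
```

Grouping the sample by `car_class` spreads the outliers across groups, while grouping by `transmission` puts them all in one group, which mirrors the pattern described above.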
And that was another kind of data set that was unexpected.
But then, imagine this.
This is a public website.
People are uploading whatever data they want
to upload to visualize.
Everything here is public, so we say that very clearly.
One of the nice things were these surprises
that we would get.
So for instance, someone used the same exact technique,
which is quite sophisticated, to visualize their wedding.
So this is a tree map of someone's wedding, where each
rectangle is a guest.
So all the guests who said yes, we're coming, are sort of
this warm color here.
And all the guests that said no are here, in
the greenish color.
People who said maybe, people who said drinks only.
So it's a very sophisticated kind of wedding list.
And then you have all these hierarchies at the top.
So if I bring up detailed category, I can see, you know,
these are aunts and uncles.
These are the parents' friends.
These are the cousins.
These are the bride's friends, and so forth.
I mean, this is really serious.
[LAUGHTER]
FERNANDA VIEGAS: And if I bring up country, I can see
that it's US and UK.
So it's an international wedding.
How interesting.
There are people coming from Italy, Austria, Ireland.
And then my favorite one, which is if I bring up age,
all the way to the top, I have young and old.
[LAUGHTER]
FERNANDA VIEGAS: I have no idea what the threshold is and
why there aren't any people in the middle.
But you are either young or you're old in this wedding.
But the cool thing is that here's someone who's looking
at their personal data, quite personal.
They're using a pretty sophisticated
visualization technique.
And they did it so well that I have no idea who they are.
This is all anonymous data.
It's very personal, but it's very anonymous.
I have no idea who these people are, which is perfect
for something like Many Eyes.
The other thing we saw, as soon as we launched Many Eyes,
was that people-- we had a dozen different kinds of
visualization techniques, and they were all aimed at
visualizing numbers.
But as soon as we launched the site, we saw
people uploading text.
They wanted to visualize text.
So what we did is right away, we did a word cloud, which is
what everybody does.
But we wanted to do more than that.
The problem with word clouds is that they only show
frequency, and they decontextualize everything.
OK, so this word happens a lot.
Is it good or bad?
What's happening with this word?
So we wanted to show frequency, but in context.
So we created something called the word tree.
So this is what the word tree looks like.
We're looking at the entire King James
Bible here as a tree.
And what you do is you search for things.
So here I searched for "and God." And the word tree shows
me everything that, in the Bible, comes after that
phrase, "and God." So I can say, "and God said," "and God
saw," "and God made." Blessed, spake, hath.
And then I can start looking further.
I can say, well, what did God say?
Oh, he said, let there be light.
Let there be a firmament, and so forth.
And I can start very quickly interacting with this big
corpus of text.
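The structure behind a word tree can be sketched as a nested dictionary of the words that follow a search phrase, with a count at each branch. This is a minimal illustration, not the Many Eyes code: the tokenizer, the `depth` limit, and the name `build_word_tree` are assumptions, and the sketch ignores sentence boundaries.

```python
import re

def build_word_tree(text, phrase, depth=3):
    """Return a nested dict of the words that follow `phrase` in `text`.

    Each node maps a word to {"count": occurrences, "next": child nodes}.
    """
    tokens = re.findall(r"[\w']+", text.lower())
    target = phrase.lower().split()
    size = len(target)
    tree = {}
    for i in range(len(tokens) - size + 1):
        if tokens[i:i + size] == target:
            node = tree
            # Walk the next `depth` words, creating or reusing branches.
            for word in tokens[i + size:i + size + depth]:
                child = node.setdefault(word, {"count": 0, "next": {}})
                child["count"] += 1
                node = child["next"]
    return tree
```

A branch's count would drive its font size in the drawing, so "said" appears larger than "saw" if it follows the phrase more often.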
So what did God see?
And I can very quickly get to specific verses of the Bible.
I can also play games, such as, OK, let me
put a question mark.
And let me put the question mark at the end.
So I have all the questions in the Bible at once.
And I can say, what are questions being asked of me?
What are things being done unto me?
And so forth.
Let me go back.
And you can just play with this and search for--
oops.
Let me go up.
How are we--
oh, no.
You can just play with the text and see both macro
patterns and micro patterns, also.
And again, I want to contrast this very serious piece of
text with another data source that someone put up,
which was this one.
These are personals ads by males who say I'm married.
And so these are all married males who are looking for
things, for companionship.
And I love the difference in philosophy.
Right?
It's like, I am married and looking, or I
am married but looking.
What does that mean?
[LAUGHTER]
FERNANDA VIEGAS: So I am married and plan on
staying that way.
I am married and looking for someone discreet.
Or I am married but looking.
I'm married--
it gets sad.
[LAUGHTER]
FERNANDA VIEGAS: I'm going to stop here.
But you know, you get the point of the fact that the
data sets were really just incredibly diversified.
So I'm going to click out of this guy, and let's go back to
our presentation.
So the last visualization I want to tell you about on Many
Eyes is something called the Phrase Net.
And the Phrase Net came about because we wanted to create a
concept map of a large body of text.
So if you could say, OK, what are the concepts being talked
about here?
And then, one of the things that people have been doing,
and it takes a lot of power, is sort of like semantic
analyses of text.
And it takes a lot of time and it takes a lot of power.
And we wanted to do the dumb stuff.
We're like, what can we get away with, that's really
low-hanging fruit.
How far can we get?
We decided to look at repetition.
Once more, repetition is always a visualization friend.
And we decided, OK, if you have these templates and you
just point these templates at text, do you get repetition
that is meaningful enough that will start to give you a sense
of concepts, and how concepts are connected
in a piece of text?
So basically, this is what we created.
We created a template where you would have whatever word
and whatever other word.
And in the middle, you would have some template.
And we chose "and" here, but and is not a Boolean.
And could be anything.
It could be over, under, yellow, green--
whatever word.
And then what we do is we run a piece of text.
So for instance, if I was looking at Jane Austen's
"Pride and Prejudice," I would come to this phrase here,
"pride and impertinence." And that would match my template,
so that becomes an edge on my visualization.
It becomes pride, edge, impertinence, where the edge
is "and."
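The template matching can be sketched as a scan for "word, connector, word" triples, counting how often each pair occurs. This is an illustration under assumptions, not the Phrase Net implementation: the tokenizer is deliberately crude, matches may span punctuation, and the name `phrase_net_edges` is hypothetical.

```python
import re
from collections import Counter

def phrase_net_edges(text, connector="and"):
    """Count (left_word, right_word) pairs joined by `connector`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    edges = Counter()
    for i in range(1, len(tokens) - 1):
        if tokens[i] == connector:
            # "pride and impertinence" becomes the edge (pride, impertinence).
            edges[(tokens[i - 1], tokens[i + 1])] += 1
    return edges
```

Swapping the connector for another word such as "at" or "begat" reuses the same scan; the edge counts would then set the thickness of each link in the network.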
So if I run the entire "Pride and Prejudice" book through
this, I get this interesting network.
And what's interesting about this network is that at the
bottom here, I have pretty much the social
network of the book.
So I have the father, Elizabeth, Jane, the mother,
and so forth.
And then I've got other kinds of interesting concepts there.
Let's zoom in.
I get a cluster that says pride, impertinence, conceit,
folly, vanity, ignorance.
So all these negative things are sort of happening together
as one cluster.
I also have another cluster, which is amiable, pleasing,
affectionate.
Or wishes, affection, hope, solicitude.
So all these sort of positive things also happening.
So all of a sudden we start to have these clusters that we
got for free just by looking at repetition, really.
If we changed our template and we say "X at Y" instead of
"X and Y," we get a network of places in the book.
So I have Longbourn, Netherfield, Pemberley.
And I start to see things that are happening at these
different places.
So for instance, in Longbourn, I have people assembling,
reappearing.
I have visitors and so forth.
If we look at a different piece of text, if we look at
the Bible again, and we do "X begat Y," I get the family
tree, right?
So these are the families.
If I zoom in to that circle there, I see who begat whom.
And I get these interesting sort of circular patterns
here, when the son has the same name as the father or the
grandfather, and so forth.
And again, looking at the Bible, if I do "X of Y"--
this is the Old Testament.
This is the New Testament.
Children of Israel.
King of Israel.
And here, kingdom of God, son of God, and so forth.
So again, just very low-hanging fruit to try to
visualize a conceptual map of text.
So Many Eyes has many more visualization techniques than
I have time to show here.
But if you haven't played with it and you have data that can
be public, it's a fun place.
So what were people doing with it?
Two days after we launched, someone named Crossway
uploaded a co-occurrence data set for biblical verses.
So in other words, whenever you have a verse of the Bible,
if two people show up in the same verse, that becomes a
data point.
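A co-occurrence data set of this kind can be sketched as pair counts over verses. The input format (one collection of names per verse) and the name `cooccurrence_counts` are assumptions; how the original uploader extracted names from the text is not described in the talk.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(verses):
    """verses: iterable of collections of names mentioned in each verse.

    Returns a Counter mapping (name_a, name_b) to the number of shared verses.
    """
    pairs = Counter()
    for names in verses:
        # Sort so each pair has one canonical order; set() ignores repeats.
        for a, b in combinations(sorted(set(names)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Each pair becomes an edge in the social network, and a name that shares verses with many others ends up as a large, central node.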
That person then created a
visualization of social network.
This is the New Testament.
You can imagine who the big dot is, in the middle, right?
He then blogged about it.
And immediately, he started getting hundreds of comments.
It was really impressive.
Within a couple of days, this got out of the Christian
blogosphere and expanded to things like
Boing Boing, YouTube.
This is a screenshot of a YouTube video where the person is
playing with the visualization and looking at different
hypotheses that the visualization allows him to
make, and questions, and so forth.
And then the other thing is that, in turn, people started
seeing these visualizations and they started uploading
their own data sets, their own biblical data sets, to Many
Eyes, creating visualizations, and then going back to the
conversation that they were having with
their community before.
And this is something that we could only dream of happening
when we launched Many Eyes.
Which is you have a community that has data that's
interesting to them, and they use that data to create
visualizations and charge the conversation that
they want to have.
Even though we designed Many Eyes for lay users--
there's no database.
The technical parts are very, very simple.
You just copy and paste your data.
That's about it.
We got scientists using Many Eyes, a lot of them.
So for instance, this is a historian who was looking at
the Canadian Parliament and created a bunch of different
visualizations and then wrote to us.
He was finishing his PhD, and he wrote to us and said, I
just found out new things about my data set.
Can I talk about this in my thesis?
We're like, yeah, of course.
Score for you.
This is a linguist who was looking at language formality
in corporate blogs.
And as you can see, he used a bunch of different kinds of
visualizations.
And then he would create these visualizations,
go back to his blog--
this was during his master's--
and talk about it, and then the rest of the community
would chime in.
This is a geneticist who was looking at microarray data.
And he created maybe hundreds of these.
We thought they all looked beautiful.
We have no idea what they mean.
And then he wrote to us and said, I just found out
something really interesting.
I want to publish a paper.
Can I show the visualizations?
We're like, yes, of course.
This is why we built Many Eyes, for people to use it.
As long as you credit Many Eyes, that's cool.
So really, this interesting and unintuitive thing, where
here we were, designing for the regular person, thinking
scientists have access to tools like this.
And yet, we had a lot of interest from experts and
scientists.
So going from a place that was designed to--
I have five minutes.
OK.
Very quickly.
I'm going to talk about ripples, Google+ ripples.
So this is a visualization where we created a
visualization inside a product.
So this is the idea where--
[INAUDIBLE]
--Have to go outside a place where you are having your
conversation to look at visualization.
So the question here is, how do things flow and get shared
on Google+?
Google+ being, obviously, the social
network that Google launched.
And so this is the sharing tree for an ad that Volkswagen
did for the Super Bowl last year.
It was something called "The Bark Side." It was a video.
It was all of these little dogs barking the "Star Wars"
theme song.
So it went viral.
So what's happening here, Volkswagen
posted this on Google+.
And then we create edges whenever any of the followers
from Volkswagen reshares that link with someone.
But this doesn't look very viral to me, because
it's not very big.
But this is the action around Volkswagen.
Other people saw this video and were resharing this video
outside of Volkswagen, right?
So this is what the entire sharing graph
actually looks like.
So let's take a look at this.
Let me see.
Let me see if I can get this really quickly.
Yes.
Yeah.
One more.
OK.
So this is the live visualization.
And I can see that this person, Chris Pirillo, was
hugely influential.
So the more followers who reshare from you, the bigger
your circle becomes.
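
One way to picture the circle sizing is to count, for each person, every reshare that happens downstream of them. This is a hedged sketch, not the Ripples code: it assumes each reshare has exactly one source, that circle size follows the total downstream count, and the account names are hypothetical.

```python
from collections import defaultdict

def downstream_counts(reshares):
    """reshares: list of (source, resharer) pairs forming one or more trees.

    Returns, for every person, the number of reshares below them.
    """
    children = defaultdict(list)
    for source, resharer in reshares:
        children[source].append(resharer)

    counts = {}

    def visit(person):
        total = 0
        for child in children.get(person, []):
            total += 1 + visit(child)
        counts[person] = total
        return total

    resharers = {resharer for _, resharer in reshares}
    roots = set(children) - resharers  # people who posted without resharing
    for root in roots:
        visit(root)
    return counts

# Hypothetical sharing tree: a brand posts, one fan's reshare reaches
# an influencer, and the influencer's followers reshare in turn.
reshares = [
    ("brand", "fan1"),
    ("brand", "fan2"),
    ("fan1", "influencer"),
    ("influencer", "x"),
    ("influencer", "y"),
    ("influencer", "z"),
]
sizes = downstream_counts(reshares)
```

Because independent posts of the same link become separate roots, an individual's tree can end up larger than the tree rooted at the original poster.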
So this person reshared from Chris.
I can zoom in.
Josh Armor, then Sarah Ling reshared.
I can zoom in to her.
I can go back.
I can actually look for Volkswagen here.
And Volkswagen is right here.
So you can see how small their circle is, right?
I can zoom into their circle and see that it's the same
tree we were looking at before.
So this is something that is really interesting to see, is
that individuals have, sometimes, a whole lot more
influence and power on these social networks than the
official channels who actually created the content-- in this
case, Volkswagen.
And you can start to see the different communities who are
all paying attention to this.
And the thing I wanted to say about that is that we started
seeing different patterns that were interesting.
So this is what we're calling a celebrity pattern.
So Felicia Day happens to be somewhat of a celebrity and
has tons of followers.
And then she shared a link to a video, and then tons of her
followers reshared that immediately.
And then that's it.
It fizzles out.
So it's a very broad tree, but it's a very shallow one.
Very different from this sharing pattern.
This is sharing a link to a petition to the White House to
open up scientific journals.
So you can see different communities with
deep sharing chains.
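
The difference between a broad, shallow tree and a deep sharing chain can be made concrete by measuring depth and widest level. This is only an illustrative sketch: `tree_shape` and both example trees are assumptions, not data from Google+.

```python
from collections import defaultdict

def tree_shape(reshares, root):
    """Return (depth, widest_level) of the sharing tree under root."""
    children = defaultdict(list)
    for source, resharer in reshares:
        children[source].append(resharer)

    depth, widest = 0, 1
    level = [root]
    while True:
        # Walk the tree one generation of reshares at a time.
        next_level = [c for person in level for c in children.get(person, [])]
        if not next_level:
            break
        depth += 1
        widest = max(widest, len(next_level))
        level = next_level
    return depth, widest

# Hypothetical "celebrity" pattern: many direct reshares, then nothing.
celebrity = [("star", f"fan{i}") for i in range(50)]
# Hypothetical chain: each person reshares from the previous one.
chain = [(f"p{i}", f"p{i + 1}") for i in range(5)]

celebrity_shape = tree_shape(celebrity, "star")
chain_shape = tree_shape(chain, "p0")
```

A large widest level with a depth of one matches the celebrity pattern, while a large depth indicates content being passed along through a community.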
These are people from outside the US, who are also helping
to boost this up.
This is one of our usual suspects, Tim O'Reilly.
The other thing, at the bottom here, you have
a timeline of activity.
You can see that this gets a lot of activity, and then it
sort of dies.
And then Tim O'Reilly comes up right here, and it gets
another boost of activity.
And you have different patterns like this, where you
do have sort of a celebrity thing going on, but they are
not the only ones sharing the content here.
And then this one, which is going to take me to my last
project, is a visualization where Martin and I were using
this as a mirror.
This is a visualization of how one of our projects got
reshared on Google+.
This is a visualization we did of wind.
It was a map of the wind in the US.
And the cool thing for us is that we started seeing--
so I'm just a little dot here.
I have no impact or influence or whatever.
But these other people have tons of influence in resharing
our content.
And you know, this is a very high-up executive at Google.
This is a mathematician.
This is an author.
There's someone else here who's from the "New York
Times."
So we started seeing all of these different communities
who were paying attention to this visualization, and people
we had no idea liked visualizations.
So it's really nice to be able to see how your own content
sort of does.
So to finish off, I want to show one last project.
This is the Wind Map.
So the Wind Map--
let's see how it does.
So this is the wind right now in the US.
And we can actually zoom in to Boston and
see how we're doing.
We're actually doing really well.
The wind is pretty low, 2.4 or so around our area.
But the thing I want to show you is that you might look at
this and be like, oh, what's so interesting about this?
It's how we got there.
You would think that visualizing something like the
wind would be pretty straightforward.
But it's not.
So I'm going to show you how we actually do stuff and how
crazy it gets.
So we've got the data from the wind, and we decided to--
we were like, how are we going to visualize this?
So this is actually just the vector field of the wind
around the world.
Looks horrible.
That's the purpose.
We're like, we're going to do something just horrible in the
beginning, just to remind ourselves, we don't want
anything that looks like this.
Then we thought, oh, we know what we're going to do.
Wind is nothing more than particles floating around.
It's just the air floating around.
So let's show it as a particle field, as just particles.
And this is what we get when we show
particles, the wind as particles.
So this is the world.
Can you see the pattern?
I can't.
[LAUGHTER]
FERNANDA VIEGAS: It looks just like a
bunch of little ants.
So we decided, what?
Particles don't work?
What is that supposed to mean?
Maybe if we have a very specific shape that we're all
familiar with.
So say the US, and we do particles on the US.
Maybe that will work.
And so we tried that next.
So this is better, right?
You can see the US.
But if I ask you, what part of the US has the fastest wind or
the slowest wind, can you guess that?
No, right?
Isn't that amazing, that particles just don't do it?
How is that possible?
So we're like, OK, we're thinking about this in the
completely different--
like, this is just wrong.
We don't need movement, or--
actually, we need color.
We're like, let's do something where we do areas that are
very, very colorful, and they just flow in the way that the
wind is going, and they just take the speed of
the wind with them.
That's it.
That's how we're going to solve this.
So we went with the colored areas and
decided to flow them.
[LAUGHTER]
FERNANDA VIEGAS: Exactly, right?
So it sort of just melts your eyes, and you don't get
anything meaningful.
But that's the challenge with data visualization, is that
nothing is obvious.
So then we decided, OK, the idea that actually worked,
we're going to go back to our particles, even though these
seemed horrible.
And what we're going to do is we're going to add some
transparency to them, so that they leave traces
as they move along.
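
The trace effect can be sketched by dimming the whole canvas a little on every frame before drawing the particles at their new positions. This is an assumption about how such trails are commonly produced, not the Wind Map's actual code: the `wind` field, the grid size, and the `FADE` factor are all made up for illustration.

```python
import math

WIDTH, HEIGHT = 40, 20
FADE = 0.9  # assumed fraction of brightness kept per frame

def wind(x, y):
    """Hypothetical vector field: eastward flow with a gentle wave."""
    return 1.0, 0.5 * math.sin(x / 5.0)

def step(canvas, particles):
    """Advance one frame: fade old marks, move particles, draw them."""
    # Fading instead of clearing is what leaves a dimming trace.
    for row in canvas:
        for i in range(len(row)):
            row[i] *= FADE
    moved = []
    for x, y in particles:
        u, v = wind(x, y)
        x, y = (x + u) % WIDTH, (y + v) % HEIGHT
        col = min(int(x), WIDTH - 1)
        line = min(int(y), HEIGHT - 1)
        canvas[line][col] = 1.0  # newest position is fully bright
        moved.append((x, y))
    return moved

canvas = [[0.0] * WIDTH for _ in range(HEIGHT)]
particles = [(0.0, 10.0)]
for _ in range(3):
    particles = step(canvas, particles)

# Brightness along the path: oldest position is dimmest.
trace = [canvas[10][1], canvas[10][2], canvas[10][3]]
```

With many particles, fast regions leave long streaks and slow regions leave short ones, which is the kind of pattern bare dots fail to show.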
And this was sort of one of our last attempts.
So by doing that, we start to see stuff.
We start to see patterns.
And this is not what the Wind Map--
it's not exactly the Wind Map, but it's pretty close.
By then, we were like, OK, we have something.
We just need to polish it up.
And so that's what we did.
And so the Wind Map shows the wind live, always.
And then the cool thing is that we also have this gallery
that shows you how different things can look from one day
to another.
So this is actually Hurricane Isaac.
And it was interesting, because it was happening, it
made landfall, and we started getting emails from people
there who were in New Orleans, saying, I'm sitting here, I'm
looking at your map, and just hoping that this
thing passes through.
And we didn't really--
we were like, we're right there with you.
But it wasn't until we got this guy here that we really
understood what it feels like to be
in the path of something.
So this is Sandy.
This is after it had become a tropical storm.
It wasn't a hurricane anymore.
But just to give you a sense of all the different--
the wind can do crazy things from one day to another.
And it can go in very, very different ways.
This is not a hurricane at all.
This was a week when we had a lot of
tornadoes in the Midwest.
And we got a nice email from a teacher, saying, I was sitting
with my students looking at this, and we predicted that we
would have tornadoes.
And we did have tornadoes.
And this is the final thing I want to say is something like
the wind map--
I'm going to pass all of this.
We have no time.
It's sort of like the impact that it had that was
completely unexpected to us.
We did the wind map and we thought, this is going to be a
geeky project.
Who cares?
Everybody has access to weather maps and wind maps.
They've been around forever.
Who cares?
It turns out a lot of people care.
So this is the project that we've got the most email, tons
of email, to this day.
So bird watchers love to look at this.
We got email from a community of butterfly watchers who were
using the Wind Map to help track migration.
Professionals--
this is actually a professional meteorologist.
We have someone who said, I've been working with this data
professionally for 28 years, and your visualization gave me
a new intuition for the data.
And then we have tons of pilots, surfers, sailors, you
name it, who write to us.
And pilots are writing to us, and we're thinking, please
don't use the Wind Map.
Please.
[LAUGHTER]
FERNANDA VIEGAS: It's like, [INAUDIBLE].
We do visualize surface wind data, so that's from the
bottom to, like, 10 meters.
You don't want to be looking at this.
And we got such a response--
we got a response from firefighters of wildfires.
And we had to add a disclaimer.
It was the first project we ever had to add a disclaimer,
saying, please do not use the map or its data to fly planes,
sail a boat, or fight wildfires.
And we got an email, even after that, saying, please
respect the power of the visualization.
So I was like, all right.
But just saying.
It's unofficial.
[LAUGHTER]
FERNANDA VIEGAS: But anyway, that's the thought I want to
leave you with tonight, is the fact that we
are in a new moment.
We are in a new and, I think, very exciting moment, where
data is not only something that the government has.
It's something that we're dealing with every day, and
it's becoming part of our story.
[INAUDIBLE]
part of how we tell our stories.
It's becoming part of how we understand the world.
And so it's really exciting to see people using visualization
to understand this new reality and to interface with things
like statistics or math or whatever in a more natural
way, for a lot of people.
And so I hope that you, as engineers and as scientists,
are part of this.
I think it's a new culture.
So thank you.
[APPLAUSE]
PRESENTER: So we do have a few minutes, if anyone has any
questions for Fernanda.
We can take those now.
AUDIENCE: Hi.
You kind of touched on this in your ending, as far as not
doing government-based work.
But I know the [INAUDIBLE] and a lot of government agencies,
they have huge, big data issues, right?
They're trying to do a data-to-decision quickly and
most efficiently.
And this seems like a great method of
getting to that point.
So I'm wondering if you have worked with them or gotten in
touch with such agencies.
FERNANDA VIEGAS: So we personally have not.
But it's interesting.
So the Wind Map, for instance, is all based
on government data.
And thank god for government data.
So it's all based on NOAA data.
And what was cool was that NOAA, after we launched the
Wind Map, got in touch with us and said, hey, can we use your
algorithms to actually visualize the flow of currents
in the Great Lakes?
And we're like, sure.
And so they did.
They used our algorithm to visualize the flow of currents
in the Great Lakes, at different depths, which was
really cool.
But yes, you are absolutely correct in that all different
kinds of government agencies are interested in this.
A big part of their interest, also, is in things like text
visualization.
They have a lot of text.
Text is very, very hard to get a grip on.
So text visualization's another area that is of
interest, yes.
PRESENTER: Any other questions?
AUDIENCE: I was just wondering, is Google Ripple an
in-house term?
Or is it--
FERNANDA VIEGAS: No, Google Ripples is
available on Google+.
So all you do is--
let me see if I have it open here.
Yes.
Whenever you are in your stream--
this is my stream--
and you see something that has been shared--
so this, for instance, has been shared.
You go to the menu here on the post, and you say, View
Ripples of that thing.
This Ripple is going to be very small, because it's only
11 things that got shared.
But yeah, it's a public--
oh.
Where is it going?
Anyway.
It should--
I'm having trouble.
But that's--
it what?
AUDIENCE: [INAUDIBLE].
FERNANDA VIEGAS: Did it go somewhere else
that I'm not seeing?
Anyway, just go to any public post.
It only shows public posts.
So even if you're sharing a lot of things privately, we
don't show any of that.
AUDIENCE: Probably wouldn't have been shared many times
that specifically, but the Ripple itself is showing the
whole thing.
FERNANDA VIEGAS: It should be coming out, though.
No.
Yeah.
Not for me.
AUDIENCE: How do you get, I guess, the ideas for what
you're going to visualize?
And what are some things you'd like to
visualize in the future?
FERNANDA VIEGAS: So it's a very interactive process, in
which we usually get the data first, and then we're like,
oh, how can we visualize this?
So it's sort of like with the Wikipedia thing.
We had no idea how it would work-- even what it meant that
we wanted to visualize Wikipedia.
And it wasn't until we saw that log of edits that we're
like, OK, this is data we can really grab
onto and try to visualize.
So it's sort of a give-and-take with the data
set itself.
Things I would like to visualize that I haven't
visualized yet--
I think different kinds of media.
So things like video.
We have tons of video.
How can we visualize video in an interesting way?
Nobody has done that.
So maybe you can do it.
AUDIENCE: I was just wondering, are you familiar
with the research that Deb Roy and Michael Fleischman have
been doing at Bluefin?
FERNANDA VIEGAS: Oh, yeah.
AUDIENCE: What do you think of that?
Is it cool?
Hype?
FERNANDA VIEGAS: So Deb Roy is a professor at the Media Lab,
just to catch everybody up.
And if I am thinking about the same thing you were thinking,
he is well known for having had--
I don't know if he still does this, but he had a whole
set-up where he had cameras all over his house, and he
would take video footage of his little kid, ever since he
was born, until I don't know, until maybe he was like four
years old or something.
And he was trying to understand how his kid got
language, how he would pick up words.
And if you had access to pretty much every single
moment of this kid's life, what could you understand from
how language is formed in a kid's mind?
And I think you're right.
They did try to do visualizations
of video for that.
Because they had just so much video footage.
The visualizations I'm familiar with, that they did,
I don't think they were really the tools they were using to
explore that data set.
I think they were more sort of like, oh, how can we show the
kinds of things we know in an easy way?
In a visual way?
So it's sort of like almost after the fact, instead of an
exploratory visualization, let's find the patterns here.
AUDIENCE: So some of these visualizations that you're
showing are pretty complicated and have a lot of--
they're kind of unusual, not things that people would have
seen before.
I'm just wondering how do you determine which visualizations
are going to make sense to people?
Do you do a lot of usability testing?
What kind of process do you go through for that?
FERNANDA VIEGAS: So we do some testing.
But it's not like we have focus groups that come in.
Basically, we're always testing with people who are
not familiar with the project we're doing, and try to get a
sense of how well they would respond to it.
Having said that, there are two things that I think are
interesting about that, that question specifically.
One is a lot of times, I found out we tend to underestimate
users in the following sense.
When we launched Many Eyes, for instance, we knew that we
wanted to have all of the visualization techniques, all
of the graphs and charts that people are very used to, from
things like Excel.
So we wanted to have a bar chart, a line chart, a pie
chart, a scatter plot, things of that nature.
And then we were like, ooh, should we have a tree map?
Which was the thing I was showing you.
I don't know.
Tree maps come from academia.
Nobody really uses them.
Nobody really understands them.
But we're like, oh, what the heck.
Let's put a tree map there, and if people don't understand
it, they won't use it.
And then we saw it be used over and over again.
So that was really interesting and unexpected.
The other thing is when you look at things like the "New
York Times," where they are pushing the limit and they are
showing things that are very sophisticated, I think it's
the kind of thing where I think visual
literacy is going up.
Not necessarily because we are having traditional classes in
what does it mean to use this technique
or this other technique?
And that's a whole other thing.
I think we should have more of that in regular education.
But because we're starting to be surrounded by these things.
You're just exposed to them all of time.
All of the time.
And not only that, they're fun to play with, right?
They do things.
You'll click on things, things happen.
Oh, interesting.
How does this one work?
So maybe you get people that way, too, by how
appealing it is.
Nothing I'm saying here is to say user
studies are of no interest.
They definitely are.
And we always try our things with people.
But it's not like we have a random sample of people, that
we have a really strict methodology.
We basically just try it out.
And the last thing I want to say about that is that you
always know when you have a hit, when you've hit a sweet
spot for a visualization.
It's when you give it to a friend of yours or to anyone,
and they just can't stop playing with it.
Oh, what about this?
Oh, what about, oh!
You're like, OK, I'm done.
My work here is done.
OK, next?
So it sort of speaks for itself a lot of times.
PRESENTER: I think we've got time for
about two more questions.
AUDIENCE: So in a couple of answers, you mentioned either
getting access to the data and playing with it and seeing
what visualization works best, and putting things out there
and letting users toy with it.
Do you see visualization as more of a young science or an
advanced art?
FERNANDA VIEGAS: I think it's a mix of those things.
I think it's definitely a young science, in the sense
that a lot of the techniques come from academia.
And in the sense that when it first started, visualization
was something that you could really only do on very
expensive machines that universities and industry labs
had access to.
So it sort of historically could only
happen in these places.
So it was very scientific.
But more and more, you don't need anything special to do
visualization.
Visualization on the web is a thing that exists today, and
just about anyone can do it.
So in that sense, I think it's a young-ish science, but it's
becoming less of a science.
At the same time, you have things like scientific
visualization, in which you need like volume rendering and
3D techniques.
And those continue to be still quite sophisticated and
[INAUDIBLE] and resource-consuming enough that
you have to have a little bit more infrastructure.
But to me, the interesting thing is that these two worlds
are coming together.
It's the scientists doing their things and their
explorations, but really also the artists, the journalists,
the authors using these techniques in a completely
different and interesting, refreshing way.
So to me, I think it's a little bit of both.
AUDIENCE: So my question is a little bit out of curiosity.
So I'm wondering, if you had a chance to start from scratch,
what would you do differently?
You actually talked about, the ideas became
popular really quickly.
So what would you do different if you start from scratch?
FERNANDA VIEGAS: Oh.
You're saying if I start from scratch today?
AUDIENCE: Yeah.
Like developing this idea.
FERNANDA VIEGAS: I think I would have all of my projects
online and publicly available to people to use.
So things like the project I started with, history flow
that I showed you, something that was done in a lab.
And it was done as a tool for me and Martin, really.
We were looking at Wikipedia.
And it wasn't until much, much later that people started
writing to us and saying, wait a second, I am doing code
source revisions.
I want to look at my code and the code that I work with,
with my colleagues.
Can you share this technique?
Or I'm working on this other project, where I'm looking at
how bills in Congress get edited.
Can you share this technique?
So all of a sudden, it started to dawn on us that these
things need to get out of labs.
They need to be available on the web.
They need to be freely available for a bunch of
different communities.
So I think that would be one thing I would have done
differently.
And then taken more statistics classes--
[LAUGHTER]
FERNANDA VIEGAS: I think would have helped, too.
PRESENTER: Great.
I think that's all we have time for.
Can you all join me in thanking
Fernanda for being here?
[APPLAUSE]