[ Silence ]
>> I'm going to be talking about work that we've been doing
for the last two and a half years, I think,
on how to build a common model of a neural representational space.
And we demonstrate how the model works
with a common model of ventral temporal cortex
in humans, using fMRI data as the basis for the alignment algorithm.
And the problem that we're trying
to address is the cross-subject validity of classifier models.
So, when we build a multivariate pattern classifier to try
to classify brain states that we get from measurements from fMRI
or anything else, but specifically here with fMRI,
we have to build a new model for each individual subject,
because when we try to build a model based on other subjects,
the voxels or features in these brain spaces don't line
up very well.
And that leaves the question
of whether different brains use the same basis functions, or response
tuning functions, for these classifications,
or whether every brain has kind of an idiosyncratic set.
So, we tried to address that with this.
So, this is the outline of the talk.
First, I'll talk about the problem
and then I'll present the algorithm, which has two steps,
hyperalignment and then dimension reduction,
and we validate it by doing between subject classification using
the model dimensions as features.
And then, we will talk about a couple
of other topics within this dataset.
We use a movie: we have people watch a full length
action movie and measure brain activity
for two hours while they watch this movie.
And, that's the basis for aligning data across brains.
And, the question that everyone asks is why are you using the
movie, why this, you know?
But I'll try to show that that really is an essential component
of this.
And then, I'll go over what the working parts of the model are,
computationally speaking.
So, I wanted to acknowledge the people in the lab
and everyone contributed to this work,
but especially [inaudible].
[Inaudible] was the engine that made this possible.
He wrote all the software, all the validation testing,
he was tireless and incredibly efficient and accurate.
So, we owe him a great debt.
And, this work is also in collaboration with people
at Princeton University, Peter Ramadge who is the chair
of electrical engineering there, and a sequence
of graduate students: [inaudible], who is now at MIT,
Bryan Conroy, who is now at Columbia, and Alex [inaudible],
who is still there at Princeton.
So, we know that, using multivariate classifiers,
so machine learning classifiers, we can look at the patterns
of response in ventral temporal cortex in humans and make
fairly fine distinctions
among the brain states that are evoked
by different types of stimuli.
And, I'm going to present two datasets here,
or three if I include the movie.
One is a category perception experiment with seven categories:
two categories of human faces, dog faces, monkey faces,
shoes, chairs, and houses.
And, the second is a study with six species of animals,
two species of insects, two species of birds,
and two species of primates.
And, this study was run by Andy Connolly,
and there's another analysis of these data on a poster
that you might want to check out before it's taken down.
But, the problem is this: we can
distinguish the responses to all of these,
and others have shown other fairly fine distinctions.
In the [inaudible] study they distinguished teapots from chairs,
and they distinguished one type of teapot from another type
of teapot, and in the [inaudible] study they distinguished
old faces from baby faces, hands from torsos,
and rural buildings from skyscrapers.
So, these are fairly fine distinctions
that can be harvested from ventral temporal patterns.
But, this classification work
in ventral temporal cortex builds a new classifier
for each subject.
And in these confusion matrices
for these two studies, you can see along the diagonal
these nice bright yellow and red colors,
and off the diagonal everything's blue.
So, we have quite high accuracy.
In fact, it's about 63% for the faces and objects and about 68
to 69 percent for the animal species with chance
down around 14 to 15%.
So now, let's try to build a classifier
by aligning the data anatomically across subjects.
We build a classifier based on nine subjects' data
and try to classify the tenth subject's patterns.
The confusion matrices are much worse,
and the classification performance crashes,
crashes down to around 44% for faces and objects.
And the drop is even more drastic
for the finer distinctions among animal species,
from 68 down to about 37%.
So, the anatomical alignment just doesn't detect the
commonality of these feature spaces
that underlie pattern classification.
So, that's a problem.
Are the response tuning functions the same
for different brains?
And, the classifiers
after anatomical alignment are suboptimal.
So, how do we go about trying to approach this problem?
Well, we've called in some help; we've called in Indiana Jones.
And, we had people watch the full length of Raiders
of the Lost Ark
while we measured their brain activity.
We did this in two sessions; actually it's 110 minutes.
They watched for 55 minutes, and then we took them
out of the scanner, let them take a break, have some popcorn,
and put them back in the scanner
and they watched the second half of the movie.
And, we had the same subjects watching the movie
and performing these category perception experiments.
Ten subjects at Princeton watched the movie
and did the face and object category [inaudible] experiment.
Eleven subjects at Dartmouth watched the movie
and did the animal species experiment.
So, this cut is a scene at the beginning
of Raiders of the Lost Ark.
And I just want, I'm just showing this so you get a sense
for the complexity of this stimulus.
This is, obviously, nothing
like a standard psychological experiment.
It's very cluttered.
It moves quickly and it's dynamic, okay.
And we have them listening to the sound,
see if we can get this, if that's going to work.
>> You chose the wrong friends and this time it will cost you.
>> So, you know, we pulled out all the stops
and had them watch the movie and listen to the soundtrack.
By the way, this is a wonderful thing in terms
of subject [inaudible].
We say we're going to pay you to lie
and watch a movie for two hours.
And they think great.
And, you know, with a movie like this, they have no trouble,
no trouble paying attention.
So, what's their brain doing while they're watching
this movie?
Okay so, here are two brains.
And, these are, the patterns of activity
in ventral temporal cortex as they're watching the movie,
and they're in synchrony, okay.
So, these are the same time points that you're seeing.
And, everything is scaled to a zero mean for each voxel,
with red and yellow above zero and blues below zero.
Each time point is actually three seconds;
we've sped it up a little bit so you can see what's going on.
And, first of all, you can see
that the patterns change quite drastically
from one time point to the next.
And it's hard to see exactly what the commonality
between these two brains is.
So, it's not obvious, objectively,
as we watch people's brains responding to this movie,
how these two brains,
how these patterns, are representing the same information.
So, we assume that, as people watch the movie,
the pattern of activity in ventral temporal cortex,
which has this kind of fine grain
that embodies these fine grain distinctions,
is encoding the same visual information while the movie is being watched, okay.
The problem is that the coordinate axes
of these representational spaces,
the voxel spaces, are out of alignment.
And so, what we want to do is, we want to take the data
from one subject, like this person, and, somehow,
transform it so it's in optimal alignment with the other person.
So, in order to do this,
we put the data into a high dimensional space;
we take it out of anatomical space.
So, it's a high dimensional space where each dimension is a voxel.
We select 1,000 voxels in each subject.
And then, we use the Procrustean transformation
to align one subject's representational space
to another.
The Procrustean transformation is a rigid high dimensional
transformation, and one of the beauties
of it is that it has a closed form solution.
So, it finds the optimal parameters
that rotate the data from one subject
into another subject's space.
We do a series of pairwise Procrustean transformations,
with a couple of iterations,
and so we get one standard space,
standard across all 21 subjects,
and we rotate everybody's data into that space.
So, how does this work?
Okay, so this is the algorithm.
So we call this application of the Procrustean transformation,
to align subjects' voxel spaces
into a common high dimensional space, hyperalignment.
And this is an overview of the algorithm.
We start out with each subject's data
in the anatomical space.
And, in our demonstration, we have a thousand voxels
of ventral temporal cortex.
And then, we hyperalign it to a 1,000 dimensional common space.
And so, in that, we have individual data
in the common space and we have a set of parameters.
So, the Procrustean transformation finds an orthogonal matrix
that rotates, with reflections, the data for one subject
into an optimal alignment
with the data space that's averaged across all subjects.
And so, that's just the average data in the common space.
So, that's hyper alignment.
I'll go over how that works first.
So imagine; it's hard to visualize pattern response vectors
in a thousand dimensional space,
so we've reduced this to a two dimensional brain: two voxels,
voxel one and voxel two, in subject one and subject two.
to time point one and subject one.
Here's the vector response
to the same time point in subject two.
And, you can see they're
in different locations in this vector space.
And, these are our color coded two voxel maps.
So, you can see that this person's brain map is different
from this person's brain map.
Now, as the movie progresses, the response pattern,
the pattern vector, moves to a new location in this space,
to time point two.
And you can see that the brain patterns are really showing no
correspondence to each other, for patterns three,
four, five, six, and seven.
So, the patterns never look similar.
These patterns don't look that similar
unless you have good rotational invariance.
So, what the Procrustean transformation does is,
it takes these two sets of response vectors
and finds the rotation that brings them into alignment.
So, this is rotating clockwise.
And now, this pattern and this pattern are
essentially identical.
And, the dimensions in this rotated space are weighted sums
of the original voxel dimensions, okay.
So, this new dimension two is not a single voxel.
It's a weighted pattern, you know, 0.87 of voxel one
and half of voxel two.
[Silence] So, another way to think about it is,
this is in the original voxel space, voxel one and voxel two,
and the new dimensions are oblique
to those original dimensions.
And, this transformation is calculated very simply
by the [inaudible] transformation.
It is very simple to express in matrix format:
the original voxel space, which is two voxels
by seven time points, is multiplied by a rotation,
a two dimensional rotation matrix,
to produce the data in the common space,
a space that is common between these two subjects.
So, with our thousand voxel data it's exactly the same form.
We have the voxel space, a thousand voxels
by 2,205 time points, and the parameter matrix
that rotates this into the common space,
which is what we call the hyperalignment parameters, is a 1,000
by 1,000 matrix, okay.
And what that produces is a time series
in 1,000 dimensions over the same number of time points.
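Just to make that concrete, here is a minimal sketch in Python with NumPy and SciPy of the Procrustean alignment step as described here, assuming each subject's data is a time points by voxels array that has already been standardized per voxel. The function and variable names are mine, not from the talk, and the published implementation has more machinery than this.

    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    def hyperalign(subjects, n_iter=2):
        """Sketch of hyperalignment: find one orthogonal (rotation/reflection)
        matrix per subject that maps that subject's (time x voxel) matrix into
        a shared space, iterating against the group average as the reference."""
        reference = subjects[0].copy()
        rotations = [np.eye(s.shape[1]) for s in subjects]
        for _ in range(n_iter):
            aligned = []
            for i, data in enumerate(subjects):
                # closed-form Procrustean solution: R minimizes ||data @ R - reference||
                R, _ = orthogonal_procrustes(data, reference)
                rotations[i] = R
                aligned.append(data @ R)
            # the common space is the average of the aligned subjects
            reference = np.mean(aligned, axis=0)
        return rotations, reference

    # e.g. each of 21 subjects: ~2,200 movie time points x 1,000 VT voxels
    # rotations, common = hyperalign(list_of_subject_arrays)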
So, what does this do?
So this aligns the vectors, and it also
makes the time series for each dimension more similar
across subjects.
So, this shows the histogram of correlations of each subject
to the average time series
of the other subjects dividing the data
from Princeton and Dartmouth.
We have, the Dartmouth data are of higher quality
because we have, we have a better scanner here than used
in Princeton a few years back.
And, you can see that the median between subject correlation is
around .2 for anatomically aligned data and goes up to
about .4 at Princeton, and at Dartmouth the median was
at about .35 and went up to about .6.
And so, there's a big increase
in between subject synchrony for each dimension.
So, we're finding response tuning functions,
so this is response tuning to the movie, 2,200 stimuli,
virtually, if you do it time point by time point.
And, the correlations are increasing very dramatically.
[Silence] And now, you can see between subject classification
based on these hyperaligned dimensions.
So now, instead of building a classifier for each individual subject,
we build a classifier based on nine subjects' data,
and then we classify the tenth subject's data
based on the other subjects' patterns.
But, it's now in this common hyperaligned space.
And, for Princeton, which is faces and objects, oh,
this is wrong, I'm sorry, okay,
this is not for the category perception.
We did something different as our first step of validation.
We would take an 18 second time segment
from the movie, which is 6 TRs,
and we would ask: can we tell exactly what part
of the movie the subject was watching when this sequence
of brain responses was produced?
And so, we can't do within subject classification with that
because every 18 second time segment is unique.
So, we have to do between subjects.
What we did is we correlated that 18 second time segment
of patterns with the average pattern
for the other 20 subjects, okay.
And, we compared that correlation with the correlation
of that time segment in the test subject with all
of the time segments in the movie.
And, we did this with cross validation:
we would derive the hyperalignment parameters in half
of the movie and test the validity
in the other half of the movie.
And, the accuracy for identifying precisely
which time point, which segment of the movie they're watching is
about 66% in Princeton and about 74% at Dartmouth, okay.
So, the chance here is one out of a thousand.
So, we can identify what the stimulus was
with remarkably high precision using this.
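As a rough sketch of that identification test, assuming the test subject's segment and the averaged time series of the other subjects are both already in the common space (time points by dimensions); the names here are illustrative only, and the real procedure has extra safeguards such as the cross validation just described.

    import numpy as np

    def identify_time_segment(test_segment, group_average):
        """Correlate a short test segment (e.g. 6 TRs x dimensions, flattened)
        with every window of the same length in the group-average time series
        and return the start index of the best-matching window."""
        n_tr = test_segment.shape[0]
        seg = test_segment.ravel()
        scores = [
            np.corrcoef(seg, group_average[t:t + n_tr].ravel())[0, 1]
            for t in range(group_average.shape[0] - n_tr + 1)
        ]
        return int(np.argmax(scores))  # correct if this matches the segment's true start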
When we do the same kind of thing
with anatomically aligned data, the drop is huge.
It's over a 50% drop in classification accuracy, which,
again, shows the anatomical alignment is just not capturing
this kind of fine scale information
in ventral temporal patterns.
And, the next thing we asked was:
is a thousand dimensions
really the true dimensionality
of this representational space, or can we find a smaller space?
So, we did a PCA to see how many dimensions were
necessary to afford this kind of high level
of between subject classification.
And, this is the between subject classification of movie time
segments based on different numbers of principal components.
And so, we find that five and ten are very clearly insufficient,
it needs more than 20, and it starts to asymptote
out here somewhere past 30 or so.
And so, arbitrarily, we picked 35 as the number
of dimensions we were going to analyze
as our common model space.
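A minimal sketch of that dimension reduction step, assuming PCA is run on group data already in the 1,000 dimensional common space; again, the names are illustrative rather than the published code.

    import numpy as np

    def top_components(common_space_data, n_components=35):
        """PCA via SVD on common-space data (time points x 1,000), returning
        an (n_components x 1,000) matrix whose rows are the top principal
        components of the common representational space."""
        centered = common_space_data - common_space_data.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[:n_components]

    # combined with a subject's rotation R (1,000 x 1,000), the reduced
    # subject-specific mapping is the (35 x 1,000) matrix: top_components(...) @ R.T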
>> A very quick clarification question.
How did you get the original thousand voxels,
just like the one sense on how you got them?
>> Okay, we did that by correlating all voxels
in one subject's VT cortex individually with all voxels in all
the other subjects' VT cortex [inaudible] individually.
And then, we found the voxels that showed the maximum
between subject synchrony
with any voxel anywhere in VT cortex, okay.
And again, we did that kind of voxel selection
independently of the data used for validation testing.
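A sketch of how that voxel selection might look in code, assuming time by voxel arrays for one subject and for the averaged other subjects; this is a paraphrase of the procedure as described, not the exact published selection.

    import numpy as np

    def select_voxels(subject_ts, others_ts, n_keep=1000):
        """For each voxel in this subject, take its highest time-series
        correlation with any voxel in the other subjects' data, then keep
        the n_keep voxels with the largest between-subject synchrony."""
        n_sub = subject_ts.shape[1]
        corr = np.corrcoef(subject_ts.T, others_ts.T)   # stacked correlation matrix
        cross = corr[:n_sub, n_sub:]                    # subject voxels x other voxels
        best_sync = cross.max(axis=1)                   # best match anywhere in VT cortex
        return np.argsort(best_sync)[::-1][:n_keep]     # indices of selected voxels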
So, we arbitrarily chose 35.
There were people who advocated 42.
But we decided that was too cute.
So, this is the final form of the transformation.
So, now instead of a 1,000 by 1,000 parameter matrix,
we have a matrix that's 35 by 1,000.
And that converts the subject's voxel space
into time series on these 35 dimensions, okay.
Now, there's a little bit of magic here,
because once we have this transformation matrix,
so this is our P, this is our hyperalignment parameter matrix
for a given subject,
once we have that for a given subject we can apply
it to any data in the same voxel space.
It doesn't have to be movie data.
We could apply that to the existing data
from the object category perception experiment
and the animal species category perception experiment.
And so, we can take any data from those thousand voxels now
and put them into our common model space.
And this just shows that.
So, now we can take voxel data from experiment two and multiply
it by the same parameter matrix,
and now we get the experiment two data in the model space.
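In code, that step is nothing more than another matrix multiplication; this sketch reuses the (assumed) rotation and principal-component matrices from the earlier sketches.

    import numpy as np

    def to_model_space(new_data, rotation, components):
        """Project data from a different experiment (time points x 1,000, same
        voxels as the movie data) into the 35-dimensional model space.
        rotation: (1,000 x 1,000) Procrustean matrix derived from the movie.
        components: (35 x 1,000) top principal components of the common space."""
        return new_data @ rotation @ components.T   # -> (time points x 35)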
[Silence] So, how well does that work?
So, first of all, with the time segment classification,
the light blue bar is classification
from all thousand dimensions
and the dark blue bar is when we use the 35 hyperaligned PCs.
Now, if we use this matrix to transform the data
from the category perception experiments,
you can see that the red bar is within subject classifier,
so that's the standard way of doing this.
The light blue bar is classification
between subjects based on all thousand dimensions.
And you can see it's equivalent
to within subject classification.
And, the dark blue bar is if we only used those 35 dimensions,
we've reduced this to 35 features
and classification performance is equivalent.
There's no drop
in classification performance with this model.
And, it's equivalent now
to within subject classification, unlike the classification
of anatomically aligned data.
And, you know, here are the confusion matrices
for within subject classification,
for between subject classification with the thousand dimensions,
for between subject classification with the 35 PCs,
and for between subject classification based
on the thousand anatomically aligned voxels.
And, you can see that the similarity structure embodied
in the class confusion matrix is very similar
for within subject classification
and these two varieties of between subject classification
of data transformed into the model space.
So, those are the basic data.
We've shown that we can derive a transformation matrix
for each individual subject that puts their data
into a common model space.
And, once it's in the common model space,
we can do between subject classification that's just
as good as within subject classification.
Yeah.
>> I just wanted to ask if there's anything interesting
that you can see
in the transformation matrixes themselves,
because I would expect, if you were to do it on the big chunk
of brain that you would see some portions of the matrix
that are transforming sub regions
of the brain into [inaudible].
>> Yeah.
>> Does it partition out in any obvious way?
>> What I'm going to show you later in the last session,
working parts of the model,
is that we look at the topography that's associated with each PC.
And so, each PC has, well, a PC now is not a voxel or a super voxel.
It's a pattern of activity across all
of ventral temporal cortex.
>> But, in the transformation matrices themselves?
>> In the transformation.
>> It works inside of those?
>> Yeah that's where, that's where it'll be.
Yeah, okay.
So, the first question: a lot of people look at this,
especially people who do psychophysics,
and say why do you have people watch this movie.
And you know, what a pain.
And, you're collecting so much data and it's so uncontrolled.
Why don't you have people looking at, you know,
spots of colors and [inaudible] bars, and things like that?
So, I want to show you that there is something
about using a natural stimulus that gets us into a place
where we get more information out of ventral temporal cortex.
So, the reason we chose the movie was because we knew
from work like [inaudible] work
that movies evoke synchronized activity
across broad expanses of cortex,
especially the posterior perceptual areas.
And, we wanted to have as large a sample of brain responses
to visual stimuli as possible.
And so, we thought the movie was a good way
of getting a very large variety.
And, we didn't really analyze the content.
We just thought we would have a wide variety
of stimuli and brain responses.
So, we did a couple of things
to see if there's something further
about the movie besides just providing a wide variety quickly.
The first thing we did is we decided
to do a classification test
where we matched exactly the type of classification test
for the faces and objects
as compared to movie timeframes.
So, we picked one timeframe from each block
of the seven categories of stimuli
in the category perception experiment.
And then, we took the same number of timeframes
from the time series of the movie,
so we had seven single time points from the movie:
seven time points that we're trying to classify
in the category experiment and seven time points in the movie.
And then, we did between subject classification
of data transformed in the model space.
And, for the single TR of the movie,
we're getting over 70% accuracy.
Now, this is one out of seven classifications instead
of one out of a thousand.
But, it's a single TR not six TRs.
So, this is really pretty remarkable.
For a single TR from the category experiment, classification
accuracy is about 38%, okay.
So, the responses that we're evoking
with the movie are much more distinctive relative
to each other than the responses that we evoke
with a single object presented
on a blank background.
So, there's something about the clutter,
there's something about, perhaps,
about the subject being more engaged with the stimulus.
There's, you know, there's a narrative,
a plot that's going on with it.
And it's dynamic.
We think, also, that maybe the dynamic nature is very important
for evoking this kind of distinctive response.
But, it's much more distinctive
than are the responses to the faces and objects.
And, the algorithm can be applied to any time series;
we apply it to the movie because we want it
to sample a large number of brain states.
But, we can also apply it to the time series of the faces
and objects and the animal species experiments, okay.
And we can get a common space that's based only
on the responses to those limited sets of stimuli.
And then, we can do between subject classification
of data that's been transformed into a model space based
on these responses.
Now, for the between subject classification of the face
and object data and the movie, and the animal species data,
it's a little bit tricky.
But, we figured out a data folding scheme
so that it's completely clean.
There's no peeking or double dipping.
Is Niko [assumed spelling] here?
This is one of Niko's obsessions.
But, I think he will be satisfied
when he sees how we did that.
And, when we do that, we find that the
between subject classification of the data
from the same experiment is very high.
So, for faces and objects, it's equivalent
to the movie hyper aligned data.
Animal species is actually a little bit better.
So, this is a very useful method that you can use
within an experiment, you don't have to do the movie.
But, the validity is limited to that experiment.
So, if we apply the transformation matrix that we derived
from the face and object experiment
to the movie data
from these Princeton subjects, classification performance drops
from 65, 66% down to about 16%, okay.
So, it just, it doesn't have validity for the variety
of stimuli that are in the movie.
It's really only finding the rotation
parameters that align responses in a very small subspace
of the representational space of ventral temporal cortex.
We get an equivalent drop
with the hyperalignment parameters derived
from the animal species experiment.
So, that means that our movie derived hyperalignment parameters have
general validity across experiments.
So, they seem to be capturing the structure
of this representational space in a way that has validity
across a wide range of the stimuli.
Whereas, when you do a more traditional experiment
with static images of single objects,
the validity is very limited.
And then, finally, people say, two hours,
really, you guys are crazy.
And so, we analyzed what the effect is of deriving the model
on smaller sets of data from the movie.
So, maybe we can get away with ten minutes of the movie.
All the studies of resting state functional connectivity
right now are based on ten minutes
of resting state data.
Here we are doing 110 minutes.
Do we really need that?
Well, this shows the effect of the number of time points used
for [inaudible] on the accuracy
for the three experiments.
We can only do it for half of the data;
we have to derive the model in half of the movie and apply it
to the other half of the movie for time segment classification.
And, you can see, it keeps going up.
We haven't reached a peak
with a full 55 minutes of movie in here.
And, with these other experiments, it still seems
to be going up a little bit.
Maybe it's asymptoting somewhere here
around 1,700, 1,800 time points.
So, finding these parameters so that they're fine-tuned
to align the data across subjects requires a lot
of information about that subject's representational space,
a very broad sampling of responses
to a wide variety of stimuli.
So, does the movie add anything?
Well, we think the answer is yes
that there's something very special
about using a natural dynamic stimulus.
We don't know if it's because it's cluttered or it's dynamic
or that it's tied together with a narrative, you know.
Some people attend better or if it's
because the sound helps them pay attention.
We don't know which of those factors is critical.
But, something about watching a natural movie, which real people
like to do in real life, is much more effective
for finding the parameters, for deriving this model,
than the standard experiment.
So, I'm just going to go over the working parts of the model.
So, what are these dimensions?
We have 35 dimensions.
What is this dimension in the common model space?
And this is something that really tripped people
up when we first showed this.
As an aside, I should say,
the paper is finally in press; we first presented this work
as a poster at the Neuroscience meeting,
what was it, a year and a half ago?
Two years ago, two years ago.
And we actually submitted the paper at the time that we
[inaudible] for the Neuroscience meeting.
So, Jack Gallant came up and looked at it
and said oh this is really cool.
Is it in press?
I said no, but, you know, we just submitted it.
He said, oh so we'll see you in a year or so.
And, I thought, what a cynic, you know.
What does he know?
It's going to get accepted like that.
So, two years later, it's finally accepted.
But, what are these dimensions?
And one thing that tripped people up is,
you know, what happens here.
So now, each dimension has a distinct response
tuning function.
I showed you how those response tuning functions are more highly
correlated across subjects than the response tuning functions
for voxels that are anatomically aligned.
And, each dimension is a pattern of activity in VT cortex
that has a distinctive topography.
We'll go through how that works.
So, any voxel in the original data is associated
with a series of 35 weights, okay.
So, these 35 weights are multiplied by the time points
in the transposed model data matrix and give a model
of that voxel's time series.
So, an original time series is modeled as a weighted sum
of the dimension response tuning functions, or time series.
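As a small sketch of that reconstruction, assuming the model time series is time points by 35 and the voxel's weights are the corresponding 35 entries of the subject's transformation matrix; the names are illustrative.

    import numpy as np

    def model_voxel_timeseries(model_timeseries, voxel_weights):
        """Model one voxel's time series as a weighted sum of the 35
        model-dimension time series.
        model_timeseries: (time points x 35); voxel_weights: (35,)."""
        return model_timeseries @ voxel_weights      # -> (time points,)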
And, you can look at that in the movie, and you can also look at that
in the category perception experiments.
So, this just shows the average response to the seven categories
in the face and object experiment and the six categories
in the animal species experiment
for the top PCs: PC1, PC2, PC3, and PC5.
You can see that PC1 is responding
and has positive weights for just the human faces
and negative weights for everything else.
Whereas, PC2 and PC3 have negative weights for the faces
and positive weights for the objects.
And, PC5 seems to have positive weights
for all categories [inaudible] faces.
So, there's some meaning that can be extracted, at least
from the high order PCs.
Now, each column
in the parameter matrix is a series of 1,000 weights.
And, each of those weights is for a voxel.
So, you can map these weights as a topography.
So now, this is the topography for PC1.
And this is the topography for PC3.
And, this just shows the topography,
again for the same four PCs that I showed you before
with the response tuning functions.
And, you can see that the PC that seems
to be responding only to human faces, with positive responses only
for human faces, shows positive weights only
in a couple of spots.
They're within the FFA.
So, these outlines here are the individually defined FFA
and the individually defined PPA, so the face area and the place area, okay.
So, it's not picking up the entire face area.
It's just a piece of the face area.
Whereas PC5, which is also responding positively
to the faces but not only the faces,
has larger positive responses in the FFA.
But there are also positive responses outside the FFA
and the PPA.
So, this just shows those two topographies in greater detail
so you can see them more easily.
So, the FFA and the PPA are not good guides
to the structure of the topographies
that are being picked up by these PCs.
These PCs, again, are the PCs
that best capture the information,
the response variability, during a natural viewing task,
watching a movie.
So, any response, this is the response to time point 200,
is modeled as a weighted sum
of these 35 response pattern basis functions.
And, it's a different set of response pattern basis functions
for each individual subject,
because different individual subjects have different
topographies, okay.
The response tuning functions are common across subjects;
the patterns are slightly different.
But again, you can see that in subject one and subject two,
there's a certain amount of commonality that you can see
if you use your human pattern recognition capability.
But, they don't line up well.
Now, another thing you can do, if you take
that 35 dimensional space, is define dimensions other than the PCs.
Those are dimensions that are selected by PCA,
which notoriously doesn't pick up things that are
all that meaningful, right.
It's an aggregate that captures the information we want to get.
That's five minutes, right, okay.
But, that's a 35 dimensional space.
So, you can define a dimension anywhere you want
in that 35 dimensional space.
You can do a linear discriminant.
So, we did a linear discriminant that contrasts the response
to faces versus houses
in the face and object perception experiment.
And what's the topography for that dimension
in this 35 dimensional space?
And, the math is very simple.
The linear discriminant has a set
of 35 weights for the 35 PCs.
And you multiply that by the transformation matrix
and you get out a topography for that linear discriminant.
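A sketch of that projection, with names of my own choosing: the discriminant lives in the 35-dimensional model space, and multiplying its weights through a subject's 35 by 1,000 transformation matrix gives that subject's voxel-wise topography for the contrast.

    import numpy as np

    def discriminant_topography(discriminant_weights, subject_transform):
        """Map a linear discriminant (e.g. faces vs. houses), defined as 35
        weights in the model space, back into a 1,000-voxel topography for
        one subject. subject_transform: that subject's (35 x 1,000) matrix."""
        return discriminant_weights @ subject_transform   # -> (1,000,) voxel weights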
And, this is the linear discriminant
for faces versus houses.
And, you can see now that it really is picking
up the FFA pretty well.
So, the FFA is preserved in the model space.
We haven't lost it.
It's right there.
And it can be extracted
with a fairly straightforward procedure.
So, this just shows the topography
for that linear discriminant, faces versus houses,
in subject one and subject two.
And, you can see there's pretty good correspondence
with the outlines of the individually defined FFA.
Now, remember, this linear discriminant is defined
in the model space for all subjects.
So, we're projecting all subjects' data
into individual topographies.
And, we can go one step further.
We can take one subject's brain space, and then
we can project everybody else's data
into that subject's brain space.
So now, we've modeled that person's brain space
with the other subjects' data, throwing his own data out, okay.
And then, we can do the standard calculation
of where the FFA is and where the PPA is
in this subject's brain topography.
But now, we're not using that subject's own brain data.
We're using other subjects' brain data.
And so, this shows that result for these same two subjects.
Again, here's the individually defined FFA
in the yellow outline on the right and left,
and the red area is the group defined FFA projected
into that subject's brain.
And you can see they're remarkably good; even the kind
of long extension
in the posterior direction has picked up a little blob that's
separated from the other blob.
Again, that's the group data aligning with his own
individually defined FFA and PPA based on his own data.
So, we haven't lost the anatomy.
The anatomy is there and it's preserved.
So, I think I'll sum up now.
With this [inaudible] dimensional model of ventral temporal cortex,
we showed that the population responses in ventral temporal
cortex, as reflected in patterns
of activity in fMRI data, can be accounted for by response
tuning functions that are common across brains.
And we can reduce that to a set of 35
or 42 common response functions.
And these response tuning functions
and the associated topographic basis functions model
representations for a very wide range of complex stimuli.
So, it's all derived in one dataset
but it's valid for the other two.
So, it seems to have general validity
across a large part of the stimulus space
in ventral temporal cortex.
These same principles can also be applied to other parts
of the brain, of course.
And so, this group has done this.
So, in addition to ventral temporal cortex,
where movie time segment classification was
around 74% for the Dartmouth subjects, he did the same thing
for [inaudible] cortex and got
between subject classification of movie time segments
of about 70%, and in auditory cortex
a time segment classification of about 30%.
So, we can model other spaces as well,
and then we can combine them.
So, instead of just taking data
from ventral temporal cortex, we take data
from early visual cortex, the late ventral pathway,
and auditory cortex, and ask: now can we tell what part
of the movie a person's looking at?
It's the same classification procedure
with that one [inaudible] correlation classifier.
And now, we're at 90%.
So, we're classifying one out of a thousand,
we're specifying exactly which part
of the movie a person's watching.
I don't think we can go much higher than 90% for that kind
of a very demanding classification test.
So, what does this model do for us?
Okay, let's back up. We think of representations
of complex stimuli as residing
in representational spaces.
And, each stimulus is a vector within this space.
And, we start with the stimulus space, and we can model
that with visual features and semantic features.
And we're going to hear a lot more about those kinds
of models in talks later today.
And, this is transduced into a neuronal population space, okay.
And this population is comprised of single neurons
with simple tuning functions.
And then, we can measure that neural population space
with fMRI and get a VT voxel space.
Now, we can't get a complete picture
of this neural population space with any existing measure.
But, with ventral temporal cortex, we can get a measure
of all of ventral temporal cortex at low spatial resolution.
And now, with our model, we show
that we can transform this voxel space into a kind
of model space: with a very straightforward transformation
we go from the VT voxel space to the common space, and we can invert
that and convert data from the common model space back
into the VT voxel space.
And, these common dimensions here,
which are patterns, we can't say exactly
what their single [inaudible] functions are
at this point.
[ Applause ]
>> So that was really cool and I
like the methods, they seem great.
But I have this nagging concern
and I'm hoping you can fix it for me.
When you do single cell analysis, logistical analysis
to try to figure out what the response properties
of a neuron are, and you show an ensemble of stimuli,
you always have this worry
that if your stimulus ensemble has its own structures,
correlations let's just say, then the analysis is going
to hand back something that has that embedded in it.
>> Right.
>> And so, you'll attribute things to the neuron
that really are not about the neuron they're
about the stimuli they used.
So, when you use natural stimuli here,
I agree with everything you said
about the nice rich stimuli really covering a wide range
of brain activity and patterns, that's great.
But, aren't you worried
that Indiana Jones has some correlation between something
in the soundtrack and something Indi tends to do
and that those are the things that are going
to show up in your PCs?
And, they have nothing to do with the brain they have
to do with that movie.
>> Yeah, that's why we, you know,
we test whether these parameters then are valid for other stimuli
like the faces and objects.
>> Well, just, just because they're strong enough
to give you face versus object classifications doesn't mean.
>> That's just basically dog face versus monkey face.
>> I know but that doesn't mean [inaudible].
>> Luna moths versus ladybugs.
So, those are fine distinctions.
And, another analysis, I'm sorry I'm going to get you talking,
but if I could put one thing into here
which is we did another analysis.
Some people said well there's faces and objects and, you know,
and things like that in the movie.
So, how about you just, you know,
define the parameters based on other responses
to the same categories?
We went back to the movie and we looked, went through it
and we found all the time points where there was a monkey
or a dog or a bird or a bug and we removed those time points
from the movie and the 30 seconds falling the movie
because of delayed human dynamic response.
And then, we had a dataset that was based on responses
to everything but monkeys and dogs and birds and books, okay,
had no effect on the [inaudible] model.
>> I believe that and I would expect it.
The effects that I'm talking about are a little more subtle,
right, they're biases.
>> Yeah.
>> And they're gentle, but they might matter.
And, you might run into, later on, some sort of a test
where you end up not succeeding because those biases are,
as I said, driven by the movie rather
than by properties of the brain.
>> Yeah I know.
Our plan now is to try a wider variety of natural stimuli
and to see if we get the same kind of model, you know,
for something that's not as, you know,
that's very different from Indiana Jones.
It's a concern we have with, you know, general validity.
The whole point of this enterprise was to try
to find general validity.
And, as we know, you can't completely sample object space,
okay.
So, we're just doing the best we can.
>> There's going to be a problem with any natural stimuli
because if you whiten the stimulus
so that it's perfectly white, then you end up with white noise
and you don't get any responses.
So, I mean, you can complain, oh yeah, gee, there are
probably spatiotemporal correlations in this movie,
there are huge spatiotemporal correlations,
but there are way fewer spatial correlations
in the stimulus set than if you just showed, say, a set of pictures
of faces and houses as the only stimuli.
So, I guess, there I don't understand what your
complaint is.
He's not making a claim by how the brain processes the movie.
He's making a claim that you can align brains
and that there's common responses to movies.
And that seems like to be a fine claim.
>> I think that we have to take it online because you
and I have been having this discussion for like four years.
I don't think we're going to resolve it right now.
Tom did you want to say something?
>> Yeah, I want to ask a question about the volume
of the brain on which you're applying this transformation.
So, I could imagine that, instead of doing the region
of the brain you did, you might say, well, I'm going
to apply this, train this transformation,
on just the FFA, or on the whole brain.
So, why didn't you do those two things, or,
you know, maybe you'll get to it?
But, I'm more interested in your thought experiment
about what would happen if you did.
>> We haven't done the whole brain
because we're very concerned about over fitting.
And so, if we have, in 2,000 time points we don't really want
to have more than 2,000 voxels that we're rotating.
But.
>> I would guess you'd get less overfitting
if you did the whole brain.
It seems like it's more constrained.
>> We haven't tried that yet.
There's no reason not to try it.
But, we have done this: restrict the analysis
to individually defined FFA voxels
and individually defined PPA voxels.
And, we find that we get between subject classification
for faces and objects and, you know, houses versus objects,
that's equivalent to an equivalent number
of voxels from all of ventral cortex.
[Inaudible] works within a small location.
And, there's no reason you can't do this for, you know,
a small functionally defined region
like the FFA, or a smaller anatomical region; we can do,
you know, just the lateral [inaudible]
or the posterior third.
>> Yeah, what I was really thinking was
that you have a particular constrained type
of transformation that you're allowing, these kinds
of rotations in MRI image space.
And so, that's kind of a tight constraint if you think
about all the possible things you might do
to map from one brain to another.
And so, what I was kind of guessing your answer would be,
and it's interesting that it was sort of completely the opposite,
is that if you went down to a smaller region you should expect
more overfitting, because that constraint wouldn't be
as powerful, whereas doing it
on the whole brain you would expect maybe you wouldn't get
that good of a mapping because it's an
over constrained problem.
But, it does seem like there's some interesting theme
to be figured out about what regions to use
and what the effects are.
[Silence]
>> It's a different point from [inaudible], but it seems
like there are different transformations.
>> Jack, push the mike button [inaudible].
>> [Inaudible]'s earlier point was the one
I was referring to, where he was getting
at that you could break the transformation up
into several sub transformations [inaudible]
the good part.
>> Thanks for clarifying that Jack
in case anybody was confused.
So, it seems like the questions are about claims
that I didn't see you make; you really weren't making claims
that these 35 dimensions are special or important,
they're just a condensation of the data you have in hand.
And, your real claims are about the ability to align subjects
in a way that's much better than what's been done before,
at least, that's how I read what you're presenting.
So, to that end, could you tell us,
if you took a subject, how much variance remains
in an individual subject's data that cannot be
explained by the common model, all right?
So, you have a bunch of voxels going off on the subject
and you say really this person is actually just a specific
re-projection of this common model.
And, the rest of this stuff is either noise in your methods
or something real in that subject
that you can't yet explain.
Did you try to do anything of that nature? So what fraction
of the explainable variance have you captured?
>> Yeah we've done the plot of variance explained
and also the plot of between subject correlation
as a function of the number of PCs, PC by PC.
And, we found that we explain about 70% of the variance
with these 35 dimensions.
And, we get up to about 75% I think when we go
to 50 or 60 dimensions.
Now, we haven't tried classification based only
on the PCs we excluded.
But, I imagine it would be pretty close to chance.
That's my intuition.
What do you think [inaudible]?
>> [Inaudible].
>> Mike, mike.
>> I don't have a lot of information
so I don't know how it could convert.
Can I answer Tom's question?
>> Sure.
[ Inaudible ]
>> Any other questions?
Alice.
>> Alice.
>> Yeah so.
>> [Inaudible].
>> Mike.
>> Yeah, across these rotations, I mean,
did you have problems adjusting the number
of dimensions you rotated,
and were there parameters there in choosing the fit?
I mean, how did you?
I have the intuition it would actually be better
to fit a little bit less than more, than to over fit there.
>> We found that the results, we chose, we did the analysis
on a variety of different voxel set sizes from 100 voxels
up to 2,600 voxels per subject and found
that the results were really stable from about 600
or 800 voxels up to about 2,000 voxels.
>> [Inaudible] usually gives you an option of rotating to fit
in a variety of number of dimensions of the space.
So, I don't know maybe.
[ Inaudible ]
The biggest.
[ Inaudible ]
>> We let all voxels in our thousand voxel space rotate.
And we reduced the number of parameters.
Before we applied the Procrustean transformation,
we took the data in each voxel and standardized it.
So, every voxel had a zero mean and unit variance
for the responses to the movie.
So, we didn't have
to allow any translations or scaling.
It was just the rotations and the reflections,
just that one [inaudible].
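For what it's worth, that normalization is a one-liner; a sketch assuming time points by voxels arrays.

    import numpy as np

    def standardize_voxels(data):
        """Give every voxel's movie time series zero mean and unit variance, so
        the Procrustean fit needs only rotations and reflections, with no
        translation or scaling terms."""
        return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)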
>> Come to the mike, Patrick.
>> Patrick, to the mike.
>> So, you are suggesting
that there's a common neural architecture.
>> Right.
>> And, but it differs, in some ways from person to person.
So, I'm interested in whether you speculate on this
as like preordained bits of stuff that get packed
at different locations kind of like when I put the groceries
in the car milk might go there or there.
But milk never goes on top of the bread.
Or, is it just a result
of self-organization maybe constrained by where inputs
and outputs have to be?
So, do you have a feeling about that or how you might test it
or whether you could ever solve that question?
>> Yeah, so, we think that that's a very deep
and complicated question.
And, I thought I showed you
that the high order PCs have a certain commonality,
though it's hard to align them mathematically.
But I can see there's some similarity.
Now, we have this other algorithm
that we've developed called functional alignment,
and we also have a third technique called functional activity
alignment, where we try to align subjects' data while preserving the
topological relationships among voxels.
So, we could do a rubber sheet warping to see
if there's a topography that's just been distorted
from one subject to another.
And, we can improve between subject classification
with that.
But, the improvement is only about a third
of what we get with hyper alignment.
So there seems to be something that, at least
so far, we cannot capture
with a consistent topology across subjects.
Now, we know that there's a coarse scale
and a fine scale topography.
Okay, so the coarse scale topography is like face-responsive
areas and place-responsive areas.
And then, there's a fine scale topography that's
poorly characterized.
And, when we tried to look at that, here we go,
as we analyzed the effect of data smoothing, now,
all the data, we always smooth our data
with a four millimeter filter.
And, offline, I can talk about why we do that.
And we consistently find
that four millimeter smoothing helps MVP
classification analysis.
But then, if we go beyond four, it drops off.
And, it drops off somewhat for faces and objects
and animal species but it really drops off
for the time segment classification.
So that suggests that there's information
at a spatial frequency of around four, you know,
four millimeters that's lost as soon
as you start blurring it, okay.
So, there seems to be some fine structure.
I'm not arguing for hyperacuity in these patterns;
this could be a topography that exists at
a spatial frequency of around four millimeters, rather
than that kind of sub voxel spatial frequency
that people are still arguing about
and that Jeremy Freeman has gone a long way to destroy
for orientation maps.
>> Tom.
>> You showed us the loadings of the PCs on the brain, so,
for example PC1 here are the regions
that it weights positively and negatively.
What if you go the other direction
and for each PC just show us the images
in the movie that most excite that PC?
Did you try that?
>> We haven't done that yet.
But, we're, you know, you know, that's, that's like you know,
the forest of no return, right, when you start to go into trying
to analyze the content of a natural movie.
>> [Inaudible] I'm trying to understand the content of PC1.
>> Right, and so, you know, actually you did a little bit
of that didn't you [Inaudible]?
[ Inaudible ]
>> I have an odd question.
Could you hyper align across species?
So, pick a [inaudible] for example and, you know,
first that'd be interesting.
And, if you could do it, would that tell you anything about how
that species represents that information differently?
>> Yeah, well, I think, you know,
we're going to have a talk later today by [Inaudible]
where he shows that you see a similar similarity structure
among the responses to stimuli in a cat brain,
as a population measure from [inaudible] recording,
as compared to patterns in the human brain measured with fMRI.
And, so I assume hyperalignment would work and might help
to even improve showing the cross species similarity
of representational spaces.
Now, I've been trying to think about how can we get a dataset
in monkeys watching Indiana Jones?
You know, so we'd be getting a large number of neurons
in a single recording while they're watching a natural
stimulus, and have a set of monkeys doing that
and also doing the fMRI thing with that.
But, it's a daunting challenge, and I'm not a monkey
physiologist, so I don't think I'm going to go into that.
But, I think that the same methods should be applicable
to population responses measured
with other modalities like [inaudible].
And, we should be able to look for transformations
from one species to another.
>> Okay, one last question if there is one.
Alice.
>> Just a quick one and maybe it's more for thought.
Did you think about doing correlations among the images
that are being viewed when the brain patterns are similar,
to see the extent to which a lot of this is explained
by shared visual feature categories?
>> Visual feature categories.
>> Just out of curiosity.
I mean, the question is: are you looking at something
that is really a visual representation,
a high level visual representation, that's reliably hooked
to the similarity
of the stimulus images that are being seen?
>> Well, we've analyzed the stimuli
in the category perception experiments using a V1 model.
And, the way we, this is in Andy's poster.
He then looks at the similarity structure among the stimuli
based on the vector in V1 space.
He also looks at the similarity based on behavioral ratings.
You know, is a mallard more like a warbler or a monkey?
And he finds that the V1 similarity structure is seen
in the neural similarity structure, as in an MVP analysis,
in early visual cortex, and not in ventral temporal cortex.
Ventral temporal cortex has a very strong correlation,
actually a .9 correlation, is that right,
between the behavioral rating similarity structure
and the similarity structure of the neural responses.
So, it's not related to the V1 structure.
It's strongly related to behavioral ratings.
I think [Inaudible] has tried to run the V1 analysis
on the movie, and it just hasn't worked.
And, you know, I don't think the V1 model works very well
for a dynamic stimulus, to be perfectly honest.
>> Yeah, actually, you could almost anticipate that:
as they're looking at the movie,
the brain maps are changing so quickly,
while the general similarity parameters
in the movie probably stayed pretty constant,
the activation maps were changing very dramatically
and very quickly.
But I was curious.
[ Applause ]
[ Silence ]