[ Silence ]
>> I'm going to be talking about work that we've been doing
for the last two and a half years, I think,
on how to build a common model of a neural representational space.
And we demonstrate how the model works
with a common model of ventral temporal cortex
in humans, using fMRI data as the basis for the alignment algorithm.
And the problem that we're trying
to address is the cross-subject validity of classifier models.
So, when we build a multivariate pattern classifier to try
to classify brain states that we get from measurements from fMRI
or anything else, but specifically here with fMRI,
we have to build a new model for each individual subject,
because when we try to build a model based on other subjects,
the voxels or features in these brain spaces don't line
up very well.
And that leaves the question
of whether different brains use the same basis functions, or response
tuning functions, for these classifications,
or whether every brain has kind of an idiosyncratic set.
So, we tried to address that with this.
So, this is the outline of the talk.
First, I'll talk about the problem
and then I'll present the algorithm, which has two steps,
hyperalignment and then dimension reduction,
and we validate it by doing between subject classification using
the model dimensions as features.
And then, we will talk about a couple
of other topics within this dataset.
We use a movie: we have people watch a full length
action movie and measure brain activity
for two hours while they watch this movie.
And, that's the basis for aligning data across brains.
And, the question that everyone asks is why are you using the
movie, why this, you know?
But I'll try to show that that really is an essential component
of this.
And then, I'll go over what the working parts of the model are,
computationally speaking.
So, I wanted to acknowledge the people in the lab
and everyone contributed to this work,
but especially [inaudible].
[Inaudible] was the engine that made this possible.
He wrote all the software, all the validation testing,
he was tireless and incredibly efficient and accurate.
So, we owe him a great debt.
And, this work is also in collaboration with people
at Princeton University, Peter Ramadge who is the chair
of electrical engineering there, and a sequence
of graduate students: [inaudible], who is now at MIT,
Bryan Conroy, who is now at Columbia, and Alex [inaudible],
who is still there at Princeton.
So, we know that, using multivariate classifiers,
so machine learning classifiers, we can look at the patterns
of response in ventral temporal cortex in humans and make
fairly fine distinctions
among the brain states that are evoked
by different types of stimuli.
And, I'm going to present two datasets here,
or three if I include the movie.
One is a category perception experiment with seven categories:
two categories of human faces, dog faces, monkey faces,
shoes, chairs, and houses.
And, the second is a study with six species of animals,
two species of insects, two species of birds,
and two species of primates.
And, this study was run by Andy Connolly,
and there's another analysis of these data on a poster
that you might want to check out before it's taken down.
But, the problem is this: we can
distinguish the responses to all of these,
and others have shown other fairly fine distinctions.
In the [inaudible] study they distinguished teapots from chairs,
and they distinguished one type of teapot from another type
of teapot, and in the [inaudible] study they distinguished
old faces from baby faces, hands from torsos,
and rural buildings from skyscrapers.
So, these are fairly fine distinctions
that can be harvested from ventral temporal patterns.
But, this classification work
in ventral temporal cortex builds a new classifier
for each subject.
And in these confusion matrices
for these two studies, you can see along the diagonal
these nice bright yellow and red colors,
and off the diagonal everything's blue.
So, we have quite high accuracy.
In fact, it's about 63% for the faces and objects and about 68
to 69 percent for the animal species with chance
down around 14 to 15%.
So now, let's try to build a classifier
by aligning the data anatomically across subjects.
We build a classifier based on nine subjects' data
and try to classify the tenth subject's patterns.
The confusion matrices are much worse,
and the classification performance crashes,
crashes down to around 44% for faces and objects.
And the drop is even more drastic
for the finer distinctions among animal species,
from 68 down to about 37%.
So, the anatomical alignment just doesn't detect the
commonality of these feature spaces
that underlie pattern classification.
So, that's a problem.
Are the response tuning functions the same
for different brains?
And, the classifiers
after anatomical alignment are suboptimal.
So, how do we go about trying to approach this problem?
Well, we've called in some help; we've called in Indiana Jones.
And, we had people watch the full length of Raiders
of the Lost Ark
while we measured their brain activity.
We did this in two sessions; actually it's 110 minutes.
They watched for 55 minutes, and then we took them
out of the scanner, let them take a break, have some popcorn,
and put them back in the scanner
and they watched the second half of the movie.
And, we had the same subjects watching the movie
and performing these category perception experiments.
Ten subjects at Princeton watched the movie
and did the face and object category [inaudible] experiment.
Eleven subjects at Dartmouth watched the movie
and did the animal species experiment.
So, this cut is a scene at the beginning
of Raiders of the Lost Ark.
And I just want, I'm just showing this so you get a sense
for the complexity of this stimulus.
This is, obviously, nothing
like a standard psychological experiment.
It's very cluttered.
It moves quickly and it's dynamic, okay.
And we have them listening to the sound,
see if we can get this, if that's going to work.
>> You chose the wrong friends and this time it will cost you.
>> So, you know, we pulled out all the stops
and had them watch the movie and listen to the soundtrack.
By the way, this is a wonderful thing in terms
of subject [inaudible].
We say we're going to pay you to lie
and watch a movie for two hours.
And they think great.
And, you know, with a movie like this, they have no trouble,
no trouble paying attention.
So, what's their brain doing while they're watching
this movie?
Okay so, here are two brains.
And, these are, the patterns of activity
in ventral temporal cortex as they're watching the movie,
and they're in synchrony, okay.
So, these are the same time points that you're seeing.
And, everything is scaled to a zero mean for each voxel,
with red and yellow above zero and blues below zero.
Each time point is actually three seconds;
we've sped it up a little bit so you can see what's going on.
And, first of all, you can see
that the patterns change quite drastically
from one time point to the next.
And it's hard to see exactly what the commonality
between these two brains is.
So, it's not obvious, objectively,
as we watch people's brains responding to this movie,
how these two brains,
how these patterns, are representing the same information.
So, we assume that, as people watch the movie,
the pattern of activity in ventral temporal cortex,
which has this kind of fine grain
that embodies these fine grain distinctions,
is encoding the same visual information while the movie is being watched, okay.
The problem is that the coordinate axes
of these representational spaces,
the voxel spaces, are out of alignment.
And so, what we want to do is, we want to take the data
from one subject, like this person, and, somehow,
transform it so it's in optimal alignment with the other person.
So, in order to do this,
we put the data into a high dimensional space;
we take it out of anatomical space.
So, it's a high dimensional space where each dimension is a voxel.
We select 1,000 voxels in each subject.
And then, we use the Procrustean transformation
to align one subject's representational space
to another.
The Procrustean transformation is a rigid high dimensional
transformation, and one of the beauties
of it is that it has a closed form solution.
So, it finds the optimal parameters
that rotate the data from one subject
into another subject's space.
We do a series of pairwise Procrustean transformations,
with a couple of iterations,
and so we get one standard space,
standard across all 21 subjects,
and we rotate everybody's data into that space.
So, how does this work?
Okay, so this is the algorithm.
So we call this application of the Procrustean transformation,
to align subjects' voxel spaces
into a common high dimensional space, hyperalignment.
And this is an overview of the algorithm.
We start out with each subject's data
in the anatomical space.
And, in our demonstration, we have a thousand voxels
of ventral temporal cortex.
And then, we hyperalign it to a 1,000 dimensional common space.
And so, in that, we have individual data
in the common space and we have a set of parameters.
So, the Procrustean transformation finds an orthogonal matrix
that rotates, with reflections, the data for one subject
into an optimal alignment
with the data space that's averaged across all subjects.
And so, that's just the average data in the common space.
So, that's hyper alignment.
I'll go over how that works first.
So imagine; it's hard to visualize pattern response vectors
in a thousand dimensional space,
so we've reduced this to a two dimensional brain: two voxels,
voxel one and voxel two, in subject one and subject two.
to time point one and subject one.
Here's the vector response
to the same time point in subject two.
And, you can see they're
in different locations in this vector space.
And, these are our color coded two voxel maps.
So, you can see that this person's brain map is different
from this person's brain map.
Now, as the movie progresses, the response pattern,
the pattern vector, moves to a new location in this space,
to time point two.
And you can see that the brain patterns are really showing no
correspondence to each other, for patterns three,
four, five, six, and seven.
So, the patterns never look similar.
These patterns don't look that similar
unless you have good rotational invariance.
So, what the Procrustean transformation does is,
it takes these two sets of response vectors
and finds the rotation that brings them into alignment.
So, this is rotating clockwise.
And now, this pattern and this pattern are
essentially identical.
And, the dimensions in this rotated space are weighted sums
of the original voxel dimensions, okay.
So, this new dimension two is not a single voxel.
It's a weighted pattern, you know, 0.87 of voxel one
and half of voxel two.
[Silence] So, another way to think about it is,
this is in the original voxel space, voxel one and voxel two,
and the new dimensions are oblique
to those original dimensions.
And, this transformation is calculated very simply
by the [inaudible] transformation.
It is very simple to express in matrix format:
the original voxel space, which is two voxels
by seven time points, is multiplied by a rotation,
a two dimensional rotation matrix,
to produce the data in the common space,
a space that is common between these two subjects.
So, with our thousand voxel data it's exactly the same form.
We have the voxel space, a thousand voxels
by 2,205 time points, and the parameter matrix
that rotates this into the common space,
which is what we call the hyperalignment parameters, is a 1,000
by 1,000 matrix, okay.
And what that produces is a time series
in 1,000 dimensions over the same number of time points.
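Just to make that concrete, here is a minimal sketch in Python with NumPy and SciPy of the Procrustean alignment step as described here, assuming each subject's data is a time points by voxels array that has already been standardized per voxel. The function and variable names are mine, not from the talk, and the published implementation has more machinery than this.

    import numpy as np
    from scipy.linalg import orthogonal_procrustes

    def hyperalign(subjects, n_iter=2):
        """Sketch of hyperalignment: find one orthogonal (rotation/reflection)
        matrix per subject that maps that subject's (time x voxel) matrix into
        a shared space, iterating against the group average as the reference."""
        reference = subjects[0].copy()
        rotations = [np.eye(s.shape[1]) for s in subjects]
        for _ in range(n_iter):
            aligned = []
            for i, data in enumerate(subjects):
                # closed-form Procrustean solution: R minimizes ||data @ R - reference||
                R, _ = orthogonal_procrustes(data, reference)
                rotations[i] = R
                aligned.append(data @ R)
            # the common space is the average of the aligned subjects
            reference = np.mean(aligned, axis=0)
        return rotations, reference

    # e.g. each of 21 subjects: ~2,200 movie time points x 1,000 VT voxels
    # rotations, common = hyperalign(list_of_subject_arrays)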
So, what does this do?
So this aligns the vectors, and it also
makes the time series for each dimension more similar
across subjects.
So, this shows the histogram of correlations of each subject
to the average time series
of the other subjects dividing the data
from Princeton and Dartmouth.
We have, the Dartmouth data are of higher quality
because we have, we have a better scanner here than used
in Princeton a few years back.
And, you can see that the median between subject correlation is
around .2 for anatomically aligned data and goes up to
about .4 at Princeton, and at Dartmouth the median was
at about .35 and went up to about .6.
And so, there's a big increase
in between subject synchrony for each dimension.
So, we're finding response tuning functions,
so this is response tuning to the movie, 2,200 stimuli,
virtually, if you do it time point by time point.
And, the correlations are increasing very dramatically.
[Silence] And now, you can see between subject classification
based on these hyperaligned dimensions.
So now, instead of building a classifier for each individual subject,
we build a classifier based on nine subjects' data,
and then we classify the tenth subject's data
based on the other subjects' patterns.
But, it's now in this common hyperaligned space.
And, for Princeton, which is faces and objects, oh,
this is wrong, I'm sorry, okay,
this is not for the category perception.
We did something different as our first step of validation.
We would take an 18 second time segment
from the movie, which is 6 TRs,
and we would ask: can we tell exactly what part
of the movie the subject was watching when this sequence
of brain responses was produced?
And so, we can't do within subject classification with that
because every 18 second time segment is unique.
So, we have to do between subjects.
What we did is we correlated that 18 second time segment
of patterns with the average pattern
for the other 20 subjects, okay.
And, we compared that correlation with the correlation
of that time segment in the test subject with all
of the time segments in the movie.
And, we did this with cross validation:
we would derive the hyperalignment parameters in half
of the movie and test the validity
in the other half of the movie.
And, the accuracy for identifying precisely
which time point, which segment of the movie they're watching is
about 66% in Princeton and about 74% at Dartmouth, okay.
So, the chance here is one out of a thousand.
So, we can identify what the stimulus was
with remarkably high precision using this.
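As a rough sketch of that identification test, assuming the test subject's segment and the averaged time series of the other subjects are both already in the common space (time points by dimensions); the names here are illustrative only, and the real procedure has extra safeguards such as the cross validation just described.

    import numpy as np

    def identify_time_segment(test_segment, group_average):
        """Correlate a short test segment (e.g. 6 TRs x dimensions, flattened)
        with every window of the same length in the group-average time series
        and return the start index of the best-matching window."""
        n_tr = test_segment.shape[0]
        seg = test_segment.ravel()
        scores = [
            np.corrcoef(seg, group_average[t:t + n_tr].ravel())[0, 1]
            for t in range(group_average.shape[0] - n_tr + 1)
        ]
        return int(np.argmax(scores))  # correct if this matches the segment's true start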
When we do the same kind of thing
with anatomically aligned data, the drop is huge.
It's over a 50% drop in classification accuracy, which,
again, shows the anatomical alignment is just not capturing
this kind of fine scale information
in ventral temporal patterns.
And, the next thing we asked was:
is a thousand dimensions
really the true dimensionality
of this representational space, or can we find a smaller space?
So, we did a PCA to see how many dimensions were
necessary to afford this kind of high level
of between subject classification.
And, this is the between subject classification of movie time
segments based on different numbers of principal components.
And so, we find that five and ten are very clearly insufficient,
it needs more than 20, and it starts to asymptote
out here somewhere past 30 or so.
And so, arbitrarily, we picked 35 as the number
of dimensions we were going to analyze
as our common model space.
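A minimal sketch of that dimension reduction step, assuming PCA is run on group data already in the 1,000 dimensional common space; again, the names are illustrative rather than the published code.

    import numpy as np

    def top_components(common_space_data, n_components=35):
        """PCA via SVD on common-space data (time points x 1,000), returning
        an (n_components x 1,000) matrix whose rows are the top principal
        components of the common representational space."""
        centered = common_space_data - common_space_data.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[:n_components]

    # combined with a subject's rotation R (1,000 x 1,000), the reduced
    # subject-specific mapping is the (35 x 1,000) matrix: top_components(...) @ R.T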
>> A very quick clarification question.
How did you get the original thousand voxels,
just like the one sense on how you got them?
>> Okay, we did that by correlating all voxels
in one subject's VT cortex individually with all voxels in all
the other subjects' VT cortex [inaudible] individually.
And then, we found the voxels that showed the maximum
between subject synchrony
with any voxel anywhere in VT cortex, okay.
And again, we did that kind of voxel selection
independently of the data used for validation testing.
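A sketch of how that voxel selection might look in code, assuming time by voxel arrays for one subject and for the averaged other subjects; this is a paraphrase of the procedure as described, not the exact published selection.

    import numpy as np

    def select_voxels(subject_ts, others_ts, n_keep=1000):
        """For each voxel in this subject, take its highest time-series
        correlation with any voxel in the other subjects' data, then keep
        the n_keep voxels with the largest between-subject synchrony."""
        n_sub = subject_ts.shape[1]
        corr = np.corrcoef(subject_ts.T, others_ts.T)   # stacked correlation matrix
        cross = corr[:n_sub, n_sub:]                    # subject voxels x other voxels
        best_sync = cross.max(axis=1)                   # best match anywhere in VT cortex
        return np.argsort(best_sync)[::-1][:n_keep]     # indices of selected voxels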
So, we arbitrarily chose 35.
There were people who advocated 42.
But we decided that was too cute.
So, this is the final form of the transformation.
So, now instead of a 1,000 by 1,000 parameter matrix,
we have a matrix that's 35 by 1,000.
And that converts the subject's voxel space
into time series on these 35 dimensions, okay.
Now, there's a little bit of magic here,
because once we have this transformation matrix,
so this is our P, this is our hyperalignment parameter matrix
for a given subject,
once we have that for a given subject we can apply
it to any data in the same voxel space.
It doesn't have to be movie data.
We could apply that to the existing data
from the object category perception experiment
and the animal species category perception experiment.
And so, we can take any data from those thousand voxels now
and put them into our common model space.
And this just shows that.
So, now we can take voxel data from experiment two and multiply
it by the same parameter matrix,
and now we get the experiment two data in the model space.
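In code, that step is nothing more than another matrix multiplication; this sketch reuses the (assumed) rotation and principal-component matrices from the earlier sketches.

    import numpy as np

    def to_model_space(new_data, rotation, components):
        """Project data from a different experiment (time points x 1,000, same
        voxels as the movie data) into the 35-dimensional model space.
        rotation: (1,000 x 1,000) Procrustean matrix derived from the movie.
        components: (35 x 1,000) top principal components of the common space."""
        return new_data @ rotation @ components.T   # -> (time points x 35)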
[Silence] So, how well does that work?
So, first of all, with the time segment classification,
the light blue bar is classification
from all thousand dimensions
and the dark blue bar is when we use the 35 hyperaligned PCs.
Now, if we use this matrix to transform the data
from the category perception experiments,
you can see that the red bar is within subject classifier,
so that's the standard way of doing this.
The light blue bar is classification
between subjects based on all thousand dimensions.
And you can see it's equivalent
to within subject classification.
And, the dark blue bar is if we only used those 35 dimensions,
we've reduced this to 35 features
and classification performance is equivalent.
There's no drop
in classification performance with this model.
And, it's equivalent now
to within subject classification, unlike the classification
of anatomically aligned data.
And, you know, here are the confusion matrices
for within subject classification,
for between subject classification with the thousand dimensions,
for between subject classification with the 35 PCs,
and for between subject classification based
on the thousand anatomically aligned voxels.
And, you can see that the similarity structure embodied
in the class confusion matrix is very similar
for within subject classification
and these two varieties of between subject classification
of data transformed into the model space.
So, those are the basic data.
We've shown that we can derive a transformation matrix
for each individual subject that puts their data
into a common model space.
And, once it's in the common model space,
we can do between subject classification that's just
as good as within subject classification.
Yeah.
>> I just wanted to ask if there's anything interesting
that you can see
in the transformation matrixes themselves,
because I would expect, if you were to do it on the big chunk
of brain that you would see some portions of the matrix
that are transforming sub regions
of the brain into [inaudible].
>> Yeah.
>> Does it partition out in any obvious way?
>> What I'm going to show you later in the last session,
working parts of the model,
is that we look at the topography that's associated with each PC.
And so, each PC has, well, a PC now is not a voxel or a super voxel.
It's a pattern of activity across all
of ventral temporal cortex.
>> But, in the transformation matrices themselves?
>> In the transformation.
>> It works inside of those?
>> Yeah that's where, that's where it'll be.
Yeah, okay.
So, the first question: a lot of people look at this,
especially people who do psychophysics,
and say why do you have people watch this movie.
And you know, what a pain.
And, you're collecting so much data and it's so uncontrolled.
Why don't you have people looking at, you know,
spots of colors and [inaudible] bars, and things like that?
So, I want to show you that there is something
about using a natural stimulus that gets us into a place
where we get more information out of ventral temporal cortex.
So, the reason we chose the movie was because we knew
from work like [inaudible] work
that movies evoke synchronized activity
across broad expanses of cortex,
especially the posterior perceptual areas.
And, we wanted to have as large a sample of brain responses
to visual stimuli as possible.
And so, we thought the movie was a good way
of getting a very large variety.
And, we didn't really analyze the content.
We just thought we would have a wide variety
of stimuli and brain responses.
So, we did a couple of things
to see if there's something further
about the movie besides just providing a wide variety quickly.
The first thing we did is we decided
to do a classification test
where we matched exactly the type of classification test
for the faces and objects
as compared to movie timeframes.
So, we picked one timeframe from each block
of the seven categories of stimuli
in the category perception experiment.
And then, we took the same number of timeframes
from the time series of the movie,
so we had seven single time points from the movie:
seven time points that we're trying to classify
in the category experiment and seven time points in the movie.
And then, we did between subject classification
of data transformed in the model space.
And, for the single TR of the movie,
we're getting over 70% accuracy.
Now, this is one out of seven classifications instead
of one out of a thousand.
But, it's a single TR not six TRs.
So, this is really pretty remarkable.
For a single TR from the category experiment, classification
accuracy is about 38%, okay.
So, the responses that we're evoking
with the movie are much more distinctive relative
to each other than the responses that we evoke
with a single object presented
on a blank background.
So, there's something about the clutter,
there's something about, perhaps,
about the subject being more engaged with the stimulus.
There's, you know, there's a narrative,
a plot that's going on with it.
And it's dynamic.
We think, also, that maybe the dynamic nature is very important
for evoking this kind of distinctive response.
But, it's much more distinctive
than are the responses to the faces and objects.
And, the algorithm can be applied to any time series;
we apply it to the movie because we want it
to sample a large number of brain states.
But, we can also apply it to the time series of the faces
and objects and the animal species experiments, okay.
And we can get a common space that's based only
on the responses to those limited sets of stimuli.
And then, we can do between subject classification
of data that's been transformed into a model space based
on these responses.
Now, for the between subject classification of the face
and object data and the movie, and the animal species data,
it's a little bit tricky.
But, we figured out a data folding scheme
so that it's completely clean.
There's no peeking or double dipping.
Is Niko [assumed spelling] here?
This is one of Niko's obsessions.
But, I think he will be satisfied
when he sees how we did that.
And, when we do that, we find that the
between subject classification of the data
from the same experiment is very high.
So, for faces and objects, it's equivalent
to the movie hyper aligned data.
Animal species is actually a little bit better.
So, this is a very useful method that you can use
within an experiment, you don't have to do the movie.
But, the validity is limited to that experiment.
So, if we apply the transformation matrix that we derived
from the face and object experiment
to the movie data
from these Princeton subjects, classification performance drops
from 65, 66% down to about 16%, okay.
So, it just, it doesn't have validity for the variety
of stimuli that are in the movie.
It's really only finding the rotation
parameters that align responses in a very small subspace
of the representational space of ventral temporal cortex.
We get an equivalent drop
with the hyperalignment parameters derived
from the animal species experiment.
So, that means that our movie derived hyperalignment parameters have
general validity across experiments.
So, they seem to be capturing the structure
of this representational space in a way that has validity
across a wide range of the stimuli.
Whereas, when you do a more traditional experiment
with static images of single objects,
the validity is very limited.
And then, finally, people say, two hours,
really, you guys are crazy.
And so, we analyzed what the effect is of deriving the model
on smaller sets of data from the movie.
So, maybe we can get away with ten minutes of the movie.
All the studies of resting state functional connectivity
right now are based on ten minutes
of resting state data.
Here we are doing 110 minutes.
Do we really need that?
Well, this shows the effect of the number of time points used
for [inaudible] on the accuracy
for the three experiments.
We can only do it for half of the data;
we have to derive the model in half of the movie and apply it
to the other half of the movie for time segment classification.
And, you can see, it keeps going up.
We haven't reached a peak
with a full 55 minutes of movie in here.
And, with these other experiments, it still seems
to be going up a little bit.
Maybe it's asymptoting somewhere here
around 1,700, 1,800 time points.
So, finding these parameters so that they're fine-tuned
to align the data across subjects requires a lot
of information about that subject's representational space,
a very broad sampling of responses
to a wide variety of stimuli.
So, does the movie add anything?
Well, we think the answer is yes
that there's something very special
about using a natural dynamic stimulus.
We don't know if it's because it's cluttered or it's dynamic
or that it's tied together with a narrative, you know.
Some people attend better or if it's
because the sound helps them pay attention.
We don't know which of those factors is critical.
But, something about watching a natural movie, which real people
like to do in real life, is much more effective
for finding the parameters, for deriving this model,
than the standard experiment.
So, I'm just going to go over the working parts of the model.
So, what are these dimensions?
We have 35 dimensions.
What is this dimension in the common model space?
And this is something that really tripped people
up when we first showed this.
As an aside, I should say,
the paper is finally in press; we first presented this work
as a poster at the Neuroscience meeting,
what was it, a year and a half ago?
Two years ago, two years ago.
And we actually submitted the paper at the time that we
[inaudible] for the Neuroscience meeting.
So, Jack Gallant came up and looked at it
and said oh this is really cool.
Is it in press?
I said no, but, you know, we just submitted it.
He said, oh so we'll see you in a year or so.
And, I thought, what a cynic, you know.
What does he know?
It's going to get accepted like that.
So, two years later, it's finally accepted.
But, what are these dimensions?
And one thing that tripped people up is,
you know, what happens here.
So now, each dimension has a distinct response
tuning function.
I showed you how those response tuning functions are more highly
correlated across subjects than the response tuning functions
for voxels that are anatomically aligned.
And, each dimension is a pattern of activity in VT cortex
that has a distinctive topography.
We'll go through how that works.
So, any voxel in the original data is associated
with a series of 35 weights, okay.
So, these 35 weights are multiplied by the time points
in the transposed model data matrix and give a model
of that voxel's time series.
So, an original time series is modeled as a weighted sum
of the dimension response tuning functions, or time series.
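As a small sketch of that reconstruction, assuming the model time series is time points by 35 and the voxel's weights are the corresponding 35 entries of the subject's transformation matrix; the names are illustrative.

    import numpy as np

    def model_voxel_timeseries(model_timeseries, voxel_weights):
        """Model one voxel's time series as a weighted sum of the 35
        model-dimension time series.
        model_timeseries: (time points x 35); voxel_weights: (35,)."""
        return model_timeseries @ voxel_weights      # -> (time points,)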
And, you can look at that in the movie, and you can also look at that
in the category perception experiments.
So, this just shows the average response to the seven categories
in the face and object experiment and the six categories
in the animal species experiment
for the top PCs: PC1, PC2, PC3, and PC5.
You can see that PC1 is responding
and has positive weights for just the human faces
and negative weights for everything else.
Whereas, PC2 and PC3 have negative weights for the faces
and positive weights for the objects.
And, PC5 seems to have positive weights
for all categories [inaudible] faces.
So, there's some meaning that can be extracted, at least
from the high order PCs.
Now, each column
in the parameter matrix is a series of 1,000 weights.
And, each of those weights is for a voxel.
So, you can map these weights as a topography.
So now, this is the topography for PC1.
And this is the topography for PC3.
And, this just shows the topography,
again for the same four PCs that I showed you before
with the response tuning functions.
And, you can see that the PC that seems
to be responding only to human faces, with positive responses only
for human faces, shows positive weights only
in a couple of spots.
They're within the FFA.
So, these outlines here are the individually defined FFA
and the individually defined PPA, so the face area and the place area, okay.
So, it's not picking up the entire face area.
It's just a piece of the face area.
Whereas PC5, which is also responding positively
to the faces but not only the faces,
has larger positive responses in the FFA.
But there are also positive responses outside the FFA
and the PPA.
So, this just shows those two topographies in greater detail
so you can see them more easily.
So, the FFA and the PPA are not good guides
to the structure of the topographies
that are being picked up by these PCs.
These PCs, again, are the PCs
that best capture the information,
the response variability, during a natural viewing task,
watching a movie.
So, any response, this is the response to time point 200,
is modeled as a weighted sum
of these 35 response pattern basis functions.
And, it's a different set of response pattern basis functions
for each individual subject,
because different individual subjects have different
topographies, okay.
The response tuning functions are common across subjects;
the patterns are slightly different.
But again, you can see that in subject one and subject two,
there's a certain amount of commonality that you can see
if you use your human pattern recognition capability.
But, they don't line up well.
Now, another thing you can do, if you take
that 35 dimensional space, is define dimensions other than the PCs.
Those are dimensions that are selected by PCA,
which notoriously doesn't pick up things that are
all that meaningful, right.
It's an aggregate that captures the information we want to get.
That's five minutes, right, okay.
But, that's a 35 dimensional space.
So, you can define a dimension anywhere you want
in that 35 dimensional space.
You can do a linear discriminant.
So, we did a linear discriminant that contrasts the response
to faces versus houses
in the face and object perception experiment.
And what's the topography for that dimension
in this 35 dimensional space?
And, the math is very simple.
The linear discriminant has a set
of 35 weights for the 35 PCs.
And you multiply that by the transformation matrix
and you get out a topography for that linear discriminant.
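A sketch of that projection, with names of my own choosing: the discriminant lives in the 35-dimensional model space, and multiplying its weights through a subject's 35 by 1,000 transformation matrix gives that subject's voxel-wise topography for the contrast.

    import numpy as np

    def discriminant_topography(discriminant_weights, subject_transform):
        """Map a linear discriminant (e.g. faces vs. houses), defined as 35
        weights in the model space, back into a 1,000-voxel topography for
        one subject. subject_transform: that subject's (35 x 1,000) matrix."""
        return discriminant_weights @ subject_transform   # -> (1,000,) voxel weights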
And, this is the linear discriminant
for faces versus houses.
And, you can see now that it really is picking
up the FFA pretty well.
So, the FFA is preserved in the model space.
We haven't lost it.
It's right there.
And it can be extracted
with a fairly straightforward procedure.
So, this just shows the topography
for that linear discriminant, faces versus houses,
in subject one and subject two.
And, you can see there's pretty good correspondence
with the outlines of the individually defined FFA.
Now, remember, this linear discriminant is defined
in the model space for all subjects.
So, we're projecting all subjects' data
into individual topographies.
And, we can go one step further.
We can take one subject's brain space, and then
we can project everybody else's data
into that subject's brain space.
So now, we've modeled that person's brain space
with the other subjects' data, throwing his own data out, okay.
And then, we can do the standard calculation
of where the FFA is and where the PPA is
in this subject's brain topography.
But now, we're not using that subject's own brain data.
We're using other subjects' brain data.
And so, this shows that result for these same two subjects.
Again, here's the individually defined FFA
in the yellow outline on the right and left,
and the red area is the group defined FFA projected
into that subject's brain.
And you can see they're remarkably good; even the kind
of long extension
in the posterior direction has picked up a little blob that's
separated from the other blob.
Again, that's the group data aligning with his own
individually defined FFA and PPA based on his own data.
So, we haven't lost the anatomy.
The anatomy is there and it's preserved.
So, I think I'll sum up now.
With this [inaudible] dimensional model of ventral temporal cortex,
we showed that the population responses in ventral temporal
cortex, as reflected in patterns
of activity in fMRI data, can be accounted for by response
tuning functions that are common across brains.
And we can reduce that to a set of 35
or 42 common response functions.
And these response tuning functions
and the associated topographic basis functions model
representations for a very wide range of complex stimuli.
So, it's all derived in one dataset
but it's valid for the other two.
So, it seems to have general validity
across a large part of the stimulus space
in ventral temporal cortex.
These same principles can also be applied to other parts
of the brain, of course.
And so, this group has done this.
So, in addition to ventral temporal cortex,
where movie time segment classification was
around 74% for the Dartmouth subjects, he did the same thing
for [inaudible] cortex and got
between subject classification of movie time segments
of about 70%, and in auditory cortex
a time segment classification of about 30%.
So, we can model other spaces as well,
and then we can combine them.
So, instead of just taking data
from ventral temporal cortex, we take data
from early visual cortex, the late ventral pathway,
and auditory cortex, and ask: now can we tell what part
of the movie a person's looking at?
It's the same classification procedure
with that one [inaudible] correlation classifier.
And now, we're at 90%.
So, we're classifying one out of a thousand,
we're specifying exactly which part
of the movie a person's watching.
I don't think we can go much higher than 90% for that kind
of a very demanding classification test.
So, what does this model do for us?
Okay, let's back up. We think of representations
of complex stimuli as residing
in representational spaces.
And, each stimulus is a vector within this space.
And, we start with the stimulus space, and we can model
that with visual features and semantic features.
And we're going to hear a lot more about those kinds
of models in talks later today.
And, this is transduced into a neuronal population space, okay.
And this population is comprised of single neurons
with simple tuning functions.
And then, we can measure that neural population space
with fMRI and get a VT voxel space.
Now, we can't get a complete picture
of this neural population space with any existing measure.
But, with ventral temporal cortex, we can get a measure
of all of ventral temporal cortex at low spatial resolution.
And now, with our model, we show
that we can transform this voxel space into a kind
of model space: with a very straightforward transformation
we go from the VT voxel space to the common space, and we can invert
that and convert data from the common model space back
into the VT voxel space.
And, these common dimensions here,
which are patterns, we can't say exactly
what their single [inaudible] functions are
at this point.
[ Applause ]
>> So that was really cool and I
like the methods, they seem great.
But I have this nagging concern
and I'm hoping you can fix it for me.
When you do single cell analysis, logistical analysis
to try to figure out what the response properties
of a neuron are, and you show an ensemble of stimuli,
you always have this worry
that if your stimulus ensemble has its own structures,
correlations let's just say, then the analysis is going
to hand back something that has that embedded in it.
>> Right.
>> And so, you'll attribute things to the neuron
that really are not about the neuron they're
about the stimuli they used.
So, when you use natural stimuli here,
I agree with everything you said
about the nice rich stimuli really covering a wide range
of brain activity and patterns, that's great.
But, aren't you worried
that Indiana Jones has some correlation between something
in the soundtrack and something Indi tends to do
and that those are the things that are going
to show up in your PCs?
And, they have nothing to do with the brain they have
to do with that movie.
>> Yeah, that's why we, you know,
we test whether these parameters then are valid for other stimuli
like the faces and objects.
>> Well, just, just because they're strong enough
to give you face versus object classifications doesn't mean.
>> That's just basically dog face versus monkey face.
>> I know but that doesn't mean [inaudible].
>> Luna moths versus ladybugs.
So, those are fine distinctions.
And, another analysis, I'm sorry I'm going to get you talking,
but if I could put one thing into here
which is we did another analysis.
Some people said well there's faces and objects and, you know,
and things like that in the movie.
So, how about you just, you know,
define the parameters based on other responses
to the same categories?
We went back to the movie and we looked, went through it
and we found all the time points where there was a monkey
or a dog or a bird or a bug and we removed those time points
from the movie and the 30 seconds falling the movie
because of delayed human dynamic response.
And then, we had a dataset that was based on responses
to everything but monkeys and dogs and birds and books, okay,
had no effect on the [inaudible] model.
>> I believe that and I would expect it.
The effects that I'm talking about are a little more subtle,
right, they're biases.
>> Yeah.
>> And they're gentle, but they might matter.
And, you might run into, later on, some sort of a test
where you end up not succeeding because those biases are,
as I said, driven by the movie rather
than by properties of the brain.
>> Yeah I know.
Our plan now is to try a wider variety of natural stimuli
and to see if we get the same kind of model, you know,
for something that's not as, you know,
that's very different from Indiana Jones.
It's a concern we have with, you know, general validity.
The whole point of this enterprise was to try
to find general validity.
And, as we know, you can't completely sample object space,
okay.
So, we're just doing the best we can.
>> There's going to be a problem with any natural stimuli
because if you whiten the stimulus
so that it's perfectly white, then you end up with white noise
and you don't get any responses.
So, I mean, you can complain, oh yeah, gee, there are
probably spatiotemporal correlations in this movie,
there are huge spatiotemporal correlations,
but there are way fewer spatial correlations
in the stimulus set than if you just showed, say, a set of pictures
of faces and houses as the only stimuli.
So, I guess, there I don't understand what your
complaint is.
He's not making a claim by how the brain processes the movie.
He's making a claim that you can align brains
and that there's common responses to movies.
And that seems like to be a fine claim.
>> I think that we have to take it online because you
and I have been having this discussion for like four years.
I don't think we're going to resolve it right now.
Tom did you want to say something?
>> Yeah, I want to ask a question about the volume
of the brain on which you're applying this transformation.
So, I could imagine that, instead of doing the region
of the brain you did, you might say, well, I'm going
to apply this, train this transformation,
on just the FFA, or on the whole brain.
So, why didn't you do those two things, or,
you know, maybe you'll get to it?
But, I'm more interested in your thought experiment
about what would happen if you did.
>> We haven't done the whole brain
because we're very concerned about over fitting.
And so, if we have, in 2,000 time points we don't really want
to have more than 2,000 voxels that we're rotating.
But.
>> I would guess you'd get less overfitting
if you did the whole brain.
It seems like it's more constrained.
>> We haven't tried that yet.
There's no reason not to try it.
But, we have done this: restrict the analysis
to individually defined FFA voxels
and individually defined PPA voxels.
And, we find that we get between subject classification
for faces and objects and, you know, houses versus objects,
that's equivalent to an equivalent number
of voxels from all of ventral cortex.
[Inaudible] works within a small location.
And, there's no reason you can't do this for, you know,
a small functionally defined region
like the FFA, or a smaller anatomical region; we can do,
you know, just the lateral [inaudible]
or the posterior third.
>> Yeah, what I was really thinking was
that you have a particular constrained type
of transformation that you're allowing, these kinds
of rotations in MRI image space.
And so, that's kind of a tight constraint if you think
about all the possible things you might do
to map from one brain to another.
And so, what I was kind of guessing your answer would be,
and it's interesting that it was sort of completely the opposite,
is that if you went down to a smaller region you should expect
more overfitting, because that constraint wouldn't be
as powerful, whereas doing it
on the whole brain you would expect maybe you wouldn't get
that good of a mapping because it's an
over constrained problem.
But, it does seem like there's some interesting theme
to be figured out about what regions to use
and what the effects are.
[Silence]
>> It's a different point from [inaudible], but it seems
like there are different transformations.
>> Jack, push the mike button [inaudible].
>> [Inaudible]'s earlier point was the one
I was referring to, where he was getting
at that you could break the transformation up
into several sub transformations [inaudible]
the good part.
>> Thanks for clarifying that Jack
in case anybody was confused.
So, it seems like the questions are about claims
that I didn't see you make; you really weren't making claims
that these 35 dimensions are special or important,
they're just a condensation of the data you have in hand.
And, your real claims are about the ability to align subjects
in a way that's much better than what's been done before,
at least, that's how I read what you're presenting.
So, to that end, could you tell us,
if you took a subject, how much variance remains
in an individual subject's data that cannot be
explained by the common model, all right?
So, you have a bunch of voxels going off on the subject
and you say really this person is actually just a specific
re-projection of this common model.
And, the rest of this stuff is either noise in your methods
or something real in that subject
that you can't yet explain.
Did you try to do anything of that nature? So what fraction
of the explainable variance have you captured?
>> Yeah we've done the plot of variance explained
and also the plot of between subject correlation
as a function of the number of PCs, PC by PC.
And, we found that we explain about 70% of the variance
with these 35 dimensions.
And, we get up to about 75% I think when we go
to 50 or 60 dimensions.
Now, we haven't tried classification based only
on the PCs we excluded.
But, I imagine it would be pretty close to chance.
That's my intuition.
What do you think [inaudible]?
>> [Inaudible].
>> Mike, mike.
>> I don't have a lot of information
so I don't know how it could convert.
Can I answer Tom's question?
>> Sure.
[ Inaudible ]
>> Any other questions?
Alice.
>> Alice.
>> Yeah so.
>> [Inaudible].
>> Mike.
>> Yeah, across these rotations, I mean,
did you have problems adjusting the number
of dimensions you rotated,
and were there parameters there in choosing the fit?
I mean, how did you?
I have the intuition it would actually be better
to fit a little bit less than more, than to over fit there.
>> We found that the results, we chose, we did the analysis
on a variety of different voxel set sizes from 100 voxels
up to 2,600 voxels per subject and found
that the results were really stable from about 600
or 800 voxels up to about 2,000 voxels.
>> [Inaudible] usually gives you an option of rotating to fit
in a variety of number of dimensions of the space.
So, I don't know maybe.
[ Inaudible ]
The biggest.
[ Inaudible ]
>> We let all voxels in our thousand voxel space rotate.
And we reduced the number of parameters.
Before we applied the Procrustean transformation,
we took the data in each voxel and standardized it.
So, every voxel had a zero mean and unit variance
for the responses to the movie.
So, we didn't have
to allow any translations or scaling.
It was just the rotations and the reflections,
just that one [inaudible].
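For what it's worth, that normalization is a one-liner; a sketch assuming time points by voxels arrays.

    import numpy as np

    def standardize_voxels(data):
        """Give every voxel's movie time series zero mean and unit variance, so
        the Procrustean fit needs only rotations and reflections, with no
        translation or scaling terms."""
        return (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)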
>> Come to the mike, Patrick.
>> Patrick, to the mike.
>> So, you are suggesting
that there's a common neural architecture.
>> Right.
>> And, but it differs, in some ways from person to person.
So, I'm interested in whether you speculate on this
as like preordained bits of stuff that get packed
at different locations kind of like when I put the groceries
in the car milk might go there or there.
But milk never goes on top of the bread.
Or, is it just a result
of self-organization maybe constrained by where inputs
and outputs have to be?
So, do you have a feeling about that or how you might test it
or whether you could ever solve that question?
>> Yeah, so, we think that that's a very deep
and complicated question.
And, I thought I showed you
that the high order PCs have a certain commonality,
though it's hard to align them mathematically.
But I can see there's some similarity.
Now, we have this other algorithm
that we've developed called functional alignment,
and we also have a third technique called functional activity
alignment, where we try to align subjects' data while preserving the
topological relationships among voxels.
So, we could do a rubber sheet warping to see
if there's a topography that's just been distorted
from one subject to another.
And, we can improve between subject classification
with that.
But, the improvement is only about a third
of what we get with hyper alignment.
So there seems to be something that, at least
so far, we cannot capture
with a consistent topology across subjects.
Now, we know that there's a coarse scale
and a fine scale topography.
Okay, so the coarse scale topography is like face-responsive
areas and place-responsive areas.
And then, there's a fine scale topography that's
poorly characterized.
And, when we tried to look at that, here we go,
as we analyzed the effect of data smoothing, now,
all the data, we always smooth our data
with a four millimeter filter.
And, offline, I can talk about why we do that.
And we consistently find
that four millimeter smoothing helps MVP
classification analysis.
But then, if we go beyond four, it drops off.
And, it drops off somewhat for faces and objects
and animal species but it really drops off
for the time segment classification.
So that suggests that there's information
at a spatial frequency of around four, you know,
four millimeters that's lost as soon
as you start blurring it, okay.
So, there seems to be some fine structure.
I'm not arguing for hyperacuity in these patterns;
this could be a topography that exists at
a spatial frequency of around four millimeters, rather
than that kind of sub voxel spatial frequency
that people are still arguing about
and that Jeremy Freeman has gone a long way to destroy
for orientation maps.
>> Tom.
>> You showed us the loadings of the PCs on the brain, so,
for example PC1 here are the regions
that it weights positively and negatively.
What if you go the other direction
and for each PC just show us the images
in the movie that most excite that PC?
Did you try that?
>> We haven't done that yet.
But, we're, you know, you know, that's, that's like you know,
the forest of no return, right, when you start to go into trying
to analyze the content of a natural movie.
>> [Inaudible] I'm trying to understand the content of PC1.
>> Right, and so, you know, actually you did a little bit
of that didn't you [Inaudible]?
[ Inaudible ]
>> I have an odd question.
Could you hyper align across species?
So, pick a [inaudible] for example and, you know,
first that'd be interesting.
And, if you could do it, would that tell you anything about how
that species represents that information differently?
>> Yeah, well, I think, you know,
we're going to have a talk later today by [Inaudible]
where he shows that you see a similar similarity structure
among the responses to stimuli in a cat brain,
as a population measure from [inaudible] recording,
as compared to patterns in the human brain measured with fMRI.
And, so I assume hyperalignment would work and might help
to even improve showing the cross species similarity
of representational spaces.
Now, I've been trying to think about how can we get a dataset
in monkeys watching Indiana Jones?
You know, so we'd be getting a large number of neurons
in a single recording while they're watching a natural
stimulus, and have a set of monkeys doing that
and also doing the fMRI thing with that.
But, it's a daunting challenge, and I'm not a monkey
physiologist, so I don't think I'm going to go into that.
But, I think that the same methods should be applicable
to population responses measured
with other modalities like [inaudible].
And, we should be able to look for transformations
from one species to another.
>> Okay, one last question if there is one.
Alice.
>> Just a quick one and maybe it's more for thought.
Did you think about doing correlations among the images
that are being viewed when the brain patterns are similar,
to see the extent to which a lot of this is explained
by shared visual feature categories?
>> Visual feature categories.
>> Just out of curiosity.
I mean, the question is: are you looking at something
that is really a visual representation,
a high level visual representation, that's reliably hooked
to the similarity
of the stimulus images that are being seen?
>> Well, we've analyzed the stimuli
in the category perception experiments using a V1 model.
And, the way we, this is in Andy's poster.
He then looks at the similarity structure among the stimuli
based on the vector in V1 space.
He also looks at the similarity based on behavioral ratings.
You know, is a mallard more like a warbler or a monkey?
And he finds that the V1 similarity structure is seen
in the neural similarity structure, as in an MVP analysis,
in early visual cortex, and not in ventral temporal cortex.
Ventral temporal cortex has a very strong correlation,
actually a .9 correlation, is that right,
between the behavioral rating similarity structure
and the similarity structure of the neural responses.
So, it's not related to the V1 structure.
It's strongly related to behavioral ratings.
I think [Inaudible] has tried to run the V1 analysis
on the movie, and it just hasn't worked.
And, you know, I don't think the V1 model works very well
for a dynamic stimulus, to be perfectly honest.
>> Yeah, actually, you could almost anticipate that:
as they're looking at the movie,
the brain maps are changing so quickly,
while the general similarity parameters
in the movie probably stayed pretty constant,
the activation maps were changing very dramatically
and very quickly.
But I was curious.
[ Applause ]
[ Silence ]