>>
It's a great pleasure, a really great pleasure, to introduce Thad Starner from Georgia Tech
visiting us here in the wonderful Bay Area. [INDISTINCT] while there's sunlight today,
it's really bright and nice and warm. Thad got his PhD from MIT, and then moved to
Georgia Tech, and is possibly the person that defined the use of wearable computers in
productive life. There's a number of people who started running around with head-mounted displays,
but his work is much deeper, and much more interesting than most of the other people's.
He's been working on wearable interfaces since: input devices, wearable display devices, and
so on. And it's a real pleasure to have him talk about his most recent research. So,
with that--he's also a friend of Sergey Brin, I take it. But Sergey's...
>> STARNER: We knew each other from--we ran into each other at conferences a while back.
>> But he's out of town today. Unfortunately, he can't make it here. Well, looking forward
to your presentation. >> STARNER: Thank you. Okay, let me first
apologize because I literally just got off the plane this morning and drove down
here. The weather in Atlanta--we're having huge storms; we had--it looked like a tornado
coming in but it didn't show up. But it meant that all the planes got messed up, and I spent
the night in the airport. So if I seem a little out of it, that's why. We're also going to
be switching back and forth between different devices here. So bear with me and the video
folks as I go between--between systems. Okay. So, the last time I was here, I talked about
how to improve mini QWERTY keyboards, how to do a lot of stuff in mobile computers,
a lot of mobile [INDISTINCT] stuff. Today, if you saw that talk, you won't be bored because
today is almost completely different. Let me start out with something that shouldn't
work. This is something called the Mobile Music Touch. Let me tell you what it does.
With this device, you can actually learn piano melodies without paying attention to it. In
other words, you'll be wearing this glove, right here, and be learning how to play, I
don't know, Star Spangled Banner. Now, how this works is that you have a mobile phone.
In this case, it's the [INDISTINCT] you see on the screen there. And this, you upload
your songs to, and the MIDI player in the phone plays the songs in--sorry, that's actually--coming
online. It plays the songs over and over again in your Bluetooth headset or your earphones,
whatever you have on. But for each note, it actually taps the finger responsible for that
note using this glove. Now, this is a Bluetooth glove, the--there's vibrators in each finger,
which you can see there, and they're on the knuckles, they're tuned to 160 hertz, which
is about the frequency your Pacinian corpuscles are most sensitive to. The fingers--the whole
finger vibrates and so you'll get an idea of which finger goes with which note. Now,
I'm going to pass this around as I talk about this so people can actually play with it.
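The note-to-finger idea he describes can be sketched roughly as follows. This is only a sketch under stated assumptions: the pitch-to-finger table, the `buzz()` transport, and the melody are hypothetical stand-ins, since the real glove receives its commands over Bluetooth from the phone's MIDI player.

```python
# Sketch of the Mobile Music Touch idea: as each note of the melody
# plays in the earphones, tap the finger responsible for that key.
# FINGERING and buzz() are hypothetical stand-ins for the real system.
import time

# One-hand, one-octave passage: map each MIDI pitch to a finger 0-4
# (thumb=0 ... pinky=4), the way a teacher would assign fingering.
FINGERING = {60: 0, 62: 1, 64: 2, 65: 3, 67: 4}  # C, D, E, F, G

def buzz(finger, duration_s):
    """Stand-in for sending a vibrate command to one glove motor."""
    print(f"tap finger {finger} for {duration_s:.2f}s")

def play_passively(melody):
    """melody: list of (midi_pitch, duration_s) pairs; in practice the
    song would loop over and over while the wearer does other things."""
    for pitch, dur in melody:
        buzz(FINGERING[pitch], dur)
        time.sleep(dur)  # wait out the note before the next tap

# Opening of "Dashing through the Snow": E E E, E E E ...
play_passively([(64, 0.3), (64, 0.3), (64, 0.6)])
```

In the real device the same loop would run on the phone, with `buzz` replaced by a Bluetooth message to the motor driver in the glove.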
So, there is a toggle switch in the back, toggle it from off to on and you'll feel it
startup. I think, it's right now, doing the sequence of "Dashing through the Snow." So,
feel free to--feel free to play with this and then, I'll tell you why this particular
glove is so interesting in just a little bit. >> [INDISTINCT]
>> STARNER: No, you don't want me to sing. This is not karaoke night. The--inside the
box there--we've made many different versions of this glove. But it's pretty simple, it's
just a Bluetooth receiver, you can see in the center there, attached to a glove. [INDISTINCT]
a lot about sewing wires into gloves. Sebastian, you got it working?
>> SEBASTIAN: Yes. >> STARNER: Yes, cool.
>> SEBASTIAN: It sounds great. >> STARNER: It sounds great. Well, you really
have to have the music with it right. But, you might say--you might wonder why this works,
right? And in particular, we'll talk about the hands moving left and right on the piano
a little bit. Let me describe to you the simple study we did first, which is we did two newly
composed 10-note passages. Now we did newly composed because the first time we did, we
used "Amazing Grace" and "Dashing through the Snow," part of "Jingle Bells." And some
of our subjects were from Muslim countries and had heard neither of them. And some of
our subjects, of course, they were very, very familiar. None of our subjects knew how to
play the piano or had a musical background. But we wanted to have something that was [INDISTINCT]
as clean a [INDISTINCT] as we could get, so we gathered 16 subjects,
none of whom had musical experience. We showed them the passage once on a keyboard where
the keys light up. And then, they had to try to repeat it. And that was the base case.
Then for the next 30 minutes, they did a reading comprehension exam. That reading comprehension
exam was what you'd find on a normal [INDISTINCT]. As a matter of fact, I think I have it here.
I don't know if you can see that. But it's something where they have to read the paragraph
and answer questions. And we'll come back to that in a second. Then after 30 minutes
of doing this, they get the glove. So, the passage is playing in their earpiece, in
their headphones, and the glove is tapping their fingers. That's the experimental condition.
The control condition was just playing the audio in their headphones over and over again.
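The two conditions and two passages he mentions form the 2x2 within-subjects design he describes next. A minimal sketch of how the cells could be counterbalanced across the 16 subjects; the condition and passage names, and the simple rotation scheme, are my illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of a 2x2 within-subjects design: every subject experiences
# both conditions (glove+audio vs. audio-only) and both newly composed
# passages, with starting cell rotated across subjects.
from itertools import product

CONDITIONS = ["glove+audio", "audio-only"]
PASSAGES = ["passage A", "passage B"]  # the two 10-note passages

# The four (condition, passage) cells of the 2x2 design.
cells = list(product(CONDITIONS, PASSAGES))

def schedule(subject_id, n_cells=len(cells)):
    """Return the block order for one subject, rotated by subject id
    so each of the four starting cells is used equally often."""
    start = subject_id % n_cells
    return cells[start:] + cells[:start]

for s in range(4):  # the first four subjects cover all starting cells
    print(s, schedule(s))
```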
After 30 minutes, each subject tries to play the song again. And this is a within-subjects
study, a 2x2 design, and this was presented at CHI this year, so if people are interested
in the details, you can look at it there. Again, here is the distractor task. We actually
tested people on the distractor task. Their scores did not improve, or did--or did not
change in the experimental condition versus the control condition. And this is the total
number of errors after 30 minutes. Now, the green bars or kind of--kind of fluorescent
green on the--on this screen here, show the number of errors made by people in the experimental
condition. The red shows the number of errors they made when they just had the audio playing.
And as you can see here, they don't learn anything with just the audio playing. But
most of them--half of them played the sequence correctly with no mistakes after the 30 minutes
of passive practice. Now, this is really kind of bizarre. I mean, how many people would
have thought that that would've worked? Right? I certainly didn't. So, this is the type of
thing that as a cognitive science major, I go, "How is this working?" So, we've done
the study again and again and again in two different continents with three different
researchers. And it seems to hold true. And it seems to work no matter if the distractor
task is a reading comprehension task, if you're reading your email, if you're watching
a movie, if you're doing a scavenger hunt, if you're playing a memory game, or even,
at CHI, I gave a talk, and had the system teach me Beethoven's Ode to Joy as I was giving
the talk. Which, let me tell you, talk about performance pressure, I just said about how
this thing works and I had to walk up to the keyboard and try it. I didn't even know where
to put my hand down at first. But indeed, I can now play Beethoven's Ode to Joy; I used
the first two passages--the first two phrases pretty flawless--flawlessly.
>> What was the talk? >> STARNER: I don't know. I didn't watch the
video of myself giving the talk. I will tell you that having the audio at--the thing's
volume was too loud. So, it was really distracting. But you can imagine if you're doing things
like email or something where it's more quiet, it might not be so distracting. One of the
things we want to do right now is--we're really curious to see if the audio is necessary;
maybe just the tapping of the fingers is enough to give you this sort of muscle
memory. Now, some of you who actually play the piano, might say, "Hey, how about the
left and right movements of the hands?" Well, actually, this technique is better for things
like clarinet or saxophones or flute, something where you're not moving the hands around a
lot. But what we found is that when we did, you know, real piano pieces, and this is still
just one-handed, where you're moving around, you have somebody work on a song until they
can play through it once. And then, you would turn on the glove and let them spend the next--the
rest of the day feeling it on their hand, and they actually continued to learn instead
of forget, so you don't--so it's sort of passive haptic rehearsal at that point. And one of
our reviewers on one of our papers actually said they really liked it, because for musicians
with repetitive stress injuries they could actually practice without practicing, which
is kind of cool. The other thing we're kind of interested in is--will this work
for other manual learning tasks? Things like typing or sign language or prosthetics or
complicated manual controls. We don't know yet. This whole idea of passive haptic learning,
as we call it, is new. And we're very excited about it, but we don't know how far it's going
to go. One of the things we do have data on, though, is passive haptic rehabilitation.
We worked with the Shepherd Spinal Cord Center in Atlanta; it's one of the nation's premier
centers for dealing with traumatic spinal cord injury, and in particular working with
the Murderball team, who you can see a picture of here. And we have a pilot
study where we showed that wearing this glove and actually having this vibration seems to
improve these folks' ability to grasp objects and manipulate them, their ability to feel
objects with their fingers. And most importantly, the ability to do things for themselves like
buttoning their own shirt. And we only did--ran this with two subjects so far, we're gearing
up for a full-scale study. But there's stuff in the literature that seems to indicate that
having this passive tapping on your fingers actually activates the motor region as well
as the [INDISTINCT] sensory region, and this passive practice may actually help neurons
reconfigure and re-hookup. And I can give people references for that if they're interested.
But we're trying to get this actually hooked up this summer and fall with a larger-scale
study and see if there really is an effect here. That's one of the reasons
why we're interested in whether or not the audio is necessary. If the audio is not necessary,
it could be very, very useful. Okay, any questions on that before I move on? I'll try to make
this interactive. Okay. I have a lot of stuff here, way too much stuff, so, you know, we're
not going to get through it all, ask away. So, as some of you know, I've been wearing
computers for 17 years now. I have a heads-up display on, I use a keyboard called Twiddler,
and in fact right now I'm looking at my notes for this talk in my eyepiece. And we've learned
a lot about mobile devices since then. One of the biggest things is access time--is a
killer. Access time is the amount of time it takes you to physically get the phone out
of your pocket, get it on, get to the right place in the interface--I'm actually doing
this right now. So I'm going to my calendar. And you can see that that took me, you know--and
I'm practiced at this--that took me about 15 seconds. On average, it's about 20 seconds
to get to an application on your phone. It... >> [INDISTINCT]
>> STARNER: Yes, [INDISTINCT], yes. Yes, Windows--Windows is going to be a serious problem on a phone.
What we found out is that anytime you have an interface and it takes more than two seconds
to get access to it, your use of it tends to drop off exponentially. So, if you can make
it a quick interaction, people would do it all the time. If it takes more than two seconds,
the usage of it goes down linearly or exponentially depending on the type of interface. Now, the
other thing we've discovered is that people don't really multitask, they multiplex for
most things. I just talked to you about a real, as far as I know, a real multitasking
application, but most of the time when people are driving and texting, they're either driving
or texting, they are just switching back and forth fast or not as the case may be. By the
way, we're not doing some [INDISTINCT] on driving--texting while driving. It's really,
really bad. There is one obstacle on the course, about 28 feet away. It's a
telephone pole. And the drivers still scare us to death driving this car. We're
actually using Georgia Tech's autonomous car for this.
>> [INDISTINCT] >> STARNER: Little do you know--it's because of
my subjects. No. >> [INDISTINCT]
>> STARNER: Yes. But one of the things we've discovered is that when people are actually
using an interface like, say, walking down the street, if you're walking down the street
and, you know, out late nights in Palo Alto, you're going to get some ice cream or something,
you'll spend on average about four seconds on your phone interface before looking up
to see where you're going and looking back down at your display. And that, kind of, leads
to the 4-second rule. If you can actually make your interface happen in four seconds,
it will be much more useful to people. In other words, if you can get a little bit of
useful work done before you have to look back up again, if you can checkpoint, it's actually
much more useful than if you can't. And that's how we're--and that's why we're making this
distinction on microinteractions. Microinteractions are fast to access and allow fine checkpointing.
Now, if you're on something like a bus or a subway, you might actually go spend more
time on your interface. But I'm talking about--the four seconds seems to be a nice rule for making
something that's universally applicable. Now, I might say--I might say--let me give you
an example. Checking your time on your watch, if you actually wear a wristwatch, is a relatively
fast interaction. It takes less than two seconds to do the entire interaction. It's very valuable.
It's fast to access and it gives you feedback. Now, you notice that a lot of people now are
putting their time on--on their cellphone, right? Oops, there we go. Not quite so fast
to access, but the cellphone is a useful enough device that you're willing to take that hit.
Wristwatches actually came into popular use in World War
I, when you had to time your trench warfare, right? When people had to go over the trench
line all at the same time. And you can't be [INDISTINCT] that fiddling with your pocket
watch when you're about ready to go, you know, run against the Germans. So, that was one
thing that made all the GIs, back then, wear wristwatches. But also aviators; you can't
be flying your plane and be fooling around with your pocket watch. You need something
you can look at quickly and get back to what you're doing. And back then, World War I,
you actually had to fly by your clock. Now--so pocket watches went the way of the dodo. Now,
they've come back, right? They're cellphones. But I think what we're going to see is a lot
more use of very fast access interfaces. Now, one of the things we're doing for that is
something called textile interfaces. Now, we're trying to create interfaces that can
be woven into your clothing. And I really mean woven or knit--embroidered, in this case,
into your clothing. And we're using embroidery because it's a raised thread. You can actually
feel it. So, if I was going to control my iPod with something that was on my sleeve,
I can feel the controls here, I can grope for them. We'll call it grope--good gropability.
And actually interact with it without looking. So, there's no visual distraction. Now, there's
been a lot of work that was done by some friends of mine, Maggie Orth and Rehmi
Post, back in the mid-'90s, but what we've decided to do is start taking a look at this
from a more complete interaction as far as trying to actually reproduce the GUI toolkit
from scratch using these devices. We actually have demonstrations of different
circuits that people can use in the fashion industry. Now, this is a book, and I have
a live version--this machine is hooked up to do--I'll show you this afterwards, to show
these different types of interfaces. This is what's called a knife-edge pleat. It's
got three lines in it: one on each side of the pleat and one in the base. Depending on
which way the person strokes the pleat, it moves the slider one way or the other. So
you can imagine that, if you have this embroidered on your pants leg, for example, you could
use this to control a webpage in your heads-up display or, you know, just slide up or down,
or, you can imagine, controlling volume of your MP3 player. Here is a menu widget rendered
in embroidery. So, you can see we have three menus--three categories like, you know, file,
edit, select, like you might have on a Mac, and then you have five options. And so, I'm
actually controlling the graphics here on the right-hand side based on which line I
touch. Now again, imagine it's not on the book but on a piece of clothing, like on your
armband or on your arm, so that you can actually, you know, select different menus on your iPhone
or your--on your GPhone, or wherever else you want to think about. This [INDISTINCT]
called the rocker switch. This is a multi-touch system, not just like the last one though.
So--do you remember the old types of rocker switches where you could pivot about a
point, and one side turns the volume down, one turns the volume up? Well, this
has three different pivot points. You have three different sliders you can access
and then you just--once you select one, you just pivot about it, hit the two bigger circles
and that adjusts the level on each slider. Now, for those of you who are electrical engineering
types in the crowd, I can give you a quick lesson of how the circuitry is done. It's--this
is not the normal capacitive sensor you might know--think it is, because the problem with
fabric is that as it crinkles and wrinkles, it gets out of calibration real quickly. This
is actually recalibrating itself, you know, every time it senses, which is really kind
of cool. Here's a zipper. This is--this has been done before. We're doing it, I think,
a slightly different way. It can sense its position...
>> [INDISTINCT] >> STARNER: Excuse me?
>> [INDISTINCT] >> STARNER: Nothing. It's all conductive embroidery--it's
silver thread, so it washes just fine. The [INDISTINCT] lines are conductive thread.
The only thing you've got to do is take out the circuitry where it connects in. Yes?
>> The zipper [INDISTINCT] >> STARNER: It will--it will sense falsely
in that case. Yes. Yes. [INDISTINCT] to do it, where you do--basically a [INDISTINCT]
bridge and you can do a little bit better than what we're doing here. This is a proximity
sensor. This is one of the first things we did. The Brother embroidery machine we have--
one of its default settings is to embroider Hello Kitty, so we have the Hello Kitty proximity
sensor here. And you can see, depending on how close you get, it has different
brightness levels. The brightness of the rectangle indicates how close you are
to the system. This is a really complicated one. With this,
by hitting the top pad and one of the three middle buttons, you
select one of the three sliders on the top. By hitting the bottom pad and one of the
three middle buttons, you select one of the three sliders on the bottom. And then you can increase
and decrease it by doing gestures on top of it. Now, unfortunately, this one is not tuned
very well with the video, but you get the idea. Okay. So, if we can switch back to the--having
both screens be the presentation, I'd appreciate it. In that way, I can cheat by keeping on
looking at my notes. Okay. So, this conductive embroidery really got us thinking about all
sorts of things we could do with conductive embroidery. Let me do this. So it's not quite
so distracting. We not only have to do input, we need to do output as well. Remember, what
we're trying to do is make something here where you can interact with an object. You
can get access to the interface in two seconds or less and you do the whole interaction in
four seconds or less. So, we got some output--sorry, some input. How about some output? Well, these
are conductive threads and they have a high impedance, relatively speaking, compared to
a normal wire, but at high voltage, it doesn't matter. The human body senses a current
tuned to the exact right level as vibration. So, what we started looking at is: can we make
a wristwatch watchband that shocks you in different patterns. It feels like vibration
to you, but we're trying to figure out how many different patterns we can indicate. So,
you can imagine that you have an SMS or a call coming in, you can have a different not
ringtones but shock-tones, vibrations, you know, good sensations, I don't know, coming
into this wristband. And I have a copy of that up here somewhere, I can show--show you
all. And so, we--this is [INDISTINCT] by Seungyon Lee who just got her--just defended her PhD.
It turned out that this is much higher resolution than the human wrist can feel. Believe it
or not, if you take two points and put them close together on your wrist, you really can't
tell that it's two points. Most of the time, you just think it's one. And believe it or
not, you have to get out to like a centimeter before you start distinguishing that there
are two points. On your fingertips it's like two millimeters. But on your wrist, it's a
centimeter, sometimes more. So really ridiculous. >> [INDISTINCT]
>> STARNER: Excuse me? >> [INDISTINCT]
>> STARNER: Well, if you--so, if you start using time delay, you can do a completely
different pattern. We're looking at spatial stuff here, but you can do all sorts of stuff
in time. And, as a matter of fact, that's what the next slide is going to be about.
You're anticipating me. But what's also very interesting is that, while you can actually
sense--well, you can actually tune the system to do decent shock levels on the fingertip,
on your wrists; your wrist is often dry or wet. And so, the amount of current you need
is very different from minute to minute. So, my poor grad student ended up having a little
tattoo. No. It was a very fine threshold between pain and the vibration sensation we wanted.
So, we actually had to go away from this to a vibration pattern, however, now we are going
back to it. I said that we could sense capacitance and resistance using these threads. So, here's
an idea, let's [INDISTINCT] the water content of your skin and then dial up or down the
current depending on how much you need to get the right vibration feel. And so we have
a circuit in our lab right now that does that, it's very crude but it's getting there, and
so we're going to revisit this very soon. Other people have done this sort of thing
on the forehead or on the tongue, turns out the tongue is a very good place for it because
it's always wet, it takes very little current to get a good sensation there, and your tongue
has got a high density of receptors. Your wrist is relatively insensate, but it's a
really good place if you're thinking about wristwatches. So we wanted to keep on going
down this wristwatch form factor. And we decided to make a display that was just three vibrators. Now, these vibrators
are made so that two of them hit your wrist's bones at the top--just where--just where your
arm bone hits your wrist, there's two bones there. We're generally doing this on the bottom
side of your wrist and there's one in the middle but back, going up your arm a little
bit. And we can actually do 24 different patterns here. The patterns differed depending on which
vibrator starts the pattern, one, two, or three. And you can see that in the red, green
and blue columns. It also is the--we have different intensities of the patterns, low
and high. We have what's called "pulse intensities" so that if the vibrator is going "Zzzt zzzt
zzzt," versus "Zzzzt zzzzt," and--see here, we didn't have frequency--oh, I think we have
different frequencies as well. So, 24 patterns total and we're trying to see how well can
people actually sense these 24 different patterns on the wrist. And the answer is not bad, except
for intensity. It turns out intensity is a very, very poor channel for actually
transferring information from your wristwatch. So again our idea here
is to transfer message alerts, like who's sending you an SMS, or which sort of phone call is
coming in, to this sort of wristwatch. And we'll talk more about wristwatch--if people
are interested, I can talk more about wristwatches, wristwatch interfaces afterwards. What's interesting
here is that intensity is a really horrible feature to use, we got rid of it. Direction
was pretty good, temporal pattern was pretty good, starting point was very good. So, if
you're going to actually make a wristwatch with vibrators in it, you know, here's a good
starting point. The next thing is, can we actually use these vibrators while you're
doing other things? Now remember what I said about how people don't actually multitask, they
multiplex. So what we did is we compared, using one of these wearable tactile
displays, to a normal phone. So normally if somebody is SMSing you, you reach in your
pocket and you pull out a device and you look at the--and see who's calling or what the
SMS is and put it back in your pocket. So we made a system where people had to pull
out their phone and hit one of these three buttons on this keypad to complete the trial.
With the vibrator system they had to do one of three different patterns. Now, we're trying
to do something that sort of mimics the high visual intensity, the high visual distraction
of driving and for that we have this. So, they have five seconds to determine whether
or not the number 51 is in this image. Now I know all of you being nerds, you're not
going to listen to me in the next 20 seconds as you're trying to figure out whether 51
is in there. It's not. All right, give it up. But the point is, you can't help but pay
attention to it, right? So, we're doing this on Georgia Tech students; it's a very good
distractor task. So it worked very well to kind of emulate high visual distraction while
getting these different alerts. The BuzzWear system is actually doing these three different
patterns. They are the most distinctive patterns we had. And we're looking at information transfer.
Now notice--ignore the outlier on the right-hand
side for a second--we have different difficulty primary tasks: one where there's only 10
numbers on the screen and you've got to find whether 51 is in those 10 numbers, one where there
are 30 numbers, one where there are 50 numbers. And so that's the easy, moderate, and difficult. Notice that the bits we can
transfer per second or per minute in this case is actually higher with the tactile display
than it is with the phone. Interestingly also the tactile display does not interfere with
your primary task which is great. But let's look at the left-hand side here. The phone
is having a much higher bit transfer rate when you're just paying attention to the phone
than when you're just paying attention to the wearable tactile display. Why is that?
Well, it's something called the "Yerkes-Dodson Law." People get bored when you only give
them one task at a time, and their mind wanders, and because the tact--we think,
because the tactile display is so easy, they are off doing something else in their own
minds; they don't pay attention to the study anymore. And so that's why we think we have
this discrepancy on the left-hand side. With the phone it's still a physically active enough
thing that people are forced to pay attention to it. But what we're most interested in is
this multiplexing scenario, like when you're driving and you're getting an SMS at the same
time. Now, so we've talked a little about how we can actually do input using textile
interfaces, how we can actually do output using vibration and electro-stimulation, but
can we do something more complex? One of the things that we specialize in is gesture recognition,
and you can imagine that if you eventually have an mp3 player, that's basically, you
know, it looks like a hearing aid, you know, it can fit in your ear. The only problem is
you don't have any buttons, you know. And to be walking down the street doing
this, right, is kind of socially inappropriate. So can we actually make a device where you
can control an mp3 player in your ear, when it's not big enough for buttons? Well, again,
we're looking at the wristwatch, in particular we're looking at accelerometers in the wristwatch
and we're trying to figure out, can we make gestures that are distinct in real life to
control things. Now, making gestures for controlling applications is difficult. For example, suppose
I make a gesture like this for "delete email"; well, then I'm in the middle of a conversation
and I make the same gesture and I accidentally delete all my email. That's not going to fly.
As a matter of fact, you guys are all familiar with this particular problem. That's not purely
gesture recognition, but when you have your phone in your pocket, you know, how many of
you have had somebody call you back and say, "Hey, your phone called me, what did you want? I
couldn't hear anything." Right? I occasionally get voice messages from other people where
it's just the background noise, their butt called me. No drunk dialing, no sitting and
dialing at the same time, you know. So, there's other places where you get these sorts of
problems, and people go through a lot of pain to avoid this. For example, in speech recognition
they often have a push-to-talk interface; even if they don't do that, they do something like
"computer, open file"--something to tell the computer to listen in. On the Nintendo Wii,
when you're playing bowling, which is a relatively complex gesture, right, it's doing relatively
fine sensing. It is--excuse me--it is requesting that you actually push a button and hold it
down to do the gesture and then release, that's how it's detecting when the action's happening.
On the iPhone, right, there's--you push something down and you put--do a slide across the interface
to activate the phone. And most phones including--I have a Backflip on me here, it has the same
sort of thing. It has a push button and then--oops--and then you have to hit another button for it
to actually work. And I'm going to need to pull this out anyways in just a second so
I might as well get it out. So what we're trying to do is make a system where you don't
need these push-to-activate buttons. It'd be much cooler if I actually had a system where I
just made the gesture and did the action. If I actually had to have a button on my wristwatch
to activate the wristwatch and then do the gesture, it kind of misses the point. Why
would I do that anyways? I just should have a button, right? Anything that requires too
much attention to push a button is probably the wrong thing. Now, correspondingly you
can imagine I have an accelerometer in my mp3 player here and my gesture for change track
is that, but... >> [INDISTINCT]
>> STARNER: That. But then you start looking like, you know, Night at the Roxbury as you change
tracks. It's a distinctive gesture, by the way; you can do it, I just don't necessarily
recommend it. But what we want to do now is actually make a toolkit,
so that people can research these gestures easily. And what normally happens is people
do some survey. Like suppose you're trying to make a gesture system for the iPod. People
would say, "So what gesture do you need for play?" Could someone give me a gesture for
play? What gesture do you want? If you have a wristwatch on, what gesture do you want
for play? This? Okay. What else? This? This? Okay, anything else? Okay, notice I didn't
get any similar ones yet, everybody has their own gesture. So usually people go off and
do a lot of surveys and try to figure out what's--yeah, there we go--figure out what
sort of gesture people want and then they try to make a gesture recognition system for
it and then they have that system in an actual device and they find out it doesn't work at
all. Right? Because it's false triggering all over the place. So that's where MAGIC
comes in, the Multiple Action Gesture Interface Creation tool. So we're using an accelerometer
on the wrist again, just to start it out with. For those of you who do machine learning and
pattern recognition in the crowd, you can think of this as simple Dynamic Time Warping
just because it's the easiest to explain. For those of you who aren't machine learning
or pattern recognition people: basically, you have one gesture that you want to recognize, and you have templates of other gestures that indicate, you know, the play function. You compare the red to the green by drawing lines to the closest points on each, and the total length of those (possibly slanted) lines is the error. Now, for those of you who are pattern rec people: we're actually doing this with iSAX, which allows us to search very large databases in a split second, and so we can then make a user interface that just flies. Like I said, the design process
in the past is basically, people try to create a gesture system, then they try to test them
in the real world, they find most of those gestures conflict with real world gesture
that--and they go back to the drawing board. What MAGIC allows you to do is do them both
at the same time. Test your gestures against each other and against the real world. Now
how does that work? What we do is we collect something called the Everyday Gesture Library.
Then we put the sensor you want to use for your iPod on your wrist and give it to somebody
for--to wear for a month. And we try to get, you know, representative actions and representative
people, so we might get an academic, a librarian, a construction worker, you know, a pet-sitter,
you know, just trying to expand the space of people who might use this device and we
gather lots of data from their everyday life. We also, if they'll put up with it, get video
from this cap, this fashionable cap, with a fisheye lens on it. And I notice that fisheye
lens is extreme enough that you really have to get within kissing distance to somebody
to actually recognize who they are in the video image, so it's actually privacy preserving.
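To make the template-comparison idea he describes concrete, here is a minimal sketch (my own illustration, not the actual MAGIC code) of dynamic time warping between two accelerometer traces, where each sample is an (x, y, z) tuple:

```python
import math

def dtw_distance(template, candidate):
    """Classic O(n*m) dynamic time warping between two sequences of
    (x, y, z) accelerometer samples. A lower score means the candidate
    motion looks more like the stored gesture template."""
    n, m = len(template), len(candidate)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], candidate[j - 1])
            # The warping path may repeat or skip samples, which is what
            # tolerates speed differences between two performances of
            # the same gesture (the "slanted lines" in the picture).
            cost[i][j] = d + min(cost[i - 1][j],      # stretch template
                                 cost[i][j - 1],      # stretch candidate
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

A gesture performed slowly still scores near zero against a faster template of the same motion, which a rigid sample-by-sample comparison would not.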
And so when--we sort of have a whole huge library of people's everyday gestures and
video of what they were doing when that motion occurred. So then if you have a candidate
gesture you want to try, say, you know, this or this or this or whatever everybody was
telling me. I guess you'd try that against everybody's months of data and see which ones
work and which ones don't. And this is the interface for it. So I am--[INDISTINCT] this
is a cursor here. Yes, here we go. So let's first look at this. This, for the pattern
recognition people, is--this is--each of the classes, we have four different gestures we're
looking at here. For each of the four gestures, we're looking at the intraclass variance versus the closeness of all other classes and their variance as related to that gesture. So this is both intra- and interclass variance. Over here--hey, don't do that. Over
here [INDISTINCT]. That is our month of data and you probably can't see it back there but
there are little pink or yellow lines for each gesture as it happened in the month
long data. And so then they can click on one of those, as we see here, and it shows you that particular example of one that happened in a person's everyday life and what they were doing at that time, as seen in the video. It also gives you some idea about these different examples of gestures. We're doing a k-nearest-neighbors approach here, and there are some other details that, if you are a pattern recognition person, you can tune. Now we had
a lot of fun with this. We actually had people try to make 8 control gestures for the new
Upod Touchless by Parrot Computer. And people who had the EGL would generally have about
two false positives per gesture per hour. Those people without the EGL had 50 false positives. Now this--it didn't matter if they claimed to know pattern recognition or not. They all sucked. They were all very bad at this task. So the EGL really--having this database really
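That false-positive test can be sketched roughly like this (a simplification I'm adding for illustration; the real tool uses DTW with iSAX indexing rather than this naive per-sample distance): slide the candidate gesture template across the recorded everyday-motion log and count near-matches.

```python
import math

def window_distance(template, window):
    """Naive per-sample Euclidean distance between equal-length traces."""
    return sum(math.dist(t, w) for t, w in zip(template, window))

def count_false_positives(egl, template, threshold, step=10):
    """Slide a candidate gesture template across an Everyday Gesture
    Library (a long list of (x, y, z) accelerometer samples) and count
    how many windows of ordinary motion would falsely trigger it."""
    hits, w = 0, len(template)
    for start in range(0, len(egl) - w + 1, step):
        if window_distance(template, egl[start:start + w]) < threshold:
            hits += 1  # this stretch of everyday motion would fire
    return hits
```

A candidate gesture that scores many hits against weeks of everyday data is exactly the kind that would false-trigger in deployment.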
had a big impact on the system. Now the other thing that was kind of cool about this is
that our subjects really did discover ways to improve their performance by doing particular
techniques to get better gesture recognition. And I will switch to a file to show you these.
So this is somebody doing an iconic gesture. In other words, they'll repeat each of these two times so that you can see it. The first one was iconic, like stop. The second one, which was really interesting, is impacts. In other words, when you hit your hand against the other hand, that looks very distinct in accelerometer space compared to your everyday actions. This
guy is prefixing every gesture he has with another gesture. So his is the way--he's basically
saying, you know, "Listen to me computer." And then he does this. Let's, see what's this
one? This one was just repeating the same gesture twice. So you get some idea of the
types of gestures you need to get uniqueness. Now the problem is a lot of these things are
not socially appropriate. Right? The--you know, the guy, who's doing "Computer, listen
to me. Okay, I really meant it now," this and this. That--if you saw me doing that walking
down the street, you'd probably think I'm an idiot. Either that or some mage who's doing
incantations. But if you saw me doing something like, you know, this, where I'm just flicking
my fingers. I can do that on my side. I can do it straight up. That is something that
you could just, you know, it's a subtle gesture, you might not even notice me doing it. And
it's very distinct in the database, so we're actually discovering the gestures you can
make that are very subtle for controlling your mobile electronics. Now, the last video
here is just for fun. This is somebody's everyday gesture library. I think this was going on
a hike somewhere. After a while, you forget you have a camera on, so, you know, I'm not
going to try to show the embarrassing EGLs, but you get the idea. And so, again, this
is a video you'd get if you found a conflict. Now, for those of you with Android phones,
how many of you got a phone on you? Can you accept--Android phones, can you accept un--if
you had the Backflip you can't have unsecured apps, but I'm going to show you an app right
now. What we have is an application you can run on your Android phone. It uses accelerometers
in the phone. You can actually--we have a database of somebody walking around with a
phone in their daily life. And now, you can actually make different gestures. You can
add them to the database, it's there. You can see how unique they are compared to the
everyday gesture library of an Android phone. And so you can start thinking about actually
having different gestures for your Android phone to tie to different activities. As a
matter of fact, you can even download the source code for a recognizer that'll recognize
the gestures you trained up. So if people who have the application, you can come up
afterwards and I'll show that to you. Okay. One of the kind of interesting things about
all this is that the people who were in our user study kind of feared the EGL; they thought it was very hard. That was understandable. And they didn't really care about the video that was there; they just cared that they had a conflict. But they actually found the system very useful in doing the task. Okay. Now I'm going to switch to something
a little bit different here. This is our prime example of doing gesture recognition technology.
This is CopyCat. Let me give you some background on CopyCat. Ninety-five percent of deaf children
are born to hearing parents. Many of those parents, when their child is born, of course,
do not know sign language. And since American Sign Language is as difficult to learn as Japanese, many of them will never learn it sufficiently to really communicate with their children. That might seem odd, but imagine you are working two jobs, you've got three children, and one of them is deaf. And--there's a lot of people in the literature who
say that, you know, if you teach somebody two languages, they won't learn--if you teach
them sign language, they won't learn English. It turns out exactly the opposite way. You
should teach them sign language first; they have a much better chance of learning English.
But what we discovered in the literature, in the research, is that these children actually,
unless they learned some language in the age between zero and three, they would not form
their short-term memory normally. Let me make that clear. So each of us can actually remember
about seven things in our head. If I give you a telephone number like 244-5156, you
will be able to repeat that phone number back to me. The children I worked with often have
a short-term memory of two items. And that happens because they do not learn a language.
When you learn a language, that's when your brain is forced to form its short-term memory.
And so children need access to some language, any language, in order to actually--to form
the short-term memory. So the question is how can we make a--how can we use this gesture
recognition technology we have to actually encourage the formation of short-term memory
and acquisition of language? So what we've been creating is this system here. This is
called CopyCat. The hero of the game is Iris the cat here. You can see her in the bottom left-hand side, right where the E is. Iris is a white cat with blue
eyes, because white cats with blue eyes are often deaf. She's the hero. She is trying
to find all the gems that have been stolen and they've been stolen by snakes and spiders
and alligators and all sorts of other monsters. And so the children have to--when they come
upon a scene like this, they have to say that the snake is under the chair. That's the three-word
phrase. Oftentimes, there'll be multiple chairs and multiple snakes, and you have to say which one has the gem. So in that case it'd be "the orange snake is under the blue chair."
If they get it correct, Iris will magically poof the snake and get the gem and go on to the
next level. Now this is a sign language verification task, not just a sign language recognition
task, and we're using gloves for computer vision. We're also using our accelerometers
again. That gives us--well, vision might not give us up and down, the accelerometers do,
so they're--think of them as glorified tilt sensors. This is the scenario we have with
the children in this little kiosk as they're signing to the game. Now, you might think
this seems like a relatively easy, you know, computer vision tracking system. But remember
we have a lot of different video going on here, a lot of different lighting conditions.
Here, the features we're using--we're using head placement, hand placement, and their angles relative to each other. We're doing PCA on our database, so we have the top 20 hand shapes for the left and right hand. We're doing FFTs on the accelerometers. We actually
render little eyeglasses on the children's video. So as they are interacting with the
system, the eyeglasses stick on them, so they can stay within the view of the camera. Now,
why is this hard? Well, it turns out we're only using 19 signs in our system.
But, they can be done in many different ways. For example, this is bed and this is bed,
and this is bed and this is bed. This is cat, so is this, so is this. Most signers, if you
watch our interpreter here, have a hand dominance. And so most of her signs--I'm actually looking
at her--her hand to figure out which dominance she is. But she's doing all two-handed signs.
There we go. She's right-hand dominant. So most signers have a dominance. The children
we work with actually don't have a dominance. They'll switch dominance in the middle of
the phrase, which causes us all sorts of problems. They also have things like, you know, flowers
which can go right to left or left to right, with right or left hand. So that's a problem.
So with the 19 signs, we ended up with, believe it or not, 128 different
tokens we're looking at. There's that much variation going on. The other problem is we
have lots of disfluencies. In speech recognition, you have to actually recognize that
people are coughing or "Uh," or "Ah," you know, or [MAKES SOUND], right? You have to
recognize all of those different utterances in order to make your speech recognition better.
We had the same thing. We have cough, "Excuse me," and "No, no, no, no," you know, "I didn't
mean that," or, "What am I supposed to sign next?" And we want to be able to recognize, you know, "The orange snake, umm, oh, under the green chair," right? So we've got to be able to handle these disfluencies, to actually recognize them as well. My favorite
disfluency that we're recognizing is the pick your nose gesture. That's why our gloves are
washable. So we get about 84% accuracy in trying to determine if a phrase was signed
correctly or not. Interestingly enough, our sign linguist, when we were doing a Wizard of Oz study to collect data--you know, he pretended to be the computer recognizer--only had 90% accuracy. So we're not that far off from what the human is doing. In truth, though, this is a very hard problem. We've got many years of work left, but for this
constrained situation, we're okay. And we actually deployed the system fully automatic
for two weeks, where we had six children use the system for about six hours or so, and six children as a control group. And we actually saw a significant increase in their short-term
memory, their ability to sign, to express themselves in sign, and their ability to understand
sign. So we're very, very excited about that. This--the sign verification program, this
CopyCat program is the first example, I know of, of a sign recognition system actually
being used for real application in the real world. Now, I said--now, this system we have
works for children ages six to eleven. That's generally after the critical learning period
of a language which is zero to three. We also want to try to actually get a system for children
who are zero to three as well. So what I'm going to do is show you something called SMARTSign
Alert. Okay. So what we have is a system where the parent, throughout the day, gets sign
language alerts. So it's just like an SMS but each SMS is a little video that shows
them a new sign like this. And gives them a little quiz, which one is it? And if they
don't--if they don't know, it will tell them which one it was. That was cat. So, throughout
the day--we try to optimally space the lessons so that the parents learn the most in the least amount of time. And it turned out, this was really fun. We did a Spanish version for me and I had a lot of fun learning it. We're actually using the first 80 words
you use when talking with an infant. So, what's exciting about this, we compared learning
sign language on the cellphone to learning it using the same desktop application. Learning was 40% more efficient on the cellphone than on the desktop. I was really quite surprised
about that. Now, the other thing we have is a system where I can actually--I don't know
if anybody can see this but--what I can do is actually talk into the system and ask for
a sign. For example, thank you. "Thank you." And as it pulls up the word--the things I
said, I click on it, I'll get a little video showing me the sign. [INDISTINCT]. And here
is the sign for [INDISTINCT]. So what we want to do is hit the stage where parents can say
things like, "Go to bed," and up comes a video, Go to Bed. And so the children actually learn
sign in context. Now that--we're trying to get that out as a Google app on the app store--Android
app store before the beginning of the summer, but we didn't quite--we didn't quite get there.
We're still working on it. Unfortunately, the researcher working on it is now at IBM, so we probably won't get it on there until fall. Okay. Now, I know I'm
out of time. Last time, I told you guys a little bit about trying to recognize sign
language directly off a motor cortex, so let me give you an update on that. For those of
you who hadn't--didn't see that talk. If you have somebody who's locked in, somebody who's
paralyzed, has ALS or Lou Gehrig's disease depending on how you know the disease, they
cannot move a muscle. They have no way to communicate. Can we actually have people communicate
through brainwaves alone? The answer is yes. We're doing forced-choice pairs, things like
hot versus cold or chair versus bed and we're actually getting relatively good accuracies
on this. So for these forced-choice pairs we're getting, you know, 90% accuracy for real signing, and it even works if you're just imagining signing. Right? If you sit there in this fMRI tube
and think about doing the sign. You can still get relatively decent results. And currently,
we're starting to work on entire phrases, instead of, "Are you hot or cold?" You know,
"Hot or cold?" "Are you in pain or are you okay? Do you want to go to your chair or to
your bed?" Now, we're trying to get full phrases like, "The bed's hot, I'm in pain." So that
currently involves an fMRI, a big machine. What I've got with me today is an fNIR sensor--functional near-infrared. So, this is basically a set of fancy IR transmitters and receivers that you can put on a little portion of your head. It can tell you if that little portion of your head is active. And you can actually use this on a mobile device.
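As a one-bit interface, that could amount to something as simple as this sketch (the function name, window length, and threshold are all mine, purely illustrative):

```python
def activation_bit(samples, threshold=0.6, window=8):
    """Treat a single fNIR channel as one bit: True when the mean of
    the most recent `window` readings exceeds a (made-up) activation
    threshold, i.e. that patch of motor cortex looks active."""
    recent = samples[-window:]
    return sum(recent) / len(recent) > threshold
```

Averaging over a short window smooths sensor noise so a single spurious reading doesn't flip the bit.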
So, what I'm currently doing is I'm wiring this up to my wearable computer, to start
seeing if I can think to my computer in sign language, to get it to do something for me.
Now granted this is only one bit. This is actually what I call my Kill Bill interface
because I'm putting this right about here, which is the "wiggle my big toe" area. So I'm
trying to make it so that, you know, if I'm trying to have, you know, okay or cancel,
I wiggle my big toe and it triggers this. We'll see how well it works. Okay. I am running
over. There's lots--I have lots of other demos up here if you want, including a system for deaf folks to be able to TTY directly into the 911 center system, for playing
Dance Dance Revolution on your cellphone using sensors on your feet. That's a lot of fun.
There's a system for people to learn Braille who have low vision. That's already on the
iPhone. That's by Brian Key, so I can show that afterwards. I also have lots of other
stuff which apparently--I can't--I can only go forward. I can't go back in this presentation.
There's lots of other crazy stuff we're working on. This has been our survey. So if you want
to talk to me about talking with dolphins or trying to make better mini-QWERTY keyboards,
and all those types of stuff, please come up and talk to me afterwards. And the acknowledgment
slide just got killed off but you saw it there earlier. I have a lot of funders; NIDRR, DARPA,
NSF, ETRI. Thank you to all of them and thank you to, of course, this is mostly [INDISTINCT]
work not mine, so thanks to all of those who were on the screen just a second ago. And I'll
take any questions. Thank you very much. >> [INDISTINCT] citing the recording, so do
you think Morse code is going to make a comeback? >> STARNER: No, it's too hard. I don't even
know Morse code well enough. I can do SOS. So, you know, that ringtone that goes [MAKES
SOUND] drives me nuts because it's mostly SOS. I don't know why they chose that. It's
just really frustrating. >> Hi, you talked a little bit--[INDISTINCT]
there we go. You talked a little bit about what is socially appropriate and not socially
appropriate? >> STARNER: Yes.
>> And, of course, you personally have been testing out the stuff for a long time.
>> STARNER: Yes. >> Are there changes in that or, you know...
>> STARNER: Oh, yes.
>> ...[INDISTINCT] in the high-level things
you can [INDISTINCT]... >> STARNER: Camera phones. Holy cow. When
I first started this stuff, the idea of actually having an on-body camera, people really, really
hated that and now all of you have on-body cameras. People used to, you know, the idea
of actually recording audio, you know, I don't record audio; I could. But more importantly
all of you could record audio with your cellphones easily, 24 hours a day. Even worse than that,
someone could hack your phone relatively easily and turn on your microphone without you knowing
it, even make a [INDISTINCT] microphone out of all your cellphones. So, every one of you
sitting here is bugged. You know, the whole cellphone revolution. You know, back when--I
remember back when I started, cellphones were this big. Right? They were huge. They weighed
pounds. So now, that everybody has, you know, this supercomputer in their pocket, what I
do is--it looks tame. I'm just trying to control you with my brain.
>> So you have a little display in your left eye. What is it for? What are you doing with
it? >> STARNER: Normally, it is for--as he loses
it--this. So these little notes that I would normally have on my screen as I talk, but
today because I'm having equipment failures, I actually had to put my wearable and actually
used it for the videos. So for example, I forgot to mention that CopyCat is at the CVPR
workshop this Friday on Human Communicative Behavior Analysis. And if you ask real nice,
Zahoor Zafrulla, who is there, will give you a live demo of it. So normally, I have notes
on my talk as I'm giving it. Why do I have it on right now is a good question because
I forgot to take it off. It's the honest truth. I--it's not hooked up right now, so why do
I have it on? I'm just used to it. >> So I happen to notice that you're wearing
a Twiddler. >> STARNER: Yes.
>> And I know that in the past you found that that was the best wearable input device.
>> STARNER: Yes. >> Is that still true or have new things come
up and... >> STARNER: It depends on what you're doing.
It's still the best that I know of because you can get quite fast at--like a burst of
130; I sustain 70 words per minute. On a mini-QWERTY keyboard, the BlackBerry style
keyboard, you can do, believe it or not, 60 sustained; so you can do equivalent. The only
problem is you need visual attention. So you can't actually sit here like--I'm a professor; I teach all the time. There's no way my students can sit in my class typing their notes into their BlackBerrys. Why? Because they'd have to do this all the time and they couldn't look at the blackboard.
With the Twiddler, it's all touch typing and so you can look up just fine. If you try to
do--if you try to do touch typing on a BlackBerry, your error rate goes up to 15% per character;
just horrible. Your typing rate goes down to 45 words per minute instead of 60. So the
BlackBerrys, if you can give your full attention to them are just as fast as a Twiddler. But
as soon as you try to actually do anything where you're on the go, when you're actually
moving, the BlackBerry rates go to hell. But if you're interested, we actually have a thing called Automatic Whiteout which looks at the fact you have fat thumbs and you hit multiple keys at the same time. For those of you who are engineers, you know about key debouncing; think about key bouncing across multiple keys, and once you have that idea you can actually reduce the amount of errors people make on a mini-QWERTY keyboard by about 25% of all
errors. You can reduce about 50% of just [INDISTINCT] off by one error, probably a whole lot more,
but--so, we can improve mini-QWERTY keyboards, so they're better. But I don't think they'll ever match the Twiddler. There's nothing else out there that has come close. The only thing I know of that might get there is something called ShapeWriter
on new Samsung phones where you actually do gesture things for entire words. Though [INDISTINCT]
at IBM never did a real true longitudinal study on it, so we really don't know where it maxes out.
>> Okay. [INDISTINCT]
>> STARNER: So the Twiddler--the TeK Gear has bought out Twiddler. So you go to tekgear.com,
T-E-K-G-E-A-R. I sent you, T.V., a special invite to talk to Scott Gilliland, who's making the
new Twiddler. I happen to know they got their first run of 10 samples from the factory yesterday.
>> Okay. >> STARNER: And as a matter of fact I've been
using one for the past week except for the fact that the greater and less than sign are
where the Z and T are, and G sometimes becomes enter.
>> Does it have Bluetooth? >> STARNER: This one is just USB, but I happen
to know Scott has made it so it may become Bluetooth pretty easily. So if you want to
get on the bandwagon and suggest improvements... >> [INDISTINCT]
>> STARNER: It's--you already have one in your inbox as a matter of fact.
>> [INDISTINCT] the inbox? Okay. Okay. >> STARNER: You're just not paying attention
to my emails. I explicitly invited you to work on this. Yes. So...
>> You know, the [INDISTINCT] is that I do have like [INDISTINCT]
>> STARNER: Yes. So Scott Gilliland is the one you want to talk to. He's the one working
on it right now. And, you know, there's lots--real [INDISTINCT] games and stuff to help people
get up to speed on the Twiddler. >> Thank you, Thad.
>> STARNER: Yes. Thank you.