Jenny Donnelly: Next up we've got Tilo who's going to give a deep dive on the state of
the art gestures layer that he's been working on.
[applause]
Tilo Mitra: Thanks. Okay, cool. We have half an hour to lunch and there are few better
ways to spend it than to talk about gesture events.
I'm going to be talking about the state of gestures. My name is Tilo, I work on YUI and
Pure at Yahoo. Does this work? I'll just do this.
So yeah, when I was thinking about the name of this talk I didn't know what to call it.
"State of gestures" sounded kind of interesting, and I thought it would get more people
in the room, so I went with that. But actually what the talk is about is the challenges
of developing multi-device user experiences and what we're doing about them here on the
YUI team.
Multi-device user experiences. Okay, so multiple devices. I thought a good place to start would
be, since we're talking about different devices, to talk about different types of inputs. One
of the things we're seeing now is that we have different devices, all of which have
different interaction patterns. Some use touch, some use mouse, some use both. I was going
to start by diving into this.
Let's assume a use case. Let's assume we have this button, the subscribe button, and we
want to fast-click on it, because we want this to feel like a responsive button. By
fast-click I mean not having to wait for that delay you get on touch devices. We want
this button to be fast-clickable.
In the old days, or I say pre-2007 really, most of the things we dealt with were computers
which primarily had mouse input. There was keyboard as well but primarily people interacted
with your site using the mouse. For that, if we wanted to achieve this sort of fast-click
effect, this is what we would write. We would say node, this button.onclick, do something.
That was very simple and it did the job very well.
Then after 2007 the iPhone came out and it wasn't just the iPhone but all these touch
devices that were entering the market, and still are. That gave us this notion of building
sites that have touch support now. The primary input from users was touch. It also had some
other input types like accelerometer and gyroscope but we're not going to get into that. Primarily
people were interacting with your site through either mouse or touch.
We as web developers said okay, alright, we'll take that into account, so let's use something
like this. If we see that this device supports touch events, so window ontouchstart exists,
we're going to use the touchend event because that fires really quickly. If the device doesn't
support touch then we will fire mouse events. That seems to work. That's using feature detection
too, so that's pretty good.
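The feature-detection branch he's describing can be sketched as plain JavaScript. `pickFastClickEvent` is a hypothetical helper name, not a YUI API; the feature test mirrors the `window.ontouchstart` check from the talk.

```javascript
// Hypothetical helper: pick the "fast click" event name via feature detection.
// On touch devices, touchend fires without the ~300ms click delay.
function pickFastClickEvent(win) {
  return ('ontouchstart' in win) ? 'touchend' : 'click';
}

// Usage (in a browser):
// node.addEventListener(pickFastClickEvent(window), doSomething);
```

As the talk goes on to explain, this breaks on devices that support both inputs, because the touch branch always wins.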
But then the problem is that over the last maybe a year, more than a year probably, but
let's say the last two or three years, we've had these devices that have crept onto the
market now that have both mouse and touch functionality. I'm not saying that these devices
weren't there before but they weren't there in this type of number. The background picture
here is of a Microsoft Surface because that's a common one. I don't know how common it is
but everyone knows about it. But there's also the Chromebook Pixel, and I couldn't find
a good picture with the Pixel. Anyway, these types of devices, you can interact with them
either through the mouse or you can tap directly on the screen and they send touch events.
Going back to the previous snippet I showed you with the ontouchstart kind of branching,
this no longer works suddenly because what happens is that these new devices, they support
touch events, so they will always go to that if statement and the else listener will never
get subscribed. What happens is if the user interacts with their screen everything works
fine, but as soon as they stop interacting with the screen and they click on something
on the keyboard on that device, we have no interaction, nothing works, everything breaks
down.
So we said okay. This is the actual thing that you're supposed to do for that case,
which is if touchstart exists you actually have to listen to both touchend and mouseup
because you don't know which is going to come in. Then if a touchend comes in you want to
actually prevent default, because if you prevent default the mouse events don't fire after
that. What you want to do is say on touchend mouseup I want to prevent default and then
I want to do something. This is obviously if it's not a link, you don't want to send
the user somewhere, you want them to stay on that same page.
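A minimal sketch of that dual-listener pattern, assuming (as the talk says) that calling preventDefault() on touchend stops the browser from firing the follow-up mouse events. `addFastClick` is a hypothetical name; `target` only needs an addEventListener method.

```javascript
// Sketch of the dual-listener pattern: subscribe to both touchend and
// mouseup, and preventDefault on touchend so the browser doesn't
// synthesize the compatibility mouse events afterwards.
function addFastClick(target, callback) {
  function handler(e) {
    if (e.type === 'touchend') {
      e.preventDefault(); // suppress the follow-up mouse events
    }
    callback(e);
  }
  target.addEventListener('touchend', handler);
  target.addEventListener('mouseup', handler);
}
```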
That's okay, but the problem is that's not all you write. Because now what we have is
in devices like the Surface they have MS pointer events which fire. You want to say well okay
that's fine but if they support MS pointer then we're just going to listen to MS pointer
events. But then that's actually not all of it either because then if they don't support
touch events or MS pointer events let's just listen to mouse events. You get to see that
this is getting complicated really quickly from that initial three line node.onclick
that we had.
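The full branching he's describing can be sketched as a helper that decides which events to subscribe to. The helper name and return shape are assumptions; `navigator.msPointerEnabled` is the standard IE10 feature test for MS pointer events.

```javascript
// Sketch of the full branching logic: MS pointer events if available
// (they cover both inputs), otherwise touch plus mouse on touch-capable
// devices, otherwise mouse alone. `win` stands in for window.
function fastClickEvents(win) {
  if (win.navigator && win.navigator.msPointerEnabled) {
    return ['MSPointerUp'];
  }
  if ('ontouchstart' in win) {
    return ['touchend', 'mouseup'];
  }
  return ['mouseup'];
}
```

Compare this to the original three-line onclick: the subscription logic alone has grown into a decision tree.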
The funny thing here is that those lines of code I showed you are just for "I want to click
on an element and I want that click to be responsive." If that's how complicated it's
getting for a simple click, then what are we going to do when we're talking about more
complicated gestures?
This is kind of what we've been thinking about in YUI. The other thing I showed you earlier
which was the prevent default to prevent mouse events from coming in after is very heavy
handed. It's fine if you use that in your application but in YUI our library shouldn't
prevent default for you by default, because we don't know if you want to prevent default
or not. The whole notion of prevent default is kind of heavy handed, we don't really want
to go down that route unless we need to.
These are some of the problems with this whole idea of gestures and having something that
works everywhere. Now that I've kind of cleared that up.
Oh, side note. If you go to this link, I'm going to go here right now, this is how you
can actually check to see what events your browser fires. So yeah, if you go on this
link here, this is kind of how I did my testing initially. As you interact with this square
up here you get the events that are firing and you see by default Chrome is firing mousedown,
mouseup, click.
What I do usually is to simulate that Microsoft Surface device that supports both mouse and
touch, I actually open up the Chrome Inspector, I go into this little settings icon down here
and I click enable, and I check off emulate touch events down here. I don't know if you
guys can see that, it's a little small. Check off that emulate touch events. Then you have
to keep the debugger open, or the inspector open, but your mouse turns into this little
touch fingery thingy and now if you interact with it you get touch events and mouse events
that fire. Just side note, this is how I debug to make sure my sites work well in both touch
and mouse interactions.
If you did that, if you went to that JS Bin link and did that for a bunch of different
browsers, this is basically what you would get. On desktops as you saw, on something
like Chrome or Firefox, you usually get mousedown, mousemove, mouseup, click. I'm assuming that
you're clicking on an element here.
On iOS you just get touch events. You just get touchstart, touchmove, touchend, and then
click if the touchstart and touchend was on the same element. The click fires 300 milliseconds
later because iOS checks to see if you're doing a double tap. That's the whole problem,
that's why we want responsive buttons because if we just subscribe to clicks it would fire
after 300 milliseconds.
On Android, this is on a 4.0, it actually fires both touch and mouse events. The touch
events come in first, then followed by the mouse events. Touchstart, mousedown, touchmove,
mousemove, touchend, mouseup, and then the click.
Then on IE10+: IE10+ supports MS pointer events. They actually fire simultaneously, so you get
MSPointerDown, mousedown, MSPointerMove, mousemove, MSPointerUp, mouseup, and then
a click. As you can see, the idea I wanted to show here is that there are different environments
and they work differently.
With that background in place, I was thinking about how to best approach this problem.
Let's take a step back and talk about responsive web design, which is kind of this big
buzzword that we're using for everything these days. Primarily when we talk about responsive
web design, what we're talking about is HTML and CSS: HTML and then media queries or something
like that, to make your site look good at all screen sizes. But is there a JavaScript component
behind responsive web design as well?
That's a question that we had on the team. If you actually break down the words responsive
web design and you look at the word responsive, the word responsive means quick to react or
respond. Which, if you think about it, is pretty much what we were trying to do with
that button earlier. We were trying to make a button that was responsive across different
devices, so it seems to fit.
The idea obviously is that right now, with all those code snippets that I showed you,
in all of them the events I was listening to were very closely coupled with the input
type. It was a mouse event or a touch event or a MS pointer event. Well not MS pointer
so much, but definitely mouse and touch. So the idea must be to decouple these events
from their inputs. Something like this most likely. No matter what input causes that event
to fire we want to fire some event. It could happen because of a finger, it could happen
because of a mouse, it could happen because of a pen or something else. Then your application
should listen to that event.
If this is firing some light bulbs then it should, because this is basically what the
pointer specification is about. It's a W3C specification that says basically this. I
shall quote: 'A pointer can be any point of contact on the screen made by a mouse cursor,
pen, touch (including multi-touch), or other pointing input device. This model
makes it easy to write sites and applications that work well no matter what hardware the
user has.' That's kind of what the pointer spec was basically saying: regardless
of what input you're using, we're going to fire pointer events. That sounds great, that's
exactly what we needed, right?
Do we write a polyfill for this in the library? What do we do? A polyfill obviously, for those
who don't know, is simulating that functionality but it doesn't exist natively in the browser.
The polyfill is one idea but actually my question was why don't we improve the gesturemove events
that we have in the library? You guys might know about the gesturemove events. Personally
I kind of knew they existed but I never used them because they weren't very useful.
But that was a question I had. The reason I had this question is because if you look
at the gesturemove docs, this is what they say: 'gesturemovestart, gesturemove, and
gesturemoveend are events that serve as abstractions over mouse and touch events, forking internally
based on the client device.' Which sounds exactly like what the pointer spec says, basically.
There's clearly a disconnect, because we have this thing in the library and we've had it
for a while, yet we're probably not making use of it because it hasn't received love
in a while and we haven't worked on it as all these specs have developed.
What we basically want is something like this, and this is what we've been working on. Regardless
of what kind of event comes in, whether it's a touch event or a mouse event or an MS pointer
event, gesturemove events fire now in the library. This was the case before, too, but
we've gone in and made it work better for cases where touch and mouse events are both
firing. You don't want gesturemovestart to fire twice, once for touchstart, once for
mousedown. We've done some work there.
Now you can listen to these three events, and they're abstractions over the lower level
browser DOM events. But we didn't really stop there. What we really want you guys to listen
to is higher level, richer events, so we have this richer set of events that are built on
top of this gesture layer. Events like tap, flick, and moving (so if you hold your finger
down, or you mouse down and actually move the mouse around, it'll fire as the finger
or the mouse moves across the page), and transforms, which is something I'm working
on. Then your application listens mostly to the richer set of events, because that tells
you what you need. If you want to, you can also listen to the gesturemove layer, but you
don't really have to.
That architecture is what we're going towards, and we've already done a lot of work towards
it. There's pull request 1309, which has improvements to event-move to make it work for mouse
and touch and for devices that support both, kind of like what I was saying.
For the common use case, the one I mentioned earlier, fast-clicking on a button, we already
have that, because we improved event-tap a few months ago and we added support for prevent
default. So you can now tap, and a tap is basically a fast-click regardless of where
you are. You can subscribe to tap on the button and call e.preventDefault(), because I don't
want the page to change, and then do whatever you want, and that'll work. I'll give you a demo soon.
Not only that but we actually have a bunch of new APIs that are coming that I wanted
to show to you guys. Tap, like I said, we've been working on that. It's already out there.
We have support for dual listeners, which means that it works really well on devices that
support both mouse and touch, so you can go on a Surface, you can tap on it with your
finger, then you can go back and tap on it with your mouse. Everything will work really
well. It has prevent default support. That's available right now. I think it was out in
3.10 or something.
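To make the idea concrete, here's a simplified sketch of what a tap amounts to: start and end on the same element, with little enough movement that it isn't a drag. This is not YUI's actual implementation; the function name and the 10-pixel threshold are assumptions for illustration.

```javascript
// Simplified tap check: same target, and total movement below a small
// threshold, so a drag doesn't count as a tap.
function isTap(start, end, threshold = 10) {
  return start.target === end.target &&
         Math.abs(end.pageX - start.pageX) <= threshold &&
         Math.abs(end.pageY - start.pageY) <= threshold;
}
```

Because this is defined in terms of start/end coordinates rather than touch or mouse events, the same check works for either input.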
Flick, we're adding new sugar events on top of flick so you can now flickup, flickleft,
flickdown and flickright. It's kind of nice because you don't have to figure out which
way the flick is going. Flick, obviously when the event payload comes in you get a direction
of the flick and the velocity, which is in pixels per second. Basically when someone's
flicking the touchmove and mousemove events are firing and you want to see what was the
displacement over how long the flick lasted, so you get this nice velocity that comes in.
Those flickup, flickdown, flickleft and flickright events are in pull request 1323, which we're
kind of ironing out. It's almost there.
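The velocity calculation he describes (displacement over how long the flick lasted, giving pixels per second) can be sketched as a pure function. The helper name and payload shape are assumptions, not the library's API.

```javascript
// Flick velocity: displacement divided by duration, in px/sec, plus a
// coarse horizontal direction derived from the sign of the displacement.
function flickInfo(startX, endX, startTime, endTime) {
  const distance = endX - startX;               // px, signed
  const elapsed = (endTime - startTime) / 1000; // seconds
  return {
    velocity: distance / elapsed,
    direction: distance < 0 ? 'flickleft' : 'flickright'
  };
}
```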
Then gesturemovestart, as I mentioned, pull request 1309 has support for the new improvements
that we've been making to it. There are some other things that have not made it into that
pull request yet, but basically support for delta: delta X and delta Y, I call it. What
this does is, let's say you tap on the screen or something... I'm just going to assume you're
using a finger. So let's assume you tap on the screen and a gesturemovestart event fires.
You start moving your finger and then you let go of the screen.
What happens is we actually calculate the delta between the end and the start to tell
you this is how much your finger actually moved. It's not calculating the moves but
just saying okay this is the total delta that that finger moved. You can do really interesting
things with that and I'll show you that in a demo. So we have this concept of delta X
and delta Y.
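That end-minus-start calculation is simple enough to show directly. `totalDelta` is a hypothetical name, not the library's API; it just captures the point that the delta is not accumulated across the intermediate moves.

```javascript
// Total delta between gesturemovestart and gesturemoveend: end minus
// start, ignoring the path the finger took in between.
function totalDelta(start, end) {
  return {
    deltaX: end.pageX - start.pageX,
    deltaY: end.pageY - start.pageY
  };
}
```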
Then these move events, it's kind of confusing. Gesturemove* and move, we're still ironing
out these names, they might change. But when I talk about the move events what I'm talking
about are these higher level type of events. When you put your finger on a page and you
start moving it on the page you get these move events that fire.
These move events you can subscribe to moveup, movedown, moveleft and moveright, so you can
say only tell me when someone holds their finger down and starts moving left. You have
the ability to lock axis, so let's say you want to put your finger on something and you
want to say if they move left or right prevent default, or if they move up and down don't
prevent default because I want the page to be scrollable naturally, like vertically.
So you can only fire move events in a specific direction.
This payload also has a direction property and delta X and delta Y. That's also really
useful. The delta X and delta Y tells you if I start moving between move n-1 and move
n, how much did they go on the X and how much did they go on the Y, so are they going up
and down, left or right, what are they doing. I will show you that in a demo.
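The per-move delta and direction he describes can be sketched as a small tracker that compares move n to move n-1. The class name, field names, and the tie-breaking rule for diagonal movement are all assumptions for illustration.

```javascript
// Tracks successive move positions and reports, for each move, the delta
// from the previous one plus a coarse direction.
class MoveTracker {
  constructor(startX, startY) {
    this.lastX = startX;
    this.lastY = startY;
  }
  move(x, y) {
    const deltaX = x - this.lastX;
    const deltaY = y - this.lastY;
    this.lastX = x;
    this.lastY = y;
    // Pick the dominant axis to name a direction.
    const horizontal = Math.abs(deltaX) >= Math.abs(deltaY);
    return {
      deltaX,
      deltaY,
      direction: horizontal ? (deltaX < 0 ? 'moveleft' : 'moveright')
                            : (deltaY < 0 ? 'moveup' : 'movedown')
    };
  }
}
```

Axis locking falls out of the same idea: only act on the result when `direction` matches the axis you care about.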
I keep talking about this demo so I will show you the demo. I'll actually start it on the
computer first. Cool. I have this carousel here and it's using these gesture events.
Basically this carousel has... I have a custom build of YUI which I will send the link to.
It has these new APIs inside it. Now this carousel has three event subscriptions on
it. It has a subscription for flicking left and right, it has a subscription for moving,
so kind of like that, and then it has a subscription for tapping on these little... I don't know
if you can see those, those things there.
I figure that on the desktop, this whole flick concept, I don't feel it's natural. People
don't like flicking with a mouse. So on a desktop you can kind of tap through this.
Oh, that's the last one actually.
And then this is the real cool part. Can I get the elmo working? Or power? Oh, it was
already on. Give me a second. Is that up somewhere? Okay cool, so you guys can see that.
This is a really pretty much native scroll that you can get here. What's basically happening
is... I'll show you what's happening under the hood. When I'm tapping and I'm moving
this, I'm using the delta X of the drag, sorry of the move event, and the delta X here is
kind of like the amount of the next photo you can see. This is the delta X because this
is the thing that I've moved from here that much, so the next photo is revealed by that
much. It's a CSS transform that's applied that's kind of giving this animated effect.
On these little arrows on the left and right I have event-tap set up. You can see I can
fast-click through these and they work really well. Then if I flick the flickleft and flickright
events allow me to go back and forward through the carousel. I have some smarts in here which
basically says if I start dragging but the delta is not a sufficient amount then when
I let go just go back to the current pane basically. If the delta was, let's say, half,
it'll go to the next one, if it's not then it'll just stay where it's at. I'll send the
link to where this lives so you can try it out.
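The carousel's "smarts" he describes (commit to the next pane only if the drag covered at least half a pane, otherwise snap back) can be sketched as a pure function. All the names here are assumptions; only the half-width threshold comes from the talk.

```javascript
// Decide which pane to land on after a drag ends. A drag to the left
// (negative deltaX) reveals the next pane; the result is clamped to the
// valid pane range.
function targetPane(current, deltaX, paneWidth, paneCount) {
  if (Math.abs(deltaX) < paneWidth / 2) {
    return current; // not far enough: snap back to the current pane
  }
  const next = deltaX < 0 ? current + 1 : current - 1;
  return Math.min(Math.max(next, 0), paneCount - 1);
}
```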
The cool part is... I think it's cool. Can we move back to the computer? Okay, cool.
The cool part is that as you can see it kind of works the same way on the computer. Let's
do the same thing I did before which is that inspector trick to get the touch events on
here. Exact same thing. I'm not listening for any touchstart, mousedown, I'm not doing
any funky things, but it's just working.
Let me explain to you what's happening here. This is what we do now. Without these new
APIs and this new gesture sugar layer that we're working on, if you were to create now
this is pretty much what you would do, and I've tried to do this. You would set up some
gesturemove* and flick listeners. Then, when gesturemovestart fires on touchstart,
you would store the page X and page Y values. I'm going to call them x1 and y1.
You're going to store those values somewhere and then you're going to compare that to gesturemove,
to the page X and page Y that you get from gesturemove, to figure out what the direction
and delta is, because we don't give you that. Then on gesturemoveend, so when you lift your
finger up, you're going to compare that again to the values you stored before to figure out
what the direction and delta are there. Then you're going to have these big callback functions
where you're going to branch logic based on direction.
After you do all that you're going to figure out, okay, well, how does this work on devices
that support both mouse and touch without it firing twice, and that's trickier than
you think. And you haven't really gotten to coding the actual application; this is just
setting stuff up.
With the new APIs what you do is you listen for move, flickleft, flickright, and gesturemoveend.
Inside the move callback you get the necessary information using e.drag.direction and e.drag.deltaX.
Actually don't quote me on the property names there because they might change, that's just
what I'm using now. They'll probably change to something else, probably like e.gestures
or something.
Then you transition to previous and next panels inside your flickleft and flickright callbacks.
Inside gesturemoveend you get the delta X and direction X, dirX, properties, and you
decide whether or not to transition panels. Already you can see you're working at a higher
level here. You're already thinking about transitioning and not setting up figuring
out what direction it is and what all that stuff is.
Going forward, this is where everything actually lives. I mentioned the two pull requests.
I have a branch called gesture integration which actually has a build of all these modules.
That's where that demo was working from. The demo itself you can check the code for it,
that bit.ly link, which is a gist. You can run that with RocketHub or something, which
is what I was doing.
So yeah, check it out, and let me know what you think. I'm pretty excited about this stuff.
I think it adds some very necessary sugar to make it easier for you guys to code gesture
events across devices.
That's all I had. Thanks.
[applause]
Andrew: We have some time for questions. Do you have any questions? Oh, here we go.
Audience member: It's more like a curiosity than anything. In iOS 7 Apple decided to put
the back and forward swipe gestures within Safari. I was wondering if you found anything
interesting there playing with all those events. Did you find a way somehow to circumvent
the problem?
Tilo: You know what, I tried to prevent default on touchstart. It didn't prevent that
behavior. But for example in the demo that I showed, and I can show it to you in
person so you'll be able to play around with it, on the phone those events still work,
but the edge width in which they trigger isn't that bad. I haven't prevented default, and that
doesn't change the way the flick works. You can still just flick through left
and right normally. You don't really need to prevent default.
Personally I don't think you should prevent default because ever since I got iOS 7 I've
been using that thing to go back and forward very often. Short answer, I haven't figured
out a way to get rid of it but it hasn't got in my way.
Andrew: Do we have any more questions?
Tilo: There's one over here.
Audience member: Hey there.
Tilo: Hey.
Audience member: Can you explain a little bit more how to think about whether you'd
use the gesturemove series of events versus the move series of events that you went over?
Tilo: Yeah, so the move, originally they were actually drag events and that's a better way
of thinking about them. The reason I'm running into problems calling them drag is because
they conflict with the DOM drag events that fire in HTML5. Still figuring out the best
way of what to call them. Originally I also called them gesturedrag but I don't know,
it sounded weird.
If you think about them as drag events it might make more sense because... To be honest
I think the gesturemove events, we could consider renaming them to some sort of pointer or something
like that. The reason we didn't go with the pointer initially, or at least I was hesitant,
is because the pointer specification has a dependence on a CSS property called touch-action
and we don't support the touch-action property, so if someone thought this was conforming
to this spec 100 per cent it's not.
But yeah, think about the move events that I mentioned as drag events.
Audience member: It's just a little question about how to replace the double click event
that we have from the YUI Grid with these gestures.
Tilo: The what event, sorry?
Audience member: Double click.
Tilo: Yeah, I think going forward what we want to do is build up that... Double click,
or like hold, these are richer events. We want to kind of build those into this new
layer. Just as tap and flick are there now, we want to have double tap or something like
that, probably, which will work for tapping twice, it'll work for clicking twice, it'll
work on devices with both. I think after we get this initial pass through going, we're
going to add those events in and make them work well by relying on this gesturemove layer
that we're kind of building out.
Andrew: We have time for a couple more questions.
Tilo: We have one more by Ryan.
Andrew: Alright.
Ryan: I just wanted to make Andrew run over here. I probably should have looked at the
PR so I should already know this but I haven't so I'm sorry. How do you differentiate when
a drag event starts between whether it's an MS pointer that it's starting at or it's a
touch device or a mouse device? Because on touch often you want to touch and hold a bit
and then you start a drag event as opposed to just touching and making a gesture.
Tilo: The drag event, basically it correlates pretty much with a touchmove or a mousemove.
But you can specify... To be honest that's basically where it is. It correlates to a
touchmove or a mousemove. It's not exactly like a dd-drag or something like that. It
just says that I'm moving while my finger's down, or I'm moving over the element with
my mouse. I'm not sure, does that answer your question? I'm not sure.
Ryan: Yes, thank you.
Tilo: Okay.
Andrew: Okay, do we have any more questions?
Audience member: Have you guys... Hi. Have you guys done any work around the interaction
with trackpads? Because one of the things, and I know there's some limitations there,
but one of the things that when you're using let's say Google Maps, for example, that's
kind of frustrating is if you're used to using tablets and you're scrolling and then you
go to Google Maps you can't, for example, move the map with a swipe.
Tilo: Oh right, right.
Audience member: Because it's doing the swipe like at a higher level or whatever. But I
was just wondering if you guys had put any thought into that stuff or done any work around
those kinds of things.
Tilo: Personally I have not. But the problem is that a lot of those events, the trackpad
related events correlate very strongly with the OS. In fact if you go into system preferences
and you look at those trackpad events, like if you click on the trackpad link it says
swipe with four fingers to go to the next whatever they call it, the next screen or
whatever. I feel it would be confusing for users if we... I don't even know if it's possible.
You could definitely have a four fingered gesture but again, I feel bad about preventing
default on these sort of paradigms because less tech savvy users know that a pinch on
the trackpad is normally a zoom in the browser. I feel weird about if we change that unless
it's a web app that's doing that or something. So I haven't had any experience but I don't
know if I want to necessarily override all those events because they're OS level sort
of stuff.
[applause]