ROBERT PUFKY: I'm Robert.
And today I have with me Zach and James.
And today we're going to talk about cloud support and how we
moved our internal support stack into the cloud.
Now I actually want to talk to you guys first and say, how
many of you guys have run traditional IT stacks in the
past or are running them currently?
Just a show of hands.
Wow, OK, a lot of people.
So our experience in moving our stack to the cloud should
be pretty relatable to you.
And for you guys that are already running on App Engine
or are already running on cloud, maybe you'll get some
good tips and tricks about developing on App Engine.
So before we talk about this, I want to talk a little bit
about the background of support at Google and kind of
talk about how we do things.
So we can talk about the problem analysis of where we
were before, the applications that we developed, and where
we ended up.
So the first thing I'm going to talk about is Techstops.
And Techstops are really a central point at Google in a
lot of the offices where anyone can go at any time for
help with any kind of problem.
It could be a technical issue.
It could be software.
It could be hardware issues.
It could really be anything.
And what you expect to find here is you come in, and you
have fun and friendly techs that can answer your questions
quickly, get you on your way, and make you more
productive.
These techs are actually called field techs.
And in smaller offices, they equate to something like an IT
site director or something like that because they also do
a lot of the site work at smaller offices.
We also have a program called the ITR program, or the
Internal Technology Residency program, which is a two year
program where we pull students from colleges that have just
recently graduated.
We put them through a two-year process where we show them how
Google does IT.
And we put them on an ops rotation.
An ops rotation is like a three-month rotation with an
operations team where they can go and learn what
operations teams do.
My team is the Support Engineering Development team.
And both Zach and James did ops rotations with me.
Additionally, they do office rotations, where they go and
work for three months in another office and get an idea
of what all our offices are like.
Now to get back to support at Google: really, our core
goal in support is to make sure that users'
problems are fixed the first time, they're fixed quickly,
and users are happy and productive.
And this is really key.
It leads to a lot of really high customer
satisfaction scores.
But that's not really what we're focusing on.
We're focusing on users being really productive.
And that's where we get into this conversation we're going
to talk about today.
And it's our frustration with the traditional IT stacks that
we were running before.
And these are great, actually.
A lot of IT professionals come in.
They've written applications on these stacks before.
They write quick applications here and there, even build up
some large-scale applications.
And that's great.
But we've found some problems with it.
We actually experienced a lot of application development,
and then we had to scale it to meet our demands.
We have offices all over the world, on every continent
except for Antarctica.
So when we launched an application, we literally had
to make sure that everyone in the world could use it when we
launched it.
And with traditional IT stacks, we really had to focus
not only on the application development but on how we could
scale that to the world.
And this was a really big problem.
Because our team is generally small, we didn't have DBAs and
sysops and SREs.
SREs are Site Reliability Engineers; they essentially keep
stuff up and running.
And we came to a juncture where we had to ask: should
we actually go and hire these people to help scale this
infrastructure for us?
Or should we actually take a look at the core problems in
the group and see if we can come up with a different way,
a better way to solve these issues?
So what we did is we went in, and we did a problem analysis
on our group.
And we tried to figure out what was wrong in our
development group as well as what was wrong in the support
organization and how we could fix this.
So we pondered on these issues a little bit.
And we found the first thing, and probably the biggest
thing, is that we found that maintenance versus innovation
was a huge issue.
We were spending a lot of our time scaling our
infrastructure, keeping it up and running.
Spending a lot of time just making sure it was running as
opposed to innovating in our space and really fixing the
core issues that our team was there to solve.
We also found that a lack of tech information, or
information presented to techs in the Techstops, led to a lot
more escalations in pages and bugs.
And what that really meant is that the end user was not
getting their problem fixed as fast as possible.
So we wanted to make sure that our techs were getting the
information that they needed so they could be empowered to
make the right decisions and get people
on their way faster.
We also looked at the approvals
and how we did approvals.
We found that if a tech is fixing a user's computer, and
their hardware is busted, why should they wait for a manager
two levels away to approve a hardware swap?
Why can't the tech just make the determination that the
hardware's busted and swap it out?
What if the tech is working at a Techstop, and the location
has changed, or the phone number has changed?
Why should they go through an approval mechanism to get
those changes made?
Why can't they just go and make these changes
immediately?
We also looked at the automation for our group.
And what we really wanted to get done here is to remove a
lot of the mundane task work from the Techstops so they can
focus on interesting problems.
We did a little bit of this with traditional IT stacks.
But we really wanted to do this automation work and then
not have to think about it and keep it up and
running later on.
We also looked into self-service.
Now if a Googler takes an average of 5 to 10 minutes to
walk to a Techstop, that means 10 to 20 minutes of that
Googler's day are lost just walking to and from the
Techstop, without actually even getting their problem fixed.
So what we wanted to take a look at is if we could provide
information to our end users so they can fix their own
problems, maybe in two or three minutes, as opposed to
spending all that time walking to and from the Techstop.
And again, this is something that we need to do at a global
scale so that when we launch an application or a
self-service application, everyone can use it.
The last thing was slow applications.
And again, we've touched on this before, so I'm not going
to hammer away at it.
But this is really scaling applications and making sure
they're fast and responsive to people that are using them.
The last thing we looked at, and I'll give you a minute to
look at this graph, is our application history and what
it was like from inception 'til death.
And what you'll see here is a histogram.
And this is basically a histogram of the applications
that we developed over time.
And what we did was we put a trend line over this data.
And we found that there's actually a really nice normal
or bell curve about the average lifespan of our
applications that we were developing.
And this was a key insight into our problem, because we
were fundamentally designing our
applications wrong.
We were originally aiming to build them to be 10- to
15-year applications, to be self-healing, to be
self-replicating, to fail over to other DCs, et cetera, et
cetera, DCs being Data Centers.
And it really turned out that our application lifespan was
only two to four years.
So we only really wanted to focus on getting an
application out, making sure it scaled to the world.
It did what it needed to do.
We could add features when we needed to, and we could turn
it down without actually needing to work on all the
maintenance stuff.
We looked into why applications were lasting only
two to four years, and we came up with a couple interesting
points as well.
First was technology changes.
Some of our systems interacted with tech stacks, and these
tech stacks got deprecated.
So therefore there was no point in having those
applications.
Second, we looked at internal process changes.
We're constantly evolving, and our internal
processes change a lot.
So some applications were deprecated by processes.
And the third, and probably the most important, was the
security and scaling issues that we were having with these
applications.
Again, our team was just a small team of developers.
And we were having a lot of issues scaling our
applications because we needed to get dedicated
resources to do it.
We also looked into the security aspect and found
that in traditional stacks, you have to worry about OS
security, how many users have access to the system, the
security of your serving stack, as well as the
security of your code.
And then what happens when a data center fails, and you
need to fail over to another data center?
We really wanted to remove all of that from the equation and
really focus on writing the application itself.
So with these problems identified, we set forth to
migrate a lot of our applications
to something better.
And I'm going to introduce Zach, who's going to talk a
little bit about Techstop information, which is one of
these applications that we developed using these problems
that we identified and helped solve.
Zach?
ZACH SZAFRAN: My name's Zach Szafran.
I'm an IT resident, and I work in those Techstops that Bob
was just telling you about.
So recently, I had the chance to work with Bob and some of
his support engineers on their cloud-based applications.
Techstop Info is one of those projects I got to work on.
In the next couple slides, I'm going to talk about it and how
we implemented some of its features.
So originally, we had this version-controlled YAML file.
This was the authoritative source for all of our Techstop
information.
Any application that needed this type of Techstop info
would actually have to sync down their own copy
of this YAML file.
So if you take a minute and just look over this example
that we have here, you might see some human errors that
we've included.
These are actual human errors that you could
find in our YAML file.
So maybe by show of hands, does anyone see any obvious
problems with this?
It's kind of hard to tell.
So the first, less obvious problem is that our
indentation is incorrect.
If we ran strict parsing across this YAML file, it
would fail.
The more obvious issue is that our
phone number was incorrect.
And things like this did make it into our YAML file.
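The slide itself isn't reproduced in the transcript, but a hypothetical entry with the two errors just described (a mis-indented key and a bad phone number) might look something like this; the field names here are made up for illustration, not the actual schema:

```yaml
# Hypothetical Techstop entry -- field names are illustrative only.
techstops:
  nyc-floor-2:
    manager: bob
     phone: "+1 212 555 01234"   # extra leading space fails strict parsing;
                                 # the number itself has one digit too many
    location: Building 9, Floor 2
```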
So let's say a technician found this phone number in our
YAML file, and he wanted to actually fix it.
What kind of process would they have to go
through to get this done?
So in order to make this change, they'd have to go to
our repo, find the file, and sync it down
to the local machine.
Once they have the file, they would make their change, fix
the phone number, whatever they need to do.
And then they would save it.
At which point, they would submit it for review, where it
would be checked for spelling errors, syntax,
style guide, et cetera.
And then the reviewer would give the approval and say,
looks good to me.
After this, it's up to the technician to submit this file
back up to the repository, where any application that
depends on the YAML format would have to sync it down.
At which point, they could use it or display it
however they want.
So to make crowdsourcing this a little easier and avoid
that tedious process for making small changes, we set
up this App Engine-based replacement.
We provided a UI for making all these changes.
And in addition to that, we include audit logs for all the
changes that are made.
So now I'm going to demo Techstop Info a little bit,
just so you have an idea of what it's about and what some
of its features are.
So here, a technician could come to our website.
And they're immediately presented a list of Techstops.
So let's say the technician coming here actually wanted to
fix a phone number.
And they know the Techstops managed by Bob.
Using a Gmail-like search query, we can kind of filter
this down a little bit.
But that wasn't really specific enough.
So let's say we also know the Techstop that Bob manages is
on floor two.
And since that's the only result of our query, we're
taken directly into the Techstop, where phone numbers
could be changed or updated.
Also, an important note would be the audit log that
we have down here.
This basically says James was the last person to edit this
Techstop info.
And if we wanted to dig down a little deeper into this, we
could get a full audit trail and actually see all the
changes that were made to this Techstop, who made them, and
what the changes were.
So now that I've shown you Techstop Info and the
Gmail-like search querying, I'm going to explain a little
bit about how we implemented that.
Really, all we ended up doing is filtering based on our
Datastore model's attributes.
So to do this, we set up this GetTechstopList method.
And really, all it does is it takes a query from the UI and
decides what to do with it.
So the first check we do is for a colon in the query.
And that says it is a Gmail-like query.
So we pass it along to the GetListByAttributeQuery
method, which I'll talk about in the next slide.
And then if it isn't a Gmail-like query, we just take
what was entered in the UI, and we use it as a standard
term for a Datastore search.
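The dispatch just described might be sketched like this; a plain list of dicts stands in for Datastore, and the helper name follows the talk while everything else is assumed:

```python
# Sketch of GetTechstopList's dispatch. A list of dicts replaces Datastore.
TECHSTOPS = [
    {"manager": "bob", "floor": "2", "name": "NYC-2"},
    {"manager": "bob", "floor": "4", "name": "NYC-4"},
    {"manager": "alice", "floor": "2", "name": "MTV-2"},
]

def GetListByAttributeQuery(query):
    """Placeholder for the attribute-query parser shown later in the talk."""
    attr, _, value = query.partition(":")
    return [t for t in TECHSTOPS if t.get(attr) == value]

def GetTechstopList(query):
    # A colon marks a Gmail-like "attribute:value" query.
    if ":" in query:
        return GetListByAttributeQuery(query)
    # Otherwise, treat the input as a plain search term (here: a substring
    # match on the name, standing in for a standard Datastore search).
    return [t for t in TECHSTOPS if query in t["name"]]
```

So `GetTechstopList("manager:bob")` takes the attribute branch, while `GetTechstopList("MTV")` falls through to the plain term search.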
So the GetListByAttributeQuery method is what actually parses
a Gmail-like query.
Really, all it does is take this query, split it up into
attribute names and values, and then we start looping
through them all.
So we check our Techstop model to ensure
that it has this attribute.
And if it doesn't, we inform the user that
their query was invalid.
But if it does have it, what we'll do is we'll take our
current list of Techstops, and we'll start pruning out the
ones that don't have that attribute or value.
What we're left with at the end of this is a list of
Techstops that match the entire query, meaning every
attribute and value, and then we return them.
So the IntersectResultsWithAttribute
method does pretty much what it says.
It takes our current list of Techstops, and it prunes the
ones that don't have that attribute or value.
So originally, when we run this method, we don't have any
current results, no Techstops.
So what we do is just a standard Datastore search with
a little bit of filtering involved.
Once we get those current results up, it actually will
start passing it in to here.
And we'll start looping through each of the Techstops
in that result set.
What we're doing is checking to make sure that they have
that attribute or value.
And if they don't, we just don't include it
in our return results.
These three functions here are really what allow us to
implement the search querying.
And it allows our application to quickly provide relevant
information to the techs that are visiting our website.
I mentioned a little earlier in the demo that we have these
audit logs for keeping track of all the changes that happen
to this Techstop info.
We use the SaveTechstop method here for actually writing the
Techstops to the Datastore.
So inside, we have this private transaction method.
And you can see that it's actually putting the Techstop
in the Datastore.
In return, we get a Techstop key, which we can then use to
build this audit entry with.
The audit entry says who made the change, what attributes
were changed, and what their new values are.
And then we write that audit entry to the Datastore, too.
We take that private transaction method, and we run
it in the db.run_in_transaction.
And that ensures that both the Techstop entity and our audit
entity get written to the Datastore.
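The transactional save just described might be shaped like this; App Engine's `db` API isn't available here, so an in-memory dict and a pass-through `run_in_transaction` stand in for it, and all names and key formats are assumptions:

```python
import datetime

# In-memory stand-ins for Datastore and db.run_in_transaction. The real
# code uses google.appengine.ext.db and real entity keys.
DATASTORE = {}

def run_in_transaction(func, *args):
    # The real API retries and rolls back on failure; here we just call the
    # function so the transactional shape stays visible.
    return func(*args)

def SaveTechstop(techstop, changes, editor):
    def _txn():
        # Put the Techstop and get its key back.
        key = "techstop/%s" % techstop["name"]
        DATASTORE[key] = techstop
        # Use that key to build the audit entry: who made the change, what
        # attributes were changed, and their new values.
        audit = {"techstop": key, "editor": editor, "changes": changes,
                 "when": datetime.datetime.utcnow().isoformat()}
        DATASTORE["audit/%s/%s" % (key, audit["when"])] = audit
        return key
    # Running both puts in one transaction ensures the Techstop entity and
    # its audit entity are written together, or not at all.
    return run_in_transaction(_txn)
```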
I also mentioned that we have a corporate database.
What we're doing is we're taking our Datastore with all
of its Techstop information, and we're syncing it out to
this corporate database.
Really, the database is meant for storing all this
org-related information, so applications that depend on
that can easily access it.
With all of our Techstop info in here, all the website has
to do is authenticate to the corporate database, and all
this information's made available to them.
We don't have to administer API access or give direct
App Engine access to our application.
A really good example of the types of services that consume
our Techstop info from the corporate database would be an
internal overlay we have for Google Maps.
This displays all of our corporate locations.
And that just happens to include all of our Techstops.
So at this point, a technician can come to our new
application we have set up, make some changes to a Techstop,
like the name, phone number, or GPS coordinates.
And then upon saving that, it's automatically updated in
this Maps overlay.
And another pretty good example is the fact that all
of our end user documentation, as well as call centers, have
access to our new application.
So they really don't have to worry about a phone number
being wrong or a room number being old.
As long as it's updated in our application, it's made
immediately accessible to them.
So if you take a minute and just look
over this graph here.
What we're showing are all the edits to all Techstop data
that have ever happened.
In the blue, we have all the edits to our YAML file.
And in the red, we have all the audit logs currently in
our data store.
I'm going to switch this to a log scale, just to make it a
little easier to see.
So again in the blue, our YAML edits.
And what this is displaying is about 107 edits over the
period of three years.
In the red, again, are the audit logs.
And what those are displaying are about 1,072 audit logs in
the period of five months.
That's about a 72-fold increase in overall edits to
our Techstop data.
And this should paint a pretty clear picture about what can
happen when you give complete control over the data to the
people that use it and depend on it the most.
There's a couple spikes in this graph that I'd
like to point out.
This very last blue spike was actually caused by us.
Those human errors I talked about earlier needed to be
corrected in order to import all this data into our new application.
So we had to go through and manually correct all this, and
that's what caused that very last spike there.
But after doing that and importing all this into our
application, we were able to release it.
At which point, all of our technicians kind of flooded to
the site to check it out.
And they actually corrected any old, invalid, or missing
data for our Techstops.
And that's what caused this spike here.
And then this last spike was actually caused by us again.
But unlike the YAML spike that I talked about, we didn't
actually have to manually make all these edits.
We were able to send out an email to all of our
technicians saying, we're making a global change,
informing them of this.
And they were actually able to make this change for us.
We essentially were able to crowdsource a global
modification without us having to do very much.
So now that I've talked about Techstop Info and some of its
features, I'm going to pass this off to James, who's going
to talk about another support application called Unified
Travel Manager.
JAMES MEADOR: Thanks, Zach.
Hi, everyone.
My name is James Meador.
And I'm also an IT resident at Google in the Internal
Technology Residency program.
For my operations rotation in this program, I was fortunate
enough to get to spend some time with Bob and the Support
Engineering team that he leads.
The majority of my time on rotation was spent
designing and developing this application called Unified
Travel Manager.
Now the design requirements for this application called
for an application that had a modular framework and not only
eased future development, but encouraged it as well.
Before we get into some of the technical details about
Unified Travel Manager, or UTM, as we frequently call it,
I want to give you a little background information about
how we handle travel at Google.
Now in a typical IT support organization, if a technician
decides to take vacation or gets sick or takes leave, the
responsibilities will be offloaded to another member of
their team or another team entirely.
This can be a particularly large problem for a smaller
office that's run by one or two technicians.
At Google, we try to mitigate the effects of this by using
what we call travel requests.
Now, a travel request is something that a technician
can open if they have a planned absence.
And then another technician can fill that travel request.
The same concept kind of applies for events.
So like Google I/O, we have a need for
technical support here.
And what we can do is we can open a travel request, and
then multiple technicians can fill that travel request.
This can pose some concerns, though.
What if more than one Googler or technician wants to fill
one of these travel requests?
What if a manager thinks that a different technician is
better suited to handle a higher-profile event, like
Google I/O?
This is where we got the idea for the first iteration of
Unified Travel Manager.
The first version was focused on the application process to
a travel request.
So if Bob knew he was headed to the Bahamas for a week, he
could open a travel application that both Zach and
myself could apply to.
Bob or his manager would then be able to select which one of
us was going to be lucky enough to cover his position.
Now the first version of UTM was great.
But after the problem analysis that Bob talked about, we
wanted to move this application to a
no-maintenance cloud solution.
And we really feel that Google App Engine was the best fit
for this application.
The three things in UTM that we want to talk to you about
are the modular framework and design, a distinction between
application administrators and application developers, and
some of the caching strategies that we used, along with an
implementation of what we call NDB, or Next Database, which
is a new Datastore API.
Before we do that, however, I want to give you a quick demo
of UTM so you can kind of see what it looks like and how
the modules work.
Now what you can see here is the admin panel for UTM.
On the left, we've got a manager module
and a travel module.
In the middle, we've got read access groups and
write access groups.
And you can see that technicians and support
engineering are two groups that have read access to the
travel module.
On the right, we've got two on/off switches here.
And these allow us to enable or disable these
modules on the fly.
I'm going to go ahead and open the travel module here.
So what you can see here is a list of
the open travel requests.
Now I'm going to go ahead and create a new one.
Now I'll go ahead and set the destination here to the
Moscone Center and set the start date and end date to the
duration of Google I/O. I'll set the priority to high
because this is kind of important.
I'll also change the needed bodies to three because we
need three technicians.
And I'll specify that this is for Google I/O. I'll go ahead
and save this travel request and head
back out to the overview.
Now if we refresh the page, you can kind of see that it's
not showing up right now.
But what we can do is we head back into the admin panel and
flip this module on and off.
Go ahead and turn the module off, refresh the page.
It disappears from the nav bar.
I'll go ahead and turn the module back on.
The other thing we were going to show you in this
application is the manager module that you saw in the
module list.
Now at Google, we handle travel differently, like we
talked about.
And part of this is we have what we call travel managers.
Now a travel manager is somebody that's responsible
for the travel coming in and going out of a
certain site or region.
And these travel managers have access to what we call the
manager module in our application.
The manager module allows these travel managers to
specify who they manage the travel for.
So in the UI, they can type in the name of a user and then
drag that user to one of the different groups on the site.
They can also specify a group or team, like technicians or
support engineering that we showed you.
And then that'll send off an Ajax request to the back end,
which will then return a list of all the
users in that group.
So now that we've shown you about half of our demo of the
application, we want to go through some of the modular
design of UTM.
Now this is our class structure for the modular
overview of the application.
On the left, we've got webapp.RequestHandler.
And in the middle, we've got ParentRequestHandler.
And on the right, you can see ModuleRequestHandler.
Now each of these are subclasses of each other.
So Parent Request Handler is a subclass of the
webapp Request Handler.
And the Module Request Handler is then, again, a subclass of
that Parent Request Handler.
We think of our relationship between the Parent Request
Handler and the Module Request Handler a lot like the
relationship between a motherboard and
an expansion card.
The motherboard, or our Parent Request Handler, is
responsible for most of the basic routing and other
routine tasks and utilities that are associated with
running the application.
The Module Request Handler, or the expansion card, is focused
on adding features and functionality, the bells and
whistles kinds of things.
Now on top of the class structure, what you see is the
path that an HTTP request will take through the application.
We'll go into more detail about this in a little bit.
But the general gist is that an HTTP request flows all the
way from the webapp Request Handler down to the Module
Request Handler, and then back out to the user through the
webapp Request Handler.
Now to do that, however, we have to know which modules are
responsible for processing which requests.
Now this mapping of URLs to methods happens in the
constructors for the Parent Request Handler and the Module
Request Handler.
Let's take a look at what these look like.
Now I really like tacos.
So we're going to take and build an application here
called IOLunchHandler.
This is going to take care of the ordering of lunch for
Google I/O.
Now this is our Parent Request Handler.
And it's inheriting from webapp.RequestHandler.
This is the constructor for IO Lunch Handler.
Now we'll go ahead and call the constructor for the super.
And we'll initialize two dictionaries, method and
template mappings.
We'll then call InitSubHandler.
We don't want all of our modules to override the
constructor of the Parent Request Handler because in our
live application, we've got some additional functionality
and processing that happens here.
So this is a module called TacoHandler.
It's inheriting from IO Lunch Handler.
And what we'll do is we'll define this Init Sub Handler
method that we just called.
In this case, we'll call a method MapURLToMethod.
Now what this does is it tells the Parent Request Handler
that we're going to process all requests to /tacos with
the DisplayTacos method.
Now remember the Display Tacos name because we're going to
talk about that in a little bit.
Let's look at what the Map URL To Method function looks like.
This method lives on the Parent Request Handler.
And here's the definition of it.
We'll use the URL as the key in these two dictionaries that
we created in the constructor.
And we'll assign a definition of the method in the method
mappings dictionary.
And the template's file name is a string in the template
mappings dictionary.
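Put together, the constructor chain and mapping method just described might look like this; `webapp` is stubbed out with a bare class, the class and method names follow the talk, and the template filename is an assumption:

```python
# Self-contained sketch of the Parent/Module Request Handler constructors.
class RequestHandler(object):
    """Stand-in for webapp.RequestHandler."""
    def __init__(self):
        pass

class IOLunchHandler(RequestHandler):  # the Parent Request Handler
    def __init__(self):
        super(IOLunchHandler, self).__init__()
        # URL -> method and URL -> template filename dictionaries.
        self.method_mappings = {}
        self.template_mappings = {}
        # Modules hook in here instead of overriding the constructor.
        self.InitSubHandler()

    def InitSubHandler(self):
        pass  # overridden by modules

    def MapURLToMethod(self, url, method, template):
        # The URL is the key in both dictionaries: the bound method that
        # processes it, and the template file used to render the response.
        self.method_mappings[url] = method
        self.template_mappings[url] = template

class TacoHandler(IOLunchHandler):     # a Module Request Handler
    def InitSubHandler(self):
        # All requests to /tacos get processed by DisplayTacos.
        self.MapURLToMethod("/tacos", self.DisplayTacos, "tacos.html")

    def DisplayTacos(self):
        return {"taco_types": ["chicken", "beef"]}
```

Constructing a `TacoHandler` runs the parent constructor, which calls `InitSubHandler`, which in turn populates both dictionaries.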
If we have a lot of modules in our application, they can be a
little difficult to keep track of.
So we put together a class called ModuleManager that's
going to help with this.
Module Manager functions a lot like a dictionary in Python,
but with some additional functionality.
We'll use this handler's dictionary here to keep track.
One of the utility methods on the Module Manager is
something called AddHandler.
Now what this is going to do is it's going to take the
definition of a handler.
And it's going to use the handler's name as the key in
that handler's dictionary.
We'll also create an instance of the handler here and store
that as the value in the dictionary.
The reason we're creating an instance is because we want to
make sure that the Init Sub Handler method is called so
that all of the URL to method mapping can take place.
Another utility method on the Module Manager is called
GetURLMappings.
Now what this does is it iterates through the
dictionary's keys.
And it creates a list of tuples
based on the URL and the handler that's responsible for
processing requests destined for that URL.
Now that we have this class defined, this is how we go
about using it.
We'll go ahead and create an instance of the
Module Manager class.
And then we'll add the Taco Handler to the class.
Now that we've got some data in that dictionary, we can use
the Get URL Mappings method here and pass the results of
that to webapp.WSGIApplication.
This is what tells App Engine which URLs map to which
handlers, so it knows where each
request will get routed.
The next thing we want to do is talk about how an HTTP
request goes through our application.
Now that we know which handlers and which modules are
responsible for processing which
requests, we can do this.
Now whenever App Engine sees an HTTP Get request come into
your application, it's going to fire off the Get method on
the subclass of webapp Request Handler.
In our case, that's going to be the Get method on the
Parent Request Handler.
So let's take a look at what that looks like.
This is the I/O Lunch Handler again, our
Parent Request Handler.
And the definition of the Get method is right here.
Now we'll use those dictionaries
that we just populated.
And we'll get the path for the HTTP request
from the request object.
We'll then look up the Get method that's responsible for
processing this HTTP request and store that here.
We'll do the same for the template's file name.
Now we'll call that Get method.
We'll store the results of this Get method in
template_params.
So we're going to jump out of the Parent Request Handler and
into the Module Request Handler while we
run this Get method.
So we just left the Parent Request Handler's call module
method portion, and we're headed into the module method
portion of the Module Request Handler.
So here's the Taco Handler module we showed you a little
earlier and the Display Tacos method that I mentioned.
Now we'll create a dictionary here with some pretty basic
taco types in it, chicken and beef, for this example.
And we'll return that dictionary back to the Parent
Request Handler for processing.
We're then stepping out of the module and back into the
Parent Request Handler for some template rendering.
Let's take a look at what this looks like.
This is still happening in that Get method.
So what we'll do is we'll call App Engine's template.render.
We'll pass in the file name and the dictionary with the
parameters to render to that template.
And then we'll render that response to the output stream.
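The full Get round trip just walked through, parent handler to module and back out through template rendering, might be sketched like this; the "renderer" is a stand-in for App Engine's `template.render`, and the output stream is just a list:

```python
# Sketch of the Get flow: look up the module method and template by request
# path, run the method, render the template, write the response.
def render(template_name, params):
    """Stand-in for App Engine's template.render."""
    return "%s rendered with %s" % (template_name, sorted(params))

class IOLunchHandler(object):          # the Parent Request Handler
    def __init__(self):
        self.method_mappings = {}
        self.template_mappings = {}
        self.output = []               # stand-in for the output stream
        self.InitSubHandler()

    def InitSubHandler(self):
        pass

    def get(self, path):
        # Look up which module method and template own this path.
        method = self.method_mappings[path]
        template = self.template_mappings[path]
        # Jump into the module; it hands back template parameters.
        template_params = method()
        # Back in the parent handler: render and write the response.
        self.output.append(render(template, template_params))

class TacoHandler(IOLunchHandler):     # a Module Request Handler
    def InitSubHandler(self):
        self.method_mappings["/tacos"] = self.DisplayTacos
        self.template_mappings["/tacos"] = "tacos.html"

    def DisplayTacos(self):
        return {"taco_types": ["chicken", "beef"]}
```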
Now this design lets us do some interesting things.
The first that we're going to talk about is the ability to
administratively disable modules.
Now we showed you briefly in the demo how we had those
on/off switches that allow us to toggle on
or off these modules.
A lot of this processing and validation happens in the
constructor for the Parent Request Handler.
So here's the Taco Handler module again.
And we're going to assign a module ID of "tacos". We use
this ID whenever we store some basic data about our modules
in Datastore.
Next, what you see is a method called ValidateSubHandler.
Now this method is on the Parent Request Handler.
And it's responsible for making sure that all of the
modules are implementing the module design correctly.
The first thing we'll do is we'll make sure that the
module has defined an ID.
If it hasn't, then we'll raise an attribute error.
If we do have a module ID, then we'll fetch our module
entity from Datastore based on that module ID.
If Datastore doesn't return anything, or it does return a
module, but the module's disabled, then we'll raise a
NotConfiguredError.
This is just a custom error in our application that lets us
know whenever a module's accessed when it
shouldn't have been.
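The validation flow described above can be sketched like this. The names are our guesses at the talk's code, and a plain dict stands in for the module entities in Datastore:

```python
# Sketch of the module-validation flow described above. A dict stands in
# for Datastore; the real code fetches a Module entity by its ID.

class NotConfiguredError(Exception):
    """Raised when a disabled or unknown module is accessed."""

# Stand-in for the module entities stored in Datastore.
MODULE_DATASTORE = {'tacos': {'enabled': True},
                    'lunch': {'enabled': False}}

def validate_sub_handler(module):
    # Every module must define an ID.
    module_id = getattr(module, 'ID', None)
    if module_id is None:
        raise AttributeError('module must define an ID')
    # Fetch the module entity; missing or disabled modules are rejected.
    entity = MODULE_DATASTORE.get(module_id)
    if entity is None or not entity['enabled']:
        raise NotConfiguredError('module %r is not enabled' % module_id)
    return entity

class TacoHandler(object):
    ID = 'tacos'

class LunchHandler(object):
    ID = 'lunch'

validate_sub_handler(TacoHandler())    # passes: configured and enabled
# validate_sub_handler(LunchHandler()) # would raise NotConfiguredError
```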
Another thing we can do with this design is implement
granular access control at the module level.
Now App Engine provides you with 10 application
administrators and application developers in the console that
you can configure.
But we wanted to expand on that.
We were already implementing our own user
authentication model.
So this was actually kind of simple.
Let's show you what we did.
Now what you can see here is a user and a module model.
Now the user model has a list of groups because the user can
be a member of multiple groups.
For instance, Zach is a member of both technicians and
support engineering.
And a module has an ACL right here.
You can see read groups, write groups, and admin groups.
Now if a technician needed access to a specific module,
then the technicians group would be on one of these ACLs,
let's say, for instance, the admin group.
But how do we intersect the user's groups with the
module's ACL?
Well, we use a decorator.
So in our decorators.py file, we have a decorator called
RequiresModuleAdmin.
Now the method that we're decorating that's passed in is
the handler_method argument of this method.
IsModuleAdmin is the closure
method inside this decorator.
Now what it's going to do is it's going to use App Engine's
users API to fetch the currently logged
in user's user name.
We'll then take that user name and fetch our
user entity from Datastore.
We'll also get a list of the admin groups
for the current module.
If we have a user entity in our Datastore for the
currently logged in user, then we'll intersect that user's
groups with the admin groups for the module.
If there is an intersection there, then we'll go ahead and
call the handler method that we're decorating with the
arguments that were passed in.
If there's no intersection, then we'll simply error out
with a 403.
Now putting this functionality in a decorator makes it really
simple to use and implement throughout all of our modules.
What we can do is any method that requires the currently
logged in user to have module admin rights, we can just put
the decorator on top of the method definition.
In this situation, it's just a ChangeLunchItems method.
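Here is a stand-in sketch of that decorator and the group intersection it performs. The real code uses App Engine's users API and Datastore entities; here plain dicts stand in for both, and the names are our approximations:

```python
import functools

# Sketch of the ACL decorator described above. Dicts stand in for the
# user and module entities in Datastore, and CURRENT_USER stands in for
# App Engine's users.get_current_user().

USERS = {'zach': {'groups': {'technicians', 'support-engineering'}}}
MODULES = {'lunch': {'admin_groups': {'technicians'}}}

CURRENT_USER = 'zach'

class ForbiddenError(Exception):
    """Stand-in for returning a 403 response."""

def requires_module_admin(handler_method):
    @functools.wraps(handler_method)
    def is_module_admin(self, *args, **kwargs):
        user = USERS.get(CURRENT_USER)
        admin_groups = MODULES[self.ID]['admin_groups']
        # Intersect the user's groups with the module's admin groups.
        if user and user['groups'] & admin_groups:
            return handler_method(self, *args, **kwargs)
        raise ForbiddenError('403: %s is not a module admin' % CURRENT_USER)
    return is_module_admin

class LunchHandler(object):
    ID = 'lunch'

    @requires_module_admin
    def change_lunch_items(self):
        return 'lunch items changed'

print(LunchHandler().change_lunch_items())
```

Because the check lives in one decorator, adding access control to a new module method is a one-line change.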
So now that we've talked about the modular overview and
design of UTM and a distinction between
application administrators and developers, we want to talk a
little bit about our caching strategies that we used in
this application.
Now who here likes a slow website?
That's unfortunate.
So slow websites waste people's time.
If you load a page, and it takes more than five seconds
to load, you've probably already moved on
to something else.
Now this was one of the main reasons we moved our
application into the cloud.
We wanted to make sure that page load times were quick.
We wanted to make sure that users didn't have to wait
forever whenever they were trying to access the data that
they needed.
And one of the things that we did was we implemented a
memcache to store the results of all of our
expensive method calls.
We also did this through a decorator.
You'll see the definition of a cache decorator here with
function passed in as the argument.
Now CachedFunction is the method here that's going to do
most of our heavy lifting.
We'll build a key based on the function's name that we're
decorating and a list of the ordered arguments and keyword
arguments that were passed in to the method.
Once we have this key, we'll try to fetch
a value from memcache.
If memcache has anything stored, then we'll simply
return that back out to the caller.
But if we don't have anything in memcache, we'll need to
call the method that we're decorating, store those
results in memcache, and then return
that back to the caller.
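The caching decorator just described can be sketched like this, with a plain dict standing in for App Engine's memcache (the real code calls memcache.get and memcache.set; the names here are our approximations):

```python
import functools

# Sketch of the caching decorator described above. CACHE stands in for
# App Engine's memcache service.

CACHE = {}

def cached(function):
    @functools.wraps(function)
    def cached_function(*args, **kwargs):
        # Build a key from the function name and its ordered arguments.
        key = '%s:%r:%r' % (function.__name__, args, sorted(kwargs.items()))
        if key in CACHE:                    # cache hit: return stored value
            return CACHE[key]
        result = function(*args, **kwargs)  # miss: run the expensive call
        CACHE[key] = result                 # ...and store it for next time
        return result
    return cached_function

CALLS = []

@cached
def expensive_lookup(user):
    CALLS.append(user)          # track how often the real work runs
    return 'results for %s' % user

expensive_lookup('zach')   # computed and cached
expensive_lookup('zach')   # served from the cache; CALLS has one entry
```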
Another thing that we did was we implemented something
called NDB.
I mentioned earlier that this is a new API on top of App
Engine's Datastore.
Now one of the features that NDB provides is something
called structured properties.
Structured properties allow you to retain some of the
organizational benefits that a normalized database gives you,
but in a non-relational database like
App Engine's Datastore.
Let's take a look at what we did.
So what you can see here is a travel application model
that's based on ndb.Model instead of DB.
We'll create a user property here just for the
sake of this example.
We'll also define a travel request model here.
Now we talked about how UTM is focused on the application
process for these travel requests.
This is actually an abbreviated example of exactly
what we do in our application.
So we have an approved_applications property
on the travel request itself.
We'll define this as an ndb.StructuredProperty that's
based on the travel application.
We'll tell NDB that this is a repeated property by setting
the repeated argument to True.
Now what this does is it allows us to store data about
the application on the request object.
This is how we'd go about putting one of these requests
and applications in Datastore.
We'll create a travel application with the user
property set to the currently logged in user.
We'll then create an empty travel request.
We'll set the approved applications property on this
travel request to a list of length one with the
application in that list.
We'll then put the travel request into Datastore.
You'll note, however, that we never actually store the
travel application itself.
It's an ndb.Model, but we never called the .put method.
The reason for this is the travel application is stored
on the travel request itself.
Whenever you put a travel request into Datastore, it's
going to store the travel application
information with it.
Whenever you fetch the travel request from Datastore, NDB's
going to reconstruct that travel application
model on the fly.
So if you were to have any utility methods defined on the
NDB model, in this case, travel application, all those
methods would just work.
There wouldn't be any problem with getting those methods to
map correctly.
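A pure-Python stand-in for the pattern just described looks like this. The real code uses ndb.Model, ndb.StructuredProperty(TravelApplication, repeated=True), and .put(); this sketch only shows the shape of the data and why the parent is the only thing that gets put:

```python
# Stand-in for the ndb.StructuredProperty pattern described above.
# DATASTORE stands in for App Engine's Datastore.

DATASTORE = {}

class TravelApplication(object):
    def __init__(self, user):
        self.user = user

    def summary(self):
        # A utility method that still works after reconstruction.
        return 'application for %s' % self.user

class TravelRequest(object):
    def __init__(self, approved_applications=None):
        # The applications live *on* the request, like a repeated
        # StructuredProperty: no separate child entities to fetch.
        self.approved_applications = approved_applications or []

    def put(self, key):
        # Putting the request stores the embedded applications with it.
        DATASTORE[key] = {
            'approved_applications':
                [{'user': a.user} for a in self.approved_applications]}

def get_travel_request(key):
    # Fetching reconstructs the TravelApplication models on the fly.
    raw = DATASTORE[key]
    apps = [TravelApplication(d['user'])
            for d in raw['approved_applications']]
    return TravelRequest(apps)

application = TravelApplication(user='zach')   # never put directly
request = TravelRequest(approved_applications=[application])
request.put('req-1')                           # one put stores both
fetched = get_travel_request('req-1')
print(fetched.approved_applications[0].summary())
```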
Now this is a method called GetTravelRequests.
This is a method that's in our application that fetches all
of the travel requests from Datastore.
It performs some processing on those requests.
And then it returns the results of that processing
back to the caller.
This is one of the methods that does a lot of processing
and Datastore operations, so we wanted to use the caching
decorator on this method.
We don't want the expensive processing to run every time we hit the method.
So we'll check memcache to see if we have
something stored for it.
If memcache has anything, we'll just return that back to
the caller.
But if it doesn't, we'll go through this processing, store
the results, and serve them from the cache the next time we
call the function.
The other thing I want to show you is a feature called hooks
that NDB provides.
These aren't available in the original DB API, but NDB
provides them for you.
Now this is a travel request model.
And we're going to define what we call a pre_put_hook.
Now NDB's going to fire off this method before you perform
any Datastore Puts on a travel request in this situation.
For this example, we'll just call memcache.flush_all, but
we could really do anything we want here.
Zach showed you how we used audit trails in Techstop Info.
We could actually use that same logic here.
Now these hooks are available for all of your Datastore
operations, your Puts, your Gets, and your Deletes.
And you can specify the hook to run before or after those
operations.
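The hook mechanism just described can be sketched like this. On a real ndb.Model you define a method named _pre_put_hook and NDB calls it for you; here a stand-in put method fires it explicitly, and the memcache object is a fake:

```python
# Sketch of NDB's pre-put hook described above. FLUSHED records whether
# the hook ran; FakeMemcache stands in for App Engine's memcache API.

FLUSHED = []

class FakeMemcache(object):
    @staticmethod
    def flush_all():
        FLUSHED.append(True)

memcache = FakeMemcache()

class TravelRequest(object):
    def _pre_put_hook(self):
        # Fired before every put; a good place to invalidate caches or,
        # as mentioned above, write an audit-trail entry.
        memcache.flush_all()

    def put(self):
        self._pre_put_hook()   # NDB calls this for you on the real API
        # ...store the entity...

TravelRequest().put()
assert FLUSHED  # the hook ran before the put
```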
Now implementing all of these caching strategies led to some
pretty astounding results.
Now what you can see here is worst case cold cache load
times for a reporting page in our application.
Now this reporting page fetched a lot of travel
requests from Datastore.
So it did some processing on this as well, which is why the
DB load time is at 21 seconds.
Now after implementing NDB structured properties, we got
this down to about 1 and 1/2 seconds.
The reason for this is all of the travel requests in NDB now
store all of the travel applications on the travel
request object itself.
So with DB implementation, the relationship between travel
requests and travel applications was that of a
parent/child relationship.
So any time we'd fetch a travel request from Datastore,
we'd have to go and fetch the travel applications as well.
With the NDB implementation, this was a lot quicker, thanks
to structured properties.
Now we will note that if you were to switch directly from
DB to using NDB without implementing things like
structured properties or any other caching, you might
notice a slight performance decrease simply because NDB
has a little bit more overhead for the new
features that it adds.
Now that we've talked about some of the caching strategies
for UTM, and we've talked about the modular design of
the application, I'm going to pass it back off to Bob for
some results and conclusions.
ROBERT PUFKY: Thanks, James.
You know, they say live demos never work, and I think we
just proved that.
So I want to just wrap this up and talk about the results we
got from moving our application stack to the cloud
and the benefits that our team got from it, as well as our
organization.
And the first thing that I want to talk about is Platform
as a Service, and how it actually really works for us.
It's really a paradigm shift in application development for
us in the sense that we're focused on actually developing
and solving fundamental issues and innovating in our space,
as opposed to developing and maintaining stuff at the same
time.
We're focusing on the people at hand and the problems and
not focusing on keeping stuff up and running.
It's also no longer part of your core tasks to keep these
services running.
You don't have to worry about stuff going down.
We also found out as we started moving more and more
to the cloud that we had a lot of really intrinsic benefits
to this as well.
There was a definite hump when we got about partway through
our cloud migration that we spent a lot less time late
nights and weekends at work.
And this was obviously maintenance, putting out fires
like we were doing in the past.
All that was gone.
So we actually had a really large morale
increase on our team.
It also gave us a better work-life balance.
People would come in, and they'd know they would work
Monday through Friday, nine to five.
And they could go camping on the weekend and not worry
about carrying a pager or about something going down and
needing a failover.
And this was pretty huge.
And really, at the end of the day, guys, it was about our
end users and making sure that they were up and running and
being productive as possible.
And the cloud platform, App Engine, allowed us to do this
as developers by letting us focus on the problems at hand,
instantly scale to the world, and enable our techs to have
the information at hand to make empowered decisions to get
people out of the Techstop faster and be more productive.
Additionally, our users, Googlers, are happier as well
because they can actually get their work done.
They don't need to wait for problems to be solved or
information to be had.
So that's it with the presentation.
We have about a minute 30 left, I think, before we're
switching sessions.
So thank you guys very much for attending.
I really appreciate it.
And--
[APPLAUSE]
ROBERT PUFKY: If you have any questions that we can't answer
during the Q and A, since it's going to be so short, just
come and find me, and we can talk about it.