MANDY WAITE: We're going to be talking for about an hour
or so on how to design and build a [INAUDIBLE] application.
We're going to use an example application, one
that we wrote specifically for this purpose.
I'm going to walk through the kind of design decisions
that we made when it came to building the application--
why we made them, whether there are any drawbacks,
any pitfalls-- and just basically
drill down into the application architecture.
And so how many of you either went to Google I/O,
or went to Google I/O Extended?
OK, a few of you.
OK.
So those of you that did will have seen the talks.
And if you saw the cloud stuff specifically--
the keynote and the track that contained
the cloud-specific talks-- you may
have seen this application called WalkShare.
So what we decided to do for the whole event
was to have one application as a theme for all of the talks.
Well, at least half of the talks.
Some talks didn't really go anywhere
near WalkShare, though they could have done.
So a couple of months before Google I/O,
we took the decision to build this application
called WalkShare.
Anybody guess what it does?
Yeah?
So basically, it gathers data, GPS track data from your phone
as you're walking along, as you're doing a fun walk.
And at the end of the walk, you can then
upload it to the internet and share it with your friends.
Your friends can then comment on it.
This is an Android application.
We don't have iOS support.
We were demonstrating the features of Android
specifically.
But we also had a web front end as well.
And this is the web front end.
This effectively allows people to go and look at walks
and make comments on the walks.
So here's an example here of a walk.
Sorry, can we switch to the demo?
So this is an example walk.
One of the guys who wrote this did this walk.
This is the Googleplex.
And basically what we're doing, as the walk progresses,
we're recording GPS tracks, GPS positions at certain places,
at certain points within the walk,
then uploading them to the cloud.
Then we have this web user interface
that can actually display the walk.
And what it's doing is effectively
looking at the lat-long of the GPS track,
and finding the appropriate Street View image
to effectively show you what the walk was
like using Street View.
So that's the Googleplex.
You can always make comments, and so there are
some comments on here.
And I can add another comment.
Let's say Mandy.
And, hey, that was a long walk.
And just to highlight something--
I meant to delete this comment earlier.
But if I put a smiley face in there--
there's a bug in our code-- and hit Return,
I get double smilies.
And this is a Unicode conversion problem.
There's a bug in the code that we'd really
like to fix at some point.
So I've added comments.
Can we go back to the slides, please?
So that was a demo.
So we basically have two major user interfaces
for our application.
We have a web application that allows us to view walks,
and we have the Android application itself.
So obviously we need some point, some place
where we can actually share data between these two
instances of the user interface.
And for that, we need to supply an API.
For a traditional web application,
we don't actually need an API.
For a mobile device, you really do.
But in this case, we're using the same API
to read the same kind of data,
in line with the best practices of microservices.
And we'll talk more about modular application design
later.
We chose to make a modular application.
And the modules were Tracks.
This is effectively gathering track data from the mobile app,
and also pushing that track data out to the web user interface.
A Comment module that allows us to handle
all the aspects of commenting, and a Leaderboard
module, because we wanted to display the top
walks-- or not necessarily the top walks in this case.
I'm not quite sure why we designed it this way.
But we basically have the top users,
the ones whose walks had the most comments.
In order to make this work, we need persistent storage.
We need persistent storage for all three modules
of the application.
And we also need temporary storage, which we'll see later.
So we're going to go into architecture in detail.
Quickly running through the agenda,
this is what we're going to do.
Architectural decisions, plan for getting big,
scaling, running the app in production,
which we may not get to.
This talk generally runs for about 75 minutes.
So we're going to drop something.
So I'm going to try and be a bit dynamic
in the way we present it.
And then finally, I'm definitely going
to go through the coming soon stuff.
So architecture decisions.
This is our shopping list for our WalkShare application.
We wanted somewhere to store the data from the mobile device.
So it had to be persistent, and it
had to be accessible from the mobile.
We also needed a modular, autoscaling frontend and API.
We needed things to be easy to develop.
After all, we work at Google.
We're really busy, and we don't have a huge amount of time
to spend developing applications at Google I/O.
And also, because it's likely to break during the keynote,
it needs to be very easy to maintain so we can go up there
on stage and fix it and look really cool.
We also need a robust commenting system.
We'll talk about that more shortly.
And we also need a database for temp/summary data.
And again, the reasons why we need another database
for storing temporary data, we'll talk about later.
So the first question is, where do we store our walks?
[INAUDIBLE] these GPS tracks.
So for this, on our shopping list
is storage that's available to us from a mobile device.
So we can upload data from a mobile,
and access it through the rest of the application.
So for this, we chose Cloud Datastore.
And the Cloud Datastore, we're going
to get into the details of why and what
the Cloud Datastore does shortly.
Why did we choose the Cloud Datastore?
So this is the Cloud Datastore.
It's something that we use internally, and is
exposed externally as a service.
We're going to get more into the details and the history of that
in the next slide.
It's effectively a NoSQL store, a key-value
store that can scale massively.
It has support for ACID transactions,
strong consistency on reads, and these things
called ancestor queries, where we can group data,
make queries over that group of data,
and get extremely good performance
with strong consistency.
Across those entity groups, those ancestor groups,
we have a bit more of a problem-- which
is where our temporary data problem comes along,
and why we need a temporary database.
It's completely schemaless.
You don't need to think about the underlying data structure,
although our abstraction libraries--
in Java, these include JDO and JPA--
might force you down that route anyway.
You can also use Python as well,
and you can create models from those abstractions.
So you don't have to have a data model,
but you can through the abstractions.
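To make the schemaless-versus-model distinction concrete, here is a toy sketch in Python. The `ToyDatastore` and `CommentModel` classes are invented for illustration; they are not the real Cloud Datastore or ndb/JDO/JPA APIs, just the shape of the idea: the store itself accepts any properties, while a model class layered on top fixes the shape.

```python
# Illustration only: a toy key-value "entity" store mimicking the idea of a
# schemaless datastore, plus a small model class layered on top of it.
# None of these names come from the real Cloud Datastore API.

class ToyDatastore:
    """Schemaless: each entity is just a key mapped to arbitrary properties."""
    def __init__(self):
        self._entities = {}

    def put(self, key, **properties):
        self._entities[key] = dict(properties)

    def get(self, key):
        return self._entities.get(key)

class CommentModel:
    """A model layered on top: enforces a fixed set of properties,
    the way abstractions like JDO/JPA or the Python models do."""
    fields = ("user", "text")

    def __init__(self, user, text):
        self.user = user
        self.text = text

    def save(self, store, key):
        store.put(key, **{f: getattr(self, f) for f in self.fields})

store = ToyDatastore()
# Schemaless: any shape of entity is accepted.
store.put("walk1", user="userbob", distance_km=4.2)
# Modeled: the shape is fixed by the class.
CommentModel("mandy", "hey, that was a long walk").save(store, "comment1")
print(store.get("comment1"))
```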
And as mentioned, it scales automatically.
You don't have to do anything.
There's nothing you need to do in terms of provisioning
infrastructure, just like with BigQuery.
It's completely managed for you-- sharding
and replication all taken care of for you.
And there were some shortcuts we might mention later
that might help you with that.
But generally, you don't have to worry about scale
at all with Cloud Datastore.
So back in the day, we invented this thing called BigTable.
We published a white paper on it.
And BigTable is this massively scalable
NoSQL data store that we use internally
for services such as Gmail.
And when we built App Engine to service our needs for something
that would serve web applications internally
within Google, we decided to build
App Engine around BigTable-- added a layer on top of it.
Since then, another layer has been
added in between those layers.
But we have BigTable, Megastore, and then
we have the Datastore on top of it.
So we're effectively accessing this BigTable
through Datastore, and that was tied to App Engine
for quite a while, for about five years.
But now it's available everywhere as an API.
So we've opened it up completely-- it no longer
requires App Engine.
You can access it as an API-- either a RESTful API,
or a protocol buffers API.
One of the great things about it is
that if you run a query on a data set within the Datastore,
it will take the same amount of time to come back
whether that data set is 100 megabytes or 100 gigabytes.
So the size of the data
doesn't matter when it comes to running queries.
Data, of course, is replicated across multiple data centers
in a region.
The stage is creaking.
And you can use it from any application and language.
It's a RESTful API.
We have client library support for most
of the major popular programming languages.
At the time of writing, and it's probably increased since then,
we were serving about 4 and 1/2 trillion requests per month
with the Cloud Datastore.
What about other situations that might arise?
What about alternatives?
So if you have an application-- and this explains our decision-
making process; we were never going to build this with SQL--
but if you have an application of your own
that already uses SQL, and you want to pull it into the cloud,
and run it in the cloud, and make use of managed Cloud SQL
resources, then you could do this with Cloud SQL.
Cloud SQL is effectively MySQL in the Cloud.
It offers you managed instances of MySQL-- up to 16 GB of RAM
and 100 GB of storage-- fully managed for you.
All you have to do is create a database
and add and manage your data in the form of tables.
Again, data is replicated in many geographic locations
within a region, and all the failover
handled automatically for you.
And we have a scheduling mechanism for backups
as well, so we can do backups for you
on your behalf-- schedule those backups via user interface.
It's also very easy to get access
to, and easy to migrate data to and from Cloud SQL.
It supports things like mysqldump.
It supports the MySQL wire protocol,
which means you can use the MySQL client.
And it supports JDBC.
So it's just like MySQL, except it's in the cloud,
and [INAUDIBLE] is managed for you.
It has a flexible charging model.
So you can either choose pay per use--
pay as you go, effectively, with MySQL--
or you can choose a package option.
It's available in EU, US, and Asia data centers.
And if you're using App Engine and you
want to use Cloud SQL, you will want to co-locate your MySQL
instances with the application-- either in the EU or the US.
We don't have App Engine support within Asia currently,
although you can access apps served from App Engine in Asia.
Not in China, though.
So what about files?
What about large blob data?
Images, videos, backups, those kind of things?
So for that, we have Cloud Storage.
And Cloud Storage is bucket-based storage, an object
store.
It allows you to store pretty much
any amount of data-- up to 5 terabytes per object.
It supports full object versioning.
So you can turn versioning on and keep multiple versions
of your objects, keep a history of them.
It supports notifications.
So you effectively register notifications on a bucket,
and you basically say, whenever something
changes within this bucket-- data's
added or updated-- I want you to fire off
a request, a webhook, a URL that we can then do something
with.
It might be an App Engine application
that handles this URL for us.
In fact, if anybody saw my Google I/O talk,
that's exactly what we did for my demonstration.
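A handler for such a notification could look like the sketch below. The payload fields (`bucket`, `name`) are illustrative assumptions, not the exact notification schema, and `handle_notification` is an invented name; the point is just that the webhook receives a small JSON body describing what changed.

```python
import json

# Sketch of a webhook handler for bucket change notifications.
# The payload shape here is illustrative, not the exact notification schema.

def handle_notification(raw_body):
    event = json.loads(raw_body)
    bucket = event["bucket"]
    obj = event["name"]
    # In a real App Engine handler we might kick off processing here,
    # e.g. generate a thumbnail for a newly uploaded image.
    return "object %s in bucket %s changed" % (obj, bucket)

# Simulate a notification firing for a freshly uploaded GPS track.
sample = json.dumps({"bucket": "walkshare-uploads", "name": "walks/walk1.gpx"})
print(handle_notification(sample))
```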
Also, if you potentially have 5 terabyte objects,
you really do need resumable uploads and downloads,
which it supports.
It has a three 9s SLA through things like high availability,
georedundancy-- the full replication system
that we've talked about already with Datastore and Cloud SQL.
It also features strong read after write consistency
for objects.
So basically, if you add an object
or change an object, and that completes,
anybody who then subsequently reads
that object will see your changes, guaranteed--
if they read that object specifically,
it's always guaranteed.
And that's pretty powerful for this kind of object store.
It's not the same for listings.
If you do a list on the bucket, that
will be eventually consistent.
So you may see different results if you were just listing.
But it's strongly consistent for object updates.
And as with all of our services, data is encrypted at rest,
and we give you all of the tools you
need to provide access to your data.
You can make data in cloud storage publicly available.
You can even deploy entire websites
on cloud storage using static content.
And I need to drink some water.
I'll be drinking water quite constantly.
Oh, gone past that.
So that solved our first problem.
We've chosen the Datastore to store data from mobile.
And how do we share walks?
So for this, we need this modular frontend and API--
there's a word missing from this slide;
unfortunately, we forgot to update it.
It should read: a modular, autoscaling frontend and API.
And we also want it to be easy to develop and maintain.
So we chose App Engine.
How many of you have heard of App Engine?
Yeah, quite a few of you.
So App Engine, we love App Engine at Google.
Why do we love App Engine?
Let's talk about App Engine.
So App Engine effectively allows you, as a developer,
to focus on developing your code,
developing your application, and not have to worry about all
of this stuff-- building software
stacks, provisioning machines, installing databases,
installing software stacks like LAMP stacks,
that kind of thing.
It supports managed software stacks.
And these are fully managed software stacks for Python,
for Java, for PHP, and Go, with more on the way.
With App Engine, one thing we always talk about is autoscale.
App Engine's autoscale is legendary.
I'd like to call it best of breed,
but probably the marketing people
would jump on me if I said that.
But Autoscale is what App Engine is really famous for.
It can scale out very rapidly, and scale down very quickly
to make sure you always have the resources you need when you
need them, but you only pay for what you use.
It's also very easy to develop, and that
was one of the things on our shopping list.
We want it to be easy to develop.
It's free to get started.
You don't have to enter any credit card information
at the moment to use App Engine.
And you get some quota-- a reasonable amount
of quota per day, and it resets every day--
that you can use to run your applications.
So if you have a small application you want to run,
you can run it on App Engine free of charge.
You also have the ability to build and test locally.
And we'll look at that in more detail.
And again, it comes back to the whole point,
you see: you can focus on developing your application
code, and not worry about building software stacks.
Trivial to manage-- that was another item
on our shopping list.
It's fully managed: all of these instances that run your code,
all of these software stacks.
We apply patches, we apply all of the updates.
And they're maintained 24/7 by the guys
that Julia mentioned earlier, these SREs, Site Reliability
Engineers.
So we don't need to worry about that.
And it can handle these kinds of changes in demand
through autoscale.
So the different traffic patterns
that we see on a regular basis-- things like spikes in traffic,
which is the bottom one here, plus
linear scale, exponential scale,
logarithmic scale--
it can handle all those kinds of changes.
And most importantly, even if you have an application
that has periods of inactivity--
these could be on a daily basis, maybe
at night in the location where you're
running the application, on a weekly basis, a monthly basis,
on a yearly basis, maybe in summer holidays--
then you won't be consuming any resources.
So you won't be paying for anything,
which is really important, and a huge difference from when
it comes to buying on-premise servers
and putting them next to your desk in your garage,
or buying space in a co-lo, or something like that.
So App Engine is extremely flexible
when it comes to handling changes in demand.
Local development environment is also really important.
This is the ability to really build and test your application
before you ultimately deploy it to production.
And we use the Cloud SDK for this.
For example, there's gcloud app run.
You run this on your laptop or on your desktop,
and it will run the application locally.
It effectively runs App Engine locally,
including the Cloud Datastore, which we looked at earlier.
So it has its own data store.
When you shut the environment down and start it again,
it has the same data stored in the Datastore.
So you can test, iteratively, your entire application.
And once you're done, you can deploy it
to production using the same command.
In this case, it would just be gcloud app deploy,
with a dot, which is the current directory.
And there's some example output as well from the console
showing the dev app server, which
is what we call it internally, now running.
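For context, the local/deploy workflow described above is driven by a small configuration file in the application directory. The sketch below is a minimal, illustrative `app.yaml`; the module name, runtime, and handler paths are assumptions for this example, not WalkShare's actual configuration.

```yaml
# Minimal, illustrative app.yaml for one App Engine module.
# Names and handlers are assumptions, not WalkShare's real config.
module: comments
version: v1
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /comment/.*
  script: comments.app
```

With a file like this in the current directory, the commands from the talk apply as described: `gcloud app run .` starts the dev app server locally, and `gcloud app deploy .` pushes the same code to production.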
App Engine architecture, I'm not going
to spend too much time on this.
But it's important to realize we have a pending queue.
And we have instances that are spun up
to handle requests coming into that pending queue.
And we have a scheduler that will monitor the pending
requests and spin up new instances on demand,
or remove instances when there is no demand.
And we have other services like Task Queue, which we'll
look at shortly, and the Datastore, which I mentioned,
and memcache, which we'll also look at shortly.
We have another interactive version
of this diagram coming up soon as well.
So we did talk about this earlier,
touching on the idea of modular development for applications.
And this is also really, really important as well.
This allows us to effectively take
a large application, a large monolithic application,
and allow us to factor it out into small logical components
called modules.
And modules are a top-level concept of App Engine.
They can be in any language.
WalkShare, for example, was developed
in Python, Go, and Java.
We had three modules, the three modules you saw earlier,
and they were written in three different languages.
So if you have guys who are specialists in Go developing
backend code, they can be developing the Go application
as a module.
And other people can be doing the front end stuff
in Java or Python.
And all of that stuff can be merged into one application.
Modules have versions.
So we can deploy versions of a module--
different versions: version 1, version 3, the test version.
And we can switch between those versions, as we'll see shortly.
Modules also can be backed by Compute Engine
virtual machines.
And I think Julia spent a lot of time on Compute Engine
earlier, so you know what they are.
We'll talk about them more shortly.
But basically, the module will be an App Engine module.
But instead of being backed by these traditional App Engine
instances, they will be backed by Compute Engine
virtual machines.
Modules can share state across the entire application.
So each module has access to state
stored in Memcache and Datastore.
They have their own performance settings, their own versions.
They can be deployed and updated independently of each other.
So we need an API.
And that's the whole point of this one.
In order to share walks, we need an API.
Without the API, the application would be stuck on the phone.
We couldn't actually share that data
with the other front end, the web user interface.
So we need to build an API.
So in this case, we chose to build an API from scratch.
And it looks something like this.
So this is a standard kind of restful example of an API.
This is one call, obviously.
This is for comments-- effectively,
for this user, called userbob, we want to see walk1.
And basically, our API will take in JSON, and return JSON.
Everybody know what JSON is?
I guess you probably do.
So here's an example of the output from that request--
we call GET on that URL, which is a RESTful URL,
and it returns a JSON object representing that data.
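A consumer of that call just parses the returned JSON. The sketch below shows what that could look like; the field names (`user`, `walk`, `comments`, `author`, `text`) are guesses based on the demo, not WalkShare's actual response schema.

```python
import json

# What a response from GET /comment/userbob/walk1 might look like;
# the field names are illustrative, not the real WalkShare schema.
response_body = """
{
  "user": "userbob",
  "walk": "walk1",
  "comments": [
    {"author": "mandy", "text": "hey, that was a long walk"}
  ]
}
"""

data = json.loads(response_body)
for comment in data["comments"]:
    print("%s: %s" % (comment["author"], comment["text"]))
```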
And I think we can show a demo now.
Can we switch back to the laptop?
OK.
Just want to refresh this.
I'm in Chrome.
I'm in the Chrome developer's console,
and what I'm interested in is looking for calls
to the comment API.
I was going to refresh that and let it load the whole thing.
But because we're using the filter,
we can see-- our call to comment/userbob/walk1.
That's what it looked like.
And response was JSON back.
So again, we could be actually updating a comment
or adding a walk.
And in that case, it would be a POST request with a POST body.
We could be updating a walk using
PUT requests, those kinds of things.
So it's a very simple API.
We can also do tricky things like this as well.
I'll get rid of this.
Oh, help.
What happened then?
Yeah, I don't want to report inappropriate street view.
This is the Googleplex, nothing inappropriate about it.
So we have an interface that allows
us to show you the output from the RESTful calls.
In this case, we made a call to comment/userbob/walk1,
which we saw an example of.
And we can also look at the actual route as well,
and the GPS coordinates for all steps in that walk.
So these are the GPS traces that were sent back.
So this is just a useful way of querying the API for us.
OK.
So you can go back to the slides?
So building your own API.
And I'll talk about why we built our own API shortly.
But you don't have to build your own API.
I think Jerome already mentioned it, Cloud Endpoints.
We have this thing called Google Cloud Endpoints.
It allows you to simplify the whole process of building an API
and exposing it to clients-- clients such as mobile devices,
gaming applications, web applications.
And basically, the whole idea is-- well,
there are two approaches to doing this.
There are actually three, but the two we're going
to talk about are these.
You can take your existing client interface application
code, which may be part of a web application
that you have already,
and you can decorate it-- add decorators to it
in the form of annotations.
And we have an example coming up.
And that will effectively instruct App Engine
and the underlying Cloud Endpoints runtime
to expose this method-- and again, I'll look at the example
shortly-- as an API call.
It's very, very simple.
Alternatively, you can take a model class,
like a comment or a walk.
You've already defined the model,
and it doesn't have an API for it.
But using tooling-- it's available in things
like Android Studio, or Eclipse--
you can basically generate an endpoint class from that model.
So all you have to do is right-click on it,
say generate endpoint class, and it
will generate the class for you.
And this will allow you to perform
all of the standard RESTful-type calls on that resource.
So I think of it as a resource--
[INAUDIBLE] a comment or a walk.
You can do lists, you can do gets.
You can do puts.
All of the kind of things you would do with a restful API.
And this is all generated automatically for you.
You don't have to write any code at all.
The APIs that App Engine creates are effectively
implemented on top of the exact same infrastructure
that we use for our own APIs--
so for Maps, and for YouTube.
Completely discoverable.
And obviously, you get all of the kind of scale
that we have for our own APIs.
We also provide tools that are required
for generating effectively mobile-optimized client
libraries.
So effectively, we have an API here.
We want to be able to consume that
from an application running on a device, or from a web app.
What we need is a client library,
something that understands how to invoke that API.
And we can generate that straight from the application
code, straight from the endpoints that we generated.
And we have tools for Android, iOS, and web,
and these are available in things
like Android Studio and Eclipse, also
for Objective-C on the command line, and also for JavaScript
as well.
Because this is going through Google and the Google front
end, we have full out-of-the-box denial of service protection.
It supports OAuth2 as well for authentication,
and also supports client key management.
As an example, I kind of hacked this together earlier,
because my example wasn't WalkShare.
So this is basically a WalkShare method,
say, getComments, that would return a list of comments.
And what we've done is we've annotated it.
We've annotated the class initially with @Api.
Name equals comment is effectively saying,
we want this API to be called comment.
We want the version to be v1.
And then we've annotated the method--
a bit of an indentation problem there.
And we've said, we want this to be a GET request.
That could be inferred by the system from the fact
that this is a get method,
so we don't have to actually put this there.
But we did anyway just to show it and highlight it.
And also, the path for this method--
what it will be when we make a REST call to it.
Then we can also annotate the parameters of the method call,
in this case, walk ID and user ID.
And then the body of the code is effectively just some
kind of query on a comment system,
and it returns back matching results.
And in the comments there, at the bottom of the first box,
is what the REST call looks like:
/comment/v1/{userid}/{walkid}.
And then, after we've generated a client library for this,
we can then consume that from Android.
Basically, we have to get a version of the endpoint.
And here, we say service.commentEndpoint--
this is available from the client library--
call getComments, passing the userid and the walkid, which we've
gathered from the user interface, and call execute.
And that will get the results back for us.
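The slide's example is Java, but the annotation idea itself can be sketched in a few lines of plain Python: a decorator records which functions should be exposed, and under which HTTP method and path. This toy registry is invented for illustration; it is not the real Cloud Endpoints library, just the mechanism it relies on.

```python
# Toy sketch of annotation-driven API exposure, in the spirit of Cloud
# Endpoints' @Api/@ApiMethod annotations. API_ROUTES and api_method are
# invented names for illustration, not the real endpoints library.

API_ROUTES = {}

def api_method(http_method, path):
    """Register a plain function as an API route, like @ApiMethod does."""
    def decorator(func):
        API_ROUTES[(http_method, path)] = func
        return func
    return decorator

@api_method("GET", "/comment/v1/{userid}/{walkid}")
def get_comments(userid, walkid):
    # Stand-in for the real query against the comment store.
    return [{"author": "mandy", "text": "nice walk, %s!" % userid,
             "walk": walkid}]

# A dispatcher would look the handler up by method and path at request time.
handler = API_ROUTES[("GET", "/comment/v1/{userid}/{walkid}")]
print(handler("userbob", "walk1"))
```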
So why didn't we just start with endpoints?
Well, we talked about managed virtual machines
a couple of times already in the keynote,
and also in Julia's talk.
And when we started developing this application,
we wanted to use managed VMs.
But they didn't have all of the plumbing,
because it was still early in their life cycle.
And they're not fully released yet;
they're still in limited preview.
They didn't have all of the plumbing that
was required to actually support Cloud Endpoints,
so we couldn't use them.
So we had to actually build the API from scratch, by hand.
Sorry, it's been a long day.
What about other situations?
So we didn't use Memcache for this application,
but we could do.
Memcache is extremely important for most people.
It saves having to do those expensive reads
from the backing store, be it the Cloud Datastore,
or be it Cloud SQL.
You don't want to go back and get the same data
all of the time once you already have it.
So you store it in Memcache.
And here we have an example of three applications,
all accessing a shared Memcache.
This is a Memcache that's shared across all applications running
in the region.
So we have a shared Memcache for applications in the US,
a shared Memcache for applications running in the EU,
in Europe.
And it's a large shared cache, and we make it bigger
the more applications we have.
But there's still the possibility that your data may
be evicted by somebody else's data.
So there's no predictability or deterministic nature about it.
We also have an option called dedicated Memcache, which
allows you to reserve a specific size of Memcache
for your application specifically.
So it's reserved per gigabyte,
and you pay per gigabyte.
So this effectively allows you
to control the way data is evicted from Memcache.
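The caching pattern being described (check the cache first, fall back to the expensive backing read only on a miss) can be sketched as below. A plain dict stands in for Memcache here, and `fetch_walk_from_datastore` is an invented stand-in for the real Datastore read; this is the pattern, not the real Memcache API.

```python
# The standard Memcache pattern: try the cache first, fall back to the
# backing store only on a miss. A dict stands in for Memcache, and
# fetch_walk_from_datastore is an invented stand-in for the real read.

cache = {}
datastore_reads = []  # track how often we hit the "expensive" store

def fetch_walk_from_datastore(walk_id):
    datastore_reads.append(walk_id)
    return {"id": walk_id, "title": "Googleplex loop"}

def get_walk(walk_id):
    key = "walk:%s" % walk_id
    if key in cache:
        return cache[key]                       # cheap cache hit
    walk = fetch_walk_from_datastore(walk_id)   # expensive read on a miss
    cache[key] = walk
    return walk

get_walk("walk1")
get_walk("walk1")
print(len(datastore_reads))  # the second call was served from the cache
```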
Task Queues-- how many people have heard of App Engine Task
Queues?
Because most people love them.
So somebody here?
Yep?
So these are really popular.
People love Task Queues.
Whenever anybody gets involved in App Engine development,
they love using Task Queues.
What they effectively do is allow
us to take incoming requests from users.
And then, because we have to respond
to the user in a reasonably quick time,
we don't wait 60 seconds--
there's a deadline on responses
of 60 seconds in App Engine,
and we don't want to wait that length of time--
but we may need to do stuff in the background.
So what we can do is, while we're processing the request,
we'll generate a task that says, do this stuff,
put it onto a Task Queue,
and then we can go back to the user and say, we're done.
So this could be generating email, processing a tax return
form, or something like that.
Something complicated.
And the Task Queue, effectively, has tasks which are webhooks.
And they're handled by other App Engine instances,
instances specifically written to handle those requests.
And these can run for longer than 60 seconds.
They can run continuously, or they
can run for a longer period of time,
not limited to 60 seconds.
And what they can do is process all of these jobs outside
of the user request, in the background.
They can access all of the services that other App Engine
instances can-- things like Cloud Datastore,
Google Cloud Storage, email--
we have a mail API-- and other external APIs, APIs external
to Google.
So this is a really effective way
of actually managing background processing on App Engine.
But it also allows us to push surplus work out
to things like Compute Engine, or to anything else.
We can have Compute Engine reading tasks from these queues--
a slightly different type of queue--
and processing that data offline, outside the user request,
but also outside of App Engine.
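The hand-off described above can be sketched with standard library pieces: the request handler enqueues a task and returns immediately, while a background worker drains the queue. `queue.Queue` and the thread here stand in for the Task Queue service and its worker instances; the function names are invented for illustration.

```python
import queue
import threading

# Sketch of the Task Queue pattern: the request handler enqueues work and
# returns right away; a background worker does the slow part. queue.Queue
# stands in for the real Task Queue service.

tasks = queue.Queue()
results = []

def handle_request(user, walk_id):
    # Enqueue a webhook-style task instead of doing slow work inline.
    tasks.put({"user": user, "walk": walk_id})
    return "accepted"  # respond to the user well under the 60s deadline

def worker():
    while True:
        task = tasks.get()
        if task is None:  # sentinel: shut the worker down
            break
        # Slow background work would happen here (email, processing, etc.).
        results.append("processed walk %s for %s" % (task["walk"], task["user"]))
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
print(handle_request("userbob", "walk1"))
tasks.put(None)
t.join()
print(results)
```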
You could also build the whole thing yourself
on Compute Engine.
But why would you want to do that?
We'll talk about Compute Engine more shortly.
So that's covered the next two bullet points.
Let's go on to the next one, which is comment spam.
So I haven't built a web form recently,
but I know what the horror stories are.
As soon as you create a form on the web
and you put it onto your website,
there are spam bots out there that will say, ha!
A form!
And they will go at it, and they will fill it
in to their heart's content.
And you will have spam, lots and lots of spam, in your comments.
So generally, people protect their comments,
their commenting system, either by requiring users to log in
or by using CAPTCHAs.
So we need a robust commenting system.
So we use CAPTCHAs.
Who likes CAPTCHAs?
You like CAPTCHAs?
Right, nobody likes CAPTCHAs.
But they are effective, particularly when
your user has not logged in.
But CAPTCHAs actually require you to do some image processing.
And as Julia has already mentioned,
image processing on App Engine is a little bit difficult.
We don't have the libraries in place [INAUDIBLE]
to make it work.
And for the Java component of our application, a Java module,
we didn't have the java.awt library-- java.awt!
My colleague added this slide, and I hadn't even
really given thought to java.awt for a long, long time.
[INAUDIBLE] back in my past.
But you can use java.awt to process images, image data.
So App Engine doesn't have this library.
It's been removed.
It's a big overhead, a big chunk of stuff
So what are we going to do?
We'll use managed VMs.
So we talked about managed VMs already quite extensively.
Laptop's gone off.
And I'm not going to go into any more detail.
But basically, we can use managed
VMs to effectively break the glass, as Julia said.
We can break the glass and say, this won't run on App Engine;
we can run this on Compute Engine.
That's fine-- modular application design, no problem at all.
We just have two managed virtual machines
running that aspect of our application.
And the CAPTCHA system looks something like that.
Let me show you what it looks like in real life.
Can you go to the laptop, please?
OK.
So going back, I actually have a better walk we can work with.
I kind of recognize that place.
I love Times Square, I really do.
I don't know why I love Times Square, but I do.
And we're going to tragically run out of time, aren't we?
So in a moment-- with comments, we
don't have any CAPTCHAs, right?
We've already looked at this.
So we click on this, and I can just type in--
so anybody can come along and make comments.
They can put URLs in there to send me to places
where I really don't want to go.
So I want to stop this.
So how do I do it?
So fortunately, and also covering this issue here
with the two smiley faces, the tech-op guys
have just called me on my secret earpiece here
to tell me that they've just deployed
version 2 of the application.
And we can actually just switch over to that version,
and it will fix our problems-- add CAPTCHAs,
and also fix that problem there.
So what I need to do is find my WalkShare staging application,
go to Versions.
And after a little while, nobody demos on Wi-Fi.
Really-- seriously, never do demos on Wi-Fi.
So at the moment, we can see for our default module--
and again, we have a notion of a default module.
We could've given this a name, but we're
using this as the default.
This is the one that handles our comment stuff.
And at the moment, the version for that is version 1.
I can also look at the comments one,
and that is also going to be version
1-- think that's about it.
It's the Wi-Fi.
[INAUDIBLE] where are you?
If you see somewhere.
So we could see version 1 there as well.
And the tech-ops guys have said version 2 has actually
got support for CAPTCHAs, and it fixes our problem.
So I'm going to make version 2 of the comments
module the default.
While I was talking about it, that's done.
I'm also going to do that for the default one
to fix the other problem.
Change that.
Make default.
And the user interface has a problem here,
because I even saw my colleague, Brian, do this as well.
Your first instinct is to go to the Delete button,
and I don't know why.
But there's something about the user interface and the way
it's designed that attracts you immediately to delete,
and so we need to fix that.
So now, we can go back to our application.
We want to refresh that, because we
want to see the new version of the application.
So let's go onto comments now.
We'll add a comment.
Ah, OK.
So my name is Mandy, and this is a test comment.
Aw, can't you think of anything better
to say than test comment?
And Android.
Then I can post my comment.
And if I don't answer it correctly,
I do ***, more foo.
It won't let me post.
OK, so now we've protected it.
We can also check as well to see if we've fixed our issue.
And post a comment.
Oh, I didn't do-- very interactive.
I really can't type.
Obviously, the implementation would have a different CAPTCHA
every time, [INAUDIBLE] might be quite easy to learn.
So now you see it.
We've fixed it.
We now have CAPTCHAs, and we've also
fixed the bug with the smiley face.
And in the actual WalkShare demos
that we did during Google I/O, we actually
showed them going live into the code, and fixing that bug,
and updating it while in production, which is great.
That was really clever.
That's not really available yet, but it will be soon.
So can you go back to the slides, please?
All right.
So we're getting there.
We have a robust commenting system.
Now what we need is a database for our comments leaderboard,
a database, effectively, for temporary and summary data.
Now, we've already talked about this problem that we have.
We have data that's like [INAUDIBLE].
It's stored in tables, it's stored as rows in databases.
When we want to do things across an entire data set--
counts, specifically-- it becomes a bigger problem,
a much more difficult problem.
And the Datastore's not ideally suited
for doing those operations.
It is possible to do them.
We do have things like sharded counters and suchlike.
But we wanted to do something more interesting
that really stood out.
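For reference, the sharded-counter pattern just mentioned spreads writes across several counter entities so no single Datastore entity becomes a write hotspot. A toy in-memory model of the idea (illustrative only; the real pattern updates each shard entity in a Datastore transaction):

```python
import random

class ShardedCounter:
    """Toy model of the App Engine sharded-counter pattern."""

    def __init__(self, num_shards=20):
        # Each slot stands in for one Datastore shard entity.
        self.shards = [0] * num_shards

    def increment(self):
        # Writes land on a random shard, spreading write contention.
        self.shards[random.randrange(len(self.shards))] += 1

    def count(self):
        # Reads have to sum across every shard.
        return sum(self.shards)
```

More shards means more write throughput, at the cost of a more expensive read-- which is part of why a leaderboard wants a different tool.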
How many of you have heard of Redis?
OK, a few of you.
Good.
And we're going to run Redis on Compute Engine.
So we can't run Redis on App Engine.
So we can easily run it on Compute Engine
and just have the two talk to each other.
So effectively, our leaderboard will show us
which users are getting the most comments on their walks.
[INAUDIBLE] time, I'll show you what leaderboard looks like.
So why did we use Redis?
So using the right tool for the job.
Redis, for those of you who know it,
and for those of you who don't know it,
is effectively an in-memory key value store.
It's extremely powerful, extremely fast.
One of the great features that make it stand out
is the fact that keys can be big things.
They can be images, long strings.
But values can be objects.
They can be structured data.
They can be sorted sets.
And having a sorted set for a value for a leaderboard
is perfect, exactly what we need.
We can make updates to that object whenever we need to,
and we can store and extract it whenever
we want to back from Redis.
So Redis is perfectly suited for this particular job.
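The leaderboard boils down to a Redis sorted set: an increment when a walk gets a comment, a reverse range read for the top users. Here's a small in-memory stand-in for those two operations (names are illustrative; the real app would issue the Redis commands noted in the comments):

```python
class Leaderboard:
    """In-memory stand-in for one Redis sorted set keyed by user."""

    def __init__(self):
        self.scores = {}  # member -> score

    def record_comment(self, user):
        # Redis equivalent: ZINCRBY leaderboard 1 <user>
        self.scores[user] = self.scores.get(user, 0) + 1

    def top(self, n):
        # Redis equivalent: ZREVRANGE leaderboard 0 n-1 WITHSCORES
        ranked = sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]
```

With real Redis, both operations stay fast in memory, and the sorted set keeps itself ordered as scores change-- no full-table counting required.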
So why Compute Engine?
Let's quickly go through Compute Engine.
I think we've covered it quite a bit today.
So Compute Engine effectively gives us compute resources,
virtual machines, networking, persistent disks--
all of the other things you need to wire them together,
creating virtual private networks, that kind of thing.
It's supported in US, Europe, and Asia zones.
It rides on this amazingly fast software-defined
networking based backbone,
which is exactly the same network
that we use for our own stuff.
It features sub-hour billing.
Per-minute billing, I think Jerome covered that earlier.
There are no IOPS charges for block storage.
So with some providers, you actually
have to pay by the IOPS, as well as the size.
For us, we only charge you by the amount,
the size of your persistent disk, and the size of your block
storage.
And you don't need to provision virtual machines
for load balancing.
So again, in some cases, you have
to provision a virtual machine.
With HAProxy, all this can be automated for you.
But you have to spin up a virtual machine
to do load balancing.
And if you want to go big, you have
to have lots of virtual machines to do the load balancing
for you.
That requires a warm-up period, so it
could be quite expensive to warm those up.
For us, we don't have to do that.
We have effectively what's called
cloud native load balancing.
Load balancing is inherent in our infrastructure.
And so it's very easy for us to load balance
across a huge number of instances.
And I have an example of that coming up.
Consistently fast, very quick VM provisioning--
less than 20 seconds or around 20 seconds
for a single instance.
Probably less than two minutes for 1,000 instances.
Consistent performance-- now this
is extremely important to us with our services,
things like YouTube.
Really when you're watching a YouTube video,
you don't want the fact that somebody else is watching
a YouTube video to affect your experience.
So it's extremely important for us
to actually prevent one person's activity
from impacting another.
And we get that with cloud platform.
Also, cloud native load balancing--
want to mention that.
This is the architecture of a Compute Engine project.
I'm not going to go into this.
We support Debian and CentOS virtual machines,
but also, as paid options, we have SUSE, Red Hat
Enterprise Linux, and in limited preview we have Windows.
You can also run things like CoreOS, SELinux, anything
you want, but you may
have to build your own image.
Shared cores-- shared-core instances,
instances that share cores.
You'll have multiple of these instances sharing
a core-- very cheap, very, very cheap.
Perfect for experimenting or just a weekend of hacking.
Anything up to 16 cores for our biggest virtual machines.
Up to 60 gigabytes of RAM, I know, is what we offer.
We have three memory configurations-- high CPU,
standard, and high memory.
Standard has a certain amount of memory for CPU.
High memory has more.
I can't remember what the exact numbers are currently.
And high CPU has less.
So high CPU is perfectly suited for compute intensive
workloads, and high memory is perfectly
suited for memory-hungry workloads.
Persistent Disk-- again I'll have
to sacrifice some things to be expedient.
We only have about 15 minutes left.
But Persistent Disk, we have standard disk or SSD.
You can effectively attach this on demand,
if it's a data partition, to your virtual machine.
So you can attach it to one virtual machine,
disconnect it, and attach it to another in read-write mode.
Or you can even attach it to multiple virtual machines
in read-only mode.
Consistent I/O performance, and I'm
going to skip forward there.
You can see the bullet points.
If you have any questions about it, come and see me afterwards.
Local SSD is something that's new.
It has some commonalities with our standard persistent disk
offering-- per-gigabyte pricing, no I/O charges, performance
is consistent, encryption [INAUDIBLE].
Everything is encrypted at rest, and live migration,
which we'll talk about shortly.
Kind of jumping the gun on that.
The difference is in terms of reliability,
local SSD has nothing to offer.
The storage redundancy, checksums,
and snapshots that persistent disk offer are not available.
You have to do all that yourself.
But what you get instead is sub-millisecond latency.
So local SSD is like having disks attached to your machine.
And it's very, very, very fast.
And in terms of flexibility, up to 10 terabytes
of persistent disk, or up to four 375-gigabyte partitions
attached to any virtual machine for local SSD.
And again, if you want to know more,
come and see me afterwards.
Networking-- standard networking stuff, TCP, UDP, static
and ephemeral IP addresses for public interfaces.
Ephemeral IP addresses for internal virtual machines,
but we have this thing called automatic DNS, internal DNS,
which effectively allows you to address virtual machines
by their name, and other things like firewalls and such like.
I'm going to skip ahead, and I apologize for skipping ahead.
But I want to go ahead and get as much as possible done.
Live migration-- in the old days,
when I first joined the cloud team,
we had to explain to people why every now and again we would
take a zone down, a complete zone like us-central1-a.
And you would have to move all of your stuff.
We would have to take all your virtual machines,
all your persistent disks, and move it
from one zone to another while we took this zone down.
What we would do when the zone was down, we would update
the firmware, update the software, we would get a broom,
and we would go and sweep out the mouse crap from the racks.
All that stuff, and we would have a lovely shiny zone
at the end of it, and then you could put all your stuff
back again if you wanted to.
These days we have live migration,
so we don't do an entire zone.
We do part of a zone at a time.
So what we do, we schedule a maintenance event
on part of a zone.
Anything that's in that part of a zone--
virtual machines or persistent disks-- we move.
We move it to a different part of the zone.
It's protected from the [INAUDIBLE] then.
And we effectively do a memory copy.
We do copies of all of the data, and you don't lose anything.
Effectively, for most people, what you're going to see
is an entry in a log file saying, we moved your stuff.
For latency-sensitive applications,
maybe like media streaming, it may be possible
that this brief interruption, very, very brief interruption--
two milliseconds, sub-millisecond even--
may be important to you, and we do offer you the ability
to take your instances down and move them.
GCE and Docker have been covered.
But again, we can talk about that
more later if you want to during the drink stuff, which
I'm looking forward to.
Should be fun.
GCE is a great place to run Redis.
So we've already established we want to run Redis.
We've already established we want to run our Compute Engine.
So our leaderboard application is
going to make use of temporary storage.
And that kind of looks something like that.
When we have Compute Engine with automated Redis deployment,
ultimately you'll have an option in Compute Engine
in the Developers Console to deploy Redis clusters.
Just click to deploy.
At the moment, you have to do that configuration yourself,
but you can use scripts and other things as well.
Other things aren't completely available yet
or available in limited preview that I really
shouldn't talk about.
But we could talk about that later.
But it is quite easy to provision a Redis cluster
currently on Google Compute Engine.
It's just that those tools that we have
are not completely available yet.
They're in limited preview, and we're not really
supposed to talk about limited preview stuff.
So that's what a Redis cluster looks like.
Sorry.
I knew this was going to be hard.
And we have minions.
We should always have minions.
And naturally I always have my own minions as well.
So we've solved that problem with Redis.
We have a database for our temporary and summary data.
That's cool.
And we're done, really.
That's the architectural decisions.
These are the kinds of decisions we went through when
designing and building this application.
We didn't cover Android or anything like that,
and we didn't cover the libraries
we use for the user interface, which
were, if you noticed, based on material design, which we
announced at Google I/O, and on Polymer.
What about planning for getting big?
This is something I know quite a bit about.
So back in November, I worked on this project.
In fact, I led this project to build
an application for One Direction.
How many of you have heard of One Direction?
So I get this all the time.
You've all heard of One Direction.
You just won't admit it.
Right?
But they're a huge band.
They're a boy band.
They're huge.
They have 17 million Twitter followers when I last checked.
They are one of the biggest social media entities
on the planet.
And on the 23rd of November, they
did a seven-hour live stream on YouTube.
And they wanted, or Sony's Syco Music
wanted to have a second screen application that
would allow people watching live stream to interact
with the whole thing.
They were basically asking questions, six questions
every hour, 42 questions in total.
And they would get rewards, badges,
and such like for getting questions right.
And they could share it with their friends
and say, hey, look.
I got this badge.
You didn't get it.
Ha ha ha.
But we estimated that probably about 620,000 concurrent users
would access this application, which is quite a large number.
This is what it kind of looked like.
We saw-- again I don't want to go into the details--
again, if you look at my I/O talk
on making your cloud services Google fast,
we go into more detail on this.
But basically we saw this traffic pattern-- spikes
of up to 750 queries per second.
They're effectively requests per second.
Think of them as requests per second,
and that just carried on for the seven hours.
Apart from when this happened, when somebody actually
made an announcement on the live stream about the application,
suddenly we were a little bit nervous
about the application maybe not being able to cope,
so we didn't really talk about it too much.
But at some point in live stream,
probably about four hours in, they mentioned it,
and we saw this huge uptake in users
accessing the application.
We would have a peak later on.
You have to handle that kind of scale.
You never know when you're going to be TechCrunched.
You never know when you're going to make it on to slashdot.
And everybody's going to come and access your application.
If I was building WalkShare, I would
expect WalkShare to be extremely popular.
I would expect it to be the most downloaded application
on the Android Play Store.
You've got to be prepared for scale.
So let's talk about scaling modules.
We looked at modules earlier with App Engine.
AUDIENCE: [INAUDIBLE].
MANDY WAITE: Sorry?
AUDIENCE: Is that one of their songs?
MANDY WAITE: I'm sorry.
AUDIENCE: Be Prepared to Scale-- is that?
MANDY WAITE: It's hard to see who's talking.
Sorry.
But yeah.
So scaling for success or something like that.
So modules, we looked at modules earlier.
They have three options when it comes to scaling.
The classic auto-scaling option, which is basically
based on a very, very complicated
algorithm that we have.
And basic scaling or manual scaling, so basic scaling
basically says, this is the maximum number
of instances while just using a very, very crude way of scaling
within that boundary.
The manual scaling effectively allows
you to add instances yourself.
Sorry.
Requires that you add instances yourself.
So you're going to add an instance manually
to make it scale.
Well, that's not particularly responsive.
And App Engine Autoscale looks something like this.
So I like this.
We have lots of requests.
They go into a pending request queue.
We have an instance that's been created to service requests.
This one is busy-- it's probably got 10,000 requests in it,
not just three.
So we spin up a new instance very, very rapidly
to handle the requests, and we move the requests
from the pending queue onto the new instance.
Then we create a new instance to handle the other requests.
This goes on ad infinitum.
As long as we have requests in the pending queue, App Engine,
the scheduler itself will be making decisions
about what to do.
Should I leave it in the pending request queue
and wait for an instance to become available,
or is it quicker for me to spin up a new instance?
They have to make those decisions in real time very,
very quickly.
But we all know that making decisions programmatically
is extremely quick, much quicker than spinning up an instance.
So it can do that.
And it does it constantly.
That's basically saying, do I need a new instance,
or can I just stay in the pending queue?
You have a little bit of control about how this works.
You can say, my pending request latency
can be 900 milliseconds, say.
So don't do anything until it's been in the queue for that
long.
If it goes over 900 milliseconds, take some action.
Maybe spin up a new instance.
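In configuration terms, that knob is the pending latency setting under automatic scaling-- something like this in a module's app.yaml (the values here are just examples):

```yaml
# Example automatic_scaling settings for an App Engine module
automatic_scaling:
  min_pending_latency: 500ms   # leave requests queued at least this long
  max_pending_latency: 900ms   # past this, the scheduler may spin up an instance
```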
And we also have these things called idle instances as well,
so if an instance is not available to service a request,
those requests will be serviced-- sorry--
routed to the idle instance which will service them,
while in the background spinning up a new dynamic instance.
It's all very clever, and it all works extremely well.
For modules such as manual scale modules,
we have the option, as I think Julia showed you earlier,
to control scalability programmatically by saying
modules.set_num_instances.
We want 42 of them.
And that's very easy to do.
Replica Pools-- Julia mentioned Replica Pools before.
This is where we can create-- I love this term-- homogeneous
fleets of virtual machines.
All identical.
No snowflakes-- they're all absolutely identical.
Based on a template, and the template is on the left-hand side.
We basically define the template,
feed into this mechanism called Replica Pool,
and it will then go and create this thing
called a resource view, which contains a load balancer
and all of our virtual machines.
So if you want 10 virtual machines,
you will create 10 virtual machines,
and then we have a load balance endpoint
we can send all of our traffic to.
So we can create pools and size them.
At the moment, we can't auto-scale them,
but we'll talk about that shortly.
It also monitors the health of these instances,
so if an instance goes away, we can spin it back up again.
Load balancing-- load balancing scale.
We talked about load balancer earlier.
So we did some tests back in November again,
I think, probably around the time of 1D Day,
where we load balanced one million requests per second
across a pool of virtual machines,
and it really didn't take any kind of ramp up time.
I think it took four seconds to get ramped up and 120 seconds
to stabilize.
But very, very quickly, we were able to load balance
one million requests across a pool of virtual machines.
And it cost about $10 to do that test.
Now the requests that were being serviced
weren't particularly complicated,
but it shows you can scale massively.
This is a lot more than the 10,000 requests
we saw with the One Direction app.
When it comes to Redis, the Redis solution we had
was based on Replica Pools.
So we have a load balancer-- oh, I missed a couple of errors
there.
I missed one error.
Basically our leaderboard application
will talk to read-only minions of Redis.
All updates, all write updates, are sent to the master,
and they're pushed out to the minions which
service our requests from the leaderboard app.
So do I have time for this?
Nobody's telling me, so I'm going
to go ahead and do it anyway.
So when our application's working, let me refresh that.
What I'm going to do is go to Compute Engine.
So because we know that one of our modules
has a load balancer and a Replica Pool,
we can go and look at the load balancer configuration.
We have a forwarding rule, one IP address,
so all of the requests will go to this one IP address,
and they will be routed to a load balance pool
virtual machines, which are these ones.
OK.
So we have six virtual machines currently, and now six minions,
six Redis minions.
We should have a picture of minions
here in the cloud [INAUDIBLE].
So I'm going to have a look at this one particularly.
So that's been quite busy, been doing some work.
I'm going to *** it.
And we have browser-based SSH.
It allows us to effectively click a button and log
into the virtual machine.
It transfers the SSH keys and ultimately makes
the connection.
And again, it's dependent on the Wi-Fi access
and also onto the mood of the demo gods, which
we know are not very happy at the moment.
And I can do ps -ef on here.
Ooh, end up bigger.
Maybe make the window smaller.
Yay.
Oh, wait.
So here-- I'm probably going to need to scale out here a bit.
We can see these same process on our Redis
stuff, the stuff we care about.
I want to come back out, because I can't see it.
We have 12339, 13234.
And we're going to want to kill the child process, which
is going to be 13229.
So I'm-- interesting.
So I'll kill those processes off now.
Come on.
Like I said, the daemon processes don't want to die,
but it should be dead as far as Compute Engine is concerned.
Back to our load balance pool.
And then we see it.
It's gone.
But it's not actually gone.
It's been marked as unhealthy by the load balancer.
And we have a health check set up
that will actually detect the-- really?
Seriously?
Going to stop?
We have to get this timing right.
This is such a great talk.
I could carry on talking for another 20 minutes,
and you miss so much great stuff.
But we'll get this right next time.
And basically yes, we had this health check that effectively
is a little Go application that responds on port 8082
to /check, and if that doesn't respond on that port,
then it will effectively mark that instance as bad
and no longer send traffic to it.
So one of the other instances in the target pool
will continue to receive traffic as before,
but that one won't until we bring it back up again.
Once it comes back up again, it will be added back to the pool
and continue to service requests.
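That health-check responder can be tiny. The real one was a small Go application; this Python stand-in just shows the shape of it, answering on /check:

```python
import http.server
import threading

class CheckHandler(http.server.BaseHTTPRequestHandler):
    """Answers the load balancer's HTTP health check on /check."""

    def do_GET(self):
        if self.path == "/check":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            # Anything else is not a recognized health probe.
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the output quiet

def start_health_server(port=8082):
    """Run the responder in a background thread; returns the server object."""
    server = http.server.HTTPServer(("127.0.0.1", port), CheckHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Once the process stops answering on that port, the load balancer marks the instance unhealthy and stops routing traffic to it, exactly as in the demo.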
And basically, just a wrap-up, because Tom's telling me
I have to.
Go back to the slides please.
So basically that's it.
So we've gone through all of the design decisions.
These last ones, these last sections,
I'm just going to drop for expediency, effectively.
Because we just don't have enough time to deliver it all.
The last couple of things I wanted to mention--
and with regard to Tom, you'll have to come up and physically
throw me off the stage.
Coming soon, beyond those virtual machines,
we have Docker support for managed virtual machines.
Effectively managed virtual machines,
instead of being just managed virtual machines will also
run Docker.
And it'll look something like this.
You'll provide a Docker file.
You'll provide your configuration file,
and we will effectively register your Docker image in a Docker
registry, and create a container on a virtual machine for you,
and that will be what managed virtual machines is based on.
Replica Pools will auto-scale eventually.
We already have that in limited preview.
If you're interested in that, let me know.
We're looking for interest in use cases for auto-scale,
because the guys are really looking at how they can best
develop this kind of technology, because
auto-scaling virtual machines is interesting.
[INAUDIBLE] the containers, maybe we
should be focusing more on scaling containers
and not virtual machines.
And finally, saving mobile data.
We have this thing called Google Cloud Save for Android, which
will simplify the process of saving data
from your mobile app, from WalkShare--
just by sending it straight to an API.
You'll say, store this data for this user,
and that user anywhere can go back and retrieve that data.
And we're done.
So thank you.