JEROMY CARRIERE: Welcome.
Before I get started, I want to ask a quick question.
Developers?
OK.
Sort of management types?
Which I can say without any slight,
since I am such a thing.
Oh, good.
OK.
Anybody I didn't get with those two categories
that's willing to admit it?
No.
OK.
Fair enough.
So, as Tom said, I'm Jeromy Carriere.
I'm an engineering director here at Google in New York.
I'm responsible for all of Google's monitoring, logging,
alerting, sort of operational systems,
both facing Google's internal users
and facing our cloud customers.
So if you've seen Google Cloud Monitoring,
that's my team's project.
I was told to offer a little bit of human interest,
so my human interest is my daughter's joining us today.
She brought me to a One Direction concert last night,
so this is payback.
Yeah.
She's going to have to stay all day.
So, does anybody-- I was going through these slides.
I'm thinking, like, does anybody really
need to be sold on why cloud is a thing?
Does anybody not believe it?
OK.
Good.
Anyway.
I'm forced to go through the slides.
CEOs.
We've talked to lots of CEOs.
They see technology change as driving their business.
That's likely been true for all time.
But right now, the key thing that's
driving businesses globally is the cloud.
Again, I probably don't need to sell you too much.
There are a few key elements to that, things that are really
pushing cloud as a thing today.
Mobile, obviously, is a huge one, right?
People are using their own devices in the workplace.
Common pattern, 53%, I think, is the number
of people bringing their own mobile devices
into the workplace.
Any time.
People need to be able to access data wherever they are,
whenever.
Any time of day.
From anywhere in the world.
We need people.
We need collaboration.
That's a huge driver today.
The key thing that I like to focus on is this idea of speed.
Right?
So I sort of subscribe to the philosophy
that velocity is paramount.
Everything else you can fix if you can deliver quickly.
So spending time, as most enterprises
do-- 67% of engineering effort goes
into maintaining existing systems-- that
just saps velocity.
So how do we fix that?
What is it that we can do about it?
And that's where we come-- ah, sorry.
One more motivational slide.
Mobile.
People don't really even appreciate
that cloud services are part of their daily lives.
Whether it's iCloud or pick your favorite start up.
It's likely running on our cloud or on AWS.
Not a surprise.
So it's in everybody's lives all the time.
A vast majority of the workload that's
running on Google's Cloud Platform today
is actually being driven by mobile applications.
IT trends are the sort of fundamental building blocks
that are pushing this momentum today.
So the first is affordable capacity.
Today-- this is a stat I didn't even actually know
until I read these slides-- $600 can buy
enough storage for the world's music.
That's pretty neat.
$500-$600.
On demand.
You can get compute capacity when you need it,
with basically no ramp, right?
There's no time.
You don't have to wait to get capacity.
As a startup, as any enterprise, you
can get going with zero capital expenditure.
You can rent your way to massive scale.
And then, finally, global networks
in their current state-- Google having one of the best--
mean we can deliver information, anywhere in the world,
to any device, extremely quickly.
So these are the trends that are pushing all this together.
In case you're not sold, I found this also
very compelling.
In 1957, the average age of a company joining the S&P 500
was 75 years.
In 2013, it was 10 years.
So if you don't believe that the enterprise is changing,
I think this should be compelling.
Google.
Everyone knows Google's mission statement.
We're forced to memorize it.
Tattoo it.
Organize the world's information and make it universally
accessible and useful.
Of course, this started with Search.
Google Search was Sergey and Larry
in their, whatever, dorm at Stanford.
That was where Google began.
And that required a lot of infrastructure.
Over the years since Google began,
there has been an enormous amount of infrastructure built,
rebuilt, rebuilt again, globally to deliver on that mission
statement.
That means that we are running some of the largest distributed
systems in the world, with extremely
stringent requirements in terms of latency,
in terms of reliability.
Doing this required solving a lot of problems, right?
How do you not just store multiple copies
of the web, which is one thing-- you can put it on a hard drive
and stick it in the closet, right?--
but make it accessible globally with low latency?
Making it queryable in rich ways.
Applying the Knowledge Graph to overlay
the raw index of the web.
And then expanding that, of course,
into the rest of our product portfolio.
That's required us to solve some really interesting problems.
And just to look at one-- networking.
So Google has a fantastic global network
in terms of reliability, in terms of latency,
in terms of raw throughput, spanning the globe.
To get there, Google actually had
to reinvent the way telecom was done.
For the most part, we actually have our own dedicated fiber
spanning the globe.
That's kind of one aspect of the sort of physical plant
side of things.
But there's a lot of software here, right?
Google engineers are exceptionally smart.
Some examples are shown here.
And these are the ones that we've
talked about most publicly.
And you see, as you go to the right,
you're moving toward more public offerings like Compute Engine.
But back in 2002, the problem was, how do you
store multiple copies of the web and make them accessible?
So the Google File System.
Then, how do you actually process that much data?
That's MapReduce.
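The MapReduce model referred to here can be sketched in a few lines of plain Python. This is a toy word-count illustration of the map, shuffle, and reduce phases only, not Google's actual implementation:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the web the index", "the crawl"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["the"])  # → 3
```

In a real MapReduce system, the map and reduce phases run in parallel across thousands of machines, and the shuffle moves data between them over the network; the programming model, though, is exactly this simple.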
And then how do you actually make
it queryable in an online fashion?
Building a storage system is one thing,
but then actually making it queryable for online access
is another thing.
That's Bigtable.
But how do you make it expressive?
How do you actually allow users that
want to ask questions of that data--
how do we make them productive?
Well, that's Dremel, otherwise known externally
as BigQuery.
And then another interesting point
here is that Colossus is the replacement for Google File
System.
So even in this one slide, there are already
cycles of wax and wane of a given technology.
So as we learned about GFS, it taught us
what we needed to do in the next generation.
That's Colossus.
And then you see Spanner, which is
a strongly consistent global database-- which sort of flies
in the face of much of the accumulated wisdom of software
engineering these days, which counsels
us to relax consistency constraints
and favor availability over consistency.
And that's the Bigtable model.
But as we observed how Googlers build applications,
we discovered that actually finding a way
to build a global, consistent database
makes many problems easier to solve.
And that's embodied in Spanner.
And then, finally, taking everything
we've learned about building global compute infrastructures,
global physical compute clusters,
has led us to Compute Engine, Google Compute Engine.
So that's all well and good, right?
That's fun for us.
But what does it mean to you?
Well, the good news is that the Google Cloud Platform--
Compute Engine, which I mentioned on the previous slide,
is one example--
is actually built on the same infrastructure
that powers Google itself.
So underneath, as I said, BigQuery is Dremel.
Underneath another-- just quick example-- underneath Google
Cloud Storage is a product that we use internally
called Blobstore.
And I can sort of repeat this pattern over and over again.
So the portfolio.
The products that we offer as part of the Google Cloud
Platform.
Loosely, in three large buckets.
Compute, that's hosting applications.
App Engine and Compute Engine, which
we'll get into in a second.
And you'll hear more, of course, throughout the rest
of the afternoon on these things in depth.
The storage products.
Cloud Storage, Cloud SQL, Cloud Datastore.
So Datastore, with an affinity to App Engine.
Cloud SQL, hosted SQL offering.
Cloud Storage, the Blobstore I mentioned a second ago.
And then high-level application services.
Cloud Endpoints.
This is the facility we have for allowing you
developers to build APIs hosted by App Engine.
And BigQuery, again, is our SQL-like large data query
capability.
Sort of a columnar-inspired query system.
So that's the quickest possible pass through the platform.
Just to reflect on how Google Cloud Platform is evolving.
So I've only been at Google 18 months.
So 18 months ago, coming into Google, Cloud, in all honesty,
seemed like a bit of a novelty to me.
It didn't seem like it was really core to our business.
That, I can swear, has changed.
We, as an organization, as an engineering organization,
as a business, have applied an enormous amount of energy
to the cloud platform.
And you can see that reflected in just this small slice
of the products that we've delivered just
in the last year.
Since last August.
Everything from Encryption at Rest,
which seems like kind of a table-stakes thing,
all the way through the things Tom mentioned before:
Containers, and Google Cloud Monitoring, which
I'll talk about, I think, on the next slide.
And all through this, there's been a consistent theme
of reducing prices to better reflect the costs that we
actually incur at Google, which I'll get into in a second.
So developers are moving to Cloud,
because it's always going to be lower cost.
We can always, as a company like Google,
run it more inexpensively than you, an enterprise, can.
And that sounds like it's easy to say.
But it's actually true.
The economics are fundamentally different for an organization
like Google.
And I'll come back to that in a second.
Then there's a question of flexibility and adaptability.
We are striving to build a cloud platform that
eliminates lock-in.
Now, there's only so far you can go, right?
If I said everything
you build on the Google platform is 100% portable,
you'd be right to question that.
But we are striving to not just adhere to standards,
but drive standards to make that true.
And then finally, we want to allow you, the developer,
to focus on customers.
Energy you spend constructing, managing, monitoring,
and maintaining infrastructure is
time you're not spending building value
for your business.
Something that your customers are going to love.
And we want to obviate that.
Cloud economics.
So-- this is the point I made a second ago-- even if you're
thinking cloud-wise and building a private cloud,
you can only imagine driving that cost curve down
towards the public cloud curve if you have
an extremely diverse workload
and you're a very large enterprise.
But we, as a cloud provider, have the opportunity
to optimize in terms of economies of scale.
We have the opportunity to hire in a way that is really just
not possible unless you're running a cloud platform.
And we have an opportunity to multiplex workloads.
And this is a really interesting and subtle and rich
topic of study.
But we have an opportunity to multiplex workloads
across this global infrastructure in a way that
lets us pass on cost savings to our customers.
As an example, here are a few of many potential computing
patterns, right?
There's the on-off sort of batch every night.
Once a week, run this job.
You have the pattern of growth, right?
Which is of course what we all want in our businesses.
Up and to the right.
That means you need more capacity.
And then there's a third workload type--
this idea of burst.
And every workload, generally, that's continuously on
is going to have some pattern, right?
Absent any other capability, you have to provision for the peak.
Or, more cautiously, you provision
for some multiple of the peak, right,
to avoid problems when you have an unexpected spike in demand.
So all of these things, all of these computing patterns,
if you have to manage them yourselves,
you have to take on either dramatic overprovisioning,
or provisioning for a certain amount of growth,
or overprovisioning for these batch periodic workloads.
With Google Cloud Platform, you can
pass on all of that complexity to Google.
You no longer need to plan for every cycle of capacity
you need to run your business.
I'm not saying you don't have to do some work,
but we are striving to take away as much of that work
as possible.
Concretely, since 2006, we've seen 6% to 8% price decreases
in public cloud offerings annually.
But we don't think that's fair, because the underlying costs
that actually drive the business have been dropping at 20%
to 30%.
So that remaining red area there is just
profit for the cloud provider.
Our intent is to actually drive that red line
much closer to the blue line, and pass on those savings
to our customers.
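To see how fast that gap between price and cost compounds, here's a back-of-the-envelope sketch. The 7% and 25% rates are hypothetical midpoints of the ranges quoted above, and the eight-year window is illustrative:

```python
# Hypothetical midpoints of the ranges quoted in the talk:
price_drop = 0.07   # ~6-8% annual public cloud price decrease
cost_drop = 0.25    # ~20-30% annual underlying cost decrease

price = 1.0
cost = 1.0
for year in range(8):       # e.g. roughly 2006 through 2014
    price *= (1 - price_drop)
    cost *= (1 - cost_drop)

print(round(price, 2))      # → 0.56: prices roughly halve
print(round(cost, 2))       # → 0.1: costs fall roughly 10x
```

Under these assumed rates, after eight years prices have only halved while underlying costs have fallen an order of magnitude, which is exactly the widening "red area" of provider profit the slide describes.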
And you've seen that recently as we announced new pricing.
And one concrete example of that is our switch
in Compute Engine pricing for On Demand.
Historically, you had to choose.
You had to choose between On Demand and reserved compute
instances.
And generally, to do that, if you're
running a complex enterprise, you have to hire
a capacity engineer-- or more than one person--
to actually plan for that capacity and optimize.
How much do I run on demand, which is more expensive,
versus how much do I run reserved,
which is less expensive, but commits me
for some period of time?
So what we've done with our new On Demand pricing
is as its usage increases towards 100%, prices decrease.
So without you having to do any specific engineering,
you just buy On Demand.
Right?
You buy On Demand instances, and if you use them
like reserved instances, they become
priced like reserved instances.
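The idea can be sketched with a hypothetical tiered discount schedule. The tier boundaries and multipliers below are invented for illustration, not Google's actual price list:

```python
def effective_rate(base_hourly_rate, fraction_of_month_used):
    """Hypothetical sustained-use pricing: the more of the month
    an instance runs, the cheaper each incremental hour gets.
    Tiers are (usage threshold, multiplier on the base rate)."""
    tiers = [(0.25, 1.0), (0.50, 0.8), (0.75, 0.6), (1.0, 0.4)]
    total = 0.0
    prev = 0.0
    for threshold, multiplier in tiers:
        if fraction_of_month_used <= prev:
            break
        used_in_tier = min(fraction_of_month_used, threshold) - prev
        total += used_in_tier * multiplier * base_hourly_rate
        prev = threshold
    # Blended per-hour price across everything actually used:
    return total / fraction_of_month_used

# Running flat-out all month is cheaper per hour than running 25%:
print(effective_rate(0.10, 1.0) < effective_rate(0.10, 0.25))  # → True
```

The point of the design is in the last line: no capacity engineering is needed. An instance that happens to run all month automatically ends up priced like a reserved one, without anyone committing to anything up front.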
And that is just sort of one data
point to hopefully give you some feeling for how Google
is thinking about pricing for the cloud platform.
But cloud is still too hard.
We want to price it so you love it.
We want to make it easy to get at.
But actually building applications for the cloud
is still particularly difficult in many ways.
Specifically developers need to make trade-offs to get around
the way cloud platforms in general--
not just ours-- are deficient in a variety of ways.
And here are three kind of classical trade-offs
that we see today.
A trade-off between time to market or scalability.
Do I build my application, as a startup, as a new project,
on day one to scale, when I don't
know if it's going to scale?
Or do I build it so I get to market most quickly?
And then deal with, ugh, maybe it's not going to scale.
I'll do it later.
And often you have to fall into one of those two buckets--
either build it quick or build it so it's going to scale.
Second example is flexibility or automatic management.
So this, back to something Tom said before, we believe
that the line between infrastructure as a service
and platform as a service-- infrastructure as a service
being maximum flexibility, platform as a service
being minimal management-- we believe
that that line is blurring.
And Julia is going to talk about that in a little while.
And then, finally, we also have to often choose
between big data-- store it efficiently,
make it cost-effective to ingest and process--
or make it easy to access in an ad hoc, real-time fashion.
Another trade-off that's commonly made.
But we think in most of these cases,
we can change that "or" to an "and."
We can obviate the trade-off in many cases.
So first one.
Time to market versus scale.
How do you smash those two things together?
Well, we want to make it possible to use
the tools that you as developers know and love.
Make deployments fast, reliable.
And then make it easy to fix problems in production.
And one big component of that is Google Cloud Monitoring.
This is my team's project, so I'm just
going to leave this slide up here for a while.
Let you bask in its glow of those nice graphs.
So the idea here is to have a single pane of glass
for monitoring all of your workload
on the Google Cloud Platform.
Give you rich dashboards, alerting,
provide custom metrics, give you an incident management facility,
make it possible to quickly find and solve problems
in production systems.
Whether they're running on App Engine or Compute Engine,
whether you have Cloud SQL databases,
or you've got stuff deployed globally across our network.
Stackdriver is the company that we acquired back
in May that is the foundation of this offering.
Another super-compelling piece of this
is recognizing that if you're making that trade-off, right?
Either I have super-flexible or I have completely managed.
One of the common complaints there
is that if I go to the super-managed end
of the spectrum, I lose visibility.
I lose the ability to debug my application, right?
We've all typed gdb binary core file.
Like, go in there, find the stack trace.
See which pointer dereference is null.
You lose that, in most cases, in a cloud environment.
You're running on App Engine.
You're running on managed VMs, which
I'll talk about in a second.
You've got your processes running wherever.
You don't have any idea where it is.
You can't log into the box and attach to the running process
and inspect the stack.
Cloud Debugger offers the ability
to do exactly that in a globally deployed application.
So you fire up the Cloud Debugger.
If we have your source, you set a breakpoint.
And we catch when the breakpoint is
hit by the process running, wherever
it happens to be on our platform.
You see it.
You can inspect the stack.
Incredibly, incredibly compelling capability.
And this was also announced at I/O.
This is not generally available yet.
And then another, final example is Cloud Trace.
So we've all wanted to look at traces--
a hierarchical decomposition of a request.
So we have an offering-- again, not yet generally available,
but will be soon-- called Cloud Trace,
that lets you inspect, for a given App Engine request,
its complete trace of calls into Datastore
and into memcache, to look for latency,
to look for deviant behavior in your application.
And compare from release to release.
Say you had release four and release five,
and release five seems to be relatively less performant
than release four.
Where is the delta?
This tool will tell you.
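The release-to-release comparison boils down to a per-call latency diff between two traces. A minimal sketch of that idea, with span names and latency numbers invented for illustration:

```python
def trace_delta(trace_a, trace_b):
    """Compare per-call latencies (in ms) between two releases and
    return the calls sorted by how much slower release B got."""
    deltas = {span: trace_b.get(span, 0) - trace_a.get(span, 0)
              for span in set(trace_a) | set(trace_b)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-span latencies for the same request in two releases:
release_4 = {"datastore.query": 40, "memcache.get": 2, "render": 15}
release_5 = {"datastore.query": 95, "memcache.get": 2, "render": 16}

worst_span, slowdown = trace_delta(release_4, release_5)[0]
print(worst_span, slowdown)  # → datastore.query 55
```

The real tool does this over aggregated production traces rather than a single pair of dictionaries, but the diagnostic question it answers is the same: which call in the request tree accounts for the regression.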
OK.
The next trade-off that we want to make go away--
that, again, Julia will dig into--
is the trade-off between flexibility and management.
Making it easy to manage your application, right?
The App Engine kind of platform as a service model.
Versus run whatever you want, however you want,
but you lose the management facility.
Again, we want to make that go away
and blend together these two models, the Turnkey App Engine
platform and the fully flexible Compute Engine platform.
This is kind of the conceptual view
of how this might be accomplished.
At the bottom, you've got Compute Engine.
We manage your infrastructure.
Then you get the idea of replica pools, which
is a new facility in our Compute Engine offering, where we take
care of multiple instances sort of stamped out
of the same virtual machine image.
And you can tell us to scale it.
We'll make sure that there are n healthy instances at any given
point in time, by doing health checks.
That's one additional level of management.
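Conceptually, a replica pool is a reconciliation loop: check health, drop anything unhealthy, and create replacements until the pool is back at its target size. A toy sketch of one pass of that loop, with the instance representation and health check invented for illustration:

```python
import itertools

_ids = itertools.count(1)

def create_instance():
    # Stand-in for booting a VM from the pool's image template.
    return {"id": next(_ids), "healthy": True}

def reconcile(pool, target_size, is_healthy):
    """One pass of the control loop: remove unhealthy replicas,
    then create new ones until the pool is back at target_size."""
    pool = [inst for inst in pool if is_healthy(inst)]
    while len(pool) < target_size:
        pool.append(create_instance())
    return pool

pool = [create_instance() for _ in range(3)]
pool[0]["healthy"] = False               # simulate a failed health check
pool = reconcile(pool, 3, lambda i: i["healthy"])
print(len(pool))                          # → 3
```

The appeal of the pattern is that you declare the desired state (n healthy replicas of this image) and the system converges toward it, rather than you scripting each repair by hand.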
Then up another level is managed VMs,
where you give us your App Engine app.
We run it in a virtual machine.
You don't have to log into that virtual machine.
You don't have to install software on it.
We take care of that for you.
But it's got your App Engine app deployed in it.
And then at the top of the stack,
we have so-called managed runtimes.
This is the classic App Engine model.
So in each of these layers, you have--
maybe not less code; to me that's not
quite the right representation--
but less management to do at each level.
One more plug for my team's project.
We will provide logging and monitoring across the spectrum.
So, regardless of whether you're running a raw VM
or running on top of App Engine, or anywhere in between,
we will give you visibility into your complete application
portfolio.
So managed VMs, which I mentioned before--
this is the idea that we can offer you
a virtual machine that is managed like an App Engine app.
Containers.
This is one of the newest things that we've
been doing in Google Cloud Platform:
offering facilities to make it
easy to run containerized applications on the Cloud
Platform, by offering special VM images that
are tuned for running containers.
And, at the richest end of the spectrum,
offering this new system called Kubernetes,
which is a sort of miniaturized cluster management system that
reflects the lessons that Google has learned
over many years of running large-scale,
globally managed cluster systems.
So this lets us offer better predictability.
Lets us do things with resource allocation
that increase efficiency, give you visibility
into how your resources are being used.
So this I think is extremely exciting,
an extremely exciting offering from GCP.
And, perhaps more importantly-- and you
might have seen this in the press-- we're actually
working with a consortium of tech companies-- including
Microsoft, somewhat surprisingly--
to make this a portable offering.
So that it operates the same, regardless
of which cloud platform you're running on.
So you can take your containerized Kubernetes
application and run it in any cloud with minimal pain.
Networking.
I think I went through this a little bit already.
Our networks are great.
I won't dwell on it too much.
But I think this is generally self-evident in the data.
Finally, big data.
So, again, trade-off between big and fast, big and ad hoc
real time.
There's a variety of reasons why big data is hard, right?
On one side of the coin, you need specialized expertise.
You need complex distributed infrastructure.
It takes a lot of energy to build an efficient big data
system.
And then it's expensive.
The people that you have to hire that have specialized expertise
are expensive.
Storage is expensive.
Compute is expensive.
So what we've decided to do is, to the best of our ability,
make this easy and affordable by bringing
to market a number of tools, again based on the things
that we've done internally at Google.
Based on our expertise with MapReduce,
with large-scale SQL-like systems like Spanner.
Dataflow, which is an evolution of an internal system that
has largely supplanted raw MapReduce
for our internal application workloads.
And, at the same time, make it easy to run open-source systems
like Hadoop on Google Cloud Platform.
So we're trying to put together a portfolio of building blocks
to make big data accessible to our customers.
Last point on this is we're also working
on a fusion of streaming computation--
that's the thing I mentioned before, Dataflow-- batch,
typical kind of MapReduce-like applications,
and graph analysis, graph databases.
Putting those together into a portfolio that, again, gives
you the building blocks that you need
to build the kinds of applications
your business requires.
So, in summary, cloud's real.
I didn't have to sell that one too hard, which is good.
The idea of the Google Cloud Platform
is harnessing the technology that we at Google
have used to build Google.
And, finally, all of these software and infrastructure
innovations are really coming of age.
And the Google Cloud Platform has
as its mission to bring those to our customers.
And to put our money where our mouth is,
we have a so-called starter pack credit available now--
apply the promotion code at the URL there
and get $500 to begin working with the Cloud Platform out
of the gate, without any upfront expense.
Thank you.