JULIA FERRAIOLI: Hey, everybody.
Welcome to this session on distributed databases.
This panel is really designed to give you an overview of
some various approaches, some best practices, and give you
the opportunity to ask questions to these industry
experts that we've gathered here on stage today.
So let's get started.
My name is Julia Ferraioli, and I am a developer advocate
working on Google Compute Engine.
BRIAN DORSEY: And I am Brian Dorsey.
And I'm a developer programs engineer working on Compute
Engine and Cloud Storage.
And we will be your hosts today.
JULIA FERRAIOLI: Excellent.
But why should you care about distributed databases?
BRIAN DORSEY: We think this is an incredibly exciting time in
the world of databases in general.
And the NoSQL world in particular has been exploring
a wide variety of ways to distribute your data.
And the different databases take very individual
approaches to high availability,
reliability, and scalability.
JULIA FERRAIOLI: And with vast computing resources available
to us on demand, we're able to gather more and more data,
turning it into actionable information.
This information is how we grow, how
we develop and improve.
It's far too important to be lost to a crash.
BRIAN DORSEY: Or some kind of natural disaster.
JULIA FERRAIOLI: Or a klutz tripping over the wrong cable.
BRIAN DORSEY: And as an industry, we've usually dealt
with this sort of thing by implementing failover or some
kind of backups.
And we really believe that the way forward is to distribute
your data onto a large number of machines.
And you're going to want a database that's designed for
that world.
You want a database that, as you add more machines, you get
more reliability.
JULIA FERRAIOLI: So the last reason is actually rather
self-serving.
We all like to relax once in a while.
So with a well-chosen, well-configured database
solution, you can ditch that dreaded on call pager and
finally get a good night's sleep.
So let's introduce our panelists.
First off, we're going to hear from Tyler Hannan from Basho,
then Mike Miller, representing Cloudant.
BRIAN DORSEY: Then we'll have Chris Ramsdale from the Google
Cloud Datastore and Will Shulman from MongoLab.
JULIA FERRAIOLI: OK, Tyler, you are up.
TYLER HANNAN: Beauty.
Thank you, Julia and Brian.
So as they said, my name is Tyler Hannan, and I'm with
Basho Technologies.
A little bit about Basho--
we were founded in January of 2008.
There's about 140 of us now.
We have offices sort of strewn about the world.
But as a company building distributed systems, we're
kind of neurotic about the term distributed, and we're a
distributed team as well.
So about 80% of us work from home.
So when thinking about our product, Riak, our database,
it's good to start by considering the benefits of
a distributed database.
We believe that Riak is a very operationally friendly
database that is fault-tolerant.
It can survive a node failure.
It's highly available.
It can survive network partition events.
It's scalable.
You can grow to meet demand simply by adding hardware or
compute instances, and it's self-healing.
Self-healing is an interesting one, because in any of the
events above, be they good, bad, or indifferent,
your data is safe.
So if those are the benefits, what are the properties of
Riak that those benefits are derived from?
Riak is a key/value store.
It's open source.
It's distributed.
It's masterless.
It's eventually consistent.
Each of these could be a talk in their own right.
So I'm just going to touch on a few very, very briefly.
Riak is a key/value store, which means you have very,
very simple operations--
gets, puts, deletes.
And the value is mostly opaque.
It's stored on disk as binary.
I assign a key to that value, either
programmatically or manually.
And those key/value pairs are stored in a higher-level
namespace called a bucket.
I interact with these key value pairs via either an HTTP
or protocol buffers API.
And there's a slew of client libraries for it, some of
which we've built, some which others have built.
Pick a language, there's a library.
And then we've layered some extras on top of that-- some
lightweight JavaScript and Erlang MapReduce, full text
search, secondary indices, pre- and post-commit hooks.
Importantly, though, Riak is masterless.
If you're new to the world of distributed systems and
distributed databases in general, I encourage you to
read the Dynamo specification.
It's a seminal white paper
describing an approach.
And it's the approach that Riak has adopted.
A masterless deployment means that any node can serve any
request, that data and load are spread evenly.
This is achieved through a combination of a gossip
protocol, which is mesh network-like, and hinted handoff.
But the point is that you can achieve near-linear scale by
simply adding hardware or compute instances.
And what does that look like?
It's really three commands-- riak-admin cluster join,
cluster plan-- here's what's going to happen to your data
and the replicas thereof-- and cluster commit, and you're off
and running.
In the context of what we're talking about with Google,
that's as simple as firing up gcutil or the Python APIs,
giving it a project, choosing a machine type--
n1-standard-4, because we all like memory as databases--
SSHing in and setting it up, and you're off and running.
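The masterless, evenly spread placement Tyler describes is usually implemented as a consistent hash ring, as in the Dynamo paper he recommends. Here is a minimal Python sketch of the idea. This is an illustration only, not Riak's actual ring implementation; the node names and vnode count are made up:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hash ring: a key maps to the first node
    clockwise from its hash, so any node can route any request."""
    def __init__(self, nodes, vnodes=8):
        self.ring = []          # sorted list of (position, node)
        for node in nodes:
            self.add(node, vnodes)

    def _pos(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node, vnodes=8):
        # virtual nodes spread each physical node around the ring,
        # so adding a machine rebalances only a fraction of keys
        for i in range(vnodes):
            self.ring.append((self._pos(f"{node}:{i}"), node))
        self.ring.sort()

    def node_for(self, key):
        pos = self._pos(key)
        idx = bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["riak1", "riak2", "riak3"])
owners = {k: ring.node_for(k) for k in ("user:1", "user:2", "user:3")}
```

Adding a fourth node with `ring.add("riak4")` moves only the keys whose ring positions now fall on the new node's arcs, which is the near-linear scaling property the panelists keep coming back to.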
So when would Riak make sense for you?
If you want an operationally friendly database, and you
want to combine that with an operationally scalable compute
platform like Google Compute.
And you're in gaming, social, mobile, retail,
advertising, whatever.
Forget about the term "big data." Think
about critical data.
When your data is critical, when it must be served with
high availability and low latency, then you want to look
into a distributed system.
And Riak may be a good one.
Thank you.
[APPLAUSE]
MIKE MILLER: Can everybody hear me OK in the back?
Well, my name's Mike Miller.
I'm one of the co-founders from Cloudant.
And I'm going to give you three quick slides, just
pictures actually.
And while I'm a technical founder, I'm going to try to
keep this high level and get to the
question and answer session.
So how many of you are mobile developers or have that as a
big part of your work?
How many of you have databases that are so big that you can't
fit them on a single machine?
Everybody that is writing a mobile app, if your app is
successful, you'll eventually hit that point.
And so really, that's one of the two main reasons that we
founded Cloudant.
So my background is in big-scale systems from the Large
Hadron Collider.
And when we were trying to build that and serve a
globally distributed populace on 200 datacenters, we
realized that two things broke our model of computing.
One was big data, and the other was the fact that
everybody was in a different place.
That last one has been rebranded as mobile.
And the fact that scale is important and masterless and
distributed and all the things we're going to talk about, you
should just take for granted.
I'm not going to go in those details.
You should have a service that, when it needs to grow,
you put nodes in it.
And that service gets bigger and
faster in a linear fashion.
That's all great.
But the other thing is to attack the latency problem.
What we realized when we were solving that at the LHC was
really that what we wanted was to take a database and take a
content distribution network, like Akamai, and merge them
together so that I could have my device, and
I could write locally.
And then it would synchronize when it could in the lowest
latency fashion possible.
So you've got 200 precious milliseconds for every user
interaction.
And that means that if you're going around the globe, you
spend 50 milliseconds getting off your phone.
So with Cloudant, we've built a distributed
database as a service.
We run it so that you don't have to.
You can focus on your core competency.
And we're, I think, rolled out on something like 18 to 20
datacenters around the globe now.
So our big thing is footprint and giving you a low latency
connection wherever you are so that you write
local and think later.
And so there's one strong
recommendation I have.
As you choose new technologies, choose something
that has a mobile strategy baked in.
So we heard what it's like to install Riak--
awesome.
This is what it's like to install Cloudant.
You sign up.
You get a username.cloudant.com-- done.
And then you get access and resources
automatically around the globe.
So you write local, and your data is beating you around
the globe as your users move.
And the way this works is pretty simple.
You can actually use this cloud API.
So we speak the Apache CouchDB API.
Beneath that, we've [INAUDIBLE] the same concepts
from Amazon's Dynamo paper around
distribution and quorum.
So that can handle megabytes to petabytes.
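The Dynamo-style quorum Mike mentions means a write is acknowledged by W of N replicas and a read consults R of them; when R + W > N, every read quorum overlaps the latest write quorum. A hedged Python sketch of that rule (a toy model, not Cloudant's implementation; the version counter and replica layout are illustrative):

```python
class QuorumStore:
    """Toy N-replica store with R/W quorums (Dynamo-style).
    With R + W > N, every read quorum overlaps every write quorum."""
    def __init__(self, n=3, r=2, w=2):
        assert r + w > n, "quorum overlap requires R + W > N"
        self.replicas = [{} for _ in range(n)]
        self.n, self.r, self.w = n, r, w
        self.version = 0

    def put(self, key, value):
        self.version += 1
        # acknowledge after W replicas; the rest converge later
        for rep in self.replicas[: self.w]:
            rep[key] = (self.version, value)

    def get(self, key):
        # read R replicas and keep the highest-versioned value
        reads = [rep.get(key) for rep in self.replicas[-self.r:]]
        return max((v for v in reads if v), default=None)

store = QuorumStore(n=3, r=2, w=2)
store.put("doc1", {"likes": 1})
latest = store.get("doc1")   # the overlap guarantees the newest version
```

The point of the overlap is that even though only two of three replicas saw the write, any two replicas you read are guaranteed to include one of them.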
But you can also run a local instance of an open source
project on your desktop machine.
Or you could actually run it in the
browser and store state locally--
HTML5 local storage.
And you can even run this on iOS and Touch.
And the cool thing there is you can always write locally
and preserve that full offline experience.
So in terms of what I think you should have in your head
as you think about choosing your database technologies,
think about what things you're forced to do with your
database that you'd love to get away from.
And so we're trying to give you a chance to say, hey, do I
really need a global write master?
Or can I just put data someplace locally and have
that distribution happen automatically?
So with that, I'll pass on the mic.
Thank you.
[APPLAUSE]
CHRIS RAMSDALE: Thanks, Mike, and thanks, Julia and Brian.
I'm Chris Ramsdale.
I'm a product manager for the Google Cloud Platform.
And I'm focused on App Engine and distributed storage within
the platform itself.
The Cloud Platform storage family is comprised of Google
Cloud Storage for serving blob data, Cloud SQL for relational
data, and the newly released Cloud Datastore for
non-relational data.
And for the sake of today, we're going to focus on the
Cloud Datastore.
And I've got a few slides, and we'll get going, and then
we'll drop it into Q&A as well.
So announced yesterday, the Google Cloud Datastore was our
effort to extract the High Replication Datastore
that was part of App Engine and make it available to any
compute container, be it EC2 or Compute Engine
or things like that.
Launched in 2011, the high replication datastore turned
out to be a really great project and a really good
service for our users.
We've been running for two years.
We're at about 4.5 trillion transactions per month right
now storing petabytes of data.
Jokingly, App Engine has really been referred to as the
SDK for BigTable.
So what we did was we said, that's great, but it's
packaged up in the App Engine SDK, so you can't get to it
from anywhere else.
And so we pulled that out.
We put an HTTP interface on top of it.
I'm not really doing justice for what we did to the API,
but you should go read the docs.
And now we're announcing it as the Google Cloud Datastore
and, again, making it available from anywhere.
And by doing that, not only do we get the hardening and
productization of HRD, now
inside of Cloud Datastore, we also get the features and
functionality-- for example, ACID transactions, eventual
and strong consistency, a built-in query engine that lets you do
fixed-cost queries, and things like that.
Those are all available as well.
So that's what the product is.
And real quickly, we're going to talk about how that aligns
with what our philosophies are and what we're trying to do
within the Google Cloud Platform all up.
So at its core, we're trying to bring Google infrastructure
to developers, to our customers, to you that are in
this room today and joining us for Google I/O. Specifically,
the same infrastructure that's running things like Geo and
Mail and Google Maps and Search and things like that.
And what that is specifically in the case of storage, is
these components that you see here on the left hand side
that Urs had talked about yesterday in the Cloud keynote
if you happened to see it--
everything from our datacenters and our machines
to our networking to the crazy services that we've built over
the past few years--
Colossus, BigTable, and Megastore.
And by doing that, what we allow you to do is we bring
you high availability.
By building on top of Megastore, we actually give
you the replication across multiple geos.
We give you high scalability.
So Colossus actually gives you the storage, the
underlying mechanism for storing.
It gives you huge capacity and high durability, and then
BigTable handles horizontal scaling out.
So we do that automatically for you.
You don't have to actually add a node to the Cloud Datastore.
We just shard appropriately and things happen.
And finally, an API frontend that's actually built on top
of App Engine that will just scale out.
We're looking for access from anywhere.
We think that data should not be siloed.
It should follow you wherever you go.
And as our platform expands out and as customers use
hybrid approaches like Compute Engine and EC2, they want their
data to follow them wherever they need to go.
And then finally, what's near and dear to my
heart is Fully Managed.
I think that we're all going to talk about management and
how we do it as we get into the Q&A. And what that means
for us at Google is that you get this guy.
This is Michael Handler.
He's one of our Site Reliability Engineers.
We call them SREs.
And they help keep the service up and running.
They help us do planning, architecture, failovers,
things like that.
But because you're building on the same infrastructure as all
those other products I alluded to, you not only get him.
You get him cloned like 1,000 times.
So you've got all these teams that are monitoring all these
services that do automatically scale out for you.
But sometimes things happen.
And we've got the cell phone, AKA the pager to take care of
it and make sure you don't have to worry about it so you
get more sleep.
I'll turn it over to Will.
[APPLAUSE]
WILL SHULMAN: Hello everyone.
My name is Will Shulman.
I'm the CEO and co-founder of MongoLab.
I'm going to give you an introduction to MongoDB and
our MongoLab cloud service.
It's upside down.
So real quick, what is MongoDB?
MongoDB is an open source, high-performance, distributed,
and document-oriented database.
And I'll go through what that means really quickly.
First and foremost, MongoDB is a document database.
For those of you who are not familiar with
that terminology, it doesn't mean it stores PDF files.
MongoDB stores these in a binary representation of JSON
called BSON.
And this is a really important part of MongoDB that really
differentiates it from other NoSQL databases.
It's not just a key value store.
It gives you these rich data structures.
It's got nested objects.
It's got arrays.
You can represent geolocations natively.
And also, unlike a lot of key/value stores, MongoDB has
a rich query language.
It's a real database.
It allows you to do searching and sorting by not just the
keys, but any nested value.
So here we have all sorts of examples of doing queries and
geolocation, if you look down at this
example towards the bottom.
It's built in.
It's got aggregation, so you can do binning and grouping
like you would with a relational database.
And also like a relational database, although you can
search and sort by anything, you can create indexes on
almost anything.
So if your queries are slow, you can index on any field
value, nested key values deep inside the object.
You can index on array values.
Makes it really great, not just for a large data sets,
which I'll talk about in a minute, but really as a
replacement for MySQL or any relational database you use.
If you spent an inordinate amount of time in your life
mapping from your object-oriented programming
language to a relational datastore and doing
object-relational mapping, you don't have to do that anymore.
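Will's point about querying and sorting by any nested value can be illustrated with a toy matcher over JSON-like documents. This is a simplified Python sketch, not MongoDB's query engine; the only "operator" it supports is dot-notation equality, and the sample documents are made up:

```python
def get_path(doc, path):
    """Resolve a dotted path like 'address.city' in a nested document."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def find(docs, query):
    """Return documents whose dotted-path fields equal the query values,
    loosely mimicking MongoDB-style matching on nested keys."""
    return [d for d in docs
            if all(get_path(d, p) == v for p, v in query.items())]

people = [
    {"name": "Ada",   "address": {"city": "London"}, "tags": ["math"]},
    {"name": "Alan",  "address": {"city": "London"}, "tags": ["crypto"]},
    {"name": "Grace", "address": {"city": "DC"}},
]
londoners = find(people, {"address.city": "London"})
```

In a real database the same dotted paths are what you would put a secondary index on, so slow scans like this become index lookups.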
Second, MongoDB is a distributed database.
There are two clustering technologies with MongoDB.
The first clustering technology is
called Replica Sets.
This is what allows you to have high
availability in MongoDB.
It's single master.
So you can read and write.
It supports both strong consistency and eventual
consistency.
So if you need strong consistency for your
application, you can have it.
If eventual consistency is acceptable for your
application, you can also read from these secondaries.
And its consistency model is very, very configurable.
The second clustering technology is
called sharded clusters.
This is basically a cluster of clusters.
So it's a cluster of Replica Sets.
And with sharded clusters, this is how you get horizontal
scalability with MongoDB.
Like many have mentioned, as you add nodes to the system,
your writes get distributed among them, and it just
scales linearly in a very horizontal manner.
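The "writes get distributed as you add nodes" behavior of a sharded cluster can be sketched as routing each document by a hash of its shard key. A toy Python illustration only; MongoDB actually supports both range- and hash-based sharding with chunk balancing, and each shard below would really be a full replica set:

```python
import hashlib

def shard_for(shard_key, shards):
    """Route a document to a shard by hashing its shard key."""
    h = int(hashlib.sha256(str(shard_key).encode()).hexdigest(), 16)
    return shards[h % len(shards)]

# each list stands in for one shard (a replica set in practice)
shards = [[] for _ in range(4)]
for user_id in range(1000):
    shard_for(user_id, shards).append({"_id": user_id})

sizes = [len(s) for s in shards]   # writes spread roughly evenly
```

Because the hash spreads keys uniformly, each shard ends up with roughly a quarter of the writes, and adding shards divides the load further.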
What is MongoLab?
So MongoLab is MongoDB as a service.
And that's what we do.
MongoLab really is about automating all of the
operational aspects of running a database.
We try to build an expert system out of the DBA so that
you don't have to have the DBA.
You don't have to have the ops guys.
And you can focus, like a lot of folks have mentioned, on
building your app.
We handle provisioning and scaling,
replication and backups.
We've got rich Web UI.
We handle all the monitoring, all the alerting.
And of course, all with the service comes expert support.
Our product offering includes shared and dedicated VM plans.
We have SSD backed plans, single and multi-node plans.
So whether you're doing analytics and you just need a
single node database, or you need lots of nodes, we're
going to have full sharding support in 2014.
And we run on all sorts of cloud providers, and we're
here today and this week to announce our support for the
Google Cloud Platform, which we're really,
really excited about.
So thank you very much.
We've got a sandbox.
I guess I lost my last slide.
I just wanted to say we've got a sandbox outside tomorrow at
Google I/O, where you can come and get a demo of our service.
So we encourage you guys to stop by.
Thanks.
[APPLAUSE]
JULIA FERRAIOLI: Thanks, guys.
That was a great overview.
And this is the fun part where we jump into
some Q&A by the audience.
I encourage you if you have questions, start lining up at
the mics now.
To jump start it, we'll just seed a question.
We're at Google I/O. Most of us were
at Urs's talk yesterday.
So we heard a lot about the improvements and the type of
infrastructure that we have at Google.
So my first question, and this can be answered by everybody,
is, what makes Google's infrastructure a great place
to run distributed databases?
We'll start with Will.
WILL SHULMAN: So we've actually been using GCE for a
little while.
And the first thing actually is the network.
The network is blazing fast, really, really
impressive.
The other thing that's great about the Google Cloud
Platform is it has a private distributed backbone between
all the datacenters.
So we're really excited about that functionality.
We're really excited to offer globally multi-region clusters
where you can have nodes--
different parts of the globe.
But you're not talking over the open internet.
You're actually talking over Google's private backbone.
CHRIS RAMSDALE: So it's slightly different for us,
because we're not actually building on top of Compute
Engine VMs, at least today.
But I think there's an analogy that I'll get to real quickly,
which is that for the same reasons that it's good to
build on Compute Engine for all the services they provide
and the fast networking and the things that Google takes
care of, it's the same reason we chose to build on top of
things like Megastore and Colossus and BigTable and
whatnot and are looking at things like Spanner as well,
because those are the lower level pieces that are handled
by Google, much like the lower level VMs are inside of
infrastructure.
And then also at the same time, while you can access the
Cloud Datastore from things like EC2 or
Rackspace or whatnot--
although I think the latency might be
a little bit awkward--
we really built this for Compute Engine users, because
we do think it's one of the great platforms that we're
offering and bringing into the larger cloud platform.
And so we wanted to give those users a fully managed
non-relational schemaless datastore to
truly grow their apps.
MIKE MILLER: Boy, there are a lot of ways I
want to answer this.
Number one maybe being that it's not Amazon, which is kind
of nice as a change.
The reality is we built a database as a service.
And we've got to bring that to the place where
the developers are.
There's a large and rapidly growing number of
developers on Google.
So for us, there are all kinds of technical reasons about why
it's a great platform.
If you're building a CDN, you want to know the
network very well.
And reliable latencies are huge as well--
so private backbone.
But really, the number one thing for us is footprint,
like I need to get as close to the edge as possible.
And so Google has one of the biggest, if not the biggest
deployments globally.
So for us, that's massive.
TYLER HANNAN: So Will mentioned this, and I'm going
to repeat it.
The private backbone between datacenters matters.
Data locality is an important concern when you're
distributing a global database.
But also, when scalability is a problem, it's a problem now.
And it's a business problem now.
And so the ability to simply scale by using the tools that
are provided to developers regardless of which database
solution you choose is an important and, I think, unique
component of Compute Engine and something that if I could
surmise, will be a continued area of
improvement in the future.
JULIA FERRAIOLI: So it looks like we have some questions.
AUDIENCE: My name's Drew [? Broadly. ?]
I'm from New Zealand.
One of the questions I have--
because the people from Cassandra aren't up here, and
I thought I'd take the opportunity--
what are your opinions on distributed counters, and
what's your approach?
TYLER HANNAN: So distributed counters was the question.
Sorry, I'm repeating that, because I'm hard of hearing.
We've been working, in concert with INRIA Research in
France, on CRDTs and looking at building counters that can
survive in an eventually consistent world.
We've also been working on adding some strong consistency
capabilities into Riak.
I think they're there.
I think they're coming soon.
I think they're a lot of scary math.
So it'll be interesting to see how the market adopts.
But I think we're on the cusp of distributed
counters being a reality.
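A concrete example of the CRDT counters Tyler describes is the grow-only counter (G-Counter) from the CRDT literature: each node increments only its own slot, and merging takes the per-node maximum, so replicas converge no matter the order in which updates are exchanged. A minimal sketch, not Riak's implementation:

```python
class GCounter:
    """Grow-only CRDT counter: per-node counts, merged by max."""
    def __init__(self):
        self.counts = {}                    # node id -> local count

    def increment(self, node, amount=1):
        self.counts[node] = self.counts.get(node, 0) + amount

    def merge(self, other):
        # element-wise max is commutative, associative, and idempotent,
        # so replicas converge however (and however often) they sync
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter(), GCounter()
a.increment("node-a", 3)     # increments happen on separate replicas
b.increment("node-b", 2)
a.merge(b)
b.merge(a)
total = a.value()            # both replicas now agree
```

Supporting decrements takes a pair of these (a PN-Counter), which is part of why Tyler calls the general machinery "a lot of scary math."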
MIKE MILLER: Yeah, CRDTs are definitely a hot area of
implementation.
I don't know about how hot they are research-wise, but
we're certainly seeing users who have implemented a large
number of those things in reality.
But overall, I think there's a little bit of a shift in
mindset when you're writing something.
A lot of people who come to NoSQL will get there after
going to the fully denormalized state.
So you start with star schemas, and you work all
the way up to fully denormalized.
And that mindset is something that we have to help show
people the best ways to solve problems.
I think one of the things that we also see that's available
now in the majority of these systems is going to an
immutable data model where if you're modeling a state
machine, every transition is modeled as, say, a new
document in our system, a new
JSON document.
And the aggregate state is summed up
in a secondary index.
So that's a really powerful way already without even
appealing to something as complicated as CRDTs to deal
with rapidly changing things that have to be summed.
So concrete examples--
leaderboards, amounts of money in accounts, virtual
currencies, which map to a very real amount of dollars.
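Mike's immutable data model, where every state transition is a new document and the aggregate state is summed in a secondary index, can be sketched like this. A hedged Python illustration of the idea; in Cloudant/CouchDB the aggregation would be a map/reduce view, and here it's just a plain fold over an append-only list:

```python
# Append-only ledger: each transition is a new immutable document.
events = [
    {"account": "alice", "delta": +100},
    {"account": "bob",   "delta": +40},
    {"account": "alice", "delta": -30},
]

def balances(events):
    """Fold the event log into per-account balances, playing the
    role of the secondary-index aggregation Mike describes."""
    totals = {}
    for e in events:
        totals[e["account"]] = totals.get(e["account"], 0) + e["delta"]
    return totals

snapshot = balances(events)
```

Because documents are never updated in place, concurrent writers never conflict; they just append, and the sum stays correct however the events interleave.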
CHRIS RAMSDALE: I think they summed it up well.
AUDIENCE: Are you two able to give an input on it or not?
WILL SHULMAN: I'll be honest with you, I've given almost no
thought to distributed counters.
So sorry about that.
AUDIENCE: Thank you.
JULIA FERRAIOLI: More time for more questions.
AUDIENCE: So many of my friends who initially
recommended MongoDB to me, later, I noticed they grew
disillusioned with document-oriented databases.
I'm wondering how have you noticed organizations dealing
with the engineering scaling issues of actually engineering
applications with
document-oriented databases versus--
I'm not talking about the whole technical scaling, but
building their applications.
WILL SHULMAN: So for document-oriented databases is
specifically your question?
AUDIENCE: Right.
JULIA FERRAIOLI: Sounds like a great question for Will.
WILL SHULMAN: I think if you're not just talking about
the technical aspects, which obviously involve sharding and
using the horizontal scalability aspects of
MongoDB, from an organizational standpoint, we
find the customers are just loving MongoDB even actually
if it's not a large data problem.
So the ability to not-- and like I mentioned in my talk,
almost every programming language now is an
object-oriented programming language.
So having a rich object data structure in your database is
really just unbelievable for developer productivity.
If you're using something like server side JavaScript or
Python or Ruby, and you have this object in memory, you
could just throw it into the Datastore.
We're finding that people are really finding the schemaless
nature of Mongo, the ability that you could just add
fields, the ability to just insert things and take them
out without having to do a lot of transformations, is really
helping from an organizational standpoint, which is what I
think you were getting at, really scale development,
really scale productivity.
MIKE MILLER: Sorry, I didn't mean to steal the mic.
I think it's a good question if I interpret it
a little bit myself.
I think one of the questions is something
like a document database.
You're [? trading ?]
JSON on the wire over HTTP.
There's a lot of ways to model your data.
And that flexibility, again, like education in terms of
what is the best way for a particular system to solve
this problem.
In our system, there are at least four different ways to
store one-to-many and many-to-many relationships.
And so a lot of that has to do with just getting out best
practices and understanding, oh, my problem
falls in this bucket.
And that's something that I think the communities in
general are rapidly starting to try to get down on paper.
CHRIS RAMSDALE: I'll add to it.
To touch on the point of everybody that's doing mobile
as well today, I think there's other things you want to
consider as well in terms of going after that approach and
going after doing document stores.
While they're schemaless and they're great and it makes it
very easy to scale, you do lack the things that you get
from, say, doing row-column, where you want to say, I want a
subset of my properties, because you're trying to minimize the
payload that's going back over the wire to your mobile
clients, especially when you're doing a back end as a
service type of solution when there's no server side code to
filter out.
You're literally going from the mobile device-- iOS,
Android-- back to the Datastore.
You're like, no, no, no, I don't need 100 properties.
I only need the first five, because that's actually what
I'm showing in my limited real estate view there.
TYLER HANNAN: And I think this is an important question, so
I'm actually going to answer it slightly differently, which
is in the world of relational databases, we started modeling
our data by saying what answers do I have.
In the world of NoSQL, non-relational databases, we
start modeling our data by saying what
questions do I have.
So you could think of it as a top down approach rather than
a bottom up.
And it's not an either/or.
There are situations where you may want a relational
datastore alongside a non-relational with something
specific for geospatial indexing.
That polyglot deployment is important, and it's something
that, as developers, I think we can be excited about that
we're not limited to a set of tools.
We have a broad range of tools to draw from based upon
application need.
It's not a one to one application to
database model anymore.
It's a one to many, depending on what's living inside of the
application.
AUDIENCE: You mentioned Spanner.
Could you speak about how it differs from BigTable, and are
there any plans to expose it via an API?
CHRIS RAMSDALE: To repeat the question, I think question
was, I had mentioned Spanner, and was there--
AUDIENCE: How does it differ from BigTable, and are there
any plans to expose it via an API?
CHRIS RAMSDALE: So it doesn't differ from BigTable.
It differs from Megastore and how they're doing the
replication.
Spanner is, in many ways, the next evolution of Megastore,
in that it does georeplication in a more
efficient manner and actually can do timestamping across
multiple geos.
So it really gets us to where we're trying to go, which is
the global footprint.
Replication between datacenters right now via
fiber is actually pretty good.
But think about where you want to be five years from now,
10 years from now, where you're like, I don't want to
actually think about, unless it's for compliance reasons,
where my data is.
I just have a bunch of users.
I have a great service that's taking off.
I'm the next Tumblr, Pinterest or whatever.
And they are all over the place.
And my data needs to be replicated.
So whether I go across the Pacific or across all the
dark fiber that's in the Atlantic or whatnot, it just
happens in microseconds.
So that's really the goal of Spanner.
And that's why I was looping it back to Megastore, because
Megastore is the service that actually does the replication
for us and for Google.
There's a question about APIs.
And yes, there's many conversations underway about
should we surface an API on top of Spanner.
I have my personal beliefs that I think that what we're
doing inside of HRD, inside of the Cloud Datastore is the
right way to go, because we're actually pulling in--
if you flip the bit on all of Google's infrastructure right
now and made it public, I think
everybody would freak out.
It's not the way I think that everybody wants to program.
So what I like what we're doing inside of the cloud
platform is we're putting that layer on top of it that makes
it really palatable for developers to come along and
use some pretty mind-blowing infrastructure.
AUDIENCE: So is that a no?
CHRIS RAMSDALE: Say what?
AUDIENCE: Is that a no?
CHRIS RAMSDALE: No, no, no, that's a not yet.
It might.
It's actually been debated inside of Google right now.
AUDIENCE: Hi, [? Stephan ?]
[INAUDIBLE] from Rovio Entertainment in Finland.
I was wondering if you have some advice, ideas, or tools
to deal with backup and restore strategy in reasonable
time, because usually you deal with huge amounts of data and
distributed databases.
And while most of them address disaster scenarios already in
this redundancy setup, there's still a possibility that
you'll cause data corruption by bugs in the application--
not that that would ever happen.
CHRIS RAMSDALE: Yeah, in actuality, it's the
application bugs that get you all the time, not that Rovio
would write any bugs.
Like the pigs wouldn't go off into space somewhere else and
attack a planet.
We have a backup and restore functionality right now that's
in its very much alpha phase, I would say.
That's user-initiated.
We're in the process of moving that into a
fully managed service.
It goes with the whole theme of what we're trying to do.
And that just gets us to where we want to be.
That's the baseline functionality.
That's an epic backup is what I call it.
So if you have 10 terabytes a day, you're doing that backup
all the time.
And one could do the math in their head and say, that's not
necessarily the way I want to be doing things.
So then what we're going to do is take it from there and go
into doing incremental backups with snapshots as well.
So what you get is full
backups that are reliable.
Then you get cheap backups-- incrementals.
And then you get backups with snapshotting, which gives you
the consistency to do them at any time.
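The tradeoff Chris describes-- full backups are reliable but expensive, incrementals are cheap because they only copy what changed-- can be sketched in a few lines of Python. This is a toy model with made-up names, not the Datastore backup service:

```python
def full_backup(datastore):
    """Copy every entity -- reliable, but expensive at scale."""
    return {key: dict(entity) for key, entity in datastore.items()}

def incremental_backup(datastore, last_backup_time):
    """Copy only entities modified since the previous backup."""
    return {key: dict(entity)
            for key, entity in datastore.items()
            if entity["updated_at"] > last_backup_time}

store = {
    "user:1": {"name": "alice", "updated_at": 100},
    "user:2": {"name": "bob", "updated_at": 205},
}
assert len(full_backup(store)) == 2
# Only user:2 changed since the last backup at t=200.
assert list(incremental_backup(store, last_backup_time=200)) == ["user:2"]
```

In practice an incremental scheme also needs periodic fulls to bound restore time, which is why the panelists describe the two as layered rather than as alternatives.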
WILL SHULMAN: So we run a cloud service around MongoDB.
There's lots of interesting techniques that you can use
with Mongo to deal with this problem.
And you're right.
Almost always we deal with node failure via replication.
But human error is always why people come calling us and
say, hey, can you restore a backup for us?
So we have a backup system that does snapshotting and
does the typical type of backup you might expect.
But there's other technologies, both core to
MongoDB and stuff that we're building that's going to help
even further.
So one is a MongoDB technology called Delayed Slaves, or
Delayed Replication.
So you can actually set up a node that lags behind the
master by an hour, a day, a week.
And that's one technique some customers use: OK, I
fat-fingered the database.
I'm going to go to this Delayed Slave that's maybe an
hour behind.
And I'll have a set of data that is not corrupted by the
bug or whatever caused the problem.
The other nice thing about MongoDB is it has an op log of
every operation.
This is what replication is driven off of.
So you can actually restore to a snapshot.
Let's say you restore to a snapshot from a day ago.
Then you could replay the op log to bring the data state to
the operation right before the operation that started to make
things go wrong or started to corrupt the database.
So you could actually do point in time.
And that's also something we're looking to
productize as well.
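The point-in-time technique Will describes-- restore a snapshot, then replay the op log up to just before the operation that corrupted things-- can be sketched like this. It's a toy model of the idea, not MongoDB's actual oplog format:

```python
def restore_point_in_time(snapshot, oplog, stop_ts):
    """Start from a snapshot, then replay logged operations
    whose timestamp is strictly before stop_ts."""
    db = dict(snapshot)
    for ts, op, key, value in oplog:
        if ts >= stop_ts:
            break  # stop just before the bad operation
        if op == "set":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
    return db

snapshot = {"a": 1}          # taken a day ago
oplog = [
    (10, "set", "b", 2),
    (20, "set", "a", 99),
    (30, "delete", "a", None),   # the buggy operation
]
# Replay everything up to (but not including) the bad delete at ts=30.
db = restore_point_in_time(snapshot, oplog, stop_ts=30)
assert db == {"a": 99, "b": 2}
```

The key property is that the log is ordered and replayable, so any timestamp between the snapshot and the bug is a valid restore point.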
MIKE MILLER: Maybe I'll just add an anecdote.
I'm sure everybody up here has backup solutions and a story
available on the product page.
One interesting piece of data, though-- and this is not fully
quantitative; I'm just tallying it in my mind right now.
But I would estimate about 80% of the user-initiated errors
we've had are simply database deletes by mistake.
You have a REST API that looks really clean.
And it's like, you've got a verb.
You've got a base URL.
You've got a collection or a database.
And then you've got a document, maybe other things
behind that.
And you have a script that somehow has an
empty string in there.
You just dropped the whole database by mistake.
And so actually about 80% of the things that we have to do
for users are restoring databases
that they just deleted.
And so the number one thing you can do is rename files and
garbage collect them later.
There are basic operational policies you can adopt that
let you say, oh, that email came in--
yeah, we just restored it right away.
Flip the bit.
So it's interesting to see, actually.
One of the things we should talk about is, as you run the
service, what are the things you learn about the users that
really drive how you go back and feed that
back into the product?
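The failure mode Mike describes-- a scripted DELETE where one path segment is an empty string, so the request targets the whole database instead of one document-- is cheap to guard against before the request ever goes out. A minimal sketch, with a hypothetical helper rather than any vendor's client API:

```python
def delete_url(base, database, doc_id):
    """Build a DELETE target, refusing any empty or malformed segment."""
    for name, segment in [("database", database), ("doc_id", doc_id)]:
        if not segment or "/" in segment:
            raise ValueError(f"refusing DELETE: bad {name} segment {segment!r}")
    return f"{base}/{database}/{doc_id}"

assert delete_url("https://db.example.com", "mydb", "doc42") == \
    "https://db.example.com/mydb/doc42"

# The fat-finger case: an empty doc_id would have addressed the
# whole database, so the helper rejects it instead.
try:
    delete_url("https://db.example.com", "mydb", "")
except ValueError:
    pass
else:
    raise AssertionError("empty segment should have been rejected")
```

Client-side validation like this is complementary to the server-side policies the panel mentions (soft deletes, rename-then-garbage-collect).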
JULIA FERRAIOLI: Don't make delete calls too easy.
CHRIS RAMSDALE: Just do what we do.
We make the writes-- the deletes, actually-- really expensive.
And the storage is really cheap, so you just keep the
data around.
I've got to poke fun at myself.
TYLER HANNAN: A whole slew of options and
opportunities with Riak.
I'll just reiterate what the gentleman said, which is
backups are, again, a business challenge in addition to a
technology challenge, particularly when you're
considering latency.
So whether you adopt a backup strategy that's a fully
replicated cluster that mirrors production or whether
you use some cool syncing technology or whether you do
more of a traditional sort of replication, whatever you
choose, plan for failure.
That's my key message about distributed systems.
Plan for failure.
And test it.
JULIA FERRAIOLI: So we have a little under six minutes left
for questions.
AUDIENCE: So my question is very much related to the
previous one.
For human error events, if you have an append-only database,
where instead of deleting or updating you only append,
then you can stop those errors as well.
Do you guys have any plans to support an append-only style
database as a first class kind of product?
MIKE MILLER: I can tell you.
Sorry, sometimes it sounds like it cuts out.
So we use modified portions of the Apache
CouchDB storage engine.
So it's an append-only, copy-on-write B-tree.
So we're never overwriting our data.
We compact it away in the background after so much time
or so many revisions.
But we haven't fully leveraged that yet to productize a very
clever incremental slash point-in-time backup service.
So there are great things you can do there, depending on the
storage model you choose for primary and secondary indexes.
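An append-only store in the spirit of the CouchDB engine Mike describes can be sketched as follows: writes append new revisions instead of overwriting, reads return the latest revision, and compaction later discards old revisions. This is a toy in-memory model, not CouchDB's actual on-disk format:

```python
class AppendOnlyStore:
    """Never overwrite: every write appends a new revision."""

    def __init__(self):
        self.log = []  # list of (key, revision, value) tuples

    def put(self, key, value):
        rev = sum(1 for k, _, _ in self.log if k == key) + 1
        self.log.append((key, rev, value))

    def get(self, key):
        for k, _, v in reversed(self.log):  # latest revision wins
            if k == key:
                return v
        raise KeyError(key)

    def compact(self, keep=1):
        """Drop all but the newest `keep` revisions of each key."""
        kept, seen = [], {}
        for entry in reversed(self.log):
            k = entry[0]
            if seen.get(k, 0) < keep:
                kept.append(entry)
                seen[k] = seen.get(k, 0) + 1
        self.log = list(reversed(kept))

store = AppendOnlyStore()
store.put("x", "v1")
store.put("x", "v2")
assert store.get("x") == "v2"
assert len(store.log) == 2   # the old revision is still on the log
store.compact()
assert len(store.log) == 1   # old revisions compacted away
assert store.get("x") == "v2"
```

Until compaction runs, every old revision is still recoverable, which is exactly what makes the incremental and point-in-time backup schemes discussed above possible.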
CHRIS RAMSDALE: So HRD and the Cloud Datastore, since we're
built on top of Bigtable, we're already doing the
append-only thing.
This is why it's actually much easier just to leave your data
around than it is to go and do writes to delete it.
And then to dovetail back into what I was saying to
the guy from Rovio: given that foundation, we can do
incremental and snapshot backups at any time, based on
the way we're doing the appending.
AUDIENCE: So there's a lot of cool technologies being
introduced and rolled out to developers.
But with the database, I always wonder why are we still
using HTTP to interface with the data?
Why are we using a stateless connection?
And my personal challenge is, using high concurrency in
Python, you start accumulating large numbers of file
descriptors for all of the HTTP connections that have come
and gone and not been collected in time.
And so it's a real challenge managing that, versus when I'm
using Postgres or something with a traditional binary
stateful connection, I don't seem to
run into those problems.
So is there a plan to maybe standardize a binary protocol
or a stateful protocol for these types of
services coming out?
WILL SHULMAN: So MongoDB actually has
a binary wire protocol.
It actually isn't an HTTP interface.
MongoLab actually does offer an HTTP interface on top of
the Mongo protocol, and you can use it if you'd like to.
But I'd say 90% of our customers use one of the
standard MongoDB drivers supported by 10gen.
So almost every programming language you might want to
talk to MongoDB from, there's a driver, and it's a tight
wire protocol.
It's based on the BSON spec, which is at bsonspec.org.
And yes, so it's very well understood.
And all the drivers have connection pooling and things
like that you're probably used to from a traditional
relational database.
CHRIS RAMSDALE: But I'll add to that.
I think the way we think about it is, with stateful
connections, whether you're doing sockets or whatnot, they
end up being a bottleneck in your system when you're trying
to scale out horizontally.
So that's one.
And so we try to avoid those wherever we can.
So HTTP just happens to be the mechanism.
Maybe that's going to change over time.
Hopefully it does.
There's a fair amount of overhead to bring those
connections up.
And then two, what I'll add is, think about it in terms of
when the server goes away, when it really is just a bunch
of mobile clients, once we've got all the auth figured out
and things like that.
So you're making a connection back from the client.
The last thing you'd want to have is a persistent
connection over an extremely flaky carrier
connection like 3G or 4G.
So I think we've got to start thinking in terms of
stateless-- that paradigm shift has already happened.
MIKE MILLER: Yeah, I'll completely second that.
We actually only support HTTP connections because we're firm
believers--
well, if we look at our user base, it's over half
JavaScript.
A lot of that is directly from the clients.
And so we think a decade from now, a, nobody will run their
own databases.
And b, we'll be, if not completely in a two-tier
stack, then something that looks like a very, very thin
middle tier, which is just for session
management and sign-ups.
And it's going to be direct connection
from your mobile device.
So for that, it's HTTP now-- and maybe in the future,
WebSockets, things that are a little bit faster and
lighter-weight protocols.
But I think that's the direction the
world's moving in.
JULIA FERRAIOLI: Next question?
AUDIENCE: Do you have any advice if the data is actually
graph-like, where it doesn't fit into a document?
For example, the data is big, but you need to traverse it.
Think about Knowledge Graph, but the data is huge.
How do you distribute it?
Any advice?
TYLER HANNAN: So the question was, advice regarding modeling
graph style data.
I'm going to say something that maybe people at my
company wouldn't like, which is you should look into a
graph database.
They're there.
They're really freaking amazing, and they do their job
really well.
So if that's the model that you need, I would encourage
you to look into the people who are building a data
storage for that model.
MIKE MILLER: I'll answer that.
If your data fits in memory, Neo4j is your database--
it's pretty spectacular.
It's got a REST API.
It's awesome.
And it's an incredible team with a great community.
If it doesn't fit there, certain types of graph problems
are amenable to MapReduce processing-- not all.
There are some great talks from Twitter about how they're
swinging their Hadoop hammer, because it's the only hammer
they have, and how they get into some really funny spots
where they have to get very clever with approximate sets
and hash tables and things like that.
But some problems are well-suited to MapReduce.
So that opens up a lot of doors for you.
It's about data modeling, again.
But if that doesn't work for you, you're probably not going
to solve the distributed graph problem yourself.
The community has been working on that for a very long time.
So we should probably start to lean on these guys over here
about maybe open sourcing something around Pregel.
CHRIS RAMSDALE: Noted.
[LAUGHTER]
JULIA FERRAIOLI: So we're just about out of time.
I would encourage you to come up and talk to
us after the session.
We'll probably move a little bit outside, so we can
actually let the next folks set up.
But I just wanted to say thank you.
Thank you to all of you.
Thank you to our panelists for coming out here today.
[APPLAUSE]