JULIA FERRAIOLI: Hey, everybody.
Welcome to this session on distributed databases.
This panel is really designed to give you an overview of
some various approaches, some best practices, and give you
the opportunity to ask questions to these industry
experts that we've gathered here on stage today.
So let's get started.
My name is Julia Ferraioli, and I am a developer advocate
working on Google Compute Engine.
BRIAN DORSEY: And I am Brian Dorsey.
And I'm a developer programs engineer working on Compute
Engine and Cloud Storage.
And we will be your hosts today.
JULIA FERRAIOLI: Excellent.
But why should you care about distributed databases?
BRIAN DORSEY: We think this is an incredibly exciting time in
the world of databases in general.
And the NoSQL world in particular has been exploring
a wide variety of ways to distribute your data.
And the different databases take very individual
approaches to high availability,
reliability, and scalability.
JULIA FERRAIOLI: And with vast computing resources available
to us on demand, we're able to gather more and more data,
turning it into actionable information.
This information is how we grow, how
we develop and improve.
It's far too important to be lost to a crash.
BRIAN DORSEY: Or some kind of natural disaster.
JULIA FERRAIOLI: Or a klutz tripping over the wrong cable.
BRIAN DORSEY: And as an industry, we've usually dealt
with this sort of thing by implementing failover or some
kind of backups.
And we really believe that the way forward is to distribute
your data onto a large number of machines.
And you're going to want a database that's designed for
that world.
You want a database that, as you add more machines, you get
more reliability.
JULIA FERRAIOLI: So the last reason is actually rather
self-serving.
We all like to relax once in a while.
So with a well-chosen, well-configured database
solution, you can ditch that dreaded on call pager and
finally get a good night's sleep.
So let's introduce our panelists.
First off, we're going to hear from Tyler Hannan from Basho,
then Mike Miller, representing Cloudant.
BRIAN DORSEY: Then we'll have Chris Ramsdale from the Google
Cloud Datastore and Will Shulman from MongoLab.
JULIA FERRAIOLI: OK, Tyler, you are up.
TYLER HANNAN: Beauty.
Thank you, Julia and Brian.
So as they said, my name is Tyler Hannan, and I'm with
Basho Technologies.
A little bit about Basho--
we were founded in January of 2008.
There's about 140 of us now.
We have offices sort of strewn about the world.
But as a company building distributed systems, we're
kind of neurotic about the term distributed, and we're a
distributed team as well.
So about 80% of us work from home.
So when thinking about our product, Riak, our database,
it's good to start by considering the benefits of
a distributed database.
We believe that Riak is a very operationally friendly
database that is fault-tolerant.
It can survive a node failure.
It's highly available.
It can survive network partition events.
It's scalable.
You can grow to meet demand simply by adding hardware or
compute instances, and it's self-healing.
Self-healing is an interesting one, because in any of the
events above, be they good, bad, or indifferent,
your data is safe.
So if those are the benefits, what are the properties of
Riak that those benefits are derived from?
Riak is a key/value store.
It's open source.
It's distributed.
It's masterless.
It's eventually consistent.
Each of these could be a talk in their own right.
So I'm just going to touch on a few very, very briefly.
Riak is a key/value store, which means you have very,
very simple operations--
gets, puts, deletes.
And the value is mostly opaque.
It's stored on disk as binary.
I assign a key to that value, either
programmatically or manually.
And those key/value pairs are stored in a higher-level
namespace called a bucket.
I interact with these key value pairs via either an HTTP
or protocol buffers API.
And there's a slew of client libraries for it, some of
which we've built, some which others have built.
Pick a language, there's a library.
And then we've layered some extras on top of that-- some
lightweight JavaScript and Erlang MapReduce, full text
search, secondary indices, pre- and post-commit hooks.
Importantly, though, Riak is masterless.
If you're new to the world of distributed systems and
distributed databases in general, I encourage you to
read the Dynamo specification.
It's a seminal white paper
describing an approach.
And it's the approach that Riak has adopted.
A masterless deployment means that any node can serve any
request, that data and load are spread evenly.
This is achieved through a combination of a gossip
protocol, which is mesh network-like, and hinted handoff.
But the point is that you can achieve near-linear scale by
simply adding hardware or compute instances.
And what does that look like?
It's really three commands-- riak-admin cluster join,
cluster plan-- here's what's going to happen to your data
and the replicas thereof-- and cluster commit, and you're off
and running.
In the context of what we're talking about with Google,
that's as simple as firing up gcutil or the Python APIs,
giving it a project, choosing a machine type--
n1-standard-4, because we all like memory as databases--
SSHing in and setting it up, and you're off and running.
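The masterless, evenly spread placement Tyler describes is usually implemented as a consistent hash ring, as in the Dynamo paper he recommends. Here is a minimal Python sketch of the idea. This is an illustration only, not Riak's actual ring implementation; the node names and vnode count are made up:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hash ring: a key maps to the first node
    clockwise from its hash, so any node can route any request."""
    def __init__(self, nodes, vnodes=8):
        self.ring = []          # sorted list of (position, node)
        for node in nodes:
            self.add(node, vnodes)

    def _pos(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node, vnodes=8):
        # virtual nodes spread each physical node around the ring,
        # so adding a machine rebalances only a fraction of keys
        for i in range(vnodes):
            self.ring.append((self._pos(f"{node}:{i}"), node))
        self.ring.sort()

    def node_for(self, key):
        pos = self._pos(key)
        idx = bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["riak1", "riak2", "riak3"])
owners = {k: ring.node_for(k) for k in ("user:1", "user:2", "user:3")}
```

Adding a fourth node with `ring.add("riak4")` moves only the keys whose ring positions now fall on the new node's arcs, which is the near-linear scaling property the panelists keep coming back to.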
So when would Riak make sense for you?
If you want an operationally friendly database, and you
want to combine that with an operationally scalable compute
platform like Google Compute.
And you're in gaming, social, mobile, retail,
advertising, whatever.
Forget about the term "big data." Think
about critical data.
When your data is critical, when it must be served with
high availability and low latency, then you want to look
into a distributed system.
And Riak may be a good one.
Thank you.
[APPLAUSE]
MIKE MILLER: Can everybody hear me OK in the back?
Well, my name's Mike Miller.
I'm one of the co-founders from Cloudant.
And I'm going to give you three quick slides, just
pictures actually.
And while I'm a technical founder, I'm going to try to
keep this high level and get to the
question and answer session.
So how many of you are mobile developers or have that as a
big part of your work?
How many of you have databases that are so big that you can't
fit them on a single machine?
Everybody that is writing a mobile app, if your app is
successful, you'll eventually hit that point.
And so really, that's one of the two main reasons that we
founded Cloudant.
So my background is in big-scale systems from the Large
Hadron Collider.
And when we were trying to build that and serve a
globally distributed populace on 200 datacenters, we
realized that two things broke our model of computing.
One was big data, and the other was the fact that
everybody was in a different place.
That last one has been rebranded as mobile.
And the fact that scale is important and masterless and
distributed and all the things we're going to talk about, you
should just take for granted.
I'm not going to go in those details.
You should have a service that, when it needs to grow,
you put nodes in it.
And that service gets bigger and
faster in a linear fashion.
That's all great.
But the other thing is to attack the latency problem.
What we realized when we were solving that at the LHC was
really that what we wanted was to take a database and take a
content distribution network, like Akamai, and merge them
together so that I could have my device, and
I could write locally.
And then it would synchronize when it could in the lowest
latency fashion possible.
So you've got 200 precious milliseconds for every user
interaction.
And that means that if you're going around the globe, you
spend 50 milliseconds getting off your phone.
So with Cloudant, we've built a distributed
database as a service.
We run it so that you don't have to.
You can focus on your core competency.
And we're, I think, rolled out on something like 18 to 20
datacenters around the globe now.
So our big thing is footprint and giving you a low latency
connection wherever you are so that you write
local and think later.
And so there's one strong
recommendation I have.
As you choose new technologies, choose something
that has a mobile strategy baked in.
So we heard what it's like to install Riak--
awesome.
This is what it's like to install Cloudant.
You sign up.
You get a username.cloudant.com-- done.
And then you get access and resources
automatically around the globe.
So you write local, and your data is beating you around
the globe as your users move.
And the way this works is pretty simple.
You can actually use this cloud API.
So we speak the Apache CouchDB API.
Beneath that, we've [INAUDIBLE] the same concepts
from Amazon's Dynamo paper around
distribution and quorum.
So that can handle megabytes to petabytes.
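The Dynamo-style quorum Mike mentions means a write is acknowledged by W of N replicas and a read consults R of them; when R + W > N, every read quorum overlaps the latest write quorum. A hedged Python sketch of that rule (a toy model, not Cloudant's implementation; the version counter and replica layout are illustrative):

```python
class QuorumStore:
    """Toy N-replica store with R/W quorums (Dynamo-style).
    With R + W > N, every read quorum overlaps every write quorum."""
    def __init__(self, n=3, r=2, w=2):
        assert r + w > n, "quorum overlap requires R + W > N"
        self.replicas = [{} for _ in range(n)]
        self.n, self.r, self.w = n, r, w
        self.version = 0

    def put(self, key, value):
        self.version += 1
        # acknowledge after W replicas; the rest converge later
        for rep in self.replicas[: self.w]:
            rep[key] = (self.version, value)

    def get(self, key):
        # read R replicas and keep the highest-versioned value
        reads = [rep.get(key) for rep in self.replicas[-self.r:]]
        return max((v for v in reads if v), default=None)

store = QuorumStore(n=3, r=2, w=2)
store.put("doc1", {"likes": 1})
latest = store.get("doc1")   # the overlap guarantees the newest version
```

The point of the overlap is that even though only two of three replicas saw the write, any two replicas you read are guaranteed to include one of them.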
But you can also run a local instance of an open source
project on your desktop machine.
Or you could actually run it in the
browser and store state locally--
HTML5 local storage.
And you can even run this on iOS and Touch.
And the cool thing there is you can always write locally
and preserve that full offline experience.
So in terms of what I think you should have in your head
as you think about choosing your database technologies,
think about what things you're forced to do with your
database that you'd love to get away from.
And so we're trying to give you a chance to say, hey, do I
really need a global write master?
Or can I just put data someplace locally and have
that distribution happen automatically?
So with that, I'll pass on the mic.
Thank you.
[APPLAUSE]
CHRIS RAMSDALE: Thanks, Mike, and thanks, Julia and Brian.
I'm Chris Ramsdale.
I'm a product manager for the Google Cloud Platform.
And I'm focused on App Engine and distributed storage within
the platform itself.
The Cloud Platform storage family is comprised of Google
Cloud Storage for serving blob data, Cloud SQL for relational
data, and the newly released Cloud Datastore for
non-relational data.
And for the sake of today, we're going to focus on the
Cloud Datastore.
And I've got a few slides, and we'll get going, and then
we'll drop it into Q&A as well.
So announced yesterday, the Google Cloud Datastore was our
effort to extract the High Replication Datastore
that was part of App Engine and make it available to any
compute container, be it EC2 or Compute Engine
or things like that.
Launched in 2011, the high replication datastore turned
out to be a really great project and a really good
service for our users.
We've been running for two years.
We're at about 4.5 trillion transactions per month right
now storing petabytes of data.
Jokingly, App Engine has really been referred to as the
SDK for BigTable.
So what we did was we said, that's great, but it's
packaged up in the App Engine SDK, so you can't get to it
from anywhere else.
And so we pulled that out.
We put an HTTP interface on top of it.
I'm not really doing justice for what we did to the API,
but you should go read the docs.
And now we're announcing it as the Google Cloud Datastore
and, again, making it available from anywhere.
And by doing that, not only do we get the hardening and
productization of HRD, now
inside of Cloud Datastore, we also get the features and
functionality-- for example, ACID transactions, eventual
and strong consistency, a built-in query engine that lets you do
fixed-cost queries, and things like that.
Those are all available as well.
So that's what the product is.
And real quickly, we're going to talk about how that aligns
with what our philosophies are and what we're trying to do
within the Google Cloud Platform all up.
So at its core, we're trying to bring Google infrastructure
to developers, to our customers, to you that are in
this room today and joining us for Google I/O. Specifically,
the same infrastructure that's running things like Geo and
Mail and Google Maps and Search and things like that.
And what that is specifically in the case of storage, is
these components that you see here on the left hand side
that Urs had talked about yesterday in the Cloud keynote
if you happened to see it--
everything from our datacenters and our machines
to our networking to the crazy services that we've built over
the past few years--
Colossus, BigTable, and Megastore.
And by doing that, what we allow you to do is we bring
you high availability.
By building on top of Megastore, we actually give
you the replication across multiple geos.
We give you high scalability.
So Colossus actually gives you the storage, the
underlying mechanism for storing.
It gives you huge capacity and high durability, and then
BigTable handles horizontal scaling out.
So we do that automatically for you.
You don't have to actually add a node to the Cloud Datastore.
We just shard appropriately and things happen.
And finally, an API frontend that's actually built on top
of App Engine that will just scale out.
We're looking for access from anywhere.
We think that data should not be siloed.
It should follow you wherever you go.
And as our platform expands out and as customers use
hybrid approaches like Compute Engine and EC2, they want their
data to follow them wherever they need to go.
And then finally, what's near and dear to my
heart is Fully Managed.
I think that we're all going to talk about management and
how we do it as we get into the Q&A. And what that means
for us at Google is that you get this guy.
This is Michael Handler.
He's one of our Site Reliability Engineers.
We call them SREs.
And they help keep the service up and running.
They help us do planning, architecture, failovers,
things like that.
But because you're building on the same infrastructure as all
those other products I alluded to, you not only get him.
You get him cloned like 1,000 times.
So you've got all these teams that are monitoring all these
services that do automatically scale out for you.
But sometimes things happen.
And we've got the cell phone, AKA the pager to take care of
it and make sure you don't have to worry about it so you
get more sleep.
I'll turn it over to Will.
[APPLAUSE]
WILL SHULMAN: Hello everyone.
My name is Will Shulman.
I'm the CEO and co-founder of MongoLab.
I'm going to give you an introduction to MongoDB and
our MongoLab cloud service.
It's upside down.
So real quick, what is MongoDB?
MongoDB is an open source, high-performance, distributed,
and document-oriented database.
And I'll go through what that means really quickly.
First and foremost, MongoDB is a document database.
For those of you who are not familiar with
that terminology, it doesn't mean it stores PDF files.
MongoDB stores these in a binary representation of JSON
called BSON.
And this is a really important part of MongoDB that really
differentiates it from other NoSQL databases.
It's not just a key value store.
It gives you these rich data structures.
It's got nested objects.
It's got arrays.
You can represent geolocations natively.
And also, unlike a lot of key/value stores, MongoDB has
a rich query language.
It's a real database.
It allows you to do searching and sorting by not just the
keys, but any nested value.
So here we have all sorts of examples of doing queries and
geolocation, if you look down at this
example towards the bottom.
It's built in.
It's got aggregation, so you can do binning and grouping
like you would with a relational database.
And also like a relational database, although you can
search and sort by anything, you can create indexes on
almost anything.
So if your queries are slow, you can index on any field
value, nested key values deep inside the object.
You can index on array values.
Makes it really great, not just for a large data sets,
which I'll talk about in a minute, but really as a
replacement for MySQL or any relational database you use.
If you spent an inordinate amount of time in your life
mapping from your object-oriented programming
language to a relational datastore and doing
object-relational mapping, you don't have to do that anymore.
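Will's point about querying and sorting by any nested value can be illustrated with a toy matcher over JSON-like documents. This is a simplified Python sketch, not MongoDB's query engine; the only "operator" it supports is dot-notation equality, and the sample documents are made up:

```python
def get_path(doc, path):
    """Resolve a dotted path like 'address.city' in a nested document."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def find(docs, query):
    """Return documents whose dotted-path fields equal the query values,
    loosely mimicking MongoDB-style matching on nested keys."""
    return [d for d in docs
            if all(get_path(d, p) == v for p, v in query.items())]

people = [
    {"name": "Ada",   "address": {"city": "London"}, "tags": ["math"]},
    {"name": "Alan",  "address": {"city": "London"}, "tags": ["crypto"]},
    {"name": "Grace", "address": {"city": "DC"}},
]
londoners = find(people, {"address.city": "London"})
```

In a real database the same dotted paths are what you would put a secondary index on, so slow scans like this become index lookups.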
Second, MongoDB is a distributed database.
There are two clustering technologies with MongoDB.
The first clustering technology is
called Replica Sets.
This is what allows you to have high
availability in MongoDB.
It's single master.
So you can read and write.
It supports both strong consistency and eventual
consistency.
So if you need strong consistency for your
application, you can have it.
If eventual consistency is acceptable for your
application, you can also read from these secondaries.
And its consistency model is very, very configurable.
The second clustering technology is
called sharded clusters.
This is basically a cluster of clusters.
So it's a cluster of Replica Sets.
And with sharded clusters, this is how you get horizontal
scalability with MongoDB.
Like many have mentioned, as you add nodes to the system,
your writes get distributed among them, and it just
scales linearly in a very horizontal manner.
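The "writes get distributed as you add nodes" behavior of a sharded cluster can be sketched as routing each document by a hash of its shard key. A toy Python illustration only; MongoDB actually supports both range- and hash-based sharding with chunk balancing, and each shard below would really be a full replica set:

```python
import hashlib

def shard_for(shard_key, shards):
    """Route a document to a shard by hashing its shard key."""
    h = int(hashlib.sha256(str(shard_key).encode()).hexdigest(), 16)
    return shards[h % len(shards)]

# each list stands in for one shard (a replica set in practice)
shards = [[] for _ in range(4)]
for user_id in range(1000):
    shard_for(user_id, shards).append({"_id": user_id})

sizes = [len(s) for s in shards]   # writes spread roughly evenly
```

Because the hash spreads keys uniformly, each shard ends up with roughly a quarter of the writes, and adding shards divides the load further.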
What is MongoLab?
So MongoLab is MongoDB as a service.
And that's what we do.
MongoLab really is about automating all of the
operational aspects of running a database.
We try to build an expert system out of the DBA so that
you don't have to have the DBA.
You don't have to have the ops guys.
And you can focus, like a lot of folks have mentioned, on
building your app.
We handle provisioning and scaling,
replication and backups.
We've got rich Web UI.
We handle all the monitoring, all the alerting.
And of course, all with the service comes expert support.
Our product offering includes shared and dedicated VM plans.
We have SSD backed plans, single and multi-node plans.
So whether you're doing analytics and you just need a
single node database, or you need lots of nodes, we're
going to have full sharding support in 2014.
And we run on all sorts of cloud providers, and we're
here today and this week to announce our support for the
Google Cloud Platform, which we're really,
really excited about.
So thank you very much.
We've got a sandbox.
I guess I lost my last slide.
I just wanted to say we've got a sandbox outside tomorrow at
Google I/O, where you can come and get a demo of our service.
So we encourage you guys to stop by.
Thanks.
[APPLAUSE]
JULIA FERRAIOLI: Thanks, guys.
That was a great overview.
And this is the fun part where we jump into
some Q&A by the audience.
I encourage you if you have questions, start lining up at
the mics now.
To jump start it, we'll just seed a question.
We're at Google I/O. Most of us were
at Urs's talk yesterday.
So we heard a lot about the improvements and the type of
infrastructure that we have at Google.
So my first question, and this can be answered by everybody,
is, what makes Google's infrastructure a great place
to run distributed databases?
We'll start with Will.
WILL SHULMAN: So we've actually been using GCE for a
little while.
And the first thing actually is the network.
The network is blazing fast, really, really
impressive.
The other thing that's great about the Google Cloud
Platform is it has a private distributed backbone between
all the datacenters.
So we're really excited about that functionality.
We're really excited to offer globally multi-region clusters
where you can have nodes--
different parts of the globe.
But you're not talking over the open internet.
You're actually talking over Google's private backbone.
CHRIS RAMSDALE: So it's slightly different for us,
because we're not actually building on top of Compute
Engine VMs, at least today.
But I think there's an analogy that I'll get to real quickly,
which is that for the same reasons that it's good to
build on Compute Engine for all the services they provide
and the fast networking and the things that Google takes
care of, it's the same reason we chose to build on top of
things like Megastore and Colossus and BigTable and
whatnot and are looking at things like Spanner as well,
because those are the lower level pieces that are handled
by Google, much like the lower level VMs are inside of
infrastructure.
And then also at the same time, while you can access the
Cloud Datastore from things like EC2 or
Rackspace or whatnot--
although I think the latency might be
a little bit awkward--
we really built this for Compute Engine users, because
we do think it's one of the great platforms that we're
offering and bringing into the larger cloud platform.
And so we wanted to give those users a fully managed
non-relational schemaless datastore to
truly grow their apps.
MIKE MILLER: Boy, there are a lot of ways I
want to answer this.
Number one maybe being that it's not Amazon, which is kind
of nice as a change.
The reality is we built a database as a service.
And we've got to bring that to the place where
the developers are.
There's a large and rapidly growing number of
developers on Google.
So for us, there are all kinds of technical reasons about why
it's a great platform.
If you're building a CDN, you want to know the
network very well.
And reliable latencies are huge as well--
so private backbone.
But really, the number one thing for us is footprint,
like I need to get as close to the edge as possible.
And so Google has one of the biggest, if not the biggest
deployments globally.
So for us, that's massive.
TYLER HANNAN: So Will mentioned this, and I'm going
to repeat it.
The private backbone between datacenters matters.
Data locality is an important concern when you're
distributing a global database.
But also, when scalability is a problem, it's a problem now.
And it's a business problem now.
And so the ability to simply scale by using the tools that
are provided to developers regardless of which database
solution you choose is an important and, I think, unique
component of Compute Engine and something that if I could
surmise, will be a continued area of
improvement in the future.
JULIA FERRAIOLI: So it looks like we have some questions.
AUDIENCE: My name's Drew [? Broadly. ?]
I'm from New Zealand.
One of the questions I have--
because the people from Cassandra aren't up here, and
I thought I'd take the opportunity--
what are your opinions on distributed counters, and
what's your approach?
TYLER HANNAN: So distributed counters was the question.
Sorry, I'm repeating that, because I'm hard of hearing.
We've been working, in concert with INRIA Research in
France, on CRDTs and looking at building counters that can
survive in an eventually consistent world.
We've also been working on adding some strong consistency
capabilities into Riak.
I think they're there.
I think they're coming soon.
I think they're a lot of scary math.
So it'll be interesting to see how the market adopts.
But I think we're on the cusp of distributed
counters being a reality.
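A concrete example of the CRDT counters Tyler describes is the grow-only counter (G-Counter) from the CRDT literature: each node increments only its own slot, and merging takes the per-node maximum, so replicas converge no matter the order in which updates are exchanged. A minimal sketch, not Riak's implementation:

```python
class GCounter:
    """Grow-only CRDT counter: per-node counts, merged by max."""
    def __init__(self):
        self.counts = {}                    # node id -> local count

    def increment(self, node, amount=1):
        self.counts[node] = self.counts.get(node, 0) + amount

    def merge(self, other):
        # element-wise max is commutative, associative, and idempotent,
        # so replicas converge however (and however often) they sync
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter(), GCounter()
a.increment("node-a", 3)     # increments happen on separate replicas
b.increment("node-b", 2)
a.merge(b)
b.merge(a)
total = a.value()            # both replicas now agree
```

Supporting decrements takes a pair of these (a PN-Counter), which is part of why Tyler calls the general machinery "a lot of scary math."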
MIKE MILLER: Yeah, CRDTs are definitely a hot area of
implementation.
I don't know about how hot they are research-wise, but
we're certainly seeing users who have implemented a large
number of those things in reality.
But overall, I think there's a little bit of a shift in
mindset when you're writing something.
A lot of people who come to NoSQL will get there after
going to the fully denormalized state.
So you start with star schemas, and you work all
the way up to fully denormalized.
And that mindset is something that we have to help show
people the best ways to solve problems.
I think one of the things that we also see that's available
now in the majority of these systems is going to an
immutable data model where if you're modeling a state
machine, every transition is modeled as, say, a new
document in our system, a new
JSON document.
And the aggregate state is summed up
in a secondary index.
So that's a really powerful way already without even
appealing to something as complicated as CRDTs to deal
with rapidly changing things that have to be summed.
So concrete examples--
leaderboards, amounts of money in accounts, virtual
currencies, which map to a very real amount of dollars.
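Mike's immutable data model, where every state transition is a new document and the aggregate state is summed in a secondary index, can be sketched like this. A hedged Python illustration of the idea; in Cloudant/CouchDB the aggregation would be a map/reduce view, and here it's just a plain fold over an append-only list:

```python
# Append-only ledger: each transition is a new immutable document.
events = [
    {"account": "alice", "delta": +100},
    {"account": "bob",   "delta": +40},
    {"account": "alice", "delta": -30},
]

def balances(events):
    """Fold the event log into per-account balances, playing the
    role of the secondary-index aggregation Mike describes."""
    totals = {}
    for e in events:
        totals[e["account"]] = totals.get(e["account"], 0) + e["delta"]
    return totals

snapshot = balances(events)
```

Because documents are never updated in place, concurrent writers never conflict; they just append, and the sum stays correct however the events interleave.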
CHRIS RAMSDALE: I think they summed it up well.
AUDIENCE: Are you two able to give an input on it or not?
WILL SHULMAN: I'll be honest with you, I've given almost no
thought to distributed counters.
So sorry about that.
AUDIENCE: Thank you.
JULIA FERRAIOLI: More time for more questions.
AUDIENCE: So many of my friends who initially
recommended MongoDB to me, later, I noticed they grew
disillusioned with document-oriented databases.
I'm wondering how have you noticed organizations dealing
with the engineering scaling issues of actually engineering
applications with
document-oriented databases versus--
I'm not talking about the whole technical scaling, but
building their applications.
WILL SHULMAN: So for document-oriented databases is
specifically your question?
AUDIENCE: Right.
JULIA FERRAIOLI: Sounds like a great question for Will.
WILL SHULMAN: I think if you're not just talking about
the technical aspects, which obviously involve sharding and
using the horizontal scalability aspects of
MongoDB, from an organizational standpoint, we
find the customers are just loving MongoDB even actually
if it's not a large data problem.
So the ability to not-- and like I mentioned in my talk,
almost every programming language now is an
object-oriented programming language.
So having a rich object data structure in your database is
really just unbelievable for developer productivity.
If you're using something like server side JavaScript or
Python or Ruby, and you have this object in memory, you
could just throw it into the Datastore.
We're finding that people are really finding the schemaless
nature of Mongo, the ability that you could just add
fields, the ability to just insert things and take them
out without having to do a lot of transformations, is really
helping from an organizational standpoint, which is what I
think you were getting at, really scale development,
really scale productivity.
MIKE MILLER: Sorry, I didn't mean to steal the mic.
I think it's a good question if I interpret it
a little bit myself.
I think one of the questions is something
like a document database.
You're [? trading ?]
JSON on the wire over HTTP.
There's a lot of ways to model your data.
And that flexibility, again, like education in terms of
what is the best way for a particular system to solve
this problem.
In our system, there are at least four different ways to
store one-to-many and many-to-many relationships.
And so a lot of that has to do with just getting out best
practices and understanding, oh, my problem
falls in this bucket.
And that's something that I think the communities in
general are rapidly starting to try to get down on paper.
CHRIS RAMSDALE: I'll add to it.
To touch on the point of everybody that's doing mobile
as well today, I think there's other things you want to
consider as well in terms of going after that approach and
going after doing document stores.
While they're schemaless and they're great and it makes it
very easy to scale, you do lack the things that you get
from, say, doing row-column, where you want to say, I want a
subset of my properties, because you're trying to minimize the
payload that's going back over the wire to your mobile
clients, especially when you're doing a back end as a
service type of solution when there's no server side code to
filter out.
You're literally going from the mobile device-- iOS,
Android-- back to the Datastore.
You're like, no, no, no, I don't need 100 properties.
I only need the first five, because that's actually what
I'm showing in my limited real estate view there.
TYLER HANNAN: And I think this is an important question, so
I'm actually going to answer it slightly differently, which
is in the world of relational databases, we started modeling
our data by saying what answers do I have.
In the world of NoSQL, non-relational databases, we
start modeling our data by saying what
questions do I have.
So you could think of it as a top down approach rather than
a bottom up.
And it's not an either/or.
There are situations where you may want a relational
datastore alongside a non-relational with something
specific for geospatial indexing.
That polyglot deployment is important, and it's something
that, as developers, I think we can be excited about that
we're not limited to a set of tools.
We have a broad range of tools to draw from based upon
application need.
It's not a one to one application to
database model anymore.
It's a one to many, depending on what's living inside of the
application.
AUDIENCE: You mentioned Spanner.
Could you speak about how it differs from BigTable, and are
there any plans to expose it via an API?
CHRIS RAMSDALE: To repeat the question, I think question
was, I had mentioned Spanner, and was there--
AUDIENCE: How does it differ from BigTable, and are there
any plans to expose it via an API?
CHRIS RAMSDALE: So it doesn't differ from BigTable.
It differs from Megastore and how they're doing the
replication.
Spanner is, in many ways, the next evolution of Megastore,
in that it does georeplication in a more
efficient manner and actually can do timestamping across
multiple geos.
So it really gets us to where we're trying to go, which is
the global footprint.
Replication between datacenters right now via
fiber is actually pretty good.
But think about where you want to be five years from now,
10 years from now, where you're like, I don't want to
actually think about, unless it's for compliance reasons,
where my data is.
I just have a bunch of users.
I have a great service that's taking off.
I'm the next Tumblr, Pinterest or whatever.
And they are all over the place.
And my data needs to be replicated.
So whether I go across the Pacific or across all the
dark fiber that's in the Atlantic or whatnot, it just
happens in microseconds.
So that's really the goal of Spanner.
And that's why I was looping it back to Megastore, because
Megastore is the service that actually does the replication
for us and for Google.
There's a question about APIs.
And yes, there's many conversations underway about
should we surface an API on top of Spanner.
I have my personal beliefs that I think that what we're
doing inside of HRD, inside of the Cloud Datastore is the
right way to go, because we're actually pulling in--
if you flip the bit on all of Google's infrastructure right
now and made it public, I think
everybody would freak out.
It's not the way I think that everybody wants to program.
So what I like what we're doing inside of the cloud
platform is we're putting that layer on top of it that makes
it really palatable for developers to come along and
use some pretty mind-blowing infrastructure.
AUDIENCE: So is that a no?
CHRIS RAMSDALE: Say what?
AUDIENCE: Is that a no?
CHRIS RAMSDALE: No, no, no, that's a not yet.
It might.
It's actually been debated inside of Google right now.
AUDIENCE: Hi, [? Stephan ?]
[INAUDIBLE] from Rovio Entertainment in Finland.
I was wondering if you have some advice, ideas, or tools
to deal with backup and restore strategy in reasonable
time, because usually you deal with huge amounts of data and
distributed databases.
And while most of them address disaster scenarios already in
this redundancy setup, there's still a possibility that
you'll cause data corruption by bugs in the application--
not that that would ever happen.
CHRIS RAMSDALE: Yeah, in actuality, it's the
application bugs that get you all the time, not that Rovio
would write any bugs.
Like the pigs wouldn't go off into space somewhere else and
attack a planet.
We have a backup and restore functionality right now that's
in its very much alpha phase, I would say.
That's user-initiated.
We're in the process of moving that into a
fully managed service.
It goes with the whole theme of what we're trying to do.
And that just gets us to where we want to be.
That's the baseline functionality.
That's an epic backup is what I call it.
So if you have 10 terabytes a day, you're doing that backup
all the time.
And one could do the math in their head and say, that's not
necessarily the way I want to be doing things.
So then what we're going to do is take it from there and go
into doing incremental backups with snapshots as well.
So what you get is full
backups that are reliable.
Then you get cheap backups-- incrementals.
And then you get backups with snapshotting, which gives you
the consistency to do them at any time.
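The tradeoff Chris describes-- full backups are reliable but expensive, incrementals are cheap because they only copy what changed-- can be sketched in a few lines of Python. This is a toy model with made-up names, not the Datastore backup service:

```python
def full_backup(datastore):
    """Copy every entity -- reliable, but expensive at scale."""
    return {key: dict(entity) for key, entity in datastore.items()}

def incremental_backup(datastore, last_backup_time):
    """Copy only entities modified since the previous backup."""
    return {key: dict(entity)
            for key, entity in datastore.items()
            if entity["updated_at"] > last_backup_time}

store = {
    "user:1": {"name": "alice", "updated_at": 100},
    "user:2": {"name": "bob", "updated_at": 205},
}
assert len(full_backup(store)) == 2
# Only user:2 changed since the last backup at t=200.
assert list(incremental_backup(store, last_backup_time=200)) == ["user:2"]
```

In practice an incremental scheme also needs periodic fulls to bound restore time, which is why the panelists describe the two as layered rather than as alternatives.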
WILL SHULMAN: So we run a cloud service around MongoDB.
There's lots of interesting techniques that you can use
with Mongo to deal with this problem.
And you're right.
Almost always we deal with node failure via replication.
But human error is always why people come calling us and
say, hey, can you restore a backup for us?
So we have a backup system that does snapshotting and
does the typical type of backup you might expect.
But there's other technologies, both core to
MongoDB and stuff that we're building that's going to help
even further.
So one is a MongoDB technology called Delayed Slaves, or
Delayed Replication.
So you can actually set up a node that lags behind the
master by an hour, a day, a week.
And that's one technique some customers use: OK, I
fat-fingered the database.
I'm going to go to this Delayed Slave that's maybe an
hour behind.
And I'll have a set of data that is not corrupted by the
bug or whatever caused the problem.
The other nice thing about MongoDB is it has an op log of
every operation.
This is what replication is driven off of.
So you can actually restore to a snapshot.
Let's say you restore to a snapshot from a day ago.
Then you could replay the op log to bring the data state to
the operation right before the operation that started to make
things go wrong or started to corrupt the database.
So you could actually do point in time.
And that's also something we're looking to
productize as well.
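The point-in-time technique Will describes-- restore a snapshot, then replay the op log up to just before the operation that corrupted things-- can be sketched like this. It's a toy model of the idea, not MongoDB's actual oplog format:

```python
def restore_point_in_time(snapshot, oplog, stop_ts):
    """Start from a snapshot, then replay logged operations
    whose timestamp is strictly before stop_ts."""
    db = dict(snapshot)
    for ts, op, key, value in oplog:
        if ts >= stop_ts:
            break  # stop just before the bad operation
        if op == "set":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
    return db

snapshot = {"a": 1}          # taken a day ago
oplog = [
    (10, "set", "b", 2),
    (20, "set", "a", 99),
    (30, "delete", "a", None),   # the buggy operation
]
# Replay everything up to (but not including) the bad delete at ts=30.
db = restore_point_in_time(snapshot, oplog, stop_ts=30)
assert db == {"a": 99, "b": 2}
```

The key property is that the log is ordered and replayable, so any timestamp between the snapshot and the bug is a valid restore point.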
MIKE MILLER: Maybe I'll just add an anecdote.
I'm sure everybody up here has backup solutions and a story
available on the product page.
One interesting piece of data, though-- and this is not fully
quantitative; I'm just tallying it in my mind right now.
But I would estimate about 80% of the user-initiated errors
we've had are simply database deletes by mistake.
You have a REST API that looks really clean.
And it's like, you've got a verb.
You've got a base URL.
You've got a collection or a database.
And then you've got a document, maybe other things
behind that.
And you have a script that somehow has an
empty string in there.
You just dropped the whole database by mistake.
And so actually about 80% of the things that we have to do
for users are restoring databases
that they just deleted.
And so the number one thing you can do is rename files and
garbage collect them later.
There are basic operational policies you can adopt that
let you say, oh, that email came in--
yeah, we just restored it right away.
Flip the bit.
So it's interesting to see, actually.
One of the things we should talk about is, as you run the
service, what are the things you learn about the users that
really drive how you go back and feed that
back into the product?
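The failure mode Mike describes-- a scripted DELETE where one path segment is an empty string, so the request targets the whole database instead of one document-- is cheap to guard against before the request ever goes out. A minimal sketch, with a hypothetical helper rather than any vendor's client API:

```python
def delete_url(base, database, doc_id):
    """Build a DELETE target, refusing any empty or malformed segment."""
    for name, segment in [("database", database), ("doc_id", doc_id)]:
        if not segment or "/" in segment:
            raise ValueError(f"refusing DELETE: bad {name} segment {segment!r}")
    return f"{base}/{database}/{doc_id}"

assert delete_url("https://db.example.com", "mydb", "doc42") == \
    "https://db.example.com/mydb/doc42"

# The fat-finger case: an empty doc_id would have addressed the
# whole database, so the helper rejects it instead.
try:
    delete_url("https://db.example.com", "mydb", "")
except ValueError:
    pass
else:
    raise AssertionError("empty segment should have been rejected")
```

Client-side validation like this is complementary to the server-side policies the panel mentions (soft deletes, rename-then-garbage-collect).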
JULIA FERRAIOLI: Don't make delete calls too easy.
CHRIS RAMSDALE: Just do what we do.
We make the writes-- the deletes, actually-- really expensive.
And the storage is really cheap, so you just keep the
data around.
I've got to poke fun at myself.
TYLER HANNAN: A whole slew of options and
opportunities with Riak.
I'll just reiterate what the gentleman said, which is
backups are, again, a business challenge in addition to a
technology challenge, particularly when you're
considering latency.
So whether you adopt a backup strategy that's a fully
replicated cluster that mirrors production or whether
you use some cool syncing technology or whether you do
more of a traditional sort of replication, whatever you
choose, plan for failure.
That's my key message about distributed systems.
Plan for failure.
And test it.
JULIA FERRAIOLI: So we have a little under six minutes left
for questions.
AUDIENCE: So my question is very much related to the
previous one.
For human error events, if you have an append-only database,
where instead of deleting or updating you only append,
then you can stop those errors as well.
Do you guys have any plans to support an append-only style
database as a first class kind of product?
MIKE MILLER: I can tell you.
Sorry, sometimes it sounds like it cuts out.
So we use modified portions of the Apache
CouchDB storage engine.
So it's an append-only, copy-on-write B-tree.
So we're never overwriting our data.
We compact it away in the background after so much time
or so many revisions.
But we haven't fully leveraged that yet to productize a very
clever incremental slash point-in-time backup service.
So there are great things you can do there, depending on the
storage model you choose for primary and secondary indexes.
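An append-only store in the spirit of the CouchDB engine Mike describes can be sketched as follows: writes append new revisions instead of overwriting, reads return the latest revision, and compaction later discards old revisions. This is a toy in-memory model, not CouchDB's actual on-disk format:

```python
class AppendOnlyStore:
    """Never overwrite: every write appends a new revision."""

    def __init__(self):
        self.log = []  # list of (key, revision, value) tuples

    def put(self, key, value):
        rev = sum(1 for k, _, _ in self.log if k == key) + 1
        self.log.append((key, rev, value))

    def get(self, key):
        for k, _, v in reversed(self.log):  # latest revision wins
            if k == key:
                return v
        raise KeyError(key)

    def compact(self, keep=1):
        """Drop all but the newest `keep` revisions of each key."""
        kept, seen = [], {}
        for entry in reversed(self.log):
            k = entry[0]
            if seen.get(k, 0) < keep:
                kept.append(entry)
                seen[k] = seen.get(k, 0) + 1
        self.log = list(reversed(kept))

store = AppendOnlyStore()
store.put("x", "v1")
store.put("x", "v2")
assert store.get("x") == "v2"
assert len(store.log) == 2   # the old revision is still on the log
store.compact()
assert len(store.log) == 1   # old revisions compacted away
assert store.get("x") == "v2"
```

Until compaction runs, every old revision is still recoverable, which is exactly what makes the incremental and point-in-time backup schemes discussed above possible.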
CHRIS RAMSDALE: So HRD and the Cloud Datastore, since we're
built on top of Bigtable, we're already doing the
append-only thing.
This is why it's actually much easier just to leave your data
around than it is to go and do writes to delete it.
And then to dovetail back into what I was saying to
the guy from Rovio: given that foundation, we can do
incremental and snapshot backups at any time, based on
the way we're doing the appending.
AUDIENCE: So there's a lot of cool technologies being
introduced and rolled out to developers.
But with the database, I always wonder why are we still
using HTTP to interface with the data?
Why are we using a stateless connection?
And my personal challenge is, using high concurrency in
Python, you start accumulating large numbers of file
descriptors for all of the HTTP connections that have come
and gone and not been collected in time.
And so it's a real challenge managing that, versus when I'm
using Postgres or something with a traditional binary
stateful connection, I don't seem to
run into those problems.
So is there a plan to maybe standardize a binary protocol
or a stateful protocol for these types of
services coming out?
WILL SHULMAN: So MongoDB actually has
a binary wire protocol.
It actually isn't an HTTP interface.
MongoLab actually does offer an HTTP interface on top of
the Mongo protocol, and you can use it if you'd like to.
But I'd say 90% of our customers use one of the
standard MongoDB drivers supported by 10gen.
So almost every programming language you might want to
talk to MongoDB from, there's a driver, and it's a tight
wire protocol.
It's based on the BSON spec, which is at bsonspec.org.
And yes, so it's very well understood.
And all the drivers have connection pooling and things
like that you're probably used to from a traditional
relational database.
CHRIS RAMSDALE: But I'll add to that.
I think the way we think about it is, with stateful
connections, whether you're doing sockets or whatnot, they
end up being a bottleneck in your system when you're trying
to scale out horizontally.
So that's one.
And so we try to avoid those wherever we can.
So HTTP just happens to be the mechanism.
Maybe that's going to change over time.
Hopefully it does.
There's a fair amount of overhead to bring those
connections up.
And then two, what I'll add is, think about it in terms of
when the server goes away, when it really is just a bunch
of mobile clients, once we've got all the auth figured out
and things like that.
So you're making a connection back from the client.
The last thing you'd want to have is a persistent
connection over an extremely flaky carrier
connection like 3G or 4G.
So I think we've got to start thinking in terms of
stateless-- that paradigm shift has already happened.
MIKE MILLER: Yeah, I'll completely second that.
We actually only support HTTP connections because we're firm
believers--
well, if we look at our user base, it's over half
JavaScript.
A lot of that is directly from the clients.
And so we think a decade from now, a, nobody will run their
own databases.
And b, we'll be, if not completely in a two-tier
stack, then something that looks like a very, very thin
middle tier, which is just for session
management and sign-ups.
And it's going to be direct connection
from your mobile device.
So for that, it's HTTP now-- and maybe in the future,
WebSockets, things that are a little bit faster and
lighter-weight protocols.
But I think that's the direction the
world's moving in.
JULIA FERRAIOLI: Next question?
AUDIENCE: Do you have any advice if the data is actually
graph-like, where it doesn't fit into a document?
For example, the data is big, but you need to traverse it.
Think about Knowledge Graph, but the data is huge.
How do you distribute it?
Any advice?
TYLER HANNAN: So the question was, advice regarding modeling
graph style data.
I'm going to say something that maybe people at my
company wouldn't like, which is you should look into a
graph database.
They're there.
They're really freaking amazing, and they do their job
really well.
So if that's the model that you need, I would encourage
you to look into the people who are building a data
storage for that model.
MIKE MILLER: I'll answer that.
If your data fits in memory, Neo4j is your database--
it's pretty spectacular.
It's got a REST API.
It's awesome.
And it's an incredible team with a great community.
If it doesn't fit there, certain types of graph problems
are amenable to MapReduce processing-- not all.
There are some great talks from Twitter about how they're
swinging their Hadoop hammer, because it's the only hammer
they have, and how they get into some really funny spots
where they have to get very clever with approximate sets
and hash tables and things like that.
But some problems are well-suited to MapReduce.
So that opens up a lot of doors for you.
It's about data modeling, again.
But if that doesn't work for you, you're probably not going
to solve the distributed graph problem yourself.
The community has been working on that for a very long time.
So we should probably start to lean on these guys over here
about maybe open sourcing something around Pregel.
CHRIS RAMSDALE: Noted.
[LAUGHTER]
JULIA FERRAIOLI: So we're just about out of time.
I would encourage you to come up and talk to
us after the session.
We'll probably move a little bit outside, so we can
actually let the next folks set up.
But I just wanted to say thank you.
Thank you to all of you.
Thank you to our panelists for coming out here today.
[APPLAUSE]