Ad:Tech NYC CTO Roundtable

Srini Srinivasan/Moderator: We have here a distinguished panel of key technical leaders from several
real-time advertising companies. The panel
members really need no introduction, but I'm going to request them to give a brief introduction
for themselves, starting with Mike from AppNexus. Mike Nolet: I'm Mike Nolet, CTO/co-founder
at AppNexus. What we do is we sell technology to companies, which helps them run real-time
businesses. We work with real-time sellers and real-time buyers: ad exchanges, SSPs,
DSPs, ad networks, all the various different kinds of entities in the space. Our major
customers are Microsoft Ad Exchange, for example on the sell side, Orange, now Interactive
Media, which is Orange Telecom in Germany, and then on the buy side we have major companies
like Collective, and also eBay, that use us to buy on a real-time basis. I'd ask the audience
to spot who up here does not fit. There is one person wearing a jacket. And he works
for Aerospike, no? Srini Srinivasan/Moderator: Let me interject for a second. The person
from BlueKai was unable to attend, so I invited the CTO of Aerospike, Brian Bulkowski, to
join us. He's the odd man out, so Mike's right. Mike Yudin: Hello. My name is Mike
Yudin, I'm the CTO at AdMarketplace. We're an advertising technology company based right
here in New York, and we operate the largest search network outside Google and Yahoo.
We work with some of the best and largest internet brands, delivering performance pay-per-click
traffic to advertisers worldwide, and we remain the eighth fastest-growing private
company in New York. We have a lot of traffic, just like everyone here on this panel, and
we solve complex advertising problems in real time using the data that comes our way.
Dag Liodden: I'm Dag Liodden, I'm the CTO of Tapad. Tapad is a fairly young technology
company, we help advertisers reach audiences across their multiple screens. If you're a
user in this day and age, you probably have tablets, you have iPhones, you have laptops,
and what we try to do is we help advertisers target, and measure performance across multiple
devices. Pat DeAngelis: I'm Pat DeAngelis, I'm the Chief Technology Officer for [x+1]
Solutions. We're a digital marketing hub, much more on the advertiser's side. We enable
cross-channel analytics and optimization across multiple touch points, typically for enterprise
clients. Typically we would do site optimization, and we have a real-time bidding DSP, if you will,
using AppNexus on the sell side. We are actually a partner of AppNexus. Our clients include
J.P. Morgan Chase, Capital One, Fingerhut, FedEx, Delta, and some of the largest brands
on the Internet. Brian Bulkowski: I'm Brian Bulkowski, from Aerospike. And I'm filling
in for the BlueKai gentleman. I'm a co-founder, along with Srini Srinivasan, and one of the inventors
of our technology and database. Srini Srinivasan/Moderator: Thank you. In terms of how we will do the
panel, what I am going to do is to kick it off with a few questions myself to the panelists,
and given the fact that the room is small, we can be a little bit more interactive. So
any of the audience members here, if you find that the discussion is either too technical
or not technical enough, feel free to raise your hand. Or for that matter, if you have
questions after the first few questions are answered, I'll throw it open to more questions
from the audience. Thank you. So let's get started. Real-time big data has been used
in several critical aspects of the advertising business. We at Aerospike have been fortunate
to have a front row seat to witness the evolution of this real-time technology through the
experiences of our customers. This panel is an attempt by us to bring together some of
the foremost experts in the area so that other people can learn about this evolution and
participate in this session. These companies in real-time advertising use big data in very
interesting ways. For example, they use it at the ad server level, where they deal with
millisecond SLAs day in and day out. They also deal with analyzing the data and then
feeding new models in on a periodic basis, maybe every hour, every day, and so on. My
initial question to the panelists is essentially this:
What are one or two instances where real-time data processing has had a tremendous impact
in a positive way on your business over the last year or two? I'll start it off addressing
this to Mike from AppNexus. Mike Nolet: On our side, I don't know if it's
a positive impact so much as it's something without which we can't operate our business.
When we do real-time buying, which is obviously something we do, we listen to every single
real-time ad that's available. We power a fair number of them, and that's a whole bunch.
We see in our peak day I think 39.5 billion ads in one individual day. It's about 600,000-odd
requests per second. Every second -- I think right now is about peak time -- every second
we are bidding on 600,000 ads. And the reality of that is, a lot of that buying is being
driven by behavior, and so we must have cookie data server-side. For us it's not a choice.
We have to have server-side data storage. We have to be able to do 600,000 read requests
a second. Now we deliver about 150,000-170,000 ads a second, so every time we win an auction
we do write updates to our cookie store. And so for us, we don't have a choice, right?
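The read-on-bid, write-on-win pattern Mike describes can be sketched roughly as below. This is a toy illustration, not AppNexus code: the bin names, price, and the plain dict standing in for a networked key-value store are all invented.

```python
# Sketch of a bid-time cookie-store lookup: read the user's profile on
# every bid request, and write back only when we win the auction.
# A plain dict stands in for a networked key-value store like Aerospike.

cookie_store = {}  # user_id -> profile bins

def on_bid_request(user_id, ad):
    """Read path: runs for every request (600K/sec in Mike's numbers)."""
    profile = cookie_store.get(user_id, {"segments": [], "ads_seen": 0})
    # Bid only if the stored behavioral data suggests the ad is relevant.
    if ad["segment"] in profile["segments"]:
        return {"bid": True, "price_cpm": 1.50}
    return {"bid": False}

def on_auction_won(user_id):
    """Write path: runs only on wins (~150K/sec), updating frequency data."""
    profile = cookie_store.setdefault(user_id, {"segments": [], "ads_seen": 0})
    profile["ads_seen"] += 1

cookie_store["u1"] = {"segments": ["auto"], "ads_seen": 3}
print(on_bid_request("u1", {"segment": "auto"}))  # bids
on_auction_won("u1")
print(cookie_store["u1"]["ads_seen"])
```

The asymmetry is the point: reads happen on every auction, writes only on wins, so the store must be tuned for a read-heavy mix at sub-millisecond latency.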
For us, we actually work with Aerospike, and we were - were we the first or second customer?
First. So back when it was Brian and Srini in a coffee shop in San Francisco, and they
were two guys and they said, "Yeah, we can do this for you." We didn't really have a
choice, because we were working with another vendor that was just truly terrible, who will
remain unnamed. And so we actually trusted them with this, and we've had a fantastic
ride over the last three and a half years as we started at, I think, 10,000 qps and
climbing to 600,000 qps. We probably found every bug for them along the way, so it's all good for
all of you if you want to work with it. For us, basically, a real-time key value store has
enabled all of our real-time buying business. Also it's a platform for the ecosystem, so
what's really exciting is -- I don't know if you guys know this, but on AppNexus you can
effectively use our key value store, you can use our infrastructure and our data centers
and actually put your own data in there and use that. It's really enabled us to provide
fantastic technologies and offerings to our customers. Mike Yudin: AdMarketplace is a
search syndication network. What that means is we look at each request for ads, of which
we get about a billion a day, or 50,000 per second, in three dimensions. We'll look at
the traffic source where an ad is going to be displayed, we'll look at the user who's
going to see the ad, and we'll look at targeting information such as the keyword that the user
typed in the search box. When a request gets into our system, we look at these three dimensions.
We have to pull data in one or two milliseconds on all of them; we have to know as much as
possible about the user, as much as possible about the traffic source, and match this based
on the keyword with all the ads that our system bid on. And then make a prediction, essentially,
as to what would be a fair price per click for each ad that matched this request. And
we return ads, for all 50,000 requests per second. Just like Mike said, there's no
question: the real-time data store isn't some supplementary nice-to-have, it's a
necessity. Without that, we wouldn't be where we are. All the competitive advantage for
a company like ours, for everyone on this panel, is in the data and how you use that data.
The more data you have and the better access to this data you have in real time, then you
can make very intelligent decisions and you can have sophisticated offline processing.
But all the modeling happens after the fact. At its core, this is a very simple business.
You get in a request, look at the data, return ads within ten milliseconds,
and then you see what happens. That's a constant, never-ending cycle. Audience member: [??] [10:06]
Mike Yudin: Well, a success story? Sure, very simple. One of our advertisers, Volvo Cars, started
an ad campaign with us. Their goal was to actually get people into a Volvo dealership
to drive Volvos. So we started a very broad ad campaign. We'd never had Volvos advertised
in our system before. So how do you know if a person who is about to see this ad will
have any interest in driving a Volvo? So what you do is, you look at past history of these
users. You see, has this person searched for things like new car prices, or test drive.
If you see the user with that type of search, in the history in the last week or so, you
can probably know they're in the market for a car, especially if the request comes from
a relevant source, like a car blog or something like that. It doubles your chances. So we
use all this data. We successfully execute it on the campaign, and I think Volvo reported
that the ROI was actually higher than their search engine buy on Google. You can find
this on our website. So there are many stories like this. And this would not be possible
without data. Otherwise, you'd just be spraying and praying as they say. So we don't do that.
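The kind of signal Mike describes -- checking a user's recent search history for in-market terms, with a contextual source doubling the chances -- might be scored like the toy function below. Every term, weight, and threshold here is invented for illustration.

```python
import time

# Hypothetical in-market search terms for the automotive vertical.
INTENT_TERMS = {"new car prices", "test drive", "car dealership"}
WEEK = 7 * 24 * 3600  # "in the history in the last week or so"

def in_market_score(search_history, source_category, now=None):
    """search_history: list of (timestamp, query) pairs.
    Returns a crude 0..1 score for 'in the market for a car'."""
    now = now or time.time()
    # Keep only the queries from roughly the last week.
    recent = [q for ts, q in search_history if now - ts < WEEK]
    score = 0.4 if any(q in INTENT_TERMS for q in recent) else 0.0
    if source_category == "car blog":  # contextual match doubles the signal
        score *= 2
    return min(score, 1.0)

now = time.time()
history = [(now - 2 * 24 * 3600, "test drive"),      # 2 days ago: counts
           (now - 30 * 24 * 3600, "new car prices")]  # 30 days ago: stale
print(in_market_score(history, "car blog", now))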
[11:40] Dag: So, our business is similar to what Mike said. We also do real-time buying.
What makes our setup a little bit different from all of the other places is that we're
not just looking at individual devices, we're also looking at how they're connected to other
devices. And we want to try to use that for targeting, but also for attribution reporting,
so after something happens, so if someone actually goes and buys something, we want
to see which of the devices were involved with this chain that led up to this purchase.
What Aerospike has enabled us to do is not do all these things after the fact. Traditionally
in this space, you often do log shipping, and then you go through these logs afterwards.
You sift through, and try to find patterns and you kind of do an offline batch processing
of these things. What Aerospike has enabled us to do is that we can keep this entire data
set, which we call the device graph, which has data about all the devices we see and
also the connections between them. We can actually query that data in real time. If
someone goes and buys or signs up for Netflix (they're not a customer of ours by the way)
if someone buys a service, we can go into our graph immediately, start with that device
and see which other devices are related to it. And we can pull that in real time, instead
of having to sift through these logs on some big distributed file system like Hadoop, and
then run a heavy job that maybe comes up with a result 24 hours later. We can do
all these things in real time. We can call up the partners, say, a second ago someone
signed up and the cross device impression history of this subset of the graph looks
like this. So basically we have access to our entire data set, and any subset of it, in real-time
response times at all times. Pat: We also at [x+1] have a demand-side platform, which does real-time
bidding. A lot of the stories I'm hearing from my colleagues here, I can echo the same
sentiment. If you can't make a decision based on as much data as possible, in a few milliseconds,
you're pretty much toast in that business. Where [x+1] is a little bit different, we
also do a lot of onsite optimization. What that is, basically as an example, you go to
the FedEx home page, you're going to get a bunch of offers, so we're pretty down in the
marketing funnel; you go to FedEx's website, home page, their offers that they're going
to provide are powered by [x+1], 100%. So these are pretty high value transactions.
The way we do that is typically through a predictive model. We build predictive models
continuously. We have all our processes that run and tweak these models, and basically
these models execute in real time. So one side of the equation for us is to make sure
that we can execute those models, as quickly as possible. And the other side is to make
sure we have that vector of data on that user so we can provide the best offer. Otherwise,
we can't optimize their experience, and we don't get paid. So I would say what's really
helped, and the success story for Aerospike is, we can now onboard anywhere from 5 to
10,000 attributes for a user, put that up in our data store, and slide through those
vectors with our model all day long. We can chain models together, meaning we can execute
a model. If that results in fetching some more data and executing another model, we
can certainly do that well now. Data can come from offline. We can get a file from somewhere
like Acxiom, with 500 attributes, we load it up and in the next few seconds, that person
goes to anywhere on, let's say, Chase's home page, their mobile app, what have you, we
have that data. We can execute that model, we can optimize. That just wasn't possible
to the same scale before we went to Aerospike. [16:05] Moderator/Srini Srinivasan: It's clear
that performing at really high levels is necessary for running real-time advertising businesses.
The other side of the coin is to achieve 100% uptime. With the kind of weather issues
we've been having lately on the East coast, some of our customers have actually dealt
with these issues. I'm just going to go to Mike Yudin, and request him to talk about it.
[laughter] Okay, I'm going to give everybody a chance, but I'll start with you to talk
about how hard it is to achieve 100% up-time, and how it is so important in your business.
Mike Yudin: Okay. Well, thank you, Srini. You have to remind me about the most depressing
week of my life. Moderator/Srini Srinivasan: I'm sorry. Mike Yudin: But it's all good.
We do 100% uptime. We lost one of our data centers in the flood, and it's not just the
data center itself that lost power, it's the entire network infrastructure of the tri-state
area, all the major backbones. The Verizons and Sprints of the world lost connectivity.
And yet we stayed up and didn't lose a bit of data. How did we do this? We do this
by having not only redundant equipment within the data center, but also a globally
load-balanced infrastructure across multiple locations. If one gets flooded, then traffic
just gets shifted to the data center that survives. The trick here of course is to make
sure that your surviving location has all the same data and all the same intelligence as the system
that got destroyed. There are several techniques in this, and one is cross-data center replication
of data. So this is one of the reasons why we chose Aerospike. They have this ability,
so our data centers exchange data between each other through their XDR cross-data center
replication process. That works quite well, and it's fast. If a user is in Chicago, and
they do a search for a new car in Chicago, and it hits the Chicago data center, in less
than a second this information propagates to the New York data center. If the same
user then goes to another website, and a disaster happens, and then his
next request arrives at a different location, all the data is available. Of course you have
to plan, and you have to have a disaster recovery plan in place, and then you have to have a
plan in place for what you do after everything goes back to normal. That's what we're dealing
with today. And you also have to make sure you choose a nice and sunny location for your
disaster recovery office. I spent this... Pat: And then you get earthquakes... [laughter]
Mike Y: where there are no earthquakes. I could have gone to the south of France in
the same amount of time it took me to get to Pittsburgh, Pennsylvania. So that's my
story. [19:37] Srini/Moderator: Any of the other panelists want to add their thoughts
to this? [19:45] Mike Nolet: I think the one thing I'll tell you we do is, first, redundancy
and data replication. We were talking about this before the panel: you have to have multiple
locations. But in advertising, multiple locations alone are not enough; you also have to understand,
within each of the locations, how you're connected to the Internet, how you're connected
to different partners -- and to your point around network infrastructure, many of the ISPs had
major, major flooding inside their hubs. In our facility, we're lucky enough not to lose
power, so we stayed up. But we saw that half of our network providers lost power. We lost
cross connects to Google, to Amazon... Mike Yudin: Well that's what happened to us, we
were up, but all of a sudden our traffic dropped 70% because no requests were coming through.
So how good is that? Mike Nolet: It's true. 111 Eighth Avenue is one of the largest buildings
in Manhattan, and it went down basically for eight hours. And this is actually where we
meet up with Microsoft and Amazon, with Google and all these major companies. Our network
team was actually out for seven days straight, playing Whack-a-mole, routing traffic, trying
to make it work. And then the one last thing that I'd say that we do that actually helps
a lot with the data, we actually -- well, we don't use the Aerospike replication, we
wrote our own replication layer, and what we do is we replicate incremental changes
to all of our user data. One thing we track a lot is how many ads you've seen, right?
And of course behavior. The problem is -- let's say I have your resume, right? And I have
that in your key value store. As I change the resume both in LA and New York, how do
you make sure that both copies of the resume get the exact same change? One way is to send
the whole resume across, but then you end up, actually, if you have conflicting changes,
you can end up losing data in the middle. So what we always say is that we're adding
a line to the resume, at this location, and we might do that multiple times on multiple
records on the same user. Then even if we lose connectivity, or something weird happens,
whenever connectivity comes back we stream those messages back and forth to each other
so that in the end those two copies end up being the same again. I think that's one technique
that was used very successfully to make sure that our user data in the end always ends
up being 100% the same across all of our locations. [21:55] Srini/Moderator: I'd like
to add a follow-up for Brian. For example, AppNexus was our first customer, so our
cross-data center support only showed up last year. So I'm asking Brian: is Aerospike going
to solve these problems? Brian: Yeah, but I wanted to share another customer's story.
Our first deployment of cross-data center replication was with a company many of you probably know:
Exelate. Exelate has three different data pools, including a US data pool and a global data pool,
all of them replicating among four data centers. What happened to them was they lost one of
their New York data centers as well. Everything backed up on the servers in the data centers
feeding New York was fine, and when the data center came back, it ended up re-replicating.
What they did say that was interesting was they actually had to call us, because when
their New York office went down, they had IP-based security into their data center, and
they had abandoned their office. From home, they couldn't actually
get into their data centers to do a graceful shutdown of the servers. So they had to call
our support guys and say, "Hey, Kavin, can you please take down these servers gracefully,
because we just got notice that there's only 30 minutes of fuel left." We were happy to
oblige and help them take down their servers gracefully. That's the kind of thing we do.
Only a few of our customers lost full data centers. As you say, connectivity was really
the issue. In terms of being able to support this at our layer -- what we call delta
shipping, which ships just the updates, which is basically the technique Mike was talking about -- we'll
be having that probably in the next six months or so. It's an important technique, both for
bandwidth reduction as well as for getting the correct data and not losing updates. [21:57]
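The delta-shipping idea the panelists describe -- replicating operations ("append this line to the resume") rather than whole records, so replicas converge even after a partition -- can be sketched as below. This is a toy model, not Aerospike's or AppNexus' implementation; the ops here are commutative set-adds, which sidesteps the conflict-resolution machinery a real system needs.

```python
class Replica:
    """Each data center keeps an op log and replays peers' ops.
    Because the ops are commutative appends (set adds), replay order
    between reconnects doesn't matter and replicas converge."""
    def __init__(self, name):
        self.name = name
        self.data = {}    # user_id -> set of events
        self.oplog = []   # ops originated locally, pending shipment

    def append(self, user_id, event):
        """Record an incremental change, not the whole record."""
        op = (self.name, user_id, event)
        self._apply(op)
        self.oplog.append(op)

    def _apply(self, op):
        _, user_id, event = op
        self.data.setdefault(user_id, set()).add(event)

    def sync_from(self, peer):
        """Called when connectivity returns: replay the peer's ops."""
        for op in peer.oplog:
            self._apply(op)

ny, la = Replica("NY"), Replica("LA")
ny.append("u1", "saw_ad_A")   # written while the cross-DC link is down
la.append("u1", "saw_ad_B")
ny.sync_from(la); la.sync_from(ny)
print(ny.data == la.data)  # both copies end up identical -> True
```

Shipping whole records instead would let concurrent writers overwrite each other's changes, which is exactly the lost-update problem Mike's resume analogy describes.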
Srini/Moderator: Okay. Mike, did you want to say something? Dag: Cross-data center
replication and redundancy are important, but of course you also need to have intra-data
center redundancy, and this is where this product also does a really good job. If you
lose a node, the routing is built into the clients and into the servers, so they will route traffic
to wherever the data is. If you add nodes, you don't have to plan for it; you can just
add nodes and they will automatically start migrating data from the other servers. Data
centers do fail, fortunately they don't fail that frequently. Servers, they fail pretty
frequently, or they don't even fail -- someone just takes them down by mistake. I think that
happens quite a bit as well. We've seen that happen a few times. Fortunately it hasn't
really affected us. [24:47] Srini/Moderator: There was a hand raised in the audience. Is
there a question from the audience? Audience member: I got the answer... Srini/Moderator:
Okay, that's great. Please do... Audience member: I have a question. Srini/Moderator:
Please. Audience member: Can some of you comment on what lessons have been learned from scaling
systems, something that's not obvious? For example, there was something from Mike I believe
[?]. What advice can you give? What did you learn from scaling these systems? [21:51]
Mike N: That's a broad question. I'll give you two answers to that. Specifically, two
lessons learned. I think for us, we found anything that's not simple will fail. And
we're an outlier in terms of throughput and volume, because we're just running on so many
servers, serving so many ads every single second. The simpler the architecture, the
fewer points of failure, hands down is the best. Which is actually why we threw out all
our load balancers -- because at some point we found that load balancers start dying at
600,000 qps. We found that we had more outages due to load balancer issues than we had due
to applications or software issues. We ended up effectively embedding load balancing into
our direct applications, and magically things got a lot better. I think simplicity is just,
hands down, hands down, where you actually want to be. And then the second one is full,
end-to-end automation of everything you do, like your example about manual error, right?
People fail. People naturally fail. They'll fail 80% of the time on the average day. This
is not a problem; this is not a bad person. Your good engineer will make mistakes all
the time. Now, at two in the morning, you make mistakes most of the time. Anything that's
not automated will have production issues. It's really that simple. If you can't point-and-click
deploy, if you can't point-and-click roll back, if you can't point-and-click debug,
you'll simply have production issues, because at two in the morning someone fat-fingers
and types the wrong thing, and "Oops. ***. I did something wrong." And so, you want to
keep it incredibly simple, and then automate the snot out of it, is what our head of tech
ops actually says. You look at where we have production issues, we're not perfect, and
the only place where we have serious issues is where we do not have full automation in
place. Where we can't just pull up a new server, or fail over a data center, or anything like
this. And our stack, it's almost all home grown. We use Aerospike for key value stores,
we use Vertica for our reporting databases, and then the rest of this -- I
don't know if you'd call Hadoop off-the-shelf, because you have to do so much engineering
around Hadoop to make it work. And then everything else is home grown software for us. [27:35]
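Throwing out dedicated load balancers and embedding that logic in the application, as Mike describes, often comes down to the client keeping its own backend list and rotating through healthy hosts. A minimal sketch, with placeholder hostnames and an injected health probe standing in for real health checks:

```python
import itertools

class ClientSideBalancer:
    """Round-robin over backends with a simple health check, so the
    application routes its own traffic instead of relying on a
    hardware load balancer sitting in the request path."""
    def __init__(self, backends, is_healthy):
        self.backends = backends
        self.is_healthy = is_healthy          # injected health probe
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Try each backend at most once per pick.
        for _ in range(len(self.backends)):
            host = next(self._cycle)
            if self.is_healthy(host):
                return host
        raise RuntimeError("no healthy backends")

down = {"bidder2"}  # simulate one unhealthy host
lb = ClientSideBalancer(["bidder1", "bidder2", "bidder3"],
                        is_healthy=lambda h: h not in down)
picks = [lb.pick() for _ in range(4)]
print(picks)
```

This also illustrates the simplicity argument: one fewer appliance in the hot path is one fewer thing that can die at 600,000 qps.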
Mike Yudin: So I could probably add to this. Mike is saying, keep your systems simple,
and that's key to this. The way you keep it simple is you have to be very smart in dividing
your intelligence into online and offline. You do all the heavy lifting -- predictive
modeling, all the crazy algorithms -- offline. What you program into your real-time system is
just really quick lookups. Another piece of advice is to keep your system asynchronous. Because
as soon as you have components depending on other components, depending on other components,
and everything waits for the other thing to respond, and then everything is fine, everything
works just fine, but then one little thing is going to fail and there's going to be a
cascading effect and everything is going to come to a crawl, and you just have an avalanche
of degradation through the system. So you have to have a graceful degradation policy, and
you have to have asynchronicity as much as possible. We just had a discussion right before
we started, about blocking threads, and these kinds of things. That's the key to this. As
far as technology stack, and we all were just discussing this, pretty much anything works.
These days, hardware is powerful. Use a proven platform, whatever it is, C, Java, .Net, they
all work. Probably not a good idea to program your real-time ad server on Ruby on Rails,
but other than that it hasn't been a problem. Audience member: [??] Dag: What's the question?
Audience member: What's beyond Hadoop? [29:20] Mike Yudin: What's beyond Hadoop? I'm not
going to tell you, because we don't use Hadoop. [laughter] We don't use Hadoop that much.
We found it's too slow for us. The processing cycle for data in Hadoop is just... Audience
member: [??] Mike Yudin: Well, the main principle of Hadoop is a distributed, kind of grid
computing system. That's not going to go anywhere. You have to do that. People are trying all
kind of things. We ended up writing our own proprietary system. Whether that's going to
become the Internet standard or not, I don't know. But, probably like some of my colleagues,
we found that very, very few commercial or even open source solutions support this key
element, so we ended up programming a lot of this ourselves. It's kind of tragic, but
that's what it is. [30:10] Dag: I would say, in terms of lessons learned, metrics. Keep
metrics of everything, because scale kind of creeps up on you. You start seeing your
latencies jitter, and you want to correlate it. You would try to figure out what's going
on. If you're building an adtech system, you have a lot of moving parts. You have a lot
of endpoints that get hit. These have different impacts on your system, and if you don't have
metrics, you're pretty much blind. Scale and web traffic shift and grow from month to month.
We're hooked into AppNexus, and they get more traffic all the time, and then one day you all
of a sudden realize, "Oh, crap, we're over." Things slide just a little and then
we're over 100K qps, for instance, and we see that it's this kind of traffic that causes
this kind of ripple. For any sort of debugging, if you have any sort of performance regression,
looking at the metrics is the number one thing we use for debugging. You can't
live debug and step through code in production, and sometimes the only way you can debug some
things is by actually hitting them with real traffic, unless you have unlimited resources.
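The metrics habit Dag describes -- recording latency per endpoint so jitter can be traced to the traffic causing it -- can start as small as the sketch below. The endpoint name, sample values, and the naive in-memory percentile are all illustrative; production systems use streaming histograms instead of storing every sample.

```python
from collections import defaultdict

class LatencyMetrics:
    """Record per-endpoint latencies and report percentiles, so a
    jitter in p99 can be correlated with the endpoint causing it."""
    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, endpoint, millis):
        self.samples[endpoint].append(millis)

    def percentile(self, endpoint, pct):
        # Naive nearest-rank percentile over all recorded samples.
        data = sorted(self.samples[endpoint])
        idx = min(len(data) - 1, int(len(data) * pct / 100))
        return data[idx]

m = LatencyMetrics()
for ms in [2, 3, 2, 4, 50]:   # one slow outlier on the bid endpoint
    m.record("/bid", ms)
print(m.percentile("/bid", 50), m.percentile("/bid", 99))
```

The median looks healthy while the p99 exposes the outlier, which is why percentile metrics, not averages, are what you correlate against traffic changes.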
But metrics is what I would say. And beyond Hadoop, [laughs] a more efficient file storage
on the [??-31:35], I think. That'll do a lot. Having flat files for everything is... Audience
member: From the standpoint that Hadoop is not the answer, I already know that, what's
the future? Mike N: Why do you say Hadoop's not the answer? Audience member: Well, I...
Dag: I don't think -- Hadoop is a lot of things... Srini/Moderator: I think we're getting a little
bit off the topic. This is about real-time big data. Hadoop is not real-time in any way,
shape or form, as far as I understand. Audience member: I understand, a lot of people try
to write plug-ins, and... Srini/Moderator: Yeah, it is a way to -- for example, how Memcache
speeds up MySQL is the same philosophy there. But then things like Aerospike, and a whole
bunch of other NoSQL databases, they try to actually do a database at the speed of a cache.
So beyond Hadoop, you know? If you ask me, I'm biased. I'd say Aerospike. [laughter]
[32:30] Mike N: I'd love to make two points. One, you only have a certain amount of tools
in your toolbox, right? Hadoop is one of those tools. And if you try to screw a screw with
a hammer, it doesn't work very well. Just like if you try to use a screwdriver to hammer
in a nail, it doesn't work very well. Hadoop is the right tool for some things, Aerospike
is a fantastic tool for some things, and Vertica's a fantastic tool for other things. The problem
people make with all of these, including key value stores, is that they try to smoosh in too
much functionality, make it too general purpose, and try to make this super-fancy multi-tool
that does everything. And it turns out to be pretty mediocre at everything. When I get
pitches from vendors, a lot of times if they sell me too much -- there are a lot of these
people working on distributed real-time MySQL systems. One of these guys came in, and pitched
to us that "We can be your key value store." I actually rejected it outright, simply because
I don't want a key value store that does SQL and joins. It means it's a multi-tool, which
you just know is going to have some kind of complicated performance issues. It's just
too complex for what I want. I hope that Brian and Srini don't try to turn Aerospike into a
multi-tool, and that they understand what they're really, really good at, which is serving
key value, NoSQL-based data at incredibly low latency, incredibly fast. So I think
that's the best answer to your question, that Hadoop is a tool that's really good at some
things. Aerospike -- there are new tools coming out that are really, really, really exciting.
And Aerospike is one of them. I'm also very excited about stream-based processing, which
I think we're going to start seeing more of. Which could be -- I don't know if you guys
are talking about some of this -- new products or things like that. That's what I think is
going to get really, really exciting. Audience member: [??] Mike N: No, no. Audience member:
One of the stacks [??]. [34:30] Mike Y: Well, I would like to add to this a little bit.
There are more and more companies you see at tradeshows like this, that are always on
the cutting edge, and they have the most sophisticated algorithms, the most amazing models ever. Hadoop
is just not good enough for them. Anything is not good enough for them, because they
process so much data, they have to do the next coolest thing. The truth of the matter
is that there are very few actual working models and intelligence in this advertising
world. A few things really work. If you're trying to solve a really complex problem,
beyond the capabilities of the standard stacks and proven technologies, I'm going to bet
you a hundred bucks you're probably not going to solve the right problem. [35:17] Mike N:
Can I add, and then I'm going to totally let you take over. I think what you just said
is exactly true. What happens is, doing something at low scale -- like what you hear from all
the CTOs, we're saying, scale, scale, scale, scale, scale kills -- because it's very easy
to build an online advertising product at low scale. It's very easy to build a super
snazzy, dynamic, creative, it's interactive, it talks to you, it uses your web cam, you
can get real-time reporting on the back end -- if you're only serving a couple thousand
ads a day, no problem. I've got an engineer who could turn that around for you in a month.
The problem is when you start doing it a million times a day, and a billion times a day, and
ten billion times a day, and 40 billion times a day. That's when all those features and
functionalities that you have in that really cool product break. And one of the problems
with innovation in all advertising is that people don't think about scale. People raise
VC money, build a prototype that does a lot of really cool stuff, they say they do it
at scale but it really doesn't do it at scale, then they hit scale, and then the *** breaks,
and then you have to rebuild everything. And there just aren't enough commercial tools
out there to make these problems go away. So you suddenly end up needing 40 engineers
to make all this actually work. [36:26] Pat: Use the right tool for the right job, as Mike
got to, and be willing to iterate and explore before you get into production. We have a
lot of complexity, I would say, in our system, but we've approached it as splitting that
problem up into as many discrete parts as possible. Again, echoing the sentiment, it's
got to be testable. It's got to be modular. If you want to stream it, you want to communicate,
use something like ZeroMQ. There's a lot of queuing out there. There are ways to communicate among
these different components, and that way you can test them. We are very, very diligent
about metrics, as Dag mentioned. If you write code in our system or for our system, that
thing better log and let the world know what the hell is going on with that component,
pretty much at every step of the way, if it's interrogated. Because if we can't -- again,
echoing the same sentiment -- if we can't look and see exactly what's happening inside
any of these components, we're screwed. And you're flying blind. And then we can start
testing them, we can run through any regressions we need to, and we can see where the bottleneck
is going to be. You'll never catch all of them; sometimes you don't catch any of them.
Hey, you know what? We never thought someone was going to do twenty-five placements on
one page with 50 models. Sorry. Okay, we didn't. Bad on us. But you can catch a lot of that
stuff, or at least you can pinpoint it. Otherwise, again, I don't see how anyone would even want
to go to work in the morning without knowing how everything's operating. [38:09] Brian:
So I'm going to add to your question, you said something that's non-standard about scaling.
One thing I'm happy we did at Aerospike early on, is that most cluster databases -- for
example, Oracle RAC and many other of the support-based systems -- charge you per node.
And what happens then is, every single time, the ops guys, who have a feeling for
the number of nodes they want in order to have the resiliency they want, get crowded
out by business guys and it ends up being some long, involved conversation about, "Well,
do I really need a four-node license," "Why do I have to buy a six-node license," all
that stuff. So one thing we did to make all these guys more successful, and the whole
product more robust, was our business model. Plenty of technology, and all of the logging
and stuff like that, but I wanted to say, "Hey look, let's have your ops guys really
figure out what they want in terms of reliability and the number of copies of the data they hold; we'll
decouple that from the license terms, and as you start iterating, seeing your load go
up, and you need to add more servers -- great. You're not calling us. We want you to do that, and
feel comfortable with the amount of hardware you have without having to start thinking
about license terms. Because then people's heads explode. "Where's my budget? I didn't
ask for enough budget. I have to justify more." Stuff like that. So think about the impact
of your business model with your scalability. [39:32] Srini/Moderator: Is there any other
question from the audience? Okay. Audience member: [inaudible 00:39:35-00:40:50] When
you scale, things break [?] [40:51] Dag: That's a hard question. Of course it is hard to design
scalable systems from scratch. First of all, I think it's a mistake to always, always ask
"how do I scale this?" up front. If you're AppNexus, you need to scale to a certain level. If
you're us, we're not at that scale, but we're still at a high scale. But if everything you
build is always constrained by scale up front, then you will lose momentum in your innovation.
It's okay to build prototypes that don't scale so well, and if they actually turn out to
have value to your business, then you can go back to the drawing board and see how you
can make them more scalable. There are a lot of things that are easy to scale. And then
there are some things that are really hard to scale. Doing everything right from the
beginning, unless you've built an exactly identical system before, I think is impossible.
Engineers who have built similar systems, or who are interested in staying up to date and
know about the different tools -- that's one of the things we look for. NoSQL is sort
of the umbrella term for everything that is not a relational database, pretty much. It is
a lot of things, if we talk about just storage. Finding the right tools and thinking about
that up front can be useful, but it can be retrofitted too, to some degree. I mean, Twitter,
when they moved off their Ruby stack onto the JVM -- that's gotten a little bit of attention
in the last couple of days because of the election, and they credited that
switch as the reason they were able to cope with the spike in traffic
a couple of days ago. At any rate, the point is that if they had started out thinking about
services and decoupling and building the system the way that they do now, they would probably
have never gotten to the market at all. I agree with what Mike said, about it's easy
to build something that's really fancy and works with a thousand users or a million impressions
a day, maybe, but if you go into building a new product only thinking about what's
going to happen when I have a gazillion users, then you're not going to make a lot of innovation,
basically. Audience member: So in a way, I guess, it's kind of like: if you're doing well
and building scale, most likely something's going
to break anyway. [43:21]Pat: Be prepared to make a lot of mistakes. Don't be afraid of
that, that's for sure. Audience member: Anybody here not making any mistakes? Mike Y: Yeah,
sure. But I think also the question was, how do you teach developers to develop at scale?
Audience member: How do you get yourself to what you do, [??]. And one thing that I like
is, I actually like to do very well. The problem is, I don't think I [inaudible 00:43:53].
So... [43:54] Mike Y: I think that the answer is, you have to know, really, a few basic,
basic principles. And if you follow these principles, you can reach decent scale. And
these principles are: you don't make calls to SQL databases in real time. You don't build
synchronously coupled components. You have to have metrics and you have to have instrumentation
so you can see what's going on. There are certain anti-patterns that you don't do. Okay?
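Mike Y's first principle -- keep synchronous SQL calls off the real-time path -- can be sketched in a few lines. This is an illustrative Python sketch, not any panelist's actual system: the request path reads only an in-process cache, and a refresh step (which in production would run asynchronously, on a timer or a change feed) is the only code that touches the slow store, here faked as a plain dict.

```python
# Stand-in for a SQL database; the hot path must never query this directly.
slow_store = {"campaign:1": {"bid": 0.50}, "campaign:2": {"bid": 1.25}}

cache = {}  # in-process copy read by the request path

def refresh_cache():
    """Runs off the hot path (e.g. on a timer); the only reader of slow_store."""
    cache.update({k: dict(v) for k, v in slow_store.items()})

def handle_bid_request(campaign_id):
    """Hot path: a dictionary lookup only, no synchronous database call."""
    record = cache.get(campaign_id)
    return record["bid"] if record else None  # missing data degrades, never blocks

refresh_cache()
assert handle_bid_request("campaign:1") == 0.5
```

The point is the shape, not the dict: the hot path never waits on the database, so a slow or failing database degrades data freshness instead of request latency.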
And you will encounter each of them. If you're doing this for the first time ever, you'll
have a lot of failures. But you learn. Srini/Moderator: Sorry to interrupt here, but this discussion
is complex, as you are learning. Every one of us here has learned from our experience
as to how to deal with this. It is possible, I can guarantee you. But it is extremely hard.
So I think we should take some of this offline, unless -- Okay, is there any other question
form the audience? I want to give the audience priority before I have a whole list here.
Okay. [45:10] The next question, I want to ask a specific question to the panel. Can
you talk about an actual problem that you had to solve over the last year, in this real-time
computing space? And what product or technology did you use -- it could be something like a key value
store, it could be your own home-grown thing, but it must be in production. Let's start
with Pat. Pat: Does it have to be Aerospike? Srini/Moderator: No, of course not. Pat: We
had to solve a problem of cross-data center replication, and even when we have Aerospike
now it's not quite what we need, but even before we had it we had the same problem,
or a very similar problem, that Mike N. described before. That is, we could step on a record,
we need to basically journal our changes. If something about a user changes, that needs
to be journaled, that needs to be shipped across to another data center, and applied.
We can't just do a wholesale replace of our objects across all of our data centers. So
that is one case where we used a home-grown solution, again using ZeroMQ, which is a
very simple lockless queuing system and we were able to journal our transactions and
ship them across to all of our data centers, update them and keep them I would say eventually
consistent without any major headaches. There are other queuing products out there that
are commercial or open source, [inaudible 00:47:00] I would call more of a tool kit
or a framework. But again, you need to understand what your replication needs and requirements
are before you make any of those decisions. If it's okay to blow away an object on the
other side because it's out of date, then you can just fire and forget, as they say.
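Pat's journaling approach can be illustrated with a toy sketch. This is a guess at the shape, not Tapad's code: a plain Python list stands in for the ZeroMQ pipe between data centers, and each update journals the changed field (the delta) rather than replacing the whole object, so writers in different data centers don't step on each other's records.

```python
# Local and remote copies of a user record, one per data center.
local = {"user:42": {"segments": ["auto"], "impressions": 3}}
remote = {"user:42": {"segments": ["auto"], "impressions": 3}}

journal = []  # stand-in for the ZeroMQ pipe shipping changes between data centers

def apply_change(store, key, field, value):
    store.setdefault(key, {})[field] = value

def local_update(key, field, value):
    apply_change(local, key, field, value)
    journal.append((key, field, value))  # journal the delta, not the whole object

def replicate():
    """Drain the journal and replay each delta, in order, on the remote side."""
    while journal:
        apply_change(remote, *journal.pop(0))

local_update("user:42", "impressions", 4)
local_update("user:42", "segments", ["auto", "travel"])
replicate()
assert local == remote  # eventually consistent once the journal drains
```

A wholesale object replace would instead overwrite any field the remote side had changed in the meantime, which is exactly the "stepping on a record" problem described above.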
But that's probably the latest thing we've solved. [47:23] Mike Y: Well, a very significant
problem that we've tried to solve -- we haven't solved it yet -- is how do you combine all
this real-time data with analytical data? So we have this real-time data stored in Aerospike.
It's fantastic for really fast key value lookups. And it works. But then, imagine that you would
like to have access to all the same data and look at this not one user at a time, not one
object at a time, but find patterns and cross-dissect it. Make it available to all your dashboards
and UI tools and algorithms and look at this data set as a whole, not as each individual
key value. And keep all of this data in synch. That's a very difficult problem. So far, we've
found some hacked-up solutions to this, basically. Back the whole thing up, import it into
a more friendly queryable system -- be it Hadoop, be it Vertica or something like this -- join
it with the rest of your data, do operations on it, because you cannot do that in
a real-time data engine. Srini/Moderator: We're getting close to the end of our time,
so let's keep the answers brief. Dag? [48:44] Dag: I would say that making the data feeds
into actual feeds that are consumable not just by batch inserts into databases or by sorting
into file systems, but actually making them something you can tap into, so you can easily
extend your system into new types of analytical processes. So if you want to try, as Mike
mentioned, stream processing, which is something that we've started tapping into over the last
few months actually, how do you plug something into a system if it's already based on shipping
log files, or if it's something that is happening internally in machines somewhere. You want
to kind of democratize your data by making it accessible to new consumers of that data.
There are a few interesting big data queuing systems coming up now, we opted for one of
those. We run with Kafka now. That was a big shift for us. It also made our system more
resilient; the asynchrony is easier to handle, and we can now more easily handle failures in our data
stores and so on if that happens. That's the problem that we've been solving recently.
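The "tap into the feed" idea Dag describes is essentially what a Kafka-style log gives you. As a hedged sketch (a Python list plays the role of the append-only log; Kafka itself adds partitioning, durability, and retention), each consumer tracks its own offset, so a new analytical process can be plugged in later without touching the producers:

```python
log = []  # append-only event feed, the Kafka-like abstraction

def publish(event):
    log.append(event)

class Consumer:
    """Each consumer keeps its own offset, so new analytical processes
    can be plugged in later without changing the producers."""
    def __init__(self):
        self.offset = 0

    def poll(self):
        events = log[self.offset:]
        self.offset = len(log)
        return events

publish({"type": "impression", "user": "u1"})
publish({"type": "click", "user": "u1"})

batch_loader = Consumer()    # e.g. feeds the warehouse
stream_counter = Consumer()  # e.g. a newly added stream-processing job

assert len(batch_loader.poll()) == 2
publish({"type": "impression", "user": "u2"})
assert len(stream_counter.poll()) == 3  # a new consumer replays the full feed
assert len(batch_loader.poll()) == 1    # only the events since its last poll
```

Contrast this with shipping log files: there, adding a consumer means changing every producer's shipping logic, whereas here a consumer just starts reading from the log.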
[50:00] Mike N: That's funny. I love your answer. So we did two things recently, changing
how we stream data. We wrote our own data streaming a couple of years ago. Every impression
generates 10 to 15 log records, we've got 6 million log records a second or so that
we're dealing with. So we built a streaming infrastructure for that. And we added the
ability to start splicing data. We can now take our stream of data, we can splice it,
and we actually now have started streaming data to multiple places. It goes to Hadoop,
our standard hammer, to do aggregation. We've now built a very highly optimized -- it's
not quite streaming, but every two minutes we load into an HBase infrastructure where
we keep an offline copy of our key value data, so our guys can view every data record that
we have in Aerospike; we're also re-replicating inside HBase, which we can now use to do offline
attribute conversion, all sorts of really, really, really exciting offline stuff. We're
going to get to the point where it can do true, cross-channel attribution for any one
of our partners. And the win has really been doing stream-based processing, or stream-based
logging, and especially being able to splice that data into different places. The third thing,
we have a prototype a guy just put together with VoltDB, where for several of our clients
they stream the data into VoltDB in real time, and they have a prototype of real-time
reporting, which I actually don't know how useful it is, because in general you need
several hours of data before the data gets interesting. It's really cool and it actually works. What's
interesting is that there's just no open source or commercial tools yet that do any of this.
So this is a lot of the challenge we face, that you have to build this stuff from scratch.
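The splicing Mike describes -- one firehose, routed to several destinations -- can be sketched as a list of predicate/sink pairs, where one record may match several sinks. This is purely illustrative (the sink names and the "acme" client are made up; in production the sinks would be Hadoop, HBase, and VoltDB loaders rather than Python lists):

```python
# Sinks are (predicate, destination) pairs; one record can match several.
hadoop, hbase, voltdb = [], [], []
splices = [
    (lambda r: True, hadoop),                      # everything goes to aggregation
    (lambda r: "user" in r, hbase),                # keyed records for offline lookup
    (lambda r: r.get("client") == "acme", voltdb)  # one client's real-time feed
]

def stream(record):
    """Fan one log record out to every sink whose predicate matches."""
    for matches, sink in splices:
        if matches(record):
            sink.append(record)

stream({"user": "u1", "client": "acme", "event": "imp"})
stream({"event": "heartbeat"})

assert len(hadoop) == 2 and len(hbase) == 1 and len(voltdb) == 1
```

Adding a new destination is then just appending another pair to `splices`; the producers of the stream never change.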
And what's funny, five years from now, I'm sure there will be an open source tool. It's
a fantastic commercial opportunity for somebody, that'll just make it easier for the next guy.
[51:54] Pat: And I'll just add to Mike, we use our replication journals to feed a database
as you are talking about, Mike. That's how we do it. So when someone's state changes,
if you will, they saw an impression, they moved segments, since we journaled that to
ship it across data centers, we can also ship that to a Vertica database or something like
that, and get an up-to-date view of the customer. Srini/Moderator: Okay, we have a couple of
minutes left, so this is the last question from the audience. Audience member: Mike,
do you have any.. [52:21] Mike N: No, because it's, for us, very proprietary, in the sense that
when it becomes commercially available, sure, it might be good for us. At the moment, it
gives us a huge advantage that it's a generic infrastructure, any developer at AppNexus
who wants to get data from any of our data centers, from any server, into one location
just types a config, streams some data, and it magically appears in whatever data source
they want. It really gives our developers just a fantastic set of tools to run. I'm
not sure we just want to give that to everyone else just yet. We are working on another couple
of open source projects, around dev ops systems and continuous deployment, where we feel
we've benefited a ton from the open source community, so we're pushing back and sharing
some of the things we've built on top of Puppet and things like that. [53:13] Srini/Moderator:
Okay. Let me ask a final question, it's not as technical. Who's the one person who has
unexpectedly helped in your career and your business in the last couple of years? It could
be your mother, it could be anybody. But it's somebody who has affected the actual business,
and career. Dag? Dag: Well, thank you. Srini/Moderator: I have an answer from me, if you want to ask
me the question. Dag: Can we hear your answer first? [53:40] Moderator: Well, it's an advisor
for our company, an IBM fellow; I worked for him 25 years ago. And then he joined as an
advisor, and then he's been so key. His name is Don Haderle, he is known as the father
of DB2 actually. And he's a completely relational database person. We're doing NoSQL, new database.
But he's kind of taken us through, taught us, Doug, Brian and I, what exactly we were
doing when even we didn't understand what it was. And, has been a great help in explaining
this to other people, including investors and customers -- Fortune 500 customers and
so on. That was completely unexpected, what happened. That's the kind of thing. [54:28]
Mike N: I don't have one person but I have one group, which has been the New York CTO
Club. I don't know if anyone else here is a member of this club, but it's just an absolutely
fantastic group to be part of. Sounds like a recruiting pool, it turns out. But really,
I've been very lucky to have access to such a fantastic group of people, who have really
helped me be successful as a CTO, and have helped AppNexus scale, both by just advising
and helping, also, hiring. Which is good. [55:20] Mike Y: Well, so it was a very unexpected
question, so my answer to this question will be someone who's completely non-technical.
That's the CEO of our company, Mr. James Hill. He really taught me how to simplify and crystallize
things. He's very good at asking questions. What is it? Why are we doing this? How are
we doing this? Explaining things to someone who is not in advertising is an extremely interesting way of trying to cut to the core of what you're
doing as a company. Because the stuff that we deal with in the ad tech space, all the
buzzwords that we have, all the fancy words to your point about the models, about predictions
and big data and all that stuff -- if you can't cut through all that and explain it to
someone you meet at a random party, then you -- well, it's a very useful exercise, I would
say. Pat: I don't have one person either, I would -- it's going to sound corny, but I would
say my team, who's helped and who's built all this stuff, and who operate everything
every day. These are the guys, as Mike has seen, who drove to Pittsburgh. Sometimes you
have to put in a heck of a lot of time, and I don't care how bright of an idea you have,
how smart you are, how great you are, if you don't have a good team who's willing to do
basically whatever it takes, you're not going to be successful, in my opinion. Srini/Moderator:
Okay. Well, Brian, did you want to...? Brian: I was very surprised by our investors, our VC
group. Usually as an entrepreneur you think, you know what? You say nice things to your
VCs, but you take the money and you say "Thank you very much" and you use the money -- that's
their primary contribution. And anything they give you on top of that is gravy. But our
first round investor lead, Joe Addiego of Alsop-Louie Partners, has actually been an
immense help to the company. I think one of the benefits is he's very new to the VC game,
so he's not as blasé. And he has a great operational background, and a great sales background.
So in terms of helping us through the thicket of hiring sales people, especially who can
be very persuasive in person -- you have to figure out who's good, back to your management
question. So I want to put a shout out to him. He's been a great help through this.
Srini/Moderator: Thank you. I think that brings us to the conclusion of this panel. I'd like
to thank each and every one of the panel members for making time out of their busy schedules.
They're all running 24 by 7 systems, their teams are running them, and if it's down for
five minutes they don't make money. They're here because they've actually figured out
how to solve this problem. So thank you very much, and also for a great experience here.
Thanks to the audience for the questions.