Ifeelgoods Data Architecture

PETE: Okay, so you're a Rails shop, you're all in the cloud, you're using a lot of different Amazon services, and you mentioned that you're using a lot of different data components or data services. For those in our audience who don't know, tell us a little bit about the difference between the AWS data options.

VIDA: Right. For a lot of our data we use regular SQL-type storage because it's quick to access, we can query across it, things like that. So, you know, we have a concept of a retailer with their campaigns and the offers that they are trying to give out to their customers. We use MySQL, or the SQL data store, to store all that kind of information. But then we also have a really large number of user objects. So we have to deal with scaling: we only have a certain number of clients that are giving out offers, fairly small, but the number of consumers redeeming these offers is orders of magnitude larger. So we have a lot more objects in that space. And it needs to be a global data store. We want to have just one database of users across the whole world rather than partitioning some of our enterprise clients out onto different servers and things like that. So for our user store, we're actually using SimpleDB to store some of the fast user data. And then we're also using offline storage, because part of our process is that a user connects with us through Facebook and gives us their data, such as Likes, that we can use to predict which offers they'd be interested in from us. We're also just trying to understand the shopping habits of these users according to their social data. So that data is actually quite large. We could use a blob store in SQL to store it, but we don't need it to be served up. It's purely for offline processing, so we actually write it out to S3.

PETE: Okay. Can you give me a quick use case on both sides of that fence? Like, what's an example of the type of data you're using in real time, and what's an example of the type of data you're interested in more in batch mode?

VIDA: In real time, we maybe just want to know how many friends you have and how many likes you have in aggregate, so we just store an integer for each: how many friends and how many likes you have. In offline mode, a lot of our retailers are interested in what the top likes of our users are. So we have to actually go in and store data for individual users, even down to the granularity of what's most popular in the retail category versus the music category.
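
The split Vida describes, small per-user counts for real-time use and per-category top likes computed offline, can be sketched in Python as follows. This is only an illustration under stated assumptions, not Ifeelgoods' actual code: the record layout, the field names, the sample data, and the functions `realtime_summary` and `top_likes_by_category` are all hypothetical, and the sketch uses in-memory data rather than SimpleDB or S3.

```python
from collections import Counter, defaultdict

# Hypothetical raw social data, as it might be read back from offline storage.
# Field names and values are invented for illustration only.
raw_profiles = [
    {"user_id": "u1", "friends": ["u2", "u3"],
     "likes": [("retail", "Acme Shoes"), ("music", "Band A")]},
    {"user_id": "u2", "friends": ["u1"],
     "likes": [("retail", "Acme Shoes"), ("music", "Band B")]},
    {"user_id": "u3", "friends": ["u1"],
     "likes": [("retail", "Corner Books"), ("music", "Band A")]},
]


def realtime_summary(profile):
    """Aggregate counts only: the small record a fast user store would hold."""
    return {
        "user_id": profile["user_id"],
        "friend_count": len(profile["friends"]),
        "like_count": len(profile["likes"]),
    }


def top_likes_by_category(profiles, n=1):
    """Batch step: count each liked item per category across all users."""
    counts = defaultdict(Counter)
    for profile in profiles:
        for category, item in profile["likes"]:
            counts[category][item] += 1
    # Keep the n most frequently liked items in each category.
    return {category: counter.most_common(n)
            for category, counter in counts.items()}


summaries = [realtime_summary(p) for p in raw_profiles]
top_likes = top_likes_by_category(raw_profiles)
```

The real-time record stays at two integers per user no matter how many likes come in, while the batch step needs every individual like, which is why the raw data is the part that grows large and suits offline storage.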