Hi. My name is Jamie Kinney,
and I'm a solution architect for Amazon Web Services.
I'm part of the Global Public Sector Organization here at AWS,
and my focus is on high performance computing
and Big Data applications in the Cloud.
The organizations that I typically work with are
research institutions, Department of Energy labs,
and space agencies, including NASA but also others around the world,
and I really have the good fortune of being able to talk to
many folks like yourself. So I want to start by saying
thank you for taking the time to attend this session.
What I'd like to do over the next hour and fifteen minutes
is provide an overview of how scientists are using
the Amazon Web Services Cloud, talk a bit about
the capabilities of Amazon Web Services that are
relevant to the scientific community, provide a few examples
of how researchers are using the Cloud today,
and leave some time for questions at the end of the session.
[pause]
To kick things off, I'd like to talk
about why we're seeing a tremendous influx
of researchers beginning to use the Cloud.
It was actually Kate Keahey who
first gave me the idea of this concept of time to science.
Today there's a challenge: we have large
supercomputing centers distributed around the world,
built with tremendous capacity, fantastic interconnects,
and queueing systems that enable researchers to
define a job that describes what they'd like to accomplish,
submit the job, and then some time later get the results back.
And while this approach works well for many, many workloads,
it's very common these days to find supercomputing facilities
that are running low on capacity or unable to deliver
the specific type of cluster that's appropriate for a given workload.
For example, many supercomputing facilities will have
a predefined percentage of servers that have
standard CPUs but no GPUs within them,
or might be built on a certain processor architecture
that may or may not include features like the AVX extensions
found in Intel's Sandy Bridge processors.
And so as a result, applications end up being developed
to run on a least-common-denominator platform.
The other thing that happens is that not all jobs are equal.
Many jobs will require large numbers of servers,
or have a priority that requires them to be completed within a certain window,
and with shared resources that always results in queueing.
And so the impact is that researchers
may not be able to ask all of the questions that they'd like to.
They might have to wait longer than anticipated
to model the latest outbreak of H1N1. They may not be able
to analyze all of the datasets
that they have access to related to sea surface temperature
and increasing levels of carbon dioxide in the air column.
Or they might not be able to scan as much data
coming from infrared satellites and space telescopes,
and thus won't find as many extrasolar planets.
And so what we want to do at AWS, Amazon Web Services,
is provide researchers with the tools that they need
to deliver high performance computing clusters
whenever needed, with exactly the configuration
that's needed for the given job, at the lowest possible cost.
And so the first element that's really important for
the Cloud approach to high performance computing
is the ability to have on-demand access to the infrastructure.
We have Amazon Web Services data centers
located all over the world. I'll talk a bit about
those precise locations, but anybody can show up
and, literally within a few minutes, create an account
and provision a large multi-teraflop supercomputer
on AWS, built on our commodity infrastructure.
Secondly, scientific workloads not only require large amounts
of compute, but typically involve very large datasets.
So it's important to be able to quickly and easily
move datasets back and forth, and Amazon's network capacity
helps us out, as do the file transfer tools that we make available
and those that others have developed on top of our platform.
But it's also important to be able to
reduce the cost of long-term dataset storage.
If you're keeping terabytes or even petabytes of data online,
you need to be able to store that in a way
that is in line with the value that's derived
from having those datasets stored online. And so
one of the benefits that we see is the ability to very easily
share a common dataset with others, with the Cloud becoming
a meet-me room, if you will, allowing many researchers
to access the same dataset instead of having to copy it
to twenty or a hundred different locations, with everybody
working from their own cache of that dataset.
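As a concrete sketch of that meet-me-room idea: with the AWS CLI, any authorized researcher can read directly from a shared Amazon S3 bucket rather than maintaining a private copy. The bucket and object names below are hypothetical, and the `run` wrapper defaults to printing each command instead of executing it, so the sketch can be read and run without an AWS account.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) echoes each AWS CLI command instead of
# executing it; set DRY_RUN=0 with real bucket names and credentials.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Browse the shared dataset in place (bucket name is hypothetical)
run aws s3 ls s3://example-shared-climate-data/sst/

# Pull down only the one object this analysis needs, rather than
# replicating the whole dataset to a local cache
run aws s3 cp s3://example-shared-climate-data/sst/2013-06.nc ./2013-06.nc
```

The point is that twenty research groups can all issue reads against the same bucket, so there is one authoritative copy of the data instead of twenty diverging caches.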
And finally, there's the reproducibility of results that
programmable infrastructure makes possible; I'll talk about that a little bit.
All of the Amazon Web Services capabilities, from the
Elastic Compute Cloud for server virtualization to
storage APIs and dynamic Hadoop clusters,
are available not only through a web console,
which I'll show you over the course of the next hour or so,
but also through command-line tools and web service APIs.
So literally with a click of a mouse, or with a shell script
that could even be scheduled and further automated,
you can provision a cluster that precisely meets your needs,
submit your job to that cluster,
generate a result set that's stored in the Cloud,
potentially transfer it elsewhere outside of the Cloud,
and then turn off that infrastructure.
And you'll only be using what you need while you need it,
and you don't have to go through many, many manual steps
of actually configuring that cluster. So that
programmable infrastructure opens up a world of possibilities.
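That provision, run, and tear-down lifecycle can be sketched with the AWS CLI. The AMI ID, key name, instance IDs, and bucket below are placeholder values, and the `run` wrapper defaults to printing each command rather than executing it, so the sequence can be inspected without an AWS account.

```shell
#!/bin/sh
# Sketch of the cluster lifecycle described above.
# DRY_RUN=1 (the default) prints each AWS CLI command instead of
# running it; set DRY_RUN=0 with real IDs and credentials to execute.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# 1. Provision compute nodes sized exactly for this job
#    (ami-12345678 and my-key are placeholder values)
run aws ec2 run-instances --image-id ami-12345678 \
    --count 8 --instance-type c3.8xlarge --key-name my-key

# 2. ...run the job on the cluster, then store the results in S3...
run aws s3 cp results.tar.gz s3://example-results-bucket/run-001/

# 3. Turn the infrastructure off, so you pay only for what you used
run aws ec2 terminate-instances --instance-ids i-0abc1234
```

Because each step is just a command, the whole lifecycle can live in one script that a scheduler kicks off, which is exactly the automation being described.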