Hi, I'm Roger Magoulas, the research director at O'Reilly.
I just got back from the Strata Conference in New York,
and as a practitioner,
and as someone who follows technology trends, particularly in data,
I noticed a couple of themes that really resonated with my experience and
my interests in the data space.
Having said that,
there were sessions I wasn't able to go to, and lots of other things people
covered. The themes I picked up from the conference and its attendees are ones
that would be useful for anyone thinking about how to turn data into an asset.
The first one is using in-memory storage to help speed up queries,
so-called interactive query or real-time query.
What we mean by that is that the query responses come back fast enough for an
analyst to keep
thinking about the problem they're trying to solve.
Anywhere from a couple of seconds to a couple of minutes is a really good speed
for that kind of thing, for getting
the most effective results.
We've found that some of the
really early-adopter types and innovators in the space were
doing this a little while ago. They would use things like Redis and other
in-memory tools to store data
once they got to a certain part of the problem.
The thing that really popped out as most memorable in this space at the
conference was Impala,
Cloudera's new in-memory addition to Hadoop.
People are also talking about
Druid, something that Metamarkets and Mike Driscoll
presented.
There are also Dremel and Drill, two other in-memory tools. Dremel comes
from Google; Drill is an open source project.
Spark, out of Berkeley, is another really nice architecture
for distributed
work with fast queries.
Even Netflix, in their talk, covered things they were doing. It shows how
prevalent this trend is
that they were doing some things in memory.
The next point I want to make is that
SQL continues to matter. For the analyst, having that high-level
abstraction layer over the data
is really important.
There has been too big an investment in
the tools
and in people's
skills and experience in the SQL space
for it not to matter.
The data needs to be put into analyzable forms.
We see things like Hadoop having a SQL-like look:
Hive being used much more often than Pig, which is more of a programmatic
interface, and more than the MapReduce version at the heart
of Hadoop.
Productivity really matters, and I think we're going to continue to see this kind
of layering of
SQL for that kind of final
drilling into the data,
and SQL-like tools like that.
The next thing is that 80 percent
of the work
of analysis is in prep.
When you get the data, you quite often need to clean it up and transform it. You need to
standardize things into their
standard forms,
normalize,
and so forth, to make the data
worth analyzing, organizing it in ways that let you go up and down into the data
and so forth.
It's probably the best productivity opportunity for companies, to improve
their ability to work on this.
Joe Hellerstein covered this in his "Rocket Ships and Washing
Machines" keynote in an interesting fashion.
Another thing is that
effective analysis really depends
a lot on designing
the experiment,
on asking the right question.
At these conferences there's a lot of talk about tools and techniques, because
those are the things that cross
organizational boundaries.
But for
the data scientists themselves,
this is where the focus needs to be.
Those other things, the techniques and tools, really should be secondary.
This is where the art part of data comes in.
That kind of relates to the whole education thing. There's a lot of talk about
whether there are going to be enough data scientists for the future,
and how do we grow them.
I think this is where
it's going to be hard, because you can learn the tools and techniques, but
having the domain expertise to build those questions, and knowing which techniques to
apply
as part of making an experiment work, is going to continue to be important.
I think this is why conferences like Strata end up being an important asset for folks,
to see how other people are doing this work.
Looking at the fourth theme,
it's really all around productivity. I think what we're seeing is that
the data space, the data ecosystem, is maturing.
Before, it was
finding out what you can do. Now it's: OK, we can do it, so how do we do it more
quickly? How do we make these folks more productive,
get better and more
accurate
predictions and results? It's also about working on the culture, fitting the work
into how a company works with data,
and how to make
the analysis and the results
have an impact
on everything else in the company.
To support these themes, I want to bring up a couple of talks that I think
get at what I'm trying to say. If you get a
chance to see the videos or look at the slides online, you'll get a
sense of what I was talking about with those four themes. The first is "Big
Answers" by Mike Olson. This is about the Impala
addition to Hadoop,
and he was talking about
how, when analysis is that quick,
the productivity for the analyst can be that much better.
I mentioned Joe Hellerstein's "Rocket Ships and Washing Machines".
He talked about how washing machines
probably added more productivity than rocket ships. This relates back to
80 percent of the work being in data prep.
"Beyond Hadoop:
Fast Ad-Hoc Queries" is where Druid was discussed.
There you can hear not only a discussion of the software,
but the motivations for why they wanted to get in-memory databases
and speed up their work. In their case, they actually have
data as part of their product.
The Netflix evolving data science architecture talk by Kurt Brown covered a really
great architecture. Netflix has a kind of state-of-the-art architecture.
He described how they moved from
where they started with their tools
and moved to
where they are now, doing a lot of real-time
analysis and trying to support their users and be more efficient with
their equipment, their server farms, and so forth.
I talked about how important it was to ask the right questions and design the
right experiments.
Mike Stringer from Datascope Analytics gave a really nice talk on
creative thinking and data science,
and he talked a lot about how important those things are.
He brought up the unreasonable effectiveness of data, how large data
sets
can trump
more advanced techniques.
And lastly, there was a really
great talk on breeding data scientists, where a mom and her daughter
talked about their work at Nokia. It was Amy O'Connor and Danielle Dean.
One of the nice things about the talk was hearing Danielle,
who's a PhD student,
describe how she got into the math of
analytics,
and how excited and curious she was. It delivered an important point about
looking for folks and training them:
that curiosity is probably more important than any kind of
raw ability.
You want people who want to learn more, who want to
understand what's going on. They did a great job of that in that talk, I think.
One last thing I'd like to add,
from what I heard and saw,
is who was represented at Strata. There were people from the cosmetics
industry, from broadcast television, from biosciences and other sciences.
These folks are kind of getting that data story imposed on them. They were
being pushed to become more data savvy,
and they're kind of struggling, grappling with what all that meant
as they realized it.
Most people who were at Strata are into it, and they're internally driven to go
further.
There are a lot of people, though, who are trying to learn what's going on
and trying to make it more applicable.
I think that watching that space, and
how those people adapt,
will be really important to
how the tools are adopted and how core data becomes to organizations in
general.
Overall, it really was a great conference, and we're glad we had it the
week before the big storm.