>> DANIEL BURROUGHS: And what I'm talking about today, this is basically, a follow‑up
to a talk I gave a few years ago at DEF CON 18, about looking at information that's freely
available out there on the net, and doing some trending and analysis of it, and trying
to make something useful out of it. So a little bit about my background, I'm currently ‑‑
I'm the director of technology at the Center for Law Enforcement Technology, Training, and
Research, which is a nonprofit research center. It got spun out of work that I used to do
when I was a professor at the University of Central Florida. I was there for about ten
years and in the engineering program, taught computer engineering, I developed the computer
security curriculum there, and did embedded systems amongst some other things, and eventually
moved away from teaching and more into research. And we ended up spinning out the research
into an independent nonprofit center. I'm also the CTO for Hoverfly Technologies
and prior to this, I used to work as a research associate up at the Institute for Security
Technology Studies at Dartmouth College. So over the course of the last 20 years, some
of the things that I have worked on appear on this list, and, you know, it took me quite
a while to catch on to kind of what, like, the common theme between all of the things
I was working on. I'm slow to pick up on these things at times, and eventually, as I started
putting it together and kind of realizing some of the same things I was doing, I realized
all of this stuff, from information sharing that I'm working on now, to hardware sensor
networks, to intrusion detection, they all rely on sensor fusion. Everything that we are doing,
all of those things that are ‑‑ that I listed up there, they are all based on taking
some sort of sensor, and using it to try to get some measure of reality, but the sensor
always has some limitations. Sometimes it's a significant one. Sometimes it's not so bad,
but every sensor that we look at reality, including ourselves, including when we view
things, it's always got some sort of limitation, and it's one particular view, and that influences
the data we are seeing, and you can get ‑‑ we have to work towards trying to get more
meaningfulness out of the data that we have. One of the ways that we do this, and one of
the things ‑‑ the techniques that I find most versatile, I would say, is sensor fusion,
where we take multiple sensors, multiple ways of looking at the same thing, and kind of
put that together, with the hope that we can take the limitations of one observation and
cancel it out with a different observation that has a different set of limitations.
So at least that's the hope. At least, you know, if we can put two halfway decent things
together and get something that's more than the sum of its parts.
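The fusion idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the talk does not name a specific fusion method, so the sketch uses inverse-variance weighting, which assumes the sensors are independent and that each reading comes with a known noise variance. The function name `fuse` and the sample readings are hypothetical.

```python
def fuse(estimates):
    """Combine independent (value, variance) readings by inverse-variance weighting.

    Assumption: the readings are independent and their variances are known.
    """
    # A less noisy sensor (smaller variance) gets a larger weight.
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    # The fused variance is smaller than the variance of any single reading.
    variance = 1.0 / total
    return value, variance


# Hypothetical readings of the same quantity from two sensors: (value, variance)
readings = [(10.4, 4.0), (9.6, 1.0)]
fused_value, fused_variance = fuse(readings)
print(fused_value, fused_variance)
```

The fused variance comes out lower than that of the better sensor alone, which is the "more than the sum of its parts" effect, provided the independence assumption actually holds.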
So before I get kind of more into my stuff, I always feel like with the ‑‑ in this
particular subject, that I have to give an acknowledgment to the guy that inspired kind
of some of these thoughts in my head and it was actually at DEF CON, way back at DEF CON
13, Broward Horne gave this talk on Meme mining for fun and profit.
His problem ‑‑ you know, all great ideas come out of a problem. I guess a lot of ideas
come out of trying to solve a problem too, but his was a really good idea. His problem
was that he would find that he would, like, start learning some new technology, some new
tool or at least it was new to him and by the time he felt he had mastered it, it was
kind of on the way out or the market, the job market was just saturated with people
doing that now or it had just fallen by the wayside and nobody cared about it. He was
always kind of struggling, what should I spend my time learning to get ahead?
And he ended up kind of thinking about this as like, everything has this sort of saturation
curve where a trend starts happening and there's a little bit of chatter about it and eventually
it starts taking off, and everybody hears about it when it's big and growing and then
it gets boring and old. You want to try those things earlier on, and he went through and did
it. This is a slide pulled out of his old presentation. What he would do is he would
look at news sources and forums and blogs for information and keywords and kind of
pull them out and see what was trending on there, with the idea that that's kind of a
precursor to seeing that early chatter about it, so something can take off.
This one in this particular case, this is ‑‑ the red line shows how many times the word
"palladium" showed up in news reports and forums. And the blue is the "price of palladium"
and you can see clearly there's a lot of chatter about it before the price spiked up and then
the chatter dropped off before the price came back down. It's a really good indicator about
predicting the future there, what's going on.
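One way to check whether chatter leads price, as in the palladium slide, is to correlate the mention counts with the price shifted by different lags. The following is a minimal sketch, not the method used in the original presentation; the series below are made-up numbers, and `pearson` and `best_lead` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def best_lead(mentions, price, max_lag):
    """Return (lag, r) for the lag where mentions[t] correlates best with price[t + lag]."""
    results = []
    for lag in range(max_lag + 1):
        xs = mentions[: len(mentions) - lag]  # drop the tail that has no future price
        ys = price[lag:]                      # price shifted `lag` periods into the future
        results.append((lag, pearson(xs, ys)))
    return max(results, key=lambda pair: pair[1])


# Made-up weekly series: keyword mentions peak two periods before the price does.
mentions = [1, 2, 8, 9, 3, 1, 1, 1, 1, 1]
price = [5, 5, 5, 5, 12, 13, 7, 5, 5, 5]
lag, r = best_lead(mentions, price, max_lag=4)
print(lag, round(r, 3))
```

A high correlation at a positive lag only suggests that mentions tend to move first; with a single spike like this it is an observation, not evidence of a reliable predictor.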
Anyway, that thought inspired me and when I was ‑‑ when I was teaching, I would
have students who would come to me and they would want to know, what skills do they need
to get a good job and all of that and I tried to apply what Broward had done in a similar
way by monitoring and observing trends and this is single variable observation. It's
doing some correlation and it started off looking at Craigslist data, just because Craigslist
is nicely available. It's well organized by geographic location and you can go in in certain
categories, like where they have the job postings in there, it's categorized by different types
of jobs. Craigslist is not the best place to look for
jobs but it had some interesting properties in that there are a lot of small companies
that post on there or are maybe trying new things, a lot of entrepreneurial companies, start‑ups,
things like that are starting there, not so much the big ones. That tends to skew it a
little bit more towards being a leading indicator, something that is ‑‑ it will come out
a bit ahead of the curve. So some of the things that I ended up looking at, just because I
found correlations in here were jobs, items for sale and adult services. And I didn't ‑‑
I'm not saying I looked for adult services on Craigslist. It's just my research took
me there. (Laughter).
So, you know, the things I saw, looked like this. This is an example. This is just showing
job postings by date. And there is a ‑‑ this is showing ‑‑ the dips you see there,
these are weekly trends, it goes dead on the weekends. There's a spike on a Monday and
a spike on a Friday. Okay, it's kind of boring. It's sort of interesting, but not unexpected.
There were certain things that stood out. In this particular case, one of the things
that jumped out at me was Austin never had a spike on a Friday. It always dropped off.
It's kind of hard to see, but it's the orange line in there. It never has a second spike
in it. I thought that was interesting. The other thing, and this is what came out
of the adult services was there was a correlation between adult services being offered or bicycles
for sale or a lot of items for sale. This led to a couple of interesting discussions.
One of my favorite moments at DEF CON, when someone said, hey, I think I can help you
out, I'm from Austin and my sister is a ***. (Laughter).
So that and then it led into a discussion of things you could sell one time, like a bicycle,
or something that you can sell over and over and over again.
So, okay, that's what I had done before, and we had looked at that. There's some interesting
stuff there, but I wanted to dig a bit deeper into the data and look for more relationships
and more correlations between data and hopefully be able to pull in other sources and do some
fusions on this. I started looking for things like different cycles in like the job postings
or correlations in them. Because at the time when I was working on them, keep in mind,
I was really trying to help out some of the students and figure out what skills they needed
and what would really help them get ahead. There were definitely correlations in there.
You know, there are things that you would see, but nothing unexpected. Nothing really
interesting that jumped out in related skills. You could say if a job was going to have one
particular tool set or skill set listed, there are other ones likely to be listed with it
as well. Again, nothing really jumped out at me as being unexpected out of it, but eventually,
there were a couple of interesting things that showed up. One was kind of funny and
it was how often the words "drug test" or "drug screen" showed up in a job advertisement, correlated
with the different skills in it. And apparently, like ‑‑
(Laughter).
If you don't think you are going to pass a
drug test, don't bother learning SAP, because it won't do you any good. If you want to develop
iOS applications, you know, go knock yourself out.
(Laughter).
You know, I guess there's probably some logic
here, like how corporate or uncorporate the environment is, I suppose.
Another thing was looking at jobs that had benefits, and like retirement and health and
medical. You know, the interesting one, the best one was COBOL but I think it was a bit
of an outlier, there were so few jobs with COBOL. I guess to get any grizzled COBOL programmer
to come work for you, you had to give them a lot of benefits. Python and Android, and
HTML, you won't give them much in the way of benefits, I suppose.
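The kind of comparison described above (how often a phrase such as "drug test" or a benefits term shows up alongside a given skill) can be sketched as a simple co-occurrence rate. This is an illustrative sketch only: the postings are invented, `phrase_rate_by_skill` is a hypothetical name, and the talk does not say how the text matching was actually done. Whole-word matching is used here so that "SAP" does not match "ASAP".

```python
import re


def phrase_rate_by_skill(postings, skills, phrases):
    """For each skill, the fraction of postings naming it that also contain any phrase."""
    rates = {}
    for skill in skills:
        # Whole-word, case-insensitive match for the skill name.
        pattern = re.compile(r"\b" + re.escape(skill) + r"\b", re.IGNORECASE)
        with_skill = [p.lower() for p in postings if pattern.search(p)]
        if not with_skill:
            continue  # no postings mention this skill, so no rate can be computed
        hits = sum(any(phrase in p for phrase in phrases) for p in with_skill)
        rates[skill] = hits / len(with_skill)
    return rates


# Invented example postings.
postings = [
    "SAP analyst needed. Drug screen required before hire.",
    "SAP consultant, must pass drug test.",
    "iOS developer for small start-up, flexible hours.",
    "iOS and Android developer wanted ASAP.",
]
rates = phrase_rate_by_skill(postings, ["SAP", "iOS"], ["drug test", "drug screen"])
print(rates)
```

With only a handful of postings per skill, a rate like this is easily dominated by outliers, which is the same caveat raised above about COBOL.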
As I was looking into this ‑‑ actually, this is much more recently. This is earlier
this year. I came across this article. This is actually out of the journal of "Psychology"
where a psychologist, Dorothy Brambel, looked at the missed connections on Craigslist.
This is where people say, oh, I saw you as I was walking across the parking lot and tried
to catch your eye and they post it and hope they will make a connection.
This is organized by state. This is where people had the most missed connections. Walmart
has a lock on the South.
(Laughter).
You know, Oklahoma, it's the state fair. Of course! You know, it makes perfect sense.
And, you know, in Nevada, it's casinos. And the one thing I just had to put this up there,
one thing that just jumped out at me like crazy was Indiana, it's at home.
(Laughter).
I don't know what they're doing in Indiana,
but I'm pretty sure they are doing it wrong.
(Laughter).
So I was talking with a friend of mine about this stuff, Dave Kerbletski and he told me
about something that he had done in his neighborhood in Orlando, Florida. They had a rash of crime
recently, and they didn't know they had a rash of crime until the neighbors got talking
to each other. Everybody knew of a different little incident that had happened. He went and did
some searching and found out that there was some open source data that the sheriff's department
would post about their CAD, their dispatch calls and he wrote this little tool to do
some geolocating on it and tweet it out and then you can subscribe to it, and get tweets
from this thing, like, really hyper local things for your neighborhood, about what's
going on there. And it's actually one thing that's fun. I
pulled this up earlier today, and, like, you know, I was just noticing things. This is
in Orlando area. You know, the first tweet that's on there, and I'm amazed at the ‑‑
you know, the sheriff's office is putting this out, they are basically saying there's
a designated patrol area available, which means there's an area that nobody is patrolling
currently. And this is down, like, in a real tourist trap part of Orlando. That could
be useful information to somebody, to know there are no cops there right now.
And then there's a few accidents and then I guess the people at the bottom, down on
Poppy Avenue would be happy to note that there's a fugitive from justice running around in
their area. This kind of led us to look into more sources
for data. What they offered where we were, wasn't very ‑‑ wasn't very useful or
organized. We found out and started looking in places that kind of subscribe to the open
gov system and this is a movement to have more transparent government data. Some cities
publish huge amounts of data about what's going on in their city, the fire department,
the police department, live, interesting data. Seattle, Boston, Chicago, and a number of
others. These are three that I spent a bit of time looking at. There's information about
incidents that are going on, like, police, fire.
In Chicago, you can actually track where the snowplows are in the city. You can track where
garbage trucks are in realtime, from the city, which I just find really kind of fascinating.
There's information about where bicycle racks, public toilets, landmarks, and even
cameras are, where the city has all of its cameras posted. That one I thought was
actually particularly interesting, because you can really go on here and make a map of what
is an observable location throughout the city, and what is not an observable location. Which,
again, that could be useful information for somebody.
Here's something, the Seattle one is great. They have their visualization tools built
right into this thing and this is showing a map showing police incidents over a period
of time around part of Seattle, and I pulled up this area. You notice, like, most of it ‑‑
everything is kind of in that same yellow orange, except for one big glowing red blob
out there and, you know, over in Georgetown. I don't know if anybody is from Seattle here.
But I'm like wondering what the heck is going on over in Georgetown.
And you look in a little bit closer and right next to it is the Boeing Propulsion Engineering
Labs, which makes me feel really good. So coming back, to an area I know a bit more
about, back in Orlando, we pulled up data that had ‑‑ we pulled out traffic tickets.
They don't publish data about who got a ticket, but you can see when a traffic stop occurred.
And I ‑‑ we looked at it and pulled data, that covered three roads in the area and this
is right out by the University of Central Florida. These are three roads that all run
east‑west and they are the three major roads, just kind of ‑‑ one is right into the
university, and one is a bit north and one is a bit south, and they all have about the
same amount of traffic on them and they all have a similar traffic pattern. What this chart
is showing here is that each one of the groupings is a
week‑long period, five weekdays. And then it's repeated over six weeks.
And one of the things I found really interesting was the chance of a traffic ticket occurring
on one of these roads, the order ‑‑ it was always likely at different times of the
day. It always followed the same sort of pattern, particularly between Highway 50 and University
Boulevard. The Highway 50 traffic stops all preceded the University Boulevard traffic
stops and when you go out there and you look at the traffic, the traffic pattern is not
really any different. So if you start thinking about this and start putting together, well,
why ‑‑ you know, why do you always see one before the other, I don't have ‑‑
you know I don't have hard evidence to back this up, but our belief is you are seeing
an influence of the patrol pattern of the police in the city.
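The ordering observation above (stops on one road consistently preceding stops on another) can be checked with a simple per-day comparison. This is a hedged sketch, not the analysis behind the chart: the record layout `(day, road, minutes_after_midnight)`, the function name `days_a_precedes_b`, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def days_a_precedes_b(stops, road_a, road_b):
    """Count days on which road_a's median stop time is earlier than road_b's.

    stops: iterable of (day, road, minutes_after_midnight) records.
    Returns (days where road_a came first, days where both roads had stops).
    """
    by_day = defaultdict(lambda: defaultdict(list))
    for day, road, minute in stops:
        by_day[day][road].append(minute)

    earlier = compared = 0
    for roads in by_day.values():
        # Only days with stops on both roads can be compared.
        if road_a in roads and road_b in roads:
            compared += 1
            if median(roads[road_a]) < median(roads[road_b]):
                earlier += 1
    return earlier, compared


# Invented records; "day 3" has no Highway 50 stops, so it is skipped.
stops = [
    ("day 1", "Highway 50", 480),
    ("day 1", "Highway 50", 500),
    ("day 1", "University Boulevard", 540),
    ("day 2", "Highway 50", 470),
    ("day 2", "University Boulevard", 530),
    ("day 2", "University Boulevard", 560),
    ("day 3", "University Boulevard", 600),
]
earlier, compared = days_a_precedes_b(stops, "Highway 50", "University Boulevard")
print(earlier, "of", compared, "comparable days")
```

A consistent ordering across many days is compatible with a fixed patrol route, but as noted above it is not hard evidence of one; traffic volume by time of day would need to be ruled out separately.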
So you are actually able to kind of get in there and through their information that they
are putting out sort of start tracking them. It's kind of like, you know, there's a talk
we went to earlier yesterday, I guess it was, there's a great talk with Brendan O'Connor
that was talking about tracking people by seeing, like, information their devices are
spitting out on wireless networks. It's a similar concept, that they are putting
out a lot of information here, that if you look at it the right way and you
take the right pieces of data, and put it together, you can pull a lot more information
out about what they're doing and what's going on.
So, you know, by this time, I have kind of changed what I was
interested in doing and probably because I quit teaching and I left the university so
I don't have students anymore. I'm not that interested in helping people find jobs. So
now I found it kind of interesting to look at these government entities and the police
and other things that were going on and also because I have worked with law enforcement
a lot. It's kind of interesting to see how on one hand, they are very protective of their
data, but at the same time, they are putting out a lot of information, and I'm not sure
that they quite realize how much they are putting out there.
Frankly, I think it's actually kind of a good thing. I like being able to have more information
and being able to look back on them and like I say, why should the NSA have all the fun
on spying on people? So the ‑‑ what's next with this, and
there's so much more I would like to talk about, but with these 20‑minute
talks you have to be kind of fast. What I'm really interested in is actually
expanding the model that we have been using on this data to be analyzed. We kind of built
things that are very purpose driven, that the first set of analysis we did was very
structured around seeking out the jobs, doing that, and then kind of got sidetracked
by the crime and going off that direction. And I want to bring this back together and
try to build a more robust model for analyzing this data and throw some data mining at this,
where so far a lot of what we have done has been what I would say is like hypothesis based
where I make a prediction about something I should see in here and then a correlation
and see whether it exists in the data or doesn't exist.
And I'm sure there's a lot of relations in there that are things that I wouldn't expect
or I wouldn't find otherwise. I want to throw a bit of sort of data mining and kind of that
sort of blind, either AI or brute‑force, approach to finding relations
throughout the data. So I think I'm about out of time right now
and I'm getting a nod from the back and so I will wrap it up there and if there are any
questions, I would be happy to take a couple until they cut me off.
(Applause). Thank you.
Okay. Thank you.