>> Good morning, everybody. >> How are you guys, awake or sober?
>> Who slept in this room last night, and that's the only reason you are here?
>> One guy. >> JAIME FILSON: Okay. So this is GitDigger,
I'm WiK. >> ROB FULLER: I'm Mubix.
>> JAIME FILSON: So last night, at random, well, not random for Mubix, but we ran into
a taxi line and decided to go with him over to Pawn Stars. Everyone knows Pawn Stars?
So inside, we're walking around. We're looking at the souvenirs and all of a sudden we notice
this kiosk. Everybody is using it. What's that?
Well, we walk up to it and it has a camera. You can take a picture of yourself and they
allow you to log in with your user name and password to Facebook, Twitter, to send an
image to yourself or to tweet it out to the public.
(Chuckles). So I email to myself. I'm not giving them
anything. And this is the result on the screen.
>> ROB FULLER: Legit, right? >> JAIME FILSON: So I did most of the research.
I did all the research! >> ROB FULLER: That's me.
>> JAIME FILSON: Yeah, that's him. >> ROB FULLER: So we are not the first ones
to make wordlists. Sebastian French something, he's an awesome guy. I'm not trying to make
fun of him, and also all of Matt Weir's stuff. If you haven't used his keyboard dictionary,
it's one of the best ones for finding people who just walk their fingers along the keyboard.
And to the other people who make awesome wordlists: you rock. Moving on.
>> JAIME FILSON: So we weren't the first ones to go digging through source code. SVN digger
was released; they went through a ton of SVN repositories and then published the frequency
count of all the files and all the directories that they found and pulled down ‑‑ I forget
exactly where they pulled them down from.
>> ROB FULLER: Google Code. Just to point out really quick, if you take a picture of
that QR code, we are not trying to hack you. It's linked to the information.
>> JAIME FILSON: I made them, not him. So they are good to go.
>> ROB FULLER: The only problem with using Google Code and stuff like that is they like
to put these CAPTCHAs in, which makes it hard to automate stuff.
So this is ‑‑ >> JAIME FILSON: So this is how everything
got started. 2:00 in the morning, somebody posts a link to SVN digger. Everybody thinks
it's cool. I haven't seen anything like it before then. And Rob was like that's awesome.
That one line, that's why he's standing up here right now, because of that one line of
code. So I'm like, oh, this is awesome. I can do this crap, 30 minutes or so, I will
go to bed, wake up in the morning and the code will be done and I will have an awesome
wordlist. So my first problem was that I couldn't find ‑‑ at 2:00 in the morning, mind
you ‑‑ a good way to get all the repositories. So I went to GitHub's list of the most-forked
projects and used some basic Python to start web scraping it. I'm saving the user names and
project names in SQLite, and then I just set my computer loose cloning all the repositories.
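A rough sketch of what that first scraper could have looked like ‑‑ the HTML pattern, table layout, and function names here are assumptions, not the speaker's actual code:

```python
import re
import sqlite3
import subprocess

# Pattern for "user/project" links on a listing page -- the exact HTML
# layout is an assumption, not GitHub's real markup.
REPO_LINK = re.compile(r'href="/([\w.-]+)/([\w.-]+)"')

def make_repo_db():
    """SQLite table holding one row per (user, project) pair."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE repos (user TEXT, project TEXT, "
               "UNIQUE(user, project))")
    return db

def scrape_page(html, db):
    """Pull (user, project) pairs out of one listing page into SQLite."""
    db.executemany("INSERT OR IGNORE INTO repos VALUES (?, ?)",
                   sorted(set(REPO_LINK.findall(html))))
    db.commit()

def clone_all(db, dest="/data/repos"):
    """Set the machine loose cloning everything that was scraped."""
    for user, project in db.execute("SELECT user, project FROM repos"):
        url = "https://github.com/%s/%s.git" % (user, project)
        subprocess.call(["git", "clone", url,
                         "%s/%s_%s" % (dest, user, project)])
```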
So now what do I do with it? I have these repositories. I'm using os.walk to go through
each repository and keep a count of the file names and the directories. I'm doing
a whole lot of sed, grep, awk, just trying to clean everything up and make it nice and easy.
There was a ton of manual review, because I thought it would be easy to go through and
pull out all the user names and passwords, and email addresses I found in this code.
So I spent about 17 hours total on my 30-minute project, all kinds of hours trying to pull
out user names and passwords, and I've got a mile-long sed line that I just copy and paste
and come back to later. So os.walk was taking forever to go through
and find everything. I thought, there's got to be a better way to do this. After some
Google-fu, I found betterwalk, which claims that os.walk makes unnecessary OS calls ‑‑
is this a folder, is this a file? We don't know, API, please tell me ‑‑ and cuts those
out of the loop, which speeds things up to two and a half times.
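betterwalk's trick was later merged into the standard library as os.scandir (PEP 471): directory entries already carry a file-vs-folder flag from the OS, so the walk can skip the extra stat() call per name. A sketch of a count done that way:

```python
import os

def fast_walk_count(root):
    """Count files and directories using os.scandir, which reuses the
    type flag the OS already returned with each directory entry
    instead of issuing a separate stat() call per name."""
    files = dirs = 0
    stack = [root]
    while stack:
        for entry in os.scandir(stack.pop()):
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
                stack.append(entry.path)
            else:
                files += 1
    return files, dirs
```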
So the good news is, I've got some awesome wordlists. And I posted them out on IRC. Everybody
loved them. I was like, great. But the bad news is I only have some repositories. I have
maybe the most popular repositories and that's it. SQL transactions were extremely slow.
It took maybe about 30 seconds to go: is this already in my table? Yes? Okay. Let's add
one to the count. And the 17 hours of manual labor, really sucked
because I am the laziest *** on the planet. If I could have got my goon to carry me in
here, I would have. And my hard drive was full. I've had terabytes
of this data. So everybody liked it. So I'm like, okay, let's get a little serious. How
can I make this better? How can I streamline it? How can I not do 17 hours of manual labor.
First problem, storage. How am I going to store all the data? So my first thought, I
did some Googling and found Bitcasa: awesome, $99 a year, unlimited space. Built‑in indexing
so I can give people access to all the code and they can search for whatever in the world
they want and get it. At that time, six months ago, at that time,
there was only a Windows client. It crashed every time I tried to launch a robocopy or
just simple copy and paste, and it was extremely slow, because they encrypted all the data
on the upswing. So what might have taken me six days to upload a terabyte with my slow
*** connection would have taken, like, a month. The next option, which I thought was the option
was to have a NAS. Everything was stored in one place. It was protected. I could download
directly to it, but it's hard to get free money for these things. So I had three terabytes
already. So my solution: right there are the first ten terabytes of all the data.
(Chuckles) >> ROB FULLER: That's awesome!
>> JAIME FILSON: So the next problem is how can I make downloading these repositories
better, easier? How can I get all of the repositories? So when I was actually awake, I found the
API, which I felt incredibly stupid not knowing about. And it's nice because the API gives
you all kinds of nice, useful information. The only thing I haven't found: they will
tell you it's a fork of a project, but they don't tell you which was the main project, who
it was forked from. So I can keep track of how popular a project is, but I have no idea
which guy was the original. So database, SQLite sucks really bad when
you are trying to store a lot of data. I switched to MySQL. I've had questions in the past
about why I didn't use PostgreSQL ‑‑ I know MySQL, and again, I'm lazy. I didn't want to learn
something new. So let's put this all together now. So now
I have two main scripts. I've got the first Python script that's threaded, goes through,
downloads all the data. It's got another mode that will go through and process all of that
data. And then I have another script which I will talk a little bit more about that just
takes a long list of user names, passwords, email addresses, and I pass it to the table
name and it just goes and dumps all the data into that table. In the MySQL database, I
created a table to keep track of more project information, and the user names and passwords
and everything now have their own tables.
And I'm keeping track of the last seen ID so that I don't have to start over or repeat
myself. So here's how the downloading works. Downloader
goes out to the API and says, give me 100 repositories. I saw ‑‑ I have already
seen 5,000. So GitHub comes back at you and says, okay, here's the next 100. So it downloads
it, dumps it into the database that I've got it and then automatically clones the repository
to my hard drive. Unfortunately, the processing got a little
better, but there's still a lot of manual work. So now the processor mode is checking
my database, going, okay, I don't have this repository, but I know it exists. It downloads
it. Great. Or it goes through and auto‑pulls it. It does a betterwalk on it.
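The download loop just described maps onto GitHub's public `/repositories` endpoint, whose `since` parameter takes the last repository ID already seen ‑‑ which is what makes the crawl resumable. A sketch, with the per-repository handler left as a placeholder:

```python
import json
import urllib.request

API = "https://api.github.com/repositories"

def page_url(last_seen_id):
    # `since` = highest repository ID already processed, so the
    # crawl can resume without repeating itself.
    return "%s?since=%d" % (API, last_seen_id)

def crawl_page(last_seen_id, handle_repo):
    """Fetch the next page of public repositories and hand each one
    off (store it in the database, then clone it); return the new
    last-seen ID to persist for the next run."""
    with urllib.request.urlopen(page_url(last_seen_id)) as resp:
        repos = json.load(resp)
    for repo in repos:
        handle_repo(repo)          # e.g. insert into MySQL, git clone
        last_seen_id = repo["id"]
    return last_seen_id
```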
And now if you notice the red line, that's all of my manual work. So I have to grep all
of this data, pull out user names, passwords, emails, RSA keys, all kinds of fun stuff,
and then clean it up ‑‑ one day's grep session can take four days for me to go through
and clean it all up and dump it into the database.
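That grep stage boils down to pattern matching over every text file; here is an illustrative version in Python ‑‑ these patterns are examples, not the speaker's actual expressions:

```python
import re

# Illustrative patterns only -- not the actual ones used in the talk.
PATTERNS = {
    "password": re.compile(r"""password\s*[:=]\s*['"]?\S+""", re.I),
    "email":    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "rsa_key":  re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}

def scan_line(line):
    """Return which kinds of secrets a line appears to contain."""
    return [name for name, pat in PATTERNS.items() if pat.search(line)]
```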
And then I have a Bash script that will connect to the database and dump everything and create
the wordlists and automatically send it back up to GitHub which is a real irony. I'm downloading
all of their data and yet storing it on GitHub. So the updated news. I now have all the repositories.
I can now get every single public one. Generating the wordlists with Bash script
takes minutes once everything is in the database. Because of the updates I did to the database,
I can store the repositories. It will tell me which one to go to get. The sucky part
about that is if I want to go back and grep for more stuff, I have to get this giant hub
and plug all of these hard drives in at the same time.
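The export step is a Bash script in the talk; sketched here in Python instead (table and column names are assumed), the idea is just to dump each table ordered by frequency so the most common entries sit at the top of the wordlist, where crackers try them first:

```python
import sqlite3

def dump_wordlist(db, table, path):
    """Write one table to disk ordered by descending frequency, so the
    most common entries come first in the wordlist."""
    with open(path, "w") as out:
        # Table name interpolation is fine for a private script; it
        # would be injectable in anything user-facing.
        for (word,) in db.execute(
                "SELECT word FROM %s ORDER BY count DESC" % table):
            out.write(word + "\n")
```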
>> ROB FULLER: It's awesome. You should see it.
>> JAIME FILSON: Yeah. I'm estimating that it will take about 30 terabytes to download
all the repositories, however, I'm pulling that number out of my butt based off of the
first ‑‑ the amounts of repositories I got from the first 10 terabytes, because
everybody is uploading new stuff every single day. I could probably continue with this project
forever and never see the end of GitHub. >> ROB FULLER: So this is the big data drinking
game. If you just heard me say, "big data" drink, but you guys are all hungover. So I
won't ask you to do it. So obviously this is a build up to the actual
wordlist. What did we get out of it? So anyone with kids knows exactly how this goes. So
how does this go. Dun, dun, duuunnnnn! You can get the movie and just fast forward it
to that part of the movie. It's the best part. >> ROB FULLER: These are pretty straightforward
lists but the cool thing is what we see inside of them and we're not just talking about password
lists. That's the obvious use, right? I'm going to have a set of passwords that I'm
going to use against it. The all directories list and all files list is awesome, when you
are talking about web application attacks, and the user names. I didn't know that so
many people loved Bob, but they do. More than admin. So stats.
Pretty pictures. >> JAIME FILSON: I promise, this is the only
stats slide. I just wanted to give an overview of how many passwords are in the database,
versus how many are actually unique to each section.
>> ROB FULLER: So this is where it gets relevant to what I do. I'm a senior red teamer and
one of the things ‑‑ I just break stuff. I already talked about forced browsing. The
SVN digger kind of started that whole thing. The great thing about forced browsing is when
you get a set of the directories or wordlists or stuff like that, you can just exactly like
DirBuster, you can go through and find it. You can use these wordlists with DirBuster.
The small default password wordlist is not exactly what I would have expected as the
default passwords ‑‑ it starts with root, toor, blah. Static salts,
it's hilarious when you have a salt for passwords and then that repository is used as an application
out there in the real world. >> JAIME FILSON: I actually stopped pulling
out static salts, because there's so many! And I'm never going to get this done in time
to do a CFP on the project if all I did was pull out the static salts.
>> ROB FULLER: So five minutes? So number 22 on the list of files is exception.php.
I never, ever, looked for that when I was looking at a web application, even a php one.
But after WiK had done his research and shared the list, I got code execution because it
was loading the exception information and you could identify any list you want. That's
brute force browsing. And this is pretty awesome. This is one of
my favorites, NTLM SSO magic, do you know what that does? It has your user name and
password statically assigned in there. So it does NTLM.
All right. So real world stuff? Anyone see this release? The secret tokens for rails?
If you have a secret token stored in your repository and it's also used in your production,
without you changing it, it's direct remote code execution.
So this is the gentleman, and I'm going to butcher his name ‑‑ I won't butcher his
name. He sent out an email to all 1,000 users who had this in their repositories.
>> JAIME FILSON: I'm much too lazy to do all of that.
>> ROB FULLER: You start parsing every file from the git repository's history. Right now
WiK isn't, but if you store your password and then, like the gentleman just said, remove
it, you can still go back in the history if you don't nuke it.
Mass static code analysis: You can find a
ton of things really quickly. And .svn, when you convert an svn repository
into a git repository, sometimes people forget to delete those things, and they can have
configs, including database configs and all kinds of things. .gitignore is an amazing
little file that tells your git repository which files never to commit.
Those are exactly the files that I want to look for. Because those are the things that
are important. So I usually look for that. 403 on an empty directory: on GitHub, or in
git as well as SVN, it doesn't let you create a directory and commit it unless there's
something in it. So empty placeholder files and .DS_Store files are usually how some people do it.
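Acting on the .gitignore idea could look like this sketch: parse an exposed .gitignore and turn its entries into paths worth probing, since ignored files are often the configs and secrets that never got committed but do exist on the deployed server. (Simplified ‑‑ real .gitignore syntax also has globs and negations this ignores.)

```python
def gitignore_targets(text):
    """Turn .gitignore entries into candidate paths to probe."""
    targets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        targets.append(line.lstrip("/"))  # make entries URL-relative
    return targets
```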
Another thing is running OCR on all the images. We actually found a gentleman or a girl that
had their password stored in an image for their repository. It was awesome!
Using the list of text files, grepping out all the emails, which he already does ‑‑ and
I'm stopping there, because that gives you all the ideas, and we're done!
(Applause). >> JAIME FILSON: Thank you.
I actually want to give a quick thank you to NoVA Hackers. Are there any NoVA Hackers
in the room? >> ROB FULLER: Boo! You all suck.
>> JAIME FILSON: They suck. But without their help and support, encouragement, I would have
never kept going with this project, because they helped me out with resources. I now have
a file server which can store up to 34 terabytes of data. So once I get the original 10 terabytes
switched over, I'm going to start downloading, and pulling out some more stuff.
>> ROB FULLER: Cool stuff? No? Everyone is waiting for the next talk? Questions? All
right. Cool. >> JAIME FILSON: Thanks. Thanks for coming.
(Applause) >> So for those of you filtering into the
room and looking, Made Open Hacking is about to start in ten minutes. The schedule for
Track 2 is really messed up. There are some tracks that didn't even make it on to the
schedule. Please stop by the Information Booth if you want an updated schedule in about an
hour. They're getting PDFs printed right now and they should have them in about an hour.
If you want the schedule right now, the one on the website is the most up to date, however,
they are doing a weird thing where they are telling you the ends of talk and not the start
times. So the start time of a talk is 10 minutes after the one preceding it ends.
Yeah.