Hi. Welcome to the Vidya tutorial Starting with Data.
This tutorial is for beginners in software development
who want to learn just enough to access data on the web
and visualize it on their own websites or mobile device
applications. The files used in this tutorial are
available on GitHub at the link provided. Please download them
if you would like to follow along.
You know the Internet is full of information.
News, weather, sports, history, dating advice,
dogs that look like their owners, and on and on. We’re used to looking
information up ourselves and consuming it in a manual way. It turns
out though that there are a lot of providers who
distribute information we can consume in an automated way.
Google, Facebook, Twitter, LinkedIn, Amazon, Yahoo, the New York Times,
ESPN, the United States government, and many more.
If you know how to harness this information, what you can
do with it is limited only by your imagination. There’s a reason data
science has become a thing.
Donors Choose is a wonderful online charity where you can
donate to American public school teachers who
don’t have the funding they need to help their students
learn. Many celebrities and well over a million
ordinary Americans have made a difference in the lives
of over 10 million students thanks to Donors Choose.
Donors Choose ALSO exposes their
data for us to consume in an automated
way.
In fact, Comedy Central satirist Dr. Stephen T. Colbert,
DFA, hosted a contest to find software applications that
most creatively exploited Donors Choose data. Before we
learn how to do the same thing, we need to understand some
terminology.
Providers like Donors Choose offer SERVICES
that let you fetch different
kinds of data depending on what you ask for.
The set of rules for how you interact with these
services is called an API. That stands for Application
Programming Interface, but no one cares. An API defines
how you ask a service to do something and
what you can expect in return. After all, if you are going
to write software to interact with these services, there
have to be strict rules in place because software can’t
make judgments like a human can. At least not until
Skynet is self-aware.
These APIs are REST APIs.
REST stands for Representational State Transfer, but no one
cares about that either. REST is a big, complex
topic. All you need to know for now
is that fetching data with REST works just
like fetching web pages in a browser.
Think about how you do that. Typically, you click on a URL
or type a URL in the address bar of your browser to fetch
a web page. REST works the same way. You provide a URL,
but instead of fetching a web page, you fetch data. What the
URL looks like going in and what the data look like coming back
are defined by the API. Anyone who uses a service,
whether it’s a person or code, is called a CLIENT of the service.
So clients use URLs going in, but the
typical format for data coming back is
JSON. That stands for JavaScript Object
Notation. You guessed it; no one cares. You
are probably used to seeing data in one place like a
spreadsheet or a chart in a
document. JSON is a format for
moving data from one place to another. And since
every modern programming language knows how to read
and write JSON, two applications
written in completely different languages can
still communicate with it. This is why JSON is
such a great fit for REST services.
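To make the idea of moving data around as JSON a little more concrete, here is a minimal sketch in Python. The record is made up for illustration and is not real Donors Choose data; the point is only that a program can turn data into JSON text and another program can turn it back.

```python
import json

# A made-up record for illustration only (not real Donors Choose data).
project = {"title": "Instruments for Band Class", "state": "TX", "highPoverty": True}

# Serialize: Python dict -> JSON text that can travel between applications.
text = json.dumps(project)

# Deserialize: JSON text -> Python dict again, as a receiving program would do.
restored = json.loads(text)

print(text)
print(restored["state"])
```

The same text could just as easily be read by JavaScript, Java, or any other language with a JSON library.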
Now that we have our terminology straight, let’s
get back to Donors Choose.
Here is the Donors Choose Developer Guide,
which describes the APIs for their services. This
might look a bit imposing, especially if you are
new to this sort of thing. Don’t worry about
understanding details. Focus instead on the big
picture for now. There is plenty of time to work out the
details later.
Here is a list of specific parameters the API needs.
As you can see, the only required one is APIKey.
Typically, you have to sign up to use a service, and
when you do, the provider gives you a key to attach
to your request for data. But Donors Choose is nice enough
to let us use the DONORSCHOOSE key to test with.
When we scroll down a bit, we start to see some examples
of how to query the data. Here is an example of a search on
music projects. Notice how the query looks just
like a URL you use with a browser. In fact, let’s just
click on it and see what happens.
And here is your first taste of what JSON looks like.
That’s a lot of stuff, so it might look
overwhelming. But if you look at it carefully, you can
see it isn’t really that complicated. You see the search
term and the total number of results.
Look further, and you will see each project has a
title and a description.
Keep scrolling and you
will find the teacher’s name, that this is a high poverty
project, where the school is, and a lot of other details.
So now comes the fun part. Sometimes you will know
exactly what you want to do, and it’s just a matter of
scanning the API documentation to see how to retrieve
the data you want. Other times, you will have no idea
what you want to do, and like a shopper at the mall, you will
have to scan the API to see what strikes you.
This first proposal gives me an idea.
We can see this is a high-poverty project and
where it’s located. Let’s
see all the states with high poverty projects.
Can we do that?
Look at the Special Categories
section. There is a highLevelPoverty parameter. Setting it to
true in your query request will fetch only high poverty projects.
Let’s do that.
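If you would rather build that query in code than type it by hand, a rough sketch in Python might look like the one below. The APIKey and highLevelPoverty parameters and the DONORSCHOOSE test key come from the guide as described here; the base address is an assumption on my part, so take the real one from the Developer Guide.

```python
from urllib.parse import urlencode

# ASSUMPTION: this base address is a placeholder for illustration; use the one
# shown in the Donors Choose Developer Guide.
BASE_URL = "https://api.donorschoose.org/common/json_feed.html"


def build_query_url(base_url, **params):
    """Attach query parameters to a base URL, escaping them as needed."""
    return base_url + "?" + urlencode(params)


# The test key and the high-poverty switch described in the tutorial.
url = build_query_url(BASE_URL, APIKey="DONORSCHOOSE", highLevelPoverty="true")
print(url)
```

Nothing is fetched here; the sketch only assembles the URL you would paste into a browser or hand to a REST client.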
The JSON looks just like before except that instead
of everything, only high
poverty projects are included. It would be pretty cool to
present the states that have high poverty projects
on a map of the United States.
The next questions are obvious. How can we do what
we just did manually in an automated way? How do we get the
data from Donors Choose to our map? How do we even
MAKE a map? This is finally where we build something.
Check out this gorgeous diagram to see
what we are going to build. We already have
the Donors Choose REST service.
We will create a map that will show
the states with high-poverty projects
in blue. To do that, the map doesn’t need all that JSON we
saw in the response from Donors Choose--just the
information about the states. We need something between
Donors Choose and our map to pull out just the state data.
We can build OUR OWN REST SERVICE just like
Donors Choose did. The map will call our service
with a URL like we always do in REST. Then our service will
call Donors Choose with the URL we just used.
They will respond with all the high-poverty
projects in JSON as we’ve seen. Our service will pick
out just the state data and return the states in JSON
to the map. The map will use that list to color those states blue.
There are a lot of technologies we can use to build our
map and REST service. For the map, we will use
what every web page uses, HTML.
We will also use a graphics technology called SVG, and a
JavaScript visualization library called D3. For our
REST service, we will use the programming language
Python and a REST library called Flask. As for Donors
Choose, I have no idea what they use to build their
stuff. But as we said before, using JSON and REST
means we don’t have to know or care what Donors
Choose uses.
I know this sounds like a lot, but you will be amazed at
how little code it will take to do this. Besides, like I said
before, focus on the big picture and worry about the
rest later.
First, we need a map of the United States that we
can color in code. The easiest way to do this is
to use SVG, which stands for Scalable Vector Graphics.
But no one cares. The details don’t really matter
since we are just going to copy and paste.
Seriously! To find an SVG map of the United States, just
Google it and you will find a zillion maps like this one.
We can copy and paste the SVG markup
into our own HTML like this.
And now let’s look at our web page in a browser.
It’s that easy to add a map. Now
let’s go back to the markup on the web page.
Much like the JSON before, there is a lot of stuff
in the SVG, but it isn’t too complicated since it is
just XML, which looks like HTML. Look at all these path
elements and their IDs. You might notice the IDs
are state abbreviations. Web developers
know that you can use JavaScript to manipulate
HTML elements by their IDs. We can do the same
with SVG elements.
If we want to turn Texas
blue, for example, we just have to reference the
path element with ID TX and turn it blue. This makes
things a lot easier for us. The service we write only has
to get the list of high-poverty projects from Donors Choose,
pull out the state abbreviations, and send them back to
the map. Let’s write this service.
We chose Python for this. It just might be the
best get-up-and-go language out there because it is
both simple and powerful. Flask is a really nice Python library
for building your own
services without having to get your hands dirty with that
HTTP stuff and setting up a server and port and all that.
Let’s take a
look at how to use Python and Flask to create a service
that fetches the high-poverty projects from Donors Choose
and pulls out the state abbreviations.
In a lot of programming languages, we often dedicate blocks
of code called functions to perform a particular task.
Here, each function performs one service.
Flask lets us easily map each of those functions to a URL that a REST
client, like our map, can call. First, we have
a service called test where the URL is just a slash
and the result is just simple text. So if we make a call here,
we see that text come back.
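For anyone following along without the video, a test service of this kind might look roughly like the sketch below. It assumes Flask is installed; the function name and the reply text are my own placeholders, not necessarily what the tutorial files use.

```python
from flask import Flask

app = Flask(__name__)


# Map the URL "/" to a function. The function name and the reply text are
# placeholders chosen for this sketch.
@app.route("/")
def service_test():
    # Plain text is enough to confirm the service is reachable.
    return "The service is up."


# Flask's built-in test client calls the route without starting a real server.
response = app.test_client().get("/")
print(response.status_code, response.get_data(as_text=True))
```

To try it in a browser instead, you would start Flask's development server, for example by calling app.run(), and visit the address it prints.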
Now that we know we are set up OK, let’s create a service that
can be reached at /projects/highpoverty/states for our map to use.
We assign that URL to a function called high_poverty_states.
First, we use Python’s urllib2 library to make the call to Donors
Choose. Then we use Python’s JSON library to treat the
response as JSON and not just random text. Now let’s
refresh our memory of what the response from
Donors Choose looks like:
You can either check out the API documentation or just look
at a sample response as we have here. You can see the
response has proposals with the state abbreviations
labeled “state.” So back in our service…
We loop through all the proposals in the JSON and
accumulate a list of state abbreviations while throwing
away any duplicates--like if Texas shows up more than
once. The last thing we do is turn that list into JSON
so that the client, our map, can handle it. Let’s take
a look at what our service produces.
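Here is a rough sketch of those steps in plain Python, in case it helps to see them in one place. It uses urllib.request, the Python 3 successor to urllib2, and it swaps the real network call for a canned response so it runs anywhere. The "proposals" and "state" labels are the ones described above; the function names, the URL, and the sample data are assumptions made for illustration.

```python
import io
import json
from urllib.request import urlopen


def fetch_json(url, open_url=urlopen):
    """Fetch a URL and treat the response body as JSON rather than plain text."""
    with open_url(url) as response:
        return json.loads(response.read().decode("utf-8"))


def unique_states(data):
    """Collect each proposal's state abbreviation once, in first-seen order."""
    states = []
    for proposal in data.get("proposals", []):
        state = proposal.get("state")
        if state and state not in states:  # skip missing values and duplicates
            states.append(state)
    return states


# ASSUMPTION: an offline stand-in for Donors Choose. It ignores the URL and
# returns made-up data shaped like the response described in the tutorial.
def fake_open(url):
    body = {"proposals": [{"state": "TX"}, {"state": "CA"}, {"state": "TX"}]}
    return io.BytesIO(json.dumps(body).encode("utf-8"))


data = fetch_json("http://example.invalid/high-poverty", open_url=fake_open)

# Turn the list back into JSON text so a client such as the map can read it.
result = json.dumps(unique_states(data))
print(result)
```

In a real service you would drop fake_open, pass the actual Donors Choose query URL, and return the result from the Flask function.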
You may not understand all the code, but look at
how few lines it took to do stuff that seems
really complicated at first. It all comes down to just
fetching data, grabbing what we want, and sending that
back out. That’s our service.
The last thing we need to do is have our map call our
service and read the JSON that comes back to know
which states to color blue. This is where D3 comes in.
JavaScript is the programming language you use to
make web pages do things, and D3 is the preeminent
JavaScript library for visualizing data on the web.
Let’s take a look at how to use it with our map.
As you can see, once again it is just a few lines of code.
First, we use D3 to grab the SVG element in our web
page, which as you saw earlier contains our map. Then
we use D3 to call our service. Notice how the URL we call
matches the one we created for our service with Flask.
Finally, when that list of state abbreviations comes back
from our service, we tell D3 to loop over it. For each abbreviation,
we find the path element inside the SVG map whose id matches
that abbreviation and color that state blue when we find it.
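The tutorial does this step in the browser with D3, but the idea of looking up a path by its id and coloring it can also be tried outside the browser. The sketch below does it in Python with the standard XML library, using a tiny made-up SVG in place of the real map; the function name and the shapes are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# A tiny stand-in for the copied map. Real maps have one path per state.
SVG_TEXT = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path id="TX" d="M0 0h10v10z"/>'
    '<path id="CA" d="M20 0h10v10z"/>'
    '<path id="NY" d="M40 0h10v10z"/>'
    "</svg>"
)


def color_states(svg_text, states, color="blue"):
    """Set the fill of every path whose id is one of the given abbreviations."""
    root = ET.fromstring(svg_text)
    wanted = set(states)
    for path in root.iter("{%s}path" % SVG_NS):
        if path.get("id") in wanted:
            path.set("fill", color)
    return root


root = color_states(SVG_TEXT, ["TX", "NY"])
fills = {p.get("id"): p.get("fill") for p in root.iter("{%s}path" % SVG_NS)}
print(fills)
```

States that are not in the list are left untouched, which is why only the matching ones change color.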
That’s it. We’re done. At last, let’s take a look at our map.
As you can see Texas and several other states are blue
because they have high poverty projects in Donors
Choose. And with that, we have leveraged REST
services, some SVG copy-and-paste, and only about ten lines
or so of code in Python with Flask and in D3
to create an end-to-end application that fetches
data, processes it, and renders a visualization to help
others understand the data.
I do have to come clean about one thing. As it turns out,
we actually don’t need to create a service with Python
and Flask. Because filtering is so simple, we could have
just used JavaScript by itself to make
a REST call to Donors Choose directly and pick out
the states before rendering them on the map.
But I wanted to show you how simple and powerful
Python is, and a service like this will come in handy
when you want to do real analytics and other fun,
more complex things.
Let’s summarize.
There are services available all over the web to provide just
about any kind of data you can imagine. These services have
APIs that define how you can interact with them as a client, and you do
so over REST, which feels just like viewing a web page in a
browser. Every major programming language supports REST,
and we used Python both to create our own REST service
and to consume another because Python is a great
combination of simplicity and power.
We didn’t do any
data science, but we did do some simple
processing of the data and visualized it with SVG and D3 in
an intuitive way. Once you take the time to understand all
the code in detail, you can build on this to run
analytics on your data to produce some really interesting
results.
We really hope you enjoyed this tutorial and learned enough
to get started on your own REST service and data manipulation
and visualization work. We just scratched the surface here.
REST, Python, D3, Flask, and general application architecture
are all rich topics,
and we will explore them in greater detail in future tutorials
and blog posts on the Vidya website. We hope you will check
those out too. Thanks for watching. This has been
Starting with Data by Vidya.