Tip:
Highlight text to annotate it
X
In this video, we are going to continue on from our previous video where we had brought
in several hundred thousand records from the federal election campaign of president Obama
in 2008. We will actually perform some queries against this data. We will look at how Access,
and relational databases in general, can be used to analyze large sums of data.
If I go to the Relationships view, I see my primary keys in each table and I see the one
to many relationship to the foreign key in the contribution table. If I go to the contribution
table, I see the primary key, the contribution amount, and the date of the contribution,
and I see the foreign key to the contribution table.
If you look at the number of records, I have 317,000 records in the contribution table
and I have 76,000 records in the contributor table. If I look at the contributor table,
I have the name, city, employer, occupation, and primary key for each contributor. I am
going to close both of my tables and I am going to create some queries. I am going to
go to the Create ribbon and click on Query Design.
I am going to add both of my tables. I can do this control-clicking on each table to
make sure it is included or I can also double click on a table and it will add it, and I
will click on close. Because my relationship has been setup by my analyze table function,
I can see the relationship that exists between these two tables. If I just wanted to dump
the information, like what I had in Excel, I could drag down the contributor name, city,
employer, occupation, the receipt amount, and the receipt date.
Notice that I am ignoring the keys, both the primary keys that exist in both tables, and
the foreign key that helps me build my relationship. When I click on the run button, I see all
the information as it appeared in my Excel spreadsheet.
If I wanted to, I could create a query that contained only the contributor table and just
show the contributor names.
If I wanted to see all of the contributors that were attorneys, I can right click on
my query, go to Design view, and where it says criteria, I can type in the word “attorney”
in quotes and then I can run my query. And I can see all of the attorneys that are in
my database. There were 6,424 that identified themselves as attorneys. In some cases, I
may see typos or duplicate records that were introduced by spelling errors when forms were
submitted or changes in employment. The data isn’t terribly accurate.
I want to some analysis with my data. For example, I will go back to the Create ribbon
and click on Query Design, and I am interested in seeing which occupations donated the most
money. So I am going to drag down the occupation field, and I am going to drag down the receipt
amount field, and then I am going to go to the Query Tools Design ribbon and click on
Totals, and I want to group by occupation while summing the contribution amount. And
also what I will do is I will sort by the contribution amount descending, which means
it is going to show me my highest summed contribution amount first and I will click on run.
Notice that attorneys are the highest contributors, followed by retirees, people who did not type
in their information. If I wanted to I could change my sort order here. We also see blank
data, see lawyers, physicians, consultants, etc.
I am going to create a new query. This time I want to see which city on average contributed
the most money. So I am going to pull down contributor city, and then I am going to pull
down contributor amount. I am going to click on Totals. I am going to group by contributor
city, but I am going to average contribution amount, and I am going to click on run. I
forgot to sort my data, so I can right click and say Sort Largest to Smallest and do it
outside of my query. Looking at this data, it strikes that the
information may not be terribly reliable because it is very likely that the max contribution
of $2,300 could be donated by one person and their city will then appear at the top of
the list. So I am going to go back to Query Design and this time I will remember my sort.
And I am going to add a criteria so I am going to pull down one of the contribution fields.
It doesn’t really matter which one I pull down because I am still grouping by the contribution
city and I am going to count the number of contributions in each city. And then I am
going to run my query to see if this affecting my results.
Now when I run my query, I can see that my suspicion was accurate: that cities with only
one contributor or two contributors skew the results. I want to ignore these, so let me
take a brief look at my data and it looks like I have a few cities where we have many
contributors. So I am going to pick a number, let’s say 10, and I am going to make that
the requirement for information to appear in my query. I am going to go to the criteria
field for the contribution count and I am going to put in a greater than sign (