What I want to begin with is asking where you've been my whole life because I spent a large
part of my life sitting in front of World War II-era microfilm machines turning cranks
in rapture really while entire years of my youth passed by. I made the mistake of, I
went to graduate school not knowing I was going to be a historian and I made the mistake
one day of reading a newspaper on microfilm which had never occurred to me was possible
or that microfilm existed outside of old James Bond movies and it took me maybe 30 minutes
to make a fundamental discovery that shaped everything that I've done for the last 35
years which, oh my God this stuff actually happened, and it all happened at the same
time and people didn't know they were living in history and they didn't know how things
were going to turn out. What are we going to do about that? And my mom, who had been a 5th
grade school teacher for 30 years, told me when I went to graduate school to study history,
she said, "Well what for? We already know what happened." And I read those newspapers
and said, "Oh no, we don't." And so from that moment in front of the artifacts
that you folks are doing so much to share with the world, my life changed. From that,
a whole another, a 600 page book grew when I was looking for lynchings and discovered
Coca-Cola and football games on the same pages and I thought, "Ok, how would you tell that
story?" And then when I discovered the big difference it made in my life, I had 160 of
my students in my previous institution hit the microfilm machines and chronicle what
they found there in six months of a newspaper. And they would cuss me and tell me this is
*** my eyes, it's dark in there, all these kind of things and then at the end of
it they would say that was worth any number of books I could have read and certainly your
entire semester of lectures seeing what was in those newspapers. So then I wrote this
book in which I read every newspaper I could find from the 1890s and the American South,
which meant, you remember what this was like, going to the library and persuading people
we need this newspaper from Shreveport in 1892. Other people will use it, they will
really, please buy this for me. And they fell for it and I happily used papers and then
I actually bought my own microfilm machine so I could just be a complete nerd on weekends,
at nighttime, partly because I made the mistake of trying to eat M&M's while I was at a
microfilm machine trying to keep the span of attention longer and the librarian, "Mr.
Ayers. You're setting a terrible example for the students." It's microfilm and
they're M&M's, they melt in your mouth not in your hand. Really, it's going to
be OK. But of course now you go to libraries, "Another latte!" But back then it was
not that way. So then, you know you heard briefly about the Valley of the Shadow, I
thought it out back in 1991 and then the World Wide Web comes along and we said, "We built
the thing." And I don't get to use this very often, SGML, which we thought stood
for Sounds Good Maybe Later. And it turned out Maybe Later turned out to be HTML, and
we were able to build sites since 1993 and I knew what needed to be on there. It needed
to be the newspapers that had changed my life. And so I said, "Surely we can OCR these
babies in 1993." You couldn't. But we could scan them all, we could digitize them
all. And we did that. And then it turned out that we didn't know that something called
the PDF file was going to be invented. And so we had the same Group 4 fax compression
TIFFs that we had to figure out some kind of device that would show them on the computer
screens at the time. And they just all choked when you called up a page of a newspaper.
They couldn't handle it but we persisted. And so then we transcribed 10,000 pages of
newspapers ourselves. Well ourselves being not me exactly but graduate students who were
paid nicely for it by NEH grants, thank you very much. In the Valley of the Shadow and
then XML-tagged and it's still one of the larger groups I think of XML-tagged newspapers
on these two communities in the American Civil War. Then I wrote a book in which you could
go to the digital version and see every newspaper or letter or diary in which it was based.
Then we wrote an article for the American Historical Review that claims to be the first
native digital peer-reviewed article, and the fact that it was also the last one, and
it's ten years ago, is one of the reasons I'm before you today. That's a little
discouraging that, here's my beef: you folks are helping us through the most profound social
change of our time, surely scholars can think of more to do with them, with the work that
you're doing than we have so far. And so that's why I jumped at this chance to think
about, because I've been trying to do some of my part in all this. Where this is a piece
of paper that managed, probably driving some people crazy. So I thought wouldn't it be
great if we could share this possibility of reading all these newspapers with lots of
people. So this is something I first thought of at the University of Virginia and then
took with me when I went to the University of Richmond in 2007. In which students at
classes all across the country, as it turns out the continent, can do research and instead
of just getting a B+ on it back and then "great" and throw it away and all that work is for
naught, wouldn't it be great if all those people who read six months of the newspaper
were actually able to preserve it and share it with other people. So I'll just do a
quick search for newspapers and see what we got. And all these things, at DSL.richmond.edu.
So here's 191 articles about newspapers and you can see that the idea is that, that's
a monograph but it's only one paragraph long. It's got everything a monograph has
in it. It has original sources, it has secondary context, it makes an argument and you can
go back to where it came from and it fits all together. So I have no idea what that
is, we're not going to look at that, I don't want you paying attention to that, listen
to me while I'm talking, that's always the danger of showing things. You can map
it geographically, and so we're still expanding all that. Now that was devised before Chronicling
America really came online. And now, ever since, I've been trying to figure out exactly
what to do with it. One of my former students, Andrew Torget, at the University of North
Texas came in, and I'm sure all of you have seen this, in which you would go through and
for Texas newspapers, graphs that map the newspaper, the quality of the tagging, and
show it in maps, and then show the percentage of the words that are good words. And then
at Stanford University they've mapped, using Chronicling and the metadata, the spread
of the newspapers across the United States. And all those things are really cool. What
I'm trying to think about, what do we do now, what else might we do with it. Voting
America has every vote in the United States from 1840-2008 and many of those were from
newspapers originally. We're doing a new digital edition of the Paullin Historical
Atlas, Atlas of the Historical Geography of the United States that will be out in a few
weeks from the digital scholarship lab. And we were looking on there and it said most
of this information came from newspapers or the Library of Congress. So things that we
now just think of as being there were not just always there. They were recorded by newspapers
and then transcribed into the iconic maps of American history. Some things too that
you might not think of as newspapers turn out to be with the sesquicentennial of the
Civil War coming, I thought it would be great if people actually had a chance to really
look in the records themselves to see why people said they seceded. So in Virginia delegates
came from, 152 delegates came from all the counties of Virginia to Richmond and debated
how they were going to save the United States. Most of them were Unionists; they came there.
We call it the Secession Convention now, but they didn't know it was the Secession Convention.
And they talked, and then they talked and talked. And then talked some more. And it
was all written down by newspaper reporters and recorded in the newspapers of Richmond.
And then in 1965, all that was gathered into a four-volume 3,000 page book, books that
were published by the Library of Virginia. So when the sesquicentennial came along, I
decided wouldn't it be great if we would actually make that available, so that people
could see what's in those 3,000 pages. And so it just read 3,000 pages and found every
time that slave, with an asterisk, so slavery, slaveholders, is mentioned, and everybody knows
that the Civil War was fought over states' rights which is why slavery is only mentioned
1,432 times in all of that. But this is newspapers that then became something else and now that
we're trying to find ways that you might look inside them to find the patterns. So
I did it as a map, you can do it as a time plot, it can show you the frequency with which
they mention the word, anything that has to do with slavery. So those are newspapers as
well so sometimes newspapers are not immediately visible but they underpin most of what we
know about the American past and certainly the 19th century American past. And I am grateful
for that. So what we're doing now is trying to think about making a kind of scholarship
that would be worthy of the work that you folks are doing. And one thing that we're
doing is, actually I think I'll show something else instead. I think the most sophisticated
work so far is done by my colleague Rob Nelson at the University and he took Mining the Dispatch,
newspapers, Richmond Dispatch that was tagged with an IMLS grant to XML and is doing topic
modeling on it. He's now, he'll be coming out with a new article that is comparing it
to the New York Times. Now I love this, we got that by buying the CD-ROM that they sell
for 50 bucks. And it has ASCII text from all the New York Times articles from the Civil
War era. And OK, but there it was in clean, and now and then doing the comparison. And
you all know what topic modeling is; which is basically the computer reads the newspapers
and finds the patterns within it without knowing what the words mean. And so if you look at
fugitive slave ads, the computer says well I notice all these words of ***, years,
reward, boy, man, name, jail, delivery, give, left, delivered and apparently those are runaway
slave ads. And then Rob is a, you can adjust the chart for various degrees of the threshold
by which it would recognize these. And then has the ads themselves and this is the critical
thing; how do we look at the big patterns and not lose the thing that makes the work
that you do so important. The newspapers are windows into the soul of America. They are,
every page is interesting. Even the ads, really. You can look at those, and I'm talking to
the converted I know here, but it's wonderful to assign this or to go to a general audience
and show them these things and just to watch their eyes open as they realize what these
newspapers are. Now Rob is comparing these and seeing how the northern and southern vocabularies
of war changed, how they ebbed and flowed with politics or with battles and it's remarkable
but the crucial thing is you're always able to go back to the original record. And so
the challenge before us right now is how do we combine the power of the analogue and the
power of the digital. And you folks are doing that on a scale that no one else is. That
you are using the remarkable ingenuity and skill that you have to make it possible to
not just sit in front of a microfilm and hope that you come across a record of a crime,
which is what I did for two years in my dissertation. You feel a little ghoulish. Yes! No, I mean
I feel bad for you but all right some evidence you know? To do all that. And then do this
for everything else and what would it have meant what is it meaning right now for dissertations
that are being written, for books that are being written that Chronicling America exists.
Now I'll have to say that I recently published an essay called, not that one but the one
that was on there when I left it, which is called "Does Digital Scholarship Have a
Future?" and I'm sure it's going to roll around. This is, let's see if we can
find it. There we are. And I started just to pretend I hadn't written it and just
say some things to you I said well no maybe I'll just go ahead and show you but here's
what I said. Basically it's a jeremiad. "Though the recent popularity of the phrase
digital scholarship reflects impressive interdisciplinary ambition and coherence, two crucial elements
remain in short supply in the emerging field. First the number of scholars willing to commit
themselves and their careers to digital scholarship has not kept pace with institutional opportunities."
By which I mean you all. You've given us incredible tools. And I keep saying, "Hey
everybody. Let's go write something that really takes advantage of that full capacity."
And instead you'll read a book or a journal article. It's got a newspaper in it, if
you figure the odds of them actually just stumbling over that would have been fairly
small and you know they found it through search and Chronicling, and it doesn't say that.
It just says this newspaper. Right? And so in some ways the contribution, I see people
smiling, the contributions that you're making some ways are hidden because here's the
thing; well let's read more and see what the thing is. "Second, today few scholars
are trying as they did earlier in the web's history, by which I mean way back in the early
90's, to re-imagine the form as well as the substance of scholarship. In some ways,
scholarly innovation has become domesticated with the very ubiquity of the web bringing
a lowered sense of excitement, possibility and urgency. These two deficiencies form a
reinforcing cycle. The diminished sense of possibility weakens the incentive for scholars
to take risks and the unwillingness to take risks limits the impact of the excitement
generated by boldly innovative projects." So what I'm saying to my colleagues is wow
look at this. Look at all these newspapers we have for American history. Surely we can
think of something cool and new to do with it. And people who for a long, I remember
when I first started talking to people about this World Wide Web thing and one of my colleagues,
slightly older said, "Isn't this a bit like the hula-hoop?" I said, "No, this
is a bit like television; it's going to change everything here pretty soon." And
now we don't even think about using technology when we use Google Earth or when we use Skype.
So what seemed ten years ago, if you're a faculty member trying to get people to adopt
technology they just take it for granted. But we skipped the stage where we had
the idea that maybe we could do something ourselves with it instead of just shipping
PDFs around, of scholarship that we could've written 30 years ago. Maybe we can do new
kinds of scholarship if we would think about the capacity that's inherent in things like
Chronicling America. So what might that look like? Well we've tried to do one project
at the DSL in Richmond that has some of the elements of what we call generative scholarship.
And so this is Visualizing Emancipation. It's animation I won't play because experience
shows that your eyes will follow any moving object over here rather than listening to
me. And this, the blue dots are everywhere the United States Army was, across the era
of the entire Civil War. The red dots are the places where African Americans interacted
with that army. A large part of this is from the original records of the War of the Rebellion,
128 volumes that have been digitized, geo-indexed, and animated. But a lot of the other
sources and source types, one of the things that you can have are newspapers. And so you
can see here, these are evidence that we went with the newspapers, many of them from Chronicling,
to be able to look for runaway ads. And what this shows is the reaction of enslaved people
to an unexpected moment of freedom. You hear the Union Army is somewhere nearby; maybe
you can make your way to it. Tragically one of the things that you see is that the outcome
often was not what they had hoped. Often it was abuse. Often it was conscription. Often
it was being dragged back into slavery if the Confederate Army caught you and so forth.
This wouldn't have been possible five years ago. And what else is not possible my colleague,
Scott Nesbit, who actually made this, is writing what would scholarship look like to describe
this. We're used to taking quotes and weaving them together into a story but what if you
start with a moving image that shows enormous complexity over an area the size of continental
Europe that shows the behaviors of millions of people. What would scholarship look like
if you did that? Will it be a journal article? Will it be blog entries? And you'll see
up at the top, add an event, which means that we'll be able to add the capacity of crowd
sourcing to this. It also says data download. And following the wonderful example of Chronicling,
be able to share all the information behind this. So we're trying to make a scholarship
worthy of the gift that you've given us. We're trying to think about how would we
rise to the possibility of plentitude, of the multitude of things that you've given
us. I'm sometimes chagrined by looking in the newspaper and people seem to think
that history moves forward when you dig up a body to see if it actually has arsenic in
it, right. That's how we make new discoveries in history. Or somebody finds a box under
their grandmother's bed after she passes away with some old letters in it. But you
folks are giving us billions of words, of boxes that we have yet to open. But there
are patterns within it that we've only begun to glimpse. So I would like to let you know
that there are people out here who are so grateful for all that you're doing. We're
using what you're giving us in the way that you're giving it to us now. How much faster
could I have written books if I could actually look for what I was looking for rather
than just wandering around until I kicked it over, and there it is. At the same time
that I'm not giving up the serendipity, the surprise that comes from reading the whole
page, of seeing the simultaneity. So things that would have seemed like science fiction
when I was first starting in all this, you've created. And now I want you to help us have
the courage to write the kind of scholarship that can do justice to it. I think that we
need to be thinking about what Chronicling America makes visible to us. The distribution
and spread and decline of different kinds of newspapers but also how the fact that what
can they tell us and what can they not? They feel like the voice of the past, understanding
the political affiliation and all that, race is obvious and gender, kind of taken for granted
sometimes. So the point of being is that I hope that we'll have occasion going forward
for you all to say, you know I spent a lot of time with these. Here's what I think
a great study of it could look like. Here's what I think that a form of scholarship that's
actually native to the web. Now not a book, doesn't mean anything against books but
what it does mean is that we're living in the middle of the most profound social transformation
of our time. You're a part of it. Scholarship is kind of standing on the edge; you watch
entertainment, you watch video, you watch music, you watch books be transformed but
scholarship hasn't really figured out a way yet to use the power not just to disseminate
what we're writing otherwise, but to re-imagine what scholarship might be. You folks have
had that imagination, you're doing the work that allows us to dream big and I just wanted
to come here today and thank you. Thank you very much.
[Applause]
Who wants to go first?
I'm ready. I'm Errol Somay, Director of the Virginia Newspaper Project, Library of Virginia. My question
to you Dr. Ayers is, why not? Why isn't there more scholarship or energy behind this?
Is it just because it's just so overwhelming?
If you'd read my essay, Errol, "Does Digital
Scholarship Have a Future?" There's many reasons; the risk-reward balance doesn't
look right. It's pretty clear what you need to do to get tenure and this isn't it. Right?
And so I'm trying to build a bridge from both sides. You know I was dean for a long
time and so sitting, judging a lot of tenure files and now I read all the tenure files.
And you can see that scholarship is a very clear thing. Scholarship is a contribution
to an ongoing conversation. It's not just any thought you happen to have about a subject.
And the reason we spend so much time in graduate school bringing people up on the literature
is so you can make a meaningful contribution to that conversation. And to the extent that
you're not, it's not scholarship and you're not going to get tenure. OK? So the trick
is for us to think about how do you make an argument with digital sources and through
digital means that you couldn't make otherwise. The most obvious answer is show a lot of stuff.
I've tried that. It takes a long time and people still don't think of it as scholarship
because unless it makes an identifiable argument that can be tested and then saying, OK now
we know this, we can go on to the next thing. So I think that, and here's the irony: presidents
love this because it's something for us to talk about, deans love it, department chairs
love it. Not so much the department chairs but departments like, "Oh. That looks kind
of like teaching or service, which are both good but they're not scholarship." And
so on one hand, people keep saying we need to change the standards of scholarship and
I say yeah but you kind of got to realize this is the game we signed up for. To contribute
to ongoing conversations. There's lots of ways to do that. Does yours? On the other
hand, relax. You can contribute to an ongoing conversation through a really brilliant blog
entry as well as a journal article. So both sides need to recognize what they can bring
to it. So I think maybe that's, it's nobody's fault, it's just that there's very few
incentives to do it. And the main incentive once you've tried it and you see that people
out there are using stuff online in far greater numbers than you could have gotten otherwise
and that people can instantaneously have access to it all over the world, you start saying,
"Oh this is why I would do this." And so I think that you know that what I worry
about is, I gave a talk at the Library of Congress maybe 10 years ago and a young woman
stood at the end of it, she says, "You know that makes me want to cry." And I said why
is that? She says, "Cause we would love to do the stuff that you're telling us to
but they won't let us." And the trick is figuring out who they is and sometimes
they is we.
Hi I'm Millie Fries from Iowa. I come to this project as an educator, middle
school, high school, so first thank you for speaking a language I understand without acronyms
that, because I didn't understand that. What I am wondering is why we would ever need
another middle school, high school history textbook again if we can teach with some of
the tools like you showed us. We would have kids so engaged in history we'd never have
to sell a subject ever again or have someone ask, "Well we already know what happened."
Why can't we have some kind of template sort of things where here's the topics you
want to cover, here are the resources, the newspapers, the primary sources. Can we move
to something like that and completely away from textbooks?
That's a great question.
I taught a class at UR this spring called Touching the Past which is about all of the
different ways that the past is present in our lives. Video games and movies and had
them watch how much history was on television for a week and to compare the Bancroft Prize-winning
books with the bestselling books on Amazon, they felt sorry for us. And I asked them to
write their first essay, tell me about your experience in school up to this point with
history and they all said the same thing. I love my history teachers. I hated my textbook.
And the textbook seems like the thing that people equate history with. I meet people
on airplanes; I tell them I'm a historian. I always hated history, the names and dates.
Yeah I know, right? But why did that? It's because we've domesticated it to the textbook.
There's no reason to answer your question that if we make enough cool projects that
believe me I've worked with teachers, they would much prefer to have things that kids
can discover the truth for themselves rather than double-column, shrink-wrapped, processed,
corporatized textbooks. Of which I've written one. And this is so much better. So in all
honesty, what Visualizing Emancipation is meant for is so people in middle school or high
school can actually imagine the most profound social change ever in this nation's history
which is the end of perpetual bondage of 4 million people. But rather than reading about
first there was the 13th Amendment then the 14th Amendment, maybe you could click on each
one of those red dots and then a story of an individual person trying to make himself
or herself free. That's the great promise. I will end with this. Chronicling America
is what we should be doing; a great democratizing effort that if scholars can think of other
ways to present it, to channel it, it connects. The other thing the kids told me is that local
history is the history that really first got them interested and kept them connected to
history. We know that that's what people really care about. The work you're doing
is a way to make that bridge to the local. What I'm talking about are tools like Visualizing
Emancipation that also lets us see how that might connect with state, regional, national,
even international patterns. So if we're going to make it useful, we know people love
it, but we need to give them tools that let them show, what's the larger story here.
But I do believe Chronicling America and the great projects of you know the documentation
in the Library of Congress and all those things, they're a great gift that we're going
to figure out pretty soon how to rise to. And that's my message today; have faith
in us and give us ideas. We'd love to be your allies and partners. Thanks again.
[Applause]
I wanted to talk about this project that I
hope is making good on the incredible amount of work that has gone into assembling historic
newspapers in Chronicling America and that, I hope, is not imperiling my tenure case.
But we shall see. This is a project that is investigating the antebellum culture of reprinting.
So just really briefly, I want to talk about what I mean by that. As many of you will know
if you've worked with 19th-century newspapers, they were kind of an all-purpose medium. They
didn't only contain what we think of as news. They also included poetry, they included
fiction. They included travel narratives and for a lot of readers, they were the primary
vehicle whereby they got all kinds of content. And it's not very great, but you can see
here a poem by Whittier and a short story, a temperance story, and an advertisement and
down on the page is some news. As part of that configuration, this is also before the
rise of most modern copyright law. And so texts would freely circulate within the system
of 19th-century newspapers. If I was an editor in St. Louis, then I would subscribe to newspapers
in New York and Boston and Philadelphia, and when they came in, I would browse through
them and I would, if there was anything I thought my readers might like or if there
was anything that filled a certain number of column inches that I needed filled, then
I would simply take it and I would reprint it. Sometimes I would change aspects of it;
sometimes I would remove the author's name, put one of my own author's names on the
piece. Sometimes I would change it to suit my readership, I might change it to suit my
political bent or something of that nature. And here you can see actually, this is a poem
by the Scottish poet, Charles Mackay, the title of it changes as it moves around the
country. The first line of it changes, and as we're going to talk about a little later,
the new version not Mackay's version is actually the one that becomes the most popular
version and becomes turned into a song, that was apparently beloved by Abraham Lincoln,
although that I think is more anecdotal than maybe true. So some scholars have written
about this culture of reprinting, most famously and perhaps most powerfully, Meredith McGill,
who writes about this culture of reprinting that really formed the basis for not only
newspapers but magazines and even books which were reprinted as well. But in this book,
American Literature and the Culture of Reprinting, she actually talks about how hard it is to
get at the reprinted text. Basically she says 19th-century newspapers aren't all that
well indexed and so the only way to get at them is through bibliographies that have been
compiled by other scholars. Or you can do what Ed talked about and you can read all
the newspapers. You can sit down and you can read them and sort of index them yourself
as you go through. Now when you digitize the newspapers, things get a little bit better
because you can search them. Right? So if you know that a text was popular, you can
go in here and you can search for it and you can find more instances of it and this is
actually what got me started down this whole crazy path. I found this Nathaniel Hawthorne
story that was reprinted and I went to some digital archives and I started searching and
in about three days of searching, I uncovered three times as many copies of that story as
the best bibliography of Nathaniel Hawthorne listed and I knew I was on to something. The
problem is that with just a basic search interface, you can only find the things that you already
know are there because you have to have keywords that you can use to find those copies. So
getting at the texts that were really popular during the 19th century that we've lost
is impossible through basic search and so this is where I began to talk with my new
colleague, David Smith, to see if we could solve that problem using the data. So David's
going to talk a little bit about some of the work he's done.
So our initial approach to this problem needs to solve a couple of technical issues
due to the fact that you've been digitizing
so much data. We made it a little easy on ourselves by starting out with the period
covered by Meredith McGill and the work that Ryan had done before, just the 1860 and before
period. There's nothing in principle that limits us to that but it allows us a relatively
small playground in which to get started, you know a mere 41,000 issues of 132 different
newspapers. So there are a couple of features of this data which some of you here are more
familiar with than anyone else. One thing, there are no breaks between the articles so
if the point is to find reprinted stories, poems and so forth, those stories and poems
run together into the rest of the content of the newspaper. So at a high level, what
are we going to do? What we'd like to do is say hey, there's this passage, this article
in the Journal Extra that matches up with this article in the Jeffersonian and that
also matches up with that article in the Cleveland Morning Leader and we actually close the loop
and you know ideally we want this cluster to say this text was reprinted three times
or 100 times, whatever on this slide. But even to start this problem, we need to solve
this computational problem of finding these pairs of issues inside 41,000 newspapers,
just in the antebellum period, which means that in theory, a brute force approach would
require 874 million pair-wise comparisons between issues of newspapers. And not only
that, because we don't know where the boundaries of the articles are, we need to search every
cell of this grid, which indicates the beginning point of where two newspapers might start
to match up and the ending point for where two newspapers might stop matching up. And
then by hypothesis, there's this other stuff, there's these other things in the corners,
which you know, ads that aren't the same, you know other articles that aren't the
same in the two newspapers. And again the final problem is one that will be very familiar
to, or the final couple problems will be familiar to people in this room, there are species
of reprinting that are not interesting from the point of view of viral text, they're
just reprinting that happens in the normal course of doing business as a newspaper. Like
having a masthead in the upper left and right there, or having you know the National Republican
newspaper's manifesto that they reprint every week or advertisements, you know there's
a standing ad for a certain oculist that appears every week in the Vermont newspaper or in
every newspaper in Nashville or something like that. That doesn't mean that we don't
want reprints within the same newspaper. It's interesting to know, right, that you know
a paper printed The Raven in 1848 and again in 1852 we just don't want this kind of
boilerplate reprinting. The final problem we want to solve is again of great interest
to everyone here, which is the state of the optical character recognition. This is why
you show the page images in a lot of the interface. But you know, we're working with
what we've got and I'll have a few more remarks about that later. So as a computer
scientist, speaking again about making one's tenure case, how do I explain working on this
project from the point of view of my field, what's interesting about this from the computer
science point of view. First of all, a lot of the work on finding duplicate text has
been in a very different setup, mostly on the web. For instance, where you want to remove
duplicate webpages from your search engine index so you save space, or so you don't
have the same results showing up again in search results and also so that you can detect
plagiarism by students in programming assignments or writing assignments. So again that's
sort of you know most of the document is being copied which is very different from what's
here. On the other end, people have worked at much smaller bits of texts being reprinted,
sometimes called meme tracking by Jure Leskovec and others, where you know they're talking
about very short, quotable, viral phrases of three or five or 10 words and again, that's
not what we're dealing with. We need to search for text that might be a few hundred or
a few thousand words but still much smaller than the whole document. We don't know the boundaries,
unlike some other people, and we have this problem of wanting to ignore very close duplicates
in the same newspaper, as well as the noisy OCR. So at a high level as I sketched out,
we want to first approximately detect these pairs of newspapers that might contain reprinted
texts, approximately find these high-confidence regions, the passages that are actually being
reprinted and then link these passages together into clusters of viral networks. How are we
going to do that? So we adopt a strategy that at its basic level is familiar from anybody
who has built an inverted index. But rather than building an inverted index of terms,
we'll do it with collections of terms. Here I'll show an example of building an
inverted index of five-grams. But we also build indexes of longer n-grams as well
as ones that aren't contiguous sequences of five words. So how do we build that index?
Well we run through each document, here are three example documents 1, 2 and 3, and get
the first five words. So those five words appear in document 1. Those five words appear
in document 1, the second sequence of five words and so on. And with cleaner text, we
can afford to use longer n-grams, which filters out some of those spurious matches
earlier; we can also use gapped n-grams. So what we'd like to see is that if a text is reprinted
widely as this article announcing the completion of the first trans-Atlantic cable was, we
would expect to see a few n-grams appear in multiple texts in that actual cluster.
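
The indexing step described here can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' actual code: the function names, the toy sentences, and the simulated OCR errors ("qneen", "snccessful") are all assumptions made for the example. It indexes contiguous word five-grams only (no gapped n-grams) and keeps an n-gram only if it occurs in at least two different documents.

```python
from collections import defaultdict

def word_ngrams(text, n=5):
    """Yield (position, n-gram) for every contiguous run of n words."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield i, tuple(words[i:i + n])

def build_index(docs, n=5):
    """Map each n-gram to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, gram in word_ngrams(text, n):
            index[gram].append((doc_id, pos))
    # Drop n-grams that never repeat across documents (an assumed filter):
    # they cannot be evidence of reprinting.
    return {g: hits for g, hits in index.items()
            if len({doc_id for doc_id, _ in hits}) >= 2}

# Toy documents invented for illustration; document 2 carries fake OCR errors.
docs = {
    1: "the queen desires to congratulate the president upon the successful completion",
    2: "the qneen desires to congratulate the president upon the snccessful completion",
    3: "notice the queen desires to congratulate the president today",
}
index = build_index(docs)
for gram, hits in index.items():
    print(" ".join(gram), "->", sorted({doc_id for doc_id, _ in hits}))
```

In this toy data the OCR-damaged document still shares a few five-grams with the clean one, because only the n-grams that overlap a damaged word are lost.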
That n-gram appears in the first and the third documents but not in the second due
to an OCR error. That n-gram appears in the first and the second, sorry, first and the
third. That n-gram appears in the first and the second there. And the one thing to
note here however is that we can ignore a lot of the index terms, this is, which we
couldn't do if we were just searching for an individual document. We can ignore all
of the terms that only appear once, which by Zipf's Law is going to cut our index
in half. By definition, we're only looking for things that are repeated. So we'll take
that index and instead of organizing it by the order that these n-grams appear in the
text, we will organize it by document pair, which allows us to see which document
pairs share a lot of text or only a little
bit of text. By document here I mean, I'm sorry, an entire newspaper. So newspapers
1 and 2 only share that n-gram in this example. Newspapers 1 and 3 share a couple of them.
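
Regrouping the index by document pair might look like the sketch below. It is an illustration under stated assumptions: the index here maps hypothetical n-gram labels to sets of issue ids, `series_of` and the `max_docs` cutoff are invented names, and the two skip rules stand in for the kind of pruning this part of the talk describes (issues of the same newspaper series, and n-grams that are simply very common).

```python
from collections import Counter
from itertools import combinations

def count_shared(index, series_of, max_docs=100):
    """Count, for each pair of issues, how many n-grams they share.

    index:     n-gram -> set of issue ids containing it
    series_of: issue id -> newspaper series (hypothetical lookup)
    max_docs:  n-grams found in more issues than this are treated as
               common fixed phrases and skipped (assumed threshold)
    """
    pair_counts = Counter()
    for gram, issues in index.items():
        if len(issues) > max_docs:
            continue  # too common to be evidence of reprinting
        for a, b in combinations(sorted(issues), 2):
            if series_of[a] == series_of[b]:
                continue  # same series: mastheads, standing ads, boilerplate
            pair_counts[(a, b)] += 1
    return pair_counts

# Toy data: issues 3 and 4 belong to the same series.
index = {
    "g1": {1, 2},
    "g2": {1, 3},
    "g3": {1, 3},
    "g4": {3, 4},
    "g5": {1, 2, 3, 4},
}
series_of = {1: "A", 2: "B", 3: "C", 4: "C"}
counts = count_shared(index, series_of, max_docs=3)
candidates = [pair for pair, c in counts.items() if c >= 2]
print(counts, candidates)
```

Only the pairs that survive this step need the more expensive alignment, which is how the candidate set shrinks far below the brute-force number of comparisons.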
We're going to prune out newspapers from the same series to cut out this boilerplate.
We're going to cut out n-grams that are just very common, fixed phrases in the language,
things that are very common. So after this process, we've got ourselves down to only
comparing 15 million pairs of newspapers in this corpus or less than 1% of the total number
of pairs that we would have to compute if we'd done this brute force. So now we'll
talk about finding the reprinted passages inside, and we'll use an algorithm that
is related to one that you might have seen, so-called edit distance, or Levenshtein
distance. It's often used for instance in finding cognates between two languages. For
instance couleur to color, where French inserts a U vis-à-vis the American
spelling, E goes to O and so forth. And the nice thing about dynamic programming
alignment algorithms like this is that they allow you to search this exponential number
of paths or possible ways that two newspapers could line up in only quadratic time due to
the work of Edsger Dijkstra and Vladimir Levenshtein. Anyway, so but we don't want the alignment
of the whole newspaper issue, we only want the part in between. To speed
that up, we'll anchor this alignment at the points of the n-grams that we already
found and then we have these pair-wise alignments. To cut a long story short, from this point
on we use single-link clustering and find the connected components within that graph.
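
The last two steps, local alignment and single-link clustering, can be sketched as follows. This is a hedged illustration rather than the project's implementation: it uses plain Smith-Waterman local alignment over characters with assumed scoring weights, fills the whole quadratic table instead of anchoring at shared n-grams, and clusters with a small union-find. The function names, the toy strings, and "Paper X" and "Paper Y" are hypothetical; the three newspaper titles are the ones named earlier in the talk.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two sequences.

    Returns (score, (a_start, a_end), (b_start, b_end)) for the
    best-scoring pair of regions, in O(len(a) * len(b)) time.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # start[i][j] remembers where the alignment ending at (i, j) began.
    start = [[(i, j) for j in range(cols)] for i in range(rows)]
    best = (0, (0, 0), (0, 0))
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            s = max(0, diag, up, left)
            score[i][j] = s
            if s == 0:
                start[i][j] = (i, j)      # a fresh alignment may begin here
            elif s == diag:
                start[i][j] = start[i - 1][j - 1]
            elif s == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if s > best[0]:
                si, sj = start[i][j]
                best = (s, (si, i), (sj, j))
    return best

def connected_components(links):
    """Single-link clustering: items joined by any chain of links share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())

# Toy strings: the shared passage sits between unrelated text, with one OCR error.
a = "xxxx the queen desires yyyy"
b = "zz the qneen desires ww"
score, (a0, a1), (b0, b1) = local_align(a, b)
print(score, repr(a[a0:a1]), repr(b[b0:b1]))

links = [("Journal Extra", "Jeffersonian"),
         ("Jeffersonian", "Cleveland Morning Leader"),
         ("Paper X", "Paper Y")]
clusters = connected_components(links)
print(clusters)
```

Because clustering is single-link, the Journal Extra and the Cleveland Morning Leader end up in one cluster even though they were never aligned to each other directly.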
So what did we find, what does it look like? Well here's an example of James Buchanan's
farewell address where he's sort of concerned talking about how horrible it is that slavery
is tearing the country apart, and you know obviously this is widely reprinted in the
corpus and even though there are a lot of differences in the OCR transcription, we can
find four out of a cluster of 30 different examples of this in the collection. Or other
temperance stories by T. S. Arthur. So that's the stage at which we turn it back over to
Ryan and some of our other colleagues for analysis. Ok. You still with us? Excellent.
So what does this mean for me? Well what this means for me is an incredible corpus to work
with and at the moment we're working with about 392,000 texts. Clusters, we have several
thousand clusters of widely reprinted texts from the 19th century. The exciting thing
for me as a scholar of the period is that the vast majority of these are just not texts
that literary scholars have ever really paid any attention to either because they're
obscure, because they're anonymously written, because they don't fit the genre categories
that literary scholars tend to work with. So what are the things that we're finding?
These examples are drawn from our top 20 of the most widely reprinted things in the corpus
that we've generated. Unsurprisingly a fair number of political speeches, but what's
interesting is that the political speeches get contextualized in widely different ways.
They go viral but the different newspapers that print them are using them for very different
purposes to support different causes. This is Washington's farewell speech that goes
viral actually on two separate occasions in the years leading up to the Civil War and
for different reasons. I'm happy to dig into more of these in the Q&A if you want
cause I'm going to kind of fly through them. The other thing that we get is a lot of news,
unsurprisingly. David mentioned the message that Queen Victoria sends on the completion
of the trans-Atlantic cable and actually this also goes viral twice. The reason it goes
viral twice is that the first time the telegraph operators don't actually transcribe the
entire message and the newspapers reprint it along with a lot of commentary about how
rude the queen was in her message and then they reprinted again with the entire message
saying "oops our bad" she wasn't actually rude we just didn't get the whole message.
So it actually sort of circulates around the country twice in two different versions. An
awful lot of stories, fiction, sentimental stories in particular. Lots of stories of
husbands and wives and children. We know that this is a popular genre in the 19th century
and we see it reflected in a lot of the things that go viral. What I find really interesting
about a lot of these is that in the newspaper they get framed in a very particular way.
I don't know quite what to call them actually. I've been working with a few possibilities,
anecdotes. Because often these stories are not unlike the thing that you get from your
cousin in your email, "Did you hear that story about...?" By which I mean they're
framed not quite as fiction and not quite as news. The following letter was found by
a husband, by his wife after she passed away and it's this very sentimental letter. But
of course it doesn't say what his name was or what her name was or where they lived or
anything that would allow you to track it down. And there's no snopes.com so you can't
immediately go and verify whether this story took place. Lots of travel narratives, travel
narratives are very popular. This is a lovely one about journeying, sailing through the
Paris sewers actually in canoes, which I guess would be fun. And lots of jokes unsurprisingly.
Lots of jokes in your Facebook feed, lots of jokes in 19th-century newspapers. This
one is about a young husband who decides he's going to put his foot down and assert
his authority over his wife and then is summarily disciplined for doing so. So what else can
we do with these? And what's exciting is when we put this newspaper data into conversation
with other kinds of data, we can learn some really great things about 19th-century print
culture writ large. And so one of these, the Newberry Library in Chicago provides an atlas
of historical county boundaries, so what did the country look like at various points of
time? And when I first started experimenting with this data in our GIS, I brought Chronicling
America data about the founding of newspapers and I overlaid that with the historical county
boundary data from the Newberry Library to get this map that shows the spread of newspapers
and the growth of political boundaries in the country from the beginning all the way
to the year 2000 which is when the data ends. Just as a broad visualization, it's fascinating.
I love the way that the newspapers sort of strike out for the territories. They appear
and then the political boundaries follow closely from them. I need to change the color, it
looks a little bit too much like an AT&T commercial at the end but I like this visualization.
We can also bring in historical census data and we can do things like try and get snapshots
of the potential readership for particular stories. We know where they were printed.
We have the historical county boundaries. If we bring the census data in, then we can
maybe learn something about who might have been reading different stories and whether
those audiences were different. So here are just a few John Greenleaf Whittier poems,
again these are all from our top 20 most viral texts. Another poem, that Charles Mackay poem.
This affectionate spirit which is really a little bit of parental advice that tells dads
that it's okay to show affection towards your children. You will not ruin them by giving
them a hug basically. And then this really self-indulgent article about how much smarter
kids will be if they read a newspaper, which was widely printed in the newspapers. At the
broadest level, we can get a sense of how many reprintings we've discovered in the
data thus far. I mean we're only working with what's in Chronicling America right
now and the approximate population within say five miles, or we could do 10 miles or
two miles of where those were reprinted. But the census data is actually much richer than
this so you can actually dig in and see how many literate people lived within five miles
of where this was printed. If you have an abolitionist piece, what was the slave population
like near where this was printed? One interesting geographic visualization just based on the
data thus far, this is the speech that the abolitionist John Brown gives at his sentencing
hearing. It goes viral, it's widely reprinted. But alone among everything in our top 20 most
reprinted texts, the John Brown speech is not printed in Kansas and it is not printed
in Nebraska which were the two places that were fighting at the moment about slavery.
It actually is printed a little bit in the South. One might sort of assume well it probably
wasn't printed in the south. It was in the South but not in the Midwest at least in the
papers that we're looking at now and I have to add that caveat whenever we talk about
this stuff. There's also lovely collections of historical maps and you can bring these
into conversation with your data. This is from the David Rumsey collection, an 1843
traveler's map of the United States. This was a map that included railroad lines, post
roads, things of that nature for people trying to get around the country. And this was actually
what got me thinking about geography in the first place. I geo-rectified this map which
means you bring it into alignment with modern coordinate systems and I overlaid some of
the print histories that I had been working with on it, and I immediately noticed this
close correlation between print histories and the railroad networks. And this is perhaps
not shocking, population, rail, print, they're all going to follow the same paths but I had
not really been thinking about transportation networks before I visualized this and this
got me thinking about it. And so then I said well is there good data about historical transportation
networks? And it turns out that the University of Nebraska has a brilliant project, Railroads
in the Making of Modern America, where they actually provide GIS data for the transportation
network at various points in the history of the country. And so we can bring that data
in, again, into conversation. This is the railroad network in 1861, or sorry, that was
'55, and then '61. And we can also do some time visualizations so that you see the
spread of the rail network along with different printings of stories. And what's interesting
here is that there are some stories that seem very closely aligned with the transportation
network and there are others that seem to not be so closely aligned. This Charles Mackay
poem for instance seems to appear primarily in places that are not on the rail network
and that's maybe an interesting thing to dig into. Most of these visualizations suggest
further research. They suggest a connection and then you want to dig in and find out is
this really a connection and what's going on here. You want to learn more about it.
We don't have to watch all these. I just threw them all in there because I like them.
And I'm going to let David talk about some of the modeling that he's been doing. Right.
So what we'd like to do and are currently working on, you've seen some qualitative
visualizations that we've been doing, we'd like also to do quantitative evaluations to
make sure that we're finding a significant number of the things that were actually being
reprinted. We're currently working on some manual cluster construction. Here's some
texts that we know were reprinted, let's just manually find all of them and evaluate
that we're getting them. We'd like to build models to try and distinguish between
texts that did go viral or not or to characterize different kinds of texts that went viral or
to try and characterize perhaps some of these more usual or unusual genres. So the one thing
to note just at the high level on the quantitative evaluation is that just by looking at large
clusters that are very long, we're able to very easily get without a lot of labor,
find the very long texts that are being reprinted. Not surprising right? There are just more
opportunities for those n-grams to match and to overcome the problems with the OCR.
And as you get shorter and shorter texts reprinted, say down to 1,000 matching characters among
them, it takes more computational effort to find them. So not surprising. Another interesting
quantitative thing to note about these data is the time lag between the initial text
and the last text in a cluster or say the median text in a cluster. How long did it
take different kinds of texts to travel around the country? And if you plot the distribution
of these median time lags, you see there are two peaks here. One is around, this is on
a log scale which means that the first peak is around two or three weeks. There's certain
texts, that you know, newsy one might speculate, that travel very fast. But then there's
another peak out here at seven, at around three years. And you can also plot this across
time, as the years go on, newspapers become better at retailing faster texts. Communication
is getting better. I mean there are just more texts and more newspapers. Again, not surprising
to all of you who work on this data. So is there anything different about these fast
and slow texts? Well so if we fit a regression model to the texts of these fast and slow
clusters, you find that articles that travel fast tend to use terms in this period, not
surprisingly, like Texas, Mexico, Zachary Taylor, say the Mexican War, also things to
do with trials, corpse, cases so forth. Whereas the slow texts are airier and more relaxing,
love, young, earth, awoke, benevolence, behold, bright, woman, things. Some other work that
we're currently working on, for which we don't have any results yet, is digging into these
individual clusters trying to actually trace the chain of transmission using good old fashioned
textual-critical tools or cladistics and stemmatology. How can we, can we account for some of these
missing bridge texts? Can we use statistical inference to distinguish OCR errors from editorial
changes that might indicate that two texts were jointly influenced? And finally can we
think about modeling the network? Can we actually get a quantitative correlation between these
clusters of reprinting and the railroad network or the network of papers that shared a political
view or a religious view or a social view like the editors are brothers-in-law or something
like that? So I'll close just with a couple of remarks about moving beyond Chronicling
America. The questions that this corpus has allowed us to ask are applicable to lots of
other areas. For instance, source criticism of the immense literature that comes out of
the Civil War or any area in history. Things like Grant's memoirs are going to get reused
in different ways by different historians. Or another project that I'm working on now
with a political scientist in tracking policy ideas in bills. So the sad fact is, if you
know Congress, that most bills fail. And if you're in the minority, say a Democrat in
2005, the bill that you introduce is probably going to fail. And most people just look at
the bills that pass. The question is, are there ideas in those bills that fail that
show up again in bills that do pass. Perhaps, a little bit later or in the same session.
And you know going back in an even higher level of granularity, can we use these networks
of texts to do better search? The short answer is yes, you want to retrieve clusters not
just individual passages in texts. So I'll close with a newspaper that's not in Chronicling
America because it's from the wrong country. It's the Economist from 1871, which maybe
gives a name to what we're doing. Some of the philosophers should turn from the invention
of electrometers, galvanometers, hygrometers and so forth to the far more difficult problem
of inventing a mode of measuring the intensity and diffusion of political wishes and convictions.
So how do they diffuse? So I know we're running out of time so I'm going to do this
really quickly but the final kind of modeling that I've tried to do is network modeling.
Ok. Now? The final kind of modeling that I've been working with our data is network modeling.
Reprinted texts are actually a pretty direct influence or can be a pretty direct indication
of influence. So we have all of this data about texts that were shared between different
publications and if we take that, we can use it to model the networks of influence during
the antebellum period. So here what you're looking at, I'm going to zoom in in a second,
but the circles, the nodes in this network, are individual newspapers. And the lines between
them are shared reprints. If two newspapers share one reprint, there's a very thin line
between them. If they share hundreds of reprints with one another, then there's a very thick
line between them. The colors indicate communities. These are groups of newspapers that shared
a lot of the same texts. And so the network software is figuring out that
these are possibly communities. And what's very interesting thus far about the experiments
I've done with the network visualizations is that they really are indicating these fascinating
connections between newspapers that would be very hard to get at if you were just reading
the newspapers. There are communities that emerge that are not geographic, that span
wide geographies. David alluded to one. There was this incredibly clear connection that
came out in the network visualization between a newspaper in Vermont and a newspaper in
Missouri, which is quite a span in the 1840s but this incredibly strong connection between
them so we asked one of the graduate students working on the project to dig into this. What's
going on here? And she discovers that the editors were brothers-in-law. And that
they were probably just sharing a lot of newspapers and copying from each other frequently. And
so the network graphs have been very suggestive. They've also been overturning maybe some
of the presuppositions we tend to make. Because you can't read all the newspapers, scholars
often read certain newspapers. Newspapers in New York and Boston and Philadelphia get
an inordinate amount of attention frankly, and in our network visualizations, we're
finding that there are newspapers in Nashville that are incredibly central to the reprinting
during the period. Kind of brokers of information, the Nashville Union American came up this
morning. The Nashville Union American is incredibly important in our data set at least, as a kind
of broker of reprinted text. So what are our next steps? Our next step is that we want
more data. It's very incomplete at this moment. You know what's in Chronicling America
right? Every time there's a new batch, we perform the analysis again and we're finding
new connections. We're finding new reprinted texts. We've also started conversations,
this is perhaps something not to say in this gathering, but we've started conversations
with some of the commercial archives of 19th-century periodicals, to try and get access to their
data. They are not so forthcoming as Chronicling America, which will be a shock to you. And
we've started to annotate the data and I wanted to point out the incredible work especially
of Abby Mullen here, but also of Matthew Williamson. These are two history grad students who are
working on this project with us. Abby has compiled an incredible amount of data about
these newspapers. I hope that this eventually has a home on Chronicling America, to be honest
with you. Editorial tenure, what happened to various editors? Things like, wives who
took over newspapers when their husbands' names were still on the masthead after their
husbands died or something of this nature. And she is just building this affiliation,
political affiliations of all of these different newspapers which often shift midstream from
one party to another. It's a thorny problem as we're learning. And so we're annotating
the data, we're building a web interface for the project, viraltexts.org. At the moment
it's just a placeholder website but within a few months, at least some of our preliminary
findings will be available there, some of the data will be browsable there and searchable
there. And we just want to thank the NEH who gave us a grant to do this project and also
the NULab which is our intellectual home at Northeastern and thank you.
I've been at the Library for over 20 years.
I started as a work-study student and I continually moved my way up. But that's enough about
me, we're here to talk about genealogy and how you do genealogy at the Library of Congress.
And later I'll give some examples of how I've used Chronicling America to locate
things relevant to my own family. So I have 15 minutes and I know I'm right before break,
so I'm going to get started. OK. Like I said my name is Ahmed Johnson, I'm a reference
librarian at the Library of Congress in the Local History and Genealogy Reading Room.
Just a little background and information about the Library of Congress; the Library of Congress
was established as a legislative library in 1800. Of course the British came around and
burned the capitol in 1814. So what did we do? We purchased Thomas Jefferson's personal
library. And that contained over 6,000 volumes. True renaissance man. Had books about everything,
right, and I think actually there's an exhibit at the Library of Congress right now, where
you can see those various types and it may be available online as well. What do we have?
We have three buildings; the Adams Building, the Jefferson Building and the Madison Building.
We have 21 reading rooms and seven overseas offices. OK. What do we have at the Library
of Congress? We have every book ever published, right? No. Impossible. I get that all the
time, "You have everything ever published?" No, we don't, that would be impossible.
But what do we have? We have over 151 million items in our collections. Not all books, actually
we have about 23 million books and we have over 117 million non-classified special items.
What are special items? Newspapers, of course. Manuscripts, telephone books, sheet music,
posters, photographs and so forth. So we're not just books. We add about 10,000 items
a day and supposedly if you lined all our collections up, you could travel from Washington,
DC, to Milwaukee, Wisconsin, over 525 miles. So a massive collection, right? OK, what about
reference services? These statistics were based on 2012. We welcome more than 1.7 million
on-site visitors. I talk to most of them because everyone wants to find out their family history,
right? Also we provided reference services to over 550 individuals and persons via telephone
and through written correspondence. And electronic, I'm sorry. What do we have at the Library
of Congress's local history and genealogy reading room? Well, we have over 60,000 genealogies,
and when I say "genealogies" what exactly do I mean? Family histories. Someone publishes
the information, sends in a copy to the Library of Congress or we receive a copy via copyright.
We also have over 100,000 local histories. And when I say "local histories," I don't
mean local for just the Washington, DC, area. People come into the reading room all the
time and they are confused by that. "When you say local, do you mean just local for
the Washington, DC, area?" No, it's for the entire country. So if you know what county
your relatives lived in, you can search our catalogue and see what books we have relating
to your family and where they lived. And keep in mind we're not an archive or a repository
for unpublished materials. There's exceptions to everything I say. We're not an archive
for unpublished material but we have newspapers, we have manuscripts and so forth but primarily
when you're looking for family histories and local histories, we have published materials.
So keep that in mind. OK, our staff. We have specialists in everything from African American,
yours truly. British Isles, Canadian, Hispanic, Scandinavian and so forth. Now, when I say
all these things, don't think it's a different person for each subject area. Times are hard,
people are taking on more duties. I'm probably going to be Hispanic by next week. But we
can answer questions about city directories, the origin of names, maritime history, migration
and immigration as well as biography and others. How do you do genealogy? This is really basic.
I know you're not here to get a course in genealogy but I always like to provide this
cause this is what I do. I always suggest that you begin with yourself and work backwards.
We all have two parents, four grandparents, eight great-grandparents right? So don't
start with great-grandpa or great-grandma. Start with yourself and work your way up.
You may find connections further up the line. Right? And you want to document these vital
records. What are vital records? Birth, death, marriage, sometimes divorce records. You want
to document your information with census records, which go back to 1790. Interview your oldest
living relative. Grandma, great-grandma, interesting things she has to say about her life and other
things that were going on before you were here. Then you want to look at things lying
around the basement and attic and the trunks and so forth. Now once you do that, you want
to get out into the community. County courthouses, state archives, a genealogical society, historical
society in the area where your family lived. Not where you live now, you may not find too
much unless you stayed in the same location where your relatives came or were from. After
exhausting all your sources at home, of course like I just said, you venture out to the community,
county courthouse and so forth, then you trace your family back to the 1790 census which
is the first census for the United States. Did I skip one? OK, this is our home page.
This is the best access point for information about our services and collections. As you
can see here, we have links. I think the best link on this page is the Ask a Librarian link.
That allows you to submit a question directly to one of us, our reference librarians. And
now we won't answer your question for you, but we'll lead you in the right direction,
tell you where to go to find your information. Often times, I get people say, "My great-grandfather
came from here. Give me everything you have on him." Not going to work, right? You do
your own research, we'll tell you where to find it. And it may be the Library of Congress,
it may not be. We may refer you to the National Archives and other places. We also have biographies
and bibliographies and guides and also instructions on how to search our library's catalogue, which
is available online. Why would you come to the Library of Congress? Usually people come
to the Library of Congress to use our subscription databases. How many of you are familiar with
Ancestry.com? We have it at the Library of Congress for free. Free is always good, right?
Free at the Library of Congress. We also have others. We have over 300 subscription databases
at the Library of Congress. So oftentimes, people come to the Library of Congress to
use our subscription databases. We also have HeritageQuest and many others. What can you
do from home? We have an excellent website called American Memory, which has digital
collections and as you can see you can browse by topic: African American, government, law,
immigration, American Expansion and so forth. For the purposes of this talk, I chose immigration
and as you can see for immigration we have 13 collections. All of these are keyword
searches, so you just put in a name, you can put in a location. See what you get, make
it your own. And as an example, I selected California, 1849-1900. Why were people going
to California during that time? Gold, they were looking for gold, right? You can search
this collection. Similar to Chronicling America, you can put in keyword searches, you can search
by subjects, you can search by titles. Because genealogy is not just about names, dates and
locations, right? It's about what made people do the things they did. What made them move
from one location to another? Back during these times, people had a shared existence.
It was more communal, people tended to go to the same churches, attend the same schools
and so forth. So I say all of that to tell you that you may not find your relative here,
but you may find instances of why they may have come to California during this time,
ok? So once again, I can't guarantee that you're going to find something directly
related to your relative, but you may find something very interesting about that time
period. Also my family's from the Washington, DC, area, I'm a fourth-generation native
Washingtonian, so this is really great for me. Similar to the other database, keyword
searches and this has information from the 1600s to 1925. Same thing, all keyword
searches. Let's talk about Chronicling America. I use this database daily. I can give you
hundreds of stories of where I've actually found information for researchers so I'm
delighted to be asked to come here to speak. I'm really hot up here right now, though.
I think it's the bright lights. But anyway, Chronicling America is great because, like
I just mentioned, genealogists are usually interested in names, dates, locations, vital
records, births, marriages and deaths. Newspapers have obituaries, so we're always looking
for newspapers. I think the first search I conducted, I selected, as you all know you
can select by state or you can do a particular newspaper. I selected the Shenandoah Herald,
which is in Woodstock, Virginia, and I located an obituary for a Thelma Dysart. I selected
that because one of the big shots at the Library of Congress, his last name is Dizard but it's
not spelled that way so it doesn't work. This is a story about a two-year-old who died,
really tragic. But the only reason I picked it was because the second in charge of the
Library of Congress, his name was Bob Dizard, so I just thought maybe I could find something
interesting. I think he's from Virginia. But as I mentioned earlier, my family is from
the Washington, DC, area. So I went to DC newspapers and look at what I found. The Open
Forum. My second great-grandfather's name was Hiram S. Haywood. Now, in many documents
I found information about him, him being a fireman. I found the date he got married,
I found all kinds of information. But if you look at this article here, this tells you
about his personality. The Open Forum, this is a letter to the editor where he wrote about
how they wanted more money. And he talks about the price of beef stock being 15 cents a pound
and that was up, that was because of inflation but yet they didn't get any more money.
So he's pleading his case and he titles it, "To the Men in Charge, Washington, DC."
Great stuff, right? So now I have this gem from Chronicling America about my second great-grandfather.
And this was in 1913, the Washington Herald. And I have another example, the real estate
transfers during this time period, lots of information about real estate appeared in
newspapers. Oh, you know what, let me go back one slide. Another thing I saw was that his occupation
was a fireman. Now African American, 1913 fireman. This blew my mind. I didn't know
we had African American firemen in 1913. He wasn't the fireman that you think of today.
He was the person, like you said, that would put the, light the gas lamps and do the, I
can't think of the name. Scutter the coal for heat in the buildings and so forth. So
I found out through this article exactly what he did and he actually died doing this. But
the next thing I was able to locate was a real estate transfer. Once again, Hiram S.
Haywood, Lot 102, Square 5113, 10 dollars, a stamp of 50 cents. My family still owns
that property on Sheriff Road, which is in the Deanwood section of Washington, DC, so
another great find. And I remember doing an oral history with my great-aunt and she mentioned
this amusement park that they used to visit as kids and it was called Suburban Gardens.
So what did I do? I went to Chronicling America and did the same kind of search, Washington,
DC, and I got 288 hits for Suburban Gardens. Suburban Gardens was the first black-owned
and operated amusement park in DC and there has not been an amusement park in DC since.
I got all kinds of information about this park. I only have 15 minutes so I didn't
show you everything I found but Cab Calloway performed there. They even talked about when
they bought their first rollercoaster, how it cost 30,000 dollars. And that was in 1920,
I believe it opened, 1920-21. So the reason I show these examples, and it's really hot
up here, so I'm kind of, I'm trying to deal with it as best I can, but overall I
wanted to provide you with a brief history of the Library of Congress, tell you how massive
our collections are, talk about some of our digital collections that you can use from
home and then tell you about a few things that you can do at the Library of Congress
and then provide you some examples of why we just love Chronicling America. Now in closing,
I would just like to say, as a librarian at the Library of Congress, oftentimes, we have
so much, it's so massive, right? You can get caught up in the newspapers alone, the
photographs, the maps and so forth but when you have a database like this that allows
you to search because many of the paper newspapers aren't indexed, this makes it so much faster,
as the gentleman was stating earlier. You can do so much more in such a shorter period
of time, because I hated microfilm. I hated looking at microfilm and I'm fairly young
so I can imagine my older clientele, who are usually doing genealogical research, how they
felt. So every time I can use this database, I dive right in. So thank you very much.
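
For anyone who wants to repeat the Suburban Gardens search from home, here is a minimal Python sketch that builds a search link limited to one state. It is not something shown in the talk. The base URL and the parameter names (phrasetext, state, dateFilterType, date1, date2, format) are assumptions based on the older Chronicling America search pages and may have changed, and build_search_url is a made-up helper name. The sketch only builds the link; it does not contact the site, and the number of hits you get today may differ from the figure mentioned above.

```python
from urllib.parse import urlencode

# Assumption: base URL of the older Chronicling America page search.
BASE_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(phrase, state=None, year_from=None, year_to=None):
    """Build a search link for an exact phrase.

    The parameter names are assumptions about the older search interface;
    check the site's current documentation before relying on them.
    """
    params = {"phrasetext": phrase, "format": "json"}
    if state:
        # Limit results to newspapers from a single state or district.
        params["state"] = state
    if year_from is not None and year_to is not None:
        # Limit results to a range of years.
        params["dateFilterType"] = "yearRange"
        params["date1"] = str(year_from)
        params["date2"] = str(year_to)
    # urlencode escapes spaces and punctuation so the link is valid.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("Suburban Gardens", state="District of Columbia")
print(url)
```

The same helper could be pointed at a name and a single year, for example build_search_url("Hiram S. Haywood", year_from=1913, year_to=1913), to look for the kind of letter and real estate notice described in the talk.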