Chronicling America - Historic American Newspapers Online

I want to begin by asking where you've been my whole life, because I spent a large part of my life sitting in front of World War II-era microfilm machines, turning cranks, in rapture really, while entire years of my youth passed by. I went to graduate school not knowing I was going to be a historian, and I made the mistake one day of reading a newspaper on microfilm, which it had never occurred to me was possible, or that microfilm existed outside of old James Bond movies. And it took me maybe 30 minutes to make a fundamental discovery that shaped everything that I've done for the last 35 years, which was: oh my God, this stuff actually happened, and it all happened at the same time, and people didn't know they were living in history and they didn't know how things were going to turn out. What are we going to do about that? And my mom, who had been a fifth-grade school teacher for 30 years, told me when I went to graduate school to study history, she said, "Well, what for? We already know what happened." And I read those newspapers and said, "Oh no, we don't." And so from that moment in front of the artifacts that you folks are doing so much to share with the world, my life changed. From that, a 600-page book grew, when I was looking for lynchings and discovered Coca-Cola and football games on the same pages and I thought, "OK, how would you tell that story?" And then when I discovered the big difference it made in my life, I had 160 of my students in my previous institution hit the microfilm machines and chronicle what they found there in six months of a newspaper. And they would cuss me and tell me this is *** my eyes, it's dark in there, all these kinds of things, and then at the end of it they would say that was worth any number of books I could have read, and certainly your entire semester of lectures, seeing what was in those newspapers.
So then I wrote this book in which I read every newspaper I could find from the 1890s and the American South, which meant, you remember what this was like, going to the library and persuading people: we need this newspaper from Shreveport in 1892. Other people will use it, they will really, please buy this for me. And they fell for it and I happily used the papers, and then I actually bought my own microfilm machine so I could just be a complete nerd on weekends, at nighttime, partly because I made the mistake of trying to eat M&M's while I was at a microfilm machine, trying to keep the span of attention longer, and the librarian said, "Mr. Ayers. You're setting a terrible example for the students." It's microfilm and they're M&M's, they melt in your mouth, not in your hand. Really, it's going to be OK. But of course now you go to libraries and it's, "Another latte!" But back then it was not that way. So then, you know, you heard briefly about the Valley of the Shadow. I thought it out back in 1991, and then the World Wide Web comes along and we said, "We built the thing." And, I don't get to use this very often, it was in SGML, which we thought stood for Sounds Good Maybe Later. And it turned out Maybe Later turned out to be HTML, and we were able to build sites from 1993 on, and I knew what needed to be on there. It needed to be the newspapers that had changed my life. And so I said, "Surely we can OCR these babies in 1993." You couldn't. But we could scan them all, we could digitize them all. And we did that. And then it turned out that we didn't know that something called the PDF file was going to be invented. And so we had these Group 4 fax compression TIFFs, and we had to figure out some kind of device that would show them on the computer screens at the time. And they just all choked when you called up a page of a newspaper. They couldn't handle it, but we persisted. And so then we transcribed 10,000 pages of newspapers ourselves.
Well, ourselves being not me exactly but graduate students who were paid nicely for it by NEH grants, thank you very much. That was in the Valley of the Shadow, and then XML-tagged, and it's still one of the larger groups, I think, of XML-tagged newspapers, on these two communities in the American Civil War. Then I wrote a book in which you could go to the digital version and see every newspaper or letter or diary on which it was based. Then we wrote an article for the American Historical Review that claims to be the first native digital peer-reviewed article, and the fact that it was also the last one, and it's ten years ago, is one of the reasons I'm before you today. That's a little discouraging. Here's my beef: you folks are helping us through the most profound social change of our time; surely scholars can think of more to do with the work that you're doing than we have so far. And so that's why I jumped at this chance, because I've been trying to do some of my part in all this. Where this is a piece of paper that managed, probably driving some people crazy. So I thought, wouldn't it be great if we could share this possibility of reading all these newspapers with lots of people. So this is something I first thought of at the University of Virginia and then took with me when I went to the University of Richmond in 2007, in which students in classes all across the country, as it turns out the continent, can do research, and instead of just getting a B+ on it back and then "great" and throwing it away, and all that work is for naught, wouldn't it be great if all those people who read six months of the newspaper were actually able to preserve it and share it with other people. So I'll just do a quick search for newspapers and see what we've got. And all these things are at DSL.richmond.edu. So here are 191 articles about newspapers, and you can see that the idea is that that's a monograph, but it's only one paragraph long.
It's got everything a monograph has in it. It has original sources, it has secondary context, it makes an argument, and you can go back to where it came from, and it fits all together. So I have no idea what that is, we're not going to look at that, I don't want you paying attention to that, listen to me while I'm talking; that's always the danger of showing things. You can map it geographically, and so we're still expanding all that. Now that was devised before Chronicling America really came online. And ever since, I've been trying to figure out exactly what to do with it. One of my former students, Andrew Torget, at the University of North Texas, came in, and I'm sure all of you have seen this, in which you would go through, for Texas newspapers, graphs that map the newspapers and the quality of the tagging, and show it in maps, and then show the percentage of the words that are good words. And then at Stanford University they've mapped, using the Chronicling America metadata, the spread of the newspapers across the United States. And all those things are really cool. What I'm trying to think about is, what do we do now, what else might we do with it? Voting America has every vote in the United States from 1840 to 2008, and many of those were from newspapers originally. We're doing a new digital edition of the Paullin Historical Atlas, Atlas of the Historical Geography of the United States, that will be out in a few weeks from the Digital Scholarship Lab. And we were looking on there and it said most of this information came from newspapers or the Library of Congress. So things that we now just think of as being there were not just always there. They were recorded by newspapers and then transcribed into the iconic maps of American history.
Some things, too, that you might not think of as newspapers turn out to be. With the sesquicentennial of the Civil War coming, I thought it would be great if people actually had a chance to really look in the records themselves to see why people said they seceded. So in Virginia, 152 delegates came from all the counties of Virginia to Richmond and debated how they were going to save the United States. Most of them were Unionists; they came there. We call it the Secession Convention now, but they didn't know it was the Secession Convention. And they talked, and then they talked and talked. And then talked some more. And it was all written down by newspaper reporters and recorded in the newspapers of Richmond. And then in 1965, all that was gathered into a four-volume, 3,000-page set of books that was published by the Library of Virginia. So when the sesquicentennial came along, I decided, wouldn't it be great if we would actually make that available, so that people could see what's in those 3,000 pages. And so it just read 3,000 pages and found every time that slave, with an asterisk, so slavery, slaveholders, is mentioned, and everybody knows that the Civil War was fought over states' rights, which is why slavery is only mentioned 1,432 times in all of that. But this is newspapers that then became something else, and now we're trying to find ways that you might look inside them to find the patterns. So I did it as a map; you can do it as a time plot; it can show you the frequency with which they mention the word, anything that has to do with slavery. So those are newspapers as well; sometimes newspapers are not immediately visible, but they underpin most of what we know about the American past, and certainly the 19th-century American past. And I am grateful for that. So what we're doing now is trying to think about making a kind of scholarship that would be worthy of the work that you folks are doing.
And one thing that we're doing is, actually I think I'll show something else instead. I think the most sophisticated work so far is done by my colleague Rob Nelson at the University. In Mining the Dispatch, he took a newspaper, the Richmond Dispatch, that was tagged to XML with an IMLS grant, and is doing topic modeling on it. He'll be coming out with a new article that is comparing it to the New York Times. Now I love this: we got that by buying the CD-ROM that they sell for 50 bucks. And it has ASCII text from all the New York Times articles from the Civil War era. And OK, but there it was, clean, and now he's doing the comparison. And you all know what topic modeling is: basically, the computer reads the newspapers and finds the patterns within them without knowing what the words mean. And so if you look at fugitive slave ads, the computer says, well, I notice all these words of ***, years, reward, boy, man, name, jail, delivery, give, left, delivered, and apparently those are runaway slave ads. And then you can adjust the chart for various degrees of the threshold by which it would recognize these. And then it has the ads themselves, and this is the critical thing: how do we look at the big patterns and not lose the thing that makes the work that you do so important? The newspapers are windows into the soul of America. Every page is interesting. Even the ads, really. You can look at those, and I'm talking to the converted, I know, here, but it's wonderful to assign this or to go to a general audience and show them these things and just to watch their eyes open as they realize what these newspapers are. Now Rob is comparing these and seeing how the northern and southern vocabularies of war changed, how they ebbed and flowed with politics or with battles, and it's remarkable, but the crucial thing is you're always able to go back to the original record.
And so the challenge before us right now is how do we combine the power of the analog and the power of the digital. And you folks are doing that on a scale that no one else is. You are using the remarkable ingenuity and skill that you have to make it possible to not just sit in front of a microfilm and hope that you come across a record of a crime, which is what I did for two years in my dissertation. You feel a little ghoulish. Yes! No, I mean, I feel bad for you, but all right, some evidence, you know? To do all that. And then do this for everything else. And what would it have meant, what is it meaning right now, for dissertations that are being written, for books that are being written, that Chronicling America exists? Now I'll have to say that I recently published an essay called, not that one but the one that was on there when I left it, which is called "Does Digital Scholarship Have a Future?" and I'm sure it's going to roll around. This is, let's see if we can find it. There we are. And I started just to pretend I hadn't written it and just say some things to you; I said, well no, maybe I'll just go ahead and show you, but here's what I said. Basically it's a jeremiad. "Though the recent popularity of the phrase digital scholarship reflects impressive interdisciplinary ambition and coherence, two crucial elements remain in short supply in the emerging field. First the number of scholars willing to commit themselves and their careers to digital scholarship has not kept pace with institutional opportunities." By which I mean you all. You've given us incredible tools. And I keep saying, "Hey everybody. Let's go write something that really takes advantage of that full capacity." And instead you'll read a book or a journal article. It's got a newspaper in it; if you figure the odds of them actually just stumbling over that, they would have been fairly small, and you know they found it through search and Chronicling, and it doesn't say that. It just says this newspaper.
Right? And so in some ways, I see people smiling, the contributions that you're making in some ways are hidden, because here's the thing; well, let's read more and see what the thing is. "Second, today few scholars are trying as they did earlier in the web's history, by which I mean way back in the early '90s, to re-imagine the form as well as the substance of scholarship. In some ways, scholarly innovation has become domesticated with the very ubiquity of the web bringing a lowered sense of excitement, possibility and urgency. These two deficiencies form a reinforcing cycle. The diminished sense of possibility weakens the incentive for scholars to take risks and the unwillingness to take risks limits the impact of the excitement generated by boldly innovative projects." So what I'm saying to my colleagues is, wow, look at this. Look at all these newspapers we have for American history. Surely we can think of something cool and new to do with them. And I remember when I first started talking to people about this World Wide Web thing, and one of my colleagues, slightly older, said, "Isn't this a bit like the hula-hoop?" I said, "No, this is a bit like television; it's going to change everything here pretty soon." And now we don't even think about using technology when we use Google Earth or when we use Skype. So what seemed, ten years ago, if you're a faculty member, like trying to get people to adopt technology, they now just take for granted. But we skipped the stage where we had the idea that maybe we could do something ourselves with it instead of just shipping PDFs around, of scholarship that we could've written 30 years ago. Maybe we can do new kinds of scholarship if we would think about the capacity that's inherent in things like Chronicling America. So what might that look like? Well, we've tried to do one project at the DSL in Richmond that has some of the elements of what we call generative scholarship.
And so this is Visualizing Emancipation. It's an animation I won't play, because experience shows that your eyes will follow any moving object over here rather than listening to me. And the blue dots are everywhere the United States Army was, across the entire era of the Civil War. The red dots are the places where African Americans interacted with that army. A large part of this is from the original records of the War of the Rebellion, 128 volumes that have been digitized, geo-indexed, and animated. But there are a lot of other sources and source types; one of the things that you can have is newspapers. And so you can see here, these are evidence that we went with the newspapers, many of them from Chronicling, to be able to look for runaway ads. And what this shows is the reaction of enslaved people to an unexpected moment of freedom. You hear the Union Army is somewhere nearby; maybe you can make your way to it. Tragically, one of the things that you see is that the outcome often was not what they had hoped. Often it was abuse. Often it was conscription. Often it was being dragged back into slavery if the Confederate Army caught you, and so forth. This wouldn't have been possible five years ago. And what else was not possible: my colleague Scott Nesbit, who actually made this, is writing about what scholarship would look like to describe this. We're used to taking quotes and weaving them together into a story, but what if you start with a moving image that shows enormous complexity over an area the size of continental Europe, that shows the behaviors of millions of people? What would scholarship look like if you did that? Will it be a journal article? Will it be blog entries? And you'll see up at the top, add an event, which means that we'll be able to add the capacity of crowdsourcing to this. It also says data download. And following the wonderful example of Chronicling, we want to be able to share all the information behind this.
So we're trying to make a scholarship worthy of the gift that you've given us. We're trying to think about how we would rise to the possibility of plentitude, of the multitude of things that you've given us. I'm sometimes chagrined by looking in the newspaper, and people seem to think that history moves forward when you dig up a body to see if it actually has arsenic in it, right. That's how we make new discoveries in history. Or somebody finds a box under their grandmother's bed after she passes away with some old letters in it. But you folks are giving us billions of words, boxes that we have yet to open. And there are patterns within them that we've only begun to glimpse. So I would like to let you know that there are people out here who are so grateful for all that you're doing. We're using what you're giving us in the way that you're giving it to us now. How much faster could I have written books if I could actually look for what I was looking for rather than just wandering around until I kicked over it, and there it is. At the same time, I'm not giving up the serendipity, the surprise that comes from reading the whole page, of seeing the simultaneity. So things that would have seemed like science fiction when I was first starting in all this, you've created. And now I want you to help us have the courage to write the kind of scholarship that can do justice to it. I think that we need to be thinking about what Chronicling America makes visible to us. The distribution and spread and decline of different kinds of newspapers, but also: what can they tell us and what can they not? They feel like the voice of the past; understanding the political affiliation and all that, race is obvious, and gender is kind of taken for granted sometimes. So the point is that I hope that we'll have occasion going forward for you all to say, you know, I spent a lot of time with these. Here's what I think a great study of it could look like.
Here's what I think a form of scholarship that's actually native to the web could look like. Now, not a book; it doesn't mean anything against books, but what it does mean is that we're living in the middle of the most profound social transformation of our time. You're a part of it. Scholarship is kind of standing on the edge; you watch entertainment, you watch video, you watch music, you watch books be transformed, but scholarship hasn't really figured out a way yet to use the power not just to disseminate what we're writing otherwise, but to re-imagine what scholarship might be. You folks have had that imagination, you're doing the work that allows us to dream big, and I just wanted to come here today and thank you. Thank you very much. [Applause]
Who wants to go first? I'm ready.
I'm Errol Somay, Director of the Virginia Newspaper Project, Library of Virginia. My question to you, Dr. Ayers, is, why not? Why isn't there more scholarship or energy behind this? Is it just because it's just so overwhelming?
If you'd read my essay, Errol, "Does Digital Scholarship Have a Future?" There are many reasons; the risk-reward balance doesn't look right. It's pretty clear what you need to do to get tenure, and this isn't it. Right? And so I'm trying to build a bridge from both sides. You know, I was dean for a long time, and so sitting, judging a lot of tenure files, and now I read all the tenure files. And you can see that scholarship is a very clear thing. Scholarship is a contribution to an ongoing conversation. It's not just any thought you happen to have about a subject. And the reason we spend so much time in graduate school bringing people up on the literature is so you can make a meaningful contribution to that conversation. And to the extent that you're not, it's not scholarship and you're not going to get tenure. OK? So the trick is for us to think about how you make an argument with digital sources and through digital means that you couldn't make otherwise.
The most obvious answer is show a lot of stuff. I've tried that. It takes a long time, and people still don't think of it as scholarship unless it makes an identifiable argument that can be tested, so that you can then say, OK, now we know this, we can go on to the next thing. So I think that, and here's the irony: presidents love this because it's something for us to talk about, deans love it, department chairs love it. Not so much the department chairs, but departments are like, "Oh. That looks kind of like teaching or service, which are both good, but they're not scholarship." And so on one hand, people keep saying we need to change the standards of scholarship, and I say yeah, but you kind of got to realize this is the game we signed up for. To contribute to ongoing conversations. There are lots of ways to do that. Does yours? On the other hand, relax. You can contribute to an ongoing conversation through a really brilliant blog entry as well as a journal article. So both sides need to recognize what they can bring to it. So I think maybe that's it; it's nobody's fault, it's just that there are very few incentives to do it. And the main incentive comes once you've tried it and you see that people out there are using stuff online in far greater numbers than you could have gotten otherwise, and that people can instantaneously have access to it all over the world; you start saying, "Oh, this is why I would do this." And so what I worry about is, I gave a talk at the Library of Congress maybe 10 years ago and a young woman stood at the end of it, and she says, "You know, that makes me want to cry." And I said, why is that? She says, "'Cause we would love to do the stuff that you're telling us to, but they won't let us." And the trick is figuring out who they is, and sometimes they is we.
Hi, I'm Millie Fries from Iowa. I come to this project as an educator, middle school, high school, so first, thank you for speaking a language I understand without acronyms, because I didn't understand that. What I am wondering is why we would ever need another middle school or high school history textbook again if we can teach with some of the tools like you showed us. We would have kids so engaged in history we'd never have to sell a subject ever again or have someone say, "Well, we already know what happened." Why can't we have some kind of template sort of thing, where here are the topics you want to cover, here are the resources, the newspapers, the primary sources? Can we move to something like that and completely away from textbooks?
That's a great question. I taught a class at UR this spring called Touching the Past, which is about all of the different ways that the past is present in our lives. Video games and movies, and I had them watch how much history was on television for a week, and compare the Bancroft Prize-winning books with the bestselling books on Amazon; they felt sorry for us. And I asked them to write their first essay: tell me about your experience in school up to this point with history. And they all said the same thing. I love my history teachers. I hated my textbook. And the textbook seems like the thing that people equate history with. I meet people on airplanes; I tell them I'm a historian. "I always hated history, the names and dates." Yeah, I know, right? But why is that? It's because we've domesticated it to the textbook. To answer your question, there's no reason why not, if we make enough cool projects. Believe me, I've worked with teachers; they would much prefer to have things with which kids can discover the truth for themselves rather than double-column, shrink-wrapped, processed, corporatized textbooks. Of which I've written one. And this is so much better.
So in all honesty, what Visualizing Emancipation is meant for is so that people in middle school or high school can actually imagine the most profound social change ever in this nation's history, which is the end of perpetual bondage of 4 million people. But rather than reading that first there was the 13th Amendment, then the 14th Amendment, maybe you could click on each one of those red dots and then see a story of an individual person trying to make himself or herself free. That's the great promise. I will end with this. Chronicling America is what we should be doing: a great democratizing effort, and if scholars can think of other ways to present it, to channel it, it connects. The other thing the kids told me is that local history is the history that really first got them interested and kept them connected to history. We know that that's what people really care about. The work you're doing is a way to make that bridge to the local. What I'm talking about are tools like Visualizing Emancipation that also let us see how that might connect with state, regional, national, even international patterns. So if we're going to make it useful, we know people love it, but we need to give them tools that let them show what the larger story is here. But I do believe Chronicling America and the great projects of, you know, the documentation in the Library of Congress and all those things, they're a great gift that we're going to figure out pretty soon how to rise to. And that's my message today: have faith in us and give us ideas. We'd love to be your allies and partners. Thanks again. [Applause]
I wanted to talk about this project that I hope is making good on the incredible amount of work that has gone into assembling historic newspapers in Chronicling America and that, I hope, is not imperiling my tenure case. But we shall see. This is a project that is investigating the antebellum culture of reprinting. So just really briefly, I want to talk about what I mean by that.
As many of you will know if you've worked with 19th-century newspapers, they were kind of an all-purpose media. They didn't only contain what we think of as news. They also included poetry, they included fiction. They included travel narratives and for a lot of readers, they were the primary vehicle whereby they got all kinds of content. And it's not very great, but you can see here a poem by Whittier and a short story, a temperance story, and an advertisement and down on the page is some news. As part of that configuration, this is also before the rise of most modern copyright law. And so texts would freely circulate within the system of 19th-century newspapers. If I was an editor in St. Louis, then I would subscribe to newspapers in New York and Boston and Philadelphia, and when they came in, I would browse through them and I would, if there was anything I thought my readers might like or if there was anything that filled a certain number of column inches that I needed filled, then I would simply take it and I would reprint it. Sometimes I would change aspects of it; sometimes I would remove the author's name, put one of my own author's names on the piece. Sometimes I would change it to suit my readership, I might change it to suit my political bent or something of that nature. And here you can see actually, this is a poem by the Scottish poet, Charles Mackay, the title of it changes as it moves around the country. The first line of it changes, and as we're going to talk about a little later, the new version not Mackay's version is actually the one that becomes the most popular version and becomes turned into a song, that was apparently beloved by Abraham Lincoln, although that I think is more anecdotal than maybe true. 
So some scholars have written about this culture of reprinting, most famously and perhaps most powerfully Meredith McGill, who writes about this culture of reprinting that really formed the basis for not only newspapers but magazines and even books, which were reprinted as well. But in this book, American Literature and the Culture of Reprinting, she actually talks about how hard it is to get at the reprinted text. Basically she says 19th-century newspapers aren't all that well indexed, and so the only way to get at them is through bibliographies that have been compiled by other scholars. Or you can do what Ed talked about and you can read all the newspapers. You can sit down and you can read them and sort of index them yourself as you go through. Now when you digitize the newspapers, things get a little bit better because you can search them. Right? So if you know that a text was popular, you can go in here and you can search for it and you can find more instances of it, and this is actually what got me started down this whole crazy path. I found this Nathaniel Hawthorne story that was reprinted, and I went to some digital archives and I started searching, and in about three days of searching, I uncovered three times as many copies of that story as the best bibliography of Nathaniel Hawthorne listed, and I knew I was on to something. The problem is that with just a basic search interface, you can only find the things that you already know are there, because you have to have keywords that you can use to find those copies. So getting at the texts that were really popular during the 19th century that we've lost is impossible through basic search, and so this is where I began to talk with my new colleague, David Smith, to see if we could solve that problem using the data. So David's going to talk a little bit about some of the work he's done.
So our initial approach to this problem needs to solve a couple of technical issues due to the fact that you've been digitizing so much data. We made it a little easy on ourselves by starting out with the period covered by Meredith McGill and the work that Ryan had done before, just the period of 1860 and before. There's nothing in principle that limits us to that, but it allows us a relatively small playground in which to get started, you know, a mere 41,000 issues of 132 different newspapers. So there are a couple of features of this data which some of you here are more familiar with than anyone else. One thing: there are no breaks between the articles, so if the point is to find reprinted stories, poems and so forth, those stories and poems run together into the rest of the content of the newspaper. So at a high level, what are we going to do? What we'd like to do is say, hey, there's this passage, this article in the Journal Extra, that matches up with this article in the Jeffersonian, and that also matches up with that article in the Cleveland Morning Leader, and we actually close the loop, and you know, ideally we want this cluster to say this text was reprinted three times or 100 times, whatever, on this slide. But even to start this problem, we need to solve this computational problem of finding these pairs of issues inside 41,000 newspaper issues, just in the antebellum period, which means that in theory, a brute-force approach would require 874 million pairwise comparisons between issues of newspapers. And not only that: because we don't know where the boundaries of the articles are, we need to search every cell of this grid, which indicates the beginning point of where two newspapers might start to match up and the ending point where two newspapers might stop matching up.
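The scale of the brute-force approach described above can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are hypothetical, and the word counts passed to the grid function are made-up illustrative values, not figures from the talk. The number of unordered pairs among n issues is n(n-1)/2, so the rounded figure of 41,000 issues gives about 840 million pairs; the quoted 874 million would correspond to a slightly larger collection of roughly 41,800 issues, which is an inference rather than something stated by the speaker.

```python
from math import comb


def pairwise_comparisons(n_issues: int) -> int:
    """Unordered pairs of issues a brute-force search would compare: n*(n-1)/2."""
    return comb(n_issues, 2)


def alignment_grid_cells(words_a: int, words_b: int) -> int:
    """Cells in the grid of candidate match positions for ONE pair of issues.

    Each cell pairs a word position in issue A with a word position in issue B,
    because article boundaries are unknown and a match could start anywhere.
    """
    return words_a * words_b


if __name__ == "__main__":
    # Rounded issue count from the talk.
    print(f"{pairwise_comparisons(41_000):,} issue pairs")
    # Hypothetical issue lengths, chosen only to show how fast the grid grows.
    print(f"{alignment_grid_cells(2_000, 3_000):,} grid cells for one pair")
```

The point of the sketch is that the two costs multiply: every one of the hundreds of millions of pairs would need its own grid search, which is why an approximate first pass is needed before any detailed alignment.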
And then by hypothesis, there's this other stuff, there's these other things in the corners, which you know, ads that aren't the same, you know other articles that aren't the same in the two newspapers. And again the final couple of problems will be very familiar to people in this room. There are species of reprinting that are not interesting from the point of view of viral text, they're just reprinting that happens in the normal course of doing business as a newspaper. Like having a masthead in the upper left and right there, or having you know the National Republican newspaper's manifesto that they reprint every week, or advertisements, you know there's a standing ad for a certain oculist that appears every week in the Vermont newspaper or in every newspaper in Nashville or something like that. That doesn't mean that we don't want reprints within the same newspaper. It's interesting to know, right, that you know a paper printed The Raven in 1848 and again in 1852; we just don't want this kind of boilerplate reprinting. The final problem we want to solve is again of great interest to everyone here, which is the state of the optical character recognition. This is why you show the page images in a lot of the interface. But you know, we're working with what we've got and I'll have a few more remarks about that later. So as a computer scientist, speaking again about making one's tenure case, how do I explain working on this project from the point of view of my field, what's interesting about this from the computer science point of view? First of all, a lot of the work on finding duplicate text has been in a very different setup, mostly on the web. 
For instance, where you want to remove duplicate webpages from your search engine index so you save space, or so you don't have the same results showing up again in search results, and also so that you can detect plagiarism by students in programming assignments or writing assignments. So again that's sort of, you know, most of the document is being copied, which is very different from what's here. On the other end, people have worked at much smaller bits of texts being reprinted, sometimes called meme tracking by Jure Leskovec and others, where you know they're talking about very short, quotable, viral phrases of three or five or 10 words and again, that's not what we're dealing with. We need to search for text that might be a few hundred or a few thousand words but still much smaller than the whole document. We don't know the boundaries, unlike some other people, and we have this problem of wanting to ignore very close duplicates in the same newspaper, as well as the noisy OCR. So at a high level as I sketched out, we want to first approximately detect these pairs of newspapers that might contain reprinted texts, approximately find these high-confidence regions, the passages that are actually being reprinted, and then link these passages together into clusters of viral networks. How are we going to do that? So we adopt a strategy that at its basic level is familiar to anybody who has built an inverted index. But rather than building an inverted index of terms, we'll do it with collections of terms. Here I'll show an example of building an inverted index of five-grams. But we also build indexes of longer n-grams as well as ones that aren't contiguous sequences of five words. So how do we build that index? Well we run through each document, here are three example documents 1, 2 and 3, and get the first five words. So those five words appear in document 1. Those five words appear in document 1, the second sequence of five words, and so on. 
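As a rough illustration of the indexing step just described, here is a minimal sketch in Python. It assumes whitespace tokenization and contiguous five-grams only (the longer and gapped n-grams mentioned above are left out), and the three sample documents, `ngrams`, and `build_index` are hypothetical names made up for the example, not the project's code.

```python
from collections import defaultdict


def ngrams(words, n=5):
    """Yield each contiguous run of n words as a tuple."""
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


def build_index(docs, n=5):
    """Map each n-gram to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text.lower().split(), n):
            index[gram].add(doc_id)
    return {gram: sorted(ids) for gram, ids in index.items()}


# Hypothetical sample "documents" (in the talk, a document is a whole issue).
docs = {
    1: "the cable has been laid across the atlantic ocean today",
    2: "news arrived that the cable has been laid across the sea",
    3: "a standing advertisement for an oculist appears every week here",
}

index = build_index(docs)
shared = index[("the", "cable", "has", "been", "laid")]
print(shared)
```

In this toy input, the five-gram "the cable has been laid" is listed under documents 1 and 2, while every five-gram from document 3 is listed under document 3 alone.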
And with cleaner text, we can afford to get longer n-grams, which filter out some of those spurious matches earlier, and we can use gapped n-grams. So what we'd like to see is that if a text is reprinted widely, as this article announcing the completion of the first trans-Atlantic cable was, we would expect to see a few n-grams appear in multiple texts in that actual cluster. That n-gram appears in the first and the third documents but not in the second due to an OCR error. That n-gram appears in the first and the second, sorry, first and the third. That n-gram appears in the first and the second there. And the one thing to note here however is that we can ignore a lot of the index terms, which we couldn't do if we were just searching for an individual document. We can ignore all of the terms that only appear once, which by Zipf's Law is going to cut our index in half. By definition, we're only looking for things that are repeated. So we'll take that index and instead of organizing it by the order that these n-grams appear in the text, we will organize it by document pair, which allows us to see which document pairs share a lot of text or only a little bit of text. By document here I mean, I'm sorry, an entire newspaper. So newspapers 1 and 2 only share that n-gram in this example. Newspapers 1 and 3 share a couple of them. We're going to prune out newspapers from the same series to cut out this boilerplate. We're going to cut out n-grams that are just very common, fixed phrases in the language. So after this process, we've got ourselves down to only comparing 15 million pairs of newspapers in this corpus, or less than 1% of the total number of pairs that we would have to compute if we'd done this brute force. 
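The pruning steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the toy index, the series labels, and the thresholds `max_docs` and `min_shared` are all hypothetical, and the talk does not say what cutoffs were actually used.

```python
from collections import Counter
from itertools import combinations


def candidate_pairs(index, series_of, max_docs=3, min_shared=2):
    """Count shared n-grams per document pair, applying the pruning rules."""
    pair_counts = Counter()
    for gram, doc_ids in index.items():
        if len(doc_ids) < 2:
            continue  # n-gram appears once: cannot signal a reprint
        if len(doc_ids) > max_docs:
            continue  # assumed cutoff for very common fixed phrases
        for a, b in combinations(sorted(doc_ids), 2):
            if series_of[a] == series_of[b]:
                continue  # same newspaper series: likely boilerplate
            pair_counts[(a, b)] += 1
    # Keep only pairs that share enough n-grams (assumed threshold).
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}


# Hypothetical index: n-gram label -> ids of the issues that contain it.
index = {
    "g1": [1, 3],
    "g2": [1, 3],
    "g3": [1, 2],
    "g4": [2],
    "g5": [1, 2, 3, 4],
    "g6": [3, 4],
}
series_of = {
    1: "Journal Extra",
    2: "Jeffersonian",
    3: "Cleveland Morning Leader",
    4: "Cleveland Morning Leader",
}

pairs = candidate_pairs(index, series_of)
print(pairs)
```

With this input only the pair (1, 3) survives: issues 1 and 2 share a single n-gram, "g5" is treated as too common, and issues 3 and 4 belong to the same series.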
So now we'll talk about finding the reprinted passages inside, and we'll use an algorithm that is related to one that you might have seen, so-called edit distance, or Levenshtein distance. It's often used for instance in finding cognates between two languages. For instance couleur to color, where for instance French inserts a U vis-à-vis the American spelling, E goes to O and so forth. And the nice thing about dynamic programming alignment algorithms like this is that they allow you to search this exponential number of paths, or possible ways that two newspapers could line up, in only quadratic time due to the work of Edsger Dijkstra and Vladimir Levenshtein. Anyway, we don't want the alignment of the whole newspaper issue, we only want the part in between. To speed that up, we'll anchor this alignment at the points of the n-grams that we already found, and then we have these pair-wise alignments. To cut a long story short, from this point on we use single-link clustering and find the connected components within that graph. So what did we find, what does it look like? Well here's an example of James Buchanan's farewell address where he's sort of concerned, talking about how horrible it is that slavery is tearing the country apart, and you know obviously this is widely reprinted in the corpus, and even though there are a lot of differences in the OCR transcription, we can find four out of a cluster of 30 different examples of this in the collection. Or other temperance stories by T. S. Arthur. So that's the stage at which we turn it back over to Ryan and some of our other colleagues for analysis. Ok. You still with us? Excellent. So what does this mean for me? Well what this means for me is an incredible corpus to work with, and at the moment we're working with about 392,000 texts. We have several thousand clusters of widely reprinted texts from the 19th century. 
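For readers who want the two techniques named in David's part of the talk made concrete, here is a minimal sketch, with assumptions flagged. It shows plain Levenshtein distance computed by quadratic dynamic programming, and single-link clustering done as connected components with a small union-find. The project aligns passages inside whole issues, anchored at shared n-grams, which this sketch does not attempt; the function names and the passage labels are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming: O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def single_link_clusters(pairs):
    """Group items into connected components of the pairwise-match graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


distance = levenshtein("couleur", "color")

# Hypothetical aligned passage pairs; labels are made up for illustration.
aligned = [("extra:p1", "jeffersonian:p4"),
           ("jeffersonian:p4", "leader:p2"),
           ("herald:p7", "union:p3")]
clusters = single_link_clusters(aligned)
print(distance, clusters)
```

Single-link clustering means one matching pair is enough to join two groups, so the first three passages end up in one cluster even though the first and third were never aligned directly.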
The exciting thing for me as a scholar of the period is that the vast majority of these are just not texts that literary scholars have ever really paid any attention to either because they're obscure, because they're anonymously written, because they don't fit the genre categories that literary scholars tend to work with. So what are the things that we're finding? These examples are drawn from our top 20 of the most widely reprinted things in the corpus that we've generated. Unsurprisingly a fair number of political speeches, but what's interesting is that the political speeches get contextualized in widely different ways. They go viral but the different newspapers that print them are using them for very different purposes to support different causes. This is Washington's farewell speech that goes viral actually on two separate occasions in the years leading up to the Civil War and for different reasons. I'm happy to dig into more of these in the Q&A if you want cause I'm going to kind of fly through them. The other thing that we get is a lot of news, unsurprisingly. David mentioned the message that Queen Victoria sends on the completion of the trans-Atlantic cable and actually this also goes viral twice. The reason it goes viral twice is that the first time the telegraph operators don't actually transcribe the entire message and the newspapers reprint it along with a lot of commentary about how rude the queen was in her message and then they reprinted again with the entire message saying "oops our bad" she wasn't actually rude we just didn't get the whole message. So it actually sort of circulates around the country twice in two different versions. An awful lot of stories, fiction, sentimental stories in particular. Lots of stories of husbands and wives and children. We know that this is a popular genre in the 19th-century and we see it reflected in a lot of the things that go viral. 
What I find really interesting about a lot of these is that in the newspaper they get framed in a very particular way. I don't know quite what to call them actually. I've been working with a few possibilities, anecdotes. Because often these stories are not unlike the thing that you get from your cousin in your email, "Did you hear that story about...?" By which I mean they're framed not quite as fiction and not quite as news. The following letter was found by a husband, by his wife after she passed away and it's this very sentimental letter. But of course it doesn't say what his name was or what her name was or where they lived or anything that would allow you to track it down. And there's no snopes.com so you can't immediately go and verify whether this story took place. Lots of travel narratives, travel narratives are very popular. This is a lovely one about journeying, sailing through the Paris sewers actually in canoes, which I guess would be fun. And lots of jokes, unsurprisingly. Lots of jokes in your Facebook feed, lots of jokes in 19th-century newspapers. This one is about a young husband who decides he's going to put his foot down and assert his authority over his wife and then is summarily disciplined for doing so. So what else can we do with these? And what's exciting is when we put this newspaper data into conversation with other kinds of data, we can learn some really great things about 19th-century print culture writ large. And so one of these, the Newberry Library in Chicago provides an atlas of historical county boundaries, so what did the country look like at various points of time? 
And when I first started experimenting with this data in our GIS, I brought Chronicling America data about the founding of newspapers and I overlaid that with the historical county boundary data from the Newberry Library to get this map that shows the spread of newspapers and the growth of political boundaries in the country from the beginning all the way to the year 2000 which is when the data ends. Just as a broad visualization, it's fascinating. I love the way that the newspapers sort of strike out for the territories. They appear and then the political boundaries follow closely from them. I need to change the color, it looks a little bit too much like an AT&T commercial at the end but I like this visualization. We can also bring in historical census data and we can do things like try and get snapshots of the potential readership for particular stories. We know where they were printed. We have the historical county boundaries. If we bring the census data in, then we can maybe learn something about who might have been reading different stories and whether those audiences were different. So here are just a few John Greenleaf Whittier poems, again these are all from our top 20 most viral texts. Another poem, that Charles Mackay poem. This affectionate spirit which is really a little bit of parental advice that tells dads that it's okay to show affection towards your children. You will not ruin them by giving them a hug basically. And then this really self-indulgent article about how much smarter kids will be if they read a newspaper, which was widely printed in the newspapers. At the broadest level, we can get a sense of how many reprintings we've discovered in the data thus far. I mean we're only working with what's in Chronicling America right now and the approximate population within say five miles, or we could do 10 miles or two miles of where those were reprinted. 
But the census data is actually much richer than this so you can actually dig in and see how many literate people lived within five miles of where this was printed. If you have an abolitionist piece, what was the slave population like near where this was printed? One interesting geographic visualization just based on the data thus far, this is the speech that the abolitionist John Brown gives at his sentencing hearing. It goes viral, it's widely reprinted. But alone among everything in our top 20 most reprinted texts, the John Brown speech is not printed in Kansas and it is not printed in Nebraska which were the two places that were fighting at the moment about slavery. It actually is printed a little bit in the South. One might sort of assume well it probably wasn't printed in the south. It was in the South but not in the Midwest at least in the papers that we're looking at now and I have to add that caveat whenever we talk about this stuff. There's also lovely collections of historical maps and you can bring these into conversation with your data. This is from the David Rumsey collection, an 1843 traveler's map of the United States. This was a map that included railroad lines, post roads, things of that nature for people trying to get around the country. And this was actually what got me thinking about geography in the first place. I geo-rectified this map which means you bring it into alignment with modern coordinate systems and I overlaid some of the print histories that I had been working with on it, and I immediately noticed this close correlation between print histories and the railroad networks. And this is perhaps not shocking, population, rail, print, they're all going to follow the same paths but I had not really been thinking about transportation networks before I visualized this and this got me thinking about it. And so then I said well is there good data about historical transportation networks? 
And it turns out that the University of Nebraska has a brilliant project, Railroads in the Making of Modern America, where they actually provide GIS data for the transportation network at various points in the history of the country. And so we can bring that data in, again, into conversation. This is the railroad network in 1861, or sorry, that was '55, and then '61. And we can also do some time visualizations so that you see the spread of the rail network along with different printings of stories. And what's interesting here is that there are some stories that seem very closely aligned with the transportation network and there are others that seem to not be so closely aligned. This Charles Mackay poem for instance seems to appear primarily in places that are not on the rail network and that's maybe an interesting thing to dig into. Most of these visualizations suggest further research. They suggest a connection and then you want to dig in and find out is this really a connection and what's going on here. You want to learn more about it. We don't have to watch all these. I just threw them all in there because I like them. And I'm going to let David talk about some of the modeling that he's been doing. Right. So what we'd like to do and are currently working on, you've seen some qualitative visualizations that we've been doing, we'd like also to do quantitative evaluations to make sure that we're finding a significant number of the things that were actually being reprinted. We're currently working on some manual cluster construction. Here's some texts that we know were reprinted, let's just manually find all of them and evaluate that we're getting them. We'd like to build models to try and distinguish between texts that did go viral or not or to characterize different kinds of texts that went viral or to try and characterize perhaps some of these more usual or unusual genres. 
So the one thing to note just at the high level on the quantitative evaluation is that just by looking at large clusters that are very long, we're able, very easily and without a lot of labor, to find the very long texts that are being reprinted. Not surprising, right? There are just more opportunities for those n-grams to match and to overcome the problems with the OCR. And as you get shorter and shorter texts reprinted, say down to 1,000 matching characters among them, it takes more computational effort to find them. So not surprising. Another interesting quantitative thing to note about these data is the time lag between the initial text and the last text in a cluster, or say the median text in a cluster. How long did it take different kinds of texts to travel around the country? And if you plot the distribution of these median time lags, you see there are two peaks here. This is on a log scale, which means that the first peak is around two or three weeks. There's certain texts, that you know, newsy one might speculate, that travel very fast. But then there's another peak out here at around three years. And you can also plot this across time; as the years go on, newspapers become better at retailing faster texts. Communication is getting better. I mean there are just more texts and more newspapers. Again, not surprising to all of you who work on this data. So is there anything different about these fast and slow texts? Well so if we fit a regression model to the texts of these fast and slow clusters, you find that articles that travel fast tend to use terms in this period, not surprisingly, like Texas, Mexico, Zachary Taylor, say the Mexican War, also things to do with trials, corpse, cases and so forth. Whereas the slow texts are airier and more relaxing: love, young, earth, awoke, benevolence, behold, bright, woman, things. 
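The median time lag described above can be sketched as follows. This is a minimal illustration assuming each cluster is just a list of printing dates and that the lag is measured in days from the earliest printing; the dates and the helper name `median_lag_days` are hypothetical, and the regression over fast and slow clusters is not shown.

```python
from datetime import date
from math import log10
from statistics import median


def median_lag_days(dates):
    """Days from the earliest printing to the median printing in a cluster."""
    first = min(dates)
    return median((d - first).days for d in dates)


# Hypothetical cluster: five printings of the same text.
cluster = [
    date(1858, 8, 17),
    date(1858, 8, 20),
    date(1858, 8, 31),
    date(1858, 9, 14),
    date(1858, 10, 16),
]

lag = median_lag_days(cluster)
# A histogram over many clusters would be binned on a log scale,
# so a two-week lag and a three-year lag sit about two units apart.
log_lag = log10(lag)
print(lag, round(log_lag, 2))
```

Here the lags from the first printing are 0, 3, 14, 28 and 60 days, so the median lag is 14 days, which would fall in the fast, two-to-three-week peak.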
Some other work that we're currently doing, for which we don't have any results yet, is digging into these individual clusters, trying to actually trace the chain of transmission using good old-fashioned textual-critical tools, or cladistics and stemmatology. Can we account for some of these missing bridge texts? Can we use statistical inference to distinguish OCR errors from editorial changes that might indicate that two texts were jointly influenced? And finally can we think about modeling the network? Can we actually get a quantitative correlation between these clusters of reprinting and the railroad network, or the network of papers that shared a political view or a religious view or a social view, like the editors are brothers-in-law or something like that? So I'll close just with a couple of remarks about moving beyond Chronicling America. The questions that this corpus has allowed us to ask are applicable to lots of other areas. For instance, source criticism of the immense literature that comes out of the Civil War or any area in history. Things like Grant's memoirs are going to get reused in different ways by different historians. Or another project that I'm working on now with a political scientist in tracking policy ideas in bills. So the sad fact is, if you know Congress, that most bills fail. And if you're in the minority, say a Democrat in 2005, the bill that you introduce is probably going to fail. And most people just look at the bills that pass. The question is, are there ideas in those bills that fail that show up again in bills that do pass? Perhaps a little bit later or in the same session. And you know going back in an even higher level of granularity, can we use these networks of texts to do better search? The short answer is yes, you want to retrieve clusters, not just individual passages in texts. So I'll close with a newspaper that's not in Chronicling America because it's from the wrong country. 
It's the Economist from 1871, which maybe gives a name to what we're doing. "Some of the philosophers should turn from the invention of electrometers, galvanometers, hygrometers and so forth to the far more difficult problem of inventing a mode of measuring the intensity and diffusion of political wishes and convictions." So how do they diffuse? So I know we're running out of time so I'm going to do this really quickly, but the final kind of modeling that I've been doing with our data is network modeling. Reprinted texts can be a pretty direct indication of influence. So we have all of this data about texts that were shared between different publications, and if we take that, we can use it to model the networks of influence during the antebellum period. So here what you're looking at, I'm going to zoom in in a second, but the circles, the nodes in this network, are individual newspapers. And the lines between them are shared reprints. If two newspapers share one reprint, there's a very thin line between them. If they share hundreds of reprints with one another, then there's a very thick line between them. The colors indicate communities. These are groups of newspapers that shared a lot of the same texts. And so it's the network software figuring out that these are possibly communities. And what's very interesting thus far about the experiments I've done with the network visualizations is that they really are indicating these fascinating connections between newspapers that would be very hard to get at if you were just reading the newspapers. There are communities that emerge that are not geographic, that span wide geographies. David alluded to one. 
There was this incredibly clear connection that came out in the network visualization between a newspaper in Vermont and a newspaper in Missouri, which is quite a span in the 1840s, but there was this incredibly strong connection between them, so we asked one of the graduate students working on the project to dig into this. What's going on here? And she discovers that the editors were brothers-in-law. And that they were probably just sharing a lot of newspapers and copying from each other frequently. And so the network graphs have been very suggestive. They've also been overturning maybe some of the presuppositions we tend to make. Because you can't read all the newspapers, scholars often read certain newspapers. Newspapers in New York and Boston and Philadelphia get an inordinate amount of attention frankly, and in our network visualizations, we're finding that there are newspapers in Nashville that are incredibly central to the reprinting during the period. Kind of brokers of information; the Nashville Union American came up this morning. The Nashville Union American is incredibly important, in our data set at least, as a kind of broker of reprinted text. So what are our next steps? Our next step is that we want more data. It's very incomplete at this moment. You know what's in Chronicling America, right? Every time there's a new batch, we perform the analysis again and we're finding new connections. We're finding new reprinted texts. We've also started conversations, this is perhaps something not to say in this gathering, but we've started conversations with some of the commercial archives of 19th-century periodicals, to try and get access to their data. They are not so forthcoming as Chronicling America, which will be a shock to you. And we've started to annotate the data and I wanted to point out the incredible work especially of Abby Mullen here, but also of Matthew Williamson. These are two history grad students who are working on this project with us. 
Abby has compiled an incredible amount of data about these newspapers. I hope that this eventually has a home on Chronicling America, to be honest with you. Editorial tenure, what happened to various editors? Things like wives who took over newspapers when their husbands' names were still on the masthead after their husbands died, or something of this nature. And she is building out the political affiliations of all of these different newspapers, which often shift midstream from one party to another. It's a thorny problem, as we're learning. And so we're annotating the data, we're building a web interface for the project, viraltexts.org. At the moment it's just a placeholder website but within a few months, at least some of our preliminary findings will be available there, some of the data will be browsable there and searchable there. And we just want to thank the NEH, who gave us a grant to do this project, and also the NU Lab, which is our intellectual home at Northeastern, and thank you. 
I've been at the Library for over 20 years. I started as a work-study student and I continually moved my way up. But that's enough about me, we're here to talk about genealogy and how you do genealogy at the Library of Congress. And later I'll give some examples of how I've used Chronicling America to locate things relevant to my own family. So I have 15 minutes and I know I'm right before break, so I'm going to get started. OK. Like I said, my name is Ahmed Johnson, I'm a reference librarian at the Library of Congress in the Local History and Genealogy Reading Room. Just a little background and information about the Library of Congress; the Library of Congress was established as a legislative library in 1800. Of course the British came around and burned the Capitol in 1814. So what did we do? We purchased Thomas Jefferson's personal library. And that contained over 6,000 volumes. True renaissance man. 
Had books about everything, right, and I think actually there's an exhibit at the Library of Congress right now, where you can see those various types and it may be available online as well. What do we have? We have three buildings; the Adams Building, the Jefferson Building and the Madison Building. We have 21 reading rooms and seven overseas offices. OK. What do we have at the Library of Congress? We have every book ever published, right? No. Impossible. I get that all the time, "You have everything ever published?" No, we don't, that would be impossible. But what do we have? We have over 151 million items in our collections. Not all books, actually we have about 23 million books and we have over 117 million non-classified special items. What are special items? Newspapers, of course. Manuscripts, telephone books, sheet music, posters, photographs and so forth. So we're not just books. We add about 10,000 items a day and supposedly if you lined all our collections up, you could travel from Washington, DC, to Milwaukee, Wisconsin, over 525 miles. So a massive collection, right? OK, what about reference services? These statistics were based on 2012. We welcome more than 1.7 million on-site visitors. I talk to most of them because everyone wants to find out their family history, right? Also we provided reference services to over 550 individuals and persons via telephone and through written correspondence. And electronic, I'm sorry. What do we have at the Library of Congress's local history and genealogy reading room? Well, we have over 60,000 genealogies, and when I say "genealogies" what exactly do I mean? Family histories. Someone publishes the information, sends in a copy to the Library of Congress or we receive a copy via copyright. We also have over 100,000 local histories. And when I say "local histories," I don't mean local for just the Washington, DC, area. People come into the reading room all the time and they are confused by that. 
"When you say local, do you mean just local for the Washington, DC, area?" No, it's for the entire country. So if you know what county your relatives lived in, you can search our catalogue and see what books we have relating to your family and where they lived. And keep in mind we're not an archive or a repository for unpublished materials. There's exceptions to everything I say. We're not an archive for unpublished material but we have newspapers, we have manuscripts and so forth but primarily when you're looking for family histories and local histories, we have published materials. So keep that in mind. OK, our staff. We have specialists in everything from African American, yours truly. British Isles, Canadian, Hispanic, Scandinavian and so forth. Now, when I say all these things, don't think it's a different person for each subject area. Times are hard, people are taking on more duties. I'm probably going to be Hispanic by next week. But we can answer questions about city directories, the origin of names, maritime history, migration and immigration as well as biography and others. How do you do genealogy? This is really basic. I know you're not here to get a course in genealogy but I always like to provide this cause this is what I do. I always suggest that you begin with yourself and work backwards. We all have two parents, four grandparents, eight great-grandparents right? So don't start with great-grandpa or great-grandma. Start with yourself and work your way up. You may find connections further up the line. Right? And you want to document these vital records. What are vital records? Birth, death, marriage, sometimes divorce records. You want to document your information with census records, which go back to 1790. Interview your oldest living relative. Grandma, great-grandma, interesting things she has to say about her life and other things that were going on before you were here. 
Then you want to look at things lying around the basement and attic and the trunks and so forth. Now once you do that, you want to get out into the community. County courthouses, state archives, a genealogical society, historical society in the area where your family lived. Not where you live now, you may not find too much unless you stayed in the same location where your relatives came or were from. After exhausting all your sources at home, of course like I just said, you venture out to the community, county courthouse and so forth, then you trace your family back to the 1790 census, which is the first census for the United States. Did I skip one? OK, this is our home page. This is the best access point for information about our services and collections. As you can see here, we have links. I think the best link on this page is the Ask a Librarian link. That allows you to submit a question directly to one of us, our reference librarians. And now we won't answer your question for you, but we'll lead you in the right direction, tell you where to go to find your information. Oftentimes, I get people who say, "My great-grandfather came from here. Give me everything you have on him." Not going to work, right? You do your own research, we'll tell you where to find it. And it may be the Library of Congress, it may not be. We may refer you to the National Archives and other places. We also have biographies and bibliographies and guides and also how to search our library's catalogue, which is available online. Why would you come to the Library of Congress? Usually people come to the Library of Congress to use our subscription databases. How many of you are familiar with ancestry.com? We have it at the Library of Congress for free. Free is always good, right? Free at the Library of Congress. We also have others. We have over 300 subscription databases at the Library of Congress. So oftentimes, people come to the Library of Congress to use our subscription databases. 
We also have Heritage Quest and many others. What can you do from home? We have an excellent website called American Memory, which has digital collections and as you can see you can browse by topic: African American, government, law, immigration, American Expansion and so forth. For the purposes of this talk, I chose immigration and as you can see for immigration we have 13 collections. All of these are keyword searches, so you just put in a name, you can put in a location. See what you get, make it your own. And as an example, I selected California, 1849-1900. Why were people going to California during that time? Gold, they were looking for gold, right? You can search this collection. Similar to Chronicling America, you can put in keyword searches, you can search by subjects, you can search by titles. Because genealogy is not just about names, dates and locations, right? It's about what made people do the things they did. What made them move from one location to another? Back during these times, people had a shared existence. It was more communal, people tended to go to the same churches, attend the same schools and so forth. So I say all of that to tell you that you may not find your relative here, but you may find instances of why they may have come to California during this time, OK? So once again, I can't guarantee that you're going to find something directly related to your relative, but you may find something very interesting about that time period. Also, my family's from the Washington, DC, area; I'm a fourth-generation native Washingtonian, so this is really great for me. Similar to the other database, keyword searches and this has information from the 1600s to 1925. Same thing, all keyword searches. Let's talk about Chronicling America. I use this database daily. I can give you hundreds of stories of where I've actually found information for researchers so I'm delighted to be asked to come here to speak. I'm really hot up here right now, though. 
I think it's the bright lights. But anyway, Chronicling America is great because, like I just mentioned, genealogists are usually interested in names, dates, locations, vital records, births, marriages and deaths. Newspapers have obituaries, so we're always looking for newspapers. I think for the first search I conducted, as you all know, you can select by state or you can do a particular newspaper. I selected the Shenandoah Herald, which is in Woodstock, Virginia, and I located an obituary for a Thelma Dysart. I selected that because one of the big shots at the Library of Congress, his last name is Dizard but it's not spelled that way so it doesn't work. This is a story about a two-year-old who died, really tragic. But the only reason I picked it was because the name of the second in charge of the Library of Congress was Bob Dizard so I just thought maybe I could find something interesting. I think he's from Virginia. But as I mentioned earlier, my family is from the Washington, DC, area. So I went to DC newspapers and look at what I found. The Open Forum. My second great-grandfather's name was Hiram S. Haywood. Now, in many documents I found information about him being a fireman. I found the date he got married, I found all kinds of information. But if you look at this article here, this tells you about his personality. The Open Forum, this is a letter to the editor where he wrote about how they wanted more money. And he talks about the price of beef stock being 15 cents a pound and that was up, that was because of inflation but yet they didn't get any more money. So he's pleading his case and he titles it, "To the Men in Charge, Washington, DC." Great stuff, right? So now I have this gem from Chronicling America about my second great-grandfather. And this was in 1913, the Washington Herald. And I have another example, the real estate transfers. During this time period, lots of information about real estate appeared in newspapers. 
Oh you know what, let me go back one slide. Another thing I saw was that his occupation was fireman. Now African American, 1913 fireman. This blew my mind. I didn't know we had African American firemen in 1913. He wasn't the fireman that you think of today. He was the person, like you said, that would put the, light the gas lamps and do the, I can't think of the name. Scutter the coal for heat in the buildings and so forth. So I found out through this article exactly what he did and he actually died doing this. But the next thing I was able to locate was a real estate transfer. Once again, Hiram S. Haywood, Lot 102, Square 5113, 10 dollars, a stamp of 50 cents. My family still owns that property on Sheriff Road, which is in the Deanwood section of Washington, DC, so another great find. And I remember doing an oral history with my great-aunt and she mentioned this amusement park that they used to visit as kids and it was called Suburban Gardens. So what did I do? I went to Chronicling America and did the same kind of search, Washington, DC, and I got 288 hits for Suburban Gardens. Suburban Gardens was the first black-owned and operated amusement park in DC and there has not been an amusement park in DC since. I got all kinds of information about this park. I only have 15 minutes so I didn't show you everything I found but Cab Calloway performed there. They even talked about when they bought their first rollercoaster, how it cost 30,000 dollars. And that was in 1920, I believe it opened, 1920-21. So the reason I show these examples, and it's really hot up here, so I'm kind of, I'm trying to deal with it as best I can, but overall I wanted to provide you with a brief history of the Library of Congress, tell you how massive our collections are, talk about some of our digital collections that you can use from home and then tell you about a few things that you can do at the Library of Congress and then provide you with some examples of why we just love Chronicling America. 
Now in closing, I would just like to say, as a librarian at the Library of Congress, oftentimes, we have so much, it's so massive, right? You can get caught up in the newspapers alone, the photographs, the maps and so forth but when you have a database like this that allows you to search, because many of the paper newspapers aren't indexed, this makes it so much faster, as the gentleman was stating earlier. You can do so much more in such a shorter period of time, because I hated microfilm. I hated looking at microfilm and I'm fairly young so I can imagine my older clientele, who are usually doing genealogical research, how they felt. So every time I can use this database, I dive right in. So thank you very much.