Digital Humanities Sampler, Part 2

[Ginther] My name is James Ginther, St. Louis University; my co-PI is Abigail Firey from Kentucky, and she is playing the silent partner today. We are working on a project connected to the creation of scholarly editions, scholarly digital editions. As some of you may know, the standard process is to begin with an unpublished manuscript, create a transcription, do some collation, and eventually create an XML-coded file that can be displayed on the internet using style sheets. The two problems we want to address within our start-up grant are, first, how do you ensure accuracy in transcription, particularly since we now have access to unpublished manuscripts and digital archives, and secondly, how do you ensure accurate and appropriate markup in the creation of the digital edition itself. Now the current solution that is being used is to do sort of divide and conquer. So we begin with the scholarly transcription by the expert in the field. That is then handed over to support staff, somebody who will do the XML and coding; somebody who has to become familiar in some part with the content of the text and be able to apply the appropriate TEI markup. And then the issue of accuracy or representation, or even transparency, is sometimes addressed by allowing the digital edition to be attached to the original images from the manuscript, allowing the user to see what the edition is based on. We would like to actually kind of collapse these things and expand the range of people who will actually be involved in digital humanities. Sometimes we find the people who want to use digital humanities methods are kind of overwhelmed by the need to be technical experts in things like TEI. And so what we'd like to do is use an API, or an application we developed for another project, that is able to identify the location of lines on a manuscript page, parse them out as individual images, and basically allow people to transcribe on screen alongside the unpublished manuscript page.
We'll also include tools that will allow them to use automated XML encoding, and also an XML toolbox that will allow them to basically put everything under the hood, and also provide editing tools, dictionaries, Unicode character sets, and things like that. There is a prime example of why this is important: for those who have ability in Latin, you'll notice that the last line actually has an error in the transcription, and we would like to be able to use this as a tool as well, to allow readers to identify possible problems in the edition and supply corrections and things like that. Thank you very much.

[Thomas-Hilburn] Hello. I'm Hale Thomas and I am with the University of Arizona College of Humanities and the Poetry Center, and this fall marks the fiftieth anniversary of the Poetry Center, and if you ever make it to Tucson, I invite you to come see our shrine to poetry. For the last fifty years, many poets have been making an annual sojourn to the Poetry Center to participate in the visiting poets and writers reading series. Early on, right in the early sixties, the poetry staff, in their infinite wisdom, decided we should be recording these readings. To this day, we have a little over six hundred recordings and we're adding about another twenty each year. In 2006, we began the equivalent of a phase one start-up grant to create a digital archive for these digitized recordings, as well as to make a player application available for people at the Poetry Center. Our current phase two grant is taking that digital collection, and by late next week it will hopefully be available all over the internet. The next step beyond just making the simple collection, which we have now with pretty standard library metadata, is to add what we're considering the scholarly layer, to hopefully greatly increase the scholarly information incorporated right into the collection.
You would be able to, from the poets, follow what poems were published in what editions, or if they were never published at all. It could even be an earlier draft, or a poem that's been read over multiple decades and has evolved significantly in its meaning and sense of place. The second phase we hope to land in November sometime, and the third phase will add a whole contextual layer where we are going to invite the public to actually contribute their knowledge and expertise to the collection and hopefully greatly increase its scholarly value in that capacity as well. Thank you.

[Snyder] Hello, my name is Lisa Snyder and my co-PI Scott Freedman is lurking in the back. We are both here from UCLA and our project is the development of new runtime software that will facilitate the educational use of real-time computer models of historic urban environments. Our formal test bed is the NEH-funded model of the Egyptian temple complex at Karnak, but the software is intended to be used both by other content creators who are interested in sharing their 3D work and with the other models developed at UCLA, such as the Urban Simulation Team's reconstruction of the World's Columbian Exposition, which is what you see here. The software has four major project components. The first is an intuitive navigation interface design. The second is the ability to create linear narratives within the 3D world. The third is the ability to spatially organize supplementary resources and annotations. And finally, and perhaps most importantly, it has the ability to add in branding and end-user restrictions so that content creators can feel comfortable sharing their intellectual property. The next steps for us are to disseminate our software prototype and our test-bed project. Then we want to conduct extended testing with content creators, in-service educators, students, and museum visitors.
We want to modify and expand our software's functionality, and finally we want to create the backend to support an archive of these academically vetted environments for use in formal and informal educational settings. Thank you.

[Chourasia] Hi, I'm Amit Chourasia from the San Diego Supercomputer Center at UCSD. I'm the co-project director for this project called Drama in the Delta. This is a collaboration between the SDSC and the Theater and Dance Department, where Dr. Emily is the PI for this project. What we are trying to ask here is: how can we present past performances? Or what can we do to recreate past 3D environments and show them interactively? Or how can we engage and train students in different disciplines, like humanities computing, and involve them in this research, and at the same time make all this work widely accessible? What we are trying to do here is chronicle some events from Arkansas' internment camps which are culturally significant, and then we are using digital methods to illustrate some of those things. While doing that, we are creating some cinematic footage which can be made into a video and distributed. And then we use that digital media in a set to create an interactive game which can engage the students, or other people, to actually perform those significant events and see what may have happened. In this case, there are a lot of racial boundaries which were blurred in internment camps, which are not necessarily known to a lot of people, and this is an engaging medium for the audience to get to know all these things in a very, very interactive manner. So this is kind of the outline, kind of the initial thoughts, we have on the outreach for this project. We are thinking of using the outcomes of this project in the class at UCSD.
Also, we are planning to share the game with the TeacherTech program in San Diego, which involves about three hundred teachers, and then we are also willing to share this with the Japanese American National Museum, and also everything is available on the website, which is open for people to download and try out and test. Thank you.

[Cooney] Good morning, my name is Charles Cooney. I am representing the ARTFL Project at the University of Chicago. The name of our project is Le Dictionnaire Vivant de la Langue Française, or The Living Dictionary of the French Language. Essentially this is an experiment in what we are calling community lexicography. The aim of the project is to provide a worldwide user community a platform to develop a dictionary that reflects the way French is actually used and understood in francophone cultures around the globe. The means by which we hope to achieve this are user voting and submission features. We will encourage our users to rate the several definitions and word-use example sentences that are returned with each word search. In the event that a word does not exist in the lexicon, the user can submit new words, new definitions, example sentences, and other resources. The DVLF is basically running and ready for release. What we've done to date is, for each of our headwords, we've drawn traditional-style definitions from our collection of French historical dictionaries. Then we developed data-mining algorithms to extract what we hope are particularly salient examples of word usage from our corpora of French literary texts and from data we downloaded from the web, particularly from French-language newspapers. You can see the screenshot of search results. We also are including lists of synonyms, sometimes antonyms, a little pronunciation in IPA, and links to other resources and features. Going forward, we're envisioning a two-track development process.
One track is to enhance the user experience, whether through text-to-speech pronunciation or developing a mobile app for handheld users. And then we would also like to develop the underlying data, so that it can be released as an open-source French WordNet. This could be a resource for more intensive linguistics and data-mining research in the French language. Thank you very much.

[Van Liefferinge] Hi, I'm Stefaan Van Liefferinge from the University of Georgia. Our project is titled Artificial Intelligence for Architectural Discourse. It is a collaborative effort between myself, an architectural historian specializing in medieval architecture, and Michael Covington and Walter Potter from the Institute for Artificial Intelligence. Michael Covington is a specialist in natural language processing, and Walter Potter is a specialist in knowledge representation. Now the goal of this project is to brainstorm actually on an idea: whether a computer can understand architecture. So what we would like to do is to apply artificial intelligence methods to represent architecture computationally. Why... I mean, how did this project start? Well, actually it is because in architectural discourse, and also in architecture, what we see is that there's an internal logic that is reflected in the discourses on architecture, whether they are verbal or written. That means we talk about a building, or about a structure, in a certain way and not in another. We don't put a capital under a column, we put it above a column usually. And in artificial intelligence, on the other hand, well, there is a concept which is ontology, which is a structured, domain-specific knowledge representation, meaning that it represents knowledge in a certain structured way. Our idea here is to develop an ontology for Gothic architecture. Why? Because then we can actually have a computer work on the representations of architecture. How does that work?
Well, this is what we are working on at this stage; again, it's a beginning project. Well, first of all we have human experts: architectural historians, such as myself, and also our specialists in artificial intelligence. We studied discourses on architecture for descriptions of Gothic cathedral architecture, mainly because it's my field, my specialization. This creates this ontology, which is a number of rules and concepts that we can then store in a computer. There is a validation moment, and afterwards what we hope for in future applications is to use the structured representations in a computer, for example to have a real-time corrector for students, so that if students produce a text describing architecture, they could get real-time correction. Also 3D reconstructions are possible, and they could be checked for consistency, completeness, etc., and it could also be used for data mining, for example. Thank you very much.

[Lester] My name is Dave Lester. I am the assistant director at the Maryland Institute for Technology in the Humanities; we are a digital humanities center at [University of] Maryland. Before I talk about my slides, and I know I don't have a lot of time: what I'm discussing is something kind of technical, and the grant narrative actually explained it pretty well, so I'm not going to try to cram a bunch of technical things into two minutes and then confuse everyone, but what I will give you is the who, what, why, and where, all those details you want to know. I have posted on Twitter. Well, I will actually, after this, be posting a link to the original narrative. So in the kind of spirit of openness, if you want to read that, feel free to go find that. We received funding to host a two-day workshop focused on the integration of APIs into digital humanities projects and working collaboratively to prototype uses of APIs during a kind of working weekend.
So earlier, when Jeremy Boggs was discussing how the CHNM approach to tool development is building using platforms: APIs are actually a very great way to leverage the power of existing content, existing tools, in new ways to build software that doesn't necessarily recreate the wheel but uses what is already out there. This workshop will include digital humanities developers, project managers, and workshop leaders, not only to have a discussion, but to learn from industry leaders who have successful APIs. A few of them are listed up here: Flickr, Google Maps, Freebase, Twitter. If you have an iPhone application that you use to, you know, connect to Twitter right now, that uses an API. If you want to create a mashup of a Google map and a Flickr photo, that uses an API. If you want to create an augmented reality application, you're going to use an API. So you should come to this workshop. So the event itself, it'll be a mixture of a few things. One, formal presentations from actual representatives from several of the companies that were on that last slide, including Google and a few others that will be announced shortly. There will be breakout sessions. So, the mornings will be presentations. In the afternoons we'll break into small groups. If you're a project manager and you want to talk about the ways that you can use APIs on your projects, you can talk to other project managers. If you're a developer and you want to write code, like Patrick, you can do that too. There'll be lunchtime lightning talks, where we can have these brief periods where we discuss things. Applications are in October, the workshop is in February. I'll see you there, thank you.

[Seefeldt] Hi, I'm Doug Seefeldt from the University of Nebraska-Lincoln, and my colleague Will Thomas and I are addressing a set of dilemmas facing scholarly communication in history.
As we all know, the authors are out there creating complex digital historical scholarship; you know there are some in this room, Andrew Torget, Vika Zafrin: people here who are taking the steps to do this. We also know that authors need established journals. We need them to peer review and disseminate our scholarship so that our peers can vet our work and we can reap the rewards that come with that in our fields. But one example of this dilemma of where to place your work is clear from Will Thomas' efforts in digital history to date. For an American Historical Review piece from 2003, if we look at data from about an eighteen-month period, we can see a hundred and twenty-four thousand hits and about six thousand unique visitors. An article he did the very next year, which was placed in a different space, Southern Spaces, which is the online journal of southern history and culture, doesn't quite have the recognition in the field of historical scholarship that the American Historical Review does, and you can see the disparity between visitors from the Southern Spaces journal and just the website version standing up at the Virginia Center for Digital History. It's a vast chasm there, I think. Editors: their dilemma is, where are we going to place this work in our journals? How do we do this? There are certainly challenges in reconciling a digital format with something that is designed for print, but we also think about branding. How do you brand an article beyond the actual bound printed journal? Peer review: who are the peers that then can evaluate the content and the form of these things? So what we've done is we've created a meeting that is happening actually Friday out in Lincoln called "Sustaining Digital History," where we are bringing these journal editors together to discuss the future of scholarly communication in the digital realm in history. Thanks a lot.

[Torget] Hi. (waits for a "Hi" back from audience) There you go. I am Andrew Torget.
I'm at the University of North Texas and I did give Doug five bucks to plug my work at the start of his talk. Our project is a collaboration between UNT and Stanford University to basically deal with what is the biggest problem facing me as a historian moving forward, which is stuff. And way too much of it. We often talk about this in terms of 'what do you do with a million books,' but mostly with the work that I do the problem is actually newspapers and the digitization that is happening with those. It is actually far more content, far more varied, and in some ways far more accessible now than anything else. It is going to be a continued problem, thanks to the NEH and Library of Congress in the Chronicling America project, which is basically trying to digitize all American newspapers since 1836. They are off to a great start. They've done a million pages; they're probably going to do about twenty million, is their estimate. The problem for me is, you can see my equation down there. You give me twenty million pages, I don't know what to do with it. Finding content in something so large can be overwhelming, and so what we're trying to do is develop some techniques and tools to mine deeply into this stuff, bring out useful information, and then make sense of it through visualization. So the whole idea of it is that we're teaming up with some really smart people in computer science with natural language processing experience to do some text mining work to bring out patterns. And then, because we're still going to be pulling out huge patterns, map those things across time and space, and that's why we're teaming up with Stanford to help us work on that. The whole idea is to try to do an experimental model, basically to see if you can do this. This is the high-risk, high-reward aspect, I think, of these projects.
We're taking on, we think, a very big project, but something that hopefully we can scale up long-term into something extremely useful for humanities scholarship across the board. So, thanks.

[Nesbit] Hello, I am Scott Nesbit from the Digital Scholarship Lab at the University of Richmond and I am here representing our project at the U of R, Landscapes of the American Past. We think that atlases do a pretty good job of organizing a lot of information well. And we have it on good authority that John Franklin Jameson, when he was helping build essentially the discipline of history a hundred years ago, developing things like the American Historical Review and the American Council of Learned Societies, also planned an atlas to bridge the gap between the public and some of these more specialized scholarly works, and also the documentary editing collection that he was going about doing. Well, we have a lot of documents being produced, a lot of archives, and we think that now is a propitious time to start on a new atlas, to rethink how we might organize the massive amount of texts that are being produced and becoming more available to scholars. So, what we're thinking is, well, you know, a new atlas might be a fine way of doing that. Our particular project that we're starting out with is a map of Emancipation. We are using a lot of sources that you all are familiar with: The Valley of the Shadow, the Perseus Digital Library, the Making of America project and others, to start to think about, well, when did Emancipation happen during the war? And where was it happening? How did that occur? We're essentially trying to take all of this content, all these great archives, and apply it to a specific historiographic question that we think does well spatially.
So what we're hoping is that this atlas will eventually be able to serve as a common ground for people from all walks of life to come try to see America as a commonly held space, and in order to do that we're trying to bring together all kinds of different technologies and approaches, from kind of hunting and pecking through sources to close readings of secondary texts and also information retrieval and discovery. Thank you.

[Koller] Hello, I'm David Koller from the University of Virginia and our project concerns the application of high performance computing resources to digitized 3D models of cultural heritage. In particular, we're investigating applications of algorithms for fast processing and geometric analysis of scanned 3D models, such as the large models you might get from scanning with 3-dimensional laser scanners, and this project is kind of an outgrowth of a prior NEH award from the humanities high performance computing program that was initiated by Brett and his staff a couple of years ago in collaboration with the NERSC supercomputer center out in California and the Department of Energy. So that was kind of a start-up start-up award for us. The first application that we are looking at for HPC, high performance computing, is efficient processing of raw 3D scan data into completed 3D models. Scanning cultural heritage objects often creates humongous models with billions of points. For example, our scan at Stanford University, back in 1999, of Michelangelo's David, 5 meters tall at quarter-millimeter resolution, created a point cloud with over 1 billion raw point measurements, so this necessitates new algorithms for highly parallelized 3D scan processing that can be applied in high performance clusters and supercomputing environments. To date we've already applied these methods to a variety of cultural heritage objects, not just statuary, but also archaeological sites, as well as historical buildings and structures.
For example, this building that we scanned at Colonial Williamsburg: well, you can imagine a building of this magnitude scanned at quarter-millimeter resolution is going to generate a gigantic data set with tens of billions of points, which you cannot process on just your desktop PC; you need a supercomputer or high performance cluster. The second application that we're looking at for supercomputing is automatic reconstruction of fragmented archaeological artifacts, like this jumble of fragments you see here that is often encountered at archaeological sites, and so we've developed a system for scanning those fragments and then automatically reassembling the fragments with supercomputer algorithms. And our test data set there is the digitized fragments of the Forma Urbis Romae, this ancient marble map of Rome, that we also scanned back at Stanford several years ago. Thank you.

[Wiesner] Hi, I'm... (small jump motion to the microphone) I'm too short. Susan Wiesner, and I'm co-PI of the ARTeFACT Movement Thesaurus. My other PI, Brad Bennett, is out hiking the Sierra Nevadas, and Rommie Stalnakera is going to tell you about our project.

[Stalnakera] Research into movement and movement-based arts depends greatly on the ability to peruse documentation beyond static written text and photographic images, in other words: film and video. That said, the current practice of viewing hours of film undermines researchers' abilities to find movement-derived data, find that data quickly, find it accurately described, and reuse the data. The ARTeFACT Movement Thesaurus is a continuation of the ARTeFACT project developed by Dr. Wiesner and Jay McCourtney at the University of Virginia in 2007, which began with the intent of capturing, preserving, and providing access to movement-derived data. Our level two digital start-up grant is allowing us to take a major step towards the last point: providing access.
As a first step, we developed an ontology on which to base a controlled vocabulary for metadata. Then, working with Dr. Brad Bennett, the research director of the motion analysis and motor performance laboratory at Kluge Children's Rehabilitation Institute, we have captured over 200 codified dance movements and tai chi movements, phrases, and repertoire, using a Vicon 8-camera 3D motion analysis system with two digital video cameras. What you see here is a portion of a pirouette en pointe, a ballet movement. Contrary to popular belief, by using codified movements we are not limited to ballet, as some modern and jazz techniques are also codified, even if some step names would be considered folksonomy. Here is the pirouette in 3D; using motion capture enables us to see all three dimensions, no matter the front of the camera or the movement, which will allow us to better associate the motions captured in 3D with the filmed movements seen in 2D. The Vicon software also allows us to validate our ontology through a database of start/stop points and characterizations of the movement relative to the body, not space. As a next step, we've recently begun discussions with engineers at UVA in order to develop algorithms for detecting dance movements on film. We are very pleased with this process and in the meantime we've had a lot of fun. Thank you.