The moment I walked into the stacks of Yale’s Sterling Memorial Library, I knew it was love at first smell. The scent of yellowing paper reminded me of the library in the small town where I grew up. I had nearly exhausted the collections of that library by the time I got to high school, whereas I haven’t even managed to set foot on each of Sterling’s sixteen floors.
What to do with all these centuries of words, endlessly piling up on library shelves? Most books in Sterling are a part of what’s called “the great unread”—the profusion of books and other cultural output that go ignored by humanities scholars. From all the novels and histories, all the theory and criticism published in the last half century, we select a relative few, creating a comprehensible, manageable canon out of the overwhelming possibilities.
Until now, humanities scholars have been devoting their lives to narrowing down the resources in their field. But as libraries like Sterling begin translating their texts into digital forms, scholars have begun to wonder whether new technologies might let us draw on entire libraries of works, rather than a narrow few. New computational techniques can then work to “read” these millions of forgotten texts, drawing out topics and trends. Using these tools, scholars in the humanities and the sciences are partnering on projects that weren’t possible even ten years ago: text mining, cultural mapping, and automated literary analysis. These new techniques are part of what’s called the digital humanities, a discipline that is just now beginning to take root at Yale. While the movement raises exciting possibilities, it is also asking scholars to confront how they define the humanities as a discipline.
Last March, Peter Leonard became Yale’s first digital humanities librarian, a new position that signals Yale’s growing support for scholars exploring the intersection of digital technology and the humanities. In his office on the second floor of Sterling, thick volumes of Old Norse, Swedish, and Icelandic literature sit next to a few well-worn books on programming. One shelf is dedicated to a few relics of technologies past: a motherboard from a 1980s NeXT computer sits alongside a Nune, a tablet-style educational device from the early nineties.
Leonard’s path to the digital humanities began with a relatively traditional humanities education: an undergraduate degree in art history and a graduate degree in literature. Like most literature students, he was trained to read a novel closely and meticulously, and to dissect paragraphs around a seminar table. It was only during his postdoctoral work at UCLA in Nordic literature that Leonard became interested in quantitative approaches to literary questions. Today, he focuses on computer analyses of large collections of digital texts. As an example of this technique, he showed me an analysis he conducted on 120 years’ worth of Vogue magazines. The project tracked changes in the publication’s attention to particular topics from decade to decade. In the early twentieth century, the editors were more interested in social interactions; more recently, the focus has shifted to beauty products and fashion.
As digital librarian, Leonard will help define a young and diffuse field. Since coming to Yale, he has begun holding open office hours every Thursday with the Digital Humanities Working Group, fielding questions from students and faculty, and getting a sense of what the digital humanities will look like at Yale. “The definition of digital humanities is probably a little different on every campus,” Leonard explained. In addition to computational research methods and digitization, the digital humanities can include fields such as instructional technology and new media studies. How these areas will relate to one another and which ones Yale will focus on are questions that Leonard and other Yale researchers are working to answer.
There’s also another, perhaps even more pressing question to consider: how will the digital humanities define its work in relation to the disciplines from which it so often borrows tools, such as the social sciences? The technical challenges of repurposing the objective, mathematical tools of the social sciences, or of representing the full complexity of a manuscript in digital form, mask a deeper tension in reconciling the work of very different disciplines: one that focuses on data-driven, empirical answers, and another that is more comfortable with ambiguity and open-ended questions. Leonard asks, “How do we make use of techniques that might emerge from outside our disciplines, without becoming unduly beholden to them?”
One of the earliest projects in the digital humanities was finding ways to turn physical collections of “humanities data”—which can range from scholarly works to novels, photographic images to historical records—into a form that can be analyzed by a computer. That project of digitization remains at the center of the discipline.
The work of women’s, gender, and sexuality studies professor Laura Wexler, a photographic historian, falls into this category. She oversees a collaborative digital humanities project spearheaded by graduate students Lauren Tilton and Taylor Arnold, who come, respectively, from the fields of American Studies and statistics. The team is mapping and analyzing over 160,000 photos taken from 1935 to 1945 by photographers hired by the federal Farm Security Administration and Office of War Information, creating an online database that is searchable and accessible to a range of audiences from middle schoolers to schoolteachers to academics.
A version of Photogrammar, the tool they’ve created, is now available online in beta. You can easily follow a particular photographer’s walk around Chicago, search for photos from your home county, or use the caption search to find every image containing a lumberjack. A few decades ago, the only way to look at these photos would be to rifle through hundreds of file cabinets in a dank basement in Washington.
Wexler, Tilton, and Arnold are still determining what kinds of questions they might answer with their newly collected data. Beyond correlating various attributes of the photos and the geographical location where they were taken, they’re also interested in harnessing the potential of crowdsourcing. “It’s designed to be a public platform,” Wexler explained. Users could participate by tagging photos or uploading current images of the same locations to see what’s changed.
“We were able to take this big messy data set and put it into a form that’s nimble,” said Wexler. Humanities data tends to be disorganized, posing interesting technical challenges for programmers. The very idea of “data” in the humanities has been challenged by some scholars, such as Johanna Drucker, a professor of information studies at UCLA, who argues that rather than thinking of data as a direct representation of reality (in the sense of its Latin root, which means “what is given”), we should think of it as something that is collected and constructed. Digital data offers only a partial representation of an object; the object has to be broken down into a set of encoded attributes before it can appear on your screen. Deciding how to divide an object like a painting into a few lines of code is the biggest challenge in any digitization project.
“A book communicates meaning to us, but it doesn’t communicate meaning to a machine,” said Carol Chiodo, a Dante scholar and member of the Digital Humanities Working Group. “Anyone who’s tried to extract text from a PDF knows that.” She pointed to the Text Encoding Initiative (TEI) as a set of guidelines for how to format texts so that computers can understand them. These guidelines determine how to mark important features such as paragraph, section, and chapter breaks.
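To make the idea concrete, here is a minimal sketch of what such markup can look like and how a program might read it. The `div`, `head`, and `p` elements and the namespace follow TEI conventions, but the sample text and the `chapter_outline` helper are invented for illustration; real TEI documents carry far more detail.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written fragment in the style of TEI markup (illustrative only).
TEI_SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Chapter One</head>
        <p>First paragraph of the chapter.</p>
        <p>Second paragraph of the chapter.</p>
      </div>
      <div type="chapter" n="2">
        <head>Chapter Two</head>
        <p>Only paragraph of the second chapter.</p>
      </div>
    </body>
  </text>
</TEI>
"""

# Prefix-to-namespace mapping used when searching the parsed tree.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def chapter_outline(xml_text):
    """Return (chapter number, heading, paragraph count) for each chapter div."""
    root = ET.fromstring(xml_text)
    outline = []
    for div in root.findall(".//tei:div[@type='chapter']", NS):
        head = div.find("tei:head", NS)
        paragraphs = div.findall("tei:p", NS)
        outline.append((div.get("n"), head.text, len(paragraphs)))
    return outline


print(chapter_outline(TEI_SAMPLE))
```

Because the chapter and paragraph breaks are marked explicitly, the program never has to guess where one ends and the next begins, which is exactly the guesswork that makes extracting text from a PDF so unreliable.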
If you’ve ever been forced by a website to prove you’re a human by transcribing a nearly unreadable word, you may have contributed to cleaning up some messy humanities data yourself. These “CAPTCHA” (“Completely Automated Public Turing test to tell Computers and Humans Apart”) challenges are used by websites to ensure that someone filling data into a form isn’t actually an automated program attempting to hack the system. A decoding system called reCAPTCHA manipulates this tool for research by taking images from old manuscripts in the process of being digitized and using them as challenges. Algorithms identify which words in a manuscript are most likely to have been wrongly transcribed, and these are outsourced to an army of unwitting Internet users. reCAPTCHA’s first project was cleaning up digitizations of the New York Times, but it’s now moving on to work on the millions of books scanned by Google.
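The general idea can be sketched in a few lines. This is not reCAPTCHA's actual code, and the specifics are assumptions: here a word is treated as suspect when two hypothetical transcription passes disagree, and a reading is accepted only once enough human answers agree. The function names, sample words, and three-vote threshold are all invented for illustration.

```python
from collections import Counter


def flag_suspect_words(ocr_a, ocr_b):
    """Return positions where two transcription passes disagree (assumed heuristic)."""
    return [i for i, (a, b) in enumerate(zip(ocr_a, ocr_b)) if a != b]


def resolve_by_vote(answers, min_votes=3):
    """Return the most common human answer if at least min_votes agree, else None."""
    if not answers:
        return None
    normalized = Counter(a.strip().lower() for a in answers)
    word, count = normalized.most_common(1)[0]
    return word if count >= min_votes else None


# Two made-up machine readings of the same four-word line.
ocr_a = ["the", "qu1ck", "brown", "fox"]
ocr_b = ["the", "quick", "brovvn", "fox"]
suspects = flag_suspect_words(ocr_a, ocr_b)

# Made-up answers typed by website visitors for each suspect position.
human_answers = {1: ["quick", "quick", "Quick", "quack"], 2: ["brown", "brawn"]}
resolved = {i: resolve_by_vote(human_answers.get(i, [])) for i in suspects}
print(suspects, resolved)
```

In this toy run the second word is settled by three matching answers, while the third stays unresolved until more people weigh in.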
Technical challenges like these are part of what draws computer scientists to get involved with the digital humanities. “Humanists aren’t inhibited by knowing what’s difficult to accomplish with computational techniques, so that can lead us to interesting new challenges,” said Holly Rushmeier, chair of Yale’s computer science department. Rushmeier’s work often involves visual data mining: transforming objects or images into digital data that can be manipulated and analyzed. Before coming to Yale, she worked on several cultural heritage projects at IBM. One involved creating a digital model of Michelangelo’s Pietà. The statue was partially destroyed by Michelangelo himself, and another sculptor attempted to repair it. An art historian who was trying to understand what drove Michelangelo to damage his own work wanted to see how the statue looked with the repaired pieces removed. The techniques developed in the course of the project proved useful not only to scholars, but also to the development of commercial 3D imaging products. “With this type of work, you have both the immediate benefit for the scholar, and all these spin-off benefits,” said Rushmeier.
But what about the “great unread”? Peter Leonard is helping some Swedish scholars use a program that can identify topics in a huge collection of novels. He showed me a page of results. Each topic is displayed as a cluster of words, their relative sizes depending on their importance in the text, with related passages on the side.
The topics aren’t suggested by the researcher, but are drawn out of the text by the program itself. “The wonderful and terrible thing about it is that it only shows you what it thinks is there,” Leonard said. There are limitations, of course—a computer can’t read 1984 and come out with a topic called “fascism” if it’s never directly mentioned—but topic modeling can expose unexpected connections.
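For readers curious about what happens under the hood, here is a toy sketch of one common family of topic models, latent Dirichlet allocation (LDA), fitted with a simple Gibbs sampler. The article does not say which program or model Leonard's collaborators use, so this is an assumption chosen for illustration; the function names, the four miniature "novels," and the parameter values are all hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    assign = []                                          # topic label of every token

    def bump(d, w, k, step):
        """Add or remove one token of word w in document d from topic k."""
        doc_topic[d][k] += step
        topic_word[k][w] += step
        topic_total[k] += step

    # Start from a random topic label for every token.
    for d, doc in enumerate(docs):
        assign.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assign[d]):
            bump(d, w, k, +1)

    # Repeatedly resample each token's topic given all the other tokens.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assign[d][i], -1)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                assign[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                bump(d, w, assign[d][i], +1)
    return topic_word


def top_words(topic_word, n=3):
    """List the n most frequent words assigned to each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Four made-up "novels," each reduced to a handful of words.
TOY_DOCS = [
    "marriage household wife money marriage household".split(),
    "wife money household marriage wife".split(),
    "sea ship storm sailor sea ship".split(),
    "storm sailor ship sea storm".split(),
]

print(top_words(lda_gibbs(TOY_DOCS)))
```

Notice that the sampler only ever shuffles words that actually appear in the texts. It can group "ship" with "storm," but it has no way to propose a label that is never written on the page, which is the limitation Leonard describes.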
Techniques like topic modeling can have larger implications for academic fields, and could potentially alter the canon itself. Leonard gave me the example of research done on nineteenth-century Swedish novels. When studying these novels, scholars had usually focused on the works of Ibsen and Strindberg, who wrote about the economic rights of women and life in bourgeois households, and had credited the two authors with leading the drive to replace Romanticism with realism during that period.
In the seventies, though, painstaking archival research showed that there was also a large cohort of female writers who contributed to this movement, examining topics that had previously been ascribed solely to men. Those researchers were using traditional techniques—reading as many books in the archive as possible. Forty years later, digital humanities scholars wanted to see if they could find similar results using topic modeling. They found that the same topics that turned up in Ibsen and Strindberg were also in the works of female writers.
“In that case, we had a cheat sheet to check our results against,” said Leonard. But it was proof that the technique could identify previously overlooked pockets of important scholarly work.
While nearly every faculty member I spoke to was excited about the potential of the digital humanities to analyze large volumes of humanities data, they all acknowledged certain limitations. The first problem is assembling a truly comprehensive collection of humanities data when not all of it is accessible.
“There’s a selection process at every stage,” Leonard explained. “First, not every writer gets their book published. Then, not every book gets purchased. Over the next hundred years, the library makes decisions about whether we should keep these books. Now the question is: which books do we want to send to Google?” Because information is lost at each step, scholars need to qualify their conclusions to acknowledge the way these missing pieces could bias their results.
Of course, there are humanities questions that this sort of approach simply can’t answer. For more contextualized and nuanced readings of particular texts, we need humans, not machines. “No one’s trying to say, Oh, it’s time to become a big data scientist and forget humanities training,” Leonard said. “But we can keep that incredible value as we use new techniques to engage with works that might have fallen outside the canon.”
Franco Moretti, a digital humanities pioneer and founder of the Stanford Literature Lab, snuck a jab at Yale into his well-known, now decade-old essay, “The Slaughterhouse of Literature,” calling close reading a “secularized theology…that has radiated from the cheerful town of New Haven over the whole field of literary studies.”
Things have changed since Moretti wrote that essay, but Yale’s English and Literature departments are still known for their emphasis on close reading, and the University has been reluctant to embrace the “distant reading” aspects of digital humanities research. “We have a deep commitment to close reading, and that is something that computers have not been good at doing yet: that very human act of reading, understanding, digesting, and making an argument,” said English professor Amy Hungerford, whose work in American literature has grown to encompass studies of changing media forms.
For Hungerford, the development of digital humanities at Yale, particularly in her department, is a “chicken-and-egg problem.” In order to attract digital humanities scholars, Yale needs to have the community, resources, and senior faculty that support them. But in order to invest in those resources, someone needs to lobby the administration and raise awareness about the need for Yale to make those investments.
Trip Kirkpatrick, a member of the Digital Humanities Working Group and a senior instructional technologist for Yale, agreed that the growth of the digital humanities at Yale hasn’t been as rapid as it has been at some peer institutions. “Yale is in many ways a structurally conservative institution,” said Kirkpatrick. “It’s been around for over three hundred years and expects to be around permanently, so it looks at things in terms of big time frames.”
Meanwhile, the digital world moves fast. Kirkpatrick pointed out that Wikipedia, a now indispensable resource for most college students, was only invented in 2001. The digital humanities is evolving at a similarly rapid pace, in a way that might be difficult for Yale to deal with under its current system. “What Yale needs to be working on is not necessarily making individual changes, but figuring out ways to adapt to change differently,” Kirkpatrick said. “We’re great on stability, not so much on agility.”
Kirkpatrick still hopes that digital humanities can be worked into the curriculum. He imagines an English department in which professors might list macroanalysis as one of many scholarly interests—right alongside, say, Anglo-Saxon poetry or Marxism—and a course on computational stylistics might be offered as an elective for undergrads.
There are already some scholars incorporating digital techniques and social science tools into their work, Kirkpatrick pointed out; they simply don’t publish in online journals devoted to digital humanities or consider it central to their research. There is a pitfall: having scholars use digital techniques without maintaining digital humanities’ separate identity makes it difficult to easily identify progress. “If digital text analysis falls alone in a forest, does anybody hear it?” Kirkpatrick asked. He believes that digital humanities will only be thought of as a self-standing field if scholars who clearly identify their work as digital humanities become successful.
I asked Leonard what separates the social sciences from the humanities if both begin to use the same tools. For him, what differentiates the disciplines isn’t so much methodology as the sort of questions they want to ask, but he acknowledged that the relationship can be “uneasy.”
“People will keep doing close readings forever,” Leonard said. “It’s just that some people are interested in supporting close readings with quantitative and algorithmic analysis of humanities data.”
Regardless of how the work is labeled, the new generation of scholars is clearly making a turn toward a more technological approach. Erin Maher, a senior at Yale, is a perfect example: she began her undergraduate years in the humanities before switching to computer science, and finally majoring in women’s, gender, and sexuality studies. “I thought, well, now I have to drop my interest in technology,” Maher said. But classes like Wexler’s that incorporate new media studies have allowed her to find an intersection between her two interests.
Maher’s senior project, under Wexler’s guidance, will study communities developed on the microblogging website Tumblr. In the past, Maher told me, anthropologists studying communities like Tumblr have tended to be “disconnected.” “They publish their work in places that the people they’re writing about will never have access to,” she explained.
Maher also works as a programmer for Yale’s Information Technology Services, helping develop educational tools such as Pnut, which will allow students to give lecturers feedback in real time. She’s hoping that in the future she will use her programming skills to help build more accessible scholarly publishing platforms online, building ladders in and out of the ivory tower.
It’s this ethos that particularly drives Chiodo in her work on the history of Dante scholarship. “We’ve got seven hundred-plus years of commentaries and scholarship, of blood, sweat, tears and tons of words expended on this author. I’m finding that digital tools are allowing me to say new things, to see the work in a new way, and that’s true of my colleagues as well,” Chiodo said. “That in and of itself offers huge possibility.”
Riding the elevator up through the stacks, I’m now hyper-aware of the “great unread” around me. Tracking down a particular volume of Chaucer, I feel myself taking part in a long tradition of skimming across the top of a vast reservoir of cultural output. While exploring the digital humanities, I’ve realized that in my past three years as an English major I’ve never been forced to truly examine the tools I’m using. While I may go on close reading Chaucer’s stanzas, I’ll do so with a better sense of other technologies of knowledge, other ways of seeing, that—for the moment—exist outside of my reach.