Text, Music, and Image as Digital Artifact

Lawrence Woof

© Nineteenth Century Studies. From Volume 15 (2001): 131–33.

The future, we are from time to time assured, is digital. In this, my first contribution as electronic resources review editor, I want to consider what this could mean in practice for scholars, by standing back and taking an overview of the broad directions in which scholarly electronic resources are moving, using the trajectory of the last ten years to speculate about the next.

It was around a dozen years ago that it became clear that the functionality of computers could bring a new clarity to the humanities in general and to textual studies in particular. In part, this was simply because texts became searchable, but it was also because of the very real benefits for scholarship provided by text-encoding conventions, principally SGML (and currently XML). Text encoding is the procedure whereby descriptive tags—invisible to the user but not to the computer—are placed within an electronic (i.e., machine-readable) document. The purpose of these tags is to describe and locate aspects of the text. The process of encoding a text makes explicit that text’s internal hierarchy, whereby, for instance, a novel might be thought of as a sequence of nested, hierarchical structures, such as a number of chapters, each containing one or more paragraphs, each in turn containing one or more sentences, and so on. This model of text does not necessarily represent a fundamental truth about the way in which texts work; however, it is highly attractive and useful in computing, mainly because, by referencing the hierarchical structures, the model allows texts to be searched in sophisticated, context-sensitive ways.
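The kind of context-sensitive search that such encoding makes possible can be sketched briefly. What follows is a minimal illustration only, using Python's standard XML parser. The sample text and the element names (novel, chapter, p) are invented for the purpose and are not drawn from SGML, the TEI, or any other actual scheme.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up encoded text. The element names are illustrative
# assumptions, not taken from any real encoding standard.
SAMPLE = """
<novel title="Sample">
  <chapter n="1">
    <p>The storm broke over the moor.</p>
    <p>She wrote a letter by candlelight.</p>
  </chapter>
  <chapter n="2">
    <p>The letter arrived after the storm.</p>
  </chapter>
</novel>
"""

def find_in_context(xml_text, word):
    """Return (chapter number, paragraph index, paragraph text) for each hit."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for chapter in root.iter("chapter"):
        # Paragraphs are numbered within their own chapter, so each hit
        # is located by its place in the hierarchy, not just by its words.
        for index, para in enumerate(chapter.findall("p"), start=1):
            text = "".join(para.itertext())
            if word.lower() in text.lower():
                hits.append((chapter.get("n"), index, text))
    return hits

hits = find_in_context(SAMPLE, "storm")
for chapter_n, para_index, text in hits:
    print(f"chapter {chapter_n}, paragraph {para_index}: {text}")
```

Because the search walks the tags rather than the bare string of characters, each result arrives already located within the hierarchy (here, chapter and paragraph), which a plain keyword search could not report.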

Inevitably, this model became formalized. By the late 1980s, the Text Encoding Initiative (TEI) had been formed and was beginning the fundamental task of determining a set of standards for the encoding of digitized documents. (This standardization would, e.g., allow all similarly encoded documents to be interrogated using a single complex search.) In parallel with the TEI movement, there developed an accompanying philosophy of text that consolidated this sense of clarity and control that was being achieved through text encoding. This philosophy held that a text in some sense was its structure, defining text as “an ordered hierarchy of content objects.”1 The centrality of the hierarchical structure to the text-encoding project in general became such that a text came to be thought of as predicated on this abstract structure. This position was built on the following, slightly uneasy proposition: If two people read the same novel in different editions (for instance, one version printed in twelve-point Times Roman, the other in ten-point Palatino), those people are nevertheless reading the same text. Allen Renear, David Durand, and Elli Mylonas usefully summarize this approach: “X and y are the same text if and only if they are the same ordered hierarchy of content objects. Therefore texts are ordered hierarchies of content objects.”2 What this Platonic abstraction necessarily ignores, of course, is the specificity of a text. Scholars have moved away from the agenda of New Criticism to study, in Don McKenzie’s phrase, “texts as recorded forms, and the processes of their transmission, including their production and reception” (that is, texts as historical documents, produced and read under specific circumstances).3
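The identity criterion quoted above can be restated as a small data model. This is a sketch under stated assumptions rather than anything proposed by Renear, Durand, and Mylonas: the class names ContentObject and Edition, the function same_text, and the sample sentences are all hypothetical, chosen only to show what the abstraction keeps and what it discards.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentObject:
    """One node in the hierarchy: a novel, a chapter, a paragraph."""
    kind: str
    text: str = ""
    children: tuple = ()   # ordered, so sequence matters

@dataclass(frozen=True)
class Edition:
    """A hypothetical printed edition: the hierarchy plus its typography."""
    content: ContentObject
    typeface: str
    point_size: int

def build_novel():
    # Built afresh on each call, so equality below is structural,
    # not a matter of two names pointing at one object.
    return ContentObject("novel", children=(
        ContentObject("chapter", children=(
            ContentObject("paragraph", "It was a quiet morning."),
            ContentObject("paragraph", "Nobody had yet come down."),
        )),
    ))

def same_text(a, b):
    """Apply the criterion: compare hierarchies, ignore presentation."""
    return a.content == b.content

times = Edition(build_novel(), "Times Roman", 12)
palatino = Edition(build_novel(), "Palatino", 10)

print(same_text(times, palatino))  # True: same ordered hierarchy
print(times == palatino)           # False: different material form
```

The second comparison is the one the abstraction sets aside: everything that distinguishes the two editions as objects lies outside the content field.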

This omission of the concrete does not stem from a need to reengage with the text-centered critical traditions associated, for example, with I. A. Richards. Rather, the decision to view (and theorize) text as structure has essentially been a pragmatic one, dictated by the limitations of the computer as a medium during the 1990s. Indeed, it is arguably only as these limitations start to recede that we become fully able to recognize their shaping influence on our conceptual framework.

Until very recently, computers have been able to process text efficiently only in its machine-readable form. The information that would have indicated most clearly the specificity of the text (usually a high-resolution image of the page of text in question) has been beyond the capacities of the available technology. To have done anything other than display machine-readable text would have required far too much from the available bandwidths and hard drives (not to mention too much processor time). However, the steady growth in processor power and bandwidth over the past decade has recently reached a critical threshold, and a new approach to digital resources is now possible. The scholar of the coming decade will be examining as a matter of course images of books, not just machine-readable transcriptions. This will mark a fundamental shift, a move away from digital abstractions and toward digital artifacts. In nineteenth-century terminology, we might think of this as a move away from a classical aesthetic toward a gothic one.

This change in how digital texts are presented to us will not lead automatically, I think, to the downgrading of the current model based on the ordered hierarchy of content objects. Rather, the change will allow the sense of the uniqueness and irreducibility of artifacts to be integrated into this abstraction: the structure of a literary work will become tightly linked to its material foundation. As a consequence, philosophies of texts—both those explicitly associated with text encoding and those that we might classify as belonging to “theory” more generally—will similarly reestablish themselves in association with material culture. There will necessarily be some loss of clarity as abstraction is recontextualized in the complexity of specificity, but I would argue that this will leave humanities computing all the richer.

But the greatest change provided by this technological leap will lie, not in theory, but in practice. Instead of searching only the machine-readable transcription, new kinds of search engine will interrogate the image files themselves. These engines will not simply search metadata (descriptive terms associated with a digital object) but rather allow access to the density of the image itself. We can imagine algorithms that, by analyzing the values of the pixels in the image, could search for a specific printer’s mark, or quality of paper, or particular shape or motif in an illustration. In other words, the next generation of digital resources could see all the book-centered skills and methods of the traditional bibliophile (those very practices that at times must have seemed distinctly at odds with humanities computing) returning to scholarship with redoubled force.
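One very simple form that such a pixel-level search might take is template matching: a small image of the motif is slid across the page image and compared at every position. The sketch below assumes grayscale images held as plain lists of rows (0 for black, 255 for white). The function name find_motif, the tolerance parameter, and the sample data are illustrative assumptions, and a working system would need to cope with scale, rotation, and noise in ways this does not.

```python
def find_motif(page, motif, max_total_difference=0):
    """Return the (row, column) of every window of `page` that matches `motif`.

    A window matches when the summed absolute difference between its
    pixels and the motif's pixels is at most `max_total_difference`.
    """
    page_h, page_w = len(page), len(page[0])
    motif_h, motif_w = len(motif), len(motif[0])
    matches = []
    for top in range(page_h - motif_h + 1):
        for left in range(page_w - motif_w + 1):
            difference = sum(
                abs(page[top + r][left + c] - motif[r][c])
                for r in range(motif_h)
                for c in range(motif_w)
            )
            if difference <= max_total_difference:
                matches.append((top, left))
    return matches

# A made-up 5x5 "page" containing a short diagonal stroke; the last
# segment of the stroke is slightly faded (250 and 5 instead of 255 and 0).
PAGE = [
    [255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255, 255,   0, 250],
    [255, 255, 255, 255,   5],
]
# The motif sought: a two-pixel diagonal.
MOTIF = [
    [0, 255],
    [255, 0],
]

print(find_motif(PAGE, MOTIF))                           # [(1, 1), (2, 2)]
print(find_motif(PAGE, MOTIF, max_total_difference=10))  # [(1, 1), (2, 2), (3, 3)]
```

The tolerance matters because no two impressions of a printer's mark are pixel-for-pixel identical: allowing a small difference lets the faded third instance be found as well.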

This move from abstracted data models to what we might call a digital materialism will not be confined to textual studies. Search engines will be designed to allow art historians to search for homologous qualities in images. Using techniques drawn from artificial intelligence, it will be possible to show a computer several images (say, of birds) and train it to search databases of images at the level of their pixels for similar items.

Musicology will similarly be changed through this shift toward digital artifacts. The machine-readable version of music—MIDI—will be supplemented by real-time analysis of CD-quality audio files. As with art, the historian of music will be able to search databases of actual recordings and of facsimiles of scores for homologous instances of melodic and/or harmonic material.
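On the machine-readable side, a search for homologous melodic material can be sketched by comparing intervals rather than absolute pitches, so that a theme is found even when it recurs in another key. The example assumes melodies given as lists of MIDI note numbers (60 being middle C). The function names and the sample melody are invented for illustration, and searching actual recordings would first require extracting pitches from the audio, which is not attempted here.

```python
def intervals(notes):
    """Convert a sequence of MIDI note numbers to successive semitone steps."""
    return [later - earlier for earlier, later in zip(notes, notes[1:])]

def find_melody(score, motif):
    """Return each index in `score` where `motif` begins, in any transposition."""
    if len(motif) < 2:
        raise ValueError("a motif needs at least two notes to define an interval")
    target = intervals(motif)
    steps = intervals(score)
    span = len(target)
    return [
        start
        for start in range(len(steps) - span + 1)
        if steps[start:start + span] == target
    ]

# A made-up melodic line: a four-note figure on C, then the same figure
# a fifth higher on G, followed by one further note.
SCORE = [60, 62, 64, 60, 67, 69, 71, 67, 65]
MOTIF = [60, 62, 64, 60]

print(find_melody(SCORE, MOTIF))  # [0, 4]
```

The second match, beginning on the fifth note, shares no pitches with the motif as written. It is found only because the comparison is made on the shape of the line.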

So what will be the consequences for scholarship of this move toward the digital artifact? Most immediately, the rise that has been seen in comparative thematic studies will continue: there will be new histories to write, chronicling the morphology of a given theme (from music, images, or texts). Textual studies in particular will become concerned with the physical dimension of literature. Across the humanities, the return to the artifact will at various levels encourage a return to history, both at the level of theory and at the level of practice, a development that can only be welcomed.

Notes

1. S. J. DeRose, D. G. Durand, E. Mylonas, and A. H. Renear, “What Is Text, Really?” Journal of Computing in Higher Education 1, no. 2 (1990): 3–26.

2. Allen H. Renear, David G. Durand, and Elli Mylonas, “Refining Our Notion of What Text Really Is: The Problem of Overlapping Hierarchies,” in vol. 00 of Research in Humanities Computing: Selected Papers from the ALLC/ACH Conference, ed. Susan Hockey and Nancy Ide (Oxford: Clarendon; New York: Oxford University Press, 1996), 263–80, 268. This essay also provides a succinct exploration of various critiques of this position.

3. D. F. McKenzie, Bibliography and the Sociology of Texts, Panizzi Lectures, 1985 (London: British Library, 1986), 4.