Text, Music, and Image as Digital Artifact
Lawrence Woof
© Nineteenth Century Studies. From Volume 15 (2001): 131–33.
The future, we are from time to time
assured, is digital. In this, my first contribution as electronic resources
review editor, I want to consider what this could mean in practice for
scholars, by standing back and taking an overview of the broad directions in
which scholarly electronic resources are moving, using the trajectory of the
last ten years to speculate about the next.
Around a dozen years ago,
it became clear that the functionality of computers could
bring a new clarity to the humanities in general and to textual studies in
particular. In part, this was simply because texts became searchable, but it
was also because of the very real benefits for scholarship provided by
text-encoding conventions, principally SGML (and currently XML). Text
encoding is the procedure whereby descriptive tags—invisible to the user
but not to the computer—are placed within an electronic (i.e.,
machine-readable) document. The purpose of these tags is to describe and locate
aspects of the text. The process of encoding a text makes explicit that text’s
internal hierarchy, whereby, for instance, a novel might be thought of as a
sequence of nested, hierarchical structures, such as a number of chapters, each
containing one or more paragraphs, each in turn containing one or more
sentences, and so on. This model of text does not necessarily represent a
fundamental truth about the way in which texts work; however, it is highly
attractive and useful in computing, mainly because, by referencing the
hierarchical structures, the model allows texts to be searched in
sophisticated, context-sensitive ways.
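The kind of context-sensitive search described above can be illustrated with a minimal sketch. Everything in it is an assumption made for the purpose of illustration: the tag names (`novel`, `chapter`, `p`, `s`), the sample sentences, and the `find_word` function are invented here and are not drawn from SGML, XML, or any particular encoding standard. The sketch uses only Python's standard XML parser to show how a search can report where in the hierarchy a word occurs, or be confined to one part of that hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: the tag names and sample text are illustrative only.
ENCODED = """
<novel title="Sample">
  <chapter n="1">
    <p><s>The storm broke at dusk.</s><s>Nobody slept.</s></p>
  </chapter>
  <chapter n="2">
    <p><s>Morning was calm.</s></p>
    <p><s>A second storm gathered by noon.</s></p>
  </chapter>
</novel>
"""


def find_word(root, word, chapter=None):
    """Return (chapter, paragraph, sentence) positions of sentences containing word.

    If chapter is given, only that chapter is searched.
    """
    hits = []
    for chapter_element in root.iter("chapter"):
        number = chapter_element.get("n")
        if chapter is not None and number != chapter:
            continue  # skip chapters outside the requested context
        for p_index, paragraph in enumerate(chapter_element.iter("p"), start=1):
            for s_index, sentence in enumerate(paragraph.iter("s"), start=1):
                if word.lower() in (sentence.text or "").lower():
                    hits.append((number, p_index, s_index))
    return hits


root = ET.fromstring(ENCODED)
hits = find_word(root, "storm")                          # search the whole novel
hits_in_chapter_2 = find_word(root, "storm", chapter="2")  # search one chapter only
print(hits)
print(hits_in_chapter_2)
```

A plain string search would find the same two occurrences of the word but could not say that one falls in the first sentence of chapter 1 and the other in the second paragraph of chapter 2; that positional information comes entirely from the tags.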
Inevitably, this
model became formalized. By the late 1980s, the Text Encoding Initiative (TEI)
had been formed and was beginning the fundamental task of determining a set of
standards for the encoding of digitized documents. (This standardization would,
e.g., allow all similarly encoded documents to be interrogated using a single
complex search.) In parallel with the TEI movement, there developed an
accompanying philosophy of text that consolidated this sense of clarity and
control that was being achieved through text encoding. This philosophy held
that a text in some sense was its structure, defining text as “an
ordered hierarchy of content objects.”1 The centrality of the hierarchical structure to the text-encoding
project in general became such that a text came to be thought of as predicated
on this abstract structure. This position was built on the following, slightly
uneasy proposition: If two people read the same novel in different editions—for
instance, one version printed in twelve-point Times Roman, the other in ten-point
Palatino—those people are nevertheless reading the same text. Allen Renear,
David Durand, and Elli Mylonas usefully summarize this approach: “X and y are
the same text if and only if they are the same ordered hierarchy of content
objects. Therefore texts are ordered hierarchies of content objects.”2 What this
Platonic abstraction necessarily ignores, of course, is the specificity
of a text. Scholars have moved away from the agenda of New Criticism to study, in
Don McKenzie’s phrase, “texts as recorded forms, and the processes of their
transmission, including their production and reception”—that is, texts as
historical documents, produced and read under specific circumstances.3
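The identity criterion quoted above from Renear, Durand, and Mylonas can be made concrete in a few lines of code. The sketch below is illustrative only: the `Edition` class, the `same_text` function, and the sample sentences are assumptions introduced here, not part of the TEI or of the cited essays. It models a text as nested tuples (chapters containing paragraphs containing sentences) and shows what the criterion counts and what it leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    """A hypothetical edition: abstract content plus material presentation."""
    content: tuple    # nested tuples: chapters -> paragraphs -> sentences
    typeface: str
    point_size: int


def same_text(x, y):
    """Apply the quoted criterion: compare only the ordered hierarchy of content."""
    return x.content == y.content


# Invented sample: two chapters, the first with two paragraphs.
NOVEL = (
    (("It was a dark night.", "Rain fell."), ("The house was silent.",)),
    (("Morning came.",),),
)

times = Edition(NOVEL, "Times Roman", 12)
palatino = Edition(NOVEL, "Palatino", 10)
reordered = Edition(tuple(reversed(NOVEL)), "Times Roman", 12)

print(same_text(times, palatino))   # True: typography plays no part in the test
print(same_text(times, reordered))  # False: the order of the hierarchy matters
print(times == palatino)            # False: as objects, the two editions differ
```

The last line is the point at issue: the typeface and point size are recorded, but the criterion never consults them, which is exactly the specificity that the abstraction sets aside.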
This omission of
the concrete does not stem from a need to reengage with the text-centered
critical traditions associated, for example, with I. A. Richards. Rather, the
decision to view (and theorize) text as structure has essentially been a
pragmatic one, dictated by the limitations of the computer as a medium during
the 1990s. Indeed, it is arguably only as these limitations start to recede
that we become fully able to recognize their shaping influence on our
conceptual framework.
Until very
recently, computers have been able to process text efficiently only in its
machine-readable form. The information that would have indicated most clearly
the specificity of the text (usually a high-resolution image of the page of
text in question) has been beyond the capacities of the available technology.
To have done anything other than display machine-readable text would have
demanded far more bandwidth and disk storage (not to mention processor time)
than was available. However, the steady growth in processor power
and bandwidth over the past decade has recently reached a critical threshold,
and a new approach to digital resources is now possible. The scholar of the
coming decade will, as a matter of course, be examining images of books,
not just machine-readable transcriptions. This will mark a fundamental shift, a
move away from digital abstractions and toward digital artifacts. In
nineteenth-century terminology, we might think of this as a move away from a
classical aesthetic toward a gothic one.
This change in
how digital texts are presented to us will not lead automatically, I think, to
the downgrading of the current model based on the ordered hierarchy of content
objects. Rather, the change will allow the sense of the uniqueness and
irreducibility of artifacts to be integrated into this abstraction: the
structure of a literary work will become tightly linked to its material
foundation. As a consequence, philosophies of texts—both those explicitly
associated with text encoding and those that we might classify as belonging to
“theory” more generally—will similarly reestablish themselves in association
with material culture. There will necessarily be some loss of clarity as
abstraction is recontextualized in the complexity of specificity, but I would
argue that this will leave humanities computing all the richer.
But the greatest
change provided by this technological leap will lie, not in theory, but in
practice. Where it has so far been possible to search only the machine-readable
transcription, new kinds of search engine will interrogate the image
files themselves. These engines will not simply search metadata (descriptive
terms associated with a digital object) but rather allow access to the density
of the image itself. We can imagine algorithms that, by analyzing the values of
the pixels in the image, could search for a specific printer’s mark, or quality
of paper, or particular shape or motif in an illustration. In other words, the
next generation of digital resources could see all the book-centered skills and
methods of the traditional bibliophile—those very practices that at times must
have seemed distinctly at odds with humanities computing—returning to
scholarship with redoubled force.
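A very simple version of such a pixel-level search can be sketched as follows. This is a toy under stated assumptions, not a description of any existing search engine: the `find_mark` function, the tiny grayscale grids, and the X-shaped "printer's mark" are all invented for illustration, and the method shown (sliding a small template across the page and summing absolute pixel differences) is only one elementary technique among many that a real system might use.

```python
def find_mark(page, mark, max_difference=0):
    """Slide mark over page and return the (row, column) of every window whose
    summed absolute pixel difference from mark is at most max_difference."""
    page_height, page_width = len(page), len(page[0])
    mark_height, mark_width = len(mark), len(mark[0])
    matches = []
    for top in range(page_height - mark_height + 1):
        for left in range(page_width - mark_width + 1):
            difference = sum(
                abs(page[top + r][left + c] - mark[r][c])
                for r in range(mark_height)
                for c in range(mark_width)
            )
            if difference <= max_difference:
                matches.append((top, left))
    return matches


# Toy grayscale "page" (0 = black ink, 255 = white paper); values are invented.
PAGE = [
    [255, 255, 255, 255, 255, 255],
    [255,   0, 255,   0, 255, 255],
    [255, 255,   0, 255, 255, 255],
    [255,   0, 255,   0, 255, 255],
    [255, 255, 255, 255, 255, 255],
]

# Hypothetical 3x3 "printer's mark": a small X shape.
MARK = [
    [  0, 255,   0],
    [255,   0, 255],
    [  0, 255,   0],
]

matches = find_mark(PAGE, MARK)
print(matches)  # [(1, 1)]: the mark sits one pixel down and one pixel across
```

Raising `max_difference` above zero lets the search tolerate small variations between the mark and the page, which a scan of a real printed page would certainly require.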
This move from
abstracted data models to what we might call a digital materialism
will not be confined to textual studies. Search engines will be designed to
allow art historians to search for homologous qualities in images. With
techniques drawn from artificial intelligence, it will be possible to show a
computer several images (say, of birds) and train it to search databases of
images at the level of their pixels for similar items.
Musicology will
similarly be changed through this shift toward digital artifacts. The
machine-readable version of music—MIDI—will be supplemented by real-time
analysis of CD-quality audio files. As with art, the historian of music will be
able to search databases of actual recordings and of facsimiles of scores for
homologous instances of melodic and/or harmonic material.
So what will be
the consequences for scholarship of this move toward the digital artifact? Most
immediately, the rise that has been seen in comparative thematic studies will
continue: there will be new histories to write, chronicling the morphology of a
given theme (from music, images, or texts). Textual studies in particular will
become concerned with the physical dimension of literature. Across the
humanities, the return to the artifact will encourage a return to history,
in both theory and practice, a development that can only be welcomed.
1. S. J. DeRose, D. G. Durand, E. Mylonas, and A. H.
Renear, “What Is Text, Really?” Journal of Computing in Higher Education
1, no. 2 (1990): 3–26.
2. Allen H. Renear, David G. Durand, and Elli
Mylonas, “Refining Our Notion of What Text Really Is: The Problem of
Overlapping Hierarchies,” in vol. 00 of Research in Humanities Computing:
Selected Papers from the ALLC/ACH Conference, ed. Susan Hockey and Nancy
Ide (Oxford: Clarendon; New York: Oxford University Press, 1996), 263–80, 268.
This essay also provides a succinct exploration of various critiques of this
position.
3. D. F. McKenzie, Bibliography and the Sociology
of Texts, Panizzi Lectures, 1985 (London: British Library, 1986), 4.