The following is an extract of a larger talk (slides) I recently gave at the University of Wisconsin-Madison’s Digital Humanities Research Network. At some point I’ll write out the full talk, but recent provocations by folks I admire (e.g. James Baker, Purdom Lindblad, and Michelle Moravec) pushed me to put some of the words to page.
Over the past couple of years I’ve focused a great deal of effort on communicating the concept of data in the Humanities. Attendant to efforts of this kind are common questions about why it matters. Why call a digital image data? What’s the point of thinking about texts as data? In a pragmatic sense, what does it do for me to call a thing by another name when its current label has held up pretty well over not just recent memory, but centuries really? The effort to see a digital object as data and to reckon with it as such is as much a functional consideration as it is an ethical imperative.
In plain terms, a functional consideration takes into account how to leverage computation to work with data to explore a question. The ethical imperative refers to a process of translating your own disciplinary training (how to work with objects, how to interpret their structure, their provenance, their authenticity, and the labor that has been invested in their manufacture and relative permanence) so that it has equal purchase in a digital environment. Translating these imperatives leads to a critical data praxis. A critical data praxis leads to better questions.
So first, the functional angle. Working with a digital object as data is a functional consideration insofar as it moves you toward an engagement that is predicated on the object’s actual materiality. Digital photos, videos, audio, and images are composed of vast numerical differences that are recorded, stored, and made interpretable to a computer, its components, and, at the end of the line, you.
Recognizing the nature of the recorded difference that underlies the familiar, surface-level representation you see on a computer screen is the first step toward a functional orientation. That orientation allows you to leverage the difference manifest in the data computationally, extending research questions that you might typically accord to a paper book or a film. The primary difference that a functional orientation engenders at the level of the research question is one of scale: in a digital environment your questions can be extended to both the microscopic and the macroscopic level.
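To make the idea of recorded numerical difference a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the 3x3 grid stands in for a tiny, hypothetical grayscale image, and the two function names are invented for this example. It shows one question asked at the macroscopic level (how bright is the image overall?) and one at the microscopic level (how much does brightness change from one pixel to the next?).

```python
# A hypothetical 3x3 grayscale "image": each number is one pixel's
# brightness, from 0 (black) to 255 (white).
image = [
    [0, 128, 255],
    [64, 128, 192],
    [255, 128, 0],
]


def mean_brightness(pixels):
    """Macroscopic view: the average brightness across the whole image."""
    values = [value for row in pixels for value in row]
    return sum(values) / len(values)


def horizontal_differences(pixels):
    """Microscopic view: the change in brightness between each pixel
    and its right-hand neighbor, row by row."""
    return [
        [row[i + 1] - row[i] for i in range(len(row) - 1)]
        for row in pixels
    ]


print(mean_brightness(image))
print(horizontal_differences(image))
```

A real photograph would simply be a much larger grid, usually with three numbers per pixel for color, but the same two kinds of question would apply.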
In Morgenstern’s Spectacles or the Importance of Not-Reading, Martin Mueller wrote, “Every surrogate has its own query potential, which for some purposes may exceed that of the original.” I think that one of the most fruitful ways to explore that potential is to ruminate on the “affordances” that the data offer. In short, it is useful from a functional perspective to ask the question, “what characteristics of these data define their possible use?”
Identification and exploration of these characteristics can be guided in a structured way by considering Lev Manovich’s five principles of new media, which are described at length in The Language of New Media. Typically I focus on three of the principles: (1) Numerical Representation, (2) Modularity, and (3) Automation. Madeleine Sorapure provides a super helpful breakdown of each.
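As a rough illustration, and only one loose reading of the three principles named above rather than Manovich’s or Sorapure’s own formulation, the following Python sketch applies each to a short piece of text. The sample string is the opening of the Mueller quotation; the variable and function names are hypothetical.

```python
from collections import Counter

text = "Every surrogate has its own query potential"

# Numerical representation: every character is stored as a number
# (here, its Unicode code point).
code_points = [ord(character) for character in text]

# Modularity: the text can be broken into discrete units (here, words)
# that can be handled independently of one another.
words = text.lower().split()


# Automation: once the units are numbers and modules, a routine can
# operate on them without manual effort (here, tallying word lengths).
def word_length_counts(tokens):
    return Counter(len(token) for token in tokens)


print(code_points[:5])
print(words)
print(word_length_counts(words))
```

The same three moves (see the numbers, find the units, automate an operation over them) carry over to images, audio, and video, with pixels or samples taking the place of characters and words.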
With a functional orientation achieved, we come to the ethical imperative. Seeing and working with an object as data requires figuring out how to extend the ethics that many of you learn as you train to become Humanists in your respective fields. At the base of many Humanities disciplines, what we share are the sensibilities imparted to us as we consider how to work with the material evidence of Human activity, interest, and concern. Speaking from my own disciplinary training, I know that Historical inquiry hinges upon accessing, evaluating, interpreting, documenting, and developing arguments predicated upon primary sources.
Drawing from a common textbook used to help acclimate undergraduate students to the discipline of History, A Pocket Guide to Writing in History, we see that Historians-to-be are instructed to consider who the author of a primary source is, why they created the source, to try and evaluate the intended audience, to suss out unspoken assumptions, to try and identify bias, and to think about its likely historical reception.
When we add the notion of the paratext, common to literary interpretation, we round out an approach to studying the past that focuses on the message as well as the medium, or in other words on the express as well as the latent information present in any given object.
These considerations are marshalled in support of the development of arguments, which are in turn contextualized by comparing and contrasting prior arguments evidenced in secondary sources. Common conventions for working with objects allow arguments to be proposed based on their interpretation and, accordingly, for those arguments to be tested by peers in that disciplinary community. In the end, existing praxis is based on notions of integrity that make clear the intent of an argument. There is certainly precedent in the Humanities that aligns well with issues like reuse, reproducibility, and transparency that are discussed in more traditionally data-intensive fields.
The challenge we face as Humanists when working computationally with data is to adapt our praxis so that it is as well suited to evaluating primary data as it is to evidencing our work with that data, so that the work can be evaluated by intended audiences at a minimum and by the most heterogeneous audiences at a maximum. It’s not as difficult a stretch as you might imagine, and despite some characterizations of anachronistic Humanistic practice, we have the experience and sensibilities to make this happen relatively easily. We just require a ramp to new vocabulary and an awareness of the corollary structures and labor involved in the organization and production of the materials that form the axes on which our research spins.
We know how to work critically with physical objects. If we work with digital objects, it is an ethical imperative to extend those considerations to ensure that research claims are developed, extended, and evaluated properly. Furthermore, this effort extends our ability to acknowledge the labor invested in the creation of the data that we use and the potential impact that the forces driving that investment have on the repository of data that is available to us.