Infotecarios Interview: Thoughts on DH and Librarianship

interview cross posted at Infotecarios in Spanish
interview posted in English here on request

Many thanks to Natalie Baur for the request to talk about DH and Librarianship for Infotecarios. Hoping that some of what is said will be useful to folks seeking to get more involved in DH and librarianship more generally!

Tell us a little bit about your academic background, your experience, and your professional interests that have brought you to being a Digital Humanities Librarian at MSU.

My path toward librarianship began while pursuing a Master's degree in History at San Francisco State University. Leading up to that point I had wandered a bit, working in sales for a software company and as an English teacher in two different countries. When I decided to attend graduate school for my History degree my intention was to earn a PhD (which I may still do someday), but I had always been curious about working in archives or libraries. It was during this time that I became aware of the Hispanic Association of Colleges and Universities (HACU) National Internship Program. The program provides financial as well as administrative support in connecting Hispanic students with concrete work experience at Federal agencies throughout the United States. Through it I secured internships at the National Archives (NARA) and the Library of Congress.

These experiences were formative in the sense that they provided the means to get a foot onto the path of librarianship and they both had core digital components – digitization in the case of NARA and digital pedagogy as well as digital preservation at the Library of Congress. I was lucky to be able to convert my Library of Congress internship into a full time job. I completed my graduate degree in History, worked at the Library of Congress for a time, and then made a decision to earn a graduate degree in Library and Information Science.

Having had such a great experience with HACU I looked for similar diversity-oriented support systems to help develop my skills as a librarian. This led me to the American Library Association Spectrum program, as well as the Association of Research Libraries Initiative to Recruit a Diverse Workforce. I was lucky to apply to and be selected by both programs. Going into library school with membership in both programs provided financial support, but perhaps most importantly it tied me into a high achieving group of diverse individuals. This social aspect provided an invaluable network as I began my studies at what turned out to be a wonderful program at the University of Illinois at Urbana-Champaign. If I had any advice to give about library school I would stress the value of work experience. It is essential that you earn it, ideally in a setting that you would like to work in after graduate school. I would also stress thinking creatively about ways that coursework can be bent toward creating products that have a direct impact on the job you hold, on the job you would like to have, or, if neither is possible, that at the very least yield a piece of research you can share with your intended professional community.

I am nearing the completion of my first year as Digital Humanities Librarian at Michigan State University. My professional interests in History, digital humanities, and data curation have so far served me well. Inside the library I have been working on finding data that resides in our collections, often the product of digitization initiatives, and repackaging it and making it available in such a way that it is more easily usable in digital humanities research. I do a lot of digital humanities and data curation teaching and consultation for both faculty and graduate students in collaboration with campus partners inside and outside of the library. The same goes for supporting community-building events like brown bags and THATCamps. On the research side, the first year has focused on digital humanities pedagogy, humanities data, and literary analysis at scale.

As a librarian, how would you define Digital Humanities? How do you see it developing in the US (and beyond)?

I define Digital Humanities as an interdisciplinary approach to utilizing methods and tools, often computational in nature, to formalize, extend, and refine (inter)disciplinary questions. As with anything interdisciplinary I believe that use of the methods and tools common to this type of inquiry must be predicated on responsible use (e.g., engagement with the literature that gave birth to the methods enacted through computational tools). As a pragmatic concern I see the digital aspect of Humanities research growing, if only because the material of cultural production is increasingly encoded digitally. Consider the example of a Historian seeking to understand the early 21st century 30 years from now. How will they work with a resource like a web archive without fairly robust skill in working with digital content?

What role do you see GLAMs having in DH work? And librarians and archivists?

As GLAMs have always done, they have a key role in providing materials that humanists can use in their research – in the case of digital humanities – data. The experience of providing this data in usable form – data cleaning and preparation to serve a particular end – situates librarians and archivists as prime educators for this invaluable step in the research process. Digital Humanists may not always work with GLAM data, but nine times out of ten they will need to clean and prepare their data. Librarians and archivists can and should also serve a primary role in communicating core principles of data curation. A data curation perspective is necessary to ensure that data underlying a given project remains accessible and usable. Furthermore this perspective lends itself well to advising on research project documentation. Without this added layer of documentation it becomes difficult to manage a project well and to communicate it in such a way that it can be peer reviewed effectively.

What advice would you give to an aspiring DH Librarian? What tools, strategies and concepts will they need to bring to the table?

Be honest. Be humble. Be brave. Remain curious. There is too much to know and you will never know it all. Full stop. If you cultivate these four sensibilities you’ll insulate yourself well. A big part of doing this DH thing is learning how to be comfortably vulnerable about things you don’t know, confident in the things you do know, and cognizant of the types of challenges that are best met by bringing a group of people together to tackle. The grand challenges! Exciting, no?

Lastly, share a little bit about your typical day as DH Librarian. What are the highlights and challenges to what you do?

Always something to do! I’ll just run through today. This morning I met with a research group in Linguistics to advise on research data management. I will likely spend a couple of hours following up on things that came up during that meeting. I’m particularly excited about trying to figure out how to archive data from a mobile app. Following that I’ll put some work in on a digital humanities needs assessment, evaluate whether I have a good chance at a grant that’s been on my mind, work on planning digital humanities events for the next semester, and probably mess around cleaning, preparing, and visualizing data that I plan to promote next semester.

AHA 2015: Data for Historical Research

At Kalani Craig’s invitation I had the great pleasure of joining a rockstar cast of instructors for the AHA Getting Started in Digital History workshop series. During my workshop, “Data for Historical Research” (slides below), I cast a pretty wide net.

The general purpose was to:

  1. recast common historical objects of inquiry (audio, text, video, images) as data
  2. define data
  3. highlight the affordances that data offer for extending a research question
  4. talk about “object essentialism” and what it might cause us to miss
  5. discuss structured vs. unstructured data
  6. speak truth about work that goes into cleaning and preparing data
  7. discuss use cases
  8. talk about relationship between disciplinary knowledge and technical knowledge
  9. give resources galore


As always, one of the best parts of teaching is the questions you get during and after. I’m going to gloss a couple below.

  • Doesn’t there seem to be continuity between data prep and cleaning in digital scholarship and how Historians have long gathered resources, organized them, and utilized them for analog research?

I would say enthusiastically yes! A great deal of continuity. Whether explicit or implicit, disciplinary training imparts a high degree of technical fluency for working with a wide array of different sources, organizing them, and making sense of them. If I were to highlight one important distinction I would say that the work of cleaning and prepping data for a digital project is predicated on different data formatting requirements depending on the methods and tools you would want to apply to them. So if you wanted to do a bit of topic modeling as well as some network analysis on a set of letters, chances are you are going to need to do different prep work for each. Knowing what this prep work will entail in advance of gathering your data is essential. Best to save as much sanity as possible.
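To make the point about method-specific prep concrete, here is a minimal Python sketch that shapes the same letters two ways. The letters, the field names (`sender`, `recipient`, `text`), and the tiny stopword list are all invented for illustration; real tools will have their own input requirements.

```python
import re
from collections import Counter

# Hypothetical sample letters; names and field labels are assumptions.
letters = [
    {"sender": "A. Writer", "recipient": "B. Poet",
     "text": "The harvest was poor this year."},
    {"sender": "B. Poet", "recipient": "A. Writer",
     "text": "Poor harvest, but the poems are good."},
    {"sender": "A. Writer", "recipient": "B. Poet",
     "text": "Send the poems."},
]

# Tiny illustrative stopword list, not a real one.
STOPWORDS = {"the", "was", "this", "but", "are"}


def to_edge_list(letters):
    """Network analysis prep: count letters per (sender, recipient) pair."""
    return Counter((l["sender"], l["recipient"]) for l in letters)


def to_token_lists(letters):
    """Topic modeling prep: lowercase word tokens per letter, minus stopwords."""
    return [
        [t for t in re.findall(r"[a-z]+", l["text"].lower())
         if t not in STOPWORDS]
        for l in letters
    ]


edges = to_edge_list(letters)
tokens = to_token_lists(letters)
```

The edge list discards the letter text entirely, while the token lists discard who wrote to whom. That difference is why it helps to know the target method before gathering data.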

On the note of preserving sanity. When embarking on the data prep phase of a digital project it’s best to take small steps. For example if you want to map a correspondence network of 19th century writers, try to prep the data for as small a set of letters that represent the full range of features you want to capture as possible. Say, 10 letters. Map them.

Did it work? Yes? Awesome.

Didn’t work? Sads.

Honestly, it is better to realize that your formatting is not working with 10 letters than to get to the end of a line of 400 letters and try to reverse engineer where you went wrong and why the dang method and tool aren’t working for you. Also, personal plug – talk to a librarian; they are always happy to help.
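One way to choose that small pilot set is to pick letters greedily until every feature you care about shows up at least once. The sketch below is only an illustration: the letters, the field names, and the idea of treating "full date" and "partial date" as separate features are assumptions, and `pick_pilot_sample` is a hypothetical helper, not part of any tool.

```python
def letter_features(letter):
    """Return the set of features a letter exhibits (hypothetical definition)."""
    feats = {f for f in ("sender", "recipient", "place") if letter.get(f)}
    date = letter.get("date", "")
    if len(date) == 10:          # e.g. 1850-01-02
        feats.add("date_full")
    elif date:                   # e.g. 1852
        feats.add("date_partial")
    return feats


def pick_pilot_sample(records, features_of, limit=10):
    """Greedily pick up to `limit` records that together cover the most features."""
    remaining = list(records)
    covered, sample = set(), []
    while remaining and len(sample) < limit:
        # Take the record that adds the most not-yet-covered features.
        best = max(remaining, key=lambda r: len(features_of(r) - covered))
        gain = features_of(best) - covered
        if not gain:             # nothing new left to cover
            break
        sample.append(best)
        covered |= gain
        remaining.remove(best)
    return sample, covered


# Hypothetical letters for illustration only.
letters = [
    {"id": 1, "sender": "A", "recipient": "B", "date": "1850-01-02", "place": ""},
    {"id": 2, "sender": "A", "recipient": "C", "date": "", "place": "Boston"},
    {"id": 3, "sender": "B", "recipient": "C", "date": "1851-03-04", "place": ""},
    {"id": 4, "sender": "C", "recipient": "A", "date": "1852", "place": "London"},
]

sample, covered = pick_pilot_sample(letters, letter_features)
```

Here two letters are enough to exercise every feature, so those two are the ones to map first.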

  • How do you work with people who don’t think of their work as digital in nature?

This happens a lot. Whether or not someone thinks of their work as digital, in the best cases, they are generally curious enough to seek me out and ask about x thing that they heard about, or to request my definition of DH. Generally I find the best way to make this interaction beneficial is to ask a lot of questions geared toward what this person is currently researching and what types of research materials they work with. Once you have the research questions in hand it makes it easier to frame utilization of various digital approaches and the methods they encapsulate in a manner that makes sense – optimally this familiarizes an approach like network analysis in such a way that it can be more readily seen as a way of extending a research question rather than as some sparkly spaghetti visualization thing.

Word to the wise, if you are seeking to learn more about DH I’d advise that you lead with your research questions. This will lead to a more productive exchange for all parties!


Exploring Community Cookbooks

cross posted at MSU Libraries DSC Sandbox

Digital Humanists inside and outside of libraries make use of cultural heritage institution collection metadata in digital projects. Examples of use can be seen in the classroom, in research that seeks to gain insight into scholarly production in the field of History, and in library attempts to gain insight into aspects of their collections like manuscript provenance and the interdisciplinary character of monograph holdings. At MSU Libraries we are moving toward supporting this type of inquiry by preparing and making available data that correspond to the library catalog as a whole and subsets of that data that correspond to holdings in Special Collections. It is worth noting at the outset that the work of preparing this data is made possible by a wide cast of key players across the library that contribute programming, cataloging, metadata, and subject area expertise.

The rationale for creating subsets of the larger dataset is that metadata corresponding to a Special Collection, what we might call a ‘collection of distinction’ around these parts, provides information about something that is representative of a unique body of materials. As representative of a unique body of materials, the reasoning follows that insight derived from the metadata at scale can in turn be used to support and even extend a wide range of research questions that might be asked from these materials on an item by item basis. In effect, it allows application of a macroscopic lens as well as microscopic lens to library collections.

During a recent meeting Autumn Faulkner brought the Community Cookbook collection to my attention and it seemed like a good candidate for initial exploration.

Some basic information about the collection records:

  • Overall No. Records = 4366
  • Records with Publisher = 3078
  • Records with Country = 4366
  • Records with City + State/Country = 3121
  • Records with State alone = 1245
  • Records with Publication Date = 2770
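Counts like these can be produced with a short script once the records are in tabular form. The following Python sketch assumes each record is a dictionary with hypothetical keys such as `publisher` and `pub_date`; it is not the script used for the numbers above.

```python
from collections import Counter


def field_coverage(records, fields):
    """Count how many records have a non-empty value for each field."""
    counts = Counter()
    for rec in records:
        for field in fields:
            if str(rec.get(field) or "").strip():
                counts[field] += 1
    return {field: counts[field] for field in fields}


# Hypothetical records for illustration only.
records = [
    {"publisher": "Ladies Aid Society", "country": "US", "pub_date": "1912"},
    {"publisher": "", "country": "US", "pub_date": None},
    {"publisher": "  ", "country": "US", "pub_date": "1950"},
]

coverage = field_coverage(records, ["publisher", "country", "pub_date"])
```

Treating whitespace-only values as empty matters here, since catalog exports often contain blank-looking fields.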

After getting the catalog records (thanks Autumn Faulkner), I used a bit of Python (thanks Devin Higgins for the help there) to extract a number of pieces of data from the records.


Since then I’ve embarked on use of a whole lot of OpenRefine and Excel for data cleaning and normalization. This basically entails wrangling data into a consistent format. For example, geographic information representation in catalog records is highly variable – a lot of Lans. mi, Lansing Michigan, Lansing, Michig., Lansing Mich., etc., rather than dedication to Lansing, MI. Across thousands of records, a not insignificant amount of love and care goes into getting data like this into such a state that it is amenable to allowing exploration of say, a question that is predicated on being able to map a special collection across time and space using a tool like Palladio.
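The actual cleanup was done in OpenRefine and Excel. Purely as an illustration of the same idea, here is a minimal Python sketch that maps the Lansing variants above onto one form. The lookup tables are hypothetical and hand-built; anything unrecognized is returned as `None` so it can be reviewed by hand rather than guessed at.

```python
import re

# Hypothetical variant tables, built by hand from forms seen in the data.
CITY_VARIANTS = {"lans": "Lansing", "lansing": "Lansing"}
STATE_VARIANTS = {"mi": "MI", "mich": "MI", "michig": "MI", "michigan": "MI"}


def normalize_place(raw):
    """Return 'City, ST' for a recognized variant, else None for manual review."""
    # Lowercase and drop punctuation so 'Lans. mi' becomes ['lans', 'mi'].
    tokens = re.findall(r"[a-z]+", raw.lower())
    if len(tokens) != 2:
        return None
    city = CITY_VARIANTS.get(tokens[0])
    state = STATE_VARIANTS.get(tokens[1])
    if city is None or state is None:
        return None
    return f"{city}, {state}"


variants = ["Lans. mi", "Lansing Michigan", "Lansing, Michig.", "Lansing Mich."]
normalized = [normalize_place(v) for v in variants]
```

A two-token rule like this is deliberately strict: a value such as "East Lansing, Mich." falls through to manual review instead of being silently mislabeled.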

Community Cookbook data cleaning and normalization is ongoing. Initial steps have been made with publication data as well as publication locations in the sense that they have been normalized and they have been augmented with latitude and longitude data via geocoder.
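One practical payoff of normalizing first is that each distinct place only needs to be geocoded once. The Python sketch below shows that pattern with a stand-in lookup function. `stub_lookup` and its rough coordinates are assumptions for illustration, not the geocoder used for this project; any real geocoding call could be passed in its place.

```python
def geocode_unique(places, lookup):
    """Call `lookup` once per distinct place; return {place: (lat, lon) or None}."""
    results = {}
    for place in places:
        if place not in results:
            results[place] = lookup(place)
    return results


# Stand-in for a real geocoding service; coordinates are rough approximations.
APPROX_COORDS = {"Lansing, MI": (42.73, -84.56)}
calls = []


def stub_lookup(place):
    calls.append(place)          # record each lookup to show deduplication
    return APPROX_COORDS.get(place)


coords = geocode_unique(["Lansing, MI", "Lansing, MI", "Topeka, KS"], stub_lookup)
```

Places the lookup cannot resolve come back as `None`, which makes them easy to list and check by hand before mapping.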


Community Cookbooks, mapped across time and space


Distribution of Michigan Community Cookbooks


Distribution of Kansas Cookbooks

A fair amount of work remains to be done but the hope is that initial results spark some interest!