AYBABTU or Topic Modeling in the Humanities

Many thanks to Jennifer Guiliano, Travis Brown, and the rest of the team at MITH for putting together such a fantastic workshop. It was inspiring to find myself in the MITH mothership amidst people helping to bring DH closer to the center of our various disciplines – despite past and present naysayers. A recent response to the latter here.

A shout-out is also due to Jennifer Serventi and the National Endowment for the Humanities. Finally, I would like to thank the Association of Research Libraries for generously supporting my attendance at the workshop. In an age of increased emphasis on the advantages of space-collapsing technologies like geographically distributed learning environments, Skype, and Twitter, meeting face to face still helps to make truly vital connections.

Like Scott Weingart (who blogs most excellently here), one of my chief regrets was not having enough time to talk to more people. I did manage to squeeze in a moment to mumble how much I admire Natalia Cecire’s thoughts on DH, though in retrospect, “thoughts on most things” might have been more apt.

Some of the more salient points that I drew from the discussions throughout the day, online and offline, relate to that recurring worry, “scientism invading the humanities”, as well as documentation of workflows, documentation of tools, how to work with computer scientists, and working with librarians and archivists.

(1) Using tools from the sciences does not equal a new rise of scientism in the humanities. 

Some have expressed concern that use of tools like LDA could give rise to a new scientism in the humanities. This is not a new worry, and it has been articulated before along slightly different planes. While I might use a tool like LDA, I don’t think I am doing work that is somehow more positivistic than any other approach I would take. I’m just using one tool among many – in my mind there is little substantive conflict in sitting LDA alongside World Systems Theory and Derrida.

(2) Seriously, y’all, we need to document our workflows.

Documentation, documentation, documentation! This is where we could learn a lot from our peers in the sciences. We need to learn how to better document our workflows. For humanists, I would argue that the value of this effort lies not so much in enabling reproducibility of results as in laying bare the interpretative choices that are made in the course of a project. The workflow serves a pedagogical and critical role. Let students, aspiring humanists, and your peers critique your method at a granular level if they choose to do so. It’s a learning experience for both parties, and it holds the potential to help drive the development of this field.

(3) We need better documentation for tools. We need some gateway GUIs.

It is amazing how many tools have been developed that can be cross-purposed for humanistic application. I am appreciative! That being said, more than one member of the audience mentioned that better documentation for the tools would be a big help. I’m not quite sure how this could be best remedied. It might make sense for computer science departments to recruit the humanist perspective early on, even if the tool is not intended for a humanist audience – in this case the humanist perspective might lend itself to developing a tool with even wider application.

And on the note of “gateway GUIs”, I think it was Scott Weingart and later Ryan Cordell who pushed for more GUIs. This argument raised some pedagogical questions, with David Mimno taking the stance that everyone should learn the command line, code, etc. That makes sense, but if we want wider use of these tools, it helps to have gateway GUIs to get humanists interested in taking the next step and dedicating time to yet another effort in what is already a very packed schedule. In order to level up, we would do well to lower the initial buy-in cost.

(4) How can Humanists better approach working with Computer Scientists? 

This question is intended to achieve two things: avoid a client-service model between humanists and computer scientists, and discover what potential value or incentive working with a humanist holds for computer scientists. David Blei and Jordan Boyd-Graber suggested that most computer scientists would be interested in working with humanists on the basis of an unfamiliar set of data to work with. David said that much of the data that computer scientists tend to work with is old hat within the community, so new sets of data have a high probability of holding some allure. Seems like good advice, but I can’t help but wonder what value humanism, broadly conceived, holds for computer scientists. It is evident that humanists need computer scientists to help come to productive terms with vast corpora of data that are literally impossible to analyze using what Ted Underwood referred to as that “wrinkled protein sponge” we have sitting between our ears. But do computer scientists need us in any way aside from channeling interesting data to them to work with? What else do we offer aside from data?

(5) Thar’ be data wranglers in our midst!

Librarians and archivists are our friends. They’ve always been our friends. They can be our data friends as well. As someone who has a foot both in the humanities and library/archives-land, I can say that my purpose in pursuing the latter field of study is to become as proficient as possible in realizing the potential of data for scholars working throughout the academy. I want to connect humanists with data, and work with them to do interesting things. In the LIS field, I am not alone in this desire.

For many of the people at the topic modeling workshop, emphasizing the importance of librarians and archivists to their data-intensive efforts is likely unwarranted – I’m looking at you Emory folks, MITH folks, and CHNM folks – but for the wider community I encourage you to think anew about how you can work with your colleagues in the library and archives to do something cool. I can say with high confidence that they’ll be eager to help/partner/contribute.

(6) Head for the beehive. 

John Unsworth once said, “If an electronic scholarly project can’t fail and doesn’t produce new ignorance, then it isn’t worth a damn.” Talk about a rallying cry – from 1997! I’m a bit sad that he no longer teaches at my institution. But I digress. The Topic Modeling workshop demonstrated to me that the method has gained significant support and demonstrated application. We should not feel that we need to hedge the use of LDA to peers who are critical of the method. I’m confident that I can head toward the beehive, the areas where my analysis might cause considerable contention, and I know that in my defense I can point to a rich tradition of demonstrating the deep value that Topic Modeling can offer. In advance, I’d like to thank the many topic modeling workshop attendees for building that tradition.

Further Readings:

Sarita Alami’s “On the Topic of Topic Modeling: NEH/MITH Workshop Wrap-up”

Collaborative Google Doc, Workshop Notes (Courtesy of Brian Croxall)

#dhtopic Storify 
