IT News & Events

News about IT at Indiana University and the world


Teaching Text Analysis with R

Developing an open instructional workflow, in R, for text analysis.

The Cyberinfrastructure for Digital Humanities (CyberDH) group has developed an open instructional workflow in R for text analysis that aims to build algorithmic understanding and basic coding skills before scaling up analyses. The goal is to provide code templates that are open, repeatable, and sustainable, and that can be adapted, remixed, and scaled to fit a wide range of text analysis tasks. To this end, we have created a three-step process for introducing R: web-deployed Shiny apps, heavily annotated R Notebooks, and lightly commented R scripts, in both “regular” and higher-performance versions. All are available for download on GitHub, with associated sample data from Shakespeare and Twitter.

For scholars and students doing original work in this area, understanding the fundamentals of the code behind text analysis is necessary if they are to be full participants in the research and to question results adequately. It also lets them engage in a self-directed, interactive dialogue with their analysis rather than follow a fixed workflow prescribed by GUI-driven analysis tools. For example, using the Shiny app, a user can see that there are 70 instances of the word ‘father’ in Hamlet. King Lear boasts only five more, at 75. Thus, the spectral father in Hamlet, who dies before the play begins, is almost as present as the physical father (Lear himself), whose entire dilemma revolves around how best to pass on his role as king and father to an entire kingdom. Via the R Notebooks and code remixing, the user can then examine the role of fatherhood across the Shakespearean dramatic corpus.

Caption: Shiny web app of frequent terms in eight popular Shakespeare plays. This simple frequency count reveals that the spectral father in Hamlet is almost as present as the physical father who dominates King Lear.
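
The kind of term count described above can be sketched in a few lines of base R. This is only an illustration, not the CyberDH code: the helper name `count_term` and the toy sentences are assumptions made for the example, and the tokenization is deliberately naive, so it counts exact matches only and would not match forms such as "fathers" or "father's".

```r
# Hypothetical helper (not from the CyberDH repository): count exact,
# case-insensitive matches of a single term in a block of text.
count_term <- function(text, term) {
  # Lowercase, then split on any run of characters that is not a letter
  # or an apostrophe
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by the split
  sum(tokens == tolower(term))
}

# Invented toy sentences standing in for full play texts; these are not
# quotations, and the counts are illustrative only.
sample_texts <- c(
  play_a = "The father spoke. His father listened.",
  play_b = "A father and a king. Fathers all."
)

# Apply the counter to each text and keep the names for comparison
counts <- vapply(sample_texts, count_term, integer(1), term = "father")
print(counts)
```

To compare real plays, one would read each full text into a string (for example with `readLines()` and `paste()`) and pass it to the same function. Changing the `term` argument, or looping over several terms, is the sort of small remix the notebooks are meant to encourage.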