Rutgers University Libraries

Graduate Specialist Program (New Brunswick Libraries): Text Analysis in R

Home page for the New Brunswick Libraries' Graduate Specialist Program

R Setup

New to R?

First you'll need to download R itself for whichever operating system you're using. Second, you'll want to download RStudio, a program that makes working in R much easier by adding a text editor for writing or loading R code and a workspace for easy viewing of data in memory. While you're there, check out the basic overviews for new users.

If you want a fast, easy way to give R a try, you can use the RStudio Cloud. This web browser version of RStudio looks and functions just like the desktop version, and it will save data between sessions as well. All it takes is a Google or GitHub account to log on.

Text Analysis of Historical Newspapers

To download the materials used in this two-part workshop on text analysis of historical newspapers in the Chronicling America database, head to the project repository on GitHub and click the green "Clone or download" button.

The first workshop introduces strategies for fuzzy string matching, using OCR-derived text from the Perth Amboy Evening News. The second workshop begins with the results of the first and explores a few possible methods for analyzing phrase use over time, page location, collocate words, and uniqueness. Use the PDF files to follow along and load the "for users" .Rmd files into RStudio in order to execute the code yourself; see the Readme file at the bottom of the page for more details.
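For a rough sense of what fuzzy string matching looks like in practice, here is a minimal sketch using only base R. It is not taken from the workshop materials: the sample lines, the target phrase, and the edit-distance threshold are all made up for illustration, and the workshop itself may use different functions.

```r
# Hypothetical OCR-style lines (invented for this example, not workshop data).
ocr_lines <- c(
  "the Perth Amboy Evening News",
  "tbe Perth Arnboy Evening Ncws",   # simulated OCR errors: "m" read as "rn"
  "Weather report for Tuesday"
)

# The phrase we want to find despite OCR noise (an assumed example target).
target <- "Perth Amboy"

# agrepl() returns TRUE for each element that contains an approximate match.
# max.distance = 2L allows up to two edits (insertions, deletions, substitutions).
matches <- agrepl(target, ocr_lines, max.distance = 2L)

# adist() with partial = TRUE gives the smallest edit distance between the
# target and any substring of each line, which helps when choosing a threshold.
distances <- adist(target, ocr_lines, partial = TRUE)

print(ocr_lines[matches])
print(distances)
```

Raising max.distance catches more badly garbled text but also lets in more false positives, so it is worth checking a sample of matches by hand before settling on a value.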

Additional Resources

Looking to start using R for quantitative Digital Humanities work? These are just a few resources that I've found particularly useful for beginners.

Text Analysis

There are a number of great resources online for learning R from square one with a focus on analyzing texts, from social media posts to historical documents to novels.

Julia Silge and David Robinson's Text Mining with R is a practical book, freely available online, that begins with the basics of working with character strings and eventually introduces several more advanced approaches for working with text.

Matthew Jockers' workshop using Moby-Dick makes for a nice introduction to working quantitatively with literary texts.

There is always more than one way to do anything in R; whether you're trying to remember which functions do what or you just want to learn other available functions, RStudio's R Cheat Sheets are excellent and highly accessible resources. The stringr package cheat sheet is a particularly helpful guide (or reminder!) for basic text analysis.
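To give a flavor of the kind of basic text work stringr supports, here is a short sketch. It assumes the stringr package is installed; the sample sentence (adapted from the opening of Moby-Dick) and the variable names are chosen only for illustration.

```r
library(stringr)

# A short sample sentence, used here purely as an example.
text <- "Call me Ishmael. Some years ago, never mind how long precisely."

# Normalize case so that "Call" and "call" count as the same word.
lower <- str_to_lower(text)

# Pull out runs of letters as a simple (assumed) definition of a "word".
words <- str_extract_all(lower, "[a-z]+")[[1]]

# Count the words, check for a name, and keep words longer than five letters.
n_words <- length(words)
has_ishmael <- str_detect(text, "Ishmael")
long_words <- words[str_length(words) > 5]

print(n_words)
print(has_ishmael)
print(long_words)
```

Treating a word as a run of lowercase letters is a deliberate simplification; real texts with apostrophes, hyphens, or accented characters usually call for a more careful pattern.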

Data Organization / Manipulation

A lot of DH work isn't so much text analysis as it is quantifying humanistic data, from metadata on publishing to survey results. Whether working with a small dataset or a large one, it's a safe bet you'll need to organize and manipulate the structure of your data in a tidy, consistent way; as such, data organization and manipulation is as good a place to begin as any.

Garrett Grolemund and Hadley Wickham's R for Data Science is a great practical book freely available online that covers all the basics of using R packages in the tidyverse for analyzing structured data (as well as introducing visualization).

The cheat sheet for the dplyr package is nice to keep handy as well.
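As a small illustration of the sort of data manipulation dplyr is used for, the sketch below filters, groups, and summarizes a tiny table. It assumes dplyr (version 1.0.0 or later) is installed; the publishing metadata is invented for the example and does not come from any of the resources above.

```r
library(dplyr)

# Hypothetical publishing metadata: one row per book.
books <- data.frame(
  title  = c("A", "B", "C", "D"),
  decade = c(1850, 1850, 1860, 1860),
  pages  = c(320, 410, 250, 500)
)

summary_by_decade <- books %>%
  filter(pages >= 300) %>%              # keep only longer books
  group_by(decade) %>%                  # one group per decade
  summarise(
    n_titles   = n(),                   # number of books in each group
    mean_pages = mean(pages),           # average length in each group
    .groups    = "drop"                 # return an ungrouped table
  ) %>%
  arrange(desc(mean_pages))             # longest average first

print(summary_by_decade)
```

Each step takes a table and returns a table, which is what makes it easy to chain operations together with the pipe.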

The materials for my "Data 101" workshop, part of the Fall 2018 workshop series, offer a hands-on introduction that borrows from both of the above.

Finally, if you get stuck while getting started, don't hesitate to run a search on stackoverflow.com. Odds are that someone has had the same issue before and that someone else has given an at least semi-helpful answer.

Digital Humanities Graduate Specialist
