Digital Humanities

A resource guide for learning about and starting projects in the Digital Humanities

Libraries Licensed Text and Data Mining (TDM) Sources

Gale Digital Scholar Lab

Link: https://www.libraries.rutgers.edu/databases/gale-digital-scholar-lab

In the Gale Digital Scholar Lab, Rutgers affiliates may access and analyze data from the following Gale Sources:

  • Archives Unbound: Federal Response to Radicalism in the 1960s
  • Archives Unbound: Federal Surveillance of African Americans, 1920-1984
  • British Library Newspapers
  • Eighteenth Century Collections Online
  • Nineteenth Century Collections Online
  • Nineteenth Century U.S. Newspapers
  • Refugees, Relief, and Resettlement
  • Seventeenth and Eighteenth Century Burney Newspapers Collection
  • The Economist Historical Archive
  • The Making of Modern Law
  • The Times Digital Archive
  • U.S. Declassified Documents Online

While Rutgers affiliates are restricted to the sources above, Gale does offer ready-made datasets from partner institutions that may include non-RU corpora (see Learning Center > Getting Started > Datasets).

Build content sets in increments of 10,000 documents. Download in increments of 5,000 documents. Import from other sources is possible, either one document at a time or via bulk uploads of data and metadata as CSV files.
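As a rough planning aid, the sketch below splits a large result set into batches of those sizes. It assumes the increments act as per-batch ceilings (10,000 documents per content set, 5,000 per download); `plan_batches` is a hypothetical helper written for illustration and is not part of any Gale tool or API.

```python
# Hypothetical planning helper; not part of the Gale Digital Scholar Lab.
CONTENT_SET_LIMIT = 10_000  # documents per content-set increment (per the guide)
DOWNLOAD_LIMIT = 5_000      # documents per download increment (per the guide)


def plan_batches(total_documents, batch_size):
    """Return 1-indexed, inclusive (start, end) ranges covering all documents."""
    if total_documents < 0 or batch_size <= 0:
        raise ValueError("total_documents must be >= 0 and batch_size > 0")
    return [
        (start, min(start + batch_size - 1, total_documents))
        for start in range(1, total_documents + 1, batch_size)
    ]


# Example: a search that returns 23,000 documents.
content_sets = plan_batches(23_000, CONTENT_SET_LIMIT)
downloads = plan_batches(23_000, DOWNLOAD_LIMIT)
print(len(content_sets), "content-set increments;", len(downloads), "downloads")
```

Under these assumptions, a 23,000-document search would need three content-set increments and five downloads.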

The Digital Scholar Lab (DSL) is extremely well documented. Visit https://gale.libguides.com/DSLabSupport/ for tutorials, instructional support, the research showcase, an "Introduction to Digital Humanities" course, and case studies.

ProQuest TDM Studio

Link: https://tdmstudio.proquest.com/

Rutgers affiliates have access to ProQuest’s TDM Studio, a web-based portal for text and data mining research using any of several licensed ProQuest subscriptions encompassing current and historical newspapers, dissertations and theses, scholarly journals, and primary sources. TDM Studio offers two ways of accessing the data:

  • The Visualization Dashboard, appropriate for newcomers to text and data mining, allows researchers to build a corpus of relevant ProQuest documents and provides them with three options to visualize the datasets: maps, topic modeling, and sentiment analysis.
  • Workbench assumes familiarity with programming in either Python or R. As with the Visualization Dashboard, Workbench allows researchers to build a corpus of relevant ProQuest documents. It then provides a Jupyter Notebook environment where they can pursue nearly any mode of TDM analysis in either programming language. The Workbench also contains several scripts for getting started.
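To give newcomers a sense of what a first notebook cell in a Python environment like Workbench might look like, here is a minimal term-frequency sketch. It uses only the Python standard library and a tiny in-memory sample; `top_terms` and `sample_docs` are hypothetical names, and the code does not use, or represent, any TDM Studio API or starter script.

```python
import re
from collections import Counter


def top_terms(documents, n=3, min_length=4):
    """Count lowercase alphabetic tokens across documents; return the n most common."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Skip very short tokens as a crude stand-in for stop-word removal.
        counts.update(token for token in tokens if len(token) >= min_length)
    return counts.most_common(n)


# Hypothetical stand-in for a corpus; a real corpus would be read from files.
sample_docs = [
    "The strike ended; the strike leaders met.",
    "Strike talks resumed in the city.",
]
print(top_terms(sample_docs, n=1))
```

A real analysis would replace the length filter with a proper stop-word list and load documents from the corpus built in the portal.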

Create an account at https://tdmstudio.proquest.com/createaccount. Be sure to use your Rutgers email address in the account request.

TDM Studio is well documented. See their FAQs and Quick Start Guides for more information.

Additional (Libraries) Sources

A diverse range of data sources, some licensed, some with Rutgers ties, others just open and interesting, may be found at this site: https://rutgersdh.github.io/dh-sources/.

Cross-Disciplinary Datasets

Caselaw Access Project (CAP)

The Caselaw Access Project makes 360 years of case law freely available online, digitized from the collections of the Harvard Law School Library.

Options for data access include:

If you're not sure how to utilize the data here, the CAP has a great gallery of sample projects from which to draw ideas.

Humanities Datasets: THATCamp ACRL 2013

The set of resources featured here was crowdsourced during THATCamp ACRL 2013. Contributors include Amanda Rust, Keith Stranger, Steve Stone, and other participants. The list below reflects updates and edits by Krista White; Francesca Giannetti pruned dead links in August 2025.

  • Cultural Data Project: Users must submit an application to CDP in order to receive datasets. Datasets must be destroyed three months after their use. For more information, see the Dataset Access page.
  • Association of Religion Data Archives (ARDA): Surveys, polls, and other data about religion from around the world. All data are submitted by researchers. The data are heavily weighted toward Christianity in the U.S., with some international coverage.
  • National Archive of Data on Arts and Culture: NADAC is a repository that facilitates research on arts and culture by acquiring data, particularly those funded by federal agencies and other organizations, and sharing those data with researchers, policymakers, people in the arts and culture field, and the general public.
  • 20th-Century American Bestsellers: Work done by John Unsworth and students of UVA and the University of Illinois Urbana-Champaign.
  • Dr. Who Villains and Monsters since 1963: Requires the creation of a Tableau Public account to work with the file in its proprietary format. May or may not be exportable to .CSV or other formats.
  • Theatrical Lighting Database (Archived): The New York Public Library does it again with a collection of plots, focus charts, cue sheets and more from four landmark Broadway productions digitized from their collections.
  • Old Time Radio Network: More than 12,000 old time radio shows available for listening. Metadata include show name, episode name, air date and episode length.
  • Open GLAM Datasets: Datasets from GLAM (Galleries, Libraries, Archives and Museums) institutions that are open for (re-)use.
  • AusStage: Led by the Drama Department at Flinders University, AusStage works with researchers to map patterns of live performance, explore networks of artistic collaboration in the creative and performing arts, and identify opportunities for audience research and development.
  • UNESCO Institute for Statistics: The United Nations Educational, Scientific and Cultural Organization (UNESCO) has a series of datasets available about a variety of topics in the arts and humanities.