HathiTrust Digital Library is the product of HathiTrust, a partnership of major research institutions and libraries working together to preserve our cultural record of print materials. As of January 2013, the digital library comprises over 10 million volumes, over 3.2 million of which are in the public domain, and includes almost half the print holdings at SU Libraries. HathiTrust provides its members with full-text searching across the entire repository, full-text PDF downloads for items in the public domain or not otherwise under copyright, and full-text access to brittle out-of-print items in SU Libraries. Researchers can conduct computational analysis of works in the HathiTrust Digital Library through the HathiTrust Research Center (HTRC), which offers a suite of tools and services for text-based, data-driven research, such as HTRC Algorithms and the Data Capsule.
A multi-year global digitization and publishing program focusing on primary source collections of the long nineteenth century. Researchers can check the number of documents relevant to key terms over a specific period of time through the Term Frequency function.
The Europeana Newspapers project has converted 10 million historic newspaper pages to full text for Europeana. It has also developed a number of open source software tools, such as Named Entity Recognition Tool for Europeana Newspapers.
The Library of Congress Lab provides a list of APIs, bulk downloads, and tutorials that help researchers explore machine-readable access to its digital collections.
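As a sketch of what that machine-readable access looks like, most loc.gov pages return structured JSON when the `fo=json` parameter is appended; the specific query string and page parameter below are illustrative, not prescribed by the Lab's documentation.

```python
from urllib.parse import urlencode

def loc_search_url(query, page=1):
    # Build a search request against the loc.gov JSON API;
    # fo=json asks for a machine-readable JSON response.
    params = {"q": query, "sp": page, "fo": "json"}
    return "https://www.loc.gov/search/?" + urlencode(params)

print(loc_search_url("suffrage newspapers"))
```

The returned URL can be fetched with any HTTP client; the JSON response includes a `results` list of collection items.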
The Digital Public Library of America (DPLA) aims to provide public access to digital holdings within America’s libraries, archives, museums, and other cultural heritage institutions. DPLA offers a public API and bulk downloads that grant access to all of DPLA’s records under a permissive license.
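A minimal sketch of a DPLA API v2 item search follows; the `api_key` value is a placeholder (DPLA issues free keys on request), and the query term is only an example.

```python
from urllib.parse import urlencode

def dpla_items_url(query, api_key, page_size=10):
    # Build an item-search request against the DPLA API (v2).
    # api_key is a placeholder; DPLA issues free keys to researchers.
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    return "https://api.dp.la/v2/items?" + urlencode(params)

print(dpla_items_url("civil war letters", api_key="YOUR_KEY"))
```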
The New York Times (NYT) offers ten APIs to facilitate a wide range of uses, from custom link lists to complex visualizations.
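One of those APIs is Article Search; the sketch below builds a request URL for it, assuming an API key from the NYT developer portal (the key value and query are placeholders).

```python
from urllib.parse import urlencode

def nyt_article_search_url(query, api_key):
    # Build a request against the NYT Article Search API.
    # api_key is a placeholder obtained from developer.nytimes.com.
    params = {"q": query, "api-key": api_key}
    return ("https://api.nytimes.com/svc/search/v2/articlesearch.json?"
            + urlencode(params))

print(nyt_article_search_url("election", api_key="YOUR_KEY"))
```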
An open access linguistic corpus consisting of 15 million words of American English automatically annotated for logical structure, word and sentence boundaries, part of speech (multiple tag sets), shallow parse (noun and verb chunks), and named entities.
Reddit provides an API to access data from its posts, threads, comments, users, and more. Historic Reddit data can be downloaded from this website.
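As a simple illustration, Reddit exposes JSON listings by appending `.json` to most of its URLs; the sketch below builds such a request with only the standard library. The User-Agent string is illustrative (Reddit asks clients to send a descriptive one), and the subreddit name is just an example.

```python
import urllib.request

def top_posts_request(subreddit, limit=5):
    # Build a request for a subreddit's top posts as JSON.
    # Appending .json to a Reddit listing URL returns machine-readable data.
    url = f"https://www.reddit.com/r/{subreddit}/top.json?limit={limit}"
    return urllib.request.Request(url, headers={"User-Agent": "research-script/0.1"})

req = top_posts_request("AskHistorians")
print(req.full_url)
```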
An organization and set of tools, materials, and media for chronicling historically significant events via social media.
A suite of easy-to-use web tools for beginners that introduce concepts of working with data.
A tool for simple text analysis.
A web-based data visualization platform for creating thematic maps and reports with demographic and socio-economic data of the United States.
Free software that allows anyone to connect to a spreadsheet or file and create interactive data visualizations for the web.
An online text analysis platform which provides services such as Text Similarity Analysis, Entity Extraction, and Text Classification.
Novice-friendly, peer-reviewed tutorials that help humanists learn a wide range of digital tools, techniques, and workflows to facilitate research and teaching.
An introduction to sentiment analysis, word and document frequency analysis, and topic modeling with the tidytext R package.
Tutorials designed for users with some familiarity with R; they require no prior knowledge of spatial analysis.
Tutorial materials for managing GIS data with Python.
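To give a flavor of working with GIS data in Python, the sketch below parses a small GeoJSON feature collection with only the standard library and computes a bounding box; the sample data is invented for illustration and is not drawn from the tutorial materials.

```python
import json

# Illustrative GeoJSON snippet (not from the tutorial materials).
geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "A"},
   "geometry": {"type": "Point", "coordinates": [-76.15, 43.05]}},
  {"type": "Feature", "properties": {"name": "B"},
   "geometry": {"type": "Point", "coordinates": [-76.20, 43.10]}}
]}
""")

def bounding_box(fc):
    # Return (min_lon, min_lat, max_lon, max_lat) over point features.
    coords = [f["geometry"]["coordinates"] for f in fc["features"]]
    lons = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return (min(lons), min(lats), max(lons), max(lats))

print(bounding_box(geojson))  # → (-76.2, 43.05, -76.15, 43.1)
```

Real projects would typically use a library such as geopandas instead, but the structure of the data is the same.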
Teaches command line, HTML, and Python basics for humanists.