Syracuse University Libraries

ProQuest TDM Studio Guide

Introducing ProQuest TDM Studio at Syracuse University

Syracuse University Libraries is among the first institutions in the U.S. to subscribe to ProQuest’s new text and data mining platform, ProQuest TDM Studio. The tool allows research teams at Syracuse University to mine large volumes of published content from the millions of pages of news, scholarly and other publications provided to the SU campus community through current subscriptions to contemporary and historical ProQuest databases. Tap into resources such as the American Periodicals Series, US Newsstream’s coverage of millions of newspaper articles, the vast corpus of fiction and other book-length texts within Early English Books Online, or the many decades of alternative press journalism within Alt-Press Watch and other ProQuest resources spanning multiple centuries of publications. Within ProQuest’s text and data mining environment, TDM Studio workbenches let Syracuse University researchers use programming languages such as R or Python to execute queries, develop datasets, and extract and analyze the text of publications central to their research. ProQuest TDM Studio also comes with helpful programming libraries and scripts for getting started, and users can import or write their own scripts and packages to extend the capabilities of the platform. A separate TDM Studio data visualization dashboard supports geographic analysis, topic modeling, and sentiment analysis across a set of major U.S. and international newspaper titles. Syracuse University Libraries is delighted to offer the Syracuse University community early access to this resource, which helps researchers apply digital scholarship techniques to the wealth of licensed content available through numerous current ProQuest subscriptions.
This is a powerful tool for analyzing recent and more deeply historical scholarly publications, primary source texts in the humanities, business, public policy, public health and other scientific literature, as well as extensive recent and older U.S. and international journalism.

For more information or general assistance, contact SU Libraries data services.

To access this text and data mining tool, complete the SU Libraries' ProQuest TDM Studio proposal form.

Please also see detailed User Information on this guide.

User Access and Dataset Limits

A ProQuest TDM Studio workbench is a text and data mining resource available to individuals or research teams in the Syracuse University community for either short or long term use with these user limits:

  • One (1) research team at a time, with TDM Studio accounts available for a maximum of five (5) research team members
  • A limit of ten (10) total simultaneous datasets created and saved by a research team within the TDM Studio platform at any one time
  • The volume of each dataset should be limited to half a million sources (item records) or fewer when possible
  • A maximum of 15 MB per week is available for export outside of the TDM Studio environment. Please note that the corpus of text remains in TDM Studio. Datasets cannot be exported; only programs and the resulting secondary analysis can.
  • Each research team using TDM Studio is limited to mining content from sources aggregated within ProQuest databases to which SU Libraries has current licensing agreements in place with ProQuest.
  • At least one of the team members receiving TDM Studio access (maximum of five) must be a current Syracuse University faculty member, enrolled student or currently employed SU staff person who holds a NetID.
    • One SU-affiliated team member will be designated as the team lead and will handle all communication between the team and SU Libraries.
    • The lead research team member coordinating a project's use of TDM Studio must be a current NetID-holding Syracuse University faculty member, student or staff person.
    • A TDM Studio research team can include researchers who are not Syracuse University NETID holders if they are working on a TDM Studio team with one or more SU-affiliated researchers.
  • Externally or internally generated research funding being connected to a project is not a prerequisite for use of the TDM Studio tool at Syracuse University.

Login information: ProQuest TDM Studio access is password-protected. The login does not go through SU’s authentication via NetID and password. Access will be set up by ProQuest, and TDM Studio team members will be provided with login information. ProQuest TDM Studio access and passwords are set up for the individual users who make up a research team of one to five persons, and accounts/passwords may not be shared beyond those persons.

TDM Studio Visualization Dashboard: The visualization dashboard supporting geographic analysis, topic modeling and sentiment analysis across a defined set of major newspaper titles is available 24/7 to all NetID-holding Syracuse University faculty, students, or staff. This tool is driven by a graphical user interface and does not require knowledge of R or Python.

User information: Sections of SU Libraries’ Access to Licensed Web Resources Policy relevant to use of licensed resources apply.

Access Time Periods (7 day & 30 day)

Short Term Use
7 days: Individuals and teams may reserve a ProQuest TDM Studio workbench for a seven-day period, which is intended either for trialing the platform and the appropriateness of its features for a possible project, or for direct use of TDM Studio on a project not anticipated to take more than seven days to complete. Short term use may be renewed for an additional seven days, unless another project team has booked the tool for that time period. Individuals or project teams are welcome to renew a seven-day booking once, but are strongly encouraged to make a 30-day project use booking if they have determined extended access best fits their project goals.

Extended Term Use
30 days: Individuals and teams with an established project may reserve a ProQuest TDM Studio workbench for periods of up to 30 days, with the possibility of renewals in seven-day increments or more by request should more time be needed and should there be no other requests by other individuals or groups for project time.
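
For planning purposes, the booking arithmetic described above can be sketched in Python. This is only an illustration, not an SU Libraries or ProQuest tool: the function name `booking_end` is hypothetical, and the sketch assumes that the start date counts as day one and that every renewal is exactly seven days (the guide allows longer renewals by request for extended term use).

```python
from datetime import date, timedelta

MAX_TERM_DAYS = 30          # extended term bookings run up to 30 days
RENEWAL_INCREMENT_DAYS = 7  # assumption: each renewal adds exactly seven days


def booking_end(start: date, term_days: int, renewals: int = 0) -> date:
    """Return the last day of access for a booking that begins on `start`.

    Assumes the start date is day one of the booking.
    """
    if not 1 <= term_days <= MAX_TERM_DAYS:
        raise ValueError("term_days must be between 1 and 30")
    if renewals < 0:
        raise ValueError("renewals cannot be negative")
    total_days = term_days + renewals * RENEWAL_INCREMENT_DAYS
    # Day one is the start date itself, so subtract one day from the span.
    return start + timedelta(days=total_days - 1)


# A seven-day booking, then a 30-day booking renewed once for seven days.
print(booking_end(date(2024, 3, 1), 7))
print(booking_end(date(2024, 3, 1), 30, renewals=1))
```

Actual booking dates are confirmed by SU Libraries and depend on whether another team has reserved the workbench.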

Request Access for a Text and Data Mining Project

Individuals or teams wishing to use ProQuest TDM Studio should submit a ProQuest TDM Studio proposal to SU Libraries that includes basic information about their project and how they intend to use ProQuest TDM Studio. At least one member of any team assigned an account, called a "workbench," must be proficient in writing either Python or R.

Planning: Because access is shared Syracuse University-wide and only one team can have access at a time, please plan ahead when deciding when you would like to book access to ProQuest TDM Studio, especially if your use of the platform is anticipated to involve deadlines, other complex scheduling or calendar conflicts, or a need for multiple periods of short or extended term use.

Additional planning advice:
Individuals or groups submitting an access proposal need to provide:

  • Contact information for the project leader and, if applicable, up to four other team members
  • Research question(s) and short description of how the individual or team sees ProQuest TDM Studio aiding the investigation of the question(s)
  • Identification of SU or external funding that is associated with the project (if any)
  • A brief project plan for the requested short or long term period
  • Level of expertise in Python or R for each of the one or more team members and brief indication of the roles and responsibilities of each individual on the project. TDM Studio uses either R or Python in a Jupyter Notebook environment. There is only limited help with these, so SU Libraries highly recommends having at least one team member who is well-versed in one of these languages on each team.

Downloading information:

  • All members of a team have read/write access to all data, programs and results.
  • TDM Studio users will be able to analyze, but not download a full data set (i.e. full corpus of text remains in TDM Studio environment).
  • Research team members can download all of their programs and related analytical results, up to 15 MB per week.
  • Downloads are queued and team members are sent an email when the download is ready.
  • The link in the email will work only once.
  • Team member coordination for the use of those emailed links is advised.
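
Because exports are capped by size, it can help to total the size of result files before requesting a download. The following Python function is a minimal sketch under stated assumptions, not part of TDM Studio: the function name `export_size_report` and the file names are hypothetical, one MB is taken to be 1,000,000 bytes, and the check does not know about anything the team has already exported earlier in the week. The weekly limit is passed in as an argument rather than built in.

```python
from pathlib import Path


def export_size_report(paths, weekly_limit_mb):
    """Total the sizes of the files in `paths` and compare to a weekly limit.

    Assumes 1 MB = 1,000,000 bytes; how the platform counts size may differ.
    """
    limit_bytes = int(weekly_limit_mb * 1_000_000)
    # Look up the on-disk size of each file the team plans to export.
    sizes = {str(p): Path(p).stat().st_size for p in paths}
    total_bytes = sum(sizes.values())
    return {
        "sizes": sizes,
        "total_bytes": total_bytes,
        "limit_bytes": limit_bytes,
        "fits": total_bytes <= limit_bytes,
    }
```

A team might call it as `export_size_report(["topic_counts.csv", "analysis.ipynb"], weekly_limit_mb=...)`, filling in the limit that applies to them, and trim or compress results if `fits` comes back `False`.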

Saving work: All data, programs and results are deleted when a team’s 7-day or 30-day time slot ends, and there is no way to retain them in TDM Studio afterward. Teams should therefore save all work before the slot ends and keep detailed notes on all steps and selections, since they may want to refer to them after the TDM Studio access period has ended.

Submit Your Feedback

Please note that Syracuse University Libraries is piloting this resource, and your feedback is very welcome. Please share feedback via email with Data Services.