Syracuse University Libraries

ProQuest TDM Studio: Home

Description of ProQuest TDM Studio for text mining

ProQuest TDM Studio

Please note that Syracuse University Libraries is piloting this resource, and your feedback is very welcome. Please share feedback via email with Data Services.

ProQuest TDM Studio at Syracuse University

ProQuest TDM Studio allows research teams at Syracuse University to mine large volumes of published content from the millions of pages of news, scholarly, and other publications provided to the SU campus community through current subscriptions to contemporary and historical ProQuest databases.

TDM Studio workbenches let Syracuse University researchers work within ProQuest’s text and data mining environment, applying programming languages like R or Python to execute queries, develop datasets, and extract and analyze the text of publications central to their research. ProQuest TDM Studio also comes pre-equipped with programming libraries and scripts for getting started, and users can import or write their own scripts and packages to extend the capabilities of the platform.

The TDM Studio data visualization dashboard supports geographic analysis, topic modeling, and sentiment analysis across a set of major U.S. and international newspaper titles. TDM Studio is a powerful tool for analyzing recent and historical scholarly publications, primary source texts in the humanities, business, public policy, public health, and other scientific literature, as well as extensive recent and older U.S. and international journalism.

ProQuest has an extensive guide on the TDM Studio product.

ProQuest TDM Studio Visualizations

Available to all students, faculty, and staff at Syracuse University, the Visualizations component of TDM Studio does not require advanced coding skills and supports point-and-click creation of data visualizations, enabling users to:

  • Mine discussion of subjects across thousands of instances of current and historical journalism, as well as the corpus of doctoral dissertations in ProQuest
  • Analyze a set of newspaper titles based on available SU subscriptions in ProQuest, including the Washington Post, New York Times, Los Angeles Times, Chicago Tribune, Wall Street Journal, Globe and Mail, The Guardian, Sydney Morning Herald, South China Morning Post, Times of India, and several others
  • Apply pre-built visualizations to one's research questions using established text mining methods
  • Visualize global trends by comparing impact of a topic across multiple geographic locations
  • View interactive chronological displays to reveal changes in the subject of news coverage over time
  • Use an embedded topic modeling component to analyze major themes in those news reports or other documents
  • Run a sentiment analysis that plots, over time, the emotions or affective states within your source documents (news articles or doctoral dissertations)
  • Manage as many as five simultaneous research projects of 10,000 documents each

Upon initial release, the news publication dates covered range primarily from 1990 to the present (earlier for some titles). The doctoral dissertations date back much further.

ProQuest has a detailed guide on using the Visualizations platform as well as instructions on how to create an account.

ProQuest TDM Studio Workbench

The Workbench Dashboard allows researchers to create large datasets and gives them access to a Jupyter Notebook environment, where they can apply programmatic methods in Python or R to interrogate their chosen content. Metadata can be exported for use in other environments. Within the Workbench, researchers can access their institution’s ProQuest content as well as upload their own. For a video walkthrough of the Workbench Dashboard, see the TDM Studio Workbench Dashboard Walkthrough in the TDM Studio LibGuide.
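As a rough illustration of the kind of programmatic method a researcher might run in a notebook, the sketch below counts the most frequent words across a handful of texts using only the Python standard library. It is not ProQuest's code or API: the function name `term_frequencies` and the sample texts are hypothetical, and in a real Workbench the texts would come from a dataset rather than a hard-coded list.

```python
import re
from collections import Counter


def term_frequencies(documents, top_n=5):
    """Return the top_n most common lowercase word tokens across the given texts."""
    counts = Counter()
    for text in documents:
        # Simple tokenizer: runs of letters and apostrophes.
        # A real project would likely use a dedicated tokenizer and a stop-word list.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


# Hypothetical stand-in texts, for illustration only.
sample_docs = [
    "The city council approved the water budget.",
    "Water quality dominated the council debate.",
]

print(term_frequencies(sample_docs, top_n=3))
```

Counts like these are a form of secondary analysis derived from the texts rather than the texts themselves.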

 

  • One (1) research team at a time, with TDM Studio accounts available for a maximum of five (5) research team members. The person making the request must be from SU; team members can be from other institutions.
  • Workbenches can be used for one month. Extensions can be made as long as there are no other teams waiting.
  • A limit of ten (10) total simultaneous datasets created and saved by a research team within the TDM Studio platform at any one time
  • The volume of each dataset should be limited to half a million sources (item records) or fewer when possible. Each dataset can have up to two million documents.
  • A maximum of 15 MB per week is available for export outside of the TDM Studio environment. Please note that the corpus of text remains in TDM Studio. Datasets cannot be exported, only programs and the resulting secondary analysis.
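
Before requesting an export, a team may want to check whether its output files fit within the weekly export allowance described above. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, 1 MB is assumed to mean 1,000,000 bytes (the guide does not say how the limit is measured), and the check runs on ordinary local files rather than through any TDM Studio export mechanism.

```python
from pathlib import Path

# Assumption: the 15 MB weekly limit is counted in decimal megabytes.
WEEKLY_EXPORT_LIMIT_BYTES = 15 * 1000 * 1000


def export_size_bytes(paths):
    """Total size in bytes of the files a team intends to export."""
    return sum(Path(p).stat().st_size for p in paths)


def fits_weekly_limit(paths, already_exported_bytes=0):
    """True if these files, plus what was already exported this week, stay within the limit."""
    total = already_exported_bytes + export_size_bytes(paths)
    return total <= WEEKLY_EXPORT_LIMIT_BYTES
```

For example, `fits_weekly_limit(["topic_counts.csv", "analysis.ipynb"])` (hypothetical file names) would return `False` if the two files together exceed the assumed limit.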