Lehigh University Libraries - Library Guides

TDM Studio

Accessing your Jupyter notebook

To enter the development environment, click Open Jupyter Notebook.

Main Interface

When you first open the Jupyter notebook, you will see several folders. One of them, Getting Started, will help you get started and access your dataset.

Inside this folder, the ProQuest TDM Studio Manuals folder answers common questions. It includes information on importing and exporting, as well as a set of Frequently Asked Questions.

When you first sign up for TDM Studio, your team will have the option to attend an onboarding session that walks you through the interface and the notebook, familiarizes your team with the environment, and answers any questions so you can get your project started.

From here, we will look at the Topic Modeling sample script in the TDM Studio Visualization Samples folder.

Topic Modeling Example

This Topic Modeling script is an example of using matrix factorization to detect topics within a dataset of newspaper documents retrieved by searching for the terms "COVID OR Coronavirus". This one is written in Python, as you can see in the upper right corner. You can write your scripts in either R or Python within Jupyter.
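The sample script's own code is not reproduced in this guide. As a minimal sketch of what "matrix factorization" can mean here, the Python below assumes non-negative matrix factorization (NMF) with Lee and Seung's multiplicative updates: a document-term count matrix V is approximated as W × H, where each row of W holds one document's topic weights and each row of H holds one topic's word weights. The helper names and the toy documents are hypothetical, and a real project would more likely call a library implementation such as scikit-learn's NMF.

```python
import random
from collections import Counter

def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two list-of-lists matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def doc_term_matrix(docs):
    """Build a document-term count matrix V and its sorted vocabulary."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    V = [[c[w] for w in vocab] for c in counts]
    return V, vocab

def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms).

    Uses Lee-Seung multiplicative updates, which do not increase ||V - WH||^2.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H

def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

# Hypothetical toy "articles", not TDM Studio data.
docs = [
    "students school children schools remote",
    "school students teachers remote learning",
    "cases deaths vaccine reported health",
    "vaccine cases rate health deaths",
]
V, vocab = doc_term_matrix(docs)
W, H = nmf(V, k=2)
print(round(reconstruction_error(V, W, H), 3))
```

The number of topics k is chosen by the researcher; on real newspaper data, preprocessing and the choice of k affect the topics far more than on this toy example.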

Topic modeling is just one example of text mining, but it provides methods to organize, understand, and summarize large collections of textual information.

It helps in:

  • Discovering hidden topical patterns that are present across the collection
  • Annotating documents according to these topics
  • Using these annotations to organize, search and summarize texts
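The annotation and search steps in the list above can be sketched as follows, assuming each document already has a row of topic weights (such as the W matrix of a factorization) and that a document is annotated with its single highest-weighted topic. The function names, labels, and weights are hypothetical.

```python
def annotate(doc_topic, labels):
    """Label each document with its highest-weighted topic.

    doc_topic: one row of non-negative topic weights per document.
    labels: a display name for each topic column.
    """
    return [labels[max(range(len(w)), key=w.__getitem__)] for w in doc_topic]

def find_documents(annotations, label):
    """Return the indices of documents annotated with the given topic label."""
    return [i for i, a in enumerate(annotations) if a == label]

# Hypothetical weights for three documents over two topics.
doc_topic = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.5]]
labels = ["schools/work", "cases/vaccines"]
annotations = annotate(doc_topic, labels)
print(annotations)  # ['schools/work', 'cases/vaccines', 'schools/work']
print(find_documents(annotations, "schools/work"))  # [0, 2]
```

Keeping only the top topic is a simplification; a document can also be tagged with every topic whose weight passes some threshold.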

Topic modeling can be described as a method for finding a group of words (i.e., a topic) from a collection of documents that best represents the information in the collection.
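In practice, a topic's "group of words" is usually read off as its highest-weighted terms. A small sketch, assuming a topic-term weight matrix (such as H from a factorization) and a matching vocabulary list; the names and weights are hypothetical:

```python
def top_words(topic_term, vocab, n=3):
    """Return the n highest-weighted words for each topic.

    topic_term: one row of term weights per topic.
    vocab: the word for each column of topic_term.
    """
    topics = []
    for weights in topic_term:
        ranked = sorted(range(len(vocab)), key=lambda j: weights[j], reverse=True)
        topics.append([vocab[j] for j in ranked[:n]])
    return topics

vocab = ["cases", "children", "school", "students", "vaccine"]
topic_term = [
    [0.0, 0.8, 1.2, 1.0, 0.1],  # hypothetical weights for a schooling topic
    [1.5, 0.0, 0.1, 0.0, 0.9],  # hypothetical weights for a case-count topic
]
print(top_words(topic_term, vocab))
# [['school', 'students', 'children'], ['cases', 'vaccine', 'school']]
```

The word lists quoted in this example (students, school, year, ... and cases, new, reported, ...) are top-10 lists of this kind.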

This topic modeling example produces a series of visualizations and scores how often the listed topics recur within documents over time.
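The guide does not say exactly how the sample computes its scores over time. One simple assumption, sketched below, is to average each topic's document weights per publication month, which yields one time series per topic that can then be plotted. The function name, dates, and weights are hypothetical.

```python
from collections import defaultdict

def topic_scores_by_month(dates, doc_topic):
    """Average each topic's document weight per calendar month.

    dates: ISO date strings ("YYYY-MM-DD"), one per document.
    doc_topic: one row of topic weights per document.
    Returns {"YYYY-MM": [mean weight of each topic]} in month order.
    """
    k = len(doc_topic[0])
    sums = defaultdict(lambda: [0.0] * k)
    counts = defaultdict(int)
    for date, weights in zip(dates, doc_topic):
        month = date[:7]  # keep "YYYY-MM"
        counts[month] += 1
        for t, w in enumerate(weights):
            sums[month][t] += w
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}

# Hypothetical publication dates and topic weights for three articles.
dates = ["2020-03-02", "2020-03-20", "2020-04-11"]
doc_topic = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
print(topic_scores_by_month(dates, doc_topic))
# {'2020-03': [0.75, 0.25], '2020-04': [0.25, 0.75]}
```

Averaging keeps months with many articles comparable to months with few; summing instead would emphasize overall coverage volume.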

In this example, you can see a first topic made up of the words students, school, year, time, government, pandemic, children, schools, like, and work. From this graph, we might interpret that remote schooling and work quickly gained traction as topics in the early stages of the pandemic.

You can also see that the second topic (cases, new, reported, vaccine, deaths, state, total, health, active, rate), which concerns vaccines and case numbers, climbed more gradually over the course of the pandemic.

Of course, this is only the beginning of the research, but it gives the researcher an idea of connected topics for further investigation.

From this point, you can export the tables and data behind these graphs, the visualizations themselves, the script, and any derivative data. The only things that cannot be exported are the full text and any consumptive information that would allow the researcher to reconstruct it.
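As a hedged sketch of preparing exportable derivative data, the snippet below writes monthly topic scores to CSV with Python's standard csv module, so the file contains only aggregate numbers and no article text. The function name, labels, and scores are hypothetical; the actual steps for moving files out of TDM Studio are covered in the ProQuest TDM Studio Manuals folder.

```python
import csv
import io

def write_topic_table(scores, labels, out):
    """Write {month: [score per topic]} as CSV rows to a text stream.

    Only aggregate scores are written; no document text is included.
    """
    writer = csv.writer(out)
    writer.writerow(["month", *labels])
    for month, values in scores.items():
        writer.writerow([month, *(f"{v:.3f}" for v in values)])

# Hypothetical monthly scores for two topics.
scores = {"2020-03": [0.75, 0.25], "2020-04": [0.25, 0.75]}
buf = io.StringIO()
write_topic_table(scores, ["schools/work", "cases/vaccines"], buf)
print(buf.getvalue())
```

To save to disk instead, pass a file opened with open("topic_scores.csv", "w", newline=""), as the csv module documentation recommends for file objects.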