Slides and recordings for some of the presentations are available below in the description section.

The times listed below are in Eastern Daylight Time (EDT).

Tuesday, May 2

Time Session
8:30 Coffee
9:00 1. Welcome & Introductions
Chairpersons: Kara Handren & Kelly Schultz (University of Toronto)
9:30 2. Keynote - From Text to Research: Tales from the TDM workflow
Keynote Speaker: Andrew Piper (McGill University)
10:30 Coffee Break
10:45 3. Short Presentations on TDM Research and Development
Speakers: Michelle Alexopoulos (University of Toronto), Sam Hansen (University of Michigan), Catherine Yeh (University of Toronto)
Moderator: Marcel Fortin (University of Toronto)
12:15 Lunch
1:15 4. TDM Copyright and Licensing (Panel Discussion)
Speakers: David Hansen (Authors Alliance), Ariel Katz (University of Toronto), Rachael Samberg (University of California Berkeley)
Moderator: Graeme Slaght (University of Toronto)
2:30 5. Terms of Service Exercise (Group Activity)
Facilitators: Rachael Samberg (University of California Berkeley), Leslie Barnes (University of Toronto)
3:30 Coffee Break
3:45 6. Social
Facilitator: Nick Field (University of Toronto)
4:45 End of Day

Wednesday, May 3

Time Session
8:30 Coffee
9:00 7. Presentations from TDM Service & Tool Providers
Speakers: John Dillon (TDM Studio), Amy Kirchhoff (Constellate), Jess Ludwig (Gale), Janet Swatscheno (HTRC)
Moderator: Dany Savard (University of Toronto Mississauga)
10:30 Coffee Break
10:45 8. Researcher Presentations Related to TDM Tools
Speakers: Elio Colavito (University of Toronto), Dr. Kun Lu (University of Oklahoma), Dr. Raina Heaton (University of Oklahoma), Dr. Raymond Orr (Dartmouth College), James Mason (University of Toronto), Dr. Keyao Pan (Florida International University)
Moderator: Kara Handren (University of Toronto Scarborough)
12:00 Lunch
1:00 9. How Libraries Are Supporting TDM - Challenges and Opportunities (Panel Discussion)
Speakers: Kara Handren (University of Toronto), Daniel Hickey (New York University), Stephanie Labou (University of California San Diego)
Moderator: Kelly Schultz (University of Toronto)
2:30 Coffee Break
2:45 10. Birds of a Feather
Facilitator: Nick Field (University of Toronto)
4:15 11. Closing Remarks
Chairpersons: Kara Handren & Kelly Schultz (University of Toronto)
4:30 End of Day

SESSION DESCRIPTIONS

2. Keynote - From Text to Research: Tales from the TDM workflow

9:30 - 10:30

Speaker: Andrew Piper
Description: Researchers the world over are using new techniques in natural language processing and machine learning to study the history and ongoing present of the human documentary record. As leaders of major repositories of textual information, librarians have a key role to play in that process. Through a series of illustrative case studies, I will highlight where I see current challenges and future opportunities for developing a more robust infrastructure for TDM research in conjunction with academic and state libraries. What kind of TDM infrastructure do we need to support a deeper, more accurate understanding of human behavior when it comes to textual communication?

3. Short Presentations on TDM Research and Development

10:45 - 12:15

Speaker: Michelle Alexopoulos (Slides) (Recording)
Title: Using AI to investigate the effects of Central Bank Communications: It’s not just what they say, but how they say it!
Description: Economic policies enacted by the government and its agencies have large impacts on the welfare of businesses and individuals—especially those related to fiscal and monetary policy. Communicating the details of the policies to the public is an important and complex undertaking. Policymakers tasked with the communication not only need to present complicated information in simple and relatable terms, but they also need to be credible and convincing—all the while being at the center of the media's spotlight. In this talk, Prof. Alexopoulos will review how recent advances in AI can be applied to construct measures of Federal Reserve Chairs’ emotions in public testimonies expressed via their words, voice, and face, as well as her research team's findings to date.

Speaker: Sam Hansen (Slides)
Title: Operationalization, MOU Automation, and Just Where Exactly is the Data: Stories about Overcoming Access Issues for Text and Data Mining Data
Description: There are many issues for librarians who are working in the text and data mining space, not least of which is figuring out how to connect users with the data in the first place. In this presentation, Sam Hansen will discuss how the University of Michigan, Ann Arbor is approaching this thorny problem. They will discuss their work with the library's Text and Data Mining Service Teams and Advisory Group and how they are actively working to transform large structured data into a relational dataset on the University's high-performance computing service, why it was necessary to survey all the librarians about the data sets they may just be stewarding, the ways in which Qualtrics can be used to automate the collection of Memorandums of Understanding, and the multi-year discussion around what would constitute a meaningful and useful text and data mining service.

Speaker: Catherine Yeh
Title: Using Bibliometric and Journal Text Data to Evaluate Gender Knowledge Production and Citation Patterns
Description: The availability of large-scale text and citation data on academic journals enables researchers to apply computational techniques to answer questions about structures and patterns in knowledge production. This presentation will draw on analyses from my dissertation on gender and knowledge production in two social science disciplines: sociology and economics. I will show how researchers can leverage large-scale journal data and a variety of computational methods to evaluate gender inequality in academia. I will discuss the process of cleaning and constructing large-scale data for analysis, as well as common techniques in computational text analysis used by social scientists. Using journal citation and text data for 6 journals (3 sociology, 3 economics) published between 1970 and 2015 (N=17,006), I illustrate how, as women’s participation in higher education has increased since the 1970s, scholarly attention to topics such as “gender” and “family” has also increased. At the same time, gender inequalities in citation appear to decrease over time.

4. TDM Copyright and Licensing

1:15 - 2:30

Speakers: David Hansen, Ariel Katz, Rachael Samberg, Graeme Slaght (Moderator)
Description: This panel will bring together legal copyright experts who have engaged with questions around TDM and the law. David Hansen, Ariel Katz, and Rachael Samberg will discuss the current legal landscape and emerging legal developments around TDM. Possible topics include Technical Protection Measures (TPM); licenses vs. fair dealing/fair use; risk management in academic contexts; how US and Canadian policies might influence each other; and the legal future for TDM. Graeme Slaght, Scholarly Communications & Copyright Outreach Librarian, will chair.

5. Terms of Service Exercise

2:30 - 3:30

Facilitators: Rachael Samberg, Leslie Barnes (Slides)
Description: In this interactive session, attendees will learn to untangle content license agreements and website Terms of Service (ToS). We encounter these kinds of agreements with nearly every product or data set researchers seek to mine: licensed databases, social media platforms, search engines, and procured data. Rachael Samberg will guide us in how to understand the terms of these agreements. She’ll help tease out the relationship between fair dealing/fair use and other clauses that shape or even override TDM rights. Leslie Barnes will discuss technical aspects of accessing textual data, introducing basic workflows for understanding how, practically, researchers might be able to access data under different licensing agreements or frameworks.
Participants will then engage with sample TDM reference requests and license agreements to address how researchers can lawfully or responsibly conduct research. What can they download and mine? What will they be able to publish or share? Must they use a certain technical method to access the data?

6. Social

3:45 - 4:45

Facilitator: Nick Field
Description: Building a community goes more smoothly when everyone has met each other, so please join us for a welcoming and low-stress opportunity to meet and mingle. We will provide snacks and a light structure, just to keep things streamlined.

7. Presentations from TDM Service & Tool Providers

9:00 - 10:30

Speaker: John Dillon (Slides) (Recording)
Title: Partnership for Innovation Through Text and Data Mining
Description: Students and faculty need access to large amounts of content in order to apply cutting-edge approaches and to harness data and information in many different disciplines. Text and data mining (TDM) is fast becoming a required skill for unlocking trends found in unstructured content, making it possible to employ approaches like natural language processing, artificial intelligence, machine learning, and predictive models. These trends are impacting many disciplines across the university including Political Science, Economics, Information Science, Sociology, and Business Studies, to name a few. Libraries are establishing service centers to provide access to content, software and real-time support for researchers.
In this session, you will hear about TDM Studio, a platform that can augment the research process, from content access to the interrogation of data. Join us to learn about how TDM Studio has helped libraries partner with academic researchers to quickly achieve their goals.
Learn more: TDM Studio (proquest.com)

Speaker: Amy Kirchhoff (Slides) (Recording)
Title: Teach Text & Data Analysis with Constellate: Prepare your students for a data-driven future
Description: Has your campus struggled with teaching text & data analysis? In this session, you will learn how Constellate can help your faculty and staff learn the skills necessary to teach workshops on text & data analysis and even integrate it into their regular classes. Constellate is the only text analysis platform that integrates access to scholarly content and open educational resources into a cloud-based lab to help faculty more easily and effectively teach text analysis and data skills. With Constellate, learners across all disciplines can apply text analysis methods to datasets, and hone their skills with support from on-demand tutorials, live classes taught by experts, and engagement with an inspiring user community.
Learn more: Constellate (constellate.org)

Speaker: Jess Ludwig (Slides) (Recording)
Title: Gale Digital Scholar Lab in the Global Classroom
Description: Gale Digital Scholar Lab is a text and data mining cloud-based platform designed to advance humanities research, instruction, and learning. Created for primary sources research, the tool reveals new pathways for inquiry in Gale Primary Sources content—as well as local collections—as researchers progress through a data curation workflow. In this session, Gale will share global case studies of how the product has been used in instruction and showcase recent product enhancements that support collaborative, project-based learning as well as data literacy and critical thinking skills development.
Learn more: Gale (gale.com)

Speaker: Janet Swatscheno (Slides) (Recording)
Title: Introduction to HathiTrust and HTRC
Description: This presentation will introduce attendees to the data and computational tools of HathiTrust and the HathiTrust Research Center (HTRC). HathiTrust is a member organization that operates a repository of over 17.5 million items digitized at a network of partner libraries. This massive collection is available for computational analysis primarily through the tools and services of HTRC. Attendees of this workshop will be introduced to the HathiTrust and HathiTrust Digital Library as well as the HTRC and its data and core services.
Learn more: HathiTrust (hathitrust.org)

8. Researcher Presentations Related to TDM Tools

10:45 - 12:00

Speaker: Elio Colavito (Slides) (Recording)
Title: A Little Help from my Friend: Mapping Transtopia with the Digital Scholar Lab
Description: My GALE-CLGBTH [Committee on LGBT History] Non-Residential Fellowship-funded digital mapping project, “Mapping Transtopia: Trans-Masculine Mutual Aid, Activism, and Community Formation, 1970-2005” uses an interactive map created on ArcGIS StoryMaps, a digital mapping and storytelling platform. The project seeks to organize and spatialize the vastness and complexity of late 20th-century trans-masculine community building, resource sharing, and identity-making. It fixes primary sources and histories of trans-masculinity to their geographies, tracing and connecting letters, magazines, and other material from sender to sender and city to city. I built the scaffolding for this map using analysis tools from GALE’s Digital Scholar Lab. This paper explores the ways that those tools shaped the interpretation and organization of my data.

Speaker: Dr. Kun Lu, Dr. Raina Heaton, Dr. Raymond Orr (Slides) (Recording)
Title: Exploring Native American Texts using Text Mining methods
Description: Native Americans represent a historically under-resourced textual community. While there has been an ever-increasing number of Native authors creating works since the 1960s, no corpus of Native-authored works exists from which to draw insights about this particular community and give it recognition equal to that of other similar communities of practice (e.g. History of Black Writing). In collaboration with the HathiTrust Research Center (HTRC) and with the support of a Scholar-Curated Worksets for Analysis, Re-use and Dissemination grant (SCWAReD), we have created a preliminary database of Native-authored works, which allows us to use text mining techniques to reveal novel characteristics of this community, such as their identity, worldview, representation, and modes of expression. Text mining also offers a new approach to looking at the ways in which Native authors express themselves and how they may or may not differ from other authors.

Speaker: James Mason (Slides) (Recording)
Title: Using TDM Studio to help systematically analyze research trends in music
Description: I will discuss the process of creating datasets with TDM Studio and how to work with them in the Jupyter Notebook environment provided by ProQuest. I will discuss text analysis and natural language techniques used, including topic analysis, categorization, tokenization and lemmatization with Python and some standard Python libraries. Finally, I will offer thoughts on how I will use this to reflect on our music collection as part of a larger collection analysis project.

Speaker: Dr. Keyao Pan (Slides) (Recording)
Title: Teaching about Text and Text Analysis in the Era of Generative AI
Description: It is an understatement to say that generative AI like ChatGPT is having a huge impact on higher education. To date, discussions of such powerful tools in the humanities have largely focused on cheating prevention, although pedagogies that seek to integrate them into the classroom, particularly in the areas of text processing and production, are also emerging. This presentation seeks to provide some examples of the use of generative AI to 1) enrich the teaching of text comprehension and discussion, and 2) facilitate the teaching of digital humanities skills such as text mining and analysis, along with more conventional (but still new) tools such as Constellate. The presentation will explore how generative AI can be used to create new kinds of learning experiences and empower students to create their own text-based projects.

9. How Libraries Are Supporting TDM - Challenges and Opportunities

1:00 - 2:30

Speakers: Kara Handren, Daniel Hickey, Stephanie Labou, Kelly Schultz (Moderator)
Description: This panel will bring together librarians from the US and Canada. Kara Handren, Daniel Hickey, and Stephanie Labou will discuss how they and their libraries are supporting TDM. Possible topics include types of TDM support offered; tools subscribed to; who the users of TDM services are; challenges and obstacles to supporting TDM; and plans for the future. Kelly Schultz, Data Visualization Librarian, will chair.

10. Birds of a Feather

2:45 - 4:15

Facilitator: Nick Field
Description: To wrap up our two days together, we will gather for smaller, more focused conversations, drawing on ideas everyone has shared. Find a group discussing a topic close to your heart and join in!