From Text to Tech

Computers: Students are not required to bring their own laptops for this workshop. Desktop computers will be provided by DHOxSS.

Abstract:

With large amounts of text becoming available through digitization efforts, there is a growing need for automatic analyses in the Digital Humanities to support distant reading. This workshop, originating from the HiCor research network, will impart some of the basics for working computationally and quantitatively with texts. It will take a hands-on approach to processing text, including cleaning and adding automatic linguistic annotation using freely available computational tools and the Python programming language, a very flexible tool with a wide range of applications in Humanities research.

The workshop proceeds in a stepwise manner, with an introduction to corpus linguistics followed by basic programming in Python. The workshop will also teach how to explore texts quantitatively, for example by creating frequency lists and visualizations, and more advanced types of analysis, such as topic modelling. The practical sessions are accompanied by lectures that discuss research which demonstrates concretely how Python and corpus linguistics can be applied to answer questions in a range of humanistic disciplines. The workshop rounds off with a practical problem-solving session covering the topics of the week.

No prior knowledge of programming is required, but attendees should be comfortable with identifying file paths on their own computer and installing software.

16:30 - 17:30

Tuesday

11:00 - 12:30

The session provides a basic introduction to programming for digital humanities using the Python language. Among the topics covered are assignments and variables, data types, conditional statements, and reading/writing data.
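
As a rough illustration of the topics listed above (not taken from the workshop materials), the short Python sketch below shows an assignment, a few data types, a conditional statement, and writing and reading a text file. The sample sentence, the threshold value and the file name `sample.txt` are made up for the example.

```python
import os
import tempfile

# Assignment and variables: a string and an integer
sentence = "To be, or not to be"
threshold = 5

# Data types: split() turns the string into a list of strings
words = sentence.split()
word_count = len(words)

# Conditional statement: choose a label depending on the word count
if word_count > threshold:
    label = "long"
else:
    label = "short"

# Writing and reading data; "sample.txt" is a hypothetical file name,
# placed in the system's temporary directory so nothing else is touched
path = os.path.join(tempfile.gettempdir(), "sample.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(sentence)

with open(path, encoding="utf-8") as handle:
    text_read = handle.read()

print(word_count, label, text_read == sentence)
```

Note that `split()` separates on whitespace only, so punctuation stays attached to words ("be," is one item); cleaning text of this kind is one of the tasks the workshop abstract mentions.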

16:30 - 17:30

Wednesday

11:00 - 12:30

This session will explore how researchers can use evidence from the Historical Thesaurus of the OED in combination with corpus methods to investigate lexical features of social identity, with the language of Shakespeare and his contemporaries as a case study.

14:00 - 16:00

Corpus linguistics with Python: The session provides an introduction to doing corpus linguistics in Python and NLTK. Topics include collocations, frequency lists, and key words.
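
To give a flavour of two of these topics, here is a minimal sketch of a frequency list and a simple collocation measure. It is an assumption-laden illustration, not the session's own code: NLTK offers dedicated classes for this (for example `nltk.FreqDist`), but the sketch uses only the Python standard library so that it runs without extra installation. The sample text, the regular-expression tokenizer and the helper `pmi` are all made up for the example.

```python
import math
import re
from collections import Counter

# A tiny made-up text standing in for a real corpus
text = (
    "The king spoke. The king laughed. "
    "The old king spoke to the young queen."
)

# Tokenize: lowercase the text and keep runs of letters only
tokens = re.findall(r"[a-z]+", text.lower())

# Frequency list: token counts, most frequent first
frequencies = Counter(tokens)
top_words = frequencies.most_common(3)

# Bigram counts: adjacent token pairs (this simple version also
# pairs words across sentence boundaries)
bigrams = Counter(zip(tokens, tokens[1:]))

def pmi(pair):
    """Pointwise mutual information of a bigram (log base 2)."""
    first, second = pair
    p_pair = bigrams[pair] / sum(bigrams.values())
    p_first = frequencies[first] / len(tokens)
    p_second = frequencies[second] / len(tokens)
    return math.log2(p_pair / (p_first * p_second))

print(top_words)
print(bigrams[("the", "king")], round(pmi(("king", "spoke")), 2))
```

Pointwise mutual information is only one of several association measures used for collocations; on a corpus this small the scores are not meaningful and serve only to show the mechanics.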

16:30 - 17:30

Python and more NLTK [Continued]

Thursday

11:00 - 12:30

Creativity is what we say it is: using corpus linguistics to identify key aspects of creativity

Anna Jordanous

As a concept, creativity is complex and multi-dimensional, encompassing many related aspects, abilities, properties and behaviours. Using techniques from the field of statistical natural language processing, we have identified a collection of fourteen key components of creativity. Words were identified which appeared significantly often in connection with discussions of the concept, and a measure of lexical similarity was used to cluster these words. A number of distinct themes emerged, which collectively contribute to our understanding of how creativity is composed.

This session gives a non-technical introduction to topic modelling along with examples of Python code.
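
As a hedged illustration of the kind of Python involved, the sketch below builds a document-term matrix, the bag-of-words table of word counts per document that topic models typically take as input. It does not fit a topic model; that step is normally done with a dedicated library and is not shown here. The three miniature documents are made up, and only the standard library is used.

```python
from collections import Counter

# Three made-up miniature documents
documents = [
    "the ship sailed the sea",
    "the army marched to war",
    "the ship went to war",
]

# Bag of words: one Counter of token counts per document
bags = [Counter(doc.split()) for doc in documents]

# Vocabulary: every distinct word, sorted for a stable column order
vocabulary = sorted(set().union(*bags))

# Document-term matrix: one row per document, one column per word
matrix = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
for row in matrix:
    print(row)
```

Word order is discarded in this representation; a topic model works only from which words occur together in the same documents and how often.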

16:30 - 17:30

Topic Modelling [Continued]

Friday

11:00 - 12:30

Corpora do what? On theory, method and data in Digital Humanities

Knut Melvær

Having stumbled my way into the Digital Humanities, I have had to overcome an array of challenges when it comes to messy data, undocumented and buggy software, the rapid advancements in the tech world and the scarcity of theorizing about what digital methods such as “distant reading” really tell us. In this session, I will invite you to explore some of these issues and discuss how we can make DH more approachable with regard to theory and method.

14:00 - 16:00

Problem solving session

The session will provide an opportunity to apply the skills taught during the week, with instructors present to provide guidance.