AI-powered content analysis: Using ChatGPT to measure media and communication content
Methods tutorial #28835, module (political) communication research methods, Winter term 2023/2024
Last updated on 2024-01-17 at 15:15
Overview
Large language models (LLM; starting with Google's BERT) and particularly their implementations as generative or conversational AI tools (e.g., OpenAI's ChatGPT) are increasingly used to measure or classify media and communication content. The idea is simple yet intriguing: Instead of training and employing humans for annotation tasks, researchers describe the concept of interest to a model such as ChatGPT, present the coding unit, and ask for a classification. The first tests of the utility of ChatGPT and similar tools for content analysis were positive to enthusiastic (Gilardi et al., 2023; Rathje et al., 2023). However, others pointed out the need for more thorough validation and reliability tests (Pangakis et al., 2023; Reiss, 2023). Easy-to-use tools and user-friendly tutorials have brought these methods within reach of the average social scientist (Kjell et al., 2023; Törnberg, 2023b). Yet (closed-source, commercial) large language models are not entirely understood even by their developers, and their uncritical use has been criticized on ethical grounds (Bender et al., 2021; Spirling, 2023).
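To make the idea concrete, here is a minimal sketch of such a zero-shot classification request in R (the language most of my examples will use), sent to OpenAI's chat completions API with the httr2 package. The category scheme, the example sentence, and the model name are invented for illustration; they are not part of the course material.

    library(httr2)

    coding_unit <- "The chancellor announced new climate targets for 2030."

    # Describe the concept, present the coding unit, ask for a classification
    prompt <- paste(
      "You are a coder in a content analysis project.",
      "Classify the following sentence into exactly one topic category:",
      "'politics', 'economy', 'sports', or 'other'.",
      "Answer with the category label only.",
      "Sentence:", coding_unit
    )

    resp <- request("https://api.openai.com/v1/chat/completions") |>
      req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>   # your API key
      req_body_json(list(
        model = "gpt-3.5-turbo",                               # example model
        temperature = 0,
        messages = list(list(role = "user", content = prompt))
      )) |>
      req_perform()

    resp_body_json(resp)$choices[[1]]$message$content          # e.g., "politics"

We will walk through calls like this step by step in the live-coding sessions in January.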
In this seminar, we will engage practically with this cutting-edge methodological research. We start with a quick refresher on the basics of quantitative content analysis (both human and computational), focusing on quality criteria and evaluation (validity, reliability, reproducibility, robustness, replicability). We will then attempt an overview of the rapidly developing literature on LLMs' utility for content analysis. The central part of the seminar will be dedicated to small evaluation studies by student teams. Questions can range from understanding a tool's parameters (e.g., What's the effect of a model's "temperature" on reliability and validity?) to practical optimization (e.g., Which prompts work best for a given task?) to critical questions (e.g., Does the classification show gender, racial, …, biases?).
Requirements
- Some prior exposure to (standardized, quantitative) content analysis will be helpful. However, qualitative methods also have their place in evaluating content analysis methods. If you have little experience with the former but can contribute with the latter, make sure to team up with students whose skill set complements yours.
- Prior knowledge of R or Python, applied data analysis, and interacting with application programming interfaces (APIs) will be helpful but is not required. Again, make sure that the teams overall have a balanced skill set.
- You will use your computer to conduct your evaluation study. Credit for commercial APIs (e.g., OpenAI) will be provided within sensible limits.
- This is not a programming class. Programming skills are neither required nor will they be taught systematically. I primarily work with R and sometimes copy, paste, and adapt some Python code, so my examples will mainly be in R. However, you are free to use whichever software you like.
Session plan
(1) 18. 10.: Hello
Class content: Introduction, demo, and organization
Organization: Find a partner for the state-of-the-art presentation. The goal is to find a partner who complements your skill set. Select or find an additional text. Register your presentation in the Blackboard Wiki.
Homework: Listen to this podcast episode with Petter Törnberg: LLMs in Social Science
(2) 25. 10.: Refresher: Traditional content analysis (human and computational)
Class content: Quick refresher on the basics of quantitative content analysis (both human and computational), focusing on quality criteria and evaluation (validity, reliability, reproducibility, robustness, replicability).
Texts (if needed): Krippendorff (2019) (but not the parts on computational content analysis), Van Atteveldt et al. (2022), Kroon et al. (2023).
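As a small hands-on illustration of the reliability part of this refresher, the following R snippet computes Krippendorff's alpha for two human coders with the irr package. The coding matrix is invented for demonstration (one row per coder, one column per coding unit).

    library(irr)

    # Values are nominal category codes assigned by each coder
    codes <- rbind(
      coder1 = c(1, 2, 2, 1, 3, 1, 2, 3),
      coder2 = c(1, 2, 1, 1, 3, 1, 2, 3)
    )
    kripp.alpha(codes, method = "nominal")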
State of the art (sessions 3-5): Overview
Class content: Short presentations on current work about LLM-based zero-shot classification
- Short presentations (15 minutes)
- One paper presented by two participants
Texts: Some recommendations include Burnham (2023), Gilardi et al. (2023), Hoes et al. (2023), Kjell et al. (2022), Kuzman et al. (2023), Laurer et al. (2023), McCoy et al. (2023), Ornstein et al. (2023), Pangakis et al. (2023), Qin et al. (2023), Rathje et al. (2023), Reiss (2023), Törnberg (2023a), Yang & Menczer (2023), Zhong et al. (2023). You are free to use other texts (check citations in and to these texts to find more). Text assignment will be managed via Blackboard.
(3) 01. 11.: State of the art I
(4) 08. 11.: State of the art II
(5) 15. 11.: State of the art III
(6) 22. 11.: Work on first ideas
Class content: Support in class and office hours
Organization: Form teams for the evaluation study now at the latest. The goal is to create teams with diverse skill sets. In my experience, three to five people are a good team size, but your preferences might differ.
(7) 29. 11.: Present first ideas
Class content: Presentations and feedback
(8) 06. 12.: Work on design of evaluation study
Class content: Support in class and office hours
(9) 13. 12.: Present design of evaluation study
Class content: Presentations and feedback
(10) 20. 12.: Organize evaluation study
Class content: Progress report and support in class and office hours
Winter break
(11) 10. 01.: Conduct evaluation study
Class content: Live coding: How to talk to OpenAI models using the API
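As preparation for this session, here is one possible shape such an API helper could take: a small R function (the function and object names are my own, not a fixed template) that wraps the request from the Overview example, retries on transient errors such as rate limits, and can be applied to a vector of texts.

    library(httr2)

    classify_text <- function(text, temperature = 0) {
      prompt <- paste(
        "Classify the following sentence into exactly one topic category:",
        "'politics', 'economy', 'sports', or 'other'.",
        "Answer with the category label only.",
        "Sentence:", text
      )
      resp <- request("https://api.openai.com/v1/chat/completions") |>
        req_auth_bearer_token(Sys.getenv("OPENAI_API_KEY")) |>
        req_retry(max_tries = 3) |>        # retry on rate limits or hiccups
        req_body_json(list(
          model = "gpt-3.5-turbo",
          temperature = temperature,
          messages = list(list(role = "user", content = prompt))
        )) |>
        req_perform()
      resp_body_json(resp)$choices[[1]]$message$content
    }

    texts <- c("The central bank raised interest rates.",
               "The striker scored twice in the final.")
    sapply(texts, classify_text)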
(12) 17. 01.: Conduct evaluation study
Class content: Live coding: How to set up the evaluation study
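One way to set up such a study, shown here as a sketch with invented factors, is to spell out every condition of the design as one row of a data frame, which can then be sent to the classification function row by row.

    library(tidyr)

    design <- crossing(
      prompt_version = c("short instruction", "instruction with examples"),
      temperature    = c(0, 0.7),
      unit_id        = 1:50,      # e.g., 50 sampled coding units
      repetition     = 1:3        # repeated calls to check consistency
    )

    nrow(design)   # = number of API calls the study will require
    head(design)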
(13) 24. 01.: Conduct evaluation study
Class content: Class evaluation 1; Help desk: Collect data for evaluation study
(14) 31. 01.: Conduct evaluation study
Class content: Class evaluation 2; Live coding: Quantitative evaluation
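For the quantitative evaluation, a typical first step is to compare the model's labels against a human-coded gold standard. The vectors below are invented; in your study, they would come from the data you collected.

    gold  <- c("politics", "economy", "politics", "sports", "other", "economy")
    model <- c("politics", "economy", "economy",  "sports", "other", "politics")

    table(gold, model)      # confusion matrix
    mean(gold == model)     # accuracy

    # Precision, recall, and F1 for one category, here "politics"
    tp        <- sum(model == "politics" & gold == "politics")
    precision <- tp / sum(model == "politics")
    recall    <- tp / sum(gold == "politics")
    f1        <- 2 * precision * recall / (precision + recall)
    c(precision = precision, recall = recall, F1 = f1)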
(15) 07. 02.: Conduct evaluation study
Class content: Help desk: Qualitative and quantitative evaluation
(16) 14. 02.: Final presentations
Class content: Presentations (15 minutes per group) and feedback
Aims
The primary aims of a methods tutorial are twofold: firstly, to equip participants with the essential knowledge and skills required to effectively utilize Large Language Models (LLMs) for content analysis, enabling them to extract valuable insights and meaning from textual data. Secondly, the tutorial seeks to provide a comprehensive understanding of the methodologies involved in conducting an evaluation study of a new method. Through this, participants can gain proficiency in assessing the performance and effectiveness of novel approaches, fostering innovation and informed decision-making within the realm of natural language processing and data analysis (wordy phraseology according to ChatGPT).
Tasks
- 5 ECTS ≈ 125-150 hours workload
- Active participation, not graded
- Participation in class: read texts, ask questions, discuss, give feedback to other students
- Short presentation of a published evaluation study report (in pairs)
- Not a detailed description, but a summary for the class. The audience should learn a) what kind of questions and studies might be interesting and b) which texts might be worth reading once they have decided on a study idea.
- Plan and conduct an evaluation study (in groups)
- Present the results of your own evaluation study (in groups)
Contact information
Division Digital Research Methods
Email: marko.bachl@fu-berlin.de
Phone: +49-30-838-61565
Webex: Personal Meeting Room
Office: Garystr. 55, Room 274
Office hours: Tuesday, 11:00-13:00, please make an appointment via email.