Content Analysis with Artificial Intelligence

Seminar #28535 in the module Methods: Philosophy of Science Foundations, Data Collection, and Statistics, Summer Semester 2024

Marko Bachl

Freie Universität Berlin

15 April 2024

Welcome

Before we can start:
Who is here?

Agenda

Introductions
Demo: Content analysis with AI
Seminar overview
Organizational matters (postponed to session 2)
Tasks for next week

Introductions

Demo: Content analysis with AI

DALL·E 3; Prompt

Demo: Content analysis with AI

Example: Classifying incivility in social media comments (Stoll et al., 2023)

Classification with OpenAI's GPT-4 (familiar from the paid version of ChatGPT) using the R programming language

Packages

library(httr2) # communicate with the API over HTTP
library(jsonlite) # read and format JSON
library(tidyverse) # data manipulation and plotting

Test material

We need a few comments for testing:

A clearly uncivil comment
A clearly civil comment
Two ambiguous comments:
- A civil comment that is wrongly classified as uncivil
- An uncivil comment that is wrongly classified as civil

We could do this in ChatGPT

https://chat.openai.com/

URL for the request

req = request(base_url = "https://api.openai.com/v1/chat/completions")
req |> 
  req_dry_run()

GET /v1/chat/completions HTTP/1.1
Host: api.openai.com
User-Agent: httr2/1.0.1 r-curl/5.2.0 libcurl/8.4.0
Accept: */*
Accept-Encoding: deflate, gzip

API key for authenticating with OpenAI

Never share keys or tokens publicly!

key = readLines("example/openai_key.txt")

req |> 
  req_auth_bearer_token(key) |> 
  req_dry_run()

GET /v1/chat/completions HTTP/1.1
Host: api.openai.com
User-Agent: httr2/1.0.1 r-curl/5.2.0 libcurl/8.4.0
Accept: */*
Accept-Encoding: deflate, gzip
Authorization: <REDACTED>
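A common alternative to a key file is an environment variable, so the key never sits in a file inside the project folder. The sketch below assumes a variable named `OPENAI_API_KEY` (a naming convention, not something these slides use), and `get_openai_key()` is a hypothetical helper.

```r
# Hypothetical helper: read the API key from an environment variable
# (assumed name: OPENAI_API_KEY) instead of a text file in the project
get_openai_key = function(var = "OPENAI_API_KEY") {
  key = Sys.getenv(var)
  # Sys.getenv() returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}
```

Usage would be `key = get_openai_key()` in place of `readLines()`; the variable can be defined in a `.Renviron` file, which R reads at startup.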

Prompt (1)

Coding instructions: What should the AI assistant do?

instr = paste(readLines("example/instr_def_reason.txt"), collapse = "\n")
cat(instr)

Your task is to evaluate whether a comment contains incivility.

Incivility is defined as a statement that contains any of the following features: Vulgarity, Inappropriate Language, Swearing, Insults, Name Calling, Profanity, Dehumanization, Sarcasm, Mockery, Cynicism, Negative Stereotypes, Discrimination, Threats of Violence, Denial of Rights, Accusations of Lying, Degradation, Disrespect, Devaluation.

You should assign the comment a numeric label, 1 or 0.
1. The comment is incivil. It contains any of the mentioned features.
0. The comment is civil. It does not contain any of the mentioned features.

Answer with 0 or 1, followed by a semi-colon and then a brief motivation. For instance: "1; The comment is incivil. It has many elements of an uncivil comment, such as name-calling, mockery, and threats of violence." Do not use quotation marks.

(Törnberg, 2023)

Prompt (2)

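Coding units: What should be classified?

(The test comments are in German. Rough English glosses, in order: "Asshole!!!"; "You look very beautiful <3"; "Well, you do look pretty."; "I don't like that you called me an idiot."; "For a woman, that's not a bad suggestion at all.")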

cod = readLines("example/comments.txt")
cat(cod, sep = "\n")

Arschloch!!!
Du siehst sehr schön aus <3
Du siehst ja schön aus.
Dass du mich Dummkopf genannt hast, finde ich nicht schön.
Für eine Frau ist das gar kein schlechter Vorschlag.

Request

req |> 
  req_auth_bearer_token(key) |> 
  req_body_json(list(
    model = "gpt-4",
    messages = list(
      list(role = "system", content = instr),
      list(role = "user", content = cod[1])
    ),
    temperature = 0,
    max_tokens = 50
  )) |> 
  req_dry_run()

POST /v1/chat/completions HTTP/1.1
Host: api.openai.com
User-Agent: httr2/1.0.1 r-curl/5.2.0 libcurl/8.4.0
Accept: */*
Accept-Encoding: deflate, gzip
Authorization: <REDACTED>
Content-Type: application/json
Content-Length: 998

{"model":"gpt-4","messages":[{"role":"system","content":"Your task is to evaluate whether a comment contains incivility.\n\nIncivility is defined as a statement that contains any of the following features: Vulgarity, Inappropriate Language, Swearing, Insults, Name Calling, Profanity, Dehumanization, Sarcasm, Mockery, Cynicism, Negative Stereotypes, Discrimination, Threats of Violence, Denial of Rights, Accusations of Lying, Degradation, Disrespect, Devaluation.\n\nYou should assign the comment a numeric label, 1 or 0.\n1. The comment is incivil. It contains any of the mentioned features.\n0. The comment is civil. It does not contain any of the mentioned features.\n\nAnswer with 0 or 1, followed by a semi-colon and then a brief motivation. For instance: \"1; The comment is incivil. It has many elements of an uncivil comment, such as name-calling, mockery, and threats of violence.\" Do not use quotation marks."},{"role":"user","content":"Arschloch!!!"}],"temperature":0,"max_tokens":50}
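The JSON body above is an R list that `req_body_json()` serializes. To inspect such a body without building a request at all, a minimal sketch can serialize the same list directly with `jsonlite::toJSON()`. `build_body()` is a hypothetical helper, and the short instruction string is a placeholder rather than the real coding instruction. `auto_unbox = TRUE` turns length-one vectors into JSON scalars instead of one-element arrays, mirroring `req_body_json()`'s default.

```r
library(jsonlite)

# Hypothetical helper: assemble the request body for one comment
build_body = function(comment, instr, model = "gpt-4",
                      temperature = 0, max_tokens = 50) {
  list(
    model = model,
    messages = list(
      list(role = "system", content = instr),  # coding instructions
      list(role = "user", content = comment)   # one coding unit
    ),
    temperature = temperature,
    max_tokens = max_tokens
  )
}

# Serialize without sending anything; "Classify incivility." is a placeholder
body_json = toJSON(build_body("Arschloch!!!", "Classify incivility."),
                   auto_unbox = TRUE)
body_json
```

Inside the `map()` call used later for all comments, the same helper could be plugged in as `req_body_json(build_body(.x, instr))`.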

Response

resp = req |> 
  req_auth_bearer_token(key) |> 
  req_body_json(list(
    model = "gpt-4",
    messages = list(
      list(role = "system", content = instr),
      list(role = "user", content = cod[1])
    ),
    temperature = 0,
    max_tokens = 50
  )) |> 
  req_perform()

resp |> 
  resp_body_string() |> 
  prettify()

{
    "id": "chatcmpl-9EFrKWldKi4O8ZgFkcgldzxb9a6XT",
    "object": "chat.completion",
    "created": 1713184582,
    "model": "gpt-4-0613",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "1; The comment is incivil. It contains vulgarity and swearing."
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 225,
        "completion_tokens": 16,
        "total_tokens": 241
    },
    "system_fingerprint": null
}
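The answer sits at `choices` → first element → `message` → `content`. A minimal sketch that follows this path on the raw JSON text with `jsonlite::fromJSON()`, and warns when `finish_reason` is `"length"` (the answer was cut off by `max_tokens`). The helper name `extract_content()` is hypothetical; the JSON string is a shortened copy of the response shown above.

```r
library(jsonlite)

# Hypothetical helper: pull the assistant's answer out of a
# chat-completions response given as a JSON string
extract_content = function(json_txt) {
  # simplifyVector = FALSE keeps the nested list structure of the JSON
  resp = fromJSON(json_txt, simplifyVector = FALSE)
  choice = resp$choices[[1]]
  # "length" means the answer was truncated by max_tokens
  if (identical(choice$finish_reason, "length")) {
    warning("Answer truncated by max_tokens.", call. = FALSE)
  }
  choice$message$content
}

# Shortened copy of the response shown above
json_txt = '{
  "model": "gpt-4-0613",
  "choices": [{"index": 0,
    "message": {"role": "assistant",
      "content": "1; The comment is incivil. It contains vulgarity and swearing."},
    "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 225, "completion_tokens": 16, "total_tokens": 241}
}'

extract_content(json_txt)
```

The truncation check matters here because `max_tokens = 50` is tight: a longer motivation could be cut off mid-sentence.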

All comments

Code

req_list = cod |> 
  map(~ {
    req |> 
      req_auth_bearer_token(key) |> 
      req_body_json(list(
        model = "gpt-4",
        messages = list(
          list(role = "system", content = instr),
          list(role = "user", content = .x)
        ),
        temperature = 0,
        max_tokens = 50
      ))
  })

# Get responses for all comments (sent in parallel) ####
resp_list = req_list |> 
  req_perform_parallel()

# Extract and tidy the answers ####
tibble(
  Kommentar = cod,
  Klassifikation = resp_list |> 
    map_chr( ~ {
      .x |> 
        resp_body_json() |> 
        _$choices |> 
        _[[1]] |> 
        _$message |> 
        _$content
    })
) |> 
  knitr::kable()

Kommentar	Klassifikation
Arschloch!!!	1; The comment is incivil. It contains vulgarity and swearing.
Du siehst sehr schön aus <3	0; The comment is civil. It does not contain any elements of incivility. It is a compliment in German, saying “You look very beautiful <3”.
Du siehst ja schön aus.	0; The comment is civil. It does not contain any elements of incivility. It is a compliment in German, saying “You look beautiful.”
Dass du mich Dummkopf genannt hast, finde ich nicht schön.	1; The comment is incivil. It contains name-calling, which is a feature of incivility.
Für eine Frau ist das gar kein schlechter Vorschlag.	1; The comment is incivil. It contains negative stereotypes and devaluation based on gender.
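For further analysis, the answer strings still need to be split into the numeric label and the motivation. A minimal base-R sketch, assuming every answer follows the requested format (`0` or `1`, a semicolon, then text); answers that do not match become `NA` rather than being silently misread. The helper name `parse_classification()` is hypothetical; the first two inputs are shortened answers from the table above, the third is a made-up malformed answer.

```r
# Hypothetical helper: split "1; motivation" answers into label and reason
parse_classification = function(x) {
  # Expected format: a single 0 or 1, a semicolon, then free text
  ok = grepl("^\\s*[01]\\s*;", x)
  label = rep(NA_integer_, length(x))
  label[ok] = as.integer(substr(trimws(x[ok]), 1, 1))
  reason = rep(NA_character_, length(x))
  reason[ok] = trimws(sub("^\\s*[01]\\s*;", "", x[ok]))
  data.frame(answer = x, label = label, reason = reason)
}

answers = c(
  "1; The comment is incivil. It contains vulgarity and swearing.",
  "0; The comment is civil. It does not contain any elements of incivility.",
  "Maybe; unclear"  # made-up malformed answer
)
res = parse_classification(answers)
res
```

Keeping malformed answers as `NA` makes them easy to count and recode manually instead of letting them distort the results.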

Questions?

Seminar plan (syllabus)

To type in: https://bachl.quarto.pub/inhaltsanalyse_mit_ki/
The link is also in the welcome email or on Blackboard

Questions?

Organizational matters

Forming the working groups (postponed to session 2)

5 working groups
28 students
3 groups of 6 students, 2 groups of 5 students

Questions?

Tasks for next week

Get organized within your group (postponed to session 2)
Find the assigned text(s) and make sure you can access them
Collect open questions and, if needed, post them in the Blackboard forum

Questions?

Thank you, see you next week

Marko Bachl

marko.bachl@fu-berlin.de

References

Stoll, A., Wilms, L., & Ziegele, M. (2023). Developing an incivility dictionary for German online discussions – a semi-automated approach combining human and artificial knowledge. Communication Methods and Measures, 17(2), 131–149. https://doi.org/gsnfdn

Törnberg, P. (2023). How to use LLMs for text analysis. arXiv. https://doi.org/mqx9
