Corpora

German Suicide Note Corpus CorGeS

The Corpus of German Suicide Notes, CorGeS for short (pronounced like the plural of the dog breed), is a collection of suicide notes. In total, the corpus contains 261 notes.

There are several parts to this corpus as detailed below:
Part 1 contains transcribed suicide notes from the 1910s to the 1930s from a police corpus. The subcorpus comprises 118 cases with 155 notes in total, since more than one note was collected for some cases. All notes were collected by a police officer and were retyped or rewritten in the collection process. Some of the notes were left in relation to attempted suicides. Five notes are presumably not related to a suicide but remain in the dataset for full disclosure, leaving 150 suicide notes in this subcorpus. I gained ownership of the corpus through the police officer's estate administrator.
Part 2 contains suicide notes gathered from a publication by Willemsen.
Part 3 contains notes from a book by Morgenthaler.
Part 4 contains notes from a book by Grashoff.

Access to the corpus is restricted due to the sensitive nature of the data. It can be requested via Aston University's FoLD. In the anonymised version, personal information, such as names or locations, has been replaced. The non-anonymised version, available as text files, scans of the notes, or the full physical folder, can only be obtained by specific application, as it contains personal information of the deceased. If you would like to work with the corpus, please get in touch either through FoLD or by email.

Suggested references:

Roemling, D. & Busso, L. (2026). CorGeS: The Corpus of German Suicide Notes. Applied Corpus Linguistics, Volume 6, Issue 1. https://doi.org/10.1016/j.acorp.2025.100177
Roemling, D. (2024). CorGeS - The Corpus of German Suicide Notes. [Data set]. https://doi.org/10.5281/zenodo.15359633. Access request available at FoLD.

CRIME: The Corpus of Recorded Investigative, Media, and Evidence-based proceedings

The CRIME corpus is a joint project with Steven Coats from the University of Oulu.
The corpus is a structured, searchable language resource containing high-quality Automatic Speech Recognition (ASR) transcripts and audio from police investigative interviews, courtroom proceedings, and criminal-justice-focused media content. Sourced from publicly accessible YouTube channels and made available under the provisions of the EU Data Mining Act, the corpus enables in-depth research into linguistic, phonetic, pragmatic, and discourse features of criminal justice interactions.
You can access the corpus search here. If you wish to download the static version of the corpus, please follow this DOI: https://doi.org/10.7910/DVN/MLMB6E.

Suggested reference:

Coats, S. & Roemling, D. (2025). C.R.I.M.E.: Corpus of Recorded Investigative, Media, and Evidence-based Proceedings, static version 0.1. https://doi.org/10.7910/DVN/MLMB6E. Harvard Dataverse. V1.