Transhistorical Corpus of Written English

Project Details


The Transhistorical Corpus of Written English (TCWE) is a diachronic text corpus developed at Edge Hill University as part of the 'From Manuscripts to Messaging' project, which ran from 2019 to 2021 and was directed by Dr. Imogen Marcus with the assistance of Dr. Ursula Maden-Weinberger. The corpus contains five text types: sermons, statutes, letters, emails, and instant messages. The texts range in date from the fifteenth to the twenty-first century: the sermons and statutes date from the 15th to the 21st century, the letters from the 15th to the 20th century, and the emails and WhatsApp instant messages from the 21st century.

The corpus has been designed to investigate innovation in digital written language in a historical context, in particular the way such language has previously been conceptualised as a hybrid of speech and writing. For this reason, the corpus contains sermons (towards the spoken end of a conceptual speech-writing continuum) and statutes (towards the written end of that continuum), as well as letters, emails and instant messages. However, the corpus does not need to be used exclusively for this purpose.

The corpus contains extensive metadata, so users can search by many criteria, including text type, century and, in the case of letters, author name, author gender, recipient name and recipient gender. The TCWE is freely available online via the Sketch Engine platform (see the link to the right of this page).

Layman's description

Dataset for linguistic analysis.
Short title: TCWE
Effective start/end date: 31/01/19 to 31/10/21

