Clusters of keyness: A principled approach to selecting key items

    Research output: Contribution to conference › Paper › peer-review



Keyness analysis is perhaps the most widely used technique within corpus approaches to (critical) discourse studies. As an automated keyness analysis usually returns far more key items than can feasibly be examined manually within an appropriate co-text, the approach to selecting key items is of paramount importance, as it will determine the results and conclusions (Gabrielatos & Marchi, 2011).

Currently, studies tend to adopt a methodologically naïve approach to selecting key items for manual analysis: they remove items from consideration before the automated analysis by using frequency thresholds or stoplists, and/or they select a small sub-set of the items returned by the automated analysis (e.g. the top-N key items and/or key items that they deem relevant to the focus of the study) (see Pojanapunya & Watson Todd, 2016). However, these approaches lack a principled rationale, and adopting them can remove important key items from consideration and lead to cherry-picking, consequently rendering results and conclusions questionable. Also, keyness studies predominantly focus on differences between the compared corpora, and very few studies use keyness analysis to examine similarities (Taylor, 2013).

This paper will discuss a new approach to selecting key items in a principled fashion, and will demonstrate the relevant procedures via a case study. The approach utilises cluster analysis, and caters for a focus on both difference and similarity. However, in order to contextualise the proposed procedure, the paper will first address a number of relevant misconceptions regarding the nature of keyness, the selection of the corpora to be compared (usually referred to as the study and reference corpus), and appropriate metrics for establishing keyness.

References

Gabrielatos, C. & Marchi, A. (2011) Keyness: Matching metrics to definitions. Corpus Linguistics in the South 1, University of Portsmouth, 5 November 2011.

Pojanapunya, P. & Watson Todd, R. (2016) Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory. DOI: 10.1515/cllt-2015-0030.

Taylor, C. (2013) Searching for similarity using corpus-assisted discourse studies. Corpora, 8(1), 81-113.
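
As a rough illustration of what grouping key items by their keyness values could look like, the Python sketch below computes an effect-size score per item and splits the sorted scores at their largest gaps, which in one dimension is equivalent to single-linkage hierarchical clustering. It is only a sketch under stated assumptions: the abstract does not specify the metric or the clustering method used in the paper, so the choice of percentage difference of normalised frequencies as the score, the function names `pct_diff` and `cluster_by_gaps`, and all frequency counts are hypothetical.

```python
from typing import Dict, List


def pct_diff(freq_study: int, size_study: int,
             freq_ref: int, size_ref: int) -> float:
    """Percentage difference between normalised frequencies.

    Assumption: used here as the keyness (effect-size) score.
    Requires freq_ref > 0; zero reference frequencies need separate handling.
    """
    nf_study = freq_study / size_study
    nf_ref = freq_ref / size_ref
    return (nf_study - nf_ref) * 100.0 / nf_ref


def cluster_by_gaps(scores: Dict[str, float], n_clusters: int) -> List[List[str]]:
    """Split items, sorted by score, at the (n_clusters - 1) largest gaps.

    For one-dimensional data this matches cutting a single-linkage
    dendrogram into n_clusters groups (ignoring tied gaps).
    """
    ordered = sorted(scores, key=lambda item: scores[item])
    gaps = [(scores[ordered[i + 1]] - scores[ordered[i]], i)
            for i in range(len(ordered) - 1)]
    largest = sorted(gaps, reverse=True)[:n_clusters - 1]
    cut_after = sorted(i for _, i in largest)

    clusters: List[List[str]] = []
    start = 0
    for i in cut_after:
        clusters.append(ordered[start:i + 1])
        start = i + 1
    clusters.append(ordered[start:])
    return clusters


# Invented toy counts: item -> (frequency in study corpus, frequency in reference corpus)
SIZE_STUDY = 100_000
SIZE_REF = 100_000
counts = {
    "the": (5000, 5000),
    "and": (3000, 2900),
    "crisis": (200, 50),
    "migrant": (180, 40),
    "tea": (10, 100),
}

scores = {item: pct_diff(fs, SIZE_STUDY, fr, SIZE_REF)
          for item, (fs, fr) in counts.items()}
clusters = cluster_by_gaps(scores, n_clusters=3)

for group in clusters:
    print(group)
```

With these toy counts the items fall into three groups: one with a markedly lower relative frequency in the study corpus, one with near-identical frequencies (candidates for a focus on similarity), and one with a markedly higher frequency (candidates for a focus on difference). Every item is assigned to a group, so none is discarded by an arbitrary top-N cut-off.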
    Original language: English
    Publication status: Accepted/In press - 28 Oct 2017
    Event: Corpus Linguistics in the South - University of Cambridge, Cambridge, United Kingdom
    Duration: 28 Oct 2017 → …


    Conference: Corpus Linguistics in the South
    Country/Territory: United Kingdom
    Period: 28/10/17 → …


    • Keyness Analysis: nature, metrics and techniques

      Gabrielatos, C., 7 Feb 2018, Corpus Approaches to Discourse: A Critical Review. Taylor, C. & Marchi, A. (eds.). Oxford: Routledge, p. 225-258 (34 p.).

      Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

      Open Access
