TY - CONF
T1 - Clusters of keyness: A principled approach to selecting key items
AU - Gabrielatos, Costas
PY - 2017/10/28
Y1 - 2017/10/28
N2 - Keyness analysis is perhaps the most widely used technique within corpus approaches to (critical) discourse studies. As an automated keyness analysis usually returns a much larger number of key items than is feasible to examine manually within an appropriate co-text, the approach to selection of key items is of paramount importance, as it will determine the results and conclusions (Gabrielatos & Marchi, 2011). Currently, studies tend to adopt a methodologically naïve approach to selecting key items for manual analysis: they remove items from consideration before the automated analysis by using frequency thresholds or stoplists, and/or select a small sub-set of items returned by the automated analysis (e.g. the top-N key items and/or key items that they deem relevant to the focus of the study) (see Pojanapunya & Watson Todd, 2016). However, the above approaches lack a principled rationale, and adopting them can remove important key items from consideration and lead to cherry-picking – consequently rendering results and conclusions questionable. Also, keyness studies predominantly focus on differences between the compared corpora, and there are very few studies using keyness analysis to examine similarities (Taylor, 2013). This paper will discuss a new approach to selecting key items in a principled fashion, and will demonstrate the relevant procedures via a case study. The approach utilises cluster analysis, and caters for a focus on both difference and similarity. However, in order to contextualise the proposed procedure, the paper will need to preface its main focus by addressing a number of relevant misconceptions regarding the nature of keyness, the selection of the corpora to be compared (usually referred to as the study and reference corpus), and appropriate metrics for establishing keyness.
References
Gabrielatos, C. & Marchi, A. (2011) Keyness: Matching metrics to definitions. Corpus Linguistics in the South 1. University of Portsmouth, 5 November 2011.
Pojanapunya, P. & Watson Todd, R. (2016) Log-likelihood and odds ratio: keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory, DOI: 10.1515/cllt-2015-0030.
Taylor, C. (2013) Searching for similarity using corpus-assisted discourse studies. Corpora, 8(1), 81-113.
M3 - Paper
T2 - Corpus Linguistics in the South
Y2 - 28 October 2017
ER -