Statistics and visualisations of theatre corpora using corpus analysis software
Abstract
Corpus linguistics is a powerful quantitative methodology that relies on frequency data and statistical procedures (Han 2019). According to Gries (2013), scientific quantitative research has three main goals: description, explanation and prediction of data. Within this frame, statistics makes sense of quantitative data by means of analysis and useful visualisations (Brezina 2018).
Many techniques, such as the statistical identification of collocations or keywords, have been designed for monolingual corpora. While most of these can also be applied to other types of corpora, such as parallel and comparable ones, a dedicated set of statistics that accounts for the structural singularities of text types such as theatre plays appears to be missing.
In this study, we propose a range of adaptations of statistics and visualisations that apply and interrelate theatre-specific filters. The division of dramatic texts into structural units is a specific feature of this genre (Andaluz-Pinedo and Sanjurjo-González in press). Utterances, speakers, stage directions and dialogues are an intrinsic part of these texts and must be taken into account when developing useful and descriptive statistical procedures. One example of such an adaptation is quantitative analysis based on the units of characters, utterances, stage directions and dialogues rather than on the text data as a whole.
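As an illustration of what a theatre-specific filter might look like, the following minimal sketch computes word frequencies restricted to a unit type or a speaker. The data layout (a list of units with `type`, `speaker` and `text` fields), the toy play and the function names are assumptions made for this example; they do not describe the format or the API of any existing tool.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy play: each structural unit is tagged as an utterance
# or a stage direction (this layout is an assumption for illustration).
PLAY = [
    {"type": "stage_direction", "speaker": None, "text": "Enter Ana and Jon."},
    {"type": "utterance", "speaker": "ANA",
     "text": "Where is the letter? The letter was here."},
    {"type": "utterance", "speaker": "JON", "text": "I have not seen the letter."},
    {"type": "stage_direction", "speaker": None, "text": "Ana leaves."},
    {"type": "utterance", "speaker": "JON", "text": "Now I am alone."},
]


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(units, unit_type=None, speaker=None):
    """Count tokens, optionally keeping only one unit type and/or one speaker."""
    counts = Counter()
    for unit in units:
        if unit_type is not None and unit["type"] != unit_type:
            continue
        if speaker is not None and unit["speaker"] != speaker:
            continue
        counts.update(tokenize(unit["text"]))
    return counts


def tokens_per_speaker(units):
    """Total number of tokens spoken by each character (utterances only)."""
    totals = defaultdict(int)
    for unit in units:
        if unit["type"] == "utterance":
            totals[unit["speaker"]] += len(tokenize(unit["text"]))
    return dict(totals)
```

Calling `word_frequencies(PLAY)` treats the text as a whole, whereas `word_frequencies(PLAY, unit_type="utterance", speaker="ANA")` restricts the count to one character's speech, and `tokens_per_speaker(PLAY)` gives a simple measure of how speech is distributed among characters.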
As Anthony (2013) states, “the functionality offered by software tools largely dictates what corpus linguistics research methods are available to a researcher”. To improve this functionality for the analysis of theatre corpora, further work will integrate this approach into existing corpus analysis software that processes theatre play-texts, such as ACTRES Corpus Manager (Sanjurjo-González 2017).