Synthetic Data as a Strategy to Resolve Data Privacy and Confidentiality Concerns in the Sport Sciences: Practical Examples and an R Shiny Application

M. Naughton, D. Weaving, T. Scott, H. Compton

Research output: Contribution to journal › Article (journal) › peer-review

Abstract

Purpose: There has been a proliferation of technologies in the sport performance environment that collect increasingly large
quantities of athlete data. These data have the potential to be personal, sensitive, and revealing and raise privacy and confidentiality
concerns. A solution may be the use of synthetic data, which mimic the properties of the original data. The aim of this study was to
provide examples of synthetic data generation to demonstrate its practical use and to deploy a freely available web-based R Shiny
application to generate synthetic data. Methods: Openly available data from 2 previously published studies were obtained,
representing typical data sets of (1) field- and gym-based team-sport external and internal load during a preseason period (n = 28)
and (2) performance and subjective changes from before to after a training intervention (n = 22). Synthetic data were
generated using the synthpop package in RStudio, and comparisons between the original and synthetic data sets were
made through Welch t tests and the standardized propensity mean squared error statistic, a measure of distributional similarity. Results: There
were no significant differences between the original and synthetic data sets for any variable examined in either data set
(P > .05). Further, there was distributional similarity (ie, low standardized propensity mean squared error) between the original
and synthetic data sets. Conclusions: These findings highlight the potential use of synthetic data as a practical solution to
privacy and confidentiality issues. Synthetic data can unlock previously inaccessible data sets for exploratory analysis and
facilitate multiteam or multicenter collaborations. Interested sport scientists, practitioners, and researchers should consider
utilizing the shiny web application (SYNTHETIC DATA—available at https://assetlab.shinyapps.io/SyntheticData/).
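
The two comparisons named in the Methods can be illustrated with a short sketch. The study itself used the synthpop package in R; the Python below is a stand-in for illustration only, not the authors' code. The function names are hypothetical, the propensity scores are assumed to come from a model already fitted to the stacked original and synthetic rows, and the standardization assumes a logistic propensity model with k coefficients (including the intercept), for which the expected pMSE under correct synthesis is commonly given as (k - 1)(1 - c)^2 c / N.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances or equal sample sizes.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pmse(propensities, n_synthetic):
    """Propensity mean squared error: mean of (p_i - c)^2.

    propensities: estimated probability that each stacked row is synthetic
                  (assumed to come from an already fitted model).
    c: share of synthetic rows in the stacked data.
    """
    n = len(propensities)
    c = n_synthetic / n
    return sum((p - c) ** 2 for p in propensities) / n


def standardized_pmse(pmse_value, n_total, n_synthetic, k):
    """Ratio of pMSE to its expected value under correct synthesis.

    Assumption: logistic propensity model with k coefficients
    (intercept included), so E[pMSE] = (k - 1) * (1 - c)^2 * c / N.
    """
    c = n_synthetic / n_total
    expected = (k - 1) * (1 - c) ** 2 * c / n_total
    return pmse_value / expected
```

A t statistic near zero for a variable, and a pMSE near zero across the stacked rows, both point to the synthetic data being hard to tell apart from the original on that measure. Converting the t statistic to a P value additionally requires the t distribution with the returned degrees of freedom, which this sketch leaves out.
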

Original language: English
Pages (from-to): 1213-1218
Number of pages: 6
Journal: International Journal of Sports Physiology and Performance
Volume: 18
Issue number: 10
Early online date: 18 Jul 2023
Publication status: Published - 18 Jul 2023

Keywords

  • data analysis
  • hypothesis generation
  • simulation
  • sport performance
  • technology
