Efficient treatment of outliers and class imbalance for diabetes prediction

Research output: Contribution to journal › Article (journal) › peer-review

108 Citations (Scopus)
1165 Downloads (Pure)

Abstract

Learning from data that contain outliers and class imbalance remains one of the major difficulties for machine learning classifiers. Among the numerous techniques dedicated to tackling this problem, data preprocessing solutions are known to be efficient and easy to implement. In this paper, we propose a selective data preprocessing approach that embeds knowledge of the outlier instances into an artificially generated subset to achieve an even class distribution. The Synthetic Minority Oversampling TEchnique (SMOTE) was used to balance the training data by introducing artificial minority instances, but only after the outliers had been identified and oversampled (irrespective of class). The aim is to balance the training dataset while controlling the effect of outliers. The experiments show that such selective oversampling empowers SMOTE, ultimately leading to improved classification performance.
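
The two-stage order described in the abstract (oversample the outliers first, then balance with SMOTE) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how outliers are detected or oversampled, so the sketch assumes an interquartile-range rule for detection and plain duplication for oversampling, and it uses a simplified standard-library version of SMOTE's interpolation step. The function names (`iqr_outlier_indices`, `smote`, `selective_oversample`), the parameters, and the toy data are all hypothetical.

```python
import math
import random
import statistics


def iqr_outlier_indices(rows, k=1.5):
    """Indices of rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the IQR rule stands in for whatever detector the paper uses.
    """
    flagged = set()
    for j in range(len(rows[0])):
        q1, _, q3 = statistics.quantiles([r[j] for r in rows], n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        flagged.update(i for i, r in enumerate(rows) if not low <= r[j] <= high)
    return sorted(flagged)


def smote(samples, n_new, k=5, rng=random):
    """Simplified SMOTE: interpolate between a sample and one of its
    k nearest neighbours from the same set."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def selective_oversample(X, y, minority_label, outlier_copies=1, seed=0):
    """Oversample outliers of any class, then balance two classes with SMOTE."""
    rng = random.Random(seed)
    # Stage 1 (assumed form): duplicate every outlier, irrespective of class.
    extra = [i for i in iqr_outlier_indices(X) for _ in range(outlier_copies)]
    X_aug = list(X) + [X[i] for i in extra]
    y_aug = list(y) + [y[i] for i in extra]
    # Stage 2: add synthetic minority points until both classes are equal.
    minority = [x for x, label in zip(X_aug, y_aug) if label == minority_label]
    deficit = max(len(y_aug) - 2 * len(minority), 0)
    new_points = smote(minority, deficit, rng=rng)
    return X_aug + new_points, y_aug + [minority_label] * len(new_points)


# Toy data: 20 majority points, 6 minority points, 1 extreme minority point.
majority = [[(i % 5) * 0.5, (i // 5) * 0.5] for i in range(20)]
minority = [[3 + 0.2 * i, 3 + 0.1 * i] for i in range(6)]
X = majority + minority + [[15.0, 15.0]]
y = [0] * 20 + [1] * 7

X_bal, y_bal = selective_oversample(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))
```

Because the outliers are duplicated before the interpolation step, they are more likely to be drawn as a base point or neighbour, which is one simple way their information could carry into the synthetic subset.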
Original language: English
Article number: 101815
Journal: Artificial Intelligence in Medicine (AIIM)
Volume: 104
Issue number: 101815
Early online date: 10 Feb 2020
DOIs
Publication status: Published - 30 Apr 2020

Keywords

  • Outlier detection
  • Imbalanced data
  • Machine learning
  • Data preprocessing
  • Oversampling
  • SMOTE
