Abstract
Many recent works have leveraged generative adversarial networks (GANs) to generate unseen evasion samples. The generated data are added to the original training set for adversarial training, with the aim of improving the detection performance of machine learning (ML) classifiers. The quality of the generated adversarial samples depends on the availability of sufficient training samples. However, in low data regimes such as medical diagnostic imaging and cybersecurity, anomaly samples are scarce. This paper proposes a novel GAN design, the Evasion Generative Adversarial Network (EVAGAN), which is better suited to low data regime problems that rely on oversampling to improve the detection performance of ML classifiers. EVAGAN not only generates evasion samples; its discriminator also acts as an evasion-aware classifier. We use the Auxiliary Classifier GAN (ACGAN) as a benchmark to evaluate EVAGAN on cybersecurity botnet datasets (ISCX-2014, CIC-2017 and CIC-2018) and a computer vision dataset (MNIST). We demonstrate that EVAGAN outperforms ACGAN on unbalanced datasets in terms of detection performance, training stability and time complexity. EVAGAN's generator quickly learns to generate the low-sample class while simultaneously hardening its discriminator. ML classifiers that are adversarially trained on GAN-generated data still require security hardening afterwards, whereas EVAGAN makes this additional step unnecessary. The experimental analysis shows that EVAGAN is an efficient, evasion-hardened model for low data regimes on the selected cybersecurity and computer vision datasets. Code will be available at HTTPS://www.github.com/rhr407/EVAGAN.
Impact Statement
Artificial intelligence (AI) applications can help improve the quality of human life. AI is not limited to medical anomaly detection and drug discovery; it can also be applied to computer networks to protect people from malicious activity on the Internet. However, AI-based models can be biased towards the majority class of the data on which they are trained, because anomaly samples are typically scarce compared with normal samples. Addressing this data imbalance remains an open research problem. Our work aims to improve AI-based methods in detection performance, stability and time complexity. Using the proposed technique, we can train our AI model with fewer anomaly samples, improving cost-efficiency compared with state-of-the-art anomaly detection methods.
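The majority-class bias and the oversampling idea described above can be made concrete with a minimal Python sketch. This is not EVAGAN or ACGAN. The `jitter_generator` below is a hypothetical stand-in that adds Gaussian noise to existing anomaly samples, in place of a trained GAN generator. The function names and the synthetic two-feature data are illustrative assumptions. The sketch shows two things. First, an always-"normal" classifier scores high accuracy on imbalanced data while detecting no anomalies. Second, the training set can be augmented with generated minority samples until the classes are balanced.

```python
import random
from typing import Callable, List, Tuple

# (feature vector, label): 0 = normal, 1 = anomaly
Sample = Tuple[List[float], int]


def majority_baseline(labels: List[int]) -> Tuple[float, float]:
    """Accuracy and anomaly recall of a classifier that always predicts the majority class."""
    majority = max(set(labels), key=labels.count)
    accuracy = sum(1 for y in labels if y == majority) / len(labels)
    anomalies = [y for y in labels if y == 1]
    recall = sum(1 for y in anomalies if y == majority) / len(anomalies) if anomalies else 0.0
    return accuracy, recall


def jitter_generator(minority: List[List[float]], sigma: float = 0.05,
                     seed: int = 0) -> Callable[[int], List[List[float]]]:
    """Hypothetical stand-in for a GAN generator: perturbs randomly chosen minority samples."""
    rng = random.Random(seed)

    def generate(n: int) -> List[List[float]]:
        return [[x + rng.gauss(0.0, sigma) for x in rng.choice(minority)] for _ in range(n)]

    return generate


def augment_minority(train: List[Sample], generate: Callable[[int], List[List[float]]],
                     minority_label: int = 1) -> List[Sample]:
    """Append generated minority samples until both classes have the same count."""
    n_min = sum(1 for _, y in train if y == minority_label)
    deficit = max(0, (len(train) - n_min) - n_min)
    return train + [(x, minority_label) for x in generate(deficit)]


if __name__ == "__main__":
    rng = random.Random(42)
    normal = [([rng.random(), rng.random()], 0) for _ in range(990)]
    anomaly = [([1.0 + rng.random(), 1.0 + rng.random()], 1) for _ in range(10)]
    train = normal + anomaly

    acc, rec = majority_baseline([y for _, y in train])
    print(f"majority baseline: accuracy={acc:.2f}, anomaly recall={rec:.2f}")  # 0.99, 0.00

    balanced = augment_minority(train, jitter_generator([x for x, y in train if y == 1]))
    print(sum(1 for _, y in balanced if y == 1), sum(1 for _, y in balanced if y == 0))  # 990 990
```

In the setting the abstract describes, a GAN generator would replace `jitter_generator`. In EVAGAN, the discriminator trained alongside that generator would also serve as the classifier, rather than a separate model being trained on the augmented set.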
Original language | English |
---|---|
Journal | IEEE Transactions on Artificial Intelligence |
Early online date | 4 Aug 2022 |
DOIs | |
Publication status | Published - 4 Aug 2022 |
Keywords
- Generative adversarial networks
- Training
- Artificial intelligence
- Generators
- Probability distribution
- Mathematical models
- Computer security