Identifying non-genuine or malicious messages is challenging because the techniques used by cyber-criminals change continuously. In this article, we propose a hybrid detection method that combines image and text spam recognition techniques. The image component uses sparse representation-based classification, which considers both global and local image features, together with a dictionary learning technique that produces a spam subdictionary and a ham subdictionary. The textual analysis, in turn, uses the semantic properties of documents to assess their level of maliciousness. More specifically, we are able to distinguish between meta-spam and real spam. Experimental results show the accuracy and potential of our approach.
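As a rough illustration of the sparse representation-based classification idea mentioned in the abstract, the following minimal sketch codes a feature vector over a concatenated spam/ham dictionary and assigns the class whose subdictionary gives the smaller reconstruction residual. It is not the authors' implementation: the random dictionaries stand in for learned subdictionaries, the feature vectors are synthetic rather than real image features, and the names `omp`, `src_classify`, `D_spam` and `D_ham` are hypothetical. Orthogonal matching pursuit is used here only as one common sparse solver; the article may use a different one.

```python
import numpy as np


def omp(D, x, n_nonzero, tol=1e-10):
    """Greedy orthogonal matching pursuit (illustrative sparse solver).

    Approximates x using at most n_nonzero columns (atoms) of D.
    """
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < tol:
            break  # x is already reconstructed well enough
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        # Refit all selected atoms by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def src_classify(x, D_spam, D_ham, n_nonzero=3):
    """Label x by the subdictionary with the smaller class-wise residual."""
    D = np.hstack([D_spam, D_ham])
    coef = omp(D, x, n_nonzero)
    n_spam = D_spam.shape[1]
    r_spam = np.linalg.norm(x - D_spam @ coef[:n_spam])
    r_ham = np.linalg.norm(x - D_ham @ coef[n_spam:])
    return "spam" if r_spam < r_ham else "ham"


# Assumption: random unit-norm atoms stand in for learned subdictionaries.
rng = np.random.default_rng(0)
D_spam = rng.normal(size=(20, 8))
D_ham = rng.normal(size=(20, 8))
D_spam /= np.linalg.norm(D_spam, axis=0)
D_ham /= np.linalg.norm(D_ham, axis=0)

# A synthetic sample that coincides with one spam atom.
sample = D_spam[:, 2]
print(src_classify(sample, D_spam, D_ham))
```

In this toy setting the sample is itself a dictionary atom, so its spam residual is essentially zero. With real image features, the sparsity level `n_nonzero` and the quality of the learned subdictionaries would determine how well the two residuals separate.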
|Field|Value|
|---|---|
|Journal|Soft Computing - A Fusion of Foundations, Methodologies and Applications|
|Early online date|21 Dec 2015|
|Publication status|E-pub ahead of print - 21 Dec 2015|
Shao, Y., Trovati, M., Shi, Q., Angelopoulou, O., Asimakopoulou, E., & Bessis, N. (2015). A Hybrid Spam Detection Method Based on Unstructured datasets. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 21(1), 233-243. https://doi.org/10.1007/s00500-015-1959-z