Detection of spam-posting accounts on Twitter

Research output: Contribution to journal › Article

Abstract

Online social media platforms, such as Facebook and Twitter, enable all users, regardless of their characteristics, to freely generate and consume huge amounts of data. While this data is exploited by individuals and organisations to gain competitive advantage, a substantial amount of it is generated by spam or fake users. One in every 200 social media messages and one in every 21 tweets are estimated to be spam. The rapid growth in the volume of global spam is expected to compromise research that uses social media data, calling the credibility of that data into question. Motivated by the need to identify and filter out spam content in social media data, this study presents a novel approach for distinguishing spam from non-spam social media posts and offers more insight into the behaviour of spam users on Twitter. The approach proposes an optimised set of features independent of historical tweets, which are only available for a short time on Twitter. We take into account features related to the users of Twitter, their accounts and their pairwise engagement with each other. We experimentally demonstrate the efficacy and robustness of our approach and compare it to a typical feature set for spam detection in the literature, achieving a significant improvement in performance. In contrast to prior research findings, we observe that an average automated spam account posted at least 12 tweets per day at well-defined periods. Our method is suitable for real-time deployment in a social media data collection pipeline as an initial preprocessing strategy to improve the validity of research data.
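
The idea of features that do not depend on historical tweets can be illustrated with a minimal sketch. It is not the authors' feature set or classifier: the `Account` schema, the three derived features, and the use of the abstract's "12 tweets per day" observation as a threshold are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Account:
    """Profile metadata carried with a single tweet (hypothetical schema)."""
    created_at: datetime
    statuses_count: int   # lifetime number of tweets posted
    followers_count: int
    friends_count: int    # number of accounts this user follows


def account_features(acc: Account, now: datetime) -> dict:
    """Derive illustrative features without fetching the account's past tweets."""
    # Clamp the age to at least one day so brand-new accounts do not divide by ~0.
    age_days = max((now - acc.created_at).total_seconds() / 86400.0, 1.0)
    return {
        "account_age_days": age_days,
        "tweets_per_day": acc.statuses_count / age_days,
        "follower_friend_ratio": acc.followers_count / max(acc.friends_count, 1),
    }


def looks_automated(features: dict, min_daily_tweets: float = 12.0) -> bool:
    """Toy rule (an assumption, not the paper's model): flag high posting rates."""
    return features["tweets_per_day"] >= min_daily_tweets


if __name__ == "__main__":
    now = datetime(2018, 8, 8, tzinfo=timezone.utc)
    acc = Account(datetime(2018, 7, 9, tzinfo=timezone.utc), 600, 10, 500)
    feats = account_features(acc, now)
    print(feats, looks_automated(feats))
```

In a collection pipeline, a function like `account_features` could run on each incoming tweet's user object, with the resulting features passed to a trained classifier rather than to a fixed threshold as in this toy rule.
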
Original language: English
Pages (from-to): 1-38
Journal: Neurocomputing
Early online date: 8 Aug 2018
DOI: 10.1016/j.neucom.2018.07.044
Publication status: E-pub ahead of print - 8 Aug 2018

Keywords

  • Social network
  • Twitter
  • spam
  • social media
  • Twitter microblog
  • spam detection

Cite this

@article{b130a65d2cc84a4ebd4b15a6d7ac40d5,
title = "Detection of spam-posting accounts on Twitter",
keywords = "Social network, Twitter, spam, social media, Twitter microblog, spam detection",
author = "Ise Inuwa-Dutse and Mark Liptrott and Yannis Korkontzelos",
year = "2018",
month = "8",
day = "8",
doi = "10.1016/j.neucom.2018.07.044",
language = "English",
pages = "1--38",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",

}

Detection of spam-posting accounts on Twitter. / Inuwa-Dutse, Ise; Liptrott, Mark; Korkontzelos, Yannis.

In: Neurocomputing, 08.08.2018, p. 1-38.
