Detection of Microcosms on Twitter


Student thesis: Doctoral Thesis


A network is a composition of many sub-networks or communities with distinct and overlapping
properties. Because similarity breeds attraction and interaction, a community consists
of sets of nodes and edges with stronger relationships, expressed as a function of relatedness.
Network communities provide a crucial organising principle, which enables a better
understanding of the structure and function of complex networks. Depending on the network
type, communities come in various forms – from biologically- to technologically-induced communities.
Among technologically-induced communities, social networks and social media platforms
such as Twitter and Facebook help a myriad of diverse users remain connected, leading
to a highly connected and dynamic social media ecosystem. Within this complex ecosystem,
multiple types of communications happen at various layers of granularity and intensity, leading
to the formation of communities. The task of identifying embedded communities within a network
has been of great interest because a community is a functional unit
of a network that captures local relationships among the network objects. The community detection
paradigm involves prediction and quantification processes to identify and explain community
structures in a network. The equivalence of network entities is established in one of two ways:
(1) equivalent units have the same connection pattern to the same neighbours, or (2)
equivalent units have the same or similar connection pattern to different neighbours. Accordingly,
communities are further formed around two primary modalities or sources of information:
network structure and features or attributes of nodes. However, existing studies mostly focus on
one aspect, and the few studies based on a bi-modal source are limited by their use of a shallow set
of features. In the context of Twitter, while many community detection algorithms have been
proposed in the past, detection of socially cohesive communities still poses some challenges
with respect to mining-related tasks. These challenges are due to (1) the flexibility of interaction in
social media, which leads to a vast amount of content, both relevant and irrelevant; (2) a form of logical
social dichotomy that allows content from popular users to dominate; (3) the ability to automate
users’ accounts and remain anonymous; and (4) the eccentricity of connections on Twitter, which contributes
to identifying many socially unrelated users and encourages the propagation of spurious content.
Noting the challenges mentioned above, the thesis presents an effective detection method.
The central themes in the research relate to the problems of identifying genuine content and
detecting socially cohesive groups. The problem of identifying genuine content is tackled using
a novel approach (the SPD strategy) designed to filter out irrelevant content, while the problem
of community detection is formulated to focus on smaller groups that are homogeneous with respect to
many sociodemographic, behavioural, and intrapersonal characteristics. Essentially, the research
proposes a multilevel clustering technique (MCT) that leverages both structural and textual aspects
to identify local communities termed microcosms. Recognising that social media spam and
fake content undermine credible research based on the analysis of
social media data, the thesis contributes a useful content filtering system. As a precautionary
measure against compromising the research outcome with irrelevant or unrepresentative data, the
SPD strategy offers crucial insights into the increasingly sophisticated techniques of spamming
on Twitter. As a result, the detection of socially cohesive communities will be enhanced, thus
providing a useful analysis tool and strengthening the validity of online content. The proposed
MCT provides a useful, scalable framework to identify sub-groups in a network. The experimental
results from the MCT and evaluation on benchmark models and datasets demonstrate
the efficacy of the approach. This research contributes a new dimension to the detection
of cohesive communities on Twitter. The thesis also contributes to the literature by
offering a clearer understanding of how low-level communities of users
evolve and behave on Twitter. Moreover, by identifying communities of users with strong cohesion,
well-informed recommendations that recognise structural and content similarities can
be achieved.
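
The idea of combining the two modalities named in the abstract, network structure and node content, can be illustrated with a minimal sketch. This is not the thesis's MCT or SPD strategy: the function names, the toy data, the use of Jaccard overlap, and the weighting parameter alpha are all assumptions made for illustration. Structural similarity here follows the first notion of equivalence above (same connection pattern to the same neighbours).

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def structural_similarity(graph, u, v):
    """Overlap of the neighbour sets of u and v (hypothetical helper).

    u and v are excluded from both sets so that a direct edge between
    them does not lower the score.
    """
    nu = graph.get(u, set()) - {u, v}
    nv = graph.get(v, set()) - {u, v}
    return jaccard(nu, nv)


def textual_similarity(texts, u, v):
    """Overlap of lower-cased whitespace tokens in each user's text."""
    tu = texts.get(u, "").lower().split()
    tv = texts.get(v, "").lower().split()
    return jaccard(tu, tv)


def combined_similarity(graph, texts, u, v, alpha=0.5):
    """Weighted mix of structure and content; alpha is an assumed knob."""
    s = structural_similarity(graph, u, v)
    t = textual_similarity(texts, u, v)
    return alpha * s + (1 - alpha) * t


# Toy data, invented for illustration only.
graph = {
    "alice": {"carol", "dave"},
    "bob": {"carol", "dave"},
    "carol": {"alice", "bob"},
    "dave": {"alice", "bob"},
    "erin": {"frank"},
    "frank": {"erin"},
}
texts = {
    "alice": "football match tonight",
    "bob": "football match tomorrow",
    "erin": "baking bread",
}

if __name__ == "__main__":
    for pair in [("alice", "bob"), ("alice", "erin")]:
        print(pair, combined_similarity(graph, texts, *pair))
```

In this toy network, alice and bob share all neighbours and half of their vocabulary, so they score far higher than alice and erin, who share neither. A pairwise score of this kind could feed any standard clustering step; which step the thesis uses is not stated in the abstract.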
Date of Award: 7 Jan 2020
Original language: English
Awarding Institution: Edge Hill University
Supervisors: YANNIS KORKONTZELOS (Director of Studies), FRANCO RIZZUTO (Supervisor) & MARK LIPTROTT (Supervisor)
