Detection of Microcosms on Twitter

  • ISA INUWA-DUTSE

    Student thesis: Doctoral Thesis

    Abstract

    A network is a composition of many sub-networks or communities with distinct and overlapping
    properties. Because similarity breeds attraction and interaction, a community consists
    of sets of nodes and edges with stronger relationships, expressed as a function of relatedness.
    Network communities provide a crucial organising principle, which enables a better
    understanding of the structure and function of complex networks. Depending on the network
    type, communities come in various forms – from biologically- to technologically-induced communities.
    Among technologically-induced communities, social networks or social media platforms
    such as Twitter and Facebook enable a myriad of diverse users to remain connected, leading
    to a highly connected and dynamic social media ecosystem. Within this complex ecosystem,
    multiple types of communications happen at various layers of granularity and intensity, leading
    to the formation of communities. The task of identifying embedded communities within a network
    has been of great interest for various reasons because a community is a functional unit
    of a network that captures local relationships among the network objects. The community detection
    paradigm involves prediction and quantification processes to identify and explain community
    structures in a network. The equivalence of network entities is established on one of two bases:
    (1) equivalent units have the same connection pattern to the same neighbours, or (2)
    equivalent units have the same or similar connection pattern to different neighbours. Accordingly,
    communities are further formed around two primary modalities or sources of information:
    network structure and features or attributes of nodes. However, existing studies mostly focus on
    one aspect, and the few studies based on a bi-modal source are limited to a shallow set
    of features. In the context of Twitter, while many community detection algorithms have been
    proposed in the past, detection of socially cohesive communities still poses some challenges
    with respect to mining-related tasks. These challenges are due to (1) the flexibility of interaction
    on social media, leading to a vast amount of content, both relevant and irrelevant; (2) a form of
    logical social dichotomy that allows content from popular users to dominate; (3) the ability to
    automate user accounts and remain anonymous; and (4) the eccentricity of connections on Twitter,
    which contributes to identifying many socially unrelated users and encourages the propagation of
    spurious content.
    Noting the challenges mentioned above, the thesis presents an effective detection method.
    The central themes in the research relate to the problems of identifying genuine content and
    detecting socially cohesive groups. The problem of identifying genuine content is tackled using
    a novel approach (SPD strategy) designed to filter out irrelevant content, while the problem
    of community detection is formulated to focus on smaller groups that are homogeneous with respect to
    many sociodemographic, behavioural, and intrapersonal characteristics. Essentially, the research
    proposes a multilevel clustering technique (MCT) that leverages both structural and textual aspects
    to identify local communities termed microcosms. Recognising that social media spam and
    fake content undermine credible research based on analysing social media data,
    the thesis contributes a useful content filtering system. As a precautionary
    measure against compromising the research outcome with irrelevant or unrepresentative data, the
    SPD strategy offers crucial insights into the increasingly sophisticated techniques of spamming
    on Twitter. As a result, the detection of socially cohesive communities is enhanced, thus
    providing a useful analysis tool and strengthening the validity of online content. The proposed
    MCT provides a useful, scalable framework to identify sub-groups in a network. Experimental
    results from the MCT, evaluated against benchmark models and datasets, demonstrate
    the efficacy of the approach. This research contributes a new dimension to the detection
    of cohesive communities on Twitter. The thesis also contributes to the literature by
    offering a clearer understanding of how low-level communities of users
    evolve and behave on Twitter. Moreover, identifying communities of users with strong cohesion
    enables well-informed recommendations that recognise structural and content similarities.
    Date of Award: 7 Jan 2020
    Original language: English
    Awarding Institution
    • Edge Hill University
    Supervisor: YANNIS KORKONTZELOS (Director of Studies), FRANCO RIZZUTO (Supervisor) & MARK LIPTROTT (Supervisor)
