TY - JOUR
T1 - Constructing a biodiversity terminological
inventory.
AU - Nguyen, Nhung T.H.
AU - Soto, Axel J.
AU - Kontonatsios, Georgios
AU - Batista-Navarro, Riza
AU - Ananiadou, Sophia
PY - 2017/4
Y1 - 2017/4
N2 - The increasing growth of literature in
biodiversity presents challenges to users
who need to discover pertinent
information in an efficient and timely
manner. In response, text mining
techniques offer solutions by facilitating
the automated discovery of knowledge
from large textual data. An important step
in text mining is the recognition of
concepts via their linguistic realisation, i.e.,
terms. However, a given concept may be
referred to in text using various synonyms
or term variants, making search systems
likely to overlook documents mentioning
lesser-known variants, which are nonetheless
relevant to a query term. Domain-specific
terminological resources, which include
term variants, synonyms and related
terms, are thus
important in supporting semantic search
over large textual archives. This article
describes the use of text mining methods
for the automatic construction of a large-scale
biodiversity term inventory. The
inventory consists of names of species,
amongst which naming variations
are prevalent. We apply a number of
distributional semantic techniques to all of
the titles in the Biodiversity Heritage
Library, to compute semantic similarity
between species names and support the
automated construction of the resource.
With the construction of our biodiversity
term inventory, we demonstrate that
distributional semantic models are able to
identify semantically similar names that
are not yet recorded in existing
taxonomies. Such methods can thus be
used to update existing taxonomies semi-automatically
by deriving semantically
related taxonomic names from a text
corpus and allowing expert curators to
validate them. We also evaluate our
inventory as a means to improve search by
facilitating automatic query expansion.
Specifically, we developed a visual search
interface that suggests semantically
related species names, which are available
in our inventory but not
always in other repositories, to incorporate
into the search query. An assessment of
the interface by domain experts reveals
that our query expansion based on related
names is useful for increasing the number
of relevant documents retrieved. Exploiting
the inventory can benefit both users and
developers of search engines and text
mining applications.
AB - The increasing growth of literature in
biodiversity presents challenges to users
who need to discover pertinent
information in an efficient and timely
manner. In response, text mining
techniques offer solutions by facilitating
the automated discovery of knowledge
from large textual data. An important step
in text mining is the recognition of
concepts via their linguistic realisation, i.e.,
terms. However, a given concept may be
referred to in text using various synonyms
or term variants, making search systems
likely to overlook documents mentioning
lesser-known variants, which are nonetheless
relevant to a query term. Domain-specific
terminological resources, which include
term variants, synonyms and related
terms, are thus
important in supporting semantic search
over large textual archives. This article
describes the use of text mining methods
for the automatic construction of a large-scale
biodiversity term inventory. The
inventory consists of names of species,
amongst which naming variations
are prevalent. We apply a number of
distributional semantic techniques to all of
the titles in the Biodiversity Heritage
Library, to compute semantic similarity
between species names and support the
automated construction of the resource.
With the construction of our biodiversity
term inventory, we demonstrate that
distributional semantic models are able to
identify semantically similar names that
are not yet recorded in existing
taxonomies. Such methods can thus be
used to update existing taxonomies semi-automatically
by deriving semantically
related taxonomic names from a text
corpus and allowing expert curators to
validate them. We also evaluate our
inventory as a means to improve search by
facilitating automatic query expansion.
Specifically, we developed a visual search
interface that suggests semantically
related species names, which are available
in our inventory but not
always in other repositories, to incorporate
into the search query. An assessment of
the interface by domain experts reveals
that our query expansion based on related
names is useful for increasing the number
of relevant documents retrieved. Exploiting
the inventory can benefit both users and
developers of search engines and text
mining applications.
KW - Algorithms
KW - Biodiversity
KW - Data Mining/methods
KW - Libraries
KW - Search Engine
KW - Semantics
KW - Terminology as Topic
UR - http://www.scopus.com/inward/record.url?scp=85017630031&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85017630031&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/52804a33-b34a-3fdd-93f9-055cd8477ec4/
U2 - 10.1371/journal.pone.0175277
DO - 10.1371/journal.pone.0175277
M3 - Article (journal)
C2 - 28414821
SN - 1932-6203
VL - 12
JO - PLoS ONE
JF - PLoS ONE
IS - 4
M1 - e0175277
ER -