TY - GEN
T1 - Semantic Coupling Between Classes: Corpora or Identifiers?
AU - Ajienka, Nemitari
AU - Capiluppi, Andrea
N1 - Article No 40
PY - 2016/11/30
Y1 - 2016/11/30
AB - Context: Conceptual coupling is a measure of how loosely or closely related two software artifacts are, based on the semantic information embedded in their comments and identifiers. This type of coupling is typically evaluated by extracting the semantic information from source code into a words corpus. The extraction of words corpora can be lengthy, especially when systems are large and many classes are involved.
Goal: This study investigates whether the class identifiers alone (e.g., the class names) can be used to evaluate the conceptual coupling between classes, as opposed to the words corpora of the entire classes.
Method: In this study, we analyze two Java systems and extract the conceptual coupling between pairs of classes, using (i) a corpus-based approach; and (ii) two identifier-based tools.
Results: Our results show that measuring the semantic similarity between classes using only their identifiers yields results similar to those obtained with the class corpora. Additionally, using the identifiers is more efficient in terms of precision, recall, and computation time.
Conclusions: Using only class identifiers to measure their semantic similarity can save time on program comprehension tasks for large software projects. The findings of this paper support this hypothesis for the systems that were used in the evaluation, and can also be used to guide researchers developing future generations of tools supporting program comprehension.
KW - Corpora
KW - Corpus
KW - Latent Semantic Indexing (LSI)
KW - Object-oriented software (OO)
KW - Open-source software (OSS)
KW - Semantic coupling
KW - Semantic similarity
KW - Vector Space Model (VSM)
UR - http://www.scopus.com/inward/record.url?scp=84991706356&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84991706356&partnerID=8YFLogxK
U2 - 10.1145/2961111.2962622
DO - 10.1145/2961111.2962622
M3 - Conference proceeding (ISBN)
SN - 978-1-4503-4427-2
T3 - International Symposium on Empirical Software Engineering and Measurement
SP - 1
EP - 6
BT - 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016
T2 - ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
Y2 - 8 September 2016 through 9 September 2016
ER -