Context Concerns have been raised from many quarters regarding the reliability of empirical research findings and this includes software engineering. Replication has been proposed as an important means of increasing confidence. Objective We aim to better understand the value of replication studies, the level of confirmation between replication and original studies, what confirmation means in a statistical sense and what factors modify this relationship. Method We perform a systematic review to identify relevant replication experimental studies in the areas of (i) software project effort prediction and (ii) pair programming. Where sufficient details are provided we compute prediction intervals. Results Our review locates 28 unique articles that describe replications of 35 original studies that address 75 research questions. Of these 10 are external, 15 internal and 3 internal-same-article replications. The odds ratio of internal to external (conducted by independent researchers) replications of obtaining a ‘confirmatory’ result is 8.64. We also found incomplete reporting hampered our ability to extract estimates of effect sizes. Where we are able to compute replication prediction intervals these were surprisingly large. Conclusion We show that there is substantial evidence to suggest that current approaches to empirical replications are highly problematic. There is a consensus that replications are important, but there is a need for better reporting of both original and replicated studies. Given the low power and incomplete reporting of many original studies, it can be unclear the extent to which a replication is confirmatory and to what extent it yields additional knowledge to the software engineering community. We recommend attention is switched from replication research to meta-analysis.
- Software engineering