Empirical Evidence in Software Testing

Zoltan Micskei

Hungarian Software Testing Forum (HUSTEF) poster (2017-11-14)

Summary

Is TDD better than test-last development? Which test design method should we use? There are many myths, but fortunately researchers are providing more and more empirical evidence. The poster will give an overview of the available empirical methods and explain what kind of insights to expect from them. Moreover, it will present example studies on testing habits, methods, and techniques that offer practical suggestions.

Poster

Details

After a short motivation, the poster presents two main points:

There are different empirical methods that are suitable for studying various software testing methods and practices, and
There are empirical studies that can offer relevant suggestions for practitioners.

Part 1

The following main methods will be briefly presented:

Controlled experiment: a study in which one factor is manipulated, while the other factors are kept under control, to analyse its effect, e.g. comparing the effectiveness of two different test design methods or tools. Controlled experiments are frequently performed in a laboratory setting.
Case study: a study that investigates a phenomenon within its real-life context, e.g. observing or interviewing testers in a company performing their daily work or analysing artefacts or records from real projects.
Survey research: surveys are used to collect information from a representative sample of a broad population, typically using questionnaires.

When assessing the evidence provided by such studies, a major concern is their validity. Are the conclusions valid and supported by the data? Can we generalize the results and apply them in different settings? Unfortunately, no single study can mitigate all validity threats, so each study has to strike a balance between these two concerns (e.g. an experiment tries to control every factor and follow a well-defined procedure, but this can result in an unrealistic setting). Therefore, it is vital to understand the typical validity threats in order to assess the applicability of the results of an empirical study.

Finally, reliable evidence can be obtained by integrating the results of different studies, taking into account both their findings and their limitations. One way to achieve this is to perform a systematic literature review (SLR). SLRs offer valuable information for testers on what works in different settings.

Part 2

A few studies and their main results are presented to illustrate what kind of information can be obtained from empirical studies and how they can influence practice.

A systematic literature review of literature reviews in software testing: This paper collects 101 systematic reviews in the domain of software testing. It is a good starting point when searching for evidence on a given topic (e.g. web testing, test automation, or the test process).
Reviewing 25 Years of Testing Technique Experiments: A literature review of experiments comparing different test design techniques (e.g. specification-based or structure-based). The authors summarize the main findings on which the different studies agree, e.g. that different techniques were effective for different kinds of faults.
When, How, and Why Developers (Do Not) Test in Their IDEs: Developers commonly report that testing takes a significant amount of their time. The authors of this study created an Eclipse plug-in that records developers' interactions in the IDE, and used it to monitor 416 developers over five months, amounting to 13 observed developer years. Although the participating developers estimated in a questionnaire that they spent half of their time on testing, in reality it was only 25% of their time, and the majority of the observed projects did not practice testing actively.
Test-Driven Development - An Empirical Evaluation of Agile Practice: This book presents three detailed experiments carried out with over 200 MSc students to compare test-first and test-last programming. The results showed that test-first produced less coupled code, but they did not support the claim that test-first has a significant impact on code quality (measured as the number of acceptance tests passed).
Maintenance of Automated Test Suites in Industry: An Empirical study on Visual GUI Testing: A case study performed in collaboration with Siemens and Saab that identified 13 factors affecting the maintenance of automated GUI tests.
Classifying the Correctness of Generated White-Box Tests: One of our own studies on test generator tools. With graduate students as participants, we measured how well humans can judge whether the assertions in automatically generated tests are correct (spoiler: not very well).

More information

Where to find such studies:

Google Scholar: searches most of the available literature and indicates whether a free PDF version is available (e.g. from an author's website).
Empirical Software Engineering: an academic journal focusing solely on empirical works. Articles are usually very detailed, but even just browsing the paper titles is worthwhile to see which topics have been investigated.
Information and Software Technology: an academic journal with a broad scope, frequently featuring systematic reviews.

Books, if you are interested in the topic:

Evidence-Based Software Engineering and Systematic Reviews: a good introduction to the evidence-based paradigm and the available methods. Offers hints on how to evaluate the quality and findings of studies and how the obtained knowledge can be transferred to practice.
Experimentation in Software Engineering: an introduction to empirical studies focusing on controlled experiments.
Case Study Research in Software Engineering - Guidelines and Examples: an overview of how to perform case studies in industrial/realistic settings. The book presents several detailed examples (on different topics and with different companies, e.g. a project management study with IBM and a study on quality monitoring).

Last modification: 2017-11-13