Validation of the Cognitive Complexity Metric for Code Comprehension

July 28, 2020 / Stefan Wagner

We validated the Cognitive Complexity measure, as implemented in SonarQube, in a study accepted at the upcoming ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). Our practical recommendation is to keep Cognitive Complexity values low: high metric values correlate with longer comprehension times and with poorer subjective ratings by developers.

Judging the complexity of source code remains an ongoing challenge, especially if we aim to automate the assessment. Many metrics have been proposed, but empirical validations have repeatedly failed to show that they capture complexity in the sense of code being more difficult to comprehend.

Campbell (2018) developed Cognitive Complexity as a measure intended to overcome this challenge. SonarSource S.A. implemented it in SonarQube and SonarCloud. The metric is computed purely syntactically from control-flow structures. It is similar to Cyclomatic Complexity, but explicitly aims to measure understandability rather than testability. So far, however, there has been no large-scale validation of whether a measure based only on static source code attributes can keep this promise.
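To illustrate how the metric behaves, here is a small Python function annotated with increments as we read the rules in Campbell (2018): each control-flow structure adds 1, plus 1 for every level of nesting it sits in, while simple jumps such as an unlabeled break add nothing. The annotations are our own hand-applied reading of the rules, not the output of SonarQube, so the exact values may differ between analyzer versions.

```python
def sum_of_primes(max_n):
    """Sum all primes up to max_n using naive trial division."""
    total = 0
    for i in range(2, max_n + 1):      # +1 (for)
        divisible = False
        for j in range(2, i):          # +2 (for, +1 nesting penalty)
            if i % j == 0:             # +3 (if, +2 nesting penalty)
                divisible = True
                break                  # +0 (an unlabeled break adds nothing)
        if not divisible:              # +2 (if, +1 nesting penalty)
            total += i
    return total                       # Cognitive Complexity = 8
```

For comparison, Cyclomatic Complexity counts the same four branching points uniformly (yielding 5), whereas Cognitive Complexity additionally penalizes the nesting that makes the inner condition harder to follow.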

Marvin Muñoz Barón, Marvin Wyrich, and Stefan Wagner from the Empirical Software Engineering group designed and performed a study to fill this gap. The resulting article has been accepted for publication at the upcoming ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. A preprint is available on arXiv.

We systematically searched for data sets from studies that dealt with program comprehension at the source code level. In the end, we obtained 427 code snippets together with about 24,000 individual human comprehensibility evaluations from a total of 10 studies. We calculated correlations between these evaluations and the Cognitive Complexity values of the respective code snippets and statistically summarized the coefficients in a meta-analysis. The pooled results support the recommendation above: high Cognitive Complexity values go along with longer comprehension times and with poorer subjective ratings by developers.
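To make the summarization step concrete, the sketch below shows one common way to pool per-study correlation coefficients: Fisher's z-transformation with fixed-effect inverse-variance weighting. The data are invented and the paper's actual statistical procedure (described in the preprint) may differ, for example by using a random-effects model.

```python
import math

def pool_correlations(studies):
    """Pool per-study Pearson correlations, given as (r, n) pairs,
    with a fixed-effect inverse-variance meta-analysis on Fisher's z scale."""
    zs = [math.atanh(r) for r, _ in studies]   # Fisher z-transform of each r
    ws = [n - 3 for _, n in studies]           # inverse variance: Var(z) = 1/(n-3)
    z = sum(w * zi for w, zi in zip(ws, zs)) / sum(ws)
    se = 1.0 / math.sqrt(sum(ws))
    # back-transform the pooled estimate and its 95% CI to the r scale
    return math.tanh(z), (math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se))

# Hypothetical example: correlations of Cognitive Complexity with
# comprehension time from three invented studies and their sample sizes.
pooled, (lo, hi) = pool_correlations([(0.35, 40), (0.52, 25), (0.28, 60)])
print(f"pooled r = {pooled:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```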

In addition, the paper offers interesting insights into comprehension research: the study shows that code comprehension is currently measured in many different ways (correctness, time, ratings, physiological measures, ...). There are good arguments for all of them. The problem is that we cannot really compare the findings until we know how these measures relate to each other. In fact, the measurements of different proxies within one and the same study sometimes even contradict each other. This adds to the uncertainty that already exists when designing code comprehension experiments (see also Siegmund, 2016).


References

G. Ann Campbell. 2018. Cognitive Complexity: An Overview and Evaluation. In Proceedings of the 2018 International Conference on Technical Debt (TechDebt '18). ACM.

J. Siegmund. 2016. Program Comprehension: Past, Present, and Future. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE.

