New Publication on Detecting Requirements Smells with Deep Learning

May 17, 2022

The Empirical Software Engineering group explored in a recent paper how well smells in requirements specifications can be detected using deep learning. The paper was published at the AIRE 2021 workshop.

Mohammad Kasra Habib, Daniel Graziotin and Stefan Wagner from the Empirical Software Engineering group published the paper "Detecting Requirements Smells With Deep Learning: Experiences, Challenges and Future Work" at the Eighth International Workshop on Artificial Intelligence and Requirements Engineering (AIRE'21).

Requirements Engineering (RE) is one of the initial phases of building a software system. The success or failure of a software project is firmly tied to this phase, which relies on communication among stakeholders in natural language. The problem with natural language is that it can easily lead to different understandings if the stakeholders involved do not express themselves precisely. The result is a product that differs from the expected one. Previous work proposed to enhance the quality of software requirements by detecting language errors based on the ISO 29148 requirements language criteria. The existing solutions apply classical Natural Language Processing (NLP) to detect them. Classical NLP has some limitations, such as domain dependence, which results in poor generalization capability. Therefore, this work aims to improve on the previous work by creating a manually labeled dataset and using ensemble learning, Deep Learning (DL), and techniques such as word embeddings and transfer learning to overcome the generalization problem tied to classical NLP and to improve precision and recall. The current findings show that the dataset is unbalanced and indicate which classes need more examples. It is tempting to train algorithms even if the dataset is not sufficiently representative. Accordingly, the results show that the models are overfitting; in Machine Learning this issue is addressed by adding more instances to the dataset, improving label quality, removing noise, and reducing the complexity of the learning algorithms, which is planned for this research.
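
To illustrate what checking a labeled dataset for imbalance can look like, here is a minimal sketch in Python. It is not the method from the paper: the function name, the `tolerance` threshold, and the label names are all hypothetical and chosen only for this example. The sketch counts examples per class and reports how many more examples each clearly underrepresented class would need to match the largest class.

```python
from collections import Counter


def underrepresented_classes(labels, tolerance=0.5):
    """Map each underrepresented label to the number of extra examples
    it would need to match the largest class.

    A class counts as underrepresented when it has fewer than
    `tolerance` times the examples of the largest class. The threshold
    is an illustrative assumption, not a value from the paper.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    largest = max(counts.values())
    return {
        label: largest - n
        for label, n in counts.items()
        if n < tolerance * largest
    }


# Hypothetical smell labels, invented for this example.
labels = (
    ["no_smell"] * 8
    + ["vague_pronoun"] * 3
    + ["subjective_language"] * 5
)
print(underrepresented_classes(labels))
```

With these made-up labels, only the class with three examples falls below half the size of the largest class, so it is the only one reported.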

Open Preprint on arXiv
