On the Reduction of Verbose Queries in Text Retrieval Based Software Maintenance
Oscar Chaparro and Andrian Marcus
Proceedings of the 38th IEEE/ACM International Conference on Software Engineering (ICSE'16)
Abstract: We argue that verbose queries used for software retrieval contain many terms that follow specific discourse rules yet hinder retrieval. We report the results of an empirical study on the effect of removing such terms from verbose queries in the context of Text Retrieval-based concept location. In the study, we remove terms from 424 queries generated from bug reports of nine open source systems. Removing the terms leads to a substantial improvement in retrieval: 73% of the queries are improved, leading to gains of 21.8% in MRR and 13.4% in MAP. This improvement is larger than that of many more sophisticated state-of-the-art approaches. The results show promise; the remaining challenge is to automatically identify the terms to be removed from verbose queries.
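
For readers unfamiliar with the two measures reported above, the following is a minimal sketch of how Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) are commonly computed. It is not the authors' evaluation code. It assumes each query is represented by the 1-based ranks at which its relevant documents appear in the result list and that every relevant document is retrieved; the function names and the two-query sample data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def reciprocal_rank(ranks: Sequence[int]) -> float:
    """Reciprocal of the best (smallest) rank of a relevant document."""
    return 1.0 / min(ranks) if ranks else 0.0


def average_precision(ranks: Sequence[int]) -> float:
    """Mean of precision values taken at the rank of each relevant document.

    Assumes `ranks` lists every relevant document for the query.
    """
    ordered = sorted(ranks)
    if not ordered:
        return 0.0
    # At the i-th relevant document (1-based), precision is i / rank.
    return sum((i + 1) / r for i, r in enumerate(ordered)) / len(ordered)


def mean_reciprocal_rank(queries: Sequence[Sequence[int]]) -> float:
    """MRR: reciprocal rank averaged over all queries."""
    return sum(reciprocal_rank(r) for r in queries) / len(queries)


def mean_average_precision(queries: Sequence[Sequence[int]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in queries) / len(queries)


# Hypothetical data: ranks of the relevant documents for two queries,
# before and after removing terms from the verbose query.
verbose = [[10, 40], [5]]
reduced = [[2, 8], [4]]

if __name__ == "__main__":
    print("MRR verbose:", mean_reciprocal_rank(verbose))
    print("MRR reduced:", mean_reciprocal_rank(reduced))
    print("MAP verbose:", mean_average_precision(verbose))
    print("MAP reduced:", mean_average_precision(reduced))
```

Both measures lie between 0 and 1 and increase as relevant documents move toward the top of the result list, which is the sense in which a reduced query can improve on its verbose original.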
