A4 Refereed article in a conference publication
Toward validation of textual information retrieval techniques for software weaknesses
Authors: Jukka Ruohonen, Ville Leppänen
Editors: Mourad Elloumi, Michael Granitzer, Abdelkader Hameurlain, Christin Seifert, Benno Stein, A Min Tjoa, Roland Wagner
Conference name: International Conference on Database and Expert Systems Applications
Publisher: Springer Verlag
Publication year: 2018
Journal: Communications in Computer and Information Science
Book title: Database and Expert Systems Applications: DEXA 2018 International Workshops, BDMICS, BIOKDD, and TIR, Regensburg, Germany, September 3–6, 2018, Proceedings
Journal name in source: Communications in Computer and Information Science
Series title: Communications in Computer and Information Science
Volume: 903
First page: 265
Last page: 277
Number of pages: 13
ISBN: 978-3-319-99132-0
eISBN: 978-3-319-99133-7
ISSN: 1865-0929
DOI: https://doi.org/10.1007/978-3-319-99133-7_22
This paper presents a preliminary validation of common textual
information retrieval techniques for mapping unstructured software
vulnerability information to distinct software weaknesses. The
validation is carried out with a dataset compiled from four software
repositories tracked in the Snyk vulnerability database. According to
the results, the information retrieval techniques used perform
unsatisfactorily compared to regular expression searches. Although the
results vary from one repository to another, the preliminary validation
presented indicates that explicit referencing of vulnerability and
weakness identifiers is preferable for concrete vulnerability tracking.
Such referencing allows the use of keyword-based searches, which
currently seem to yield more consistent results than information
retrieval techniques. However, further validation work is required to
improve the precision of the techniques.
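
The keyword-based searching that explicit referencing makes possible can be illustrated with a short regular expression sketch. This is not the authors' implementation; the patterns, the function name `extract_identifiers`, and the sample text are assumptions made for illustration only. The patterns assume the common `CVE-YYYY-NNNN` and `CWE-N` identifier shapes.

```python
import re

# Assumed identifier shapes (not taken from the paper):
#   CVE-<4-digit year>-<4 or more digits>, and CWE-<one or more digits>.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
CWE_RE = re.compile(r"\bCWE-\d+\b", re.IGNORECASE)


def extract_identifiers(text):
    """Return the distinct CVE and CWE identifiers explicitly mentioned in text."""
    # Normalize case and deduplicate so repeated mentions count once.
    cves = sorted({match.upper() for match in CVE_RE.findall(text)})
    cwes = sorted({match.upper() for match in CWE_RE.findall(text)})
    return {"cve": cves, "cwe": cwes}


# Made-up report text; the identifiers are illustrative strings only.
report = "Fixes XSS in the template helper (CWE-79), see CVE-2017-1000001 and cwe-79."
print(extract_identifiers(report))
```

A search of this kind only finds identifiers that are written out in the text, which is why it depends on the explicit referencing the abstract recommends.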