A4 Article in conference proceedings
Toward validation of textual information retrieval techniques for software weaknesses

List of Authors: Jukka Ruohonen, Ville Leppänen
Publisher: Springer Verlag
Publication year: 2018
Journal: Communications in Computer and Information Science
Book title: Database and Expert Systems Applications: DEXA 2018 International Workshops, BDMICS, BIOKDD, and TIR, Regensburg, Germany, September 3–6, 2018, Proceedings
Volume number: 903
ISBN: 978-3-319-99132-0
eISBN: 978-3-319-99133-7
ISSN: 1865-0929


This paper presents a preliminary validation of common textual
information retrieval techniques for mapping unstructured software
vulnerability information to distinct software weaknesses. The
validation is carried out with a dataset compiled from four software
repositories tracked in the Snyk vulnerability database. According to
the results, the information retrieval techniques used perform
unsatisfactorily compared to regular expression searches. Although the
results vary from one repository to another, the preliminary validation
presented indicates that explicit referencing of vulnerability and
weakness identifiers is preferable for concrete vulnerability tracking.
Such referencing allows the use of keyword-based searches, which
currently seem to yield more consistent results than information
retrieval techniques. However, further validation work is required to
improve the precision of the techniques.
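
As an illustration of the keyword-based searches mentioned above, the following is a minimal sketch of a regular expression search for explicit vulnerability (CVE) and weakness (CWE) identifiers in free text. It is not the authors' implementation; the function name `extract_identifiers`, the exact patterns, and the sample string are assumptions made for illustration only.

```python
import re

# Assumed identifier formats: "CVE-YYYY-NNNN" (four or more trailing digits)
# and "CWE-N". Matching is case-insensitive because free-text reports vary.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
CWE_RE = re.compile(r"\bCWE-\d+\b", re.IGNORECASE)


def extract_identifiers(text):
    """Return the unique CVE and CWE identifiers in text, upper-cased and sorted."""
    cves = sorted({match.upper() for match in CVE_RE.findall(text)})
    cwes = sorted({match.upper() for match in CWE_RE.findall(text)})
    return {"cve": cves, "cwe": cwes}


# Hypothetical report text; the identifiers are made up for the example.
sample = "Fixes cve-2099-12345 (CWE-79): unescaped output in the template helper."
print(extract_identifiers(sample))
```

A search of this kind only works when a report references the identifiers explicitly, which is the practice the abstract recommends; text without such references yields empty lists.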

