A4 Refereed article in a conference publication
Classifying Web Exploits with Topic Modeling
Authors: Jukka Ruohonen
Editors: A Min Tjoa, Roland R. Wagner
Conference name: International Workshop on Database and Expert Systems Applications
Publication year: 2017
Book title: Proceedings of the 28th International Workshop on Database and Expert Systems Applications (DEXA), 2017
First page: 93
Last page: 97
Number of pages: 5
ISBN: 978-1-5386-2207-0
eISBN: 978-1-5386-1051-0
ISSN: 1529-4188
DOI: https://doi.org/10.1109/DEXA.2017.35
Web address: http://ieeexplore.ieee.org/document/8049693/
Self-archived copy’s web address: https://arxiv.org/abs/1710.05561
This short empirical paper investigates how well topic modeling and database metadata characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, the empirical experiment attains an accuracy rate of nearly 0.9. Text mining and topic modeling contribute substantially to this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it offers a few scholarly observations about the potential for semi-automatic classification of exploits in existing tracking infrastructures.
Downloadable publication: This is an electronic reprint of the original article.