LLM-Assisted Codebook Development for Cybersecurity Interviews with Enhanced Accuracy and Reduced Hallucination
Authors: Adeseye, Aisvarya; Isoaho, Jouni; Virtanen, Seppo; Mohammad, Tahir
: N/A
Conference: International Conference on AI in Cybersecurity
Year: 2026
Published in: 2026 IEEE 5th International Conference on AI in Cybersecurity (ICAIC)
Pages: 1–6
ISBN: 978-1-6654-7762-8
ISBN: 978-1-6654-7761-1
DOI: https://doi.org/10.1109/ICAIC67076.2026.11395872
URL: https://ieeexplore.ieee.org/document/11395872
Abstract: Beyond what numerical data captures, qualitative cybersecurity interviews reveal human behaviors, lived experiences, trust perceptions, and decision-making patterns. However, current manual and software-assisted coding is slow, difficult to scale, and subjective when distinguishing expert from non-expert perspectives. Recent developments in Large Language Models (LLMs) make them useful for qualitative analysis, but larger models remain costly despite lower hallucination rates, while smaller alternatives are cheaper but less reliable. A codebook plays an essential role in structuring themes and interpreting qualitative data transparently and consistently. Therefore, this study proposes an LLM-assisted architecture to generate traceable, hierarchically structured codebooks from cybersecurity interviews. Five techniques were grouped into three areas: accuracy improvement, hallucination reduction, and reduced context-memory usage. These techniques were applied to measure performance, reliability, and coding quality across seven LLMs of various parameter sizes. The architecture produced an accurate codebook that improved coding reliability by up to 75% for non-experts and 35% for experts when compared to baseline manual extraction. Reducing contextual memory use increased processing efficiency by over 40%, enabling even 1B–3B models to run effectively. Hallucination dropped by 82%, demonstrating that trustworthy qualitative codes can be generated by small and mid-sized LLMs.