Why Compromise Privacy? Local LLMs Rival Commercial LLMs in Qualitative Analysis - UTU Research Information System

A4 Peer-reviewed article in a conference publication


Authors: Adeseye, Aisvarya; Isoaho, Jouni; Virtanen, Seppo; Mohammad, Tahir

Editor: N/A

Conference (established name): Computing, Communications and IoT Applications

Publication year: 2025

Title of host publication: 2025 Computing, Communications and IoT Applications (ComComAp)

First page: 127

Last page: 132

ISBN: 979-8-3315-9144-1

eISBN: 979-8-3315-9143-4

DOI: https://doi.org/10.1109/ComComAp68359.2025.11353130

Open access status at time of recording: Not openly available

Openness of publication channel: Not an open-access channel

URL: https://ieeexplore.ieee.org/document/11353130

Abstract

Large Language Models (LLMs) are increasingly applied in qualitative analysis for tasks such as theme extraction, frequency analysis, and impact evaluation. However, their adoption raises privacy and GDPR compliance concerns when transcripts are processed with commercial LLMs such as ChatGPT or Gemini. Existing studies highlight these risks but provide little systematic evidence comparing local and commercial LLMs. This study evaluates local LLMs, namely LLaMA-3.1 (8B), LLaMA-3.2 (1B–3B), LLaMA-3.3 (70B), Gemma-2 (2B–27B), and Phi-3.5 (3.5B–6.6B), against commercial LLMs (ChatGPT-4o and Gemini-2.5 Flash) on qualitative analysis tasks using 82 anonymized transcripts. A structured prompt design was applied, and the results were benchmarked against ground-truth coding on cost, throughput, hallucination rate, and accuracy rate. The findings indicate that small local LLMs (about 3B) performed close to Gemini, medium models (6–9B) performed close to ChatGPT, and large models (27B–70B) consistently outperformed both commercial LLMs. A hallucination reduction of up to 85% was observed with local LLMs, at negligible recurring cost. Furthermore, local LLMs support GDPR compliance and privacy preservation. They also minimize cost while delivering accuracy comparable to, or better than, that of commonly available commercial LLMs.
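The abstract names accuracy rate and hallucination rate as benchmark metrics but does not define them in this record. As a minimal sketch, assuming each model's output for a transcript is reduced to a set of theme labels and compared against the ground-truth codes, the two rates and a relative reduction between two models might be computed as below. The metric definitions, the function names `score_themes` and `relative_reduction`, and the example theme labels are hypothetical illustrations, not the authors' evaluation method.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy_rate: float
    hallucination_rate: float


def score_themes(predicted: set[str], ground_truth: set[str]) -> Scores:
    """Score one transcript's extracted themes against ground-truth codes.

    Hypothetical definitions (assumptions, not taken from the paper):
    - accuracy rate: share of ground-truth themes the model recovered
    - hallucination rate: share of model-reported themes absent from ground truth
    """
    matched = predicted & ground_truth
    unsupported = predicted - ground_truth
    accuracy = len(matched) / len(ground_truth) if ground_truth else 0.0
    hallucination = len(unsupported) / len(predicted) if predicted else 0.0
    return Scores(accuracy, hallucination)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Relative drop of a rate versus a baseline (0.8 means an 80% reduction)."""
    if baseline == 0:
        raise ValueError("baseline rate must be non-zero")
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # Hypothetical theme labels, for illustration only.
    truth = {"privacy", "cost", "trust", "usability"}
    model_output = {"privacy", "cost", "trust", "latency"}
    print(score_themes(model_output, truth))
    print(relative_reduction(0.25, 0.05))
```

In this toy case the model recovers three of four ground-truth themes (accuracy rate 0.75) and one of its four reported themes is unsupported (hallucination rate 0.25); a hallucination rate falling from a hypothetical 0.25 to 0.05 would be a relative reduction of about 0.8.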