Addressing imbalanced data for machine learning based mineral prospectivity mapping - UTU Research Portal

A2 Refereed review article in a scientific journal

Addressing imbalanced data for machine learning based mineral prospectivity mapping

Authors: Farahnakian, Fahimeh; Sheikh, Javad; Zelioli, Luca; Nidhi, Dipak; Seppä, Iiro; Ilo, Rami; Nevalainen, Paavo; Heikkonen, Jukka

Publisher: Elsevier BV

Publication year: 2024

Journal: Ore Geology Reviews

Journal name in source: Ore Geology Reviews

Article number: 106270

Volume: 174

ISSN: 0169-1368

eISSN: 1872-7360

DOI: https://doi.org/10.1016/j.oregeorev.2024.106270

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.1016/j.oregeorev.2024.106270

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/458935530

Self-archived copy's licence: CC BY

Self-archived copy's version: Publisher's PDF

Abstract

Effective Mineral Prospectivity Mapping (MPM) relies on the ability of Machine Learning (ML) models to extract meaningful patterns from geophysical data. However, in mineral exploration, mineral deposits are rare relative to the overall geological landscape. This rarity leads to a highly imbalanced dataset, in which positive instances (mineralized samples) are considerably less frequent than negative instances (non-mineralized samples). Imbalanced data can bias ML models towards the majority class, leading to inaccurate predictions for the minority class (mineralized samples), which is of primary interest. To address this challenge, we propose methods at two levels in this study. At the data level, we employ imbalanced data handling techniques that operate on the training dataset and change the class distribution. At the algorithmic level, we adjust the decision threshold of a model to balance the trade-off between false positives and false negatives. Experimental results are collected on geophysical data from Lapland, Finland. The dataset exhibits a significant class imbalance, comprising 17 positive samples contrasted with 1.84×10⁶ negative samples. We investigate the effect of imbalanced data handling on the performance of four ML models: Multi-Layer Perceptron (MLP), Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). We found that the MLP model achieved the best overall performance, with a total accuracy of 97.13% on data balanced using the synthetic minority oversampling method. RF and DT also performed well, with accuracies of 88.34% and 89.35%, respectively. The methodology implemented in this work is integrated into QGIS as a new toolkit for MPM called EIS Toolkit.
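The two levels described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation and does not use the EIS Toolkit API. The function names `smote_like_oversample` and `best_threshold`, the toy data, and the choice of F1 as the criterion for picking a threshold are all assumptions made for illustration. The abstract only states that synthetic minority oversampling was used and that the decision threshold was adjusted to balance false positives against false negatives.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Data level: create n_new synthetic minority points by interpolating
    between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def best_threshold(scores, labels, thresholds):
    """Algorithmic level: pick the candidate threshold with the highest F1
    (an assumed criterion) on held-out scores and labels."""
    def f1_at(t):
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == 0)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return max(thresholds, key=f1_at)


# Hypothetical toy data: four minority samples with two features each.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority, n_new=10)

# Hypothetical held-out model scores and true labels (1 = mineralized).
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60]
labels = [0, 0, 0, 1, 0, 1]
threshold = best_threshold(scores, labels, [0.1, 0.3, 0.5])
```

On this toy data the default threshold of 0.5 misses one of the two positive samples, whereas 0.3 recovers both at the cost of one false positive. In practice the oversampling would be applied to the training split only, and the threshold would be chosen on separate validation data.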

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.


Funding information in the publication:
The compilation of the presented work is supported by funds from the Horizon Europe research and innovation program under Grant Agreement number 101057357, EIS – Exploration Information System (https://eis-he.eu).