A4 Refereed article in a conference publication
Predicting profitability of peer-to-peer loans with recovery models for censored data
Authors: Markus Viljanen, Ajay Byanjankar, Tapio Pahikkala
Editors: Ireneusz Czarnowski, Robert J. Howlett, Lakhmi C. Jain
Conference name: International Conference on Intelligent Decision Technologies
Publisher: Springer
Publication year: 2020
Journal: International Conference on Intelligent Decision Technologies
Book title: Intelligent Decision Technologies: Proceedings of the 12th KES International Conference on Intelligent Decision Technologies (KES-IDT 2020)
Journal name in source: Smart Innovation, Systems and Technologies
Series title: Smart Innovation, Systems and Technologies
Volume: 193
First page: 15
Last page: 25
ISBN: 978-981-15-5924-2
ISSN: 2190-3018
DOI: https://doi.org/10.1007/978-981-15-5925-9_2
Peer-to-peer lending is a new lending approach gaining in popularity.
These loans can offer high interest rates, but they are also exposed to
credit risk. In fact, high default rates and low recovery rates are the
norm. Potential investors want to know the expected profit from these
loans, which means they need to model both defaults and recoveries.
However, real-world data sets are censored in the sense that they have
many ongoing loans, where future payments are unknown. This makes
predicting the exact profit in recent loans particularly difficult. In
this paper, we present a model that works for censored loans based on
monthly default and recovery rates. We use the Bondora data set, which
has a large number of censored and defaulted loans. We show that loan
characteristics predicting lower defaults and higher recoveries are
usually, but not always, similar. Our predictions have some correlation
with the platform’s model, but they are substantially different. Using a
more accurate model, it is possible to select loans that are expected
to be more profitable. Our model is unbiased, with a relatively low
prediction error. Experiments in selecting portfolios of loans with
lower or higher Loss Given Default (LGD) demonstrate that our model is
useful, whereas predictions based on the platform’s model or credit
ratings are not better than random.
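
To illustrate how monthly default rates can be estimated when many loans are still ongoing, the following is a minimal sketch of a life-table (discrete-time hazard) estimate. It is not the model from the paper: the function names, the `(months_observed, defaulted)` input format, and the toy data are all assumptions made for illustration. An ongoing loan is treated as censored, so it counts as at risk only for the months in which it was actually observed.

```python
def monthly_default_rates(loans, horizon):
    """Estimate the default rate for each month 1..horizon.

    loans: list of (months_observed, defaulted) pairs (hypothetical format).
    A loan with defaulted=False is censored after months_observed months.
    """
    at_risk = [0] * horizon
    defaults = [0] * horizon
    for months_observed, defaulted in loans:
        # The loan is at risk in every month it was observed.
        for t in range(min(months_observed, horizon)):
            at_risk[t] += 1
        # A default is recorded in the last observed month.
        if defaulted and 1 <= months_observed <= horizon:
            defaults[months_observed - 1] += 1
    # Months with no loans at risk get a rate of 0.0 by convention.
    return [d / n if n else 0.0 for d, n in zip(defaults, at_risk)]


def survival_curve(rates):
    """Probability that a loan has not defaulted by the end of each month."""
    curve, alive = [], 1.0
    for rate in rates:
        alive *= 1.0 - rate
        curve.append(alive)
    return curve


# Toy data (made up): two defaults, two loans still ongoing (censored).
toy_loans = [(1, True), (2, False), (3, True), (3, False)]
rates = monthly_default_rates(toy_loans, horizon=3)
print(rates)                  # [0.25, 0.0, 0.5]
print(survival_curve(rates))  # [0.75, 0.75, 0.375]
```

The same counting scheme could in principle be applied to monthly recoveries on defaulted loans, with the months counted from the default date; that extension is likewise an assumption rather than a description of the authors' method.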