Detecting demographic bias in automatically generated personas

Salminen J., Jung S., Jansen B.

Conference on Human Factors in Computing Systems

Publisher: Association for Computing Machinery

2019

CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems


Pages: 6

ISBN: 978-1-4503-5971-9

DOI: https://doi.org/10.1145/3290607.3313034



We investigate the existence of demographic bias in automatically
generated personas by producing personas from YouTube Analytics data.
Despite the intended objectivity of the methodology, we find elements of
bias in the data-driven personas. The bias is highest when comparing
personas to the underlying data on age and gender jointly (exact match),
and it decreases when comparing at the age or gender level alone. The
bias also decreases as the number of generated personas increases; for
example, generating a smaller number of personas resulted in the
underrepresentation of female personas. This suggests that a higher
number of personas yields a more balanced representation of the user
population, whereas a smaller number amplifies bias. Researchers and
practitioners developing data-driven personas should consider the
possibility of algorithmic bias, even unintentional, in their personas
by comparing the personas against the underlying raw data.
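
As a concrete illustration of the comparison the abstract describes, the sketch below contrasts persona demographics with the underlying audience distribution at three granularities: exact match (age and gender jointly), age only, and gender only. Everything here is assumed for illustration: the toy data, the (age, gender) encoding, and the choice of total variation distance as the bias measure; it is not the paper's actual metric or data.

```python
# Minimal sketch: measure how far the demographic distribution of
# generated personas deviates from the underlying audience data.
# Toy data and the bias metric (total variation distance) are
# illustrative assumptions, not the authors' exact method.
from collections import Counter

def distribution(items):
    """Normalize a list of labels into a probability distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def bias(personas, audience):
    """Total variation distance between persona and audience distributions."""
    p, q = distribution(personas), distribution(audience)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0) - q.get(l, 0)) for l in labels)

# Hypothetical demographics: (age bracket, gender) per audience member / persona.
audience = [("18-24", "F"), ("18-24", "M"), ("25-34", "F"),
            ("25-34", "F"), ("35-44", "M"), ("25-34", "M")]
personas = [("18-24", "M"), ("25-34", "M"), ("35-44", "M")]

# Exact match (age and gender jointly) vs. one attribute at a time.
print("exact-match bias: ", bias(personas, audience))
print("age-level bias:   ", bias([a for a, _ in personas], [a for a, _ in audience]))
print("gender-level bias:", bias([g for _, g in personas], [g for _, g in audience]))
```

One property worth noting: the joint (exact-match) distance is always at least as large as either marginal distance, which is consistent with the abstract's observation that bias is highest under exact-match comparison.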


