Accepted for/Published in: JMIR Formative Research
Date Submitted: Oct 19, 2021
Date Accepted: Apr 10, 2022
Fairness in Mobile Phone-based Mental Health Assessment Algorithms: Exploratory Study
ABSTRACT
Background:
Roughly 1 in 5 American adults experience mental illness every year. Mobile phone-based mental health prediction apps, which use phone data and artificial intelligence (AI) techniques for mental health assessment, have thus become increasingly important and are being developed rapidly. At the same time, multiple AI-driven technologies (e.g., face recognition, search results) have recently been reported to be biased with respect to age, gender, race, and other attributes. This paper moves that discussion to a new domain: phone-based mental health assessment algorithms. It is important to ensure that such algorithms do not contribute to gender disparities through biased predictions across gender groups.
Objective:
The objectives of this research were to (a) analyze the susceptibility of multiple commonly used machine learning approaches to gender bias in mobile mental health assessment and (b) explore the use of an algorithmic disparate impact removal approach to reduce bias levels while maintaining high accuracy.
Methods:
First, we performed preprocessing and model training using a dataset (N=55) obtained from a previous study. Accuracy levels and differences in accuracy across genders were computed using 5 different machine learning models. We selected the random forest model, which yielded the highest accuracy, for a more detailed audit and computed multiple metrics commonly used in the fairness in machine learning literature. Finally, we applied the Disparate Impact Remover (DIR) approach to reduce bias in the mental health assessment algorithm, as sketched below.
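For concreteness, the sketch below shows how such a train-audit-repair pipeline can be assembled with the open-source AIF360 toolkit, which provides a DisparateImpactRemover implementation. The file name, column names, repair level, and train/test split are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch of the pipeline, assuming the AIF360 toolkit; the file
# name, column names, and repair level are hypothetical, not from the study.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import DisparateImpactRemover

# Hypothetical phone-sensing features; all columns must be numeric,
# with gender encoded 0/1 and the mental health label encoded 0/1.
df = pd.read_csv("phone_features.csv")

# Wrap the data so AIF360 knows the label and the protected attribute.
dataset = BinaryLabelDataset(
    df=df,
    label_names=["mh_label"],
    protected_attribute_names=["gender"],
)

# Repair feature distributions so they are comparable across genders.
remover = DisparateImpactRemover(repair_level=1.0, sensitive_attribute="gender")
repaired = remover.fit_transform(dataset)

# Drop the protected attribute itself before training, as in AIF360 demos.
idx = repaired.feature_names.index("gender")
X = np.delete(repaired.features, idx, axis=1)
y = repaired.labels.ravel()
gender = repaired.features[:, idx]

# Train the random forest on the repaired features.
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, gender, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
```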
Results:
The highest observed accuracy for mental health assessment was 78.57%. While this accuracy level raises optimism, the gender-based audit revealed that the performance of the algorithm was statistically significantly different for the male and female groups (e.g., difference in accuracy across genders of 15.85%; P<.001). Similar trends were observed for the other fairness metrics. This performance disparity was found to reduce significantly after the application of the DIR approach, which adapts the data used for modeling (e.g., difference in accuracy across genders of 1.66%; the reduction was statistically significant, with P<.001).
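The accuracy-difference metric reported above is simply the absolute gap between the per-group accuracies. A short sketch is given below; it reuses clf, X_te, y_te, and g_te from the previous sketch, and the 0/1 gender encoding is an assumption.

```python
# Accuracy gap across genders: per-group accuracy on held-out data,
# then the absolute difference. Group codes 0.0/1.0 are illustrative.
y_pred = clf.predict(X_te)
acc = {g: np.mean(y_pred[g_te == g] == y_te[g_te == g]) for g in (0.0, 1.0)}
accuracy_gap = abs(acc[0.0] - acc[1.0])  # study reports 15.85% pre-repair, 1.66% post-repair
```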
Conclusions:
This paper establishes the need for algorithmic auditing of phone-based mental health assessment algorithms and for the use of gender as a protected attribute when studying fairness in such settings. Such audits and remedial steps are building blocks for the widespread adoption of fair and accurate mental health assessment algorithms in the future.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license upon publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.