TY - JOUR
T1 - Generalization and Personalization of Mobile Sensing-Based Mood Inference Models
T2 - An Analysis of College Students in Eight Countries
AU - Meegahapola, Lakmal
AU - Droz, William
AU - Kun, Peter
AU - de Götzen, Amalia
AU - Nutakki, Chaitanya
AU - Diwakar, Shyam
AU - Ruiz-Correa, Salvador
AU - Song, Donglei
AU - Xu, Hao
AU - Bidoglia, Miriam
AU - Gaskell, George
AU - Chagnaa, Altangerel
AU - Ganbold, Amarsanaa
AU - Zundui, Tsolmon
AU - Caprini, Carlo
AU - Miorandi, Daniele
AU - Hume, Alethia
AU - Zarza, Jose Luis
AU - Cernuzzi, Luca
AU - Bison, Ivano
AU - Rodas Britez, Marcelo
AU - Busso, Matteo
AU - Chenu-Abente, Ronald
AU - Günel, Can
AU - Giunchiglia, Fausto
AU - Schelenz, Laura
AU - Gatica-Perez, Daniel
M1 - 176
PY - 2022/12
Y1 - 2022/12
N2 - Mood inference with mobile sensing data has been studied in ubicomp literature over the last decade. This inference enables context-aware and personalized user experiences in general mobile apps and valuable feedback and interventions in mobile health apps. However, even though model generalization issues have been highlighted in many studies, the focus has always been on improving the accuracies of models using different sensing modalities and machine learning techniques, with datasets collected in homogeneous populations. In contrast, less attention has been given to studying the performance of mood inference models to assess whether models generalize to new countries. In this study, we collected a mobile sensing dataset with 329K self-reports from 678 participants in eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, UK) to assess the effect of geographical diversity on mood inference models. We define and evaluate country-specific (trained and tested within a country), continent-specific (trained and tested within a continent), country-agnostic (tested on a country not seen on training data), and multi-country (trained and tested with multiple countries) approaches trained on sensor data for two mood inference tasks with population-level (non-personalized) and hybrid (partially personalized) models. We show that partially personalized country-specific models perform the best yielding area under the receiver operating characteristic curve (AUROC) scores of the range 0.78-0.98 for two-class (negative vs. positive valence) and 0.76-0.94 for three-class (negative vs. neutral vs. positive valence) inference. Further, with the country-agnostic approach, we show that models do not perform well compared to country-specific settings, even when models are partially personalized. We also show that continent-specific models outperform multi-country models in the case of Europe. Overall, we uncover generalization issues of mood inference models to new countries and how the geographical similarity of countries might impact mood inference.
KW - Mobile Sensing Data
KW - Mood Inference
KW - Geographical Diversity
KW - Context-Aware Applications
KW - Machine Learning Generalization
UR - https://arxiv.org/pdf/2211.03009.pdf
U2 - 10.1145/3569483
DO - 10.1145/3569483
M3 - Journal article
VL - 6
SP - 1
EP - 32
JO - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
JF - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
IS - 4
ER -