News from Journal of Medical Internet Research

Tracking Subjective Sleep Quality and Mood With Mobile Sensing: Multiverse Study


Background: Sleep influences mood and mood disorders. Existing methods for tracking the quality of people’s sleep are laborious and obtrusive. A method that allowed effortless and unobtrusive tracking of sleep quality would therefore mark a significant step toward obtaining sleep data for research and clinical applications.
Objective: Our goal was to evaluate the potential of mobile sensing data to obtain information about a person’s sleep quality. For this purpose, we investigated to what extent various automatically gathered mobile sensing features are capable of predicting (1) subjective sleep quality (SSQ), (2) negative affect (NA), and (3) depression; these variables are associated with objective sleep quality. Through a multiverse analysis, we examined how the predictive quality varied as a function of the selected sensor, the extracted feature, various preprocessing options, and the statistical prediction model.
Methods: We used data from a 2-week trial in which we collected mobile sensing and experience sampling data from an initial sample of 60 participants. After data cleaning and removal of participants with poor compliance, we retained 50 participants. The mobile sensing data comprised the accelerometer, charging status, light sensor, physical activity, screen activity, and Wi-Fi status. Participants were instructed to keep their smartphone charged and connected to Wi-Fi at night. We constructed 1 model for every combination of multiverse parameters to evaluate their effects on each of the outcome variables. We evaluated the statistical models on training, validation, and test sets; the held-out validation and test sets guard against overfitting.
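To make the multiverse idea concrete, the sketch below shows what a grid of analysis decisions could look like, assuming scikit-learn and synthetic data. The scalers, models, and feature dimensions are hypothetical placeholders for illustration only; they are not the study's actual sensors, preprocessing options, or pipelines.

```python
# Illustrative multiverse-style model grid: one model per combination of
# preprocessing option and statistical model, each scored on train/validation/test.
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data: e.g., 50 participants x 6 sensor-derived features,
# with a noisy outcome such as subjective sleep quality.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 0.5 * X[:, 0] + rng.normal(scale=1.0, size=50)

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

# Multiverse parameters: each combination defines one analysis "universe".
scalers = {"zscore": StandardScaler(), "minmax": MinMaxScaler()}
models = {
    "ridge": Ridge(alpha=1.0),
    "forest": RandomForestRegressor(n_estimators=100, random_state=1),
}

results = []
for (scaler_name, scaler), (model_name, model) in product(scalers.items(), models.items()):
    pipe = make_pipeline(scaler, model)
    pipe.fit(X_train, y_train)
    results.append({
        "universe": (scaler_name, model_name),
        "r2_train": r2_score(y_train, pipe.predict(X_train)),
        "r2_val": r2_score(y_val, pipe.predict(X_val)),   # <= 0 means "not informative"
        "r2_test": r2_score(y_test, pipe.predict(X_test)),
    })

# Rank universes by validation performance, as a multiverse analysis would.
for row in sorted(results, key=lambda r: r["r2_val"], reverse=True):
    print(row)
```

In the actual study the grid spans many more dimensions (sensor, feature, preprocessing choices, and model), but the structure is the same: every combination yields one fitted model and one set of out-of-sample scores.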
Results: Most models (for any of the outcome variables) were not informative on the validation set (ie, predicted R²≤0). However, our best models achieved R² values of 0.658, 0.779, and 0.074 for SSQ, NA, and depression, respectively, on the training set, and R² values of 0.348, 0.103, and 0.025, respectively, on the test set.
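For context on the "not informative" threshold: predicted R² here follows the usual out-of-sample definition, so a value at or below 0 indicates predictions on held-out data that are no better than simply predicting the mean of the observed outcome. A sketch of the standard formula, where $\hat{y}_i$ are model predictions and $\bar{y}$ is the mean of the held-out outcome values:

\[ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \]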
Conclusions: The multiverse approach demonstrated in this paper shows that different choices (eg, preprocessing options, statistical models, features) lead to vastly different results, ranging from poor to relatively good. Nevertheless, there were some promising results, particularly for SSQ, which warrant further research on this topic.