M-25: Reliability and Validity of Outcomes Data Using Statistical Methods for Wearable Medical Devices: A Systematic Review
Cezar Ocampo Manansala
Centro Escolar University Philippines
To determine the available evidence on the reliability and validity of wearable devices through a systematic review based on the four main statistical analyses used for wearable devices, namely the intraclass correlation coefficient, ROC analysis, sensitivity and specificity, and Bland-Altman plots.
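As a point of reference for two of the analyses named above, the following is a minimal sketch of how Bland-Altman bias and limits of agreement, and sensitivity and specificity, are commonly computed. The function names and the heart-rate readings in the usage note are hypothetical illustrations and are not taken from the reviewed studies; the 1.96 multiplier assumes the conventional 95% limits of agreement.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Hypothetical helper: bias is the mean device-minus-reference difference,
    and the limits are bias +/- 1.96 sample standard deviations.
    """
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `bland_altman([72, 80, 91, 65], [70, 82, 90, 66])` compares four made-up device heart-rate readings with made-up reference readings; narrow limits around a bias near zero would indicate close agreement.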
A systematic review was conducted in accordance with PRISMA guidelines and was registered with PROSPERO. The search used the keywords: (wearable device or wearable medical device) and (intraclass correlation coefficient or ROC analysis or sensitivity and specificity or Bland-Altman plots).
The search produced a total of 7,578 articles; 122 full-text articles were screened, and 62 eligible studies were included in the final analysis. A total of 68 different types of wearable devices were tested for their reliability and/or validity. Most of the wearable devices included in the study were used to measure heart rate, energy expenditure, step counts under different conditions, and sleep activity. To arrive at a reliability conclusion, the researchers determined the mean intraclass correlation coefficient for the 42 wearable device types tested for it. The largest group (16) had moderate reliability (0.5-0.75) against reference devices or standards. Only 10 types had excellent reliability (>0.9), while 12 and 4 types had good (0.75-0.9) and poor (<0.5) reliability, respectively. Only one of the device types was tested for sensitivity and specificity, and it produced results described as high, 98% and 58% respectively. According to the Bland and Altman interpretation, this device can thus be considered accurate in comparison to the reference standard. For validity, because the results are descriptive in nature, the relationship that appeared most often (the mode) across all the devices was determined. Bland-Altman plots were used for 51 types of wearable devices: the largest group (25) had low agreement with the reference standard, while 17 and 9 types had modest and high agreement, respectively. On this interpretation, low validity was the most common finding among the devices. Finally, four types of wearable devices were tested with ROC analysis, and all produced high sensitivity and high specificity, from which we conclude that their accuracy is high when compared to reference standards.
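The reliability bands reported above can be sketched as a simple classification. This is an illustrative sketch only: the function names are hypothetical, and the handling of the boundary values is an assumption (0.75 and 0.9 are both placed in "good", since the abstract gives overlapping ranges and defines excellent as >0.9). The counts in `REPORTED` are the ones stated in the results.

```python
from collections import Counter


def icc_band(icc):
    """Map a mean ICC to the reliability bands used in the results.

    Boundary handling is an assumption, not stated in the abstract.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def tally(iccs):
    """Count how many mean ICC values fall into each band."""
    return Counter(icc_band(value) for value in iccs)


def shares(counts):
    """Convert band counts into proportions of all tested device types."""
    total = sum(counts.values())
    return {band: n / total for band, n in counts.items()}


# Band counts as reported in the results (42 device types in total).
REPORTED = {"poor": 4, "moderate": 16, "good": 12, "excellent": 10}
```

Calling `shares(REPORTED)` gives the proportion of device types in each band; the moderate band is the largest at 16 of 42, which is a plurality rather than more than half.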
This systematic review identified how reliable and/or valid the wearable devices currently on the market are. The results are heterogeneous across different types of wearable devices; moreover, they vary significantly with the statistical treatment used, the parameters being measured, and the conditions of the subjects. This is somewhat in agreement with Remoortel et al.'s 2012 review on the validity of activity monitors. For manufacturers, it is critical that both the reliability and the validity of the data recorded by these devices be validated not only in the laboratory but in real-world settings as well. Changes in software, such as firmware updates, can also significantly affect the validity of data, as cited by Evenson et al. (2015). For regulatory authorities, we have come to an age where self-monitoring through wearable devices has risen dramatically. These devices should be carefully screened and approved prior to release on the market, as they could significantly impact the health assessment of consumers. This is a problem that developing countries face today. For healthcare professionals, incorporating patient self-assessment and wearable devices into health management should be carefully assessed, as the validity of these devices presently varies, as seen in this study, and their effect on the individual can carry high risk in the long run. Finally, consumers, who will see newer wearable devices in the future, should think critically before choosing to use these devices to monitor their health. Overall, this review described statistical results that are a basis for determining whether these devices can be used effectively to monitor health conditions and whether incorporating them for a more specialized purpose, such as clinical trials, is appropriate. As newer activity trackers and monitors are introduced every day, this study can be a good basis for future research.