Web Search-based Surveillance of Multiple Diseases in Multiple Countries

Sumaila Nigo (1751213)


SNS-based surveillance falls short when NLP resources or SNS data are scarce, Wikipedia-based surveillance is unreliable for widely spoken languages, and current search-based surveillance is not applicable to regions with low search volume. How to conduct Internet-based surveillance when these conditions are not met therefore remains an open problem. Our study is a first step in exploring the potential of disease surveillance using relative search volume collected over sliced timeframes. Our results show that, in countries with high Web search volume, our approach produces predictions whose correlations with official patient numbers range from 72% to 95%. In countries with lower Web search volume, our models produced correlations of about 62% for Lassa fever and 59% for yellow fever in Nigeria. For cholera in Nigeria and in Haiti, our models yielded promising correlations of 47% and 42%, respectively. Furthermore, the results show that standard regression models are more suitable than neural-network-based models. Longer lookback windows on the “health” category reduce noise in the signal and, overall, produce the most stable results.
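The evaluation summarized above can be sketched as follows: a regression on a lookback window of relative search volume, scored by its correlation with official case counts. This is a minimal illustration under stated assumptions, not the study's pipeline. The search series and case counts are synthetic. The 4-week window and the train/test split are arbitrary. Ordinary least squares stands in for whichever "standard regression model" was used. The helper names `lookback_features`, `fit_predict_ols`, and `pearson` are hypothetical.

```python
import numpy as np


def lookback_features(series, window):
    """Row i holds the `window` consecutive values series[i : i + window]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window + 1
    return np.stack([series[i:i + window] for i in range(n_rows)])


def fit_predict_ols(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, solved via np.linalg.lstsq."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic stand-in data (NOT from the study): three years of weekly case
# counts with a yearly cycle, plus a noisy 0-100 search index that tracks them.
rng = np.random.default_rng(0)
weeks = 156
t = np.arange(weeks)
cases = 50 + 40 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 5, weeks)
rsv = np.clip(0.8 * cases + rng.normal(0, 10, weeks), 0, 100)

window = 4                      # hypothetical lookback length in weeks
X = lookback_features(rsv, window)
y = cases[window - 1:]          # target aligned with the last week of each window

split = 104                     # train on roughly the first two years of rows
pred = fit_predict_ols(X[:split], y[:split], X[split:])
r = pearson(pred, y[split:])
print(f"test-period Pearson r = {r:.2f}")
```

A correlation reported as "72%" corresponds to r = 0.72 in this notation. Collecting the relative search volume, including any slicing of the timeframe, would happen before this step and is not shown. A longer `window` averages over more weeks of search data. This is one plausible reading of why longer lookback windows gave steadier results.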