We are approaching a perfect storm in flu tracking. Not only are machine learning algorithms becoming more sophisticated, but growing numbers of people are also self-diagnosing via the Internet and talking about their symptoms on social media.
This combination of factors provides an excellent source for data mining, resulting in new resources such as Google’s Flu Trends map. A study released today in Philosophical Transactions of the Royal Society B: Biological Sciences recommends novel methods of this kind.
Influenza and Other Infectious Diseases: Mapping and Tracking
According to the authors of this review, “there is a revolution occurring in both the volume and public availability of data about the health and wellbeing of individuals and populations through various forms of social media; most notably Twitter (twitter.com).”
What does the growing use of online information sources and social media, and the resulting increase in the general public’s access to health information, mean for data mining? Generally speaking, it’s an improvement – the more data, the better. A larger pool of information gives the algorithm more practice at sorting, so it gets better at reaching a valid conclusion, which means more accurate results.
Flu Trends Map: What’s the State of the Nation’s Health?
When tracking the spread of the flu, Google aggregates search data, including searches for symptoms related to influenza, strips out personal information, and compiles the Flu Trends map with the resulting numbers. Started during the 2009 Swine Flu Pandemic, the map provides timely and accurate information to health officials and the public.
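The aggregation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google’s actual pipeline: the record layout, the `FLU_TERMS` list, and the function name `weekly_flu_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical symptom terms; the article does not list the real ones.
FLU_TERMS = ("flu", "fever", "cough", "sore throat")


def weekly_flu_counts(search_log):
    """Count flu-related searches per (region, week).

    Only the region and week of each matching search are kept, so the
    output contains no user identifiers.
    """
    counts = Counter()
    for record in search_log:
        query = record["query"].lower()
        if any(term in query for term in FLU_TERMS):
            # The "user" field is never read or copied into the result.
            counts[(record["region"], record["week"])] += 1
    return dict(counts)


# Made-up sample records, for illustration only.
sample_log = [
    {"user": "u1", "region": "MD", "week": 1, "query": "flu symptoms"},
    {"user": "u2", "region": "MD", "week": 1, "query": "fever and cough"},
    {"user": "u3", "region": "NY", "week": 1, "query": "pizza near me"},
    {"user": "u1", "region": "NY", "week": 2, "query": "Sore throat remedy"},
]

print(weekly_flu_counts(sample_log))
```

Counts like these, grouped by region and week, are the kind of numbers a map could be drawn from. A real system would need far more careful term selection than simple substring matching.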
Do You Have The Flu? A Little Bird Told Me!
Recent research at Johns Hopkins University found that it’s possible to filter the ‘chatter’ out of Twitter conversations about the flu and track the spread of the disease via tweets. Johns Hopkins School of Medicine postdoctoral fellow David A. Broniatowski told Decoded Science that their research provides a complement to Google’s map, saying:
“We see the two as very much synergistic. Twitter data is one source of data for studying a very complex problem – it can give us a window into the way the disease spreads that can then be used in concert with CDC data and other data sources. The advantage of Twitter is that it is an early warning signal whereas other data sources take more time.”
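To give a feel for what separating flu ‘chatter’ from reports of actual illness might involve, here is a deliberately simple keyword heuristic in Python. It is a stand-in, not the Johns Hopkins method, which this article does not describe in detail; the cue lists and the function name `is_infection_report` are assumptions made for illustration.

```python
import re

# Hypothetical cues suggesting the author is personally ill.
INFECTION_CUES = re.compile(r"\b(i have|i've got|i got|i am sick|i'm sick|caught)\b")
# Hypothetical cues suggesting general chatter (news, vaccination, awareness).
CHATTER_CUES = re.compile(r"\b(vaccine|shot|news|outbreak|season)\b")


def is_infection_report(tweet):
    """Return True if a tweet looks like a first-person flu report."""
    text = tweet.lower()
    if "flu" not in text:
        return False
    has_infection_cue = INFECTION_CUES.search(text) is not None
    has_chatter_cue = CHATTER_CUES.search(text) is not None
    return has_infection_cue and not has_chatter_cue


# Made-up example tweets, for illustration only.
tweets = [
    "I have the flu and feel awful",
    "Flu season is coming, get your flu shot",
    "I got my flu shot today",
    "Caught the flu from my roommate",
]

reports = [t for t in tweets if is_infection_report(t)]
print(reports)
```

Counting the surviving tweets per day or week would then give a rough signal that could be compared against CDC figures. A production system would more plausibly rely on a trained classifier than on fixed keywords.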