Wired reported yesterday on a health site that tracks disease outbreaks using news feeds such as Google News. It uses a nifty Bayesian machine learning algorithm, filtering out noise with some kind of intelligent phrase indexing:
For instance, key words like “mysterious” tend to pop up in outbreak stories, but not, say, in coverage of vaccine programs. Another common feature of outbreak stories is a small number in the headline, usually to denote a number of people infected or killed.
The site has actually been up and running since 2006, as this gmaps mashup blog records.
More detail on how it works can be found here.
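To make the idea concrete, here is a minimal sketch of the kind of Bayesian headline filter the Wired quote hints at. It is not HealthMap's actual code: the `NaiveBayes` class, the `<SMALL_NUM>` marker, the "one or two digits" definition of a small number, and the six training headlines are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a "small number" is a standalone one- or two-digit figure.
SMALL_NUMBER = re.compile(r"\b\d{1,2}\b")


def features(headline):
    """Lowercased word tokens, plus a marker if a small number appears."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    if SMALL_NUMBER.search(headline):
        tokens.append("<SMALL_NUM>")
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training headlines
        self.vocab = set()

    def train(self, headline, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for tok in features(headline):
            counts[tok] += 1
            self.vocab.add(tok)

    def log_score(self, headline, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        for tok in features(headline):
            # Smoothed likelihood, so unseen words never give log(0).
            score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
        return score

    def classify(self, headline):
        return max(self.doc_counts, key=lambda lbl: self.log_score(headline, lbl))


# Invented toy training data, purely for illustration.
clf = NaiveBayes()
for text in ["Mysterious illness kills 3 in village",
             "5 infected by mysterious fever",
             "Unknown virus sickens 12 students"]:
    clf.train(text, "outbreak")
for text in ["Vaccine programme reaches 2000000 children",
             "Government funds new vaccine campaign",
             "Health ministry launches vaccination drive"]:
    clf.train(text, "other")

print(clf.classify("Mysterious disease infects 7"))
print(clf.classify("New vaccine campaign launches"))
```

On this toy data the word "mysterious" and the small-number marker both pull a headline towards the outbreak class, which is the effect the quote describes. A real system would need far more training data and better tokenisation.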
I would like to create something along these lines for financial data, with buy/sell signals replacing the gmaps visualisation. Google News, owing to its concentration on news aggregation, does not currently capture stories quickly enough to be used as part of a beat-the-market type event trading system. Its aggregation nature would, however, lend itself perfectly to a more long-term trend alerting mechanism. The smarts to be built on top of it would, I imagine, be pretty similar to what goes on in HealthMap above.
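As a rough sketch of what a long-term trend alert could look like, the snippet below flags days on which the recent volume of stories about a topic runs well above its longer-term baseline. Everything here is an assumption for illustration: the `trend_alerts` function, the window lengths, the 2x threshold, and the idea that daily story counts have already been extracted from the news feed. Turning such an alert into a buy or sell signal is a separate problem that this does not attempt.

```python
from statistics import mean


def trend_alerts(daily_counts, short=3, long=10, ratio=2.0):
    """Return indices of days where the mean story count over the last
    `short` days is at least `ratio` times the mean over the `long`
    days immediately before that window."""
    alerts = []
    for day in range(short + long, len(daily_counts) + 1):
        recent = daily_counts[day - short:day]
        baseline = daily_counts[day - short - long:day - short]
        base_mean = mean(baseline)
        # Skip windows with an empty baseline to avoid meaningless ratios.
        if base_mean > 0 and mean(recent) >= ratio * base_mean:
            alerts.append(day - 1)
    return alerts


# Invented counts: ten quiet days, then a jump in coverage.
counts = [2] * 10 + [2, 6, 7]
print(trend_alerts(counts))
```

Comparing a short window against a longer baseline is about the simplest way to separate a sustained rise in coverage from day-to-day noise, which suits the slower pace of an aggregator.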
Industry talk on news flow algorithms seems to have disappeared after a bit of buzz a few years back. It may have gone the way of the neural networks of the 80s.



