Machine Readable Google News

Wired yesterday reported on HealthMap, a health site that tracks disease outbreaks using news feeds such as Google News. It uses a nifty Bayesian machine learning algorithm, filtering out noise with some kind of intelligent phrase indexing:

For instance, key words like “mysterious” tend to pop up in outbreak stories, but not, say, in coverage of vaccine programs. Another common feature of outbreak stories is a small number in the headline, usually to denote a number of people infected or killed.
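Neither Wired nor HealthMap publishes its exact model, so what follows is only a rough sketch of the idea under my own assumptions. It is a plain multinomial naive Bayes classifier over headline words, with one extra hypothetical feature token (`__smallnum__`) that fires when the headline contains a number below 1000, standing in for the "small number in the headline" cue. The labels and training headlines are made up purely for illustration.

```python
import math
import re
from collections import Counter


def features(headline: str) -> list[str]:
    """Lowercased word tokens, plus a hypothetical marker token when the
    headline contains a 'small' number (assumed here to mean 1..999)."""
    text = headline.lower()
    tokens = re.findall(r"[a-z]+", text)
    if any(1 <= int(n) < 1000 for n in re.findall(r"\d+", text)):
        tokens.append("__smallnum__")
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}  # label -> token counts
        self.doc_counts: Counter = Counter()       # label -> number of headlines
        self.vocab: set[str] = set()

    def train(self, headline: str, label: str) -> None:
        toks = features(headline)
        self.doc_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(toks)
        self.vocab.update(toks)

    def log_score(self, headline: str, label: str) -> float:
        # log P(label) + sum of log P(token | label), smoothed over the vocabulary
        total_docs = sum(self.doc_counts.values())
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        v = len(self.vocab)
        score = math.log(self.doc_counts[label] / total_docs)
        for tok in features(headline):
            score += math.log((counts[tok] + 1) / (total_words + v))
        return score

    def classify(self, headline: str) -> str:
        return max(self.doc_counts, key=lambda lbl: self.log_score(headline, lbl))


# Toy, invented training headlines for illustration only.
nb = NaiveBayes()
nb.train("Mysterious illness kills 12 in village", "outbreak")
nb.train("Mysterious fever sickens 40 students", "outbreak")
nb.train("Ministry launches national vaccine programme", "other")
nb.train("Vaccine programme expands to rural clinics", "other")

print(nb.classify("Mysterious disease kills 7"))  # prints "outbreak"
```

Both "mysterious" and the small-number token push the new headline towards the outbreak class, which is exactly the kind of cue described in the quote. A real system would need far more training data and probably phrase-level features rather than single words.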

The site has actually been up and running since 2006, as this gmaps mashup blog records.

More detail on how it works can be found here.

I would like to create something along these lines for financial data, with buy/sell signals replacing the gmaps visualisation. Because Google News concentrates on news aggregation, it does not currently capture stories quickly enough to be used as part of a beat-the-market event trading system. Its aggregating nature would, however, lend itself perfectly to a longer-term trend alerting mechanism. The smarts built on top of it would, I imagine, be pretty similar to what goes on in HealthMap above.
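As a rough illustration of the trend alerting idea, and purely my own assumption about how it might look, suppose an upstream classifier (like the naive Bayes approach above, retrained on financial headlines) labels each story about a stock as bullish or bearish. The daily net count (bullish minus bearish) could then feed a rolling-window check that raises a buy or sell alert when sentiment stays one-sided for long enough. The `window` and `threshold` values are hypothetical and untuned.

```python
from collections import deque


def trend_alerts(daily_scores, window=5, threshold=3):
    """daily_scores: iterable of (day, net_score) pairs, where net_score is
    (# stories classified bullish) - (# stories classified bearish) that day.
    Yields (day, "buy"/"sell") each time the rolling sum over the last
    `window` days moves into the buy zone (>= threshold) or the sell zone
    (<= -threshold)."""
    recent = deque(maxlen=window)  # keeps only the last `window` daily scores
    state = None
    for day, score in daily_scores:
        recent.append(score)
        total = sum(recent)
        if total >= threshold:
            signal = "buy"
        elif total <= -threshold:
            signal = "sell"
        else:
            signal = None
        # Only alert on entering a zone, not on every day spent inside it.
        if signal is not None and signal != state:
            yield day, signal
        state = signal


scores = [(1, 1), (2, 1), (3, 1), (4, 0), (5, -2), (6, -2), (7, -1)]
print(list(trend_alerts(scores)))  # [(3, 'buy'), (7, 'sell')]
```

A rolling window like this smooths over single noisy stories, which suits the slower, aggregated nature of a news feed rather than split-second event trading.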

Industry talk on news flow algorithms seems to have died down after a bit of buzz a few years back. It may have gone the way of the neural networks of the '80s.
