— aleatory

machine learning

Or reasons why Python is cool #273.

There are many snippets of code on the internet detailing how to go about dynamic class creation when the compiler (and thus, you the programmer) knows the name and module location of the class some time in advance, e.g. at initial deploy time or via some parameter entered by the user when they want to instantiate such a class.

However, if I want to create objects without knowing their module/class in advance, a little bit more thinking is required. Here is a simple tutorial that will leave you with a package that, when updated with new modules, will introspect & instantiate their classes on demand.
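As a taster, here is a minimal sketch of the idea using only the standard library (pkgutil, importlib and inspect). The helper names discover_classes and instantiate are made up for illustration and are not necessarily what the full tutorial uses; the sketch also assumes the package is a regular directory-based package that is importable from sys.path.

```python
import importlib
import inspect
import pkgutil


def discover_classes(package_name):
    """Return {class name: class} for classes defined in the package's modules."""
    package = importlib.import_module(package_name)
    found = {}
    # Walk the modules that currently sit directly inside the package directory.
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{info.name}")
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Skip classes that were only imported into the module from elsewhere.
            if obj.__module__ == module.__name__:
                found[name] = obj
    return found


def instantiate(package_name, class_name, *args, **kwargs):
    """Look the class up by name at call time and build an instance of it."""
    classes = discover_classes(package_name)
    return classes[class_name](*args, **kwargs)
```

Because the lookup happens on every call, a module dropped into the package later is picked up the next time instantiate runs. A longer-lived process may also need importlib.invalidate_caches() before it sees newly added files.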

Read More

Stuff like this makes me want to get back into Comp Sci

Read More

Yesterday I blogged about the use of online news aggregators such as Google News to map trends. Further surfing on the topic yielded the current state of the art in such aggregators – Silobreaker is, at its most basic, another news aggregator. But its added value lies in its ability to analyse, group & visualise related news stories in a way much more intelligent than the likes of Google’s related story results.

For an example of how this fits in with my idea of a trend alerting system, see one of its bespoke topics – profit warnings. Historic data seems to go back up to a year, and in what Silobreaker calls a ‘360 degree’ view, you get quotes, blogs and media trends as well as the news. The trends graph can be used to compare the relative media interest in news subjects, or ‘entities’ – a welcome nod in the direction of the Semantic Web there, with users having the ability to add entities not already recognised.

Further on the Semantic theme there is a network function that produces graphs of related entities. The obligatory news to map function is also available, and extremely useful it is too – I just learned of the US airstrikes against Al Qaeda on Somali soil because of it. As with the trend function, a time range can be specified.

There is always room for improvement in machine learning, and Silobreaker is no different – the above US airstrike news item was listed as of Baghdad origin, which in itself seemed fine as the source of the article was a UPI journalist in Baghdad, but the bombing run was not marked over Somalia itself on the world map.

Registering gives you the ability to personalise, among other things, your list of news sources. I guess what I’m saying from all this is that Silobreaker does much of the heavy lifting I’d envisage a financial trend forecasting tool doing. What is needed now is an API to access this analysis – something I’d be hopeful for, considering their background in ‘open source’ intelligence.

Read More

Wired yesterday reported on a health site service that tracks disease outbreaks using news feeds such as Google News. A nifty Bayesian-based machine learning algorithm is used, filtering out noise with some kind of intelligent phrase indexing –

For instance, key words like “mysterious” tend to pop up in outbreak stories, but not, say, in coverage of vaccine programs. Another common feature of outbreak stories is a small number in the headline, usually to denote a number of people infected or killed.

The site has actually been up & running since 2006, as this gmaps mashup blog records.

More detail on how it works can be found here.
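To make the Wired description a bit more concrete, here is a rough sketch of a naive Bayes headline filter in plain Python. It is not HealthMap's actual algorithm: the training headlines are invented toy data, the "outbreak"/"other" labels are my own, and the <SMALL_NUM> token (any number under 100) is just one assumed way of capturing the "small number in the headline" cue from the quote.

```python
import math
import re
from collections import Counter


def features(headline):
    """Lowercase word tokens, with numbers bucketed into small/big markers."""
    tokens = re.findall(r"[a-z]+|\d+", headline.lower())
    out = []
    for tok in tokens:
        if tok.isdigit():
            # Assumption: "small" means under 100, e.g. a count of people infected.
            out.append("<SMALL_NUM>" if int(tok) < 100 else "<BIG_NUM>")
        else:
            out.append(tok)
    return out


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of feature counts
        self.doc_counts = Counter()  # label -> number of training headlines
        self.vocab = set()

    def train(self, headline, label):
        feats = features(headline)
        self.word_counts.setdefault(label, Counter()).update(feats)
        self.doc_counts[label] += 1
        self.vocab.update(feats)

    def classify(self, headline):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihoods with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in features(headline):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy headlines, purely for illustration.
TRAINING = [
    ("Mysterious illness kills 4 in village", "outbreak"),
    ("12 infected by mysterious fever", "outbreak"),
    ("Unknown virus sickens 7 schoolchildren", "outbreak"),
    ("Vaccine programme reaches 20000 children", "other"),
    ("Government funds new vaccine research", "other"),
    ("Health ministry launches vaccine campaign", "other"),
]

model = NaiveBayes()
for text, label in TRAINING:
    model.train(text, label)
```

With this toy model, model.classify("Mysterious disease infects 9") comes out as "outbreak", driven by "mysterious" and the small number, while a headline about a vaccine campaign lands in "other". A real system would need far more training data than six headlines.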

I would like to create something along these lines for financial data, with buy/sell signals replacing the gmaps visualisation. Google News, owing to its concentration on news aggregation, does not currently capture stories quickly enough for it to be used as part of a beat-the-market type event trading system. Its aggregation nature would, however, lend itself perfectly to a more long-term trend alerting mechanism. The smarts to be built on top of it would, I imagine, be pretty similar to what goes on in HealthMap above.

Industry talk on news flow algorithms seems to have disappeared after a bit of buzz a few years back. It may have gone the way of the Neural Networks of the 80s.

Read More