Yahoo! Pipes is something I took a look at back when it was released into the wild. Like many others, I assume, I gave up meddling when whatever I was angling to do required stupendously long regex statements. However, Matt 2.0 recently came fighting back and produced this RSS pipe for the webpage of a curiously interesting macroeconomics commentator from India. Admittedly the regex component still took up most of my time, but I really don’t see how that can change, other than to pray for nice, consistent, tabular web pages. Some hope.
Anyways, here is the Pipe ‘source’:
There are four parts to the above:
- Fetch the page, indicating start & end points and delimiter for items.
- Further chopping up of the results; data cleansing if you like.
- RSS-ify items, i.e. specify title, link & description.
- Pattern match the appropriate content into each field.
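For anyone who prefers code to boxes and wires, the four steps above could be sketched roughly like this in Python. This is not the Pipe itself, just an illustration: the sample markup, the start/end markers, `BASE_URL` and all the function names are assumptions made up for the example, since the real page's structure isn't reproduced here.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical markup standing in for the fetched page (not the real site).
SAMPLE_PAGE = """
<html><body>
<div id="header">Site navigation</div>
<!-- articles start -->
<p class="entry"><a href="/notes/rates.html">Rates on hold</a> &nbsp; The central bank left rates unchanged.</p>
<p class="entry"><a href="/notes/monsoon.html">Monsoon watch</a> &nbsp; Rainfall is running below average.</p>
<!-- articles end -->
<div id="footer">Copyright</div>
</body></html>
"""

BASE_URL = "http://example.com"  # placeholder, not the commentator's site


def cut_items(page, start, end, delimiter):
    """Step 1: keep the text between start and end, then split it on the delimiter."""
    body = page.split(start, 1)[1].split(end, 1)[0]
    return [chunk for chunk in body.split(delimiter) if chunk.strip()]


def clean(chunk):
    """Step 2: data cleansing; replace &nbsp; and collapse runs of whitespace."""
    chunk = chunk.replace("&nbsp;", " ")
    return re.sub(r"\s+", " ", chunk).strip()


# Steps 3 and 4: one pattern that pulls out link, title and description.
ITEM_RE = re.compile(
    r'<a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>\s*(?P<description>[^<]*)'
)


def to_item(chunk):
    """Match the pattern against a cleaned chunk and return the RSS fields."""
    match = ITEM_RE.search(chunk)
    if match is None:
        return None  # chunk did not look like an article entry
    return {
        "title": match.group("title").strip(),
        "link": BASE_URL + match.group("link"),
        "description": match.group("description").strip(),
    }


def build_items(page):
    chunks = cut_items(
        page, "<!-- articles start -->", "<!-- articles end -->", '<p class="entry">'
    )
    items = (to_item(clean(chunk)) for chunk in chunks)
    return [item for item in items if item is not None]


def to_rss(items, feed_title="Scraped feed"):
    """Wrap the items in a bare-bones RSS 2.0 document."""
    parts = ['<rss version="2.0"><channel>', f"<title>{escape(feed_title)}</title>"]
    for item in items:
        parts.append(
            "<item>"
            f"<title>{escape(item['title'])}</title>"
            f"<link>{escape(item['link'])}</link>"
            f"<description>{escape(item['description'])}</description>"
            "</item>"
        )
    parts.append("</channel></rss>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(to_rss(build_items(SAMPLE_PAGE)))
```

As with the Pipe, nearly all of the fiddliness lives in the one regex; everything around it is plumbing.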
I was also going to point out how this can alleviate one of the big failings of RSS in my book, the inability to filter based on categories, but it seems there’s a recent blog post (the top Google News result for ‘yahoo pipes’) which hits on the same point, with a few links to backgrounders & alternative methods too.
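To make the category point concrete, here is a small sketch of that kind of filter done by hand in Python rather than in Pipes. The feed content and the `filter_by_category` name are invented for the example; the only thing assumed about real feeds is that items may carry RSS 2.0 `<category>` elements.

```python
import xml.etree.ElementTree as ET

# Made-up feed for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Rates on hold</title><category>Monetary policy</category></item>
<item><title>Monsoon watch</title><category>Agriculture</category></item>
<item><title>Inflation eases</title><category>Monetary policy</category><category>Prices</category></item>
</channel></rss>"""


def filter_by_category(feed_xml, wanted):
    """Return the titles of items tagged with the wanted category (case-insensitive)."""
    root = ET.fromstring(feed_xml)
    wanted = wanted.lower()
    kept = []
    for item in root.iter("item"):
        categories = [
            (cat.text or "").strip().lower() for cat in item.findall("category")
        ]
        if wanted in categories:
            kept.append(item.findtext("title"))
    return kept


if __name__ == "__main__":
    print(filter_by_category(SAMPLE_FEED, "monetary policy"))
```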
I’ve used dedicated web-scraper-cum-RSS-generators in the past, which also include a version of pattern matching to structure data, but let’s be honest: as complex and seemingly unreasonable as regex is, its ability to interrogate even the most convoluted pages found on the web cannot be denied. And with Pipes being a general platform for twisting things (not just static page content) together, hopefully I’ll find greater uses for it in future.