In part 2 the chart app was successfully loaded with historical data using appcfg.py’s bulkloader tool and the remote_api module. However, queries over time series data quickly become expensive once we begin to deal with months and years, so we should use some kind of caching mechanism to prevent too many datastore operations from slowing down our chart display.
App Engine’s powerful built-in object caching framework is memcache. With memcache we first check whether the data we’re looking for is in memory and, if so, retrieve it from there, removing the need for any datastore querying. If not, we retrieve it using GQL as done in part 2 of the tutorial.
So for our 5-day data, the datastore code in main.py would become:
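As a rough illustration of that check-the-cache-first pattern (a minimal sketch, not the tutorial's actual main.py listing), the logic could look something like the following. The names `get_five_day_data`, `run_query`, `InMemoryCache` and the key `chart_5day` are all hypothetical. The cache object only needs `get(key)` and `set(key, value, time=...)`, which is the shape of App Engine's `memcache` module, so on App Engine you would pass `google.appengine.api.memcache` instead of the local stand-in.

```python
class InMemoryCache(object):
    """Hypothetical local stand-in with the get/set calls used below.

    It ignores expiry; it exists only so the sketch runs outside App Engine.
    """

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Returns None on a miss, like memcache.get().
        return self._store.get(key)

    def set(self, key, value, time=0):
        # 'time' mirrors memcache.set()'s expiry argument but is not enforced here.
        self._store[key] = value
        return True


def get_five_day_data(cache, run_query, key="chart_5day", ttl_seconds=3600):
    """Return chart data from the cache, falling back to the datastore query.

    run_query is an assumed zero-argument callable wrapping the GQL query
    from part 2; key and ttl_seconds are illustrative defaults.
    """
    data = cache.get(key)
    if data is not None:
        return data  # cache hit: no datastore operation needed
    data = run_query()  # cache miss: query the datastore once
    cache.set(key, data, time=ttl_seconds)  # keep it for later requests
    return data
```

Calling `get_five_day_data` twice with the same cache should run the query only on the first call. The expiry value is a trade-off: too long and the chart shows stale fixings, too short and the datastore is queried almost as often as before.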
Here I will walk through pre-populating the Google App Engine datastore with historical data in CSV format, using the Python bulkloader tool to do the data transformation and import.
In the last tutorial we saw how to create a financial chart with Google App Engine & update it periodically via some CSS-based web scraping. The bulkloader tool that comes in the appcfg.py script will enable us to visualise long term trends *right now* without needing to wait on our datastore growing one day at a time. Again the 3 Month EONIA Index Swap from the European Banking Federation’s Euribor site will be used as sample historical data.
So we need to grab the EONIA historic data. It’s available in CSV already but additionally I needed to:
This is a short 3 part tutorial series that will guide you through how to create & host your own financial charts on Google App Engine.
To begin we’ll see how simple it is to create a web scraper that uses CSS selectors and string manipulation to grab whatever data you want from a website. You can adapt the code to target whichever financial instrument you require. We will use a Python app instance on Google App Engine to host the code, and its scheduled tasks API to grab the required data periodically and persist it to the datastore.
lxml is a Python library for parsing HTML and XML that lets you query HTML documents with CSS selectors. This makes it ideal for web developers and designers who are already familiar with the selector syntax.
If you haven’t already done so, sign up to Google App Engine, create a new Python 2.7 application and download the SDK. Also ensure you have Python 2.7 installed locally together with the 2.7 version of the lxml library.
For the purposes of this example we’ll seek to regularly poll the Euribor site and scrape the daily EONIA (Euro OverNight Index Average) 3 Month Swap fixing.
The theoretical physicist Geoffrey West criticised existing accepted thought in urban theory before coming up with a set of constants that defined the relationship between city size and the output of its citizens (each time a city doubles in size its per capita innovation, income, etc. increases by 15% – and likewise the negative social actions of crime, pollution…). Previously he found a similar efficiency in biology, where the larger an organism was, the less energy per unit mass it required to go about its life.
It’s this track record in reducing a problem domain to a simple set of rules & constraints that is so impressive. The way in which theoretical physics practitioners go about solving for x – the sense of minimalism that drives the crunching of gigs of data and seemingly chaotic environments into understandable, predictable systems. It’s raw data visualisation in its purest form.
And then we have the humble infographic.
Behavioural advertising involves tracking a web user’s surfing and displaying advertising that matches this data. Personally, I find the tracking of my surf history unnecessarily intrusive, and today I found the online tool that will prevent marketing companies from collecting this data and profiting from it:
Incidentally I came by this information by way of Rapleaf,
Image courtesy jayce 31
Google has done two ‘real-time’ things lately, one good, one not so good: real-time web indexing and real-time web search.
With ‘er, hang-on a minute…‘ moments now surfacing in the public domain, I find the contrast between the two to be especially important. Google, in their traditional engineer style, expound the benefits of both in shaving seconds off search: ’11 user hours saved globally each second’; ’50% faster indexing rate of content’; figures that prove the mantra – machines search better than humans.
Machines definitely do the donkey work better than humans.
image courtesy johnson7
App Engine is generally a new paradigm for webapp developers; replacing sessions with memcache and a schemaless datastore are just two elements requiring new thinking for old problems. Unfortunately there are a few more hidden nuisances which have the potential to waste programming time relatively early on. Here are four of my personal head-bangers:
1. the datastore doesn’t always store Properties
I’ve had trouble with it refusing to store arbitrary entity properties unless I assign them in the entity constructor itself (these fields were optional btw). Just setting property values after initialisation and then calling put() on the datastore didn’t write them.