
Financial Charts with App Engine Tutorial Pt 1: Web Scraping

[Figure: 5-day chart of the 3-month EONIA swap fixing]

This is a short three-part tutorial series that will guide you through creating and hosting your own financial charts on Google App Engine.

To begin we’ll see how simple it is to create a web scraper that uses CSS selectors and string manipulation to grab whatever data you want from a website. You can adapt the code to target whichever financial instrument you require. We will use a Python app instance on Google App Engine to host the code, and its scheduled tasks API to fetch the required data periodically and persist it to the datastore.

lxml is a Python library for parsing XML and HTML which, via its cssselect module, lets you query documents with CSS selectors. This makes it ideal for web developers and designers who are already familiar with the selector syntax.
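To get a feel for it, here’s cssselect() in action on a standalone fragment (the HTML here is made up for illustration, mimicking the structure we’ll target later):

from lxml.html import fromstring

# A made-up fragment shaped like the rates list we'll scrape below.
fragment = fromstring("""
<ul id="todayRates">
  <li>1m 0.05 (28/09/2012)</li>
  <li>3m 0.123 (28/09/2012)</li>
</ul>
""")

# cssselect() takes an ordinary CSS selector and returns matching elements.
for li in fragment.cssselect('#todayRates li'):
	print li.text_content().strip()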

If you haven’t already done so, sign up for Google App Engine, create a new Python 2.7 application and download the SDK. Also ensure you have Python 2.7 installed locally, together with a Python 2.7-compatible build of the lxml library.

For the purposes of this example we’ll regularly poll the Euribor site and scrape the daily EONIA (Euro Over Night Index Average) 3 Month Swap fixing. The instrument is important as a measure of the risk-free rate in the Eurozone, and it forms the basis of numerous derivatives contracts as well as at least one measure of the health of the Eurozone banking system. The code for this guide is based on a simplified version of the Euribor OIS Spread chart on a little side project of mine.

If this is your first time working with App Engine, it’s worth noting a few of the issues I ran into when I first began working with it.

First, our simple data model data.py:

from google.appengine.ext import db

class Fixing(db.Model):
	timestamp = db.DateTimeProperty()
	instrument = db.StringProperty()
	bips = db.FloatProperty()

We want to store the rate in bips, the name of the instrument (‘eonia_3m’) and the date it was extracted.
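As a quick sanity check, an entity like this can be stored and read back as follows (a minimal sketch; the values are illustrative):

from datetime import datetime
from data import Fixing

# Store a fixing (illustrative values).
Fixing(timestamp=datetime.utcnow(), instrument='eonia_3m', bips=0.123).put()

# Read back the most recent fixing.
latest = Fixing.all().order('-timestamp').get()

Now for the CSS-based web scrape using lxml: cron.py, which will be run as a scheduled task: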

#!/usr/bin/env python

import webapp2
import logging

from datetime import datetime
from lxml.html import parse
from urllib2 import urlopen

from data import Fixing

class MainHandler(webapp2.RequestHandler):
	def get(self):
		url = 'http://www.euribor-ebf.eu/eoniaswap-org/about-eoniaswap.html'
		page = urlopen(url)
		doc = parse(page).getroot()
		target = doc.cssselect('#todayRates li')
		list_item = '3m'
		surrounding_text = ['3m', '(']
		eonia_3m = None
		for item in target:
			if list_item in item.text_content().strip():
				tmp = item.text_content().strip()
				# the rate sits between the '3m' label and the opening bracket
				eonia_3m = tmp[tmp.index(surrounding_text[0]) + len(surrounding_text[0]):tmp.index(surrounding_text[1])].strip()

		if eonia_3m is None:
			logging.info('BREAK: %s' % (url))
			return

		Fixing(timestamp=datetime.utcnow(), instrument='eonia_3m', bips=float(eonia_3m)).put()


logging.getLogger().setLevel(logging.INFO)
app = webapp2.WSGIApplication([('/tasks/fixing', MainHandler)],
                              debug=True)

Here we open a URL stream and pass it to lxml’s parse function, which returns the document tree; getroot() gives us the root element. lxml’s cssselect() is straightforward to use, returning a list of elements matching the specified CSS selector. In our example we want the 3 month data, so we search the returned list for a matching string and strip out the characters surrounding the actual rate in basis points. The value is wrapped in our Fixing object and stored to the cloud.
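To make the slicing step concrete, here is the same extraction run on a standalone string (the sample text is an assumption about the page’s layout, with the rate sitting between the ‘3m’ label and an opening bracket):

# Hypothetical list-item text: instrument label, rate, then a date in parentheses.
tmp = '3m 0.123 (28/09/2012)'
surrounding_text = ['3m', '(']

# Everything between the end of '3m' and the first '(' is the rate.
start = tmp.index(surrounding_text[0]) + len(surrounding_text[0])
end = tmp.index(surrounding_text[1])
print tmp[start:end].strip()  # prints '0.123'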

We add the handler to our app.yaml:

- url: /tasks/fixing
  script: cron.app
  login: admin

We wrap this task in a cron job using cron.yaml. A useful feature of App Engine cron is that you can set the timezone to whatever value you require, meaning we can schedule the job for just after the daily publishing time on the Euribor site, which is 11:00 CET:

cron:
- description: daily fixing collection
  url: /tasks/fixing
  schedule: every mon,tue,wed,thu,fri 11:05
  timezone: Europe/Brussels

Now to visualise this data we create another web handler, main.py:

#!/usr/bin/env python

import webapp2
import os
from google.appengine.ext.webapp import template
import logging

from datetime import datetime, timedelta
from google.appengine.ext import db
from data import Fixing

class MainHandler(webapp2.RequestHandler):
	def get(self):

		# 5 day timespan default
		timespan = 5

		since = datetime.utcnow() - timedelta(days=timespan)

		# fetch the last few days of fixings, oldest first so the
		# chart reads left to right
		q = db.GqlQuery("SELECT * FROM Fixing WHERE timestamp > :1 ORDER BY timestamp ASC", since)

		fixings = q.fetch(timespan)
		for fixing in fixings:
			fixing.timestamp_prettyprint = fixing.timestamp.strftime('%d/%m/%Y')

		results = {'fixings': fixings}

		path = os.path.join(os.path.dirname(__file__), 'html/index.html')
		self.response.out.write(template.render(path, results))

logging.getLogger().setLevel(logging.DEBUG)
app = webapp2.WSGIApplication([('/.*', MainHandler)],
                              debug=True)

The html/index.html template, which uses Google Charts to visualise the time series data:

<html>
<head>
<title>3 Month EONIA</title>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
	google.load("visualization", "1", {packages:["corechart"]});
	google.setOnLoadCallback(drawChart);
	function drawChart() {
		var data = google.visualization.arrayToDataTable([
			['Date', 'bips'],
			{% for fixing in fixings %}
			['{{ fixing.timestamp_prettyprint }}', {{ fixing.bips }}],
			{% endfor %}
		]);
 
			var options = {
				title: '3 Month EONIA',
			};
 
			var chart = new google.visualization.AreaChart(document.getElementById('chart_div'));
			chart.draw(data, options);
		}
</script>
</head>
<body>
<div id="chart_div" style="width:900px; height:500px"></div>
</body>
</html>

Finally we create the app.yaml for our app and deploy (for example with appcfg.py update from the SDK):

application: finchart
version: 0-1
runtime: python27
api_version: 1
threadsafe: yes
 
handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico
 
- url: /tasks/fixing
  script: cron.app
  login: admin
 
- url: /.*
  script: main.app
 
libraries:
- name: webapp2
  version: "2.5.1"
 
- name: lxml
  version: latest

And that’s how to create a simple chart using data scraped from public websites. Next we’ll look at how to bulkload historical data into the App Engine datastore so that you can visualise long term trends in your charts.
