
Financial Charts with App Engine Tutorial Pt 3: Caching Data


In part 2 the chart app was successfully loaded with historical data using appcfg.py’s bulkloader tool and the remote_api module. However, queries over time series data quickly become expensive once we begin to deal with months and years of it, so we should use some kind of caching mechanism to prevent too many datastore operations from slowing down our chart display.

App Engine’s powerful built-in object caching framework is memcache. With memcache we first check whether the data we’re looking for is already in memory; if it is, we retrieve it from there, removing the need for any datastore querying. If it isn’t, we fall back to retrieving it with GQL as in part 2 of the tutorial.
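In outline, the check-then-fill pattern looks like this (a minimal sketch; expensive_query() is a hypothetical stand-in for the GQL query written below):

from google.appengine.api import memcache

fixings = memcache.get('some-key')
if fixings is None:
	# cache miss: fall back to the datastore, then populate the cache
	fixings = expensive_query()  # hypothetical stand-in for the real query
	memcache.add('some-key', fixings)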

So for our 5-day data, our datastore code in main.py would become:

from google.appengine.api import memcache
import logging

		fixings = memcache.get('5day')
		if fixings is None:
			q = db.GqlQuery("SELECT * FROM "+entity.__name__+" ORDER BY timestamp DESC")
			fixings = q.fetch(5)
			for fixing in fixings:
				fixing.timestamp_prettyprint = fixing.timestamp.strftime('%d/%m/%Y')
			if not memcache.add('5day', fixings):
				logging.error('Memcache set failed for %s' % '5day')

This would be fine, except that we also need to clear the cache once every weekday, when the new fixing comes in. So we’ll move the memcache-checking code out into its own module, timeseries.py:

from google.appengine.ext import db
from google.appengine.api import memcache
from datetime import datetime, timedelta, time
import logging

from data import timespans
 
def get_tick_data(entity,timespan):
 
	if timespan == timespans[0]: # 5day
		fixings = memcache.get(timespan)
		if fixings is None:
			q = db.GqlQuery("SELECT * FROM "+entity.__name__+" ORDER BY timestamp DESC")
			fixings = q.fetch(5)
			for fixing in fixings:
				fixing.timestamp_prettyprint = fixing.timestamp.strftime('%d/%m/%Y')
			if not memcache.add(timespan,fixings):
				logging.error('Memcache set failed for %s' % timespan)
	elif timespan == timespans[1]: # 1yr
		fixings = memcache.get(timespan)
		if fixings is None:
			multiplier = 7 # weekly data
			limit = 53 # 53 because economists always want data from 1 year ago today...
			fixings = []
			for i in range(limit,0,-1):
				since = datetime.utcnow().date() - timedelta(days=multiplier*(i-1))
				q = db.GqlQuery("SELECT * FROM "+entity.__name__+" WHERE timestamp <= :1 ORDER BY timestamp DESC",since)
 
				fixing = q.get()
				if fixing is not None:
					fixing.timestamp_prettyprint = fixing.timestamp.strftime('%d/%m/%Y')
				else:
					fixing = entity(timestamp=datetime.combine(since,time()),difference=0.0)
					fixing.timestamp_prettyprint = fixing.timestamp.strftime('%d/%m/%Y')
				fixings.append(fixing)
			if not memcache.add(timespan,fixings):
				logging.error('Memcache set failed for %s' % timespan)
 
	return fixings
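
With that in place, callers just pass the model class and a timespan string; the first call for a given timespan hits the datastore, and subsequent calls are served from memcache. For example (a sketch, e.g. from the remote_api console):

from data import Fixing, timespans
from timeseries import get_tick_data

fixings = get_tick_data(Fixing, timespans[0])  # '5day': first call queries the datastore
fixings = get_tick_data(Fixing, timespans[0])  # same call again: now served from memcache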

You’ll note I added the timespans list to data.py:

timespans = ['5day','1yr']

and the following to index.html, in place of the hard-coded HTML links:

{% for timespan in timespans %}<span class="timespan"><a href="/{{ timespan }}">{{ timespan }}</a></span>{% endfor %}

so that we keep a central record of which timeseries are available.
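
For reference, data.py now looks roughly like this. The Fixing model is the one built up in earlier parts, so treat this as a sketch: timestamp and difference are the properties used in the code above, while value is an assumed name for the fixing itself:

from google.appengine.ext import db

timespans = ['5day','1yr']

class Fixing(db.Model):
	value = db.FloatProperty()        # the fixing itself ('value' is an assumed name)
	difference = db.FloatProperty()   # change from the previous fixing, plotted on the chart
	timestamp = db.DateTimeProperty() # when the fixing was published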

Now we update both main.py (instead of the code suggested earlier) and our scheduled task cron.py to take advantage of the memcache-aware timeseries.get_tick_data():

#!/usr/bin/env python
 
import webapp2
import os
from google.appengine.ext.webapp import template
import logging
 
from google.appengine.ext import db
from data import Fixing, timespans
from timeseries import get_tick_data
 
class MainHandler(webapp2.RequestHandler):
	def get(self):
		# get values from datastore, display
 
		timespan = self.request.path_info[1:]
		logging.info('path_info:%s' % timespan)
 
		timespan = timespan.strip('/')
		if not timespan:
			timespan = timespans[0]  # default to the 5-day view when the root URL is requested

		fixings = get_tick_data(Fixing, timespan)
 
		results = {'fixings':fixings, 'timespans':timespans}
 
		logging.debug('number of results:%s' % len(results['fixings']))
 
		path = os.path.join(os.path.dirname(__file__), 'html/index.html')
		self.response.out.write(template.render(path,results))
 
logging.getLogger().setLevel(logging.DEBUG)
app = webapp2.WSGIApplication([('/.*', MainHandler)],
                              debug=True)

cron.py just needs the following declarations:

from google.appengine.api import memcache
from data import Fixing, timespans
from timeseries import get_tick_data

and this snippet added to the end of its Handler’s get():

		# flush the cache, then re-populate each timespan so the next page view is a cache hit
		memcache.flush_all()
		for timespan in timespans:
			get_tick_data(Fixing,timespan)

As our sample data, 3 Month Eonia, is updated once a day at a fixed time, ideally we flush the cache just after the web-scraper scheduled task has run and the updates have been made to the datastore. The datastore should then, in theory, only be read once a day, making our chart app run almost entirely in memory and speeding up the end user’s experience.
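
Note that flush_all() wipes everything in memcache. That’s fine while the timeseries entries are all we cache, but if you later cache other objects as well, a more targeted alternative (just a sketch) is to delete only the per-timespan keys:

from google.appengine.api import memcache
from data import timespans

# remove only the per-timespan entries; anything else in memcache survives
memcache.delete_multi(timespans)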

I’m a big fan of Google App Engine as a method of rapid application development. However, programming in the cloud brings new issues and concepts to think about compared with plain old client-server development. Things like sharding and its consequences require design-time thought in order to avoid problems later in the development cycle. Shout if you see anything that should be discussed!

Finally, a small tip: in GQL queries you cannot test properties for None explicitly; however, for IntegerProperty and FloatProperty at least, you can test for < 0, which will also return the entities whose value is None. Suggestions? Please share…
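
As a rough illustration of the trick (a hedged sketch: 'spread' is a hypothetical FloatProperty, and this only works if genuinely negative values never occur for that property):

# Hypothetical sketch of the tip above: GQL has no "WHERE spread = None",
# but an inequality filter against 0 also picks up entities where the
# FloatProperty was never set, provided real values are never negative.
q = db.GqlQuery("SELECT * FROM Fixing WHERE spread < 0")
unset_fixings = q.fetch(100)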
