<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>aleatory &#187; data</title>
	<atom:link href="http://aleatory.clientsideweb.net/category/data/feed/" rel="self" type="application/rss+xml" />
	<link>http://aleatory.clientsideweb.net</link>
	<description></description>
	<lastBuildDate>Thu, 19 Jan 2012 18:53:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Faux Data: Infographics</title>
		<link>http://aleatory.clientsideweb.net/2011/01/04/faux-data-infographics/</link>
		<comments>http://aleatory.clientsideweb.net/2011/01/04/faux-data-infographics/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 11:55:24 +0000</pubDate>
		<dc:creator>rutherford</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[information media]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://aleatory.clientsideweb.net/?p=454</guid>
		<description><![CDATA[The theoretical physicist Geoffrey West criticised existing accepted thought in urban theory before coming up with a set of constants that defined the relationship between city size and the output of its citizens (Each time a city doubles in size its per capita innovation, income, etc increases by 15% &#8211; and likewise the negative social [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://chart.apis.google.com/chart?chxl=0:|Time|1:|Quality&#038;chxp=1,10&#038;chxt=x,y&#038;chs=400x400&#038;cht=lxy&#038;chco=3072F3&#038;chd=t:0,10,20,30,40,50,60,70,80,90,100|100,90,80,70,60,50,40,30,20,10,0&#038;chdlp=b&#038;chls=2,4,1&#038;chma=5,5,5,25&#038;chtt=Standard+of+Web+Infographics" title="Standard of Web Infographics" class="aligncenter" width="400" height="400" /></p>
<p>The theoretical physicist Geoffrey West <a href="http://www.nytimes.com/2010/12/19/magazine/19Urban_West-t.html?_r=4&#038;ref=magazine&#038;pagewanted=all#">criticised existing accepted thought in urban theory</a> before coming up with a set of constants that defined the relationship between city size and the output of its citizens (each time a city doubles in size, its per capita innovation, income, etc. increase by 15% &#8211; and likewise the negative social actions of crime, pollution&#8230;). Previously he found a similar efficiency in biology, where the larger an organism was, the less energy per unit mass it required to go about its life.</p>
<p>It’s this track record in reducing a problem domain to a simple set of rules &#038; constraints that is so impressive. The way in which theoretical physics practitioners go about solving for x &#8211; the sense of minimalism that drives the crunching of gigs of data and seemingly chaotic environments into understandable, predictable systems. It’s raw data visualisation in its purest form.</p>
<p>And then we have the humble infographic.<span id="more-454"></span></p>
<p>A little like urban theory at present, far from being a true ‘data’ oriented approach, they have morphed from original good intentions into a viral sub-genre, seemingly as gratification for the design community, while conveying no more than a smattering of anecdotal crumbs as an afterthought. Who cares if it’s insightful so long as its typeface looks cool? </p>
<p>Unconvinced?</p>
<p>Look at a recent effort from the popular <a href="http://www.informationisbeautiful.net/2010/debtris/">information is beautiful</a> site. The title leads us to believe there is a message about debt buried in the animated visualisations. Well, that’s a matter for debate. There is a bunch of figures expressed in terms of Tetris blocks with area equating to amount. But it’s such a jumble of data, without any clever way of giving context or connection beyond one-dimensional “Data A vs Data B” scalar quantities. </p>
<p>For instance, the cost of the credit crunch is compared to African debt. One is many times more than the other. Perhaps worth some kind of further analysis, if only to see if there is some kind of basic relationship holding these unrelated numbers together. But it’s actually at the end of the clip, one that started off with everything from Tesco’s revenue to some guy’s net worth to the annual level of corporate tax evasion. The clever bit is apparently getting all these random amounts to slot together nicely as Tetris blocks&#8230;.</p>
<p>So what? Where is the relationship? What is linking them? Where is the message?</p>
<p>And so here is an animated gimmick that tells us a bunch of unrelated numbers and surprisingly enough doesn’t try to relate them. Its medium is certainly graphical. Is the data presented really ‘info’? Has the web deluge instead merely managed to dilute ‘info’ to mean any random factoid? I’ll be honest, there are people out there describing themselves as data geeks and I doubt they’ve touched the fundamentals of mathematics since GCSE.</p>
<p>Btw I’m in no way picking on just the above case; it just happened to be the first one I found. The web’s full of similarly vacuous ‘infographics’ that offer little in the way of truly informing people.</p>
<p>Undoubtedly there is an important place for real data graphics in popular science today. Peer through the sea of non-existent ‘insight’ and hubris surrounding the more widely circulated pseudo-data variant. The <a href="http://www.globalrecruitingroundtable.com/wp-content/uploads/where-we-live-in-us.jpg">best graphics</a> can guide the viewer, no matter what their level of expertise, to take in a startling array of data in a matter of seconds and, crucially, allow expression of context and relationship.</p>
<p>We need to get back to what infographics were developed for: rich visualisations of complex data expressed in a manner that conveys a simple overarching relationship to the observer free from narcissistic clutter and plain randomness.</p>
]]></content:encoded>
			<wfw:commentRss>http://aleatory.clientsideweb.net/2011/01/04/faux-data-infographics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to Opt Out of Targeted Behavioural Advertising</title>
		<link>http://aleatory.clientsideweb.net/2010/10/26/how-to-opt-out-of-targeted-behavioural-advertising/</link>
		<comments>http://aleatory.clientsideweb.net/2010/10/26/how-to-opt-out-of-targeted-behavioural-advertising/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 21:25:29 +0000</pubDate>
		<dc:creator>rutherford</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[Tech Labours]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://aleatory.clientsideweb.net/?p=428</guid>
		<description><![CDATA[Behavioural advertising involves the tracking of a web user&#8217;s surfing and displaying advertising that matches this data. I find the tracking of my surf history unnecessarily obtrusive personally and today found the online tool that will prevent marketing companies from collecting this data and profiting from it: http://www.networkadvertising.org/managing/opt_out.asp Incidentally I came by this information by [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://farm5.static.flickr.com/4107/5118352871_09c16ca398.jpg" title="Agressive Computer Advertisers" class="aligncenter" width="500" height="158" /><br />
Behavioural advertising involves the tracking of a web user&#8217;s surfing and displaying advertising that matches this data. I find the tracking of my surf history unnecessarily obtrusive personally and today found the online tool that will prevent marketing companies from collecting this data and profiting from it:</p>
<p><a href="http://www.networkadvertising.org/managing/opt_out.asp">http://www.networkadvertising.org/managing/opt_out.asp</a></p>
<p>Incidentally I came by this information by way of Rapleaf, <span id="more-428"></span>who are one of these &#8216;database marketing&#8217; companies who engage in datamining browsing habits in a big way. Interestingly I remember them from a <a href="http://techcrunch.com/2006/04/23/rapleaf-to-challenge-ebay-feedback/">TechCrunch article</a> a few years back where they started out innocently enough as an online social networking reputation tool &#8211; until eBay didn&#8217;t like it encroaching on their space and banned Rapleaf content from sellers&#8217; auction pages.</p>
<p>If you&#8217;re still registered with them and like myself didn&#8217;t realise they had morphed into a marketing data company you can delete your account with them <a href="https://www.rapleaf.com/opt_out">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://aleatory.clientsideweb.net/2010/10/26/how-to-opt-out-of-targeted-behavioural-advertising/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>When to do Real Time</title>
		<link>http://aleatory.clientsideweb.net/2010/10/10/when-to-do-real-time/</link>
		<comments>http://aleatory.clientsideweb.net/2010/10/10/when-to-do-real-time/#comments</comments>
		<pubDate>Sun, 10 Oct 2010 22:53:25 +0000</pubDate>
		<dc:creator>rutherford</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[information media]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://aleatory.clientsideweb.net/?p=383</guid>
		<description><![CDATA[Image courtesy jayce 31 Google has done two &#8216;real-time&#8217; things lately, one good one not so good: Real Time web indexing and real time web search. With &#8216;er, hang-on a minute&#8230;&#8217; moments now surfacing in the public domain I find the contrast between the two to be especially important. Google in their traditional engineer style [...]]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://farm3.static.flickr.com/2703/4150957996_58f0437e8e.jpg" title="Tape Deck Amstrad 464" class="aligncenter" width="500" height="375" />Image courtesy <a href="http://www.flickr.com/photos/jayce_31/">jayce 31</a></p>
<p>Google has done two &#8216;real-time&#8217; things lately, one good one not so good:  <a href="http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/">Real Time web indexing</a> and <a href="http://googleblog.blogspot.com/2010/09/search-now-faster-than-speed-of-type.html">real time web search</a>.</p>
<p>With &#8216;<a href="http://www.capecodonline.com/apps/pbcs.dll/article?AID=/20101005/BIZ/10050303">er, hang-on a minute&#8230;</a>&#8217; moments now surfacing in the public domain I find the contrast between the two to be especially important.  Google in their traditional engineer style expound the benefits of both in shaving seconds off search: &#8216;11 user hours saved globally each second&#8217;; &#8216;50% faster indexing rate of content&#8217;; figures that prove the mantra &#8211; machines search better than humans.</p>
<p>Machines definitely do the donkey work better than humans.  <span id="more-383"></span>Indexing is a dumb process easily solvable by machine and has been for decades.  The migration from batch processing to incremental updating of the search index that Google Caffeine delivers is an essential improvement to real time search.</p>
<p>The Google Instant realtime GUI trick is not such a home run.  Instant brings up a full page of results updated character by character.  In cases where the user searches over two or more words &#8211; in my experience the vast majority of searches &#8211; context is vital.  Rarely is that context clear until the entire phrase is typed in.  This is why Google Instant, as fast as it undoubtedly is, rarely returns what you&#8217;re looking for until you complete your search term.  </p>
<p>In any case, the Mind Machine Interface is a delicate thing and only as strong as the weakest link &#8211; the human.  And it&#8217;s the human that has to comprehend this extra flow of data, most of it extraneous.</p>
<p>Google does not yet do the contextual understanding the user must accomplish to use Instant search successfully &#8211; and I wouldn&#8217;t like them to try, as that would likely involve personalisation based on past searches and as my browsing habits change over time I don&#8217;t want past results skewing things.</p>
<p>Incidentally, an <a href="http://www.davidnaylor.co.uk/google-instant-hmmm.html">ulterior motive for Google Instant</a> can always be found on the web. </p>
<p>So in conclusion real time is only useful when the data can be transformed into a form easily processed by the end user.  If it cannot, it exacerbates the problem of information overload rather than lessening it.  </p>
<p>The ideal real time UI has yet to be realised.</p>
]]></content:encoded>
			<wfw:commentRss>http://aleatory.clientsideweb.net/2010/10/10/when-to-do-real-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google App Engine Datastore Gotchas</title>
		<link>http://aleatory.clientsideweb.net/2009/11/28/google-app-engine-datastore-gotchas/</link>
		<comments>http://aleatory.clientsideweb.net/2009/11/28/google-app-engine-datastore-gotchas/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 17:21:19 +0000</pubDate>
		<dc:creator>rutherford</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Tech Labours]]></category>

		<guid isPermaLink="false">http://aleatory.clientsideweb.net/2009/11/28/google-app-engine-datastore-gotchas/</guid>
		<description><![CDATA[image courtesy johnson7 App Engine is generally a new paradigm for webapp developers; replacing sessions with memcache and a schemaless datastore just two elements requiring new thinking for old problems. Unfortunately there are a few more hidden nuisances which have the potential to waste programming time relatively early on. Here&#8217;s four of my personal head-bangers: [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://aleatory.clientsideweb.net/wp-content/uploads/2009/11/stormclouds.jpg" alt="stormclouds gather" /><br />
<span class="wp-caption" style="margin: 0pt; padding: 0pt; font-size: 10px">image courtesy <a href="http://www.flickr.com/photos/johnson7/1460568819/">johnson7</a></span></p>
<p>App Engine is generally a new paradigm for webapp developers; replacing sessions with memcache and a schemaless datastore are just two elements requiring new thinking for old problems.  Unfortunately there are a few more hidden nuisances which have the potential to waste programming time relatively early on.  Here are four of my personal head-bangers:</p>
<p><strong>1. the datastore doesn&#8217;t always store Properties</strong><br />
I&#8217;ve had trouble with it refusing to store arbitrary entity props unless I assign them in the entity constructor itself (these fields were optional btw).  Just setting prop values after initialisation and then calling put() on the datastore didn&#8217;t write them.</p>
<p><span id="more-160"></span><strong>2. fussy filter parsing</strong></p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">.<span style="color: #008000;">filter</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;prop= &quot;</span>,propValue<span style="color: black;">&#41;</span>.<span style="color: black;">fetch</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>returns a NoneType error</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">.<span style="color: #008000;">filter</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;prop=&quot;</span>,propValue<span style="color: black;">&#41;</span>.<span style="color: black;">fetch</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>silently fails to find the expected result.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">.<span style="color: #008000;">filter</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;prop =&quot;</span>,propValue<span style="color: black;">&#41;</span>.<span style="color: black;">fetch</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Only the above will return the expected result.</p>
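<p>One way to sidestep the spacing problem is to normalise the filter string before it ever reaches the query.  What follows is only a sketch in plain Python: normalise_filter is a hypothetical helper of my own, not part of the App Engine SDK, and it assumes the comparison operator is the last thing in the string.</p>

```python
# Operators are listed longest-first so that ">=" is matched before "=".
_OPERATORS = ("<=", ">=", "!=", "<", ">", "=")


def normalise_filter(spec):
    """Rewrite a filter string as 'property operator' with exactly one space.

    Hypothetical helper (not an App Engine API): it only tidies the string,
    it does not check that the property exists on any model.
    """
    spec = spec.strip()
    for op in _OPERATORS:
        if spec.endswith(op):
            prop = spec[: -len(op)].strip()
            if not prop:
                raise ValueError("missing property name in %r" % spec)
            return "%s %s" % (prop, op)
    raise ValueError("no comparison operator at the end of %r" % spec)
```

<p>All three spellings shown above come out as "prop =", so the value handed to .filter() is always in the one form that worked for me.</p>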
<p><strong>3. Only possible to execute inequality filters on one property per query</strong><br />
This is a pain in the arse if you want to query whether an input date is between two dates stored in a particular entity.  Officially there was a workaround whereby the date range is stored in a ListProperty (instead of two fields of type DateProperty) and you check that the input is both more than the list (greater than at least one element in the list) and less than the list (less than at least one element in the list).</p>
<p>However, the App Engine team has now changed the behaviour in the cloud: both the &#8216;&gt;=&#8217; and &#8216;&lt;=&#8217; filters are applied to each individual list element rather than as a lazy test over the whole series.  Where the existence of two list elements that bounded the input date would once have been sufficient, the following query now only returns the entity if one of the ListProperty elements is an exact match for the input date:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">WHERE</span> date_range &gt;<span style="color: #66cc66;">=</span> :<span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">AND</span> date_range &lt;<span style="color: #66cc66;">=</span> :<span style="color: #cc66cc;">1</span></pre></div></div>

<p>Unfortunately the old behaviour has not been removed from the dev_server datastore, so the query still runs perfectly well locally.</p>
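<p>The difference between the two behaviours is easier to see in plain Python.  This is a model of the semantics as I have described them, not datastore code: both function names are made up for illustration, and the list simply stands in for the ListProperty holding the start and end dates.</p>

```python
from datetime import date


def matches_any_element(date_range, day):
    """Earlier behaviour: each filter may be satisfied by a different element."""
    return any(d >= day for d in date_range) and any(d <= day for d in date_range)


def matches_single_element(date_range, day):
    """Later behaviour: one element must satisfy both filters at once,
    which is only possible when that element equals the input date."""
    return any(d >= day and d <= day for d in date_range)


# Stand-in for the ListProperty: [start, end] of the stored range.
november = [date(2009, 11, 1), date(2009, 11, 30)]
mid_month = date(2009, 11, 15)

in_range_old = matches_any_element(november, mid_month)
in_range_new = matches_single_element(november, mid_month)
```

<p>With these values the first check accepts the mid-month date and the second rejects it, which is the mismatch between the local dev_server and the cloud described above.</p>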
<p>4. And hopefully the <strong>1000 record query limit</strong> is well-known by this point.</p>
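<p>The usual way around a per-query cap is to page through the results in key order, asking each time for the records after the last key seen.  Here is a minimal sketch of that loop in plain Python.  Both names are assumptions for illustration: fetch_page is a hypothetical callable standing in for whatever keyed query your app runs, and make_fake_store is just an in-memory stand-in so the loop can be run without App Engine.</p>

```python
def iter_all(fetch_page, page_size=1000):
    """Yield every record, one page at a time.

    fetch_page(after_key, limit) is a hypothetical callable that returns a
    list of (key, record) pairs in key order, starting after after_key.
    """
    last_key = None
    while True:
        page = fetch_page(last_key, page_size)
        for key, record in page:
            yield record
            last_key = key
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break


def make_fake_store(n):
    """In-memory stand-in for a datastore holding n records keyed 0..n-1."""
    rows = [(i, "row-%d" % i) for i in range(n)]

    def fetch_page(after_key, limit):
        start = 0 if after_key is None else after_key + 1
        return rows[start:start + limit]

    return fetch_page
```

<p>Calling list(iter_all(make_fake_store(2500))) walks three pages and returns all 2500 records, where a single fetch would have stopped at the cap.</p>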
<p>On a more general note, why is it newfangled tech doesn&#8217;t build on top of the old stuff?  Re-use would get us to where we want to be a lot sooner.  I bitched about this <a href="http://twitter.com/rutherford/status/6055511385">on Twitter</a> at the time and I&#8217;ll repeat the message here too because it&#8217;s worth doing so: frankly, I expect new stuff to do the same things old stuff does, as well as any &#8220;hey that&#8217;s cool&#8221; new fandangomatrons it bolts on.</p>
<p>Wave&#8217;s another case in point.  Not &#8216;email invented today&#8217;, far from it &#8211; it&#8217;s left so much cutting-edge crowd-sourced participatory stuff out (as well as how to do IM, namely the KISS principle) &#8211; that it actually feels like a retrograde step in many ways.  Most in fact.</p>
<p>Bottom line &#8211; Google needs to make stuff better.</p>
]]></content:encoded>
			<wfw:commentRss>http://aleatory.clientsideweb.net/2009/11/28/google-app-engine-datastore-gotchas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

