WhatfettleOne CSV, thirty stories: 8. Heatmap meh

This is day 8 of "One CSV, 30 stories":http://blog.whatfettle.com/2014/10/13/one-csv-thirty-stories/ a series of articles exploring "price paid data":https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from "GitHub":https://github.com/psd/price-paid-data

"Yesterday's":http://blog.whatfettle.com/2014/10/20/one-csv-thirty-stories-7-prices-redux/ post was now four days ago and whilst I've a number of excuses for losing momentum, the main reason was trying to make this post interesting. I'd let wanting something great become the enemy of the "probably good enough":http://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good.

A few people suggested a "heat map":http://en.wikipedia.org/wiki/Heat_map rather than a scatter plot might shed some light on prices. A heat map means grouping values over a time period as well as range of values. The gnuplot "image plot":http://gnuplot.sourceforge.net/demo/heatmaps.html takes XYZ values with rows separated by blank lines. For prices this means collating lines of date, price, counts as follows:

bc. 1995-01 0 0
1995-01 1 94
1995-01 2 244
1995-01 3 506
...
1995-02 0 0
1995-02 1 169
1995-02 2 493
1995-02 3 1007
...

We need something to turn the date, price pairs we created for the "scatter plot":http://blog.whatfettle.com/2014/10/17/one-csv-thirty-stories-4-scattering/ If this was 1986 I'd use "some APL":http://ngn.github.io/apl/web/#code=20%205%203%20%u2374%20%u2373%20250 :

bc. 20 5 3 ⍴ ⍳ 250

but APL is hard to find these days, harder to share with others, so here's some logic in awk:

bc. function print_prices(date, counts) {
for (price= 0; price <= price_max; price++) { count = counts[price]; if (!count) { count = 0; } printf "%s %d %d\n", date, price, count; } printf "\n"; } BEGIN { FS=" " price_unit = 1000000; price_max = 60; } { lines++; date = $1; sub("-[0-9][0-9]$", "", date); price = $2 / price_unit; counts[price]++; if (date_last && date_last != date) { print_prices(date_last, counts); lines = 0; delete counts; } date_last = date; } END { if (lines) { print_prices(date_last, counts); } } Which groups prices into month by £1 million squares, giving: Price heatmap (£1M)

A complete picture, but a little flat. Drilling in again to prices £0-600k and experimenting with units going from £1k, £10k, £25k, £50k, £100k gives a range of charts:

Averages

I've been creating monochrome charts mostly because I like monochrome and good colour design is difficult, but "colorbrewer":http://colorbrewer2.org/ exists to create heatmap palettes, which we can apply to gnuplot as follows:

bc. set palette defined (\
0 '#fff7ec',\
1 '#fee8c8',\
2 '#fdd49e',\
3 '#fdbb84',\
4 '#fc8d59',\
5 '#ef6548',\
6 '#d7301f',\
7 '#b30000',\
8 '#7f0000')
plot '/dev/stdin' using 1:2:3 with image

leading towards a more interesting version of the £10k banded price heatmap:

Priceheat (redrawn)

I need to pick up the pace if I'm to meet the 30 posts in 30 days, and think there's at least a couple more things to say about prices tomorrow, which will hopefully actually happen "tomorrow":http://blog.whatfettle.com/2014/10/28/one-csv-thirty-stories-9-yearly/!