Whatfettle

One CSV, thirty stories: 6. Prices

This is day 6 of One CSV, 30 stories, a series of articles exploring price paid data from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from GitHub.

I was confident today was going to be “Talk like a statistician day” but my laptop was tied up for most of it whilst Yosemite installed itself, meaning I didn’t have time to play with R after all. Instead let’s continue to dig into how property is priced.

We saw in yesterday’s scatter plots how prices clump around round numbers, then skip around the points where stamp duty kicks in, such as £60k in this section:

Zooming in on the prices scatterplot
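One rough way to put numbers on that skip is to add up the transactions on either side of a threshold. The sketch below is not part of the original analysis: it assumes a file of count and price columns like the prices file from Day 2, and the function name `around_threshold`, the window size, and the choice to count a sale at exactly the threshold on the lower side are all illustrative assumptions.

```shell
# Hypothetical helper: sum transactions within WINDOW pounds on either
# side of THRESHOLD. Reads "count<TAB>price" rows on stdin.
# Assumption: a price exactly at the threshold counts as "below".
around_threshold() {
    awk -v t="$1" -v w="$2" '
        $2 >  t - w && $2 <= t     { below += $1 }
        $2 >  t     && $2 <= t + w { above += $1 }
        END { printf "below\t%d\nabove\t%d\n", below, above }
    '
}
```

Something like `around_threshold 60000 1000 < price.tsv` would then print the two totals for the £1,000 on either side of £60k.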

I didn’t have much time, so I grabbed gnuplot again to make another scatter plot, this time using the prices file we made on Day 2:

#!/usr/bin/env gnuplot
# write a transparent PNG to stdout so the shell can redirect it
set terminal png font "helvetica,14" size 1600,1200 transparent truecolor
set output "/dev/stdout"
set key off
set xlabel "Price paid (£)"
set xrange [0:1500000]
# show tick labels with SI prefixes such as k and M
set format x "%.0s%c"
set ylabel "Number of transactions"
set yrange [0:150000]
set format y "%.0s%c"
set style circle radius 4500
# column 2 (price) on the x axis, column 1 (count) on the y axis
plot "/dev/stdin" using 2:1 \
    with circles lc rgb "black" \
    fs transparent \
    solid 0.5 noborder

$ price.gpi < price.tsv > price.png

Transactions by price
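The `using 2:1` in the script implies the input has the number of transactions in the first column and the price in the second. Here is a minimal sketch of building a file in that shape from a plain list of prices. The helper name `price_counts` is hypothetical, and this is not necessarily how the Day 2 file was made.

```shell
# Hypothetical helper: turn a list of prices (one per line on stdin)
# into "count<TAB>price" rows, sorted by price.
price_counts() {
    sort -n | uniq -c | awk '{ print $1 "\t" $2 }'
}
# Usage (input file name is an assumption):
#   price_counts < prices.txt > price.tsv
```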

Maybe the same plot with boxes will be clearer:

 plot "/dev/stdin" using 2:1 with boxes lc rgb "black"

Frequency of prices

So even more confirmation that people prefer whole numbers and multiples of 10 when pricing houses, and market them either just below a stamp duty band or some way beyond it. The interference lines at the lower prices look interesting. More on that tomorrow.
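The preference for round prices can also be checked without a plot. As a sketch under assumptions, the function below reads the same count and price rows and reports what share of transactions fall on exact multiples of £1,000 and £10,000. The name `round_share` and those two step sizes are my own choices for illustration.

```shell
# Hypothetical helper: share of transactions whose price is an exact
# multiple of 1000 and of 10000. Reads "count<TAB>price" rows on stdin.
round_share() {
    awk '
        {
            total += $1
            if ($2 % 1000  == 0) k1  += $1
            if ($2 % 10000 == 0) k10 += $1
        }
        END {
            if (total == 0) exit    # no rows, nothing to report
            printf "multiple of 1000\t%.1f%%\n",  100 * k1  / total
            printf "multiple of 10000\t%.1f%%\n", 100 * k10 / total
        }
    '
}
```

It would be run as `round_share < price.tsv`.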