This is day 6 of One CSV, 30 stories, a series of articles exploring price paid data from the Land Registry found on GOV.UK. The code for this and the other articles is available as open source from GitHub.
I was confident today was going to be “Talk like a statistician day”, but my laptop was tied up for most of it whilst Yosemite installed itself, so I didn’t have time to play with R after all. Instead, let’s continue digging into how property is priced.
We saw in yesterday’s scatter plots how prices clump around round values, and then skip over the prices just above the point where stamp duty kicks in, £60k in this section:
#!/usr/bin/env gnuplot
# Reads "count<TAB>price" lines on stdin and writes a PNG to stdout.
set terminal png font "helvetica,14" size 1600,1200 transparent truecolor
set output "/dev/stdout"
set key off
set xlabel "Price paid (£)"
set xrange [0:1500000]
set format x "%.0s%c"
set ylabel "Number of transactions"
set yrange [0:150000]
set format y "%.0s%c"
set style circle radius 4500
# x is the price (column 2), y the number of transactions at that price (column 1)
plot "/dev/stdin" using 2:1 \
    with circles lc rgb "black" \
    fs transparent \
    solid 0.5 noborder
$ chmod +x price.gpi
$ ./price.gpi < price.tsv > price.png
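The script expects tab-separated pairs on standard input. The number of transactions is in column 1 and the price is in column 2, which is why it plots `using 2:1`. How price.tsv was built isn't shown here. Below is a minimal Python sketch that would produce that layout. It assumes the CSV has no header row, that the price is the second field of each row, and that prices are whole pounds. The file name `price_counts.py` and the functions `price_counts`, `tsv_lines` and `main` are hypothetical, not part of the series' own code.

```python
#!/usr/bin/env python3
"""Hypothetical helper: count transactions per price for the gnuplot script.

Reads a price paid CSV file named on the command line and prints
"count<TAB>price" lines, the column order that `using 2:1` expects.
"""
import csv
import sys
from collections import Counter
from typing import Iterable, List


def price_counts(lines: Iterable[str], price_column: int = 1) -> Counter:
    """Count transactions at each price.

    Assumes no header row, that the price is field `price_column`
    (0-based), and that prices are whole pounds.
    """
    counts: Counter = Counter()
    for row in csv.reader(lines):
        if len(row) > price_column:
            counts[int(row[price_column])] += 1
    return counts


def tsv_lines(counts: Counter) -> List[str]:
    """Format counts as "count<TAB>price" strings, ordered by price."""
    return [f"{counts[price]}\t{price}" for price in sorted(counts)]


def main(argv: List[str]) -> None:
    """Print count<TAB>price lines for the CSV file given as the only argument."""
    if len(argv) != 2:
        print("usage: price_counts.py PRICE_PAID_CSV", file=sys.stderr)
        return
    with open(argv[1], newline="") as f:
        for line in tsv_lines(price_counts(f)):
            print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Run it as `$ python3 price_counts.py pp.csv > price.tsv`, where `pp.csv` stands in for your copy of the price paid data. Reading a named file rather than stdin is just a design choice for the sketch. The output can be piped straight into the gnuplot script above.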
Maybe the same plot drawn with boxes will be clearer. Only the final plot command changes:
plot "/dev/stdin" using 2:1 with boxes lc rgb "black"
This is even more confirmation that people prefer round numbers and multiples of 10 when pricing houses, and that they market them either just below a stamp duty threshold or some way beyond it. The interference lines at the lower prices look interesting; more on that tomorrow.
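Both patterns can be measured as well as read off the plots. The Python sketch below is one way to do that; it is an illustration rather than anything from the series' own code. It assumes price.tsv uses the count<TAB>price layout the gnuplot script reads. `read_counts`, `round_share` and `threshold_window` are hypothetical names. Counting a price exactly at the threshold as "below" is also an assumption, to be checked against the stamp duty rules for the period in question.

```python
from typing import Dict, Iterable, Mapping, Tuple


def read_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Parse "count<TAB>price" lines, the layout the gnuplot script reads."""
    counts: Dict[int, int] = {}
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) >= 2:
            price = int(fields[1])
            counts[price] = counts.get(price, 0) + int(fields[0])
    return counts


def round_share(counts: Mapping[int, int], unit: int) -> float:
    """Fraction of transactions priced at an exact multiple of `unit` pounds."""
    total = sum(counts.values())
    on_unit = sum(n for price, n in counts.items() if price % unit == 0)
    return on_unit / total if total else 0.0


def threshold_window(
    counts: Mapping[int, int], threshold: int, width: int
) -> Tuple[int, int]:
    """Transactions in (threshold - width, threshold] and (threshold, threshold + width].

    Treating a price exactly at the threshold as "below" is an assumption.
    """
    below = sum(n for p, n in counts.items() if threshold - width < p <= threshold)
    above = sum(n for p, n in counts.items() if threshold < p <= threshold + width)
    return below, above
```

With `counts` read from price.tsv, `threshold_window(counts, 60000, 5000)` would compare sales between £55,001 and £60,000 with those between £60,001 and £65,000. `round_share(counts, 1000)` can be set against the roughly 1 in 1,000 you would expect if the last three digits of prices were spread evenly.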