Lies, damned lies, and data
by Danielle Wood
There are many unsung heroes in the public service: people with deep expertise, beavering away in the public interest, who rarely make it into the public eye. Australia’s Parliamentary Library team is one such group. They conduct research for politicians, write digests of bills, and publish guides to policies and evaluations of their likely impact.
In other words, they make sure our policy debates, including at the pointy end, are more fact-based than they would otherwise be. Apart from the politicians they serve, though, I suspect their talents are not widely recognised.
That’s why it is especially noteworthy when one of the unsung heroes of the public service steps into the light — not from Australia’s Parliamentary Library in this case, but from Britain’s equivalent. The House of Commons Library’s senior statistician, Georgina Sturge, has just released a book that will get hearts racing among policy nerds: a book about data, or more specifically, bad data and how it can mislead policy discussions.
Sturge’s book, Bad Data, has particular resonance for anyone (like me) who has spent many years peering into the sausage-meat vats of public data collection and use. Almost every example of “bad data” advanced by Sturge has an Australian parallel.
I certainly let out a knowing chuckle or two reading Sturge’s discussion of “zombie statistics” — those dodgy numbers that haunt public debates. Sturge highlights how the bogus figure for Britain’s weekly contribution to the European Union, £350 million, continued to be referenced by Brexit campaigners even after it was comprehensively debunked by the UK Statistics Authority.
In Australia, similarly disingenuous numbers haunt a host of debates. Some of the more egregious come from anonymously commissioned modelling in 2015 that suggested Labor’s $1.5 billion policy to wind back negative gearing would wipe $20 billion off GDP (!) and increase rents by 10 per cent (!!).
Those numbers continued to emerge from beyond the grave even several years after they were shown to be garbage, and even after they had inspired a Media Watch episode exposing the willingness of some media outlets to publish almost any number without a sense check.
Ditto Sturge’s discussion of dodgy policy costings. Despite government forecasts that outsourcing probation services could save British taxpayers £10.4 billion over seven years, the policy was considered a failure and the government paid an additional half a billion pounds to end the private contracts early. Similar examples of cost blowouts abound in Australia, from disability services to major infrastructure and defence projects. Optimism bias and the rubbery forecasts that result are a global phenomenon.
Then there is the “algorithm unleashed” approach to policy implementation. Anyone who has been following the fallout from Australia’s scandalous robodebt scheme will shake their heads when Sturge describes similar crackdowns on tax and benefit fraud in Britain and the Netherlands.
Both schemes rested on blind faith in badly designed algorithms, and both generated huge waves of stress among recipients of incorrect debt notices. The Dutch scheme also cost the government more than €1 billion in compensation payments.
Bad Data lays bare the good (data is very helpful for informing policy decisions) and the bad (for many policy decisions the data is non-existent or poor) in the easy-to-understand style you would expect of a data expert who spends all day communicating with the less numerate.
Sturge describes eye-openingly common problems with data — inconsistent definitions, sample-size problems, lack of useful time series — as well as issues with modelling. She takes a deep dive into several key areas of public life — crime, poverty, migration — and points out the inherent difficulty in delivering high-quality and time-consistent data on these crucial topics.
One surprising deficiency of Bad Data is its failure to highlight the exciting developments in government data collections — “good data” — that are starting to overcome at least some of the problems it highlights. Sturge mentions administrative datasets, but her readers don’t get a sense of just how revolutionary it is for policymakers to be able to link datasets that cover the whole of a relevant population.
For example, linking tax data showing someone’s income with location data and health data allows us to understand how disease prevalence, access to healthcare and health outcomes vary across locations and socioeconomic and cultural groups.
Linking data can also help understand people’s pathways through government services, creating a powerful tool for identifying gaps. How many of those turning up to emergency departments, for example, have made visits to a GP that might have kept them out of hospital?
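For readers who want to see the mechanics, the two kinds of linkage described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the records are invented, the shared person identifier and every name in the code (tax_income, home_region, has_condition, gp_visits, ed_visits and the two functions) are hypothetical, and real linked datasets involve privacy safeguards and messy matching that are ignored here.

```python
from collections import defaultdict

# Invented records, each keyed by a hypothetical shared person identifier.
tax_income = {101: "low", 102: "low", 103: "high", 104: "high"}
home_region = {101: "regional", 102: "regional", 103: "metro", 104: "metro"}
has_condition = {101: True, 102: False, 103: False, 104: False}

def prevalence_by_group(income, region, condition):
    """Link three datasets on person id and return the share of people
    with the condition in each (income band, region) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [cases, people]
    for pid, band in income.items():
        if pid in region and pid in condition:  # keep only linkable people
            group = (band, region[pid])
            counts[group][0] += int(condition[pid])
            counts[group][1] += 1
    return {g: cases / people for g, (cases, people) in counts.items()}

# Invented service records: GP visit dates and one emergency department
# (ED) presentation per person, as ISO date strings (which sort correctly).
gp_visits = {101: ["2023-01-10"], 103: ["2023-02-01"]}
ed_visits = {101: "2023-01-20", 102: "2023-03-05", 103: "2023-01-15"}

def share_with_prior_gp_visit(ed, gp):
    """Share of ED presenters who saw a GP at some point before presenting."""
    linked = sum(
        1 for pid, ed_date in ed.items()
        if any(gp_date < ed_date for gp_date in gp.get(pid, []))
    )
    return linked / len(ed)

print(prevalence_by_group(tax_income, home_region, has_condition))
print(share_with_prior_gp_visit(ed_visits, gp_visits))
```

The point of the sketch is that neither question can be answered from any one dataset alone; the answers appear only once records for the same person are joined.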
In Australia, many key public service organisations have been slow to understand the potential of these linked whole-of-population datasets and invest in the capability needed to work with them. The light coverage in this book suggests the same may be true in Britain.
The other key omission is more understandable, given Sturge is a serving civil servant. Her book contains no strong critique of the British government’s commitment — or lack of commitment — to investing in better data.
In her opening chapter, Sturge makes the powerful observation that while we can easily find how many times Harry Kane made an on-target shot at goal with his left foot in the last season of the English Premier League, Britain doesn’t have accurate data on how many people are eligible to vote, how many died from Covid-19, and whether crime is going up or down.
The difference, of course, is investment: the football analytics industry invests in paying people to catalogue, in meticulous detail, every pass, tackle and touch.
What is apparent is Sturge’s frustration that the UK census is conducted only every ten years. But she stops short of more obvious questions about funding of statistical agencies, and how much and what data should be collected to enable government to make better decisions. In an environment where the Office for National Statistics, like the Australian Bureau of Statistics, has sometimes been starved of funds while demands on its services kept growing, this is an important corollary to the story of bad data.
But we should celebrate the fact that one of Britain’s “anonymous” civil servants has been able to share her knowledge more widely. I seriously doubt that the risk-averse Australian public service would support an employee publishing such a book.
Sturge has produced a useful and engaging guide to understanding the common pitfalls of data and modelling in public life. But perhaps, for those wanting more, the next item on her to-do list should be a follow-up book about how and when governments should invest in better data, and the opportunities they have to get the most out of enhanced analytical and computing capability.
Danielle Wood – CEO