
Curious kakapos and violin plots: my data expedition discoveries

Maddy McCormack took a first dive into a data expedition organised by Open Data Manchester.

She discovered that violin plots are not generally created by scheming orchestra members and that you should always question the data, be very thoughtful about power dynamics and keep in mind who the information can hurt and why.

In recent weeks many of us, full of lockdown restlessness, have reached for the tent pegs and bivvy bags to quell our need for a change of scene. While I admire the courage and optimism it takes to roll out a ground sheet and do battle with the elements of the British summer, I chose instead to stay indoors and go on a data expedition.

Run by Open Data Manchester (ODM) in partnership with 360Giving (an organisation that helps UK funders publish open, standardised grants data, and empowers people to use it to improve charitable giving), the data expedition aims to give those who work with data an opportunity to learn some new skills and reflect on their use of data. Sam Milsom, one of the trainers on the expedition, explained that ODM’s goal is “to promote intelligent and responsible data practice”.

During introductions I made clear my extremely limited experience with data, that I expected to feel out of my depth very quickly and that I was there to learn how not to panic when someone mentioned correlation coefficients. Then we were put into teams. Mine named themselves the Curious Kakapos, a reference to the O’Reilly textbook ‘R for Data Science’, and a reference I didn’t get. This, and one of our team having a Star Trek background on their Zoom video, told me I was now in a very specific world. 

Each session was organised around a theme and began with training, before we moved into breakout rooms to make progress on our team projects. The members of my team were interested in how to measure happiness. We were encouraged to start with big questions and to refine them into more targeted, answerable ones. We discussed Robert Putnam’s influential work on civic engagement and how it had been on the decline over the past thirty years. We talked about how to investigate this decline and the reasons why it had happened.

One thing I hadn’t realised was how much data was already open and publicly available for anybody to work with if they chose to. The London Datastore, for example, is free and contains all sorts of information to do with the capital. What also struck me was how many organisations there are to help individuals and groups use data, particularly for the social good.


The Engine Room is one such organisation. They established a Responsible Data programme, focussing on the complex legal, ethical and privacy issues which arise in this work. There is also the Centre for Thriving Places, which works with local authorities, organisations and individuals to help measure and improve wellbeing, and which has a wonderfully optimistic name!

The Curious Kakapos landed on using crime statistics to help evaluate wellbeing. Another team chose crime too, particularly looking at whether seasonal change affected crime rates in Leeds. They found there was more crime and more antisocial behaviour in the summer.

But the team project I found most interesting was an exploration of how much of the UK’s road network was built on Roman roads, by a team who used Wikipedia, Project Mercury, OS Data Hub and Mapbox to get their findings. I was used to understanding crime as something that data analysts would work with and turn into graphs, but I hadn’t come across anyone looking at maps and processing them the same way. The Romans built 1,852 miles of roads in Britain; the Fosse Way alone is 220 miles long, linking Lincoln to Exeter. There was a stronger relationship between our modern roads and Roman roads in southern England than in the north, and 10% of Roman roads are still in use across the UK, which I found fascinating.

What did I learn during the expedition? My findings were a mixture of high-level take-aways and a growing familiarity with the language of data analysis. The big lessons from the trainers were useful to me: how to come up with a good, answerable question; how the data pipeline (find, get, clean, analyse, present) can be a circular and complex process; how analysis might reveal a need to refine your question; and that analysts should be aware of the potential pitfalls of storytelling through data.

We were told explicitly: don’t ignore the data that doesn’t back up your story and remember to question your story. It was important to be very thoughtful about power dynamics and keep in mind who the information can hurt and why.

It was good to see analysts foregrounding the human factor. The way we process data isn’t neutral. Biases must be checked so we’re not led into interpretations we want to make. We must be wary of telling stories that chime with our preconceptions. And, of course, correlation does not equal causation.



The point about correlation and causation was exemplified by one of the teams, who pulled out a weird link between lower mortality rates and a higher use of public transport. They noted that 1,000 people per year die in car accidents, which was not a big enough number to affect the statistics. In the end the link was attributed to the fact that a greater number of young people live in cities, so are more likely to use public transport, while older people live in the fancier village areas where they have to use their cars more.

I also learned that the London deprivation profiles can be shown as violin plots, which show probability density at different values. They look like noses if you’re being pre-watershed, or gynaecological if you’re not. Pivot tables, too, are a very useful tool in Excel: they let you choose the variables you’re looking at and help you sift through large amounts of data on a spreadsheet!
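For anyone curious what a pivot table is doing under the bonnet, here is a minimal sketch in Python rather than Excel. The rows are made up purely for illustration (they are not data from the expedition), and the helper name `pivot_sum` is my own invention, not part of any library.

```python
from collections import defaultdict

# Made-up rows purely for illustration; NOT data from the expedition.
rows = [
    {"area": "A", "season": "summer", "incidents": 12},
    {"area": "A", "season": "winter", "incidents": 7},
    {"area": "B", "season": "summer", "incidents": 20},
    {"area": "B", "season": "winter", "incidents": 15},
    {"area": "A", "season": "summer", "incidents": 3},
]


def pivot_sum(rows, index, columns, values):
    """Group rows by `index` (table rows) and `columns` (table columns),
    summing the `values` field in each cell, as a spreadsheet pivot table
    does with a "Sum of ..." setting. Hypothetical helper for illustration."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {key: dict(cells) for key, cells in table.items()}


table = pivot_sum(rows, index="area", columns="season", values="incidents")
for area, cells in sorted(table.items()):
    print(area, cells)
```

Swapping the `index` and `columns` arguments turns the table on its side, which is the same thing as dragging a field from rows to columns in a spreadsheet.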

So, I built some familiarity, if not technical competence. I learned some of the language and I refreshed some dormant mathematical knowledge. For the record, the correlation coefficient is a value between -1 and +1 which can be calculated to show how close the relationship is between two variables.
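To make that less abstract, here is a small sketch of one common version, the Pearson correlation coefficient, written out by hand in Python. This is my own illustration, assuming Pearson is the flavour meant; the function name `pearson_r` and the numbers are invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Near +1: the values rise together. Near -1: one rises as the other
    falls. Near 0: no straight-line relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariation / math.sqrt(spread_x * spread_y)


# Invented numbers: the second list is exactly double the first,
# so the relationship is a perfect straight line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value close to +1 or -1 only says the two variables move together; it says nothing about whether one causes the other.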

Sam Milsom believes data expeditions should be about empowering citizens:
“Data underpins so much of our lives. For some, it can bring people out in cold sweats. But it’s important that the communities understand the impact it has on our lives. The aim is to give people the tools.”

I didn’t quite have the tools. But now I did have more known unknowns! And working to know more of what you don’t know is progress.

By Maddy McCormack

