by Dr Justin Marley

Doing Science Using Open Data – Part 3: Census Data and Open Software

In the second part of the series, I examined UK mortality data to generate hypothesis H1:

H1: In the UK, deaths in the age group 45-64 years of age are 4 times higher than deaths in the age group 15-44 years of age.

In order to test this hypothesis further, we need to learn a bit more about the population as a whole. The finding in the previous post in this series was based on eight weeks’ worth of data, and there are various reasons why it may be transient. There may be seasonal variation in the figures, or this cohort may differ considerably from the age-equivalent cohort a year from now.

Before investigating this further, I will return to the issue of how to analyse the data. In the first part of this series I referred to Microsoft Excel. I’ve found it very useful, but some readers may not have access to it. There is an open source alternative: Open Office Calc. Apache ‘Open Office’ is described as ‘The Free and Open Productivity Suite’. To get started with the Open Office alternative to Excel, follow these instructions:

1. Download the Apache Open Office package (versions are available for several operating systems)

2. Install the package

3. Start up Apache Open Office Calc

If you’re familiar with other spreadsheets, it shouldn’t be too difficult to get started, and there is a drop-down Help menu if you need it. At the time of writing I’m using Apache Open Office 3.4.1, and I will use Calc for the remainder of this post.

Returning to hypothesis 1 above, we need to find out a bit more about the general population. Fortunately, detailed census data are available. We’re going to use the mid-2011 Census results.

The results are just for England and Wales. The Scotland 2011 Census results are due out in December 2012 and will be published in 5-year age groups. The Northern Ireland 2011 Census results are already available. Looking at the data for England and Wales, there is a cut-off at age 89; figures for older ages are due to be published later. Selecting the data for all ages and plotting the male and female figures against age (using an X-Y scatter chart) gives a graph of population by single year of age for each sex.

A cursory examination of the graph reveals that there are more males than females at every age under 25. Once we reach the mid-forties this is reversed, and from the mid-seventies onwards there is an increasing excess of women over men. This may be consistent with numerous studies showing longer life expectancy for women, although we would need more information to draw conclusions in this regard. We can also see that the population of each sex peaks in the mid-forties. This is relevant to hypothesis H1 and suggests a second hypothesis, H2: the increase in mortality in moving from the age group 16-44 to 45-65 may be accounted for by a larger population in the latter group.
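Readers who prefer a scripting language to a spreadsheet could make the same male/female comparison in a few lines of Python. The sketch below is only an illustration: it assumes the census table has been exported from Calc as a CSV file with hypothetical columns `age`, `male` and `female`, one row per single year of age, and the figures in the example string are made-up placeholders, not census values.

```python
import csv
import io


def ages_where_males_exceed_females(csv_text):
    """Return the ages at which the male count exceeds the female count.

    Assumes a header row with the (hypothetical) columns age, male, female
    and one row per single year of age.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ages = []
    for row in reader:
        # Strip thousands separators such as "1,234" before converting.
        male = int(row["male"].replace(",", ""))
        female = int(row["female"].replace(",", ""))
        if male > female:
            ages.append(int(row["age"]))
    return ages


# Placeholder figures for illustration only, not real census data.
example = """age,male,female
23,410,400
24,405,401
45,390,395
"""
print(ages_where_males_exceed_females(example))  # [23, 24]
```

Reading from a string keeps the example self-contained; for a real export, the contents of the saved CSV file would be passed in instead.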

We can test this hypothesis directly for the England and Wales population. Returning to the census data and summing the male and female figures, we get the following results for ages 16 through to 44:

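Age 16: 680,979
Age 17: 706,234
Age 18: 711,491
Age 19: 741,667
Age 20: 765,895
Age 21: 757,901
Age 22: 757,295
Age 23: 771,297
Age 24: 756,449
Age 25: 768,415
Age 26: 774,921
Age 27: 759,889
Age 28: 768,860
Age 29: 770,810
Age 30: 778,986
Age 31: 782,510
Age 32: 751,251
Age 33: 700,825
Age 34: 690,775
Age 35: 702,024
Age 36: 716,419
Age 37: 729,013
Age 38: 761,347
Age 39: 794,300
Age 40: 820,805
Age 41: 800,550
Age 42: 821,037
Age 43: 819,650
Age 44: 832,297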

For ages 45 to 65 we get the following results:

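Age 45: 832,727
Age 46: 838,064
Age 47: 831,041
Age 48: 813,798
Age 49: 797,077
Age 50: 770,066
Age 51: 739,859
Age 52: 723,861
Age 53: 708,371
Age 54: 682,824
Age 55: 659,795
Age 56: 637,073
Age 57: 641,145
Age 58: 634,399
Age 59: 618,132
Age 60: 623,508
Age 61: 638,118
Age 62: 655,668
Age 63: 694,644
Age 64: 754,834
Age 65: 583,734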
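The group totals can be checked by summing the single-year figures listed above. The short Python sketch below copies those figures into two lists (ages 16 to 44 and 45 to 65), confirms there is one figure per year of age, and prints each total and the ratio of the older group to the younger one. Summing the 21 figures listed for ages 45 to 65 gives 14,878,738.

```python
# Single-year population estimates (males + females), England and Wales,
# mid-2011, copied from the figures listed in this post.
ages_16_to_44 = [
    680979, 706234, 711491, 741667, 765895, 757901, 757295, 771297,
    756449, 768415, 774921, 759889, 768860, 770810, 778986, 782510,
    751251, 700825, 690775, 702024, 716419, 729013, 761347, 794300,
    820805, 800550, 821037, 819650, 832297,
]
ages_45_to_65 = [
    832727, 838064, 831041, 813798, 797077, 770066, 739859, 723861,
    708371, 682824, 659795, 637073, 641145, 634399, 618132, 623508,
    638118, 655668, 694644, 754834, 583734,
]

# One figure per single year of age: 29 ages (16-44) and 21 ages (45-65).
assert len(ages_16_to_44) == 44 - 16 + 1
assert len(ages_45_to_65) == 65 - 45 + 1

total_young = sum(ages_16_to_44)
total_older = sum(ages_45_to_65)
print(f"16-44: {total_young:,}")  # 16-44: 21,993,892
print(f"45-65: {total_older:,}")  # 45-65: 14,878,738
print(f"ratio: {total_older / total_young:.2f}")  # ratio: 0.68
```

The length checks guard against the most common spreadsheet slip when summing by age: a range that starts or ends one row too early or too late.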

The total estimated population in England and Wales in mid-2011 for the age group 16-44 is

21,993,892

and for the age group 45-65 is

14,878,738

Hypothesis 2 states that the increase in mortality in moving from the first to the second age group might be accounted for by an increase in the population in the second group. However, the data above for England and Wales show a reduction in the overall population in moving from the first to the second group: the second group is only about 68% of the size of the first (14,878,738 / 21,993,892 ≈ 0.68). Nevertheless the data are incomplete, as the mortality data apply to the UK whereas the census figures apply only to England and Wales. When the other census data become available, it will be possible to revisit hypothesis 2 and test it more convincingly.

Using the above data, what implications are there for hypothesis 1? Suppose the findings from other parts of the UK are consistent with the England and Wales census data, and treat the census groups 16-44 and 45-65 as approximations to the mortality groups 15-44 and 45-64. Since the mortality rate in each group is the number of deaths divided by the population, moving from the age group 16-44 to 45-65 the mortality per 100,000 would increase by a factor of about 4 × 1/0.68 ≈ 5.9 (2 s.f.).
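The step from a ratio of deaths to a ratio of mortality rates can be written as a small function. If D is the number of deaths and P the population in each group, the rate ratio is (D_old / P_old) / (D_young / P_young) = (D_old / D_young) × (P_young / P_old). The sketch below assumes, as in H1, that deaths in the older group are four times those in the younger group, and uses the sums of the single-year England and Wales figures listed above as stand-ins for the UK populations; the function name is purely illustrative.

```python
def mortality_rate_ratio(death_ratio, pop_younger, pop_older):
    """Ratio of mortality rates (older group / younger group).

    death_ratio is deaths(older) / deaths(younger). Dividing each group's
    deaths by its population gives
    (D_old / P_old) / (D_young / P_young) = death_ratio * P_young / P_old.
    The result is the same whether rates are per head or per 100,000.
    """
    return death_ratio * pop_younger / pop_older


# H1: deaths aged 45-64 are about 4 times deaths aged 15-44 (UK data).
# Populations: England and Wales mid-2011 sums of the single-year figures
# for ages 16-44 and 45-65, used here as an approximation.
ratio = mortality_rate_ratio(4, 21_993_892, 14_878_738)
print(f"{ratio:.1f}")  # 5.9
```

Because the older group is smaller, the rate ratio comes out larger than the raw ratio of deaths; if the two populations were equal, the function would simply return the death ratio.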

Index: There are indices for the TAWOP site here and here.

Twitter: You can follow ‘The Amazing World of Psychiatry’ Twitter by clicking on this link.

Podcast: You can listen to this post on Odiogo by clicking on this link (there may be a small delay between publishing of the blog article and the availability of the podcast). It is available for a limited period.

TAWOP Channel: You can follow the TAWOP Channel on YouTube by clicking on this link.

Responses: If you have any comments, you can leave them below or alternatively e-mail justinmarley17@yahoo.co.uk.

Disclaimer: The comments made here represent the opinions of the author and do not represent the profession or any body/organisation. The comments made here are not meant as a source of medical advice and those seeking medical advice are advised to consult with their own doctor. The author is not responsible for the contents of any external sites that are linked to in this blog.