Census: Beyond 2011 - Monday 8 September 1.30pm

Developing insight from commercial data to support #Census2022
Andy Newing, Ben Anderson, Sustainable Energy Research Group, University of Southampton

We present outputs from applied research seeking to evaluate the potential use of commercial sector ‘big data’ to support production of small area census-type population statistics. ‘Digital trace’ data collected by commercial organisations through regular interactions with consumers and households could offer considerable potential as a supplement to traditional census taking. We explore the feasibility of using high temporal resolution near real-time domestic energy consumption data to extract valuable information on household characteristics and behaviour. We use data from a household energy monitoring study incorporating 184 UK households in two south of England neighbourhoods. These data are similar in nature to those available from domestic smart metering, currently being rolled out to all UK households, and are supported by household survey data revealing dwelling and residents’ characteristics and associated behaviours. We assess the link between household and neighbourhood level temporal patterns of energy consumption (termed load profiles) and key household characteristics (including the presence of children and their employment status). We also use these load profiles to develop a series of novel neighbourhood indicators not currently collected by the census. We demonstrate that load profile data can be used to predict area level temporal patterns of household ‘active occupancy’, and to construct a novel indicator of energy inequality at the neighbourhood level. These may add considerable value to existing small area statistics. We note, with some confidence, that a dataset of this nature would enable development of approaches to use observed consumption to predict household characteristics as a supplement to the traditional census, providing a timely and robust source of population statistics and area-based indicators.
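As an illustration of what a load profile is, the following minimal sketch averages timestamped consumption readings into half-hourly slots of the day and flags slots that look "active". It is not the authors' method: the half-hourly slot length, the function names and the `baseline_factor` threshold rule are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def mean_load_profile(readings):
    """Average consumption for each half-hour slot of the day.

    readings: iterable of (datetime, kwh) pairs for one household.
    Returns a dict mapping slot index (0..47) to mean kWh in that slot.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, kwh in readings:
        slot = timestamp.hour * 2 + timestamp.minute // 30
        totals[slot] += kwh
        counts[slot] += 1
    return {slot: totals[slot] / counts[slot] for slot in totals}


def active_slots(profile, baseline_factor=1.5):
    """Slots whose mean load exceeds a multiple of the lowest observed load.

    Hypothetical rule: the lowest slot is treated as the 'always on'
    baseline, and anything well above it as a sign of active occupancy.
    """
    baseline = min(profile.values())
    return sorted(s for s, v in profile.items() if v > baseline_factor * baseline)


# Made-up readings: two overnight values and one early-evening value.
readings = [
    (datetime(2014, 1, 1, 3, 0), 0.1),
    (datetime(2014, 1, 2, 3, 10), 0.3),
    (datetime(2014, 1, 1, 18, 30), 1.0),
]
profile = mean_load_profile(readings)
evening_active = active_slots(profile)
```

Aggregating such household profiles across a neighbourhood would give the kind of area-level temporal pattern the abstract describes, although any real threshold would need calibrating against the survey data.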

ONS research into methods for anonymous data linkage
Paul Groom, Office for National Statistics

The Office for National Statistics (ONS) has conducted a review (the Beyond 2011 Programme) of the future approach to the census and population statistics in England and Wales. The National Statistician made a recommendation to Government on 27 March 2014 that there should be a predominantly online census of all households and communal establishments in 2021. This will be supplemented with increased use of administrative data and surveys in order to enhance the statistics from the 2021 Census and improve annual statistics between censuses. This paper discusses research to develop methods to link national level administrative datasets. Matching multiple administrative sources is resource intensive and raises risks relating to the privacy of data about people and households. We have therefore sought to develop fully automated methods to link anonymised data. Critical to the research has been the development of techniques that can identify similarity between anonymised records and accurately classify records into matches and non-matches. A quality assurance exercise has been undertaken by testing these methods on a match between the NHS Patient Register and the 2011 Census. Results so far are highly promising, showing very high match rates (higher than 90%) and, more importantly, relatively low levels of false positives (less than 1%) and false negatives (around 2%). The paper will also discuss ideas for improving the quality further.
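The general idea of comparing anonymised records can be sketched as follows. This is an illustrative example only, not the ONS method: the keyed-hash approach (HMAC-SHA256), the field names, the shared `SECRET_KEY` and the 0.75 agreement threshold are all assumptions chosen for the sketch.

```python
import hashlib
import hmac

# Assumption: both data sources are hashed with the same secret key.
SECRET_KEY = b"hypothetical-shared-key"


def anonymise(record, key=SECRET_KEY):
    """Replace each field value with a keyed hash of its normalised form."""
    return {
        field: hmac.new(
            key, str(value).strip().lower().encode("utf-8"), hashlib.sha256
        ).hexdigest()
        for field, value in record.items()
    }


def agreement_score(a, b):
    """Fraction of shared fields whose hashed values agree exactly."""
    fields = a.keys() & b.keys()
    if not fields:
        return 0.0
    return sum(a[f] == b[f] for f in fields) / len(fields)


def is_match(a, b, threshold=0.75):
    """Classify a pair as a match when enough hashed fields agree."""
    return agreement_score(a, b) >= threshold


# Made-up records: same person, but the postcode differs between sources.
register = anonymise({"forename": "Ann", "surname": "Smith",
                      "dob": "1980-01-02", "postcode": "SO17 1BJ"})
census = anonymise({"forename": " ann ", "surname": "SMITH",
                    "dob": "1980-01-02", "postcode": "SO16 7PP"})
other = anonymise({"forename": "Bob", "surname": "Jones",
                   "dob": "1975-05-06", "postcode": "PO1 2AB"})
```

Exact agreement on hashed fields cannot detect near-misses such as spelling variants, which is one reason similarity measures for anonymised records are a research topic in their own right.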

ONS research into producing population estimates from administrative sources
Becky Tinsley, Dean Jathoonia, Office for National Statistics

The Office for National Statistics (ONS) has conducted a review of the census and the future provision of population statistics in England and Wales. This paper discusses research conducted during this review to develop rules-based methods that pull together linked individual records from administrative sources, such as the Patient Register (PR), the Customer Information System (CIS) and the Higher Education Statistics Agency Student Record (HESA), into a series of coherent Statistical Population Datasets (SPDs). The paper also discusses how the SPD counts compare to 2011 Census estimates for different age/sex groups at different levels of geography. Two estimation approaches, Dual System Estimation and Weighting Class Estimation, are used in conjunction with the SPDs to produce population estimates by local authority (LA), age and sex. This paper explores the pros and cons of the two estimation approaches and presents some provisional estimation results from the research.
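For readers unfamiliar with Dual System Estimation, its textbook form (the Lincoln-Petersen estimator) can be sketched as below. This shows only the basic formula under the usual assumption that the two sources capture people independently; how the research applies it to the SPDs is not stated in the abstract, and the function name and figures are hypothetical.

```python
def dual_system_estimate(n_a, n_b, n_both):
    """Basic dual system (Lincoln-Petersen) population estimate.

    n_a:    people counted in source A (e.g. an administrative dataset)
    n_b:    people counted in source B (e.g. a coverage survey)
    n_both: people found in both sources after linkage

    Assumes the two sources are independent and linkage is error free.
    """
    if n_both <= 0:
        raise ValueError("at least one linked record is needed")
    return n_a * n_b / n_both


# Made-up counts for one age/sex group in one area.
estimate = dual_system_estimate(n_a=900, n_b=800, n_both=720)
```

With these illustrative counts the estimate is 1,000 people: source A is judged to have captured 720 of the 800 people in source B (90%), so its 900 records are scaled up accordingly.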

Using income and benefit data to assess residence in the population
Charlie Wroth-Smith, Joanna Wroe, Office for National Statistics

Administrative data offer a rich source of information for users, yet are recognised as having statistical quality issues such as over-coverage.

Beyond 2011 have therefore been using DWP/HMRC data to produce, assess and use activity-based ('signs of life') administrative data.

DWP/HMRC income and benefit data offer the opportunity of deriving activity indicators based upon an individual’s interaction, or lack of interaction, with income and benefit systems. In its simplest form the presence or absence of interaction can be used to classify an individual as resident or non-resident in the population.
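The simplest form of such an indicator can be sketched in a few lines. This is a hypothetical illustration, not the Beyond 2011 rule: the function name, the use of interaction dates and the choice of reference period are assumptions.

```python
from datetime import date


def is_resident(interaction_dates, period_start, period_end):
    """Classify a person as resident if any recorded interaction with the
    income or benefit systems falls within the reference period."""
    return any(period_start <= d <= period_end for d in interaction_dates)


# Made-up example: a 12-month reference period ending in March 2011.
start, end = date(2010, 4, 1), date(2011, 3, 31)
active_person = is_resident([date(2009, 6, 1), date(2010, 11, 15)], start, end)
inactive_person = is_resident([date(2008, 2, 3)], start, end)
no_record = is_resident([], start, end)
```

A person with no interaction in the period is classified as non-resident here, which shows why such a rule needs testing against other sources: some genuine residents have no reason to interact with these systems.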

In order to assess this approach Beyond 2011 have linked a 1% sample of DWP/HMRC income and benefit data to a number of other data sources (for example the 2011 Census and the 2011 Patient Register).

This linkage work has allowed Beyond 2011 to assess how well the derived activity variables interact with the other data sources. In addition to this, the work is also enabling Beyond 2011 to develop and refine the initial methodology.

The results of this work have shown that activity indicators based upon income and benefit data offer a variety of benefits, such as:

Successfully identifying people present in the population

Helping to identify over-coverage in other administrative data sources

Census analysis - Tuesday 9 September 1.30pm

Origin-Destination data are a unique product in high demand, covering the commuting and migration patterns of individuals. In addition, new questions asked in the 2011 Census allow data to be produced, for the first time, on the migration patterns of those living at a student residence one year prior to the census, and on individuals with second residences. These ‘flows’ of people are also cross-tabulated by an array of socio-demographic variables, for example, occupation, approximated social grade, and ethnic group. With geographies down to the individual Output Area, a very large amount of Origin-Destination data will be produced. This includes harmonised flows covering the UK, with country specific data for England and Wales; Scotland; and Northern Ireland. The travel to work data for England and Wales will use the new Workplace Zone geography, detailing commuting patterns from Output Areas to Workplace Zones. This presentation provides details of the Origin-Destination data that are or will be available from the 2011 Census, and presents some key results from analysis of the data to demonstrate ways in which they can be used.

Both the National Planning Policy Framework (NPPF) and the London Plan require that development should be balanced with regard to the residential, economic and demographic nature of communities. The paper presents a methodology for identifying areas that are 'diverse' in these dimensions - as gauged by 2001 and 2011 census variables - and discusses measures of overall diversity.

What does the 2011 Census tell us about the "oldest old" living in England & Wales?
Jo Zumpe, Angele Storey, Office for National Statistics

It is generally accepted that the population of England and Wales is ageing, that the number of people aged 85 or over is increasing and that women outnumber men in this age group. This presentation uses data from the 2001 and 2011 censuses to quantify these perceptions. Census data provides an opportunity to look at the current demographic characteristics of all of those in the “oldest old” age group and any changes in the characteristics in the last decade, including interactions between their marital status, general health and provision of unpaid care. Information on the “oldest old” living in communal establishments, excluded from household surveys, is also analysed. Grouped together by their age, the “oldest old” age group have many characteristics in common, especially when compared with those aged under 65 and to those aged 65 to 74 and 75 to 84. However they are also a diverse group, for example in the variations in perceived general health and the amount of unpaid care they both receive and provide. For other characteristics such as religion and ethnicity, there is now more diversity amongst the “oldest old” than 10 years ago at the 2001 Census. A report published by ONS on this topic is available here: www.ons.gov.uk/ons/dcp171776_342117.pdf. Our analysis of the “oldest old” population is ongoing at ONS. This presentation affords the opportunity to discuss the relevance and importance of future areas of study within this topic.

Spatial Structure in the Burden of Illness Measured from UK Census Returns: Lessons for Predicting the Prevalence of Chronic Non-Communicable Illnesses
Peter Dutey-Magni (1, 2), Graham Moon (1); (1) School of Geography & Environment, University of Southampton, (2) Department of Social Statistics & Demography, University of Southampton

Prevalence models have been popular both to study individual and ecological risk factors, and to estimate the prevalence of health, lifestyle and disability characteristics for small geographies when other data are not available (small area estimation). This paper is concerned with the latter purpose. The construction and validation of robust prevalence models for health indicators often presents challenges due to the limited availability of appropriate data. The recent publication of 2011 census tables for England and Wales provides information at a very fine level of granularity and more power than survey data usually holds. We take this as an opportunity to explore spatial patterns in self-rated health and long-term activity limitation for very small cross-sections of the population, in terms of risk factors as well as spatial autocorrelation. This provides information of relevance for the specification of small area estimation models. We go on to specify such models using data from the English Housing Survey and the Labour Force Survey. The main findings from our investigation suggest that the spatial structure of illness is very dissimilar across ethnic groups, that contextual hospital data improves the quality of prediction, and that contiguity neighbourhood matrices are not always the optimal structure to base conditional autoregressive models upon.

The session will provide an overview of some of the key results from Scotland’s 2011 Census published to date. The talk will highlight some of the main findings which have caught the attention to date, paint a picture of Scotland in 2011, draw comparisons with 2001 and with other parts of the UK. This talk will offer an introduction to the Census Data Explorer - National Records of Scotland's key online resource for making the results from Scotland's Census available to users. It will highlight plans for further outputs and plans for extending the site. The talk will also summarise the range of analysis which has been carried out on the Census 2011 in Scotland to date, selected to illustrate a wide range of uses and potential uses.

Using administrative data sets in the quality assurance of a Census
Stephen Sharp, Alternative Sources Branch, National Records of Scotland

The use of administrative data sets for census purposes is the subject of much interest both in government and academia. The research reported here starts from the assumption that there are two strategic problems to be overcome if administrative data sets are to be used in quality assuring, or in the longer term replacing, the traditional census. The first problem is that multiple administrative data sets are available but only one set of population estimates is required. The information gleaned from the sets therefore needs to be integrated in some way. The other is that counts taken from administrative data all have bias which results in patterns of over-count and under-count which vary with gender, age and geography. Work done at NRS over the last two years has made encouraging progress on the question of data integration. This presentation reports the recent progress which has been made in identifying, quantifying and adjusting for over-count and under-count in six administrative data sets. The research used the 2011 Census in Scotland and the sets as they stood in the Spring of 2011. It compares the degree of consistency which each set showed with the Census and the extent to which the inconsistency is common across the sets (e.g. do they all show over-count in the same places?) The cell counts from the sets, adjusted to correct for as much bias as possible, are then integrated using Bayesian Markov Chain Monte Carlo methods and the results compared to actual Census cell counts.

The Census Under Enumeration (CUE) Project was initiated in Northern Ireland to augment the coverage of the Census enumeration (i.e. completed questionnaires) by using activity-based administrative data from the medical system to supply core information on non-responding households. The CUE project contributed 4% of the Census household estimate compared to 2% from the Census Coverage Survey, the traditional method of estimating non-response. Further detail at: http://www.nisra.gov.uk/Census/2011_results_supporting_information.html

Rebasing Scotland’s mid-year population estimates to better reflect the results of the 2011 Census
Darren Knox, The Scottish Government

In August 2013 the National Records of Scotland (NRS) published mid-2011 population estimates for Scotland based on the results of Scotland’s 2011 Census. These estimates were 45,100 higher than the mid-2011 population estimates rolled-forward from the 2001 Census.

Births, deaths and migration are the three main components that we use to produce mid-year population estimates and it is likely that, at a national level, most of the 45,100 difference is because of the difficulty in estimating migration.

The migration component consists of internal and overseas migration and is estimated using a combination of administrative and survey data. The effectiveness of the National Health Service Central Register as a source for estimating internal migration is dependent upon people registering with their GP when they change address and this is known to be problematic in certain age groups. Furthermore, the sample size of the International Passenger Survey, which is used to estimate overseas migration, is very small for Scotland and consequently there is also uncertainty around these estimates.
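The roll-forward logic behind these components can be sketched as a single accounting step. This is a simplified national-level illustration with made-up figures, not the NRS methodology, which works by age, sex and area; the function and parameter names are assumptions.

```python
def roll_forward(population, births, deaths, net_migration):
    """Roll a mid-year population estimate forward by one year.

    net_migration is in-migration minus out-migration over the year
    and may be negative.
    """
    return population + births - deaths + net_migration


# Hypothetical figures, for illustration only.
next_mid_year = roll_forward(
    population=5_000_000, births=58_000, deaths=54_000, net_migration=20_000
)

# An error in the migration estimate repeated every year accumulates
# over the decade between censuses.
annual_error = 4_000
drift_after_ten_years = annual_error * 10
```

Because births and deaths are registered while migration is estimated, any persistent bias in the migration term carries through every annual step, which is consistent with the abstract's view that most of the difference at the 2011 Census arises from migration.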

Scotland’s recent 2011 Census provides a snapshot in time which allows us to compare our population and migration estimates against a Census of the entire population. For example, comparisons of country of birth data from the 2001 and 2011 Censuses with our overseas migration estimates have revealed that, at a national level, much of the 45,100 difference is likely to be because of underestimation of overseas migration across the decade.

In December 2013 Scotland’s population estimates for mid-2002 to mid-2010 were rebased to better reflect the results of Scotland’s 2011 Census. The aim of this paper is to present the results of this work, indicating what our existing methodology captured well and how it can be developed to improve our estimates going forward.
