Revisiting UK Biobank: assessing the data so far and what it reveals about inclusivity

Written by Curtis Asante

UK Biobank is an unprecedented initiative housed on an industrial estate in Stockport, Greater Manchester, UK. It holds blood and urine samples from just over half a million people aged 40–69 years, who were recruited between 2006 and 2010 through national assessment centres designed specifically for this purpose. The assessment centres were also used to collect information on each participant's health, lifestyle, hearing and cognitive function through touchscreen questionnaires and verbal interviews. A range of physical measurements, such as blood pressure, were also taken, and eye tests were carried out. The biological samples and participant information were intended to serve an ambitious goal: determining which factors are important in the development of specific diseases within the UK population. This was no mean feat; establishing UK Biobank required input from Wellcome, the Medical Research Council, the Department of Health, the Scottish Government and the Northwest Regional Development Agency. UK Biobank also receives funding from the Welsh Assembly Government, the British Heart Foundation and Diabetes UK. With this level of financial and logistical support, expectations of success were very high.

Just over two years ago I wrote an introductory piece on UK Biobank [1]. In that article, I discussed how UK Biobank worked to ensure that its volunteers represented the ethnic diversity of the UK population as far as possible, and why this was important. Over the past few years, UK Biobank has been releasing data intermittently, and these data sets are revealing fascinating insights with potentially far-reaching implications for our society. The full data sets can now be accessed. As I explain below, while UK Biobank may have succeeded in achieving a balance in terms of ethnicity, it is not representative of the general population on a range of sociodemographic, physical, lifestyle and health-related characteristics. This means that those far-reaching implications may apply mainly to a subsection of society.

Health and the city

So what do the data tell us so far, and what are the implications? In one study, published in The Lancet Public Health in December last year [2], Prof Steven Cummins and colleagues from the London School of Hygiene & Tropical Medicine hypothesised that the built environment might be associated with the development of obesity and related disorders. The team used linked data on the environments around roughly 400,000 participants' residential addresses, collected at the same time as the biological samples, to examine whether the density of gyms and proximity to fast-food outlets were associated with waist circumference, body-mass index (BMI) and body fat percentage. Unsurprisingly, they found that obesity rates among mid-life adults were higher in areas with fewer gyms. The associations between obesity and proximity to fast-food outlets were weaker, possibly because of limitations in how food outlets were classified in the source database: the data were supplied by local authorities, and some fast-food outlets may have been misclassified as restaurants. Nevertheless, the authors propose that increasing access to gyms and, possibly, reducing access to fast food near residential areas could reduce the prevalence of overweight and obesity at the population level. Because the study was cross-sectional, however, the direction of these associations cannot be confirmed: if people with obesity were actively moving to areas with fewer gyms and easier access to fast-food outlets, changing the local environment would be expected to have less effect.

In a related study, Professor John Gallacher and colleagues from the University of Hong Kong examined the association between obesity and residential density in a large and diverse population of around 420,000 adult men and women aged 37–73 years, drawn from UK Biobank across 22 UK cities, to identify density environments that help sustain a healthy weight [3]. They found that below a residential density of 1,800 households per km² (Greater London has about 2,250 households per km²), higher density was associated with more body fat, higher BMI and increased odds of obesity. Above 1,800 households per km², higher density was associated with lower BMI and decreased odds of obesity. The reasons for this are unclear, but one possibility is that public transport and cycling are more popular forms of transport in dense cities. Driven by demand from their inhabitants, dense cities may also offer more options for healthy eating and more indoor exercise facilities. The authors argue that UK government attempts to prevent suburban densification, for example by prohibiting the subdivision of single-lot housing and the conversion of domestic gardens into housing lots, could inhibit the transformation of suburbs into healthier places to live.

Selection bias

It wasn't until July 2016, approximately eleven years after the first volunteer agreed to provide samples, that UK Biobank announced that the complete genetic data set would be released to scientists who had preregistered with UK Biobank. This data set was eagerly awaited partly because of its sheer size (twelve terabytes, equivalent to just over 5 billion single-spaced typed pages of text), but also because of its potential to identify which genetic variations and mutations, detectable as soon as the samples are analysed, contribute to specific diseases over a participant's lifetime, with each participant's health tracked using their NHS number.

Findings from UK Biobank will continue to reveal trends and patterns in health and disease, much as the earlier studies did with smaller data sets, but it will be some time yet before we see any real impact on health or social policies and regulations. However, thanks to a study by Anna Fry and colleagues [4], we now know much more about how UK Biobank participants compare with the general population. UK Biobank's 500,000 participants are generally healthier and leaner, and smoke less, than their fellow countrymen and women, and they suffer less heart disease, kidney disease and cancer. In fact, the number of deaths recorded among UK Biobank participants to date is approximately half the number you would expect in the general population.

Overall, cancer incidence is approximately 10–20% lower in UK Biobank participants than in the general population. At age 70–74 years, death rates and total cancer incidence are, respectively, 46.2% and 11.8% lower in men and 55.5% and 18.1% lower in women than in the general population of the same age. Lung cancer rates in particular are substantially lower in UK Biobank participants, most likely because of lower rates of smoking. Although rates of female breast cancer are similar to the national average in women aged 50 and over, the rate is higher in women aged 45–49 years, possibly because of a higher rate of screening among Biobank participants. Apart from breast cancer in this younger age group, prostate cancer was the only cancer found to be more common in UK Biobank participants than in the general population. This might also reflect more frequent screening among health-conscious UK Biobank participants, leading to more cancers being diagnosed.

UK Biobank participants are more likely to be older and to live in less socioeconomically deprived areas than non-participants. They are less likely to be obese, to smoke or to drink alcohol daily, and they report fewer health problems. Does this 'healthy volunteer' effect mean that the resource has no value? Not necessarily: UK Biobank's large size and breadth of measurements mean that a wide range of studies can be undertaken, and the results should have at least some value for everyone. However, this value will be weighted towards those who are already better off in terms of health.

UK Biobank is a huge, ground-breaking achievement by any standard, but although various measures were put in place to ensure that all sectors of society were represented, they seem to have fallen short here. This was unintentional, but UK Biobank is only as good as the participants who volunteered to take part. In my original article I highlighted issues that might deter people who are not (cue sweeping generalisation) white, middle class and English from participating in UK Biobank, and explained why it was necessary to include them. These findings underline that necessity even more. The participants are relatively diverse in terms of race and gender but not in terms of socioeconomic status, with fewer participants from lower socioeconomic groups, and it is arguably people from these groups who are most likely to benefit from such initiatives.

In retrospect, the efforts to ensure a truly diverse set of participants came too late. If UK Biobank did all it could to get people from diverse backgrounds to sign up, the question becomes: how do you get a nationwide population to believe in these initiatives enough to get involved when the call goes out? It starts at a young age, perhaps at school as part of the citizenship curriculum. It continues through adulthood by keeping people informed, explaining the importance of these studies and how they might benefit people in the long term. And it involves getting a more diverse range of people doing the science, shaping policies and delivering the outputs, so that the people involved in these initiatives truly reflect the people they are supposed to benefit.

To ensure that progress is made, there need to be mechanisms for assessing progress towards full demographic representation. A system of measurement could operate at three levels: at the health trust level, where the policies are held; at the planning and delivery level, where the policies are enacted; and at the participant level, where measuring awareness can show whether staff have put the policies into practice. Additional recording or tracking processes may need to be put in place at sampling sites to support this.

Those of us who understand the importance of initiatives like UK Biobank all have a responsibility to spread the word. We can do this by talking to our friends and families and by getting involved in awareness campaigns. There is unlikely ever to be 100% participation by targeted groups, but as new initiatives evolve and better treatments and interventions emerge, it is important that no sector of society is unintentionally neglected or left behind.


  • Asante C (2016). Biobanking for all: UK Biobank’s inclusion of ethnic minorities.
  • Mason KE et al (2017). Associations between fast food and physical activity environments and adiposity in mid-life: cross-sectional, observational evidence from UK Biobank. Lancet Public Health, 3(1): e24–e33
  • Sarkar C et al (2017). Association between adiposity outcomes and residential density: a full-data, cross-sectional analysis of 419 562 UK Biobank adult participants. Lancet Planetary Health, 1(7): e277–e288
  • Fry A et al (2017). Comparison of sociodemographic and health-related characteristics of UK Biobank participants with the general