Dangers of residual confounding: a cautionary tale featuring cognitive ability, socioeconomic background, and education

Abstract

Background

Cognitive ability and socioeconomic background (SEB) have been previously identified as determinants of achieved level of education. According to a “discrimination hypothesis”, higher cognitive ability is required from those with lower SEB in order to achieve the same level of education as those with higher SEB. Support for this hypothesis has been claimed from the observation of a positive association between SEB and achieved level of education when adjusting for cognitive ability. We propose a competing hypothesis that the observed association is due to residual confounding.

Methods

To adjudicate between the discrimination and the residual confounding hypotheses, we analyzed data from the 1997 National Longitudinal Survey of Youth (NLSY97, N = 8984). As a check of the inferential logic, we also switched predictor and outcome variables.

Results

The expected positive association between SEB and achieved level of education when adjusting for cognitive ability (predicted by both hypotheses) was found, but a positive association between cognitive ability and SEB when adjusting for level of education (predicted only by the residual confounding hypothesis) was also observed.

Conclusions

These results highlight the potential use of reversing predictors and outcomes to test the logic of hypothesis testing, and support a residual confounding hypothesis over a discrimination hypothesis in explaining associations between SEB, cognitive ability, and educational outcome.

Introduction

Studies have found an association between individuals’ socioeconomic background (SEB) and achieved level of education or socioeconomic position even when adjusting for cognitive ability [e.g. 1, 2]. Some researchers have explained this association with negative social expectations and discrimination against people from humbler origins and favoritism of the highborn. The persistence of this association when adjusting for cognitive ability has been interpreted to mean that higher cognitive ability is required from someone with a lower SEB in order to achieve the same level of education or socioeconomic position as someone with a higher SEB, or alternatively that high SEB can compensate for a lack in cognitive ability [1, 3, 4]. However, other studies have found that when adjusting for achieved socioeconomic position, a positive association between SEB and cognitive ability as well as achieved level of education can be observed [5,6,7,8,9,10]. If using the same logic as above, a contradictory interpretation would emerge: higher cognitive ability is required from individuals with high SEB in order to achieve the same socioeconomic position as someone with lower SEB, i.e. there is societal discrimination against the highborn.

An alternative explanation, which could account for the above-mentioned seemingly contradictory associations, is residual confounding. Confounding in a statistical analysis occurs when a variable (Z) influences both the dependent variable (Y) and the independent variable (X) [11]. It is common to adjust for potential confounding variables by including them as covariates in an analysis, in order to reduce the risk of spurious associations. However, the influence of the confounding variable may not be fully attenuated by such adjustment [12,13,14,15,16]. Residual confounding refers to confounding which remains despite adjustment. The impact of residual confounding is increased by higher true degree of confounding, larger sample size, and higher reliability in the measurements of X and Y, while it is attenuated by a high reliability in the measurement of Z [12,13,14,15,16]. With these factors in place, even if entities/individuals have the same value on observed Z they will tend to differ in their true Z and this may result in an association between observed X and observed Y even if adjusting for observed Z. For example, even if achieved socioeconomic position or level of education has been rated as the same, the actual/true position or level may be higher for those with high SEB compared to those with lower SEB. Similarly, even if observed cognitive ability is the same, true ability may tend to be higher for those with high SEB. This could explain why high SEB is associated with a higher achieved socioeconomic position and level of education, even when adjusting for observed ability.
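The mechanism described above can be illustrated with a small simulation. This is a minimal sketch under assumed values, not the authors' analysis: the effect sizes (0.6), the noise levels, the sample size, and the function names `corr`, `adjusted_beta`, and `simulate` are all hypothetical choices made for illustration. X and Y depend only on true Z, so any association between them that survives adjustment for observed Z is residual confounding.

```python
import random
import statistics


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def adjusted_beta(y, x, z):
    """Standardized effect of x on y when adjusting for z (two-predictor OLS)."""
    r_yx, r_yz, r_xz = corr(y, x), corr(y, z), corr(x, z)
    return (r_yx - r_yz * r_xz) / (1 - r_xz ** 2)


def simulate(n=50_000, noise_sd=0.75, seed=1):
    """Return the adjusted X-Y effect using true Z and using noisy observed Z."""
    rng = random.Random(seed)
    true_z = [rng.gauss(0, 1) for _ in range(n)]
    # X and Y depend on true Z only; there is no direct effect of X on Y.
    x = [0.6 * z + rng.gauss(0, 0.8) for z in true_z]
    y = [0.6 * z + rng.gauss(0, 0.8) for z in true_z]
    # Observed Z is true Z plus measurement error (hypothetical noise level).
    obs_z = [z + rng.gauss(0, noise_sd) for z in true_z]
    return adjusted_beta(y, x, true_z), adjusted_beta(y, x, obs_z)


beta_true_z, beta_obs_z = simulate()
print(f"X -> Y adjusting for true Z:     {beta_true_z:.3f}")
print(f"X -> Y adjusting for observed Z: {beta_obs_z:.3f}")
```

Under these assumptions, adjusting for true Z should leave an effect close to zero, whereas adjusting for the error-laden observed Z should leave a clearly positive effect.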

The expected standardized effect of measured SEB on true cognitive ability when adjusting for measured cognitive ability is given by Eq. (1) (see “Appendix” for derivation). Assuming that cognitive ability is measured with at least some reliability (rTrCA,CA ≠ 0) and that the correlation between observed ability and SEB does not equal unity (rCA,SEB ≠ 1), we see that the effect of observed SEB on true cognitive ability when adjusting for observed ability is expected to be zero only if the correlation between observed cognitive ability and SEB equals zero (rCA,SEB = 0) or if cognitive ability is measured with perfect reliability (rTrCA,CA = 1). Consequently, observed SEB is expected to be associated with whatever true cognitive ability is associated with, e.g. achieved level of education, even when adjusting for measured cognitive ability. In the present context, the term “reliability” should be interpreted more broadly than just, for example, homogeneity. If some research participants did not take the measurement of cognitive ability seriously, e.g. due to low motivation, this could actually strengthen the correlations between scores on subtests and, consequently, the homogeneity of the tests. However, such lack of earnestness among some participants would tend to weaken the correlation between true and measured cognitive ability.

$$E|{\beta}_{SEB,TrCA.CA}|=\frac{{r}_{CA,SEB}\times (1-{r}_{TrCA,CA}^{2})}{{r}_{TrCA,CA}\times (1-{r}_{CA,SEB}^{2})}$$
(1)
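To get a feel for the magnitudes implied by Eq. (1), the formula can be evaluated directly. The sketch below is illustrative only: the function name `expected_adjusted_effect` and the input values are assumptions chosen for demonstration and are not estimates from the NLSY97 data.

```python
def expected_adjusted_effect(r_ca_seb, r_trca_ca):
    """Eq. (1): expected standardized effect of observed SEB on true cognitive
    ability when adjusting for observed cognitive ability.

    r_ca_seb  -- correlation between observed cognitive ability and SEB
    r_trca_ca -- correlation between true and observed cognitive ability
    """
    if r_trca_ca == 0 or abs(r_ca_seb) == 1:
        raise ValueError("Eq. (1) is undefined for r_TrCA,CA = 0 or |r_CA,SEB| = 1")
    return (r_ca_seb * (1 - r_trca_ca ** 2)) / (r_trca_ca * (1 - r_ca_seb ** 2))


# Illustrative (not empirical) inputs: a fixed observed correlation of 0.4
# and increasingly reliable measurement of cognitive ability.
for r_true_obs in (0.7, 0.8, 0.9, 1.0):
    effect = expected_adjusted_effect(0.4, r_true_obs)
    print(f"r_TrCA,CA = {r_true_obs:.1f} -> expected adjusted effect = {effect:.3f}")
```

For a fixed observed correlation, the expected adjusted effect shrinks as rTrCA,CA approaches 1 and vanishes only at perfect reliability or when rCA,SEB = 0, in line with the two conditions stated above.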

According to a “discrimination hypothesis”, a positive association between SEB and achieved level of education is expected to persist when adjusting for cognitive ability [1, although they do not use the term “discrimination hypothesis”]. However, when adjusting for achieved level of education, the discrimination hypothesis predicts a negative association between SEB and cognitive ability, indicating that higher ability was required from those with lower SEB in order to achieve the same level of education as those with higher SEB. We propose the competing “residual confounding hypothesis”, which implies that any two of cognitive ability, SEB, and achieved level of education will be positively associated even when adjusting for the third, due to imperfect measurement. Furthermore, the discrimination hypothesis predicts that a difference score between achieved level of education and cognitive ability (both variables standardized) will be positively associated with SEB. This difference score is a measure of the degree to which participants are, in a manner of speaking, more educated than intelligent. The residual confounding hypothesis does not imply any association between the difference score and SEB.
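The difference-score prediction can be made concrete with a short derivation. This is a sketch assuming that all three variables are standardized; the symbols D and EDU are shorthand introduced here for illustration and do not appear in the original notation. With D = EDU − CA, the covariance between D and SEB is rEDU,SEB − rCA,SEB and the variance of D is 2 × (1 − rEDU,CA), so the standardized effect of SEB on D is:

$${\beta }_{SEB,D}=\frac{{r}_{EDU,SEB}-{r}_{CA,SEB}}{\sqrt{2\times (1-{r}_{EDU,CA})}}$$

The sign of this effect therefore depends only on whether SEB correlates more strongly with achieved level of education than with cognitive ability.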

Aims

This study aimed to investigate:

  • whether the discrimination hypothesis or the residual confounding hypothesis is best supported by empirical data.

  • whether, in the present case, reversing the predictors and outcomes yields a viable test of the logic of inference.

To the best of our knowledge, this is the first explicit investigation of the possibility that adjusted associations between SEB, cognitive ability, and achieved level of education may be due to residual confounding rather than discrimination.

Method

Respondents

Publicly available data from the 1997 National Longitudinal Survey of Youth (NLSY97), collected from 8984 US youths (4385 women and 4599 men) born between 1980 and 1984, were used for the present analyses. This dataset is suitable for the present investigation as it is large, nationally representative, contains appropriate measures of all three constructs under investigation, and has been widely used in past research.

Measurements

Most respondents (complete data available for 7008 individuals) took 12 Armed Services Vocational Aptitude Battery (ASVAB) tests in 1997–1998, when they were between 12 and 18 years old: (1) general science; (2) arithmetic reasoning; (3) word knowledge; (4) paragraph comprehension; (5) numerical operations; (6) coding speed; (7) auto information; (8) shop information; (9) mathematical knowledge; (10) mechanical comprehension; (11) electronics information; (12) assembling objects.

We operationalized SEB as parental income, consistent with common practice in the field [17,18,19]. Total parental income for the years 1997 and 1998, when the respondents were between 12 and 18 years old, was calculated and the natural logarithm of the mean of these two values was used as the indicator of SEB. An income of zero was treated as a missing value and data were available for 7302 respondents.

In 2017, when they were between 32 and 37 years old, respondents were asked about their highest academic degree received, with the values: (0) None, n = 515, (1) General educational development, n = 862, (2) High school diploma, n = 2692, (3) Associate/junior college, n = 598, (4) Bachelor’s degree, n = 1352, (5) Master’s degree, n = 540, (6) Professional degree/PhD, n = 149. Degree was treated as a continuous variable and data were available for 6708 respondents.

Statistical analyses

Factor scores on the first unrotated factor in a factor analysis of all 12 ASVAB tests were used as an estimate of the respondents’ cognitive ability. The effects of cognitive ability, SEB (operationalized as parental income, see above), and academic degree on each other were calculated with ordinary least squares regression. All three variables were standardized before the analyses. In an additional analysis, the difference between academic degree and cognitive ability was predicted from SEB. The final sample size for the regression analyses was 4654. Data processing and analyses were conducted with R 4.1.0 statistical software [20] employing the psych package [21]. Data and scripts are available at https://osf.io/cwn5u/.
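The logic of switching predictors and outcomes can be sketched as follows. This is not the authors' R script (which is available at the OSF link above); it is a minimal Python illustration in which the correlations, the labels `CA`, `SEB`, and `EDU`, and the function names are hypothetical. It relies on the fact that, with standardized variables and two predictors, each OLS coefficient can be computed from the three pairwise correlations.

```python
from itertools import permutations


def adjusted_beta(r_out_pred, r_out_cov, r_pred_cov):
    """Standardized OLS effect of a predictor on an outcome, adjusting for one covariate."""
    return (r_out_pred - r_out_cov * r_pred_cov) / (1 - r_pred_cov ** 2)


# Hypothetical correlations for illustration; NOT the values reported in Table 1.
r = {
    frozenset(("CA", "SEB")): 0.35,
    frozenset(("CA", "EDU")): 0.50,
    frozenset(("SEB", "EDU")): 0.30,
}


def all_adjusted_effects(corrs):
    """Every (outcome, predictor, covariate) arrangement of the three variables."""
    effects = {}
    for outcome, predictor, covariate in permutations(("CA", "SEB", "EDU")):
        effects[(outcome, predictor, covariate)] = adjusted_beta(
            corrs[frozenset((outcome, predictor))],
            corrs[frozenset((outcome, covariate))],
            corrs[frozenset((predictor, covariate))],
        )
    return effects


effects = all_adjusted_effects(r)
for (outcome, predictor, covariate), beta in effects.items():
    print(f"{predictor} -> {outcome}, adjusting for {covariate}: {beta:+.3f}")
```

With these illustrative inputs every adjusted effect is positive, including the effect of SEB on cognitive ability when adjusting for education, which is the arrangement the discrimination hypothesis expects to be negative.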

Results

Table 1 shows descriptive statistics for SEB, cognitive ability, and academic degree, as well as correlations between these three variables and standardized adjusted regression effects. All correlations and regression effects were positive. Academic degree and cognitive ability were more strongly associated with each other, adjusted or not, than with SEB. We also see that the association between SEB and academic degree when adjusting for cognitive ability was weaker (no overlap of confidence intervals) than the association between SEB and cognitive ability when adjusting for academic degree.

Table 1 Descriptive statistics for, correlations between, and regression effects of the study variables

The adjusted associations, i.e. associations between residuals, are illustrated in Fig. 1. We see that those with higher cognitive ability than predicted from their SEB also tended to have a higher academic degree, and vice versa (panel A); those with higher SEB than predicted from their cognitive ability also tended to have a higher academic degree, and vice versa (panel B); those with higher SEB than predicted from their academic degree also tended to have higher cognitive ability, and vice versa (panel C).

Fig. 1

A Association between residual cognitive ability and residual academic degree, both adjusted for SEB; B association between residual SEB and residual academic degree, both adjusted for cognitive ability; C association between residual SEB and residual cognitive ability, both adjusted for academic degree

The difference between the respondents’ academic degree and cognitive ability was weakly negatively associated with SEB (β = −0.051, 95% CI −0.081 to −0.021, p < 0.001; Fig. 2), indicating that those with high SEB did not tend to have a higher standardized score on education than on cognitive ability.

Fig. 2

Association between SEB and the difference between academic degree and cognitive ability

Discussion

This study aimed to investigate whether the discrimination hypothesis or the residual confounding hypothesis was best supported by empirical data, and whether, in the present case, reversing the predictors and outcomes yielded a viable test of the logic of inference. We show that any two of the variables cognitive ability, SEB, and achieved level of education were positively associated with each other while adjusting for the third variable. A positive association between SEB and education when adjusting for cognitive ability has been observed before and interpreted as indicating that higher cognitive ability is required from those with low SEB in order to achieve the same level of education as those with higher SEB (what we refer to as a discrimination hypothesis, e.g. [1]). However, according to the same logic, the positive association between cognitive ability and SEB when adjusting for level of education would indicate that higher cognitive ability is required from those with high SEB in order to achieve the same level of education as those with lower SEB. In the present data, the adjusted association between cognitive ability and SEB was stronger than the adjusted association between SEB and education. Moreover, and in contradiction to the discrimination hypothesis, no positive association was observed between SEB and the difference between the respondents’ academic degree and their cognitive ability, a measure that indicates to what degree the respondents tend to have a higher standardized score on education than on cognitive ability.

Rather than interpreting these results as indicating simultaneous discrimination against those with low and those with high SEB, we suggest a competing interpretation: the observed associations are due to residual confounding. For example, when adjusting for each other, residual cognitive ability and residual SEB are expected to have, in accordance with Eq. (1), positive associations with their respective true scores and also with the true score on the other variable (e.g. residual SEB has a positive association with true cognitive ability) which, for both variables, results in a positive adjusted association with level of education. Consequently, if two individuals with high and low SEB have the same measured cognitive ability but the former achieves a higher level of education, this does not necessarily indicate that individuals with high and low SEB are privileged and discriminated against, respectively. Alternatively, the former individual may have a higher true cognitive ability, i.e. a more negative residual in measured ability, and this may be the reason why they achieve a higher level of education.

In the present study, the association between cognitive ability and SEB when adjusting for education was stronger than the association between SEB and education when adjusting for cognitive ability. This does not necessarily indicate that those with high SEB are more discriminated against than those with low SEB. Instead, this discrepancy could be due to lower reliability in the measurement of education compared with the measurement of cognitive ability. Although individuals may formally have achieved the same academic degree there may nonetheless be differences e.g. in the prestige of the university from which they graduated. We predict that on average, those with more prestigious degrees probably have higher SEB and maybe also higher cognitive ability than those with less prestigious degrees.

The present results do not disprove the existence of discrimination on the basis of SEB in processes leading to educational attainment. Rather, the reasoning and evidence presented here are a criticism of one line of evidence that has been advanced in support of a discrimination hypothesis. Causal inference from observational data is fraught with difficulties, of which residual confounding is one [22].

Limitations

There are several well-established measures of cognitive ability, e.g. the Wechsler Adult Intelligence Scale (WAIS), Raven’s Progressive Matrices (RPM), and the Stanford-Binet Intelligence Scales, and it is possible that these measures would have given slightly different results compared with the ASVAB used in the present study. However, previous research has shown high correlations between different measures of cognitive ability, including the Armed Forces Qualification Test (AFQT) score, which is extracted from the ASVAB, and classic IQ tests such as the California Test of Mental Maturity and the Otis-Lennon Mental Ability Test (r = 0.81 for both [23]). Parental income represents only one facet of SEB; it is, however, commonly used and correlated with other facets of SEB such as educational and occupational status [2, 23].

The main message of the present paper is that adjustment for possible confounders can, due to residual confounding, leave room for spurious findings. However, this point does not apply if the observed correlation between the predictor and the possible confounder equals zero or if the possible confounder is measured with perfect reliability (see Eq. (1)). For the application in the present paper, this would mean that with a perfectly reliable measure of cognitive ability, an observed association between SEB and achieved level of education while adjusting for cognitive ability could not be due to residual confounding. The same would be true if the correlation between the measures of cognitive ability and SEB were to equal zero.

Conclusions

An observed association between two variables, X and Y, while adjusting for a third variable, Z, may be due to residual confounding caused by error in the measurement of Z rather than to a true independent association between X and Y. In the analyses reported here, any two of the variables cognitive ability, socioeconomic background, and achieved level of education were positively associated with each other while adjusting for the third variable. We propose that the most likely explanation for the adjusted associations is residual confounding rather than discrimination. Changing the place of predictors and outcome variables in analyses, to see if results concur with interpretations of the original results, is a simple yet possibly revealing method to validate interpretations. We recommend that researchers use this method and beware of the dangers of residual confounding.

Availability of data and materials

The script and data are available at Open Science Framework at https://osf.io/cwn5u/.

References

  1. Paulus L, Spinath FM, Hahn E. How do educational inequalities develop? The role of socioeconomic status, cognitive ability, home environment, and self-efficacy along the educational path. Intelligence. 2021;86:101528. https://doi.org/10.1016/j.intell.2021.101528.
  2. Sorjonen K, Hemmingsson T, Lundin A, Falkstedt D, Melin B. Intelligence, socioeconomic background, emotional capacity, and level of education as predictors of attained socioeconomic position in a cohort of Swedish men. Intelligence. 2012;40:269–77. https://doi.org/10.1016/j.intell.2012.02.009.
  3. Breen R, Goldthorpe JH. Class inequality and meritocracy: a critique of Saunders and an alternative analysis. Br J Sociol. 1999;50:1–27. https://doi.org/10.1111/j.1468-4446.1999.00001.x.
  4. Breen R, Goldthorpe JH. Class, mobility and merit. Eur Sociol Rev. 2001;17:81–101.
  5. Blane D, Smith GD, Hart C. Some social and physical correlates of intergenerational social mobility: evidence from the west of Scotland collaborative study. Sociology. 1999;33:169–83. https://doi.org/10.1177/S0038038599000097.
  6. Deary IJ, Taylor MD, Hart CL, Wilson V, Smith GD, Blane D, et al. Intergenerational social mobility and mid-life status attainment: influences of childhood intelligence, childhood social factors, and education. Intelligence. 2005;33:455–72. https://doi.org/10.1016/j.intell.2005.06.003.
  7. Johnson W, Brett CE, Deary IJ. The pivotal role of education in the association between ability and social class attainment: a look across three generations. Intelligence. 2010;38:55–65. https://doi.org/10.1016/j.intell.2009.11.008.
  8. Nettle D. Intelligence and class mobility in the British population. Br J Psychol. 2003;94:551–61. https://doi.org/10.1348/000712603322503097.
  9. Saunders P. Social mobility in Britain: an empirical evaluation of two competing explanations. Sociology. 1997;31:261–88. https://doi.org/10.1177/0038038597031002005.
  10. Sorjonen K, Hemmingsson T, Lundin A, Melin B. How social position of origin relates to intelligence and level of education when adjusting for attained social position. Scand J Psychol. 2011;52:277–81. https://doi.org/10.1111/j.1467-9450.2010.00871.x.
  11. Cohen J, Cohen P, West SG, Aiken LS. Applied multiple regression/correlation analysis for the behavioral sciences. 3rd ed. Mahwah: Lawrence Erlbaum Associates; 2003.
  12. Christenfeld NJS, Sloan RP, Carroll D, Greenland S. Risk factors, confounding, and the illusion of statistical control. Psychosom Med. 2004;66:868–75. https://doi.org/10.1097/01.psy.0000140008.70959.41.
  13. D’Onofrio BM, Sjölander A, Lahey BB, Lichtenstein P, Öberg AS. Accounting for confounding in observational studies. Annu Rev Clin Psychol. 2020;16:25–48. https://doi.org/10.1146/annurev-clinpsy-032816-045030.
  14. Fewell Z, Davey Smith G, Sterne JAC. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am J Epidemiol. 2007;166:646–55. https://doi.org/10.1093/aje/kwm165.
  15. Sorjonen K, Melin B, Ingre M. Accounting for expected adjusted effect. Front Psychol. 2020;11:542082. https://doi.org/10.3389/fpsyg.2020.542082.
  16. Westfall J, Yarkoni T. Statistically controlling for confounding constructs is harder than you think. PLoS ONE. 2016;11:e0152719. https://doi.org/10.1371/journal.pone.0152719.
  17. Dickinson ER, Adelson JL. Exploring the limitations of measures of students’ socioeconomic status (SES). Pract Assess Res Eval. 2014;19:1. https://doi.org/10.7275/mkna-d373.
  18. Liu J, Peng P, Luo L. The relation between family socioeconomic status and academic achievement in China: a meta-analysis. Educ Psychol Rev. 2020;32:49–76. https://doi.org/10.1007/s10648-019-09494-0.
  19. Sirin SR. Socioeconomic status and academic achievement: a meta-analytic review of research. Rev Educ Res. 2005;75:417–53. https://doi.org/10.3102/00346543075003417.
  20. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. 2021. https://www.R-project.org/.
  21. Revelle W. psych: procedures for personality and psychological research. Evanston: Northwestern University. 2020. https://CRAN.R-project.org/package=psych Version = 2.0.7.
  22. Pearl J. Causality: models, reasoning and inference. 2nd ed. Cambridge: Cambridge University Press; 2009.
  23. Herrnstein RJ, Murray CA. The bell curve: intelligence and class structure in American life. New York: Free Press; 1994.

Acknowledgements

Not applicable.

Funding

Open access funding provided by Karolinska Institute.

Author information

Contributions

KS, DF, ASW, BM, and GN conceived of the study; KS acquired data, carried out the statistical analyses and wrote an initial draft with assistance from GN; KS, DF, ASW, BM, and GN critically revised the manuscript; KS, DF, ASW, BM, and GN gave final approval for publication and agree to be held accountable for the work performed therein. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kimmo Sorjonen.

Ethics declarations

Ethical approval and consent to participate

In the present study, we have used the openly accessible 1997 National Longitudinal Survey of Youth (NLSY97) dataset. According to the homepage of the National Longitudinal Surveys (NLS, see link below), “The NLS program has established set procedures for ensuring respondent confidentiality and obtaining informed consent. These procedures comply with Federal law and the policies and guidelines of the U.S. Office of Management and Budget (OMB) and the U.S. Bureau of Labor Statistics … The U.S. Office of Management and Budget (OMB) reviews the procedures and questionnaires for each NLSY round”. https://www.nlsinfo.org/content/cohorts/nlsy97/intro-to-the-sample/confidentiality-informed-consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The expected effect of Y on True X when adjusting for X is given by (2) [11]:

$${\beta }_{Y,TrX.X}=\frac{{r}_{Y,TrX}-{r}_{X,Y}\times {r}_{TrX,X}}{1-{r}_{X,Y}^{2}}$$
(2)

If data is generated as in Fig. 3, correlations between Y and True X and between X and Y, respectively, are expected to be:

$${r}_{Y,TrX}={r}_{TrX,TrY}\times {r}_{TrY,Y}$$
(3)

$${r}_{X,Y}={r}_{TrX,TrY}\times {r}_{TrY,Y}\times {r}_{TrX,X}$$
(4)

We can replace terms in (2) with (3) and (4):

$${\beta }_{Y,TrX.X}=\frac{{r}_{TrX,TrY}\times {r}_{TrY,Y}-{r}_{TrX,TrY}\times {r}_{TrY,Y}\times {r}_{TrX,X}\times {r}_{TrX,X}}{1-{r}_{X,Y}^{2}}$$
(5)

(5) simplifies to:

$${\beta }_{Y,TrX.X}=\frac{{r}_{TrX,TrY}\times {r}_{TrY,Y}\times (1-{r}_{TrX,X}^{2})}{1-{r}_{X,Y}^{2}}$$
(6)

We can multiply the numerator and the denominator in (6) by rTrX,X:

$${\beta }_{Y,TrX.X}=\frac{{r}_{TrX,X}\times {r}_{TrX,TrY}\times {r}_{TrY,Y}\times (1-{r}_{TrX,X}^{2})}{{r}_{TrX,X}\times (1-{r}_{X,Y}^{2})}$$
(7)

The left part of the numerator in (7) equals rX,Y (see (4)) and we can simplify:

$${\beta }_{Y,TrX.X}=\frac{{r}_{X,Y}\times (1-{r}_{TrX,X}^{2})}{{r}_{TrX,X}\times (1-{r}_{X,Y}^{2})}$$
(8)

(8) is identical to Eq. (1) in the introduction.
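The derivation can be checked numerically. The sketch below is an illustration under assumptions, not part of the original analyses: it assumes that Fig. 3 describes two correlated true scores, each measured with independent error, and the parameter values, sample size, and function names (`corr`, `simulate_fig3`) are hypothetical. It compares the adjusted effect computed directly from (2) with the value implied by (8).

```python
import random
import statistics


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def simulate_fig3(n=50_000, r_true=0.6, load_x=0.8, load_y=0.8, seed=7):
    """Generate True X, X, and Y: correlated true scores plus independent error."""
    rng = random.Random(seed)
    tr_x, x, y = [], [], []
    for _ in range(n):
        tx = rng.gauss(0, 1)
        ty = r_true * tx + (1 - r_true ** 2) ** 0.5 * rng.gauss(0, 1)
        tr_x.append(tx)
        # Observed scores are the true scores plus independent measurement error.
        x.append(load_x * tx + (1 - load_x ** 2) ** 0.5 * rng.gauss(0, 1))
        y.append(load_y * ty + (1 - load_y ** 2) ** 0.5 * rng.gauss(0, 1))
    return tr_x, x, y


tr_x, x, y = simulate_fig3()
r_xy, r_trx_x, r_y_trx = corr(x, y), corr(tr_x, x), corr(y, tr_x)

# Eq. (2): effect of Y on True X when adjusting for X, using r_Y,TrX directly.
beta_eq2 = (r_y_trx - r_xy * r_trx_x) / (1 - r_xy ** 2)
# Eq. (8): the same effect expressed through r_X,Y and r_TrX,X only.
beta_eq8 = r_xy * (1 - r_trx_x ** 2) / (r_trx_x * (1 - r_xy ** 2))

print(f"Eq. (2): {beta_eq2:.3f}")
print(f"Eq. (8): {beta_eq8:.3f}")
```

If the assumed data generation matches Fig. 3, the two values should agree up to sampling error, since (8) only replaces rY,TrX with rX,Y / rTrX,X as implied by (3) and (4).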

Fig. 3

Assumed data generation (solid arrows) and analyzed adjusted effects (dashed arrows)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sorjonen, K., Falkstedt, D., Wallin, A.S. et al. Dangers of residual confounding: a cautionary tale featuring cognitive ability, socioeconomic background, and education. BMC Psychol 9, 145 (2021). https://doi.org/10.1186/s40359-021-00653-z

Keywords

  • Cognitive ability
  • Discrimination
  • Education
  • Residual confounding
  • Socioeconomic background
  • Switching predictors and outcomes