Challenging the myth - Does ethnic composition affect public school performance?
Authors: Daniel Juhász Vigild, Frederik Kromann Hansen & Lau Johansson
15th May 2020
Reading time: 8 min.
News articles claiming a connection between school performance and share of students with foreign backgrounds
In Denmark, a prevalent opinion in the public discourse and mainstream media is that public schools perform worse when they have a higher share of students from foreign backgrounds.
Is this true, or are there some deeper structural elements behind this narrative? Might social, economic and institutional variables impact this as well?
Public schools are the main arena for sowing the seeds of social mobility. This narrative stigmatises kids from an early age, which could create a self-fulfilling prophecy of poor performance among kids attending these schools.
Try finding your school in the interactive map below
Before diving into the analysis, check out the map below that shows an overview of the share of students from foreign backgrounds (in percent) for each school district. Try browsing through the different districts and see if you can find the one your school belongs to.
Going beyond the simple correlation
There are always at least two sides of a coin, but when you always see the same side, it is natural to believe that it is the only side. However, please hang in there and join us for a journey!
At first glance, it appears that the prevalent opinion regarding the effect of ethnic composition on school performance (a metric combining grade point average and student well-being) is confirmed if these are the only two parameters observed. But what if more dimensions are included in the analysis? Take a look at the scatterplot below, which implies a correlation between the share of students with foreign backgrounds and school performance. Try browsing through the tabs of the plot below and reflect on what you see.
A darker colour means higher average household income and a big circle means high levels of education held by parents of the students attending the school. Both dimensions seem to be correlated with school performance as well. Already, we are starting to turn the coin and show another side of the story.
Could it be that wealthier families move to school districts with better-performing public schools? Or maybe the best opportunity for most immigrants moving to Copenhagen is to live in areas where families are generally less well-positioned than elsewhere, and the general characteristics of families in these areas make it difficult for parents to support their kids' education?
The correlation between the share of students with foreign backgrounds and school performance is still there, but it may not necessarily be because the students don't have Danish ancestors. Maybe the impact of these other dimensions is more profound than the apparent effect of the share of students with foreign backgrounds?
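For readers who like to see the arithmetic, a correlation such as the one in the scatterplot can be summarised by a single number, the Pearson correlation coefficient. The sketch below is only an illustration: the values are invented, not taken from the Copenhagen data, and the names `foreign_share` and `performance` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented illustration values, NOT the article's data.
foreign_share = [5, 10, 20, 30, 45, 60]        # percent of students
performance = [7.9, 7.6, 7.1, 6.8, 6.5, 6.0]   # hypothetical performance score

r = pearson(foreign_share, performance)
print(f"r = {r:.2f}")
```

A value close to -1 means the two variables move in opposite directions almost in lockstep. It says nothing about why they do so, which is exactly the gap the rest of the analysis tries to fill.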
Let's get down to the nitty gritty...
Alright, after spending some time exploring the scatterplot above, it is now time to test our hunch: that school performance is better explained by factors other than the share of students from foreign backgrounds.
Now, to do this we need to establish a common language we can use to communicate the results of the test. When testing this type of problem, data scientists use something called linear regression. Some important fancy words: variables, coefficients, confidence intervals, significance.
- Variables are the things we are measuring, like the share of students from foreign backgrounds, average household income and parents' education.
- Coefficients are the estimated effects of the variables on our target variable (the thing we are explaining: school performance). Coefficients can be negative or positive.
- Confidence intervals accompany each coefficient and show the range in which we are 95 pct. sure that the true effect lies.
- Significance: if the confidence interval lies entirely above or entirely below zero, so that we are at least 95 pct. sure that the variable has either a positive or a negative impact on our target variable, we say that the variable has a significant effect.
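To make these words concrete, here is a minimal sketch of a one-variable linear regression in plain Python. It runs on synthetic data generated inside the script, not on the school data, and it uses a normal approximation for the 95 pct. confidence interval, which is an assumption that is reasonable only for fairly large samples. The function name `simple_ols` is made up for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def simple_ols(xs, ys, level=0.95):
    """Fit y = a + b*x; return the slope b and its approximate confidence interval."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = sqrt(ssr / (n - 2) / sxx)
    # Normal approximation to the t critical value (assumption: large n).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, (slope - z * std_err, slope + z * std_err)

# Synthetic data: the true effect of x on y is set to -0.03 by construction.
random.seed(0)
xs = [random.uniform(0, 60) for _ in range(200)]
ys = [8.0 - 0.03 * x + random.gauss(0, 0.5) for x in xs]

slope, (low, high) = simple_ols(xs, ys)
significant = low > 0 or high < 0  # the interval does not contain zero
print(f"coefficient = {slope:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
print("significant" if significant else "not significant")
```

The coefficient is the slope of the fitted line, the confidence interval is the slope plus or minus roughly two standard errors, and the effect counts as significant when that interval stays on one side of zero.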
Good. A quick example.
Look at the plot below of a fake model that explains weight-loss (target-variable):
Some of the variables are expected to impact weight-loss, like hours at the gym and portions of fast food eaten. Others, such as meetings attended and books read, are not. And this is exactly what the plot shows! If one spends more hours at the gym, it will probably result in a bigger weight-loss: a positive coefficient of high magnitude with a narrow confidence interval that does not overlap zero, indicating significance. Conversely for fast food: negative coefficient, high magnitude, narrow confidence interval, no overlap with zero. Finally, meetings attended and books read both have coefficients close to zero, with wide confidence intervals that overlap zero, indicating insignificance.
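The reading rule for such a plot fits in a few lines of code. In the sketch below the coefficients and intervals are invented to mirror the description of the fake model (they are not read off the plot), and `is_significant` is a hypothetical helper name.

```python
# Hypothetical coefficients and 95 pct. confidence intervals for the fake
# weight-loss model; the numbers are invented for illustration only.
fake_model = {
    "hours at the gym": (0.80, (0.60, 1.00)),
    "portions of fast food": (-0.70, (-0.95, -0.45)),
    "meetings attended": (0.02, (-0.30, 0.34)),
    "books read": (-0.01, (-0.25, 0.23)),
}

def is_significant(interval):
    """A coefficient is significant when its interval does not contain zero."""
    low, high = interval
    return low > 0 or high < 0

verdicts = {}
for name, (coef, interval) in fake_model.items():
    verdicts[name] = is_significant(interval)
    label = "significant" if verdicts[name] else "not significant"
    print(f"{name}: coefficient {coef:+.2f}, interval {interval} -> {label}")
```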
Cool beans! Now you know all you need to know to be able to understand the plot we are about to show. The plot will show the coefficients and confidence intervals for the share of students with a foreign background in three different models that explain the school performance metric. These models are:
Model 1: Where we only look at share of students with a foreign background.
- Share of students with foreign backgrounds
Model 2: Where we look at ethnic data and information about the institution.
- Share of students with foreign backgrounds
- Number of students attending the school
- Qualified coverage (share of hours taught by teachers with formal education in the subject)
Model 3: Where we look at ethnic data, institution information, and socio-economic conditions of the school district.
- Share of students with foreign backgrounds
- Number of students attending the school
- Qualified coverage
- Unemployment rate in school districts
- Average household income in school districts
- Education score (a metric describing the levels of education held by parents of the students, the higher the education score the more years spent in school)
Interact with the model below and see how the coefficients and confidence intervals change! (Use the tabs at the top of the plot to add data from the models)
What have we learned from this? Two things:
- In Model 2, where we have ethnic data and institution data, the coefficient of the share of students with foreign backgrounds barely changes. So, adding information about the institution explains almost nothing regarding school performance. We anticipated that qualified coverage would have an impact on performance. Its absence could be because the municipality and schools do a pretty good job of creating equal opportunities for successful teaching, distributing teacher resources across all schools in the municipality in a way that benefits the schools where performance has historically been lower.
- In Model 3, where we add average household income, unemployment and parents' education level, the coefficient of the share of students with foreign backgrounds takes a big leap and becomes positive and insignificant. So, school performance is explained by the socio-economic conditions in the school districts, not by the share of students with foreign backgrounds.
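The way a coefficient can shrink once a related variable enters the model can be reproduced on synthetic data. The sketch below is an illustration of the general mechanism, not a re-run of our models: the data are simulated so that income drives both performance and the share variable, while the share has no direct effect at all. All variables are in standardised units, only one of the socio-economic variables is included, the confidence intervals use a normal approximation, and the helper names `invert` and `ols` are made up for this example.

```python
import random
from math import sqrt
from statistics import NormalDist

def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols(columns, y):
    """OLS with intercept; returns (coefficient, 95% interval) per term, intercept first."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    z = NormalDist().inv_cdf(0.975)  # normal approximation (assumption)
    result = []
    for a in range(k):
        se = sqrt(sigma2 * inv[a][a])
        result.append((beta[a], (beta[a] - z * se, beta[a] + z * se)))
    return result

# Synthetic data: income affects both variables; share has NO direct effect.
random.seed(1)
n = 300
income = [random.gauss(0, 1) for _ in range(n)]
share = [-0.7 * inc + random.gauss(0, 0.7) for inc in income]
performance = [0.6 * inc + random.gauss(0, 0.8) for inc in income]

coef_a, ci_a = ols([share], performance)[1]          # share only
fit_b = ols([share, income], performance)            # share + income
coef_b, ci_b = fit_b[1]
coef_income, ci_income = fit_b[2]

print(f"share only:     {coef_a:+.2f}  CI [{ci_a[0]:+.2f}, {ci_a[1]:+.2f}]")
print(f"share + income: {coef_b:+.2f}  CI [{ci_b[0]:+.2f}, {ci_b[1]:+.2f}]")
print(f"income:         {coef_income:+.2f}  CI [{ci_income[0]:+.2f}, {ci_income[1]:+.2f}]")
```

In the share-only fit, the share variable picks up the effect of the omitted income variable and looks clearly negative. Once income is included, the share coefficient moves towards zero, because in this simulation it never had an effect of its own.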
Dive into a plot of Model 3 and check out the impact of each individual variable:
It seems that the share of students with foreign backgrounds is actually irrelevant for the performance of the school if we also take into account a few variables regarding the school itself, plus the average household income, unemployment and educational level in the school district. And now imagine ALL THE OTHER THINGS that could impact school performance as well. Just to name a few: parents' interest in their kids' lives, the relationship between teachers and students, parents' willingness and ability to help kids with their homework, the quality of the school's leadership, and teachers' skills within education and communication. None of these things are included in Model 3, and they are pretty darn hard to measure. But imagine the small, insignificant impact that the share of students with foreign backgrounds would have in that big, fully informed model. Probably just as small an impact as reading books has on weight-loss.
In our humble opinion, this is a pretty convincing case, and we highly encourage politicians and authorities to think about this - and other challenges - facing Danish citizens from foreign backgrounds with more statistical nuance. Simply looking at the correlation between two variables is an oversimplification, but the consequences of this oversimplification are felt every day in the communities of the people who are affected by the way the media, politicians and authorities talk about them. Shouldn't it be possible to do better?
Now that you have learned about the relations between the different variables, check out your local school district, or any other one in Copenhagen!
Find your school
The first map, in orange, shows the performance of the schools: an orange-red school district performs well. According to our analysis, the average household income affects performance. Look at the dark-green areas in the second map to identify the districts with the highest average household income. Dark purple areas in the third map represent districts with a high education score. Try to compare the three new maps with the original map of the share of students from foreign backgrounds.
Use the map below to identify the performance of the school you want to investigate. Use the tabs to navigate to the variables which affect the performance (average household income and education score). Did you forget the share of students from foreign backgrounds in your school district? Don't worry, you can find that map here as well! When you are done investigating the variables, you can go to the last tab, "School webpages". Click on the district and you will be taken to the school's .kk.dk webpage.
Summary
What have we learned from this analysis? We will sum up the main takeaways:
- The share of students with foreign backgrounds is irrelevant for the performance of the school if other variables are taken into account.
- Average household income is the only variable with a significant effect on school performance. Education score is almost significant, and its estimated impact is slightly larger in magnitude than that of average household income.
Thanks for reading!
References
www.djoef-forlag.dk/openaccess/samf/samfdocs/2009/2009_1/samf_2009_1_10.pdf
www.sondagsavisen.dk/politik-2/2016-04-01-her-er-der-flest-elever-med-udenlandsk-herkomst/
www.dst.dk/da/Statistik/Analyser/visanalyse?cid=29550
www.sondagsavisen.dk/politik-2/2015-10-23-mange-indvandrerskoler-er-for-ringe/
www.dingeo.dk/data/grundskoler/
https://uddannelsesstatistik.dk/Pages/main.aspx