For reference, the table below shows how ethnicities have been grouped in this analysis.
For analyses using the disaggregated (survey) categories, the reference category is “English / Welsh / Scottish / Northern Irish / British”.
For analyses using the aggregated categories, the reference category is “White British”.
| Ethnicity: Survey | Ethnicity: Aggregated | Ethnicity: Binary |
|---|---|---|
| English / Welsh / Scottish / Northern Irish / British | White British | White British |
| Irish | White other | Non-White British |
| Gypsy or Irish Traveller | White other | Non-White British |
| Roma | White other | Non-White British |
| Any other White background | White other | Non-White British |
| White and Black Caribbean | Mixed/Multiple ethnic group | Non-White British |
| White and Black African | Mixed/Multiple ethnic group | Non-White British |
| White and Asian | Mixed/Multiple ethnic group | Non-White British |
| Any other Mixed / Multiple ethnic background | Mixed/Multiple ethnic group | Non-White British |
| Indian | Asian/Asian British | Non-White British |
| Pakistani | Asian/Asian British | Non-White British |
| Bangladeshi | Asian/Asian British | Non-White British |
| Chinese | Asian/Asian British | Non-White British |
| Any other Asian background | Asian/Asian British | Non-White British |
| African | Black/African/Caribbean/Black British | Non-White British |
| Caribbean | Black/African/Caribbean/Black British | Non-White British |
| Any other Black, Black British, or Caribbean background | Black/African/Caribbean/Black British | Non-White British |
| Arab | Arab/British Arab | Non-White British |
| Any other ethnic group | Other ethnic group | Non-White British |
| Don’t think of myself as any of these | Prefer not to say | Non-White British |
| Prefer not to say | Prefer not to say | Non-White British |
| NA | Prefer not to say | Non-White British |
2 Chapter 2: How many outsourced workers are there in the UK?
2.1 How many UK workers are outsourced?
#how-many
Around 1 in 6 UK workers meet our definition of an outsourced worker
The ‘outsourced sub-group’ is the most dominant of the three sub-groups - meaning the total group is predominantly made up of people who self-identify as an outsourced worker and they say they are hired to do work that is long-term or ongoing. People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 67% (check) of our total outsourced group, or nearly 7 in 10. This group makes up X of all UK workers.
In terms of the different possible types of outsourced groups2, the numbers are as follows:
Definitely outsourced: 11%
Likely agency: 3%
High indicators: 3%
People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 68% of our total outsourced group. This group makes up 11% of all UK workers.
#non-exclusive-subgroups1
The two other sub-groups – the agency and indicators sub-groups – are less dominant in comparison. Around 58% of all respondents meet the criteria for either or both of these sub-groups, but this falls to around 33% if we exclude people who are already captured in the outsourced sub-group. Excluding the first sub-group, these other two groups make up X of all UK workers.
The percentages here refer to the number of people who are outsourced (super-ordinate group), not the total number of respondents. Below I provide percentages as a function of the outsourced super-ordinate group as well as the total sample.
Group criteria
Outsourced, defined as responding ‘I am sure I am outsourced’ or ‘I might be outsourced’, and responding ‘I do work on a long-term basis’.
Likely agency, defined as those responding ‘I am sure I am agency’ and ‘I do work on a long-term basis’, excluding those people who are already defined as being outsourced.
High indicators: defined as responding TRUE to 5 or 6 of the outsourcing indicators, as well as responding ‘I do work on a long-term basis’, excluding those people who are already defined as outsourced or likely agency.
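The three criteria above are applied in order, so each respondent lands in at most one group. A minimal sketch of that ordering follows. The field names and answer codes (`outsourced_answer`, `agency_answer`, `"sure"`, `"might"`) are hypothetical stand-ins for the survey variables, not the names used in the actual dataset.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical field names and answer codes, for illustration only.
    outsourced_answer: str  # "sure", "might" or "no"
    agency_answer: str      # "sure" or "no"
    long_term: bool         # 'I do work on a long-term basis'
    indicators_true: int    # number of the six outsourcing indicators answered TRUE


def classify(r: Response) -> str:
    """Assign one mutually exclusive group, checking the criteria in order."""
    if not r.long_term:
        # Every group requires long-term or ongoing work.
        return "Not outsourced"
    if r.outsourced_answer in ("sure", "might"):
        return "Outsourced"
    if r.agency_answer == "sure":
        # Only reached if the respondent is not already 'Outsourced'.
        return "Likely agency"
    if r.indicators_true >= 5:
        # Only reached if neither of the first two groups applies.
        return "High indicators"
    return "Not outsourced"
```

Checking the groups in this order is what makes them exclusive: someone who is sure they are agency but also says they might be outsourced is counted once, in the outsourced group.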
Including outsourced group
| agency_or_indicator | freq | n | total | perc | N |
|---|---|---|---|---|---|
| agency | 342.6956 | 344 | 10155 | 3.374649 | 10155 |
| both | 106.3656 | 116 | 10155 | 1.047421 | 10155 |
| indicator | 513.2645 | 516 | 10155 | 5.054303 | 10155 |
| neither | 9192.6744 | 9179 | 10155 | 90.523627 | 10155 |
Excluding outsourced group
| agency_or_indicator | freq | n | total | perc | N |
|---|---|---|---|---|---|
| agency | 231.43068 | 231 | 8993.922 | 2.5731897 | 9032 |
| both | 35.10624 | 38 | 8993.922 | 0.3903329 | 9032 |
| indicator | 280.74106 | 291 | 8993.922 | 3.1214531 | 9032 |
| neither | 8446.64421 | 8472 | 8993.922 | 93.9150243 | 9032 |
9.48% of the whole sample meet the criteria for either or both of these sub-groups. This falls to 6.08% if we exclude people who are already captured in the outsourced sub-group.
Out of those who are in the ‘outsourced’ status (i.e., the combination of the three outsourced groups), 57.99% meet the criteria for either or both of these sub-groups, but this falls to around 33.27% if we exclude people who are already captured in the outsourced sub-group.
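The whole-sample figures quoted above (9.48% and 6.08%) can be recovered from the weighted frequencies in the two tables. The sketch below assumes that `perc` is simply `freq / total * 100` and that "either or both" means every row except `neither`. The function name is ours.

```python
def share_in_either(freqs: dict[str, float], total: float) -> float:
    """Percentage of the weighted sample in the agency and/or indicator sub-groups."""
    in_either = sum(freq for group, freq in freqs.items() if group != "neither")
    return 100 * in_either / total


# Weighted frequencies copied from the tables above.
including = {"agency": 342.6956, "both": 106.3656,
             "indicator": 513.2645, "neither": 9192.6744}
excluding = {"agency": 231.43068, "both": 35.10624,
             "indicator": 280.74106, "neither": 8446.64421}

including_share = share_in_either(including, total=10155)
excluding_share = share_in_either(excluding, total=8993.922)

print(f"{including_share:.2f}% including the outsourced sub-group")
print(f"{excluding_share:.2f}% excluding the outsourced sub-group")
```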
#non-exclusive-subgroups2
There is some overlap between these sub-groups, but they are not like for like. Just over a quarter (27%) of respondents are in more than one sub-group, while nearly three quarters (73%) of respondents are uniquely captured in just one of the three sub-groups.
Just over a quarter (26.35%) of respondents are in more than one sub-group, while nearly three quarters (73.65%) of respondents are uniquely captured in just one of the three sub-groups.3
2.2 Evaluating our total estimate
#evaluating-total-estimate To do
Around 1 in 4 “outsourced” respondents sit in more than one sub-group within our definition, but around 3 in 4 are uniquely captured in just one of the three sub-groups - predominantly in the outsourced sub-group.
As figure X shows, not all respondents in the outsourced sub-group said yes five or six of our six outsourcing
3.2 Evidence paints a racialised picture of outsourcing in the UK, with links to both ethnicity and migration
#ethnicity
More than 1 in 4 (nearly 1/3) outsourced workers are from an ethnic minority background
Workers from ethnic minority backgrounds are disproportionately over-represented in outsourced work in the UK, and typically more likely to be outsourced than White British workers.
Overall, 22% of non-outsourced workers are from an ethnic minority background, rising to 33% of outsourced workers – a more than ten percentage point difference. This means that while just over 1 in 5 non-outsourced workers in our sample were from an ethnic minority background, nearly 1 in 3 outsourced workers were.
People from an ethnic minority background are overall 1.75 times more likely to be outsourced than people from a White British background.
Workers from Arab backgrounds are 3.39 times more likely than White workers to be outsourced; (check sample size – are we confident in all of these significance tests, or should we just use some of them in these bullet points?)
Workers from Black backgrounds are 2.33 times more likely than White workers to be outsourced.
Workers from Asian backgrounds are 1.98 times more likely than White workers to be outsourced.
Workers from Mixed Ethnicity backgrounds are 1.86 times more likely than White workers to be outsourced.
White other workers are 1.32 times more likely than White British workers to be outsourced.
People from an ethnic minority are 1.75 times more likely to be outsourced than people from a White British background; 33.09% of outsourced workers are from an ethnic minority, compared to 21.99% of non-outsourced workers.16
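As a rough cross-check, the 1.75 figure can be approximated from the two reported shares, assuming that "times more likely" is an odds ratio (as the logistic regression output elsewhere in this document suggests) rather than a ratio of probabilities. The sketch works from the rounded percentages, so it will only approximately match a weighted model estimate. The function name is ours.

```python
def odds_ratio(share_in_group_a: float, share_in_group_b: float) -> float:
    """Odds ratio for a binary trait, given its share (0-1) in two groups."""
    odds_a = share_in_group_a / (1 - share_in_group_a)
    odds_b = share_in_group_b / (1 - share_in_group_b)
    return odds_a / odds_b


# Share of workers from an ethnic minority background, from the text above.
minority_share_outsourced = 0.3309
minority_share_not_outsourced = 0.2199

ethnicity_or = odds_ratio(minority_share_outsourced, minority_share_not_outsourced)
print(f"Odds ratio: {ethnicity_or:.2f}")
```

Because an odds ratio is symmetric, the same number describes both "outsourced workers are more likely to be from an ethnic minority" and "ethnic minority workers are more likely to be outsourced".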
Overall, there is no interaction between being from a minority and outsourced on whether you are low paid. i.e., being from an ethnic minority and outsourced is not associated with being in the low pay group.17
However there is nuance in the groups. We do find evidence to suggest that among White British people, outsourced people are 1.35 times more likely to be in the low income group compared to non-outsourced people, and among Mixed ethnicity people, outsourced people are 2.8 times more likely to be in the low income group compared to non-outsourced people.18
Looking at this with disaggregated ethnicities indicates that among “English / Welsh / Scottish / Northern Irish / British” workers, outsourced people are 1.35 times more likely to be in the low income group compared to non-outsourced people. Among “White and Asian” workers, outsourced workers are 7.66 times more likely to be in the low income group compared to non-outsourced workers.19
Ethnicity (binary) by outsourcing status and income group (%)

| outsourcing_status | income_group | White British | Non-White British |
|---|---|---|---|
| Not outsourced | Not low | 78.53 | 21.47 |
| Not outsourced | Low | 80.18 | 19.82 |
| Outsourced | Not low | 64.95 | 35.05 |
| Outsourced | Low | 68.67 | 31.33 |
Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others20:
Arab/British Arab workers are 3.386 times more likely than White British workers to be outsourced.
Asian/Asian British workers are 1.982 times more likely than White British workers to be outsourced.
Black/African/Caribbean/Black British workers are 2.334 times more likely than White British workers to be outsourced.
Mixed/Multiple ethnic group workers are 1.865 times more likely than White British workers to be outsourced.
Prefer not to say workers are 1.389 times more likely than White British workers to be outsourced.
White other workers are 1.315 times more likely than White British workers to be outsourced.
Comparison of more disaggregated ethnicities indicates more nuance21:
Any other White background workers are 1.41 times more likely than White British workers to be outsourced.
White and Black African workers are 4.12 times more likely than White British workers to be outsourced.
Any other Mixed / Multiple ethnic background workers are 2.73 times more likely than White British workers to be outsourced.
Indian workers are 1.79 times more likely than White British workers to be outsourced.
Pakistani workers are 3.23 times more likely than White British workers to be outsourced.
Bangladeshi workers are 2.48 times more likely than White British workers to be outsourced.
Any other Asian background workers are 2.18 times more likely than White British workers to be outsourced.
African workers are 2.57 times more likely than White British workers to be outsourced.
Any other Black, Black British, or Caribbean background workers are 2.65 times more likely than White British workers to be outsourced.
Arab workers are 3.39 times more likely than White British workers to be outsourced.
#ethnicity-sub-group
These differences in ethnicity also shift slightly depending on which outsourced “sub-group” we look at. For example, compared to White British workers, Black outsourced workers are more likely to be in the “outsourced sub-group” meaning they have self-identified as outsourced, or the “agency sub-group”, meaning they are agency workers doing more long-term and ongoing work. Are there any other interesting points to mention here? Should we do a chart showing this different across sub-groups? Do we need an interpretive comment in this section?
Breaking down by outsourcing group helps to separate out the type of outsourced work people from the ethnicities identified above engage in.22 Compared to White British workers,
Arab people are more likely to be likely agency or high indicators
Asian people are more likely to be in any of the groups
Black people are more likely to be likely agency or outsourced
People of mixed ethnicity are more likely to be outsourced
People who selected Other ethnicity are more likely to be agency
White other people are more likely to be outsourced
Disaggregated ethnicities add more nuance.23 The table below shows the likelihood of workers of different ethnicities falling into each of the outsourcing groups, compared to White British workers. Note that only significant relationships are shown here. Note also that the ‘n’ for many of these statistics is very low, so many of them are illustrative rather than inferential.
Likelihood of belonging to different groups compared to White British. Note: NAs are non-sig. relationships. 'n_' is sample size, 'freq_' is weighted sample size
| Ethnicity | Outsourced | Likely agency | High indicators | n_Outsourced | n_Likely agency | n_High indicators | freq_Outsourced | freq_Likely agency | freq_High indicators |
|---|---|---|---|---|---|---|---|---|---|
| Gypsy or Irish Traveller | NA | 0.00 | 0.00 | 2 | NA | NA | 2.48 | NA | NA |
| Any other White background | 1.59 | NA | NA | 63 | 10 | 7 | 72.25 | 13.33 | 8.37 |
| White and Black African | 4.59 | NA | NA | 21 | 2 | 3 | 11.08 | 0.91 | 2.62 |
| Any other Mixed / Multiple ethnic background | NA | 4.87 | NA | 15 | 5 | 3 | 9.84 | 4.33 | 1.71 |
| Indian | 1.57 | NA | 2.64 | 32 | 8 | 15 | 43.96 | 11.83 | 18.18 |
| Pakistani | 2.88 | 3.83 | 4.11 | 29 | 8 | 12 | 32.69 | 9.74 | 11.43 |
| Bangladeshi | 2.84 | NA | NA | 15 | 3 | 3 | 17.95 | 2.61 | 2.48 |
| Any other Asian background | 2.17 | 2.66 | NA | 17 | 5 | 4 | 30.35 | 8.34 | 6.10 |
| African | 2.54 | 3.09 | NA | 74 | 22 | 15 | 47.20 | 12.82 | 9.93 |
| Any other Black, Black British, or Caribbean background | 3.13 | NA | NA | 13 | 1 | 2 | 9.46 | 1.16 | 1.16 |
| Arab | NA | 6.30 | 6.15 | 3 | 2 | 2 | 4.97 | 3.42 | 3.63 |
| Any other ethnic group | NA | 6.35 | NA | 1 | 1 | 1 | 1.52 | 3.93 | 1.60 |
| Don’t think of myself as any of these | NA | NA | 0.00 | 4 | 1 | NA | 2.54 | 0.40 | NA |
| Prefer not to say | NA | NA | 6.94 | 1 | 1 | 2 | 1.67 | 0.52 | 4.72 |
#ethnicity-pay-split
On the low-pay / high-pay split, you say “A person is more likely to be in the low income group if they are: Older; Female; Prefer not to say when they arrived, And less likely if they are: Asian/Asian British; Live in North West or Wales; Arrived in the UK in last 30 years”; Can I confirm this means we don’t see any other significant differences in the ethnicity breakdown if we look at high paid vs low paid workers? If so, let’s clarify what this says about how ethnicity relates to a) outsourced workers being disproportionately low paid, but b) ethnic minority workers being no more likely to be in our low pay group.
Using the new ethnicity groupings, there is no evidence indicating that any ethnicity is more or less likely to be in the low income group
Note to self: This could benefit from stepwise regression
A person is more likely to be in the low income group if they are:
Older
Female
Don’t have a degree (or don’t know if they have a degree?)
Are outsourced
Arrived in the UK in the last year
And less likely if they are:
Younger
Male
Have a degree
Live in the North West or Wales (compared to London)
Arrived in the UK in last 30 years
#migration
As you would expect, the vast majority of outsourced workers were born in the UK. However, we still see a significantly higher likelihood of outsourced workers having been born outside of the UK compared to people who aren’t outsourced. While around 14% of non-outsourced workers were born outside of the UK, this rose to just over 24% for outsourced workers – or nearly 1 in 4.
Overall, people who were born outside of the UK are 1.94 times more likely to be in outsourced work than people who were born here.
As for non-outsourced workers, the vast majority of outsourced workers are born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. 24.13% of outsourced workers are not born in the UK, compared to 14.08% of non-outsourced workers.24 This difference is statistically significant; outsourced workers are 1.94 times more likely to have been born outside the UK than non-outsourced workers.25
#migration-sub-groups
This pattern broadly holds across our three outsourcing sub-groups, with nearly no difference in the likelihood of people born outside of the UK being in any one of the three groups.
#ethnicity-migration-interaction. Some attention needed here
Among all workers who were born in the UK:
Black workers are 2.01 times more likely to be outsourced than a White worker
Asian workers are 2.02 times more likely to be outsourced than a White worker.
Workers from Other ethnic backgrounds are X times more likely to be outsourced than a White other worker
For workers born outside of the UK:
Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK.
Among workers from Mixed ethnic backgrounds, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
For workers from other ethnicities, it doesn’t matter whether you are born in the UK or not – you are equally likely as a Black or an Asian worker to be outsourced, whether you were born in the UK or somewhere else. And compared to a White person born in the UK, Black African and South Asian workers specifically are more likely to be outsourced, whether or not they were born in the UK . Does this need any further detail or explanation
To discuss confidence in our interpretation in this section: The evidence on ethnicity and country of birth clearly paints a racialised picture of outsourcing, and one with colonial undertones, as Black African and South Asian workers see a higher risk of being outsourced compared to White British workers, regardless of their country of birth. This obviously raises further questions about why, linked to (sector, occupation, labour market inequality and structural racism). Discuss the draft interpretation in the comment on the right.
However, workers from non-White ethnic groups are not the only workers who see a higher risk of being outsourced: Non-UK-born White workers are also more likely to be outsourced than UK-born White people . Ethnicity and country of birth interact independently for some groups, but seem to be fundamentally connected for others.
Exploring the intersection of ethnicity and arrival time reveals some patterns whereby the likelihood of a person being outsourced is related to the combinations of ethnicity and whether they were born in the UK.26 The plot below shows that
Among workers born in the UK, a Black worker is 2.01 times more likely to be outsourced than a White British worker.
Among workers born in the UK, an Asian worker is 2.03 times more likely to be outsourced than a White British worker.
Among workers born in the UK, an Other ethnicity worker is 5.63 times more likely to be outsourced than a White other worker.
Among workers not born in the UK, a White other worker is 0.58 times as likely (i.e., less likely) to be outsourced as a White British worker.
Among workers not born in the UK, a White other worker is 0.53 times as likely (i.e., less likely) to be outsourced as a Black worker.
Among workers not born in the UK, a White other worker is 0.37 times as likely (i.e., less likely) to be outsourced as a worker of mixed ethnicity.
Among White British workers, someone not born in the UK is 2.48 times more likely to be outsourced than someone born in the UK.
Among Mixed workers, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
Among people who preferred not to say their ethnicity, someone not born in the UK is 1.95 times as likely (i.e., 95% more likely) to be outsourced as someone born in the UK.
For people born in UK, if you are Pakistani you are more likely to be outsourced than if you are White.
For White people and for White and Asian people, if you’re not born in UK you’re more likely to be outsourced.
#migration-by-pay-split
If we do a basic “born UK / not born UK” split, looking by low and high pay, what % of the low-paid workers group were born outside of the UK, vs in the high-paid group?
20.96% of outsourced workers in the low pay group were not born in the UK, compared to 26.39% of people in the not low pay group. This difference is marginally statistically significant; someone in the low income group is less likely to be born outside the UK than someone in the not low income group. This pattern is the same for non outsourced workers, and when we consider the interaction between outsourcing status and migration status, the only factor predicting income group is outsourcing status.
3.3 Outsourced workers are on average younger than non-outsourced workers
#age
We find that outsourced workers are significantly younger than non-outsourced workers, on average. The median age of an outsourced worker is 36, compared to a median age of 43 for a non-outsourced worker.
The outsourced and indicator sub-groups – people who directly said that they were or might be outsourced, or ticked a high number of our indicators of outsourced working – see higher proportions of younger workers than the “agency” sub-group.
#age-violin
INSERT VIOLIN PLOT CHART HERE SHOWING MEDIAN AGE OF EACH SUB-GROUP, COMPARED TO NON-OUTSOURCED WORKERS. Is this necessary? We already have the density plots
Outsourced workers are on average younger than non-outsourced workers. The median age of the outsourced group is 36, compared to 43 for the not outsourced group.28 This difference is statistically significant.29
| Outsourcing group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|
| Not outsourced | 42.80 | 43 | 16 | 80 | 13.08 | 8472 |
| Outsourced | 38.63 | 36 | 16 | 78 | 13.07 | 1683 |
The higher concentration of younger workers identified above appears to be driven primarily by the ‘outsourced’ and ‘high indicator’ groups, whilst the ‘likely agency’ group follows a similar pattern to the non-outsourced group.30
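| Outsourcing status | Income group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|---|
| Not outsourced | Not low | 41.97 | 41 | 18 | 78 | 12.47 | 5280 |
| Not outsourced | Low | 42.87 | 43 | 16 | 80 | 15.09 | 1644 |
| Outsourced | Not low | 37.96 | 35 | 18 | 77 | 12.53 | 986 |
| Outsourced | Low | 39.05 | 37 | 16 | 78 | 14.06 | 381 |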
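| Outsourcing group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|
| Not outsourced | 42.80 | 43 | 16 | 80 | 13.08 | 8472 |
| Outsourced | 38.40 | 35 | 16 | 78 | 13.09 | 1123 |
| Likely agency | 39.80 | 38 | 18 | 77 | 13.49 | 269 |
| High indicators | 38.49 | 35 | 18 | 72 | 12.55 | 291 |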
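One quick consistency check on the age tables is to pool the three sub-group means and compare the result with the mean reported for the combined outsourced group (38.63). The sketch below weights each mean by its unweighted N, which is an assumption on our part: the reported means appear to be survey-weighted, so the pooled figure should land close to the reported one without matching it exactly.

```python
# (mean age, N) for each outsourced sub-group, copied from the table above.
subgroups = {
    "Outsourced": (38.40, 1123),
    "Likely agency": (39.80, 269),
    "High indicators": (38.49, 291),
}

total_n = sum(n for _, n in subgroups.values())
pooled_mean = sum(mean * n for mean, n in subgroups.values()) / total_n

print(f"Combined N: {total_n}")
print(f"Pooled mean age: {pooled_mean:.2f}")
```

The combined N matches the 1683 reported for the outsourced group as a whole, which confirms the three sub-groups partition it.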
| Outsourcing group | Income group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|---|
| Not outsourced | Not low | 41.97 | 41.00 | 18 | 78.0 | 12.47 | 5280 |
| Not outsourced | Low | 42.87 | 43.00 | 16 | 80.0 | 15.09 | 1644 |
| Outsourced | Not low | 37.81 | 34.52 | 18 | 67.0 | 12.57 | 625 |
| Outsourced | Low | 39.07 | 37.00 | 16 | 78.0 | 13.89 | 272 |
| Likely agency | Not low | 39.33 | 38.00 | 18 | 77.0 | 12.66 | 168 |
| Likely agency | Low | 39.35 | 37.00 | 19 | 71.5 | 15.66 | 63 |
| High indicators | Not low | 37.29 | 35.00 | 18 | 65.0 | 12.25 | 193 |
| High indicators | Low | 38.42 | 34.59 | 19 | 67.0 | 12.82 | 46 |
#gender
The evidence also finds meaningful differences by gender between the outsourced and non-outsourced groups in our data. Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce, a nearly 10 percentage point difference.
Outsourced workers are 1.44 times more likely to be male than female.
The group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Comparison of outsourced and non-outsourced workers finds that
Someone in the high indicators sub-group is 2.18 times more likely to be male than female.
Someone in the agency sub-group is 1.45 times more likely to be male than female.
Someone in the outsourced sub-group is 1.31 times more likely to be male than female.
#gender-sector
Possible addition: Will readers want to know more about how this intersects with the roles or sectors with higher rates of outsourcing – even if this is just an interpretive comment from us on how gender interacts with jobs and sectors more generally in the labour market?
The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.31 Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are 1.44 times more likely to be male than female.32
Call:
glm(formula = outsourcing_status ~ Gender, family = "quasibinomial",
data = data, weights = NatRepemployees)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.78606 0.03987 -44.792 < 0.0000000000000002 ***
GenderMale 0.36421 0.05365 6.788 0.000000000012 ***
GenderOther 0.20126 0.68008 0.296 0.767
GenderPrefer not to say -0.24251 0.38958 -0.622 0.534
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 1.000395)
Null deviance: 9201.8 on 10154 degrees of freedom
Residual deviance: 9153.9 on 10151 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
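The "1.44 times more likely" figure is the exponentiated `GenderMale` coefficient from the model output above. The sketch below reproduces it and adds an approximate 95% interval. The interval uses a normal approximation (z = 1.96), which is our assumption and is not reported in the output; with over 10,000 residual degrees of freedom it should be very close to the t-based interval.

```python
import math


def odds_ratio_with_ci(estimate: float, std_error: float, z: float = 1.96):
    """Exponentiate a logit coefficient and its approximate 95% Wald interval."""
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return math.exp(estimate), lower, upper


# GenderMale row from the glm output above (reference category: female).
male_or, male_lower, male_upper = odds_ratio_with_ci(0.36421, 0.05365)

print(f"Odds ratio: {male_or:.2f} (approx. 95% CI {male_lower:.2f} to {male_upper:.2f})")
```

The interval excludes 1, which is consistent with the very small p-value reported for `GenderMale`.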
Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Statistically speaking, compared to a not outsourced person,
Someone in the high indicators group is 2.18 times more likely to be male than female.
Someone in the likely agency group is 1.45 times more likely to be male than female.
Someone in the outsourced group is 1.31 times more likely to be male than female.
Additionally, people identifying as ‘Other’ gender are absent from the high indicators and likely agency groups, though given the small N (14) for this group, this finding is unlikely to be meaningful.
3.4 Outsourced workers are more likely to work in some sectors than others; but seem to be spread across the labour market
#sectors
The three most common sectors for outsourced workers in our survey to be employed within – excluding those with an N size below X (50?) – were administrative and support service activities; water supply, sewerage, waste supply and remediation activities; and other service activities
Five of the twenty employment sectors have at least 1 in 5 of their workforce “outsourced”: more than the average of around 17% across the whole workforce.
Here we explore what proportion of workers in each sector are outsourced.33
The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e. this is showing what sectors have higher and lower proportions of outsourced workers.
The top three Sectors with the highest proportion of outsourced workers are:
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
Note that an undefined sector (‘Not found’) contained one of the largest proportions of outsourced workers (31% of workers in the ‘Not found’ category were outsourced).
A key takeaway here is that whereas the total outsourced population is 17%, this figure varies by sector, from 0% for Mining… and Extraterritorial organisations… all the way to 36% for Activities of households as employers, with 5 out of 20 sectors having at least 20% of their workforce outsourced.
#sectors-ogroup
Figure X also shows how the total outsourced group in each sector splits into our three outsourced “sub-groups”. We find – as you might expect, based on its dominance within the group of outsourced workers – that outsourced workers in every sector are most likely to be in the “outsourced sub-group”, i.e. those who self-identified as outsourced workers.
4 Pay
#pay
Using regression analysis, we find that outsourced workers are on average paid £2170 less than non-outsourced workers.
The “outsourced sub-group” earns £3,813 less, and the “agency sub-group” £2,603 less, than the non-outsourced group. This finds that pay is lowest in the “outsourced sub-group” of workers, i.e. those who directly identified themselves as being outsourced. Figure X below shows the median and distribution of pay across the three outsourced sub-groups and the non-outsourced group, for comparison.
#pay-violin
Violin plot for the above
The tables and plots below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that outsourced workers are on average paid £2170 less annually than non-outsourced workers.34 Per week, outsourced workers are on average paid £47 less than non-outsourced workers
The tables and plots below show descriptive statistics on income and its distribution for outsourced groups. Only the full outsourced subgroup has lower income than non-outsourced people. Regression analysis shows that outsourced workers are on average paid £3100 less annually than non-outsourced workers.36 Per week, outsourced workers are on average paid £67 less than non-outsourced workers.
This difference increases to £2951 annually (£63 per week) when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.38 This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average, and controlling for each of the other variables in the model:
Annually:
Men earn £7028 more than women.
People who have a degree earn £8195 more than people without a degree.
Workers in all non-London regions earn less than workers in London
East Midlands: -£5770
East of England: -£4074
North East: -£4850
North West: -£4476
Northern Ireland: -£6546
Scotland: -£5466
South East: -£3406
Wales: -£5366
West Midlands: -£5002
Yorkshire and the Humber: -£5524
People who arrived in the UK within the last year earn £6136 less than people born in the UK
People who arrived in the UK within the last 3 years earn £2392 less than people born in the UK
People who arrived in the UK within the last 5 years earn £2031 less than people born in the UK
People who arrived within the last 30 years earn £3501 more than people born in the UK.
Weekly:
People who have a degree earn £176 more than people without a degree.
Workers in all non-London regions earn less than workers in London
East Midlands: -£124
East of England: -£88
North East: -£104
North West: -£96
Northern Ireland: -£141
Scotland: -£117
South East: -£73
Wales: -£115
West Midlands: -£107
Yorkshire and the Humber: -£119
People who arrived in the UK within the last year earn £132 less than people born in the UK
People who arrived in the UK within the last 3 years earn £51 less than people born in the UK
People who arrived in the UK within the last 5 years earn £44 less than people born in the UK
People who arrived within the last 30 years earn £75 more than people born in the UK.
4.1 Gender pay gap
#gender-pay-gap
On average within our sample, male workers earn £6400 more than female workers per year; but further exploration of how pay relates to gender for outsourced workers suggests that this gender pay gap doesn’t differ in a statistically significant way depending on whether workers are outsourced or not
For female outsourced workers, this suggests that being an outsourced worker neither exacerbates nor diminishes the gender pay gap they face compared to male workers. Check what this controls for
Exploring the gender pay gap by outsourcing status indicates that the annual pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £5800.82 less than males per year. For outsourced workers, females are paid £6399.5 less than males per year. The difference between non-outsourced and outsourced workers is not significant.
The same holds for weekly pay: the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £124.63 less than males per week. For outsourced workers, females are paid £137.5 less than males per week. The difference between non-outsourced and outsourced workers is not significant.
The gender by outsourcing status interaction is also not relevant for whether a worker is low income (i.e. it has a non-significant relationship with income_group).
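The comparison of gender pay gaps by outsourcing status can be illustrated with an interaction model. The sketch below is not the report's model: it uses simulated data, hypothetical variable names (`income_weekly`, `gender`, `outsourcing_status`), no survey weights and no control variables. It only shows how the two gaps and the test of their difference can be read off the coefficients.

```r
# Simulated data only: names and effect sizes are hypothetical, not the survey data
set.seed(1)
n <- 2000
sim <- data.frame(
  gender = factor(sample(c("Male", "Female"), n, replace = TRUE),
                  levels = c("Male", "Female")),
  outsourcing_status = factor(sample(c("Not outsourced", "Outsourced"), n, replace = TRUE),
                              levels = c("Not outsourced", "Outsourced"))
)

# Build in a gender gap and an outsourcing penalty, but no interaction between them
sim$income_weekly <- 600 -
  125 * (sim$gender == "Female") -
  50 * (sim$outsourcing_status == "Outsourced") +
  rnorm(n, sd = 150)

mod <- lm(income_weekly ~ gender * outsourcing_status, data = sim)
coefs <- coef(summary(mod))
interaction_term <- "genderFemale:outsourcing_statusOutsourced"

# Gender gap among non-outsourced workers (the reference level of outsourcing_status)
gap_not_outsourced <- coefs["genderFemale", "Estimate"]
# Gender gap among outsourced workers = main effect + interaction term
gap_outsourced <- gap_not_outsourced + coefs[interaction_term, "Estimate"]
# The p-value of the interaction term tests whether the two gaps differ
interaction_p <- coefs[interaction_term, "Pr(>|t|)"]
```

A non-significant interaction term corresponds to the statement that the gap does not differ by outsourcing status, even when both gaps are themselves clearly different from zero.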
A person is more likely to be in the low income group if they:
Are older
Are female
Don’t have a degree (or don’t know if they have a degree?)
Are outsourced
Arrived in the UK in the last year
And less likely if they:
Are younger
Are male
Have a degree
Live in the North West or Wales (compared to London)
Arrived in the UK in the last 30 years
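Statements of the form "more likely" and "less likely" above are typically read from the odds ratios of a logistic regression. The following is a minimal sketch on simulated data, assuming a binary `low_income` outcome and three hypothetical 0/1 predictors. The real model's variables, weights and reference categories are not reproduced here.

```r
# Simulated data only: predictors and effect sizes are hypothetical
set.seed(2)
n <- 3000
sim <- data.frame(
  female = rbinom(n, 1, 0.5),
  has_degree = rbinom(n, 1, 0.4),
  outsourced = rbinom(n, 1, 0.17)
)

# Assumed log-odds of being in the low income group
log_odds <- -1 + 0.6 * sim$female - 0.8 * sim$has_degree + 0.5 * sim$outsourced
sim$low_income <- rbinom(n, 1, plogis(log_odds))

mod <- glm(low_income ~ female + has_degree + outsourced,
           data = sim, family = binomial)

# Exponentiated coefficients are odds ratios:
# above 1 means more likely to be low income, below 1 means less likely
coefs <- coef(summary(mod))
odds_ratios <- round(exp(coefs[, "Estimate"]), 3)
```

An odds ratio is relative to the reference category of each predictor, so the choice of reference (for example London for region) determines how each item in the list should be read.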
#gender-by-pay-split
Is there already a basic low / high pay split for gender? I know you talk about women being more likely to be in the low-paid group, but again not sure if there is just a basic “women make up x% of low pay group and x% of not low pay group”?
60.34% of outsourced workers in the low pay group were female, compared to 35.85% of outsourced workers in the not low pay group. This difference is statistically significant; women are more likely to be in the low income group. This pattern is the same for non-outsourced workers, and there is no interaction effect; irrespective of outsourcing status, women are more likely to be low paid, and irrespective of gender, outsourced people are more likely to be low paid.
#pay-gap-sector
Overall, we find that workers in administrative and support service activities – one of the dominant sectors for outsourced workers in this research – are more likely to be lower-paid than non-outsourced workers in the same sector. The same is true for outsourced water supply (full name; sewerage, waste etc.) workers – another prominent outsourcing sector – information and communication, transportation and storage, and education workers, amongst others. In contrast, we find outsourced workers in financial and insurance activities, for example, appear to be slightly higher paid on average than their non-outsourced counterparts; however, this is one of the few sectors in which this appears to be the case (to be confirmed).
I don’t quite understand the chart below the above chart in the file, would you be able to explain it – thanks! Is this the best chart to use, above? Does this need to control for anything else to show us the most accurate analysis of pay by sector for outsourced and non outsourced, or are we confident that this is showing us something notable about sector and pay?
4.2 Sectors/occupations
4.2.1 Sector and occupation hierarchy
The data from Opinium has four variables relating to sectors/occupations. These are
SectorName
Majorgroupcode
MajorsubgroupOccupation
UnitOccupation
SOC 2020 has nine major groups, 26 sub-major groups, 104 minor groups and 412 unit groups. The variables we have appear to map in the following way:
Majorgroupcode = the 9 ‘major groups’
MajorsubgroupOccupation = the 26 ‘sub-major’ groups
UnitOccupation = the 104 ‘minor groups’
This last pairing is the point of confusion. The ‘UnitOccupation’ wording came from Opinium and these categories match the coding index where they are confusingly referred to as ‘unit groups’ even though they are the minor groups.
There is no variable in our data that relates to the most disaggregated category, the 412 ‘unit groups’.
The unique values of each variable are shown in each section below.
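One way to sanity-check the mapping described above is to count the distinct values of each variable against the SOC 2020 group counts and to confirm that the levels nest. The sketch below uses a tiny made-up data frame with the same column names. A real sample may contain fewer groups than SOC defines, so the check is "no more than" rather than "equal to".

```r
# SOC 2020 group counts for the three levels the variables appear to map to
soc_counts <- c(Majorgroupcode = 9, MajorsubgroupOccupation = 26, UnitOccupation = 104)

# Toy rows only: a stand-in for the survey data, using labels that appear in the lists below
toy <- data.frame(
  Majorgroupcode = c("PROFESSIONAL OCCUPATIONS", "PROFESSIONAL OCCUPATIONS",
                     "ELEMENTARY OCCUPATIONS", "ELEMENTARY OCCUPATIONS"),
  MajorsubgroupOccupation = c("HEALTH PROFESSIONALS", "HEALTH PROFESSIONALS",
                              "ELEMENTARY ADMINISTRATION AND SERVICE OCCUPATIONS",
                              "ELEMENTARY ADMINISTRATION AND SERVICE OCCUPATIONS"),
  UnitOccupation = c("Nursing Professionals", "Medical Practitioners",
                     "Elementary Cleaning Occupations", NA)
)

# Number of distinct non-missing values per variable
n_levels <- sapply(toy[names(soc_counts)], function(x) length(unique(x[!is.na(x)])))

# Each variable should have no more levels than the SOC level it maps to
within_soc <- n_levels <= soc_counts

# Nesting check: every sub-major group should sit under exactly one major group
parents_per_submajor <- tapply(toy$Majorgroupcode, toy$MajorsubgroupOccupation,
                               function(x) length(unique(x)))
is_nested <- all(parents_per_submajor == 1)
```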
4.2.1.1 SectorName
SectorName_labelled
FINANCIAL AND INSURANCE ACTIVITIES
WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES
TRANSPORTATION AND STORAGE
MANUFACTURING
INFORMATION AND COMMUNICATION
ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY
CONSTRUCTION
PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY
PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
HUMAN HEALTH AND SOCIAL WORK ACTIVITIES
EDUCATION
OTHER SERVICE ACTIVITIES
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
ACCOMMODATION AND FOOD SERVICE ACTIVITIES
AGRICULTURE, FORESTRY AND FISHING
MINING AND QUARRYING
ARTS, ENTERTAINMENT AND RECREATION
REAL ESTATE ACTIVITIES
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US
ACTIVITIES OF EXTRATERRITORIAL ORGANISATIONS AND BODIES
Not found
4.2.1.2 Majorgroupcode
These are the 9 major groups according to SOC
Majorgroupcode_labelled
ADMINISTRATIVE AND SECRETARIAL OCCUPATIONS
MANAGERS, DIRECTORS AND SENIOR OFFICIALS
ELEMENTARY OCCUPATIONS
SALES AND CUSTOMER SERVICE OCCUPATIONS
ASSOCIATE PROFESSIONAL OCCUPATIONS
SKILLED TRADES OCCUPATIONS
PROFESSIONAL OCCUPATIONS
PROCESS, PLANT AND MACHINE OPERATIVES
CARING, LEISURE AND OTHER SERVICE OCCUPATIONS
4.2.1.3 MajorsubgroupOccupation
These are the 26 ‘sub-major’ groups
MajorsubgroupOccupation_labelled
ADMINISTRATIVE OCCUPATIONS
CORPORATE MANAGERS AND DIRECTORS
ELEMENTARY ADMINISTRATION AND SERVICE OCCUPATIONS
CUSTOMER SERVICE OCCUPATIONS
BUSINESS AND PUBLIC SERVICE ASSOCIATE PROFESSIONALS
SKILLED CONSTRUCTION AND BUILDING TRADES
SCIENCE, RESEARCH, ENGINEERING AND TECHNOLOGY PROFESSIONALS
BUSINESS, MEDIA AND PUBLIC SERVICE PROFESSIONALS
PROCESS, PLANT AND MACHINE OPERATIVES
CARING PERSONAL SERVICE OCCUPATIONS
TRANSPORT AND MOBILE MACHINE DRIVERS AND OPERATIVES
SKILLED METAL, ELECTRICAL AND ELECTRONIC TRADES
CULTURE, MEDIA AND SPORTS OCCUPATIONS
SALES OCCUPATIONS
LEISURE, TRAVEL AND RELATED PERSONAL SERVICE OCCUPATIONS
SECRETARIAL AND RELATED OCCUPATIONS
HEALTH AND SOCIAL CARE ASSOCIATE PROFESSIONALS
HEALTH PROFESSIONALS
SKILLED AGRICULTURAL AND RELATED TRADES
TEACHING AND OTHER EDUCATIONAL PROFESSIONALS
SCIENCE, ENGINEERING AND TECHNOLOGY ASSOCIATE PROFESSIONALS
PROTECTIVE SERVICE OCCUPATIONS
OTHER MANAGERS AND PROPRIETORS
ELEMENTARY TRADES AND RELATED OCCUPATIONS
TEXTILES, PRINTING AND OTHER SKILLED TRADES
COMMUNITY AND CIVIL ENFORCEMENT OCCUPATIONS
4.2.1.4 UnitOccupation
These are indeed the 104 ‘minor groups’.
UnitOccupation_labelled
Administrative Occupations: Finance
Managers and Directors in Retail and Wholesale
Functional Managers and Directors
Elementary Storage Occupations
Customer Service Occupations
Business Associate Professionals
Construction and Building Trades Supervisors
Information Technology Professionals
Legal Professionals
Production, Factory and Assembly Supervisors
Administrative Occupations: Government and Related Organisations
Caring Personal Services
Mobile Machine Drivers and Operatives
Quality and Regulatory Professionals
Metal Forming, Welding and Related Trades
Artistic, Literary and Media Occupations
Metal Machining, Fitting and Instrument Making Trades
Sales Assistants and Retail Cashiers
Other Administrative Occupations
Production Managers and Directors
Regulatory Associate Professionals
Public Services Associate Professionals
Animal Care and Control Services
Administrative Occupations: Office Managers and Supervisors
Engineering Professionals
Shopkeepers and Sales Supervisors
Elementary Security Occupations
Other Elementary Services Occupations
Legal Associate Professionals
Architects, Chartered Architectural Technologists, Planning Officers, Surveyors and Construction Professionals
Housekeeping and Related Services
Secretarial and Related Occupations
Welfare and Housing Associate Professionals
Sales, Marketing and Related Associate Professionals
Nursing Professionals
NA
Natural and Social Science Professionals
Agricultural and Related Trades
Elementary Administration Occupations
Process Operatives
Other Educational Professionals
Road Transport Drivers
Information Technology Technicians
Business and Financial Project Management Professionals
Finance Professionals
Therapy Professionals
Electrical and Electronic Trades
Teaching Professionals
HR, Training and Other Vocational Associate Guidance Professionals
Construction Operatives
Protective Service Occupations
Sales Related Occupations
Leisure and Travel Services
Chief Executives and Senior Officials
Teaching and Childcare Support Occupations
Managers and Proprietors in Health and Care Services
Managers and Proprietors in Other Services
Managers in Logistics, Warehousing and Transport
Other Health Professionals
Elementary Construction Occupations
Business, Research and Administrative Professionals
Veterinary nurses
Research and Development (R&D) and Other Research Professionals
Assemblers and Routine Operatives
Welfare Professionals
Science, Engineering and Production Technicians
Finance Associate Professionals
Plant and Machine Operatives
Elementary Cleaning Occupations
Food Preparation and Hospitality Trades
Construction and Building Trades
Senior Officers in Protective Services
Managers and Proprietors in Hospitality and Leisure Services
Other Drivers and Transport Operatives
Health Associate Professionals
Health and Social Services Managers and Directors
Skilled Metal, Electrical and Electronic Trades Supervisors
Administrative Occupations: Records
Sports and Fitness Occupations
Medical Practitioners
Media Professionals
Web and Multimedia Design Professionals
Transport Associate Professionals
Conservation and Environment Professionals
Vehicle Trades
Elementary Process Plant Occupations
Teaching and Childcare Associate Professionals
Cleaning and Housekeeping Managers and Supervisors
Librarians and Related Professionals
Customer Service Supervisors
Elementary Sales Occupations
Veterinarians
Hairdressers and Related Services
Printing Trades
Building Finishing Trades
Managers and Proprietors in Agriculture Related Services
4.2.2.3 Comparing pay penalty between weekly and hourly
Note: only groups with n >= 10 are considered.
The table below shows the pay difference between outsourced and non-outsourced workers by sector. Negative values indicate pay penalties for outsourced workers. The ‘pattern_reverse’ column indicates the 4 sectors where the direction of the difference is different if you consider hourly versus weekly pay difference.
For example, per week, outsourced workers in PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES earn £1.78 less than non-outsourced counterparts, but per hour they are paid on average £1.37 more than non-outsourced workers. This suggests that outsourced rates are higher in this sector, but the amount of work available is not enough for outsourced people to earn more than non-outsourced people on a weekly basis.
The reverse pattern indicates sectors where outsourced workers are paid less per hour but work more hours and earn more per week than their non-outsourced counterparts.
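The ‘pattern_reverse’ flag appears to mark rows where the weekly and hourly differences have opposite signs. A minimal sketch of that rule follows, using three rows of values taken from the sector table. The rule is inferred from the reported values rather than from the original code, and the data frame name is hypothetical.

```r
# Three rows copied from the sector table; 'pay_diff' is a hypothetical name
pay_diff <- data.frame(
  sector_name_labelled = c("PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES",
                           "FINANCIAL AND INSURANCE ACTIVITIES",
                           "CONSTRUCTION"),
  weekly_pay_diff = c(-1.776135, 11.183407, -20.556524),
  hourly_pay_diff = c(1.3701111, -0.7242390, -0.9727247)
)

# Flag rows where the direction of the difference flips between weekly and hourly pay
pay_diff$pattern_reverse <- as.integer(
  sign(pay_diff$weekly_pay_diff) != sign(pay_diff$hourly_pay_diff)
)
```

The first two rows are flagged and the third is not, which matches the values reported for those sectors.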
Weekly and hourly pay difference by sector

| sector_name_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES | -143.703719 | -2.4894583 | 0 |
| ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US | -127.012080 | 0.5256981 | 1 |
| PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY | -108.857788 | -2.8196227 | 0 |
| TRANSPORTATION AND STORAGE | -106.492790 | -2.4819904 | 0 |
| ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES | -100.690083 | -3.5297510 | 0 |
| EDUCATION | -99.173285 | -0.8902779 | 0 |
| MANUFACTURING | -86.099585 | -2.2453893 | 0 |
| INFORMATION AND COMMUNICATION | -82.198780 | -2.9866898 | 0 |
| Not found | -68.232481 | 0.9112035 | 1 |
| HUMAN HEALTH AND SOCIAL WORK ACTIVITIES | -49.055714 | -1.7786666 | 0 |
| ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY | -36.646155 | -0.3160751 | 0 |
| REAL ESTATE ACTIVITIES | -32.495631 | -1.5397384 | 0 |
| ACCOMMODATION AND FOOD SERVICE ACTIVITIES | -30.203839 | -0.7020112 | 0 |
| CONSTRUCTION | -20.556524 | -0.9727247 | 0 |
| OTHER SERVICE ACTIVITIES | -9.189807 | -0.5184089 | 0 |
| PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES | -1.776135 | 1.3701111 | 1 |
| FINANCIAL AND INSURANCE ACTIVITIES | 11.183407 | -0.7242390 | 1 |
WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES
Here we look at Major subgroup occupations within sectors. We only consider sectors down to ‘Other services’, as the remaining sectors have a small n for the outsourced group. Note that larger images for these plots can be found in outputs/figures/occupation_pay_plots.
The figures indicate there is variation between occupations within sectors in terms of whether outsourced people are paid less or more than non-outsourced workers.
4.2.3.3 Comparing pay penalty between weekly and hourly
Note: only groups with n >= 10 are considered.
The table below shows the weekly and hourly pay difference between outsourced and non-outsourced workers by major group occupation. As before, negative values indicate pay penalties for outsourced workers, and the ‘pattern_reverse’ column indicates the occupations where the direction of the difference is different if you consider hourly versus weekly pay difference.
Weekly and hourly pay difference by major group occupations within Accommodation And Food Service Activities

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Elementary Administration And Service Occupations | -43.529668 | -0.7017468 | 0 |
| Textiles, Printing And Other Skilled Trades | 7.332828 | -0.5894382 | 1 |

Weekly and hourly pay difference by major group occupations within Administrative And Support Service Activities

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Customer Service Occupations | -102.66553 | -3.2164373 | 0 |
| Elementary Administration And Service Occupations | 15.41439 | -1.0311391 | 1 |
| Administrative Occupations | 43.30396 | -0.2282938 | 1 |
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "logical"
Weekly and hourly pay difference by major group occupations within Construction

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Corporate Managers And Directors | -140.83839 | -3.0843931 | 0 |
| Administrative Occupations | -66.73665 | -0.3469444 | 0 |
| Skilled Construction And Building Trades | 114.15752 | 1.0713412 | 0 |

Weekly and hourly pay difference by major group occupations within Education

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Teaching And Other Educational Professionals | -63.69984 | 0.9922447 | 1 |
| Caring Personal Service Occupations | -44.52325 | -1.2978309 | 0 |
| Elementary Administration And Service Occupations | 62.16781 | 0.3060616 | 0 |

Weekly and hourly pay difference by major group occupations within Financial And Insurance Activities

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Business, Media And Public Service Professionals | -74.01367 | -3.4437168 | 0 |
| Administrative Occupations | 12.10298 | 0.3119162 | 0 |
| Business And Public Service Associate Professionals | 38.27790 | 0.1876232 | 0 |
| Corporate Managers And Directors | 83.87818 | 0.7076445 | 0 |

Weekly and hourly pay difference by major group occupations within Human Health And Social Work Activities

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Corporate Managers And Directors | -121.63952 | -4.2284955 | 0 |
| Health Professionals | -82.89714 | -3.7965198 | 0 |
| Caring Personal Service Occupations | -29.46213 | -0.4411910 | 0 |
| Administrative Occupations | -16.11184 | -0.3403748 | 0 |
| Health And Social Care Associate Professionals | 13.05492 | -0.2742730 | 1 |

Weekly and hourly pay difference by major group occupations within Information And Communication

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Science, Engineering And Technology Associate Professionals | -172.85859 | -4.858679 | 0 |
| Science, Research, Engineering And Technology Professionals | -63.61085 | -2.228024 | 0 |
| Corporate Managers And Directors | -46.09461 | -5.531615 | 0 |

Weekly and hourly pay difference by major group occupations within Manufacturing

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Corporate Managers And Directors | -224.324833 | -3.950745 | 0 |
| Process, Plant And Machine Operatives | -4.759046 | -0.255132 | 0 |
| Elementary Administration And Service Occupations | 30.827500 | -1.333051 | 1 |
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "logical"
Weekly and hourly pay difference by major group occupations within Other Service Activities

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Elementary Administration And Service Occupations | -34.19164 | 1.336877 | 1 |

Weekly and hourly pay difference by major group occupations within Professional, Scientific And Technical Activities

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Business And Public Service Associate Professionals | -79.87644 | -2.665625 | 0 |
| Business, Media And Public Service Professionals | -70.48336 | -2.998460 | 0 |
| Science, Research, Engineering And Technology Professionals | 184.43019 | 5.567409 | 0 |

Weekly and hourly pay difference by major group occupations within Public Administration And Defence; Compulsory Social Security

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Administrative Occupations | -55.57135 | -1.576596 | 0 |

Weekly and hourly pay difference by major group occupations within Transportation And Storage

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Transport And Mobile Machine Drivers And Operatives | -153.62305 | -4.2130255 | 0 |
| Elementary Administration And Service Occupations | 19.59351 | 0.8366258 | 0 |

Weekly and hourly pay difference by major group occupations within Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Corporate Managers And Directors | -97.84726 | -1.7337137 | 0 |
| Transport And Mobile Machine Drivers And Operatives | -21.37001 | -1.0275407 | 0 |
| Elementary Administration And Service Occupations | 17.09860 | 0.2720542 | 0 |
| Administrative Occupations | 61.76549 | 0.7687880 | 0 |
| Sales Occupations | 85.88476 | 0.6364519 | 0 |
4.2.4 Major group occupations across all sectors
Note I only consider unit occupations where the minimum n is >= 10.
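The pay-penalty tables in this section report, for each occupation, a weighted average income for outsourced and non-outsourced workers, their difference, and the two group sizes. The sketch below shows one way such a table could be built, including the minimum-n rule. It is an illustration on toy data: the function `pay_penalty()` and the input columns `occupation`, `outsourced`, `income_weekly` and `weight` are hypothetical, and the report's own code may differ.

```r
# Hypothetical helper: pay penalty = weighted mean (outsourced) - weighted mean (not outsourced)
pay_penalty <- function(df, min_n = 10) {
  rows <- lapply(split(df, df$occupation), function(d) {
    out <- d[d$outsourced == 1, ]
    not_out <- d[d$outsourced == 0, ]
    data.frame(
      occupation = d$occupation[1],
      wtd_avg_income_not_outsourced = weighted.mean(not_out$income_weekly, not_out$weight),
      wtd_avg_income_outsourced = weighted.mean(out$income_weekly, out$weight),
      n_not_outsourced = nrow(not_out),
      n_outsourced = nrow(out)
    )
  })
  res <- do.call(rbind, rows)
  res$pay_penalty <- res$wtd_avg_income_outsourced - res$wtd_avg_income_not_outsourced
  # Keep occupations where the smaller of the two groups has at least min_n respondents
  res <- res[pmin(res$n_not_outsourced, res$n_outsourced) >= min_n, ]
  # Largest penalty (most negative) first
  res[order(res$pay_penalty), ]
}

# Toy data: occupation C has only one outsourced respondent
toy <- data.frame(
  occupation = rep(c("A", "B", "C"), times = c(4, 4, 3)),
  outsourced = c(0, 0, 1, 1,  0, 0, 1, 1,  0, 0, 1),
  income_weekly = c(500, 600, 450, 470,  400, 420, 430, 450,  300, 320, 200),
  weight = c(1, 1, 1, 3,  2, 1, 1, 1,  1, 1, 1)
)

# min_n = 2 is used only so the toy data passes the filter; the report uses 10
penalties <- pay_penalty(toy, min_n = 2)
```

With this sign convention a negative `pay_penalty` means outsourced workers earn less, which is consistent with the tables, where `pay_penalty` equals the outsourced average minus the non-outsourced average.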
Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:50
Weekly pay penalty for major subgroup occupations across all sectors

| majorsubgroup_occupation_labelled | pay_penalty | wtd_avg_income_not_outsourced | wtd_avg_income_outsourced | n_not_outsourced | n_outsourced |
|---|---:|---:|---:|---:|---:|
| Protective Service Occupations | -186.115913 | 798.1466 | 612.0307 | 87 | 11 |
| Science, Engineering And Technology Associate Professionals | -109.411519 | 694.4187 | 585.0072 | 168 | 38 |
| Transport And Mobile Machine Drivers And Operatives | -107.609950 | 623.2422 | 515.6322 | 207 | 54 |
| Leisure, Travel And Related Personal Service Occupations | -86.111295 | 440.7630 | 354.6517 | 109 | 25 |
| Elementary Trades And Related Occupations | -81.066699 | 522.1490 | 441.0823 | 40 | 11 |
| Health Professionals | -80.351915 | 701.1247 | 620.7728 | 303 | 62 |
| Business, Media And Public Service Professionals | -79.019478 | 805.3639 | 726.3444 | 458 | 69 |
| Teaching And Other Educational Professionals | -65.534711 | 665.9185 | 600.3837 | 364 | 42 |
| Corporate Managers And Directors | -62.645479 | 781.4969 | 718.8514 | 600 | 123 |
| Other Managers And Proprietors | -55.771413 | 610.1658 | 554.3944 | 162 | 29 |
| Customer Service Occupations | -44.336562 | 506.9532 | 462.6166 | 185 | 35 |
| Business And Public Service Associate Professionals | -43.246143 | 700.1621 | 656.9160 | 504 | 75 |
| Secretarial And Related Occupations | -39.218379 | 477.0919 | 437.8735 | 146 | 17 |
| Caring Personal Service Occupations | -23.753758 | 396.5326 | 372.7788 | 502 | 117 |
| Process, Plant And Machine Operatives | -17.004071 | 546.9718 | 529.9677 | 154 | 34 |
Science, Research, Engineering And Technology Professionals
Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:52
Hourly pay penalty for major subgroup occupations across all sectors

| majorsubgroup_occupation_labelled | pay_penalty | wtd_avg_income_not_outsourced | wtd_avg_income_outsourced | n_not_outsourced | n_outsourced |
|---|---:|---:|---:|---:|---:|
| Protective Service Occupations | -4.6610127 | 20.77082 | 16.10980 | 87 | 11 |
| Science, Engineering And Technology Associate Professionals | -3.6935626 | 19.08387 | 15.39031 | 168 | 38 |
| Health Professionals | -3.5288815 | 21.39635 | 17.86747 | 303 | 62 |
| Transport And Mobile Machine Drivers And Operatives | -2.8911514 | 16.32247 | 13.43131 | 207 | 54 |
| Business, Media And Public Service Professionals | -2.0749121 | 22.18071 | 20.10580 | 458 | 69 |
| Business And Public Service Associate Professionals | -1.5629055 | 19.28122 | 17.71831 | 504 | 75 |
| Secretarial And Related Occupations | -1.4253607 | 14.65362 | 13.22826 | 146 | 17 |
| Leisure, Travel And Related Personal Service Occupations | -1.3425317 | 13.64332 | 12.30079 | 109 | 25 |
| Other Managers And Proprietors | -1.2644888 | 17.16476 | 15.90027 | 162 | 29 |
| Corporate Managers And Directors | -1.1326326 | 21.20963 | 20.07699 | 600 | 123 |
| Customer Service Occupations | -0.7665494 | 15.11169 | 14.34514 | 185 | 35 |
| Science, Research, Engineering And Technology Professionals | -0.6370135 | 22.48816 | 21.85114 | 397 | 82 |
| Caring Personal Service Occupations | -0.5554487 | 12.79316 | 12.23771 | 502 | 117 |
| Skilled Construction And Building Trades | -0.4931561 | 16.05214 | 15.55899 | 63 | 18 |
| Elementary Administration And Service Occupations | -0.4652052 | 12.25335 | 11.78814 | 483 | 173 |
| Elementary Trades And Related Occupations | -0.3578431 | 14.92769 | 14.56985 | 40 | 11 |
| Process, Plant And Machine Operatives | -0.3423633 | 14.20301 | 13.86065 | 154 | 34 |
| Textiles, Printing And Other Skilled Trades | -0.3203910 | 13.48948 | 13.16909 | 115 | 24 |
4.2.4.3 Comparing pay penalty between weekly and hourly
Note: only groups with n >= 10 are considered.
The table below shows the weekly and hourly pay difference between outsourced and non-outsourced workers by major group occupation. As before, negative values indicate pay penalties for outsourced workers, and the ‘pattern_reverse’ column indicates the occupations where the direction of the difference is different if you consider hourly versus weekly pay difference.
Weekly and hourly pay difference by major sub group occupation

| majorsubgroup_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Protective Service Occupations | -186.115913 | -4.6610127 | 0 |
| Science, Engineering And Technology Associate Professionals | -109.411519 | -3.6935626 | 0 |
| Transport And Mobile Machine Drivers And Operatives | -107.609950 | -2.8911514 | 0 |
| Leisure, Travel And Related Personal Service Occupations | -86.111295 | -1.3425317 | 0 |
| Elementary Trades And Related Occupations | -81.066699 | -0.3578431 | 0 |
| Health Professionals | -80.351915 | -3.5288815 | 0 |
| Business, Media And Public Service Professionals | -79.019478 | -2.0749121 | 0 |
| Teaching And Other Educational Professionals | -65.534711 | 0.8216280 | 1 |
| Corporate Managers And Directors | -62.645479 | -1.1326326 | 0 |
| Other Managers And Proprietors | -55.771413 | -1.2644888 | 0 |
| Customer Service Occupations | -44.336562 | -0.7665494 | 0 |
| Business And Public Service Associate Professionals | -43.246143 | -1.5629055 | 0 |
| Secretarial And Related Occupations | -39.218379 | -1.4253607 | 0 |
| Caring Personal Service Occupations | -23.753758 | -0.5554487 | 0 |
| Process, Plant And Machine Operatives | -17.004071 | -0.3423633 | 0 |
Science, Research, Engineering And Technology Professionals
Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:58
Weekly pay penalty for unit occupations across all sectors

| unit_occupation_labelled | pay_penalty | wtd_avg_income_not_outsourced | wtd_avg_income_outsourced | n_not_outsourced | n_outsourced |
|---|---:|---:|---:|---:|---:|
| Protective Service Occupations | -186.11591 | 798.1466 | 612.0307 | 87 | 11 |
| Administrative Occupations: Government And Related Organisations | -173.25254 | 660.8699 | 487.6173 | 150 | 11 |
| Information Technology Technicians | -172.50995 | 748.9725 | 576.4626 | 90 | 27 |
| Elementary Administration Occupations | -125.57767 | 477.9704 | 352.3927 | 34 | 11 |
| Functional Managers And Directors | -89.04844 | 820.2879 | 731.2395 | 385 | 88 |
| Business, Research And Administrative Professionals | -88.82911 | 845.0837 | 756.2546 | 101 | 18 |
| Teaching Professionals | -82.42140 | 675.1625 | 592.7411 | 293 | 41 |
| Nursing Professionals | -70.58157 | 673.2751 | 602.6935 | 180 | 36 |
Sales, Marketing And Related Associate Professionals
Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:60
Hourly pay penalty for unit occupations across all sectors

| unit_occupation_labelled | pay_penalty | wtd_avg_income_not_outsourced | wtd_avg_income_outsourced | n_not_outsourced | n_outsourced |
|---|---:|---:|---:|---:|---:|
| Information Technology Technicians | -5.3727754 | 21.00403 | 15.63125 | 90 | 27 |
| Administrative Occupations: Government And Related Organisations | -5.2459770 | 19.14590 | 13.89992 | 150 | 11 |
| Protective Service Occupations | -4.6610127 | 20.77082 | 16.10980 | 87 | 11 |
| Nursing Professionals | -3.3125387 | 20.83875 | 17.52621 | 180 | 36 |
| Elementary Administration Occupations | -3.1391398 | 14.09894 | 10.95980 | 34 | 11 |
| Business, Research And Administrative Professionals | -2.7378800 | 22.77643 | 20.03855 | 101 | 18 |
| Finance Professionals | -2.6390686 | 22.80200 | 20.16293 | 110 | 20 |
| Business Associate Professionals | -2.5706545 | 19.48274 | 16.91209 | 125 | 21 |
| Science, Engineering And Production Technicians | -2.4845163 | 17.31369 | 14.82918 | 76 | 11 |
| Information Technology Professionals | -2.1256053 | 24.07395 | 21.94834 | 231 | 50 |
| Shopkeepers And Sales Supervisors | -1.9766868 | 13.87657 | 11.89988 | 87 | 27 |
| Functional Managers And Directors | -1.8781560 | 22.40838 | 20.53022 | 385 | 88 |
| Sales, Marketing And Related Associate Professionals | -1.8203230 | 18.43185 | 16.61152 | 155 | 20 |
| Finance Associate Professionals | -1.8146591 | 20.48350 | 18.66884 | 55 | 12 |
| Secretarial And Related Occupations | -1.4253607 | 14.65362 | 13.22826 | 146 | 17 |
| Teaching And Childcare Support Occupations | -1.2699956 | 12.33642 | 11.06642 | 156 | 28 |
| Welfare And Housing Associate Professionals | -1.2294317 | 17.03022 | 15.80078 | 84 | 10 |
| Other Health Professionals | -1.2132578 | 19.56315 | 18.34989 | 62 | 17 |
| Elementary Cleaning Occupations | -0.9426106 | 12.07398 | 11.13137 | 113 | 60 |
| Other Elementary Services Occupations | -0.7139322 | 11.27759 | 10.56366 | 144 | 39 |
| Road Transport Drivers | -0.6822714 | 14.77702 | 14.09474 | 154 | 41 |
| Construction And Building Trades | -0.6551832 | 16.55799 | 15.90281 | 48 | 16 |
| Hr, Training And Other Vocational Associate Guidance Professionals | -0.5313910 | 20.16320 | 19.63181 | 115 | 11 |
| Food Preparation And Hospitality Trades | -0.4007838 | 13.05801 | 12.65722 | 98 | 23 |
| Caring Personal Services | -0.3815696 | 13.03798 | 12.65641 | 332 | 87 |
| Customer Service Occupations | -0.0397066 | 14.68619 | 14.64648 | 163 | 29 |
4.2.6.3 Comparing pay penalty between weekly and hourly
Note: only groups with n >= 10 are considered.
The table below shows the pay difference between outsourced and non-outsourced workers by minor sub group occupation. Negative values indicate pay penalties for outsourced workers. The ‘pattern_reverse’ column indicates the four occupations where the direction of the difference is different if you consider hourly versus weekly pay difference. For example, per week, teaching professionals who are outsourced earn £82 less than non-outsourced counterparts, but per hour they are paid on average 17p more than non-outsourced workers. This suggests that outsourced rates are higher in this occupation, but the amount of work available is not enough for outsourced people to earn more than non-outsourced people on a weekly basis.
The reverse pattern is evident for the other three. For example, outsourced workers in food preparation and hospitality earn on average 40p less an hour than non-outsourced workers, but earn on average £17 more per week than non-outsourced workers. This suggests that outsourced workers in this occupation are paid less but work more hours than their non-outsourced counterparts.
Weekly and hourly pay difference by minor sub group occupation

| unit_occupation_labelled | weekly_pay_diff | hourly_pay_diff | pattern_reverse |
|---|---:|---:|---:|
| Protective Service Occupations | -186.115913 | -4.6610127 | 0 |
| Administrative Occupations: Government And Related Organisations | -173.252537 | -5.2459770 | 0 |
| Information Technology Technicians | -172.509948 | -5.3727754 | 0 |
| Elementary Administration Occupations | -125.577667 | -3.1391398 | 0 |
| Functional Managers And Directors | -89.048442 | -1.8781560 | 0 |
| Business, Research And Administrative Professionals | -88.829106 | -2.7378800 | 0 |
| Teaching Professionals | -82.421399 | 0.1691203 | 1 |
| Nursing Professionals | -70.581567 | -3.3125387 | 0 |
| Sales, Marketing And Related Associate Professionals | -69.202110 | -1.8203230 | 0 |
| Business Associate Professionals | -65.932462 | -2.5706545 | 0 |
| Finance Professionals | -64.488797 | -2.6390686 | 0 |
| Information Technology Professionals | -62.398866 | -2.1256053 | 0 |
| Finance Associate Professionals | -54.418727 | -1.8146591 | 0 |
| Teaching And Childcare Support Occupations | -53.275748 | -1.2699956 | 0 |
| Shopkeepers And Sales Supervisors | -46.321162 | -1.9766868 | 0 |
| Science, Engineering And Production Technicians | -39.512943 | -2.4845163 | 0 |
| Secretarial And Related Occupations | -39.218379 | -1.4253607 | 0 |
| Welfare And Housing Associate Professionals | -38.627851 | -1.2294317 | 0 |
| Other Elementary Services Occupations | -38.032638 | -0.7139322 | 0 |
| Customer Service Occupations | -35.317533 | -0.0397066 | 0 |
| Other Health Professionals | -32.723952 | -1.2132578 | 0 |
| Road Transport Drivers | -26.766285 | -0.6822714 | 0 |
| Caring Personal Services | -25.121892 | -0.3815696 | 0 |
| Elementary Cleaning Occupations | -18.859410 | -0.9426106 | 0 |
| Hr, Training And Other Vocational Associate Guidance Professionals | 2.135395 | -0.5313910 | 1 |
| Managers And Directors In Retail And Wholesale | 3.901812 | 1.0154738 | 0 |
| Elementary Storage Occupations | 6.604673 | 0.3415404 | 0 |
| Administrative Occupations: Finance | 10.603485 | 3.5501683 | 0 |
| Food Preparation And Hospitality Trades | 17.138094 | -0.4007838 | 1 |
| Other Administrative Occupations | 22.177338 | 1.0835737 | 0 |
| Administrative Occupations: Office Managers And Supervisors | 49.688329 | 1.5565714 | 0 |
| Construction And Building Trades | 51.831109 | -0.6551832 | 1 |
| Process Operatives | 65.088732 | 1.5912972 | 0 |
| Administrative Occupations: Records | 80.002327 | 1.5821197 | 0 |
| Electrical And Electronic Trades | 86.173451 | 2.4032684 | 0 |
| Sales Assistants And Retail Cashiers | 96.203172 | 1.0828781 | 0 |
| Engineering Professionals | 104.921934 | 1.1020864 | 0 |
| Elementary Security Occupations | 228.853164 | 0.9008344 | 0 |
4.3 London has a disproportionate share of the UK’s outsourced workers, followed by the East and West Midlands
#regions
In London, around 25% of workers are outsourced – the highest proportion of any region in the UK. London is followed by the East Midlands (19%) and West Midlands (18%) in the share of workers in the region who are outsourced, with the East of England being the region with the lowest share of outsourced workers as part of the total employed workforce, at 13%.
Possible addition: Should this include some comment on WHY we think this might be the case? Should we look at sectoral splits in London, compared to everywhere else, to see whether there are significant sector differences that might explain this trend?
The plot below shows the proportion of workers within each region who are outsourced.61
Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (25%).
The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:
East Midlands (19%)
West Midlands (18%)
Wales (18%)
North West (17%)
Northern Ireland (16%)
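The within-region figures above are the weighted share of each region's workers who are outsourced. A minimal sketch of that calculation, and of the ranking with London excluded, is given below. The data are made up and the column names `region`, `outsourced` and `weight` are hypothetical stand-ins for the survey variables.

```r
# Toy data only: three regions with 8, 5 and 10 respondents
toy <- data.frame(
  region = rep(c("London", "Wales", "Scotland"), times = c(8, 5, 10)),
  outsourced = c(1, 1, 0, 0, 0, 0, 0, 0,
                 1, 0, 0, 0, 0,
                 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  weight = 1  # equal weights here; a survey weight would go in this column
)

# Weighted percentage of workers within each region who are outsourced
by_region <- split(toy, toy$region)
pct_outsourced <- vapply(
  by_region,
  function(d) 100 * sum(d$weight * d$outsourced) / sum(d$weight),
  numeric(1)
)

# Rank regions, then drop London to compare the remaining regions
ranked <- sort(pct_outsourced, decreasing = TRUE)
ranked_excl_london <- ranked[names(ranked) != "London"]
```

This measure uses each region's own workforce as the denominator, so it is not affected by how large the region is.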
We can also explore how the entire UK workforce is distributed across the country.62 The table and map below show the percentage of outsourced workers in each region as a proportion of the total UK workforce. They show where the UK’s outsourced workforce is concentrated. The regions with the highest share of the UK’s outsourced workforce are:
---title: "Key findings - matched to report"author: - Jolyon Miles-Wilson - Celestin Okorojidate: "`r format(Sys.time(), '%e %B %Y')`"format: html: self-contained: true code-fold: true code-tools: true code-summary: "Code for Nerds" toc: true toc-depth: 5execute: echo: false warning: falsenumber-sections: true---```{r packages}library(haven)library(poLCA)library(Hmisc)library(dplyr)library(ggplot2)library(tidyr)library(skimr)library(kableExtra)#library(MASS)library(wesanderson)library(ggrepel)library(here)library(emmeans)#library(devtools)#install_version("sjstats", version = "0.18.2")library(sjstats)library(readr)library(sjPlot)library(nnet)``````{r palette}rm(list = ls())options(scipen = 999)colours <- wes_palette("GrandBudapest2",4,"discrete")better_colours <- c('#8dd3c7','#bebada','#fb8072','#80b1d3','#fdb462')many_colours <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928','#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3','#fdb462','#b3de69','#fccde5','#d9d9d9','#bc80bd','#ccebc5','#ffed6f')``````{r functions}extract_glm_coefs <- function(mod, only_sig=F, decimal_places = 3){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") %>% # specify new variable to add rownames to mutate( or = round(exp(Estimate), decimal_places), .after=Estimate )}extract_lm_coefs <- function(mod, only_sig = F){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") # specify new variable to add rownames to }``````{r data, output=FALSE}data <- readRDS("../Data/2025-04-07 - Cleaned_data.rds")# Specify data to be used in income analysisincome_data <- filter(data, income_drop_all==0)```# Ethnicity categorisationsFor reference, the table below provides a disambiguation of how ethnicities have been grouped in this analysis.For analyses using the disaggregated 
(survey) categories, the reference category is "English / Welsh / Scottish / Northern Irish / British".

For analyses using the aggregated categories, the reference category is "White British".

```{r}
ethnicity_cat <- data %>%
  dplyr::select(contains("ethnicity")) %>%
  distinct() %>%
  arrange(Ethnicity) %>%
  dplyr::select(-c(1:2, Ethnicity_collapsed_disaggregated))

ethn_colnames = c("Ethnicity: Survey", "Ethnicity: Aggregated", "Ethnicity: Binary")

ethnicity_cat %>%
  kable(col.names = ethn_colnames) %>%
  kable_styling(full_width = FALSE)
```

# Chapter 2: How many outsourced workers are there in the UK?

## How many UK workers are outsourced?

::: {.callout-tip title="#how-many"}
- Around 1 in 6 UK workers meet our definition of an outsourced worker
- The 'outsourced sub-group' is the most dominant of the three sub-groups - meaning the total group is predominantly made up of people who self-identify as an outsourced worker and they say they are hired to do work that is long-term or ongoing. People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 67% (check) of our total outsourced group, or nearly 7 in 10.
This group makes up X of all UK workers.
:::

```{r sum-outsourced}
total_outsourced <- data %>%
  group_by(outsourcing_status) %>%
  summarise(
    Sum = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    Proportion = Sum / sum(Sum),
    Percentage = 100 * Proportion,
    N = sum(n)
  )

readr::write_csv(total_outsourced, file="../outputs/data/total_outsourced.csv")

# Create function to find nearest denominator to express as a fraction.
f <- function(x) ifelse(abs(1/floor(1/x) - x) < abs(1/ceiling(1/x) - x), floor(1/x), ceiling(1/x))
```

**1 in `r f(total_outsourced$Proportion[which(total_outsourced$outsourcing_status=="Outsourced")])` (`r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_status=="Outsourced")], 0)`%) of UK workers are outsourced.**[^1]

[^1]: [outputs/data/total_outsourced.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced.csv)

```{r sum-outsourcing-group}
total_outsourced_group <- data %>%
  group_by(outsourcing_group) %>%
  summarise(
    Sum = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    Proportion = Sum / sum(Sum),
    Percentage = 100 * Proportion,
    N = sum(n)
  )

readr::write_csv(total_outsourced_group, file="../outputs/data/total_outsourced_2.csv")
```

In terms of the different possible types of outsourced groups[^2], the numbers are as follows:

[^2]: [outputs/data/total_outsourced_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced_2.csv)

1. Definitely outsourced: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="Outsourced")], 0)`%
2. Likely agency: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="Likely agency")], 0)`%
3. High indicators: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="High indicators")], 0)`%

```{r}
# Share of each group among outsourced workers only
breakdown <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  group_by(outsourcing_group) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    percentage = 100 * (freq/total),
    N = sum(n)
  )

# Share of each group among all workers
breakdown2 <- data %>%
  group_by(outsourcing_group) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    percentage = 100 * (freq/total),
    N = sum(n)
  )
```

People included in the 'outsourced' sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around `r round(breakdown[which(breakdown$outsourcing_group=="Outsourced"),"percentage"],0)`% of our total outsourced group. This group makes up `r round(breakdown2[which(breakdown2$outsourcing_group=="Outsourced"),"percentage"],0)`% of all UK workers.

:::{.callout-tip title='#non-exclusive-subgroups1'}
- The two other sub-groups – the agency and indicators sub-groups – are less dominant in comparison. Around 58% of all respondents meet the criteria for either or both of these sub-groups, but this falls to around 33% if we exclude people who are already captured in the outsourced sub-group.
Excluding the first sub-group, these other two groups make up X of all UK workers.

**The percentages here refer to the number of people who are outsourced (super-ordinate group), not the total number of respondents.** Below I provide percentages as a function of the outsourced super-ordinate group as well as the total sample.
:::

Group criteria

- **Outsourced**, defined as responding 'I am sure I am outsourced' or 'I might be outsourced', and responding 'I do work on a long-term basis'.
- **Likely agency**, defined as those responding 'I am sure I am agency' and 'I do work on a long-term basis', **excluding** those people who are already defined as being outsourced.
- **High indicators**: defined as responding TRUE to 5 or 6 of the outsourcing indicators, as well as responding 'I do work on a long-term basis', **excluding** those people who are already defined as outsourced or likely agency.

```{r}
# non mutually exclusive
groups_non_excl <- data %>%
  mutate(
    # SURE outsourced or MIGHT BE outsourced + LONGTERM
    outsourced = ifelse((Q3v3a == 1 & Q2 == 1) | (Q3v3a == 2 & Q2 == 1), 1, 0),
    # SURE agency and LONG-TERM (outsourced people are not excluded here)
    likely_agency = ifelse(Q2 == 1 & (Q3v3b == 1 | Q3v3c == 1 | Q3v3d == 1), 1, 0),
    likely_agency = ifelse(is.na(likely_agency), 0, likely_agency),
    # 5 or more indicators and LONGTERM (outsourced / likely agency are not excluded here)
    high_indicators = ifelse((Q2 == 1 & sum_true >= 5), 1, 0)
  )

either <- groups_non_excl %>%
  mutate(
    agency_or_indicator = case_when(
      (likely_agency == 1 & high_indicators == 0) ~ "agency",
      (likely_agency == 0 & high_indicators == 1) ~ "indicator",
      (likely_agency == 1 & high_indicators == 1) ~ "both",
      (likely_agency == 0 & high_indicators == 0) ~ "neither",
      TRUE ~ NA
    )
  ) %>%
  group_by(agency_or_indicator) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    perc = 100 * (freq/total),
    N = sum(n)
  )

either_perc <- either %>%
  filter(agency_or_indicator != "neither") %>%
  summarise(
    round(sum(perc), 2) # perc or weighted perc?
  ) %>%
  pull()

either_excl_outsourced <- groups_non_excl %>%
  filter(outsourced == 0) %>%
  mutate(
    agency_or_indicator = case_when(
      (likely_agency == 1 & high_indicators == 0) ~ "agency",
      (likely_agency == 0 & high_indicators == 1) ~ "indicator",
      (likely_agency == 1 & high_indicators == 1) ~ "both",
      (likely_agency == 0 & high_indicators == 0) ~ "neither",
      TRUE ~ NA
    )
  ) %>%
  group_by(agency_or_indicator) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    perc = 100 * (freq/total),
    N = sum(n)
  )

either_excl_perc <- either_excl_outsourced %>%
  filter(agency_or_indicator != "neither") %>%
  summarise(
    round(sum(perc), 2)
  ) %>%
  pull()

either %>%
  kable(caption = "Including outsourced group") %>%
  kable_styling(full_width = F)

either_excl_outsourced %>%
  kable(caption = "Excluding outsourced group") %>%
  kable_styling(full_width = F)
```

`r either_perc`% of the whole sample meet the criteria for either or both of these sub-groups. This falls to `r either_excl_perc`% if we exclude people who are already captured in the outsourced sub-group.

```{r}
# same as above but now only among those who are outsourced
groups_non_excl <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  mutate(
    # SURE outsourced or MIGHT BE outsourced + LONGTERM
    outsourced = ifelse((Q3v3a == 1 & Q2 == 1) | (Q3v3a == 2 & Q2 == 1), 1, 0),
    # SURE agency and LONG-TERM (outsourced people are not excluded here)
    likely_agency = ifelse(Q2 == 1 & (Q3v3b == 1 | Q3v3c == 1 | Q3v3d == 1), 1, 0),
    likely_agency = ifelse(is.na(likely_agency), 0, likely_agency),
    # 5 or more indicators and LONGTERM (outsourced / likely agency are not excluded here)
    high_indicators = ifelse((Q2 == 1 & sum_true >= 5), 1, 0)
  )

either <- groups_non_excl %>%
  mutate(
    agency_or_indicator = case_when(
      (likely_agency == 1 & high_indicators == 0) ~ "agency",
      (likely_agency == 0 & high_indicators == 1) ~ "indicator",
      (likely_agency == 1 & high_indicators == 1) ~ "both",
      (likely_agency == 0 & high_indicators == 0) ~ "neither",
      TRUE ~ NA
    )
  ) %>%
  group_by(agency_or_indicator) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    perc = 100 * (freq/total),
    N
      = sum(n)
  )

either_perc <- either %>%
  filter(agency_or_indicator != "neither") %>%
  summarise(
    round(sum(perc), 2)
  ) %>%
  pull()

either_excl_outsourced <- groups_non_excl %>%
  filter(outsourced == 0) %>%
  mutate(
    agency_or_indicator = case_when(
      (likely_agency == 1 & high_indicators == 0) ~ "agency",
      (likely_agency == 0 & high_indicators == 1) ~ "indicator",
      (likely_agency == 1 & high_indicators == 1) ~ "both",
      (likely_agency == 0 & high_indicators == 0) ~ "neither",
      TRUE ~ NA
    )
  ) %>%
  group_by(agency_or_indicator) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    perc = 100 * (freq/total),
    N = sum(n)
  )

either_excl_perc <- either_excl_outsourced %>%
  filter(agency_or_indicator != "neither") %>%
  summarise(
    round(sum(perc), 2)
  ) %>%
  pull()

n_outsourced <- total_outsourced[which(total_outsourced$outsourcing_status=="Outsourced"), "n"] %>%
  pull()

either_incl_perc <- either %>%
  filter(agency_or_indicator != "neither") %>%
  summarise(
    round(100 * (sum(n) / n_outsourced), 2)
  ) %>%
  pull()

either_excl_perc <- either_excl_outsourced %>%
  filter(agency_or_indicator != "neither") %>%
  summarise(
    round(100 * (sum(n) / n_outsourced), 2)
  ) %>%
  pull()
```

Out of those who are in the 'outsourced' status (i.e., the combination of the three outsourced groups), `r either_incl_perc`% meet the criteria for either or both of these sub-groups, but this falls to around `r either_excl_perc`% if we exclude people who are already captured in the outsourced sub-group.

:::{.callout-tip title="#non-exclusive-subgroups2"}
- There is some overlap between these sub-groups, but they are not like for like.
Just over a quarter (27%) of respondents are in more than one sub-group, while nearly three quarters (73%) of respondents are uniquely captured in just one of the three sub-groups.
:::

```{r}
groups_count <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  mutate(
    # SURE outsourced or MIGHT BE outsourced + LONGTERM
    outsourced = ifelse((Q3v3a == 1 & Q2 == 1) | (Q3v3a == 2 & Q2 == 1), 1, 0),
    # SURE agency and LONG-TERM (outsourced people are not excluded here)
    likely_agency = ifelse(Q2 == 1 & (Q3v3b == 1 | Q3v3c == 1 | Q3v3d == 1), 1, 0),
    likely_agency = ifelse(is.na(likely_agency), 0, likely_agency),
    # 5 or more indicators and LONGTERM (outsourced / likely agency are not excluded here)
    high_indicators = ifelse((Q2 == 1 & sum_true >= 5), 1, 0),
    number_of_groups = rowSums(across(c(outsourced, likely_agency, high_indicators)))
  ) %>%
  group_by(number_of_groups) %>%
  summarise(
    total = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    wtd_percentage = 100 * (n/sum(n)), # NB: despite the name, this is based on unweighted case counts
    percentage = 100 * (total / sum(total)) # based on the weighted totals (NatRepemployees)
  )

write_csv(groups_count, file="../outputs/data/number_of_groups.csv")
```

Just over a quarter (`r round(groups_count[which(groups_count$number_of_groups==2),"percentage"] + groups_count[which(groups_count$number_of_groups==3),"percentage"],2)`%) of respondents are in more than one sub-group, while nearly three quarters (`r round(groups_count[which(groups_count$number_of_groups==1),"percentage"],2)`%) of respondents are uniquely captured in just one of the three sub-groups.^[[outputs/data/number_of_groups.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/number_of_groups.csv)]

## Evaluating our total estimate

::: {.callout-important title="#evaluating-total-estimate To do"}
- Around 1 in 4 "outsourced" respondents sit in more than one sub-group within our definition, but around 3 in 4 are uniquely captured in just one of the three sub-groups - predominantly in the outsourced sub-group.
- As figure X shows, not all respondents in the outsourced sub-group said yes to five or six of our six outsourcing
:::

# Chapter 3: Who are the UK’s outsourced workers?

## Demographic breakdown {#sec-demographic-breakdown}

Demographic variables:

- Categorical
  - [x] Gender
  - [x] Ethnicity
- Numeric
  - [x] Age - in age section: @sec-age

We want them broken down by

- outsourcing status
  - high/low pay
- outsourcing group
  - high/low pay

### Ethnicity by outsourcing status

```{r}
# pollster
# crosstab(df = data, x = outsourcing_status, y = Ethnicity_collapsed, weight = NatRepemployees) %>%
#   kable()
# 
# # base r
# tab <- as.data.frame(xtabs(NatRepemployees ~ outsourcing_status + Ethnicity_collapsed, data=data))
# test <- xtabs(NatRepemployees ~ outsourcing_status + income_group + Ethnicity_collapsed, data=data)
# prop.table(test)
# 
# percent_row <- 100 * prop.table(test, margin = 1)
# test2 <- as.data.frame(percent_row)
# 
# test2 %>%
#   filter(outsourcing_status=="Outsourced") %>%
#   summarise(sum(Freq))
```

#### Collapsed ethnicity^[[outputs/data/status_by_ethnicity.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_status, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  )

write_csv(tab, file="../outputs/data/status_by_ethnicity.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_status, names_from = Ethnicity_collapsed, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing status (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### Full ethnicity^[[outputs/data/status_by_ethnicity_full.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_full.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_status, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab,
  file="../outputs/data/status_by_ethnicity_full.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_status, names_from = Ethnicity_labelled, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing status (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay

##### Collapsed ethnicity^[[outputs/data/status_by_ethnicity_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  )

write_csv(tab, file="../outputs/data/status_by_ethnicity_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_status, income_group), names_from = Ethnicity_collapsed, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

##### Full ethnicity^[[outputs/data/status_by_ethnicity_full_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_full_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  )

write_csv(tab, file="../outputs/data/status_by_ethnicity_full_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_status, income_group), names_from = Ethnicity_labelled, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Ethnicity by outsourcing group

#### Collapsed ethnicity^[[outputs/data/group_by_ethnicity.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_group, names_from = Ethnicity_collapsed, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### Full ethnicity^[[outputs/data/group_by_ethnicity_full.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_full.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity_full.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_group, names_from = Ethnicity_labelled, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay

##### Collapsed ethnicity^[[outputs/data/group_by_ethnicity_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  )

write_csv(tab,
  file="../outputs/data/group_by_ethnicity_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_group, income_group), names_from = Ethnicity_collapsed, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

##### Full ethnicity^[[outputs/data/group_by_ethnicity_full_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_full_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity_full_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_group, income_group), names_from = Ethnicity_labelled, values_from = Percentage) %>%
  kable(caption = "Ethnicity by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Gender by outsourcing status^[[outputs/data/status_by_gender.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_gender.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/status_by_gender.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_status, names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing status (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay^[[outputs/data/status_by_gender_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_gender_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/status_by_gender_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_status, income_group), names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Gender by outsourcing group^[[outputs/data/group_by_gender.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_gender.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/group_by_gender.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_group, names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay^[[outputs/data/group_by_gender_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_gender_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/group_by_gender_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_group, income_group), names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

## Evidence paints a racialised picture of outsourcing in the UK, with links to both ethnicity and migration

::: {.callout-tip title="#ethnicity"}
- More than 1 in 4 (nearly 1/3) outsourced workers are from an ethnic minority background
- Workers from ethnic minority backgrounds are over-represented in outsourced work in the UK, and typically more likely to be outsourced than White British workers.
- Overall, 22% of non-outsourced workers are from an ethnic minority background, rising to 33% of outsourced workers – a more than ten percentage point difference. This means that while just over 1 in 5 non-outsourced workers in our sample were from an ethnic minority background, nearly 1 in 3 outsourced workers were.
- People from an ethnic minority background are overall 1.75 times more likely to be outsourced than people from a White British background.
- Workers from Arab backgrounds are 3.86 times more likely than White British workers to be outsourced; (check sample size – are we confident in all of these significance tests, or should we just use some of them in these bullet points?)
- Workers from Black backgrounds are 2.33 times more likely than White British workers to be outsourced.
- Workers from Asian backgrounds are 1.98 times more likely than White British workers to be outsourced.
- Workers from Mixed Ethnicity backgrounds are 1.86 times more likely than White British workers to be outsourced.
- White other workers are 1.32 times more likely than White British workers to be outsourced.
:::

```{r ethnicity-counts}
ethnicity_statistics <- data %>%
  group_by(outsourcing_status, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  ) %>%
  separate_wider_delim(Ethnicity_short,
                       names = c("Ethnicity_short", "Ethnicity detail"),
                       delim = stringr::regex(" / |, "), # use multiple delims
                       too_few = "align_start",
                       too_many = "merge")

readr::write_csv(ethnicity_statistics, file = "../outputs/data/ethnicity_stats_1.csv")
```

```{r ethnicity_binary_inferential, output=FALSE}
ethnicities <- as.vector(unique(data$Ethnicity_collapsed))
non_white_ethnicities <- ethnicities[!(ethnicities %in% "White British")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
data <- data %>%
  mutate(
    Ethnicity_binary = forcats::fct_collapse(Ethnicity_collapsed,
                                             "White British" = c("White British"),
                                             "Non-White British" = non_white_ethnicities)
  )

mod <- glm(outsourcing_status ~ Ethnicity_binary, data, weights = NatRepemployees, family="quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
# summary(mod)

coefs <- extract_glm_coefs(mod)

write_csv(coefs, file = "../outputs/data/ethnicity_binary_o-status_inferential_tab.csv")
```

People from an ethnic minority are `r round(coefs[2, 'or'],2)` times more likely to be outsourced than people from a White British background; `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White British"), "Percentage"],2)`% of outsourced workers are from an ethnic minority, compared to `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Not outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White British"), "Percentage"],2)`% of non-outsourced workers.[^3]

[^3]: [outputs/data/ethnicity_stats_1.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_stats_1.csv) & [outputs/data/ethnicity_binary_o-status_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_o-status_inferential_tab.csv)

```{r ethnicity-plot}
data %>%
  group_by(outsourcing_status, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  geom_col(colour="black") +
  annotate("text", x = ethnicity_statistics$outsourcing_status, y = 99, label = paste0("N = ",ethnicity_statistics$N), hjust=1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("Outsourcing group") +
  theme_minimal()
```

```{r}
#| output: false
#| warning: false
#| message: false

mod_2 <- glm(income_group ~ Ethnicity_collapsed * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)

mod_3 <- glm(income_group ~ Ethnicity_binary * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_3)

coefs2 <- extract_glm_coefs(mod_2)
write_csv(coefs2, file = "../outputs/data/ethnicity_collapsed_income_group_inferential.csv")
coefs3 <- extract_glm_coefs(mod_3)
write_csv(coefs3, file = "../outputs/data/ethnicity_binary_income_group_inferential.csv")

coefs <- extract_glm_coefs(mod_2, only_sig=T)

ems <- emmeans(mod_2, specs = "outsourcing_status", by = "Ethnicity_collapsed")
cons <- summary(contrast(ems, "pairwise", adjust="tukey"))
sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed","outsourcing_status"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()

ems_2 <- emmeans(mod_2, specs = "Ethnicity_collapsed", by = "outsourcing_status")
cons <- summary(contrast(ems_2, "pairwise", adjust="tukey"))
sig_cons_2 <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("outsourcing_status","Ethnicity_collapsed"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()

# other_coef <- extract_glm_coefs(mod_2,, only_sig = T)[6,"Estimate"] %>%
#   exp(.) %>%
#   round(.,2) %>%
#   pull()

wb <- sig_cons %>%
  filter(Ethnicity_collapsed == "White British") %>%
  pull(or) %>%
  round(2)

mix <- sig_cons %>%
  filter(Ethnicity_collapsed == "Mixed/Multiple ethnic group") %>%
  pull(or) %>%
  round(2)
```

Overall, we find no evidence of an interaction between ethnic minority status and outsourcing status in predicting low pay; that is, the combination of being from an ethnic minority and being outsourced is not associated with being in the low pay group.^[[outputs/data/ethnicity_binary_income_group_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_income_group_inferential.csv)]

However, there is nuance within the groups. We do find evidence to suggest that among White British people, outsourced people are `r wb` times more likely to be in the low income group compared to non-outsourced people, and among Mixed ethnicity people, outsourced people are `r mix` times more likely to be in the low income group compared to non-outsourced people.^[[outputs/data/ethnicity_collapsed_income_group_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_collapsed_income_group_inferential.csv)]

```{r}
sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed","outsourcing_status"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()
```

```{r}
#| output: false
#| warning: false
#| message: false

mod_2 <- glm(income_group ~ Ethnicity_collapsed_disaggregated * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)

coefs <- extract_glm_coefs(mod_2, only_sig=T)
coefs2 <- extract_glm_coefs(mod_2)
write_csv(coefs2, file = "../outputs/data/ethnicity_collapsed_disaggregated_income_group_inferential.csv")

ems <- emmeans(mod_2, specs = "outsourcing_status", by = "Ethnicity_collapsed_disaggregated")
cons <- summary(contrast(ems, "pairwise", adjust="tukey"))
sig_cons <- cons %>%
  filter(p.value <
      .05) %>%
  mutate(
    or = 1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed_disaggregated","outsourcing_status"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()

ems_2 <- emmeans(mod_2, specs = "Ethnicity_collapsed_disaggregated", by = "outsourcing_status")
cons <- summary(contrast(ems_2, "pairwise", adjust="tukey"))
sig_cons_2 <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("outsourcing_status","Ethnicity_collapsed_disaggregated"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal() +
  theme(
    legend.position = "none"
  )

ew <- sig_cons %>%
  filter(Ethnicity_collapsed_disaggregated == "English / Welsh / Scottish / Northern Irish / British") %>%
  pull(or) %>%
  round(2)

wa <- sig_cons %>%
  filter(Ethnicity_collapsed_disaggregated == "White and Asian") %>%
  pull(or) %>%
  round(2)
```

Looking at this with disaggregated ethnicities indicates that among “English / Welsh / Scottish / Northern Irish / British” workers, outsourced people are `r ew` times more likely to be in the low income group compared to non-outsourced people.
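
The ratios quoted here come from the `or = 1/exp(estimate)` step in the chunk above. As a sketch of that arithmetic, and assuming `emmeans` orders the pairwise contrast as "Not outsourced" minus "Outsourced" (an assumption about the factor level order, not something this document confirms), the contrast estimate $\hat{\delta}$ is a difference in log-odds of low pay, so inverting its exponent gives the odds ratio for outsourced relative to non-outsourced workers:

$$
\hat{\delta} = \operatorname{logit} P(\text{low pay} \mid \text{not outsourced}) - \operatorname{logit} P(\text{low pay} \mid \text{outsourced}),
\qquad
\mathrm{OR}_{\text{outsourced vs. not}} = \frac{1}{\exp(\hat{\delta})} = \exp(-\hat{\delta}).
$$

For illustration only (not a value from these data), $\hat{\delta} = -0.5$ would give $\mathrm{OR} = \exp(0.5) \approx 1.65$.
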
Among “White and Asian” workers, outsourced workers are `r wa` times more likely to be in the low income group compared to non-outsourced workers.^[[outputs/data/ethnicity_collapsed_disaggregated_income_group_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_collapsed_disaggregated_income_group_inferential.csv)]

```{r}
sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed_disaggregated","outsourcing_status"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()
```

```{r}
tab_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

tab_split %>%
  pivot_wider(id_cols = c(outsourcing_status, income_group), names_from = Ethnicity_binary, values_from = Percentage) %>%
  kable(caption = "Ethnicity (binary) by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)

write_csv(tab_split, "../outputs/data/ethnicity_binary_income_group.csv")

tab_split %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  facet_grid(rows=vars(income_group)) +
  geom_col(colour="black") +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("") +
  theme_minimal()
```

```{r ethnicity-interential-status}
mod <- glm(outsourcing_status ~ Ethnicity_collapsed, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

coef_table <- extract_glm_coefs(mod) %>%
  mutate(across(where(is.numeric), ~round(.x,2)))
rownames(coef_table) <- coef_table$variable

sig_coefs <- extract_glm_coefs(mod, only_sig = T)
# set rownames so we can index
rownames(sig_coefs) <- sig_coefs$variable
# get labels for piping
ethnicity_keys <- sig_coefs$variable
ethnicity_labs <- sub(".*collapsed","",ethnicity_keys)

write_csv(coef_table,
file="../outputs/data/ethnicity_model_inferential.csv")```Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others[^4]:[^4]: [outputs/data/ethnicity_model_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential.csv)- `r ethnicity_labs[2]` workers are `r sig_coefs[ethnicity_keys[2], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[3]` workers are `r sig_coefs[ethnicity_keys[3], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[4]` workers are `r sig_coefs[ethnicity_keys[4], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[5]` workers are `r sig_coefs[ethnicity_keys[5], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[6]` workers are `r sig_coefs[ethnicity_keys[6], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[7]` workers are `r sig_coefs[ethnicity_keys[7], "or"]` times more likely than White British workers to be outsourced.```{r}mod <-glm(outsourcing_status ~ Ethnicity_collapsed_disaggregated, data, weights = NatRepemployees, family ="quasibinomial")# summary(mod)coef_table <-extract_glm_coefs(mod) %>%mutate(across(where(is.numeric), ~round(.x,2)))rownames(coef_table) <- coef_table$variablesig_coefs <-extract_glm_coefs(mod, only_sig = T) %>%mutate(across(where(is.numeric), ~round(.x,2)))rownames(sig_coefs) <- sig_coefs$variableethnicity_keys <- sig_coefs$variableethnicity_labs <-sub(".*disaggregated","",ethnicity_keys)write_csv(coef_table, file="../outputs/data/ethnicity_model_inferential_2.csv")```Comparison of more disaggregated ethnicities indicates more nuance[^5]:[^5]: [outputs/data/ethnicity_model_inferential_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential_2.csv)- `r ethnicity_labs[2]` 
workers are `r sig_coefs[ethnicity_keys[2], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[3]` workers are `r sig_coefs[ethnicity_keys[3], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[4]` workers are `r sig_coefs[ethnicity_keys[4], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[5]` workers are `r sig_coefs[ethnicity_keys[5], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[6]` workers are `r sig_coefs[ethnicity_keys[6], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[7]` workers are `r sig_coefs[ethnicity_keys[7], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[8]` workers are `r sig_coefs[ethnicity_keys[8], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[9]` workers are `r sig_coefs[ethnicity_keys[9], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[10]` workers are `r sig_coefs[ethnicity_keys[10], "or"]` times more likely than White British workers to be outsourced.- `r ethnicity_labs[11]` workers are `r sig_coefs[ethnicity_keys[11], "or"]` times more likely than White British workers to be outsourced.```{r}#| include: falsecount <- data %>%group_by(Ethnicity_collapsed) %>%summarise(count =n(),freq =sum(NatRepemployees) )cis <-confint(mod, level=.95)```::: {.callout-tip title="#ethnicity-sub-group"}- These differences in ethnicity also shift slightly depending on which outsourced “sub-group” we look at. For example, compared to White British workers, Black outsourced workers are more likely to be in the “outsourced sub-group” meaning they have self-identified as outsourced, or the “agency sub-group”, meaning they are agency workers doing more long-term and ongoing work. **Are there any other interesting points to mention here? 
Should we do a chart showing this difference across sub-groups? Do we need an interpretive comment in this section?**
:::

```{r ethnicity-group}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed, data, weights=NatRepemployees)
#summary(mod)
# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep="_")
z <- coefs/summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep="_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)
# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    predicted_group = group, .before=everything() # insert predicted group so output table can be better interpreted
  )
write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab.csv")
# sig_ors
```

Breaking down by outsourcing group helps to separate out the *type* of outsourced work people from the ethnicities identified above engage in.[^6] Compared to White British workers,

[^6]: [outputs/data/ethnicity_ogroup_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab.csv)

- Arab people are more likely to be in the likely agency or high indicators groups
- Asian people are more likely to be in any of the groups
- Black people are more likely to be in the likely agency or outsourced groups
- People of mixed ethnicity are more likely to be outsourced
- People who selected Other ethnicity are more likely to be agency
- White other people are more likely to be outsourced

```{r}
sjPlot::plot_model(mod)
```

```{r}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed_disaggregated, data, weights=NatRepemployees)
#summary(mod)
# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) 
<- paste(colnames(ors), "or", sep="_")
z <- coefs/summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep="_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)
# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(predicted_group = group, .before=everything() # insert predicted group so output table can be better interpreted
  )
sig_ors2 <- sig_ors[,colSums(!is.na(sig_ors)) > 0]
sig_ors2 <- t(sig_ors2)
# get the sample information
sample_count <- data %>%
  group_by(outsourcing_group,Ethnicity_collapsed_disaggregated) %>%
  summarise(n = n(), freq = sum(NatRepemployees)) %>%
  filter(outsourcing_group != "Not outsourced") %>%
  pivot_wider(names_from = outsourcing_group, values_from = c(n, freq))
# combine sample info with estimates
# NAs in this table simply indicate non-sig results
sig_ors2 <- as.data.frame(sig_ors2) %>%
  tibble::rownames_to_column(var = "Ethnicity_collapsed_disaggregated") %>%
  mutate(Ethnicity_collapsed_disaggregated = sub(".*disaggregated","", Ethnicity_collapsed_disaggregated)) %>%
  filter(Ethnicity_collapsed_disaggregated != "(Intercept)") %>%
  left_join(sample_count, by = "Ethnicity_collapsed_disaggregated")
write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab_2.csv")
```

The disaggregated ethnicities reveal more nuance.[^7] The table below shows the likelihood of workers of different ethnicities falling into each of the outsourcing groups, compared to White British workers. Note that only significant relationships are shown here. *Note also that the 'n' for many of these statistics is very low. 
As such many of these statistics are illustrative but not inferential.*[^7]: [outputs/data/ethnicity_ogroup_inferential_tab_2.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab_2.csv)```{r}sig_ors2 %>%rename(Ethnicity = Ethnicity_collapsed_disaggregated ) %>%kable(caption ="Likelihood of belonging to different groups compared to White British. Note: NAs are non-sig. relationships. 'n_' is sample size, 'freq_' is weighted sample size", digits =2) %>%kable_styling(full_width = F)```::: {.callout-tip title="#ethnicity-pay-split"}- On the low-pay / high-pay split, you say “*A person is more likely to be in the low income group if they are: Older; Female; Prefer not to say when they arrived, And less likely if they are: Asian/Asian British; Live in North West or Wales; Arrived in the UK in last 30 years*”; Can I confirm this means we don’t see any other significant differences in the ethnicity breakdown if we look at high paid vs low paid workers? 
If so, let’s clarify what this says about how ethnicity relates to a) outsourced workers being disproportionately low paid, but b) ethnic minority workers being no more likely to be in our low pay group.

*Using the new ethnicity groupings, there is no evidence indicating that any ethnicity is more or less likely to be in the low income group*

**Note to self: This could benefit from stepwise regression**
:::

```{r income-group}
#| output: false
#| message: false
# test significance
# mod <- glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)
# summary(mod)
#
# test <- summary(mod)
#
# or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
# p <- test[["coefficients"]][2,4]
mod_2 <- glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)
test <- summary(mod_2)
or <- exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])
# index the p-value by coefficient name; row 2 of this model is Age, not outsourcing status
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]
# coefficient table for mod_2, so that the saved CSV reflects this model
coef_table <- extract_glm_coefs(mod_2)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_2, only_sig = T)
write_csv(coef_table, file="../outputs/data/income_group_outsourcing.csv")
```

```{r}
#| output: false
#| message: false
mod_2 <- glm(income_group ~ Ethnicity_collapsed * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)
mod_3 <- glm(income_group ~ Ethnicity_binary * outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)
summary(mod_3)
black_coef <- extract_glm_coefs(mod_2,, only_sig = T)[5,"Estimate"] %>%
  exp(.) %>%
  round(.,2) %>%
  pull()
other_coef <- extract_glm_coefs(mod_2,, only_sig = T)[6,"Estimate"] %>% exp(.) 
%>% round(.,2) %>% pull()
```

A person is more likely to be in the low income group if they:

- Are older
- Are female
- Don't have a degree (or don't know if they have a degree?)
- Are outsourced
- Arrived in the UK in the last year

And less likely if they:

- Are younger
- Are male
- Have a degree
- Live in the North West or Wales (compared to London)
- Arrived in the UK in the last 30 years

::: {.callout-tip title="#migration"}
- As you would expect, the vast majority of outsourced workers were born in the UK. However, we still see a significantly higher likelihood of outsourced workers having been born outside of the UK compared to people who aren’t outsourced. While around 14% of non-outsourced workers were born outside of the UK, this rose to just over 24% for outsourced workers – or nearly 1 in 4.
- Overall, people who were born outside of the UK are 1.94 times more likely to be in outsourced work than people who were born here.
:::

```{r}
data <- data %>%
  mutate(BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled,
    "Born in UK" = "I was born in the UK",
    "Came to UK recently" = c("Within the last year"),
    "Came to UK not recently" = c("Within the last 3 years","Within the last 5 years","Within the last 10 years","Within the last 15 years","Within the last 20 years","Within the last 30 years","More than 30 years ago"),
    "Prefer not to say" = c("Prefer not to say")
  )
  )
bornuk_statistics <- data %>%
  group_by(outsourcing_status, BORNUK_collapsed) %>%
  summarise(n = n(), Frequency = sum(NatRepemployees)) %>%
  mutate(N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum))
readr::write_csv(bornuk_statistics, file="../outputs/data/arrival_in_UK_collapsed_stats.csv")
bornuk_statistics %>%
  ggplot(., aes(BORNUK_collapsed, Percentage, fill = outsourcing_status)) +
  geom_col(colour="black", position = "dodge") +
  geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ",n)), position=position_dodge(width=1), hjust=1) +
  coord_flip() +
  scale_fill_manual(values=many_colours, name="Outsourcing status") +
  theme_minimal() 
+xlab("Arrival in UK") ``````{r bornuk_inferential, output=FALSE}mod <- glm(outsourcing_status ~ BORNUK_binary, data, weights = NatRepemployees, family="quasibinomial")# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")summary(mod)coefs <- extract_glm_coefs(mod)write_csv(coefs, file = "../outputs/data/bornuk_ostatus_inferential_tab.csv")```As for non-outsourced workers, the vast majority of outsourced workers are born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Outsourced" & bornuk_statistics$BORNUK_collapsed == "Born in UK"), "Percentage"],2)`% of outsourced workers are not born in the UK, compared to `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Not outsourced" & bornuk_statistics$BORNUK_collapsed == "Born in UK"), "Percentage"],2)`% of non-outsourced workers.[^8] This difference is statistically significant; **outsourced workers are `r round(coefs %>% filter(variable == "BORNUK_binaryNot born in UK") %>% pull(or),2)` times more likely to have been born outside the UK than non-outsourced workers.**[^9][^8]: [outputs/data/arrival_in_UK_stats.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats.csv)[^9]: [outputs/data/bornuk_ostatus_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ostatus_inferential_tab.csv)::: {.callout-tip title="#migration-sub-groups"}- This pattern broadly holds across our three outsourcing sub-groups, with nearly no difference in the likelihood of people born outside of the UK being in any one of the three groups.:::```{r}mod <-multinom(outsourcing_group ~ BORNUK_binary, data, weights=NatRepemployees)#summary(mod)# get coefficients and calcualte pcoefs <-summary(mod)$coefficientsors <-exp(coefs)colnames(ors) 
<-paste(colnames(ors), "or", sep="_")z <- coefs/summary(mod)$standard.errorsp <- (1-pnorm(abs(z), 0, 1)) *2colnames(p) <-paste(colnames(p), "p", sep="_")p_2 <-apply(p, 2, function(x) ifelse(x <0.01, 1, NA))sig_ors <-exp(summary(mod)$coefficients * p_2)# add to table for savingcoefs <-cbind(coefs, ors, p) %>%as_tibble()write_csv(coefs, file ="../outputs/data/bornuk_ogroup_inferential_tab.csv")# sig_orsbornuk_statistics_ogroup <- data %>%group_by(outsourcing_group, BORNUK_collapsed) %>%summarise(n =n(),Frequency =sum(NatRepemployees) ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) )readr::write_csv(bornuk_statistics_ogroup, file="../outputs/data/arrival_in_UK_collapsed_stats_ogroup.csv")bornuk_statistics_ogroup %>%ggplot(., aes(BORNUK_collapsed, Percentage, fill =outsourcing_group)) +geom_col(colour="black", position ="dodge") +geom_text(aes(BORNUK_collapsed, y =99, label =paste0("n = ",n)), position=position_dodge(width=1), hjust=1) +coord_flip() +scale_fill_manual(values=many_colours, name="Outsourcing status") +theme_minimal() +xlab("Arrival in UK") ```::: {.callout-warning title="#ethnicity-migration-interaction. 
Some attention needed here"}
Among all workers who were born in the UK:

- Black workers are 2.01 times more likely to be outsourced than White workers.
- Asian workers are 2.02 times more likely to be outsourced than White workers.
- Workers from Other ethnic backgrounds are X times more likely to be outsourced than White workers.

For workers born outside of the UK:

- Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK.
- Among workers from Mixed ethnic backgrounds, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.

For workers from other ethnicities, country of birth makes no difference: a Black or an Asian worker, for example, is equally likely to be outsourced whether they were born in the UK or somewhere else. And compared to a White person born in the UK, Black African and South Asian workers specifically are more likely to be outsourced, whether or not they were born in the UK. Does this need any further detail or explanation?

**To discuss confidence in our interpretation in this section: The evidence on ethnicity and country of birth clearly paints a racialised picture of outsourcing, and one with colonial undertones, as Black African and South Asian workers see a higher risk of being outsourced compared to White British workers, regardless of their country of birth. This obviously raises further questions about why, linked to (sector, occupation, labour market inequality and structural racism). Discuss the draft interpretation in the comment on the right.**

**However, workers from non-White ethnic groups are not the only workers who see a higher risk of being outsourced: Non-UK-born White workers are also more likely to be outsourced than UK-born White people. 
Ethnicity and country of birth act independently for some groups, but seem to be fundamentally connected for others.**
:::

```{r}
base_mod <- mod <- glm(outsourcing_status ~ Ethnicity_collapsed + BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
mod <- glm(outsourcing_status ~ Ethnicity_collapsed*BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)
# check that interaction improves the model over main effects - it does
anova(base_mod, mod, test = "F")
coefs <- extract_glm_coefs(mod)
```

```{r}
ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_binary")
cons <- summary(contrast(ems, "pairwise", adjust="tukey"))
sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(or = 1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )
write_csv(cons, file = "../outputs/data/ethnicity_bornUK_binary_contrasts.csv")
```

Exploring the intersection of ethnicity and country of birth shows that the likelihood of a person being outsourced depends on the combination of their ethnicity and whether they were born in the UK.[^10] The plot below shows that:

[^10]: [outputs/data/bornUK_binary_contrasts.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_binary_contrasts.csv)

- Among workers born in the UK, a Black worker is `r round(sig_cons %>% filter(contrast == "White British - (Black/African/Caribbean/Black British)") %>% pull(or),2)` times more likely to be outsourced than a White British worker.
- Among workers born in the UK, an Asian worker is `r round(sig_cons %>% filter(contrast == "White British - (Asian/Asian British)") %>% pull(or),2)` times more likely to be outsourced than a White British worker.
- Among workers born in the UK, an Other ethnicity worker is `r round(sig_cons %>% filter(contrast == "White British - Other ethnic group") %>% pull(or),2)` times more likely to be outsourced than a White British 
worker.- Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "White British - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced than a White British worker.- Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "(Black/African/Caribbean/Black British) - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced than a Black worker.- Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "(Mixed/Multiple ethnic group) - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced than a worker of mixed ethnicity.```{r}sjPlot::plot_model(mod, type ="pred", legend.title="", terms =c("BORNUK_binary","Ethnicity_collapsed"), dodge=0.5) +coord_flip() +xlab("") +ylab("Likelihood of being outsourced") +theme_minimal()``````{r} ems_2 <-emmeans(mod, specs ="BORNUK_binary", by ="Ethnicity_collapsed") cons <-summary(contrast(ems_2, "pairwise",adjust="tukey")) sig_cons <- cons %>%filter(p.value < .05) %>%mutate(or =1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) )write_csv(cons, file ="../outputs/data/bornUK_binary_contrasts_2.csv")```Similarly, the plot below shows that[^11][^11]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)- Among White British workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "White British") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.- Among Mixed workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Mixed/Multiple ethnic group") %>% pull(or),2)` times more likely to be 
outsourced than someone born in the UK.- Among people who preferred not to say their ethnicity, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Prefer not to say") %>% pull(or),2)` times as likely (i.e.,`r round(100 * (1 - (sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Prefer not to say") %>% pull(or))),0)`% less likely) to be outsourced than someone born in the UK.```{r}sjPlot::plot_model(mod, type ="pred", legend.title="", terms =c("Ethnicity_collapsed","BORNUK_binary"), dodge=0.5) +coord_flip() +xlab("") +ylab("Likelihood of being outsourced") +theme_minimal()``````{r}mod <-glm(outsourcing_status ~ Ethnicity_collapsed_disaggregated*BORNUK_binary, data, weights = NatRepemployees, family ="quasibinomial")# summary(mod)coefs <-extract_glm_coefs(mod, only_sig = T)ems <-emmeans(mod, specs ="Ethnicity_collapsed_disaggregated", by ="BORNUK_binary")cons <-summary(contrast(ems, "pairwise",adjust="tukey"))sig_cons <- cons %>%filter(p.value < .05) %>%mutate(or =1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) )sjPlot::plot_model(mod, type ="pred", legend.title="", terms =c("BORNUK_binary","Ethnicity_collapsed_disaggregated"), dodge=0.5) +#coord_flip() +xlab("") +ylab("Likelihood of being outsourced") +theme_minimal() +theme(legend.position ="none") ems_2 <-emmeans(mod, specs ="BORNUK_binary", by ="Ethnicity_collapsed_disaggregated")cons <-summary(contrast(ems_2, "pairwise",adjust="tukey"))sig_cons <- cons %>%filter(p.value < .05) %>%mutate(or =1/exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) )sjPlot::plot_model(mod, type ="pred", legend.title="", terms =c("Ethnicity_collapsed_disaggregated","BORNUK_binary"), dodge=0.5) +coord_flip() +xlab("") +ylab("Likelihood of being 
outsourced") +
  theme_minimal()
```

Among people born in the UK, Pakistani workers are more likely to be outsourced than White workers.

Among White workers and White and Asian workers, those not born in the UK are more likely to be outsourced than those born in the UK.

::: {.callout-tip title="#migration-by-pay-split"}
If we do a basic “born UK / not born UK” split, looking by low and high pay, what % of the low-paid workers group were born outside of the UK, vs in the high-paid group?
:::

```{r}
#| message: false
mig_pay_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, BORNUK_binary) %>%
  summarise(freq = sum(NatRepemployees), n = n()) %>%
  mutate(total = sum(freq), percentage = 100 * (freq / total), N = sum(n))
low_pay_perc <- mig_pay_split %>%
  filter(income_group == "Low" & BORNUK_binary == "Not born in UK" & outsourcing_status == "Outsourced") %>%
  mutate(round(percentage,2)) %>%
  pull()
high_pay_perc <- mig_pay_split %>%
  filter(income_group == "Not low" & BORNUK_binary == "Not born in UK" & outsourcing_status == "Outsourced") %>%
  mutate(round(percentage,2)) %>%
  pull()
mod <- glm(income_group ~ BORNUK_binary, income_data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)
mod_2 <- glm(income_group ~ BORNUK_binary * outsourcing_status, income_data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod_2)
```

`r low_pay_perc`% of outsourced workers in the low pay group were not born in the UK, compared to `r high_pay_perc`% of outsourced workers in the not low pay group. This difference is marginally statistically significant; someone in the low income group is less likely to have been born outside the UK than someone in the not low income group. 
This pattern is the same for non-outsourced workers, and when we consider the interaction between outsourcing status and migration status, the only factor predicting income group is outsourcing status.

```{r}
mig_pay_split %>%
  ggplot(aes(income_group, percentage, fill = BORNUK_binary)) +
  facet_grid(rows = vars(outsourcing_status)) +
  geom_col(position="dodge") +
  theme_minimal()
```

## Outsourced workers are on average younger than non-outsourced workers {#sec-age}

::: {.callout-tip title="#age"}
- We find that outsourced workers are significantly younger than non-outsourced workers, on average. The median age of an outsourced worker is 35, compared to a median age of 43 for a non-outsourced worker.
- The outsourced and indicator sub-groups – people who directly said that they were or might be outsourced, or ticked a high number of our indicators of outsourced working – see higher proportions of younger workers than the “agency” sub-group.
:::

::: {.callout-important title="#age-violin"}
INSERT VIOLIN PLOT CHART HERE SHOWING MEDIAN AGE OF EACH SUB-GROUP, COMPARED TO NON-OUTSOURCED WORKERS. **Is this necessary? We already have the density plots**
:::

```{r age-by-status}
age_statistics <- data %>%
  group_by(outsourcing_status) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),
    N = n()
  )
readr::write_csv(age_statistics, file = "../outputs/data/age_stats.csv")
```

```{r age-inferential, include=FALSE}
test <- lm(Age ~ outsourcing_status, weights = NatRepemployees, data)
summary(test)
coefs <- extract_lm_coefs(test)
readr::write_csv(coefs,file="../outputs/data/age_inferential.csv")
```

Outsourced workers are on average younger than non-outsourced workers. 
The median age of the outsourced group is `r age_statistics[which(age_statistics$outsourcing_status=="Outsourced"),"median"]` , compared to `r age_statistics[which(age_statistics$outsourcing_status=="Not outsourced"),"median"]` for the not outsourced group.[^12] This difference is statistically significant.[^13][^12]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)[^13]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)```{r age-by-status-plot}knitr::kable(age_statistics, digits = 2, col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>% kable_styling(full_width = F)data %>% mutate( Age = as.numeric(as.character(as_factor(Age))) ) %>% ggplot(.,aes(Age, colour = outsourcing_status, fill = outsourcing_status)) + geom_density(alpha = 0.3) + geom_vline(data =age_statistics, aes(xintercept=median, colour = outsourcing_status)) + scale_x_continuous(breaks = seq(min(age_statistics$min), max(age_statistics$max),5)) + theme_minimal() + scale_colour_manual(values=colours, name = "Outsourcing status") + scale_fill_manual(values=colours, name = "Outsourcing status")```The higher concentration of younger workers identified above appears to be driven primarily by the 'outsourced' and 'high indicator' groups, whilst the 'likely agency' group follows a similar pattern to the non-outsourced group.[^14][^14]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)```{r}age_statistics_income_group <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_status, income_group) %>%summarise(mean =weighted.mean(Age, w = NatRepemployees, na.rm = T),median =wtd.quantile(Age, w = NatRepemployees, probs =c(.5), na.rm = T),min =wtd.quantile(Age, w = NatRepemployees, probs =c(0), na.rm = T),max =wtd.quantile(Age, w = 
NatRepemployees, probs =c(1), na.rm = T),stdev =sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),N =n() )knitr::kable(age_statistics_income_group,digits =2,col.names =c("Outsourcing status","Income group","Mean","Median","Min","Max","Standard dev.","N")) %>%kable_styling(full_width = F)income_data %>%filter(!is.na(income_group)) %>%mutate(Age =as.numeric(as.character(as_factor(Age))) ) %>%ggplot(.,aes(Age, colour = outsourcing_status, fill = outsourcing_status)) +facet_grid(rows =vars(income_group)) +geom_density(alpha =0.3) +geom_vline(data = age_statistics_income_group, aes(xintercept=median, colour = outsourcing_status)) +scale_x_continuous(breaks =seq(min(age_statistics_income_group$min), max(age_statistics_income_group$max),5)) +theme_minimal() +scale_colour_manual(values=colours, name ="Outsourcing status") +scale_fill_manual(values=colours, name ="Outsourcing status")``````{r age-by-group}age_statistics_2 <- data %>% group_by(outsourcing_group) %>% summarise( mean = weighted.mean(Age, w = NatRepemployees, na.rm = T), median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)), N = n() )readr::write_csv(age_statistics_2, file = "../outputs/data/age_stats_2.csv")``````{r age-by-group-plot}knitr::kable(age_statistics_2, digits = 2, col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>% kable_styling(full_width = F)data %>% ggplot(.,aes(Age, colour = outsourcing_group, fill = outsourcing_group)) + geom_density(alpha = 0.2) + geom_vline(data = age_statistics_2, aes(xintercept=median, colour = outsourcing_group)) + scale_x_continuous(breaks = seq(min(age_statistics_2$min), max(age_statistics_2$max),5)) + theme_minimal() + scale_colour_manual(values=better_colours, name = "Outsourcing group") + 
scale_fill_manual(values=better_colours, name = "Outsourcing group")``````{r}age_statistics2_income_group <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_group, income_group) %>%summarise(mean =weighted.mean(Age, w = NatRepemployees, na.rm = T),median =wtd.quantile(Age, w = NatRepemployees, probs =c(.5), na.rm = T),min =wtd.quantile(Age, w = NatRepemployees, probs =c(0), na.rm = T),max =wtd.quantile(Age, w = NatRepemployees, probs =c(1), na.rm = T),stdev =sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),N =n() )knitr::kable(age_statistics2_income_group,digits =2,col.names =c("Outsourcing group","Income group","Mean","Median","Min","Max","Standard dev.","N")) %>%kable_styling(full_width = F)income_data %>%filter(!is.na(income_group)) %>%mutate(Age =as.numeric(as.character(as_factor(Age))) ) %>%ggplot(.,aes(Age, colour = outsourcing_group, fill = outsourcing_group)) +facet_grid(rows =vars(income_group)) +geom_density(alpha =0.3) +geom_vline(data = age_statistics2_income_group, aes(xintercept=median, colour = outsourcing_group)) +scale_x_continuous(breaks =seq(min(age_statistics2_income_group$min), max(age_statistics2_income_group$max),5)) +theme_minimal() +scale_colour_manual(values=colours, name ="Outsourcing group") +scale_fill_manual(values=colours, name ="Outsourcing group")```::: {.callout-tip title="#gender"}- The evidence also finds meaningful differences by gender between the outsourced and non-outsourced groups in our data. Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce, a nearly 10 percentage point difference.- Outsourced workers are 1.44 times more likely to be male than female. - The group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). 
Comparison of outsourced and non-outsourced workers finds that:
- Someone in the high indicators sub-group is 2.18 times more likely to be male than female.
- Someone in the agency sub-group is 1.45 times more likely to be male than female.
- Someone in the outsourced sub-group is 1.31 times more likely to be male than female.
:::

::: {.callout-important title="#gender-sector"}
- Possible addition: Will readers want to know more about how this intersects with the roles or sectors with higher rates of outsourcing – even if this is just an interpretive comment from us on how gender interacts with jobs and sectors more generally in the labour market?
:::

```{r}
gender_statistics <- data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(gender_statistics, file = "../outputs/data/gender_statistics.csv")
```

```{r gender-outsourcing-status}
mod <- multinom(Gender ~ outsourcing_status, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
coef_names <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")
# keep only odds ratios that are significant at p < 0.01
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

coefs <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    gender = coef_names,
    .before = everything()
  )

write_csv(coefs, file = "../outputs/data/gender_inferential_tab.csv")
```

The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.[^15] Men make up `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the outsourced workforce compared to `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Not outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are `r round(sig_ors['Male', 'outsourcing_statusOutsourced'], 2)` times more likely to be male than female.[^16]

[^15]: [outputs/data/gender_statistics.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_statistics.csv)

[^16]: [outputs/data/gender_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_inferential_tab.csv)

```{r}
# try just using a glm?
mod <- glm(outsourcing_status ~ Gender, family = "quasibinomial", weights = NatRepemployees, data)
summary(mod)
ors <- extract_glm_coefs(mod)
ors
```

```{r}
# gender_statistics %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Gender)) +
  geom_col(colour = "black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values = colours) +
  theme_minimal() +
  xlab("Outsourcing group") +
  annotate("text", x = gender_statistics$outsourcing_status, y = 99, label = paste0("N = ", gender_statistics$N), hjust = 1)
```

```{r}
gender_statistics_2 <- data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(gender_statistics_2, file = "../outputs/data/gender_statistics_2.csv")
```

```{r gender-outsourcing-group}
mod <- multinom(Gender ~ outsourcing_group, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")
# keep only odds ratios that are significant at p < 0.01
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>%
  as_tibble()

write_csv(coefs, file = "../outputs/data/gender_inferential_tab_2.csv")
```

Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the 'high indicators' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="High indicators" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'likely agency' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Likely agency" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'outsourced' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Outsourced" & Gender == "Male") %>% pull(Percentage), 2)`%). Statistically speaking, compared to a non-outsourced person:

- Someone in the high indicators group is `r round(sig_ors['Male', 'outsourcing_groupHigh indicators'],2)` times more likely to be male than female.
- Someone in the likely agency group is `r round(sig_ors['Male', 'outsourcing_groupLikely agency'],2)` times more likely to be male than female.
- Someone in the outsourced group is `r round(sig_ors['Male', 'outsourcing_groupOutsourced'],2)` times more likely to be male than female.

Additionally, people identifying as 'Other' gender are absent from the high indicators and likely agency groups, though given the small N (`r sum(data$Gender=="Other")`) for this group, this finding is unlikely to be meaningful.

```{r}
# gender_statistics_2 %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics_2 %>%
  ggplot(., aes(outsourcing_group, Percentage, fill = Gender)) +
  geom_col(colour = "black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values = colours) +
  theme_minimal() +
  xlab("Outsourcing group") +
  annotate("text", x = gender_statistics_2$outsourcing_group, y = 99, label = paste0("N = ",
gender_statistics_2$N), hjust = 1)
```

## Outsourced workers are more likely to work in some sectors than others, but seem to be spread across the labour market

::: {.callout-tip title="#sectors"}
- The three most common sectors for outsourced workers in our survey to be employed within – excluding those with an N size below X (50?) – were administrative and support service activities; water supply, sewerage, waste management and remediation activities; and other service activities.
- Five of the twenty employment sectors have at least 1 in 5 of their workforce “outsourced”: more than the average of around 17% across the whole workforce.
:::

Here we explore what proportion of workers in each sector are outsourced.[^17]

[^17]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)

```{r sector-summary-3}
sector_summary_3 <- data %>%
  # filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary_3, file = "../outputs/data/sector_summary_3.csv")
```

The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e.
this shows which sectors have higher and lower proportions of outsourced workers.

```{r sector-plot-2}
plot_data <- sector_summary_3 %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for the 'Not outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

# Rank sectors by their percentage of outsourced workers
outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   dplyr::select(SectorName_short, outsourcing_status, perc, n
#   mutate(

annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_status)) +
  geom_col() +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  scale_y_continuous(breaks = seq(0, 100, 10))

# sector_key <- data.frame("number" = seq(1,length(unique(plot_data$SectorName_labelled)),1),
#                          "Sector" = levels(plot_data$SectorName_labelled))
#
# sector_key %>%
#   kable() %>%
#   kable_styling(full_width = F)
```

The three sectors with the highest proportion of outsourced workers are:

- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==3])` (note that N = 31)
- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==4])`
- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==22])`

Note that an undefined sector ('Not found') contained one of the largest proportions of outsourced workers (`r round(plot_data$perc[which(plot_data$SectorName==16 &
plot_data$outsourcing_status=="Outsourced")],0)`% of workers in the 'Not found' category were outsourced).

A key takeaway here is that whereas 17% of the total workforce is outsourced, this figure varies by sector, from 0% for Mining... and Extraterritorial organisations... all the way to `r round(outsourced[which(outsourced$rank==1),'perc'],0)`% for `r outsourced[which(outsourced$rank==1),'SectorName_short']`, with 5 out of 20 sectors having at least 20% of their workforce outsourced.

::: {.callout-tip title="#sectors-ogroup"}
- Figure X also shows how the total outsourced group in each sector splits into our three outsourced “sub-groups”. We find – as you might expect, based on its dominance within the group of outsourced workers – that outsourced workers in every sector are most likely to be in the “outsourced sub-group”, i.e. those who self-identified as outsourced workers.
:::

```{r}
sector_summary_3 <- data %>%
  # filter(income_drop_all == 0) %>%
  filter(outsourcing_group != "Not outsourced") %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

plot_data <- sector_summary_3 %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   dplyr::select(SectorName_short, outsourcing_status, perc, n
#   mutate(

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Outsourced") %>%
  dplyr::select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

plot_data <- plot_data %>%
  filter(outsourcing_group != "Not outsourced")

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col() +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  scale_y_continuous(breaks = seq(0, 100, 10))
```

# Pay

::: {.callout-tip title="#pay"}
- Using regression analysis, we find that outsourced workers are on average paid £2,170 less than non-outsourced workers.
- The “outsourced sub-group” earns £3,813 less, and the “agency sub-group” £2,603 less, than the non-outsourced group. This shows that pay is lowest in the “outsourced sub-group” of workers, i.e. those who directly identified themselves as being outsourced. Figure X below shows the median and distribution of pay across the three outsourced sub-groups and the non-outsourced group, for comparison.
:::

::: {.callout-important title="#pay-violin"}
Violin plot for the above
:::

```{r income}
# filter to just cases where income is above the fifth percentile and lower than the 95th?
# I.e., drop the top and bottom 5%.
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_status) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file = "../outputs/data/income_stats_o-status.csv")

mod <- lm(income_annual_all ~ outsourcing_status, income_data, weights = NatRepemployees)
# summary(mod)
coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_lm_coefs(mod, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_income_by_o-status.csv")

income_statistics_weekly <- data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  group_by(outsourcing_status) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics_weekly, file = "../outputs/data/weekly_income_stats_o-status.csv")

mod_weekly <- lm(income_weekly_all ~ outsourcing_status, income_data, weights = NatRepemployees)
# summary(mod)
coef_table_weekly <- extract_lm_coefs(mod_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs_weekly <- extract_lm_coefs(mod_weekly, only_sig = T)
write_csv(coef_table_weekly, file = "../outputs/data/model_income_by_o-status_weekly.csv")
```

The
tables and plots below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` less annually than non-outsourced workers**.[^18] Per week, **outsourced workers are on average paid £`r abs(round(coef_table_weekly['outsourcing_statusOutsourced','Estimate'],0))` less than non-outsourced workers**.

[^18]: [outputs/data/income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-status.csv) & [outputs/data/model_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status.csv)

Weekly statistics are available here.^[[outputs/data/weekly_income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/weekly_income_stats_o-status.csv) & [outputs/data/model_income_by_o-status_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status_weekly.csv)]

```{r income-plot}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing status", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_status, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics, aes(outsourcing_status, y = 6e+04), label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", income_statistics$median), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing status") +
  ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f = floor), plyr::round_any(max(income_statistics$max), 5000, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max), 5000, f = ceiling), 10000))

# weekly
knitr::kable(income_statistics_weekly, digits = 2, col.names = c("Outsourcing status", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
# (label placed at y = 1300 so it sits on the weekly income scale, as in the by-group weekly plot)
data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  ggplot(., aes(outsourcing_status, income_weekly_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics_weekly, aes(outsourcing_status, y = 1300), label = paste0("Mean = ", round(income_statistics_weekly$mean, 0), "\n", "Median = ", income_statistics_weekly$median), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing status") +
  ylab("Weekly income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics_weekly$min), 10, f = floor), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics_weekly$min), 10, f = ceiling), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling), 100))
```

```{r income-outsourcing-group}
# filter to just cases where income is above the fifth percentile and lower than the 95th?
# I.e., drop the top and bottom 5%.
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file = "../outputs/data/income_stats_o-group.csv")

mod <- lm(income_annual_all ~ outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod)
coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_lm_coefs(mod, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_income_by_o-group.csv")

income_statistics_weekly <- data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics_weekly, file = "../outputs/data/weekly_income_stats_o-group.csv")

mod_weekly <- lm(income_weekly_all ~ outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod)
coef_table_weekly <- extract_lm_coefs(mod_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs_weekly <- extract_lm_coefs(mod_weekly, only_sig = T)
write_csv(coef_table_weekly, file = "../outputs/data/model_income_by_o-group_weekly.csv")
```

The tables
and plots below show descriptive statistics on income and its distribution for the outsourcing groups. Only the full outsourced subgroup has lower income than non-outsourced people. Regression analysis shows that **workers in the 'outsourced' sub-group are on average paid £`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less annually than non-outsourced workers**.[^18-group] Per week, **workers in the 'outsourced' sub-group are on average paid £`r abs(round(coef_table_weekly['outsourcing_groupOutsourced','Estimate'],0))` less than non-outsourced workers**.

[^18-group]: [outputs/data/income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-group.csv) & [outputs/data/model_income_by_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group.csv)

Weekly statistics are available here.^[[outputs/data/weekly_income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/weekly_income_stats_o-group.csv) & [outputs/data/model_income_by_o-group_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group_weekly.csv)]

```{r income-plot-group}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the outsourcing groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_group, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics, aes(outsourcing_group, y = 6e+04), label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", round(income_statistics$median, 0)), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing group") +
  ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f =
floor), plyr::round_any(max(income_statistics$max), 5000, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max), 5000, f = ceiling), 10000))

# weekly
knitr::kable(income_statistics_weekly, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the outsourcing groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  ggplot(., aes(outsourcing_group, income_weekly_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics_weekly, aes(outsourcing_group, y = 1300), label = paste0("Mean = ", round(income_statistics_weekly$mean, 0), "\n", "Median = ", round(income_statistics_weekly$median, 0)), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing group") +
  ylab("Weekly income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics_weekly$min), 10, f = floor), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics_weekly$min), 10, f = ceiling), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling), 100))
```

```{r}
#| output: false
mod <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)
summary(mod)

mod_2 <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
summary(mod_3)
# anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_3, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_2_income_by_o-status.csv")

mod_3_weekly <- lm(income_weekly_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs <- extract_glm_coefs(mod_3_weekly, only_sig = T)
write_csv(coef_table_weekly, file = "../outputs/data/model_2_income_by_o-status_weekly.csv")
```

This difference increases to £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` annually (£`r abs(round(coef_table_weekly['outsourcing_statusOutsourced','Estimate'],0))` per week) when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.[^19] This analysis shows that all other variables, apart from Age, are in some way relevant to income. The figures below are averages, controlling for each of the other variables in the model.

[^19]: [outputs/data/model_2_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status.csv)

Annually:

- Men earn £`r abs(round(coef_table['GenderMale','Estimate'],0))` more than women.
- People who have a degree earn £`r abs(round(coef_table['Has_DegreeYes','Estimate'],0))` more than people without a degree.
- Workers in all non-London regions earn less than workers in London:
    - East Midlands: -£`r abs(round(coef_table['RegionEast Midlands','Estimate'],0))`
    - East of England: -£`r abs(round(coef_table['RegionEast of England','Estimate'],0))`
    - North East: -£`r abs(round(coef_table['RegionNorth East','Estimate'],0))`
    - North West: -£`r abs(round(coef_table['RegionNorth West','Estimate'],0))`
    - Northern Ireland: -£`r abs(round(coef_table['RegionNorthern Ireland','Estimate'],0))`
    - Scotland: -£`r abs(round(coef_table['RegionScotland','Estimate'],0))`
    - South East: -£`r abs(round(coef_table['RegionSouth East','Estimate'],0))`
    - Wales: -£`r abs(round(coef_table['RegionWales','Estimate'],0))`
    - West Midlands: -£`r abs(round(coef_table['RegionWest Midlands','Estimate'],0))`
    - Yorkshire and the Humber: -£`r abs(round(coef_table['RegionYorkshire and the Humber','Estimate'],0))`
- People who arrived in the UK within the last year earn £`r abs(round(coef_table['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 3 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK.
- People who arrived within the last 30 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK.

Weekly^[[outputs/data/model_2_income_by_o-status_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status_weekly.csv)]:

- Men earn £`r abs(round(coef_table_weekly['GenderMale','Estimate'],0))` more than women.
- People who have a degree earn £`r abs(round(coef_table_weekly['Has_DegreeYes','Estimate'],0))` more than people without a degree.
- Workers in all non-London regions earn less than workers in London:
    - East Midlands: -£`r abs(round(coef_table_weekly['RegionEast Midlands','Estimate'],0))`
    - East of England: -£`r abs(round(coef_table_weekly['RegionEast of England','Estimate'],0))`
    - North East: -£`r abs(round(coef_table_weekly['RegionNorth East','Estimate'],0))`
    - North West: -£`r abs(round(coef_table_weekly['RegionNorth West','Estimate'],0))`
    - Northern Ireland: -£`r abs(round(coef_table_weekly['RegionNorthern Ireland','Estimate'],0))`
    - Scotland: -£`r abs(round(coef_table_weekly['RegionScotland','Estimate'],0))`
    - South East: -£`r abs(round(coef_table_weekly['RegionSouth East','Estimate'],0))`
    - Wales: -£`r abs(round(coef_table_weekly['RegionWales','Estimate'],0))`
    - West
Midlands: -£`r abs(round(coef_table_weekly['RegionWest Midlands','Estimate'],0))`
    - Yorkshire and the Humber: -£`r abs(round(coef_table_weekly['RegionYorkshire and the Humber','Estimate'],0))`
- People who arrived in the UK within the last year earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 3 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK.
- People who arrived within the last 30 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK.

## Gender pay gap

```{r}
#| output: false
#| message: false
#| warning: false
simp_mod <- lm(income_annual_all ~ outsourcing_status*Gender, income_data, weights = NatRepemployees)
summary(simp_mod)

mod <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled + Gender:outsourcing_status, income_data, weights = NatRepemployees)
summary(mod)

mod_weekly <- lm(income_weekly_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled + Gender:outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_weekly)

simp_mod_weekly <- lm(income_weekly_all ~ outsourcing_status*Gender, income_data, weights = NatRepemployees)
summary(simp_mod_weekly)
```

::: {.callout-warning title="#gender-pay-gap"}
- On average within our sample, male workers earn £6,400 more than female workers per year; but further exploration of how pay relates to gender for outsourced workers suggests that this gender pay gap doesn’t differ in a statistically significant way depending on whether workers are outsourced or not.
- For
female outsourced workers, this suggests that being an outsourced worker neither exacerbates nor diminishes the gender pay gap they face compared to male workers. **Check what this controls for**
:::

### Outsourcing status

```{r gender-pay-gap-1}
gender_outsourced_gap <- income_data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

not_outsourced_gap <- gender_outsourced_gap %>%
  filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

outsourced_gap <- gender_outsourced_gap %>%
  filter(outsourcing_status == "Outsourced") %>%
  dplyr::select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

gender_outsourced_gap %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap %>%
  ggplot(aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position = "dodge") +
  ggtitle("Annual income")

write_csv(gender_outsourced_gap, "../outputs/data/o-status_gender_gap.csv")

# weekly
gender_outsourced_gap_weekly <- income_data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )

not_outsourced_gap_weekly <- gender_outsourced_gap_weekly %>%
  filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

outsourced_gap_weekly <- gender_outsourced_gap_weekly %>%
  filter(outsourcing_status == "Outsourced") %>%
  dplyr::select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

gender_outsourced_gap_weekly %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap_weekly %>%
  ggplot(aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position = "dodge") +
  ggtitle("Weekly income")

write_csv(gender_outsourced_gap_weekly, "../outputs/data/o-status_gender_gap_weekly.csv")
```

**Annual**^[[outputs/data/o-status_gender_gap.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-status_gender_gap.csv) & [outputs/data/mod_o-status_gender.csv](outputs/data/mod_o-status_gender.csv)]: Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £`r round(not_outsourced_gap,2)` less than males. For outsourced workers, females are paid £`r round(outsourced_gap,2)` less than males. The difference between non-outsourced and outsourced workers is not significant.

**Weekly**^[[outputs/data/o-status_gender_gap_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-status_gender_gap_weekly.csv) & [outputs/data/mod_o-status_gender_weekly.csv](outputs/data/mod_o-status_gender_weekly.csv)]: Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not.
For non-outsourced workers, females are paid £`r round(not_outsourced_gap_weekly,2)` less than males. For outsourced workers, females are paid £`r round(outsourced_gap_weekly,2)` less than males. The difference between non-outsourced and outsourced workers is not significant.

```{r gender-outsourcing-int}
#| output: false
ggplot(gender_outsourced_gap, aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position = "dodge") +
  geom_label(aes(label = round(median, 0)), position = position_dodge(width = 0.9)) +
  theme_minimal() +
  ylab("Median income") +
  xlab("Outsourcing status")

# Simple model: gender by outsourcing status interaction only
simp_mod <- lm(income_annual_all ~ Gender * outsourcing_status, income_data, weights = NatRepemployees)
summary(simp_mod)

# simp_mod2 <- update(simp_mod, ~. + Has_Degree)
# summary(simp_mod2)
# anova(simp_mod, simp_mod2)

# Add demographic controls
mod_2 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender * outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
summary(mod_3)

anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# Extract significant coefficients from the model fitted in this chunk (mod_3),
# rather than the stale `mod` object left over from an earlier chunk
sig_coefs <- extract_glm_coefs(mod_3, only_sig = T)

write_csv(coef_table, "../outputs/data/mod_o-status_gender.csv")

# Weekly equivalent of mod_3
mod_3_weekly <- lm(income_weekly_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender * outsourcing_status + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs <- extract_glm_coefs(mod_3_weekly, only_sig = T)

write_csv(coef_table_weekly, "../outputs/data/mod_o-status_gender_weekly.csv")
```

The gender by outsourcing status interaction is also not relevant for whether a worker is low income (i.e. 
non-sig relationship with income_group).```{r}#| output: falsemod <-glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)summary(mod)# test <- summary(mod)# # or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])# p <- test[["coefficients"]][2,4]# # coef_table <- extract_lm_coefs(mod_3)# rownames(coef_table) <- coef_table$variable# sig_coefs <- extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")```### Outsourcing group```{r gender-pay-gap-group}#| output: false#| warnings: false#| messages: falsegender_outsourced_gap <- income_data %>% group_by(outsourcing_group, Gender) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) )gender_outsourced_gap %>% kable() %>% kable_styling(full_width = F)gender_outsourced_gap %>% ggplot(aes(outsourcing_group, median, fill = Gender)) + geom_col(position="dodge") + ggtitle("Annual income")write_csv(gender_outsourced_gap, "../outputs/data/o-group_gender_gap.csv")# weeklygender_outsourced_gap_weekly <- income_data %>% group_by(outsourcing_group, Gender) %>% summarise( n = n(), mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, 
na.rm = T)) )gender_outsourced_gap_weekly %>% kable() %>% kable_styling(full_width = F)gender_outsourced_gap_weekly%>% ggplot(aes(outsourcing_group, median, fill = Gender)) + geom_col(position="dodge")+ ggtitle("Weekly income")write_csv(gender_outsourced_gap_weekly, "../outputs/data/o-group_gender_gap_weekly.csv")# models## annualmod_3 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, weights = NatRepemployees)summary(mod_3)coef_table <- extract_lm_coefs(mod_3)rownames(coef_table) <- coef_table$variablesig_coefs <- extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, "../outputs/data/mod_o-group_gender.csv")# weeklymod_3_weekly <- lm(income_weekly_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, weights = NatRepemployees)summary(mod_3_weekly)coef_table_weekly <- extract_lm_coefs(mod_3_weekly)rownames(coef_table_weekly) <- coef_table_weekly$variablesig_coefs_weekly <- extract_glm_coefs(mod_3_weekly, only_sig = T)write_csv(coef_table_weekly, "../outputs/data/mod_o-group_gender_weekly.csv")```**Annual data files**^[[outputs/data/o-group_gender_gap.csv.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-group_gender_gap.csv.csv) & [outputs/data/mod_o-group_gender.csv](outputs/data/mod_o-group_gender.csv)]:**Weekly**^[[outputs/data/o-group_gender_gap_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-group_gender_gap_weekly.csv) & [outputs/data/mod_o-group_gender_weekly.csv](outputs/data/mod_o-group_gender_weekly.csv)]: The gender by outsourcing group is also not relevant for whether a worker is low income (i.e. 
non-sig relationship with income_group).```{r}#| output: falsemod <-glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)summary(mod)# test <- summary(mod)# # or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])# p <- test[["coefficients"]][2,4]# # coef_table <- extract_lm_coefs(mod_3)# rownames(coef_table) <- coef_table$variable# sig_coefs <- extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")```::: {.callout-tip title="#gender-income-group"}- In particular, people are more likely to be in our low-paid outsourced group if they are female, or older workers .:::Income group[^21][^21]: [../outputs/data/income_group_outsourcing.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_group_outsourcing.csv)```{r}#| output: false# test significancemod <-glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)summary(mod)test <-summary(mod)or <-exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])p <- test[["coefficients"]][2,4]mod_2 <-glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)summary(mod_2)# test <- summary(mod_2)or <-exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])p <- test[["coefficients"]][2,4]coef_table <-extract_glm_coefs(mod_2)rownames(coef_table) <- coef_table$variablesig_coefs <-extract_glm_coefs(mod_2, only_sig = T)write_csv(coef_table, file="../outputs/data/income_group_outsourcing.csv")```A person is more likely to be in the low income group if they are:- Older- Female- Don't have a degree (or don't know if they have a degree?)- Are outsourced- Arrived in the UK in the last yearAnd less likely if they are:- Younger- Male- Have a degree- Live 
in the North West or Wales (compared to London)
- Arrived in the UK in the last 30 years

::: {.callout-tip title="#gender-by-pay-split"}
Is there already a basic low / high pay split for gender? I know you talk about women being more likely to be in the low-paid group, but again not sure if there is just a basic “women make up x% of low pay group and x% of not low pay group”?
:::

```{r}
#| message: false
gender_pay_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    percentage = 100 * (freq / total),
    N = sum(n)
  )

low_pay_perc <- gender_pay_split %>%
  filter(income_group == "Low" & outsourcing_status == "Outsourced" & Gender == "Female") %>%
  mutate(
    round(percentage, 2)
  ) %>%
  pull()

high_pay_perc <- gender_pay_split %>%
  filter(income_group == "Not low" & outsourcing_status == "Outsourced" & Gender == "Female") %>%
  mutate(
    round(percentage, 2)
  ) %>%
  pull()

mod <- glm(income_group ~ Gender, income_data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

mod_2 <- glm(income_group ~ Gender * outsourcing_status, income_data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod_2)
```

`r low_pay_perc`% of outsourced workers in the low pay group were female, compared to `r high_pay_perc`% of outsourced workers in the not low pay group. This difference is statistically significant; women are more likely to be in the low income group.
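
The percentages reported above are weighted shares within each pay group. As a minimal sketch of that calculation, the toy data frame below (invented values, with a hypothetical `weight` column standing in for `NatRepemployees`) shows how summarising over the last grouping variable leaves the data grouped by pay group, so that `sum(freq)` gives the pay-group total:

```r
library(dplyr)

# Toy respondents; groups and weights are invented for illustration only
toy <- data.frame(
  income_group = c("Low", "Low", "Low", "Not low", "Not low"),
  Gender       = c("Female", "Female", "Male", "Female", "Male"),
  weight       = c(1.5, 1.5, 1.0, 2.0, 2.0)
)

toy_split <- toy %>%
  group_by(income_group, Gender) %>%
  # drop only the Gender grouping, keeping income_group for the next step
  summarise(freq = sum(weight), n = n(), .groups = "drop_last") %>%
  # sum(freq) is now the weighted total of each pay group
  mutate(percentage = 100 * freq / sum(freq)) %>%
  ungroup()

# Weighted share of the low pay group who are female
female_share_low <- toy_split %>%
  filter(income_group == "Low", Gender == "Female") %>%
  pull(percentage)
```

In the analysis itself the grouping also includes `outsourcing_status`, so the shares are computed within each combination of outsourcing status and pay group.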
This pattern is the same for non-outsourced workers, and there is no interaction effect: irrespective of outsourcing status, women are more likely to be low paid, and irrespective of gender, outsourced people are more likely to be low paid.

```{r}
gender_pay_split %>%
  ggplot(aes(income_group, percentage, fill = Gender)) +
  facet_grid(rows = vars(outsourcing_status)) +
  geom_col(position = "dodge") +
  theme_minimal()
```

::: {.callout-important title="#pay-gap-sector"}
- Overall, we find that outsourced workers in administrative and support service activities (one of the dominant sectors for outsourced workers in this research) are more likely to be lower-paid than non-outsourced workers in the same sector. The same is true for outsourced workers in water supply (full name; sewerage, waste etc.), another prominent outsourcing sector, and in information and communication, transportation and storage, and education, amongst others. In contrast, outsourced workers in financial and insurance activities, for example, appear to be slightly higher paid on average than their non-outsourced counterparts; however, this is one of the few sectors in which this appears to be the case.

**to be confirmed**

I don’t quite understand the chart below the above chart in the file, would you be able to explain it? Thanks! Is this the best chart to use, above? Does this need to control for anything else to show us the most accurate analysis of pay by sector for outsourced and non outsourced, or are we confident that this is showing us something notable about sector and pay?
:::

## Sectors/occupations

### Sector and occupation hierarchy

The data from Opinium has four variables relating to sectors/occupations. These are:

- SectorName
- Majorgroupcode
- MajorsubgroupOccupation
- UnitOccupation

SOC 2020 has nine major groups, 26 sub-major groups, 104 minor groups and 412 unit groups.
The variables we have appear to map in the following way:

- Majorgroupcode = the 9 'major groups'
- MajorsubgroupOccupation = the 26 'sub-major' groups
- UnitOccupation = the 104 'minor groups'

This last pairing is the point of confusion. The 'UnitOccupation' wording came from Opinium and these categories match the [coding index](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2020/soc2020volume2codingrulesandconventions) where they are confusingly referred to as 'unit groups' even though they are the minor groups.

There is no variable in our data that relates to the most disaggregated category, the 412 'unit groups'.

The unique values of each variable are shown in each section below.

#### SectorName

```{r}
data %>%
  distinct(SectorName_labelled) %>%
  filter(SectorName_labelled != "NA") %>%
  kable() %>%
  kable_styling(full_width = F)
```

#### Majorgroupcode

These are the 9 major groups according to SOC.

```{r}
data %>%
  distinct(Majorgroupcode_labelled) %>%
  drop_na() %>%
  filter(Majorgroupcode_labelled != "NA") %>%
  kable() %>%
  kable_styling(full_width = F)
```

#### MajorsubgroupOccupation

These are the 26 'sub-major' groups.

```{r}
data %>%
  distinct(MajorsubgroupOccupation_labelled) %>%
  drop_na() %>%
  filter(MajorsubgroupOccupation_labelled != "NA") %>%
  kable() %>%
  kable_styling(full_width = F)
```

#### UnitOccupation

These are indeed the 104 'minor groups'.
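
One way to check the mapping described above is to count the distinct labels in each variable and compare the counts with the SOC 2020 structure. The sketch below uses an invented toy data frame (the labels are placeholders, not real SOC categories) and assumes, as in the chunks above, that missing values may appear either as `NA` or as the string `"NA"`:

```r
library(dplyr)

# Toy stand-in for the survey data; labels are placeholders for illustration
toy_occ <- data.frame(
  Majorgroupcode_labelled          = c("Major A", "Major A", "Major B", "NA"),
  MajorsubgroupOccupation_labelled = c("Sub-major A1", "Sub-major A2", "Sub-major B1", "NA"),
  UnitOccupation_labelled          = c("Minor A1a", "Minor A2a", "Minor B1a", NA),
  stringsAsFactors = FALSE
)

# Count distinct non-missing labels per variable,
# treating the string "NA" as missing
level_counts <- toy_occ %>%
  summarise(across(everything(), ~ n_distinct(na_if(.x, "NA"), na.rm = TRUE)))

level_counts
```

If the mapping holds, running the same summary on the three labelled occupation variables in the real data should give counts no larger than 9, 26 and 104 respectively (fewer if some groups have no respondents).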
```{r}data %>%distinct(UnitOccupation_labelled) %>% drop_na %>%kable() %>%kable_styling(full_width = F)```### Sectoral pay differences#### Weekly^[[outputs/data/sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_pay_weekly.csv)]```{r sector-bubble-weekly}sector_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), )summary_weekly <- sector_summary_pay %>% group_by(SectorName_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup()write_csv(sector_summary_pay, file="../outputs/data/sector_summary_pay_weekly.csv")plot_data <- sector_summary_pay %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, N, .desc = FALSE))# outsourced <- plot_data %>%# filter(outsourcing_status == 'Outsourced') %>%# mutate(# rank = rank(desc(perc))# )# Apply the 
reordered levels back to the original dataplot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), ) %>% arrange(desc(SectorName_short))annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(SectorName_short, n) %>% group_by(SectorName_short) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) plot_data %>% # mutate( # SectorName = as.factor(SectorName) # ) %>% ggplot(., aes(wtd_avg_income,SectorName_short, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank())+ #coord_flip() + scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 100)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=SectorName_short, label = paste0("N = ", N)), hjust=1) + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, SectorName_short, colour = outsourcing_status, label=paste0("n=",n)), size=3) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average weekly income") + ylab("Sector") + labs(caption = "Size of bubble represents the size of the respective workforce within the sector")sectors_of_interest <- unique(plot_data$SectorName_labelled)sectors_of_interest <- sectors_of_interest[1:13] %>% stringr::str_to_title()```#### Hourly^[[outputs/data/sector_summary_pay_hourly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_pay_hourly.csv)]```{r sector-bubble-hourly}sector_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_hourly_all, na.rm=T), wtd_avg_income = weighted.mean(income_hourly_all, w = 
NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), )summary_hourly <- sector_summary_pay %>% group_by(SectorName_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup()write_csv(sector_summary_pay, file="../outputs/data/sector_summary_pay_hourly.csv")plot_data <- sector_summary_pay %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, N, .desc = FALSE))# outsourced <- plot_data %>%# filter(outsourcing_status == 'Outsourced') %>%# mutate(# rank = rank(desc(perc))# )# Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), ) %>% arrange(desc(SectorName_short))annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(SectorName_short, n) %>% group_by(SectorName_short) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) plot_data %>% # mutate( # SectorName = as.factor(SectorName) # ) %>% ggplot(., aes(wtd_avg_income,SectorName_short, size = perc, colour = outsourcing_status)) + geom_point(position = 
"dodge") + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank())+ #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=10)) + # seq(0,max(plot_data$wtd_avg_income, na.rm=T), 100)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=SectorName_short, label = paste0("N = ", N)), hjust=1) + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, SectorName_short, colour = outsourcing_status, label=paste0("n=",n)), size=3) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average hourly income") + ylab("Sector") + labs(caption = "Size of bubble represents the size of the respective workforce within the sector")sectors_of_interest <- unique(plot_data$SectorName_labelled)sectors_of_interest <- sectors_of_interest[1:13] %>% stringr::str_to_title()```#### Comparing pay penalty between weekly and hourlyNote only consider n >= 10```{r}# add pay frame flagssummary_weekly <- summary_weekly %>%mutate(pay_frame ="weekly" )summary_hourly <- summary_hourly %>%mutate(pay_frame ="hourly" )# combinesummary_combined <- dplyr::bind_rows(summary_weekly,summary_hourly)summary_combined2 <- summary_combined %>%filter(!is.na(SectorName_labelled)) %>%pivot_wider(id_cols =c(SectorName_labelled, pay_frame), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced )summary_combined3 <- summary_combined2 %>%pivot_wider(id_cols = sector_name_labelled, names_from = pay_frame, values_from = pay_diff, names_glue ="{pay_frame}_pay_diff") %>%mutate(pattern_reverse =ifelse((weekly_pay_diff <0& hourly_pay_diff >=0 ) | (weekly_pay_diff >=0& hourly_pay_diff <0), 1, 0 ) ) ```The table below shows the pay difference between outsourced and non-outsourced workers by sector. Negative values indicate pay penalties for outsourced workers. 
The 'pattern_reverse' column indicates the `r sum(summary_combined3$pattern_reverse, na.rm=T)` sectors where the direction of the difference changes depending on whether hourly or weekly pay is considered. For example, per week, outsourced workers in PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES earn £1.77 less than their non-outsourced counterparts, but per hour they are paid on average £1.30 more than non-outsourced workers. This suggests that outsourced rates are higher in this sector, but that outsourced people do not work enough hours to earn more than non-outsourced people on a weekly basis. The reverse pattern indicates sectors where outsourced workers are paid less per hour but work more hours and earn more per week than their non-outsourced counterparts.

```{r}
summary_combined3 %>%
  filter(!is.na(weekly_pay_diff)) %>%
  arrange(weekly_pay_diff) %>%
  kable(caption = "Weekly and hourly pay difference by sector") %>%
  kable_styling(full_width = F)
```

### Major group occupations

#### Weekly^[[outputs/data/major_subgroup_occupation_in_sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_in_sector_summary_pay_weekly.csv)]

Here we look at major subgroup occupations within sectors. We only consider the sectors down to 'Other services', as the remaining sectors have a small n for the outsourced group.
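
The small-n concern can be illustrated with the same minimum-cell-size rule used elsewhere in this document (n >= 10). The sketch below is a toy example with invented counts and a hypothetical `sector` column: a sector is kept only if both its outsourced and non-outsourced cells reach the threshold.

```r
library(dplyr)

# Toy cell counts per sector and outsourcing status (invented values)
toy_cells <- data.frame(
  sector = c("Sector A", "Sector A", "Sector B", "Sector B"),
  outsourcing_status = c("Not outsourced", "Outsourced", "Not outsourced", "Outsourced"),
  n = c(120, 25, 80, 4)
)

# Keep a sector only when its smallest cell has at least 10 respondents
reliable_cells <- toy_cells %>%
  group_by(sector) %>%
  mutate(min_n = min(n, na.rm = TRUE)) %>%
  filter(min_n >= 10) %>%
  ungroup()

reliable_cells
```

Here 'Sector B' is dropped entirely because its outsourced cell has only 4 respondents, even though its non-outsourced cell is large.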
Note you can find larger images for these plots in [outputs/figures/occupation_pay_plots](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/figures/occupation_pay_plots).The figures indicate there is variation between occupations within sectors in terms of whether outsourced people are paid less or more than non-outsourced workers.```{r}#| height: 10#| width: 10occ_in_sect_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_weekly_all)) %>%group_by(SectorName, SectorName_labelled, MajorsubgroupOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_weekly_all, na.rm=T),wtd_avg_income =weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(SectorName_labelled, MajorsubgroupOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),MajorsubgroupOccupation_labelled =case_when(MajorsubgroupOccupation_labelled =="NA"~NA,TRUE~ MajorsubgroupOccupation_labelled),MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled),SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) summary_weekly <- occ_in_sect_summary_pay %>%group_by(SectorName_labelled,MajorsubgroupOccupation_labelled) %>%mutate(min_n =min(n, na.rm=TRUE) ) %>%filter(min_n >=10) %>%# need to identify the unit occs that have an ok nungroup() %>%mutate(pay_frame ="weekly" )write_csv(occ_in_sect_summary_pay, file="../outputs/data/major_subgroup_occupation_in_sector_summary_pay_weekly.csv")for(sector in sectors_of_interest){#print(sector)# subset to this sector and drop na occupatoins plot_data <- occ_in_sect_summary_pay %>%filter(SectorName_labelled == sector) %>%filter(!is.na(MajorsubgroupOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) 
%>%distinct(MajorsubgroupOccupation_labelled, N) %>%mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original data plot_data <- plot_data %>%mutate(MajorsubgroupOccupation_labelled =factor(MajorsubgroupOccupation_labelled, levels =levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>%group_by(MajorsubgroupOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 ) p <- plot_data %>%ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) +geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average weekly income") +ylab("Major subgroup occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle(sector)show(p)ggsave(here('outputs','figures','occupation_pay_plots',paste0('major_subgroup_occupation_pay_plot_weekly_', sector, '.png')), height =8, width =8, dpi=800, bg="white")}```#### 
Hourly^[[outputs/data/major_subgroup_occupation_in_sector_summary_pay_hourly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_in_sector_summary_pay_hourly.csv)]```{r}#| height: 10#| width: 10occ_in_sect_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_hourly_all)) %>%group_by(SectorName, SectorName_labelled, MajorsubgroupOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_hourly_all, na.rm=T),wtd_avg_income =weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(SectorName_labelled, MajorsubgroupOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),MajorsubgroupOccupation_labelled =case_when(MajorsubgroupOccupation_labelled =="NA"~NA,TRUE~ MajorsubgroupOccupation_labelled),MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled),SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) summary_hourly <- occ_in_sect_summary_pay %>%group_by(SectorName_labelled,MajorsubgroupOccupation_labelled) %>%mutate(min_n =min(n, na.rm=TRUE) ) %>%filter(min_n >=10) %>%# need to identify the unit occs that have an ok nungroup() %>%mutate(pay_frame ="hourly" )write_csv(occ_in_sect_summary_pay, file="../outputs/data/major_subgroup_occupation_in_sector_summary_pay_hourly.csv")for(sector in sectors_of_interest){#print(sector)# subset to this sector and drop na occupatoins plot_data <- occ_in_sect_summary_pay %>%filter(SectorName_labelled == sector) %>%filter(!is.na(MajorsubgroupOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>%distinct(MajorsubgroupOccupation_labelled, N) %>%mutate(MajorsubgroupOccupation_labelled = 
forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original data plot_data <- plot_data %>%mutate(MajorsubgroupOccupation_labelled =factor(MajorsubgroupOccupation_labelled, levels =levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>%group_by(MajorsubgroupOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 ) p <- plot_data %>%ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=scales::breaks_pretty(n=10)) +scale_colour_manual(values=colours) +geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average hourly income") +ylab("Major subgroup occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle(sector)show(p)ggsave(here('outputs','figures','occupation_pay_plots',paste0('major_subgroup_occupation_pay_plot_hourly_', sector, '.png')), height =8, width =8, dpi=800, bg="white")}```#### Comparing pay penalty between weekly and hourlyNote only consider n >= 10The table below shows the weekly and hourly pay 
difference between outsourced and non-outsourced workers by major group occupation. As before, negative values indicate pay penalties for outsourced workers, and the 'pattern_reverse' column indicates the occupations where the direction of the difference is different if you consider hourly versus weekly pay difference. ```{r}# combinesummary_combined <- dplyr::bind_rows(summary_weekly,summary_hourly)# Function for processing occupations within sectors. Makes a kable table for occupations in each sector (as long as there's both outsourced and non-outsourced entries for the occupation)comparison_table <-function(df, within_sectors =TRUE){if(within_sectors){ caption_text ="within" }else{ caption_text ="across" } sectors <-unique(df[["SectorName_labelled"]]) sectors <- sectors[!is.na(sectors)] output_list <-vector('list', length(sectors))for(i in1:length(sectors)){ sector <- sectors[i] this_data <- df %>%filter(SectorName_labelled == sector) %>%filter(!is.na(MajorsubgroupOccupation_labelled)) %>%droplevels() %>%ungroup()if(length(unique(this_data$outsourcing_status)) ==2){ this_data <- this_data %>%pivot_wider(id_cols =c(MajorsubgroupOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%pivot_wider(id_cols = majorsubgroup_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue ="{pay_frame}_pay_diff") %>%mutate(pattern_reverse =ifelse((weekly_pay_diff <0& hourly_pay_diff >=0 ) | (weekly_pay_diff >=0& hourly_pay_diff <0), 1, 0 ) )# output_list[[i]] <- this_data }else{ output_list[[i]] <-NAnext }# k <- this_data %>%# filter(!is.na(weekly_pay_diff)) %>%# arrange(weekly_pay_diff) %>%# kable(caption = paste("Weekly and hourly pay difference by major group occupations", caption_text, sector, sep = " ")) %>%# kable_styling(full_width = F)# output_list[[i]] <- k output_list[[i]] <- 
this_datanames(output_list)[i] <- sector }return(output_list)}tables <-comparison_table(summary_combined, within_sectors = F)# Print the kable tables# for(i in 1:length(tables)){# if(!is.na(tables[i])){# tables[[i]] %>%# filter(!is.na(weekly_pay_diff)) %>%# arrange(weekly_pay_diff) %>%# kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>%# kable_styling(full_width = F) %>%# print()# }# }``````{r}#| warning: false#| error: false# there's got to be better way but can't find it# this prints the kable tables for the sectorsi <-1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", 
names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try( tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) 
%>% kable_styling(full_width = FALSE))

i <- i + 1
try(tables[[i]] %>%
  filter(!is.na(weekly_pay_diff)) %>%
  arrange(weekly_pay_diff) %>%
  kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>%
  kable_styling(full_width = FALSE))
```

### Major group occupations across all sectors

Note: I only consider unit occupations where the minimum n is >= 10.

#### Weekly^[[outputs/data/major_subgroup_across_sectors_occupation_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_across_sectors_occupation_summary_pay_weekly.csv)]

```{r}
#| height: 20
#| width: 10
major_occ_summary_pay <- data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  group_by(MajorsubgroupOccupation_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_weekly_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(MajorsubgroupOccupation_labelled) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    MajorsubgroupOccupation_labelled = case_when(MajorsubgroupOccupation_labelled == "NA" ~ NA,
                                                 TRUE ~ MajorsubgroupOccupation_labelled),
    MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled)
  ) %>%
  ungroup()

write_csv(major_occ_summary_pay, file = "../outputs/data/major_subgroup_across_sectors_occupation_summary_pay_weekly.csv")

# need to identify the unit occs that have an ok n
# subset to occs with n >= 10
unit_subset_weekly <- major_occ_summary_pay %>%
  group_by(MajorsubgroupOccupation_labelled) %>%
  mutate(
    min_n = min(n, na.rm = TRUE)
  ) %>%
  filter(min_n >= 10)

unit_subset <- unit_subset_weekly

# create a df with occs where outsourced paid less so we can just list it
paid_less <- unit_subset %>%
  pivot_wider(id_cols = c(MajorsubgroupOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_penalty 
= wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%filter( pay_penalty <0 )write_csv(paid_less, file="../outputs/data/major_subgroup_occupation_weekly_pay_penalty_across_sectors.csv")#print(sector)# subset to this sector and drop na occupatoinsplot_data <- unit_subset %>%filter(!is.na(MajorsubgroupOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by Nnot_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>%distinct(MajorsubgroupOccupation_labelled, N) %>%mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original dataplot_data <- plot_data %>%mutate(MajorsubgroupOccupation_labelled =factor(MajorsubgroupOccupation_labelled, levels =levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), )annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>%group_by(MajorsubgroupOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 )p <- plot_data %>%ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=scales::breaks_pretty(n=5)) +#breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) 
+geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average weekly income") +ylab("Major sub group occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle("All sectors")show(p)ggsave(here('outputs','figures','occupation_pay_plots','major_subgroup_occupation_all_sectors_pay_plot.png'), height =8, width =8, dpi=800, bg="white")```Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/major_subgroup_occupation_weekly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_weekly_pay_penalty_across_sectors.csv)]```{r}paid_less %>%arrange(pay_penalty) %>%relocate( pay_penalty, .after = majorsubgroup_occupation_labelled ) %>%kable(caption ="Weekly pay penalty for major subgroup occupations across all sectors") %>%kable_styling()```#### Hourly^[[outputs/data/major_subgroup_occupation_summary_pay_hourly_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_summary_pay_hourly_across_sectors.csv)]```{r}#| height: 10#| width: 10major_occ_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_hourly_all)) %>%group_by(MajorsubgroupOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_hourly_all, na.rm=T),wtd_avg_income =weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(MajorsubgroupOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),MajorsubgroupOccupation_labelled =case_when(MajorsubgroupOccupation_labelled =="NA"~NA,TRUE~ 
MajorsubgroupOccupation_labelled),MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled) ) %>%ungroup()write_csv(major_occ_summary_pay, file="../outputs/data/major_subgroup_occupation_summary_pay_hourly_across_sectors.csv")# need to identify the unit occs that have an ok n# subste to occs with n>=10unit_subset_hourly <- major_occ_summary_pay %>%group_by(MajorsubgroupOccupation_labelled) %>%mutate(min_n =min(n, na.rm=TRUE) ) %>%filter(min_n >=10)unit_subset <- unit_subset_hourly# create a df with occs where outsourced paid less so we can just list itpaid_less <- unit_subset %>%pivot_wider(id_cols =c(MajorsubgroupOccupation_labelled), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%filter( pay_penalty <0 )write_csv(paid_less, file="../outputs/data/major_subgroup_occupation_hourly_pay_penalty_across_sectors.csv")#print(sector)# subset to this sector and drop na occupatoinsplot_data <- unit_subset %>%filter(!is.na(MajorsubgroupOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by Nnot_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>%distinct(MajorsubgroupOccupation_labelled, N) %>%mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original dataplot_data <- plot_data %>%mutate(MajorsubgroupOccupation_labelled =factor(MajorsubgroupOccupation_labelled, levels =levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), )annotation_df <- plot_data 
%>%
  # filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(MajorsubgroupOccupation_labelled, n) %>%
  group_by(MajorsubgroupOccupation_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

p <- plot_data %>%
  ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  geom_label_repel(inherit.aes = FALSE,
                   aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label = paste0("n=", n)),
                   size = 3
                   # force_pull = 2
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  # coord_flip() +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 5)) + # breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = MajorsubgroupOccupation_labelled, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average hourly income") +
  ylab("Major subgroup occupation") +
  labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") +
  ggtitle("All sectors")

show(p)

ggsave(here('outputs','figures','occupation_pay_plots','major_occupation_pay_plot_hourly_all_sectors.png'), height = 8, width = 8, dpi = 800, bg = "white")
```

Looking across all sectors, there are many major subgroup occupations where outsourced workers are paid less than their non-outsourced counterparts:^[[outputs/data/major_subgroup_occupation_hourly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_hourly_pay_penalty_across_sectors.csv)]

```{r}
paid_less %>%
  arrange(pay_penalty) %>%
  relocate(
    pay_penalty, .after = majorsubgroup_occupation_labelled
  ) %>%
  kable(caption = "Hourly pay penalty for major subgroup occupations across all sectors") %>%
  kable_styling()
```

#### Comparing pay penalty between weekly and hourly

Note: only occupations where the minimum n is >= 10 are considered.

```{r}
# add pay frame flags
unit_subset_weekly <- unit_subset_weekly %>%
  mutate(
    pay_frame = "weekly"
  )

unit_subset_hourly <- unit_subset_hourly %>%
  mutate(
    pay_frame = "hourly"
  )

# combine
unit_subset_combined <- dplyr::bind_rows(unit_subset_weekly, unit_subset_hourly)

unit_subset_combined2 <- unit_subset_combined %>%
  pivot_wider(id_cols = c(MajorsubgroupOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced
  )

unit_subset_combined3 <- unit_subset_combined2 %>%
  pivot_wider(id_cols = majorsubgroup_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff") # pivot again to compare hourly and weekly pay differences

unit_subset_combined4 <- unit_subset_combined3 %>%
  mutate(
    pattern_reverse = ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0)
  )
```

The table below shows the weekly and hourly pay differences between outsourced and non-outsourced workers by major group occupation. As before, negative values indicate a pay penalty for outsourced workers. The 'pattern_reverse' column flags (with a 1) the occupations where the direction of the difference changes depending on whether hourly or weekly pay is considered.

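To make the 'pattern_reverse' flag concrete, here is a minimal sketch of the same sign comparison on made-up numbers. The occupation names and pay differences are hypothetical placeholders, not survey results, and base R is used instead of dplyr only so that the snippet runs on its own.

```r
# Hypothetical pay differences (outsourced minus not outsourced); NOT survey data
pay_diffs <- data.frame(
  occupation      = c("Occupation A", "Occupation B", "Occupation C", "Occupation D"),
  weekly_pay_diff = c(-40, 25, -10, 15),
  hourly_pay_diff = c(-1.2, 0.8, 0.5, -0.3)
)

# Flag occupations where the sign of the difference flips between pay frames.
# A difference of exactly zero is treated as non-negative.
pay_diffs$pattern_reverse <- ifelse(
  (pay_diffs$weekly_pay_diff < 0 & pay_diffs$hourly_pay_diff >= 0) |
    (pay_diffs$weekly_pay_diff >= 0 & pay_diffs$hourly_pay_diff < 0),
  1, 0
)

print(pay_diffs)
```

In this toy example the flag is 1 for occupations C and D, where the weekly and hourly differences point in opposite directions, and 0 for A and B, where they agree.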
```{r}
unit_subset_combined4 %>%
  filter(!is.na(weekly_pay_diff)) %>%
  arrange(weekly_pay_diff) %>%
  kable(caption = "Weekly and hourly pay difference by major sub group occupation") %>%
  kable_styling(full_width = FALSE)
```

### Minor group occupations within sectors

#### Weekly^[[outputs/data/minor_group_occupation_in_sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_summary_pay_weekly.csv)]

Note: I only consider unit occupations where the minimum n is >= 10.

```{r}
#| height: 10
#| width: 10
unit_occ_in_sect_summary_pay <- data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  group_by(SectorName, SectorName_labelled, UnitOccupation_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_weekly_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(SectorName_labelled, UnitOccupation_labelled) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    UnitOccupation_labelled = case_when(UnitOccupation_labelled == "NA" ~ NA,
                                        TRUE ~ UnitOccupation_labelled),
    UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled),
    SectorName_labelled = stringr::str_to_title(SectorName_labelled)
  ) %>%
  ungroup()

summary_weekly <- unit_occ_in_sect_summary_pay %>%
  group_by(SectorName_labelled, UnitOccupation_labelled) %>%
  mutate(
    min_n = min(n, na.rm = TRUE)
  ) %>%
  filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n
  ungroup() %>%
  mutate(
    pay_frame = "weekly"
  )

write_csv(unit_occ_in_sect_summary_pay, file = "../outputs/data/minor_group_occupation_in_sector_summary_pay_weekly.csv")

unit_subset <- summary_weekly

# create a df with occs where outsourced paid less so we can just list it
paid_less <- unit_subset %>%
  pivot_wider(id_cols = c(SectorName_labelled, UnitOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() 
%>%mutate(pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%filter( pay_penalty <0 )write_csv(paid_less, file="../outputs/data/minor_group_occupation_in_sector_weekly_pay_penalty.csv")for(sector in sectors_of_interest){#print(sector)# subset to this sector and drop na occupatoins plot_data <- unit_subset %>%filter(SectorName_labelled == sector) %>%filter(!is.na(UnitOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>%distinct(UnitOccupation_labelled, N) %>%mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original data plot_data <- plot_data %>%mutate(UnitOccupation_labelled =factor(UnitOccupation_labelled, levels =levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>% dplyr::select(UnitOccupation_labelled, n) %>%group_by(UnitOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 ) p <- plot_data %>%ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=scales::breaks_pretty(n=5)) +#breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) 
+geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average weekly income") +ylab("Unit occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle(sector)show(p)ggsave(here('outputs','figures','occupation_pay_plots',paste0('unit_occupation_pay_plot_weekly_', sector, '.png')), height =8, width =8, dpi=800, bg="white")}```Many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_in_sector_weekly_pay_penalty.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_weekly_pay_penalty.csv)]```{r}paid_less %>%arrange(pay_penalty) %>%relocate( pay_penalty, .after = unit_occupation_labelled ) %>%kable(caption ="Weekly pay penalty for unit occupations within sectors") %>%kable_styling()```#### Hourly^[[outputs/data/minor_group_occupation_in_sector_summary_pay_hourly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_summary_pay_hourly.csv)]Note I only consider unit occupations where the the minimum n is >= 10.```{r}#| height: 10#| width: 10unit_occ_in_sect_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_hourly_all)) %>%group_by(SectorName, SectorName_labelled, UnitOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_hourly_all, na.rm=T),wtd_avg_income =weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(SectorName_labelled, UnitOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),UnitOccupation_labelled =case_when(UnitOccupation_labelled =="NA"~NA,TRUE~ UnitOccupation_labelled),UnitOccupation_labelled = 
stringr::str_to_title(UnitOccupation_labelled),SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) %>%ungroup()summary_hourly <- unit_occ_in_sect_summary_pay %>%group_by(SectorName_labelled,UnitOccupation_labelled) %>%mutate(min_n =min(n, na.rm=TRUE) ) %>%filter(min_n >=10) %>%ungroup() %>%mutate(pay_frame ="hourly" )write_csv(unit_occ_in_sect_summary_pay, file="../outputs/data/minor_group_occupation_in_sector_summary_pay_hourly.csv")# need to identify the unit occs that have an ok nunit_subset <- summary_hourly # create a df with occs where outsourced paid less so we can just list itpaid_less <- unit_subset %>%pivot_wider(id_cols =c(SectorName_labelled, UnitOccupation_labelled), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%filter( pay_penalty <0 )write_csv(paid_less, file="../outputs/data/minor_group_occupation_in_sector_hourly_pay_penalty.csv")for(sector in sectors_of_interest){#print(sector)# subset to this sector and drop na occupatoins plot_data <- unit_subset %>%filter(SectorName_labelled == sector) %>%filter(!is.na(UnitOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>%distinct(UnitOccupation_labelled, N) %>%mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original data plot_data <- plot_data %>%mutate(UnitOccupation_labelled =factor(UnitOccupation_labelled, levels =levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data 
%>%#filter(outsourcing_status == "Not outsourced") %>% dplyr::select(UnitOccupation_labelled, n) %>%group_by(UnitOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 ) p <- plot_data %>%ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=scales::breaks_pretty(n=5)) +#breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) +geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average hourly income") +ylab("Unit occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle(sector)show(p)ggsave(here('outputs','figures','occupation_pay_plots',paste0('unit_occupation_pay_plot_hourly_', sector, '.png')), height =8, width =8, dpi=800, bg="white")}```Many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_in_sector_hourly_pay_penalty.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_hourly_pay_penalty.csv)]```{r}paid_less %>%arrange(pay_penalty) %>%relocate( pay_penalty, .after = unit_occupation_labelled ) %>%kable(caption ="Hourly pay penalty for unit occupations within sectors") %>%kable_styling()```#### Comparing pay penalty between weekly and hourlyNote only consider n >= 10```{r}# combinesummary_combined <- dplyr::bind_rows(summary_weekly,summary_hourly)# 
Function for processing minor group occupations within sectors. Makes a kable table for occupations in each sector (as long as there's both outsourced and non-outsourced entries for the occupation)comparison_table2 <-function(df, within_sectors =TRUE){if(within_sectors){ caption_text ="within" }else{ caption_text ="across" } sectors <-unique(df[["SectorName_labelled"]]) sectors <- sectors[!is.na(sectors)] output_list <-vector('list', length(sectors))for(i in1:length(sectors)){ sector <- sectors[i] this_data <- df %>%filter(SectorName_labelled == sector) %>%filter(!is.na(UnitOccupation_labelled)) %>%droplevels() %>%ungroup()if(length(unique(this_data$outsourcing_status)) ==2){ this_data <- this_data %>%pivot_wider(id_cols =c(UnitOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%pivot_wider(id_cols = unit_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue ="{pay_frame}_pay_diff") %>%mutate(pattern_reverse =ifelse((weekly_pay_diff <0& hourly_pay_diff >=0 ) | (weekly_pay_diff >=0& hourly_pay_diff <0), 1, 0 ) )# output_list[[i]] <- this_data }else{ output_list[[i]] <-NAnext }# k <- this_data %>%# filter(!is.na(weekly_pay_diff)) %>%# arrange(weekly_pay_diff) %>%# kable(caption = paste("Weekly and hourly pay difference by minor group occupations", caption_text, sector, sep = " ")) %>%# kable_styling(full_width = F)# # output_list[[i]] <- k output_list[[i]] <- this_datanames(output_list)[i] <- sector }return(output_list)}tables <-comparison_table2(summary_combined)# Print the kable tables# for(i in 1:length(tables)){# if(!is.na(tables[[i]])){# print(tables[[i]])# }# }``````{r}#| warning: false#| error: false# there's got to be better way but can't find it# this prints the kable tables for the sectorsi <-1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) 
%>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay 
difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try( tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))i <- i +1try(tables[[i]] %>%filter(!is.na(weekly_pay_diff)) %>%arrange(weekly_pay_diff) %>%kable(caption =paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep =" ")) %>%kable_styling(full_width = F))```### Minor group occupations across all sectorsNote I only consider unit occupations where the the minimum n is >= 10.#### Weekly^[[outputs/data/minor_group_occupation_summary_pay_weekly_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_summary_pay_weekly_across_sectors.csv)]```{r}#| height: 20#| width: 10unit_occ_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_weekly_all)) %>%group_by(UnitOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_weekly_all, na.rm=T),wtd_avg_income =weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(UnitOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),UnitOccupation_labelled =case_when(UnitOccupation_labelled =="NA"~NA,TRUE~ 
      UnitOccupation_labelled),
    UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled)
  ) %>%
  ungroup()

write_csv(unit_occ_summary_pay, file = "../outputs/data/minor_group_occupation_summary_pay_weekly_across_sectors.csv")

# need to identify the unit occs that have an ok n
# subset to occs with n >= 10
unit_subset_weekly <- unit_occ_summary_pay %>%
  group_by(UnitOccupation_labelled) %>%
  mutate(
    min_n = min(n, na.rm = TRUE)
  ) %>%
  filter(min_n >= 10)

unit_subset <- unit_subset_weekly

# create a df with occs where outsourced paid less so we can just list it
paid_less <- unit_subset %>%
  pivot_wider(id_cols = c(UnitOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced
  ) %>%
  filter(
    pay_penalty < 0
  )

write_csv(paid_less, file = "../outputs/data/minor_group_occupation_weekly_pay_penalty_across_sectors.csv")

#print(sector)
# subset to this sector and drop na occupations
plot_data <- unit_subset %>%
  filter(!is.na(UnitOccupation_labelled)) %>%
  droplevels() %>%
  ungroup()

# Order occs by N
# Take the distinct occupation/N pairs and reorder occupations by N
not_outsourced_levels <- plot_data %>%
  dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>%
  distinct(UnitOccupation_labelled, N) %>%
  mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))

# not_outsourced_levels <- plot_data %>%
#   filter(outsourcing_status == 'Not outsourced') %>%
#   mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))

# Then apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = levels(not_outsourced_levels$UnitOccupation_labelled))
  )

annotation_df <- plot_data %>%
  #filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(UnitOccupation_labelled, n) %>%
  group_by(UnitOccupation_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

p <- plot_data %>%
  ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  geom_label_repel(
    inherit.aes = FALSE,
    aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label = paste0("n=", n)),
    size = 3
    #force_pull = 2
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  #coord_flip() +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = UnitOccupation_labelled, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average weekly income") +
  ylab("Unit occupation") +
  labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") +
  ggtitle("All sectors")

show(p)

ggsave(here('outputs','figures','occupation_pay_plots','unit_occupation_pay_plot_all_sectors.png'), height = 8, width = 8, dpi = 800, bg = "white")
```

Across all sectors, there are many unit occupations in which outsourced workers are paid less per week than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_weekly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_weekly_pay_penalty_across_sectors.csv)]

```{r}
paid_less %>%
  arrange(pay_penalty) %>%
  relocate(
    pay_penalty, .after = unit_occupation_labelled
  ) %>%
  kable(caption = "Weekly pay penalty for unit occupations across all sectors") %>%
  kable_styling()
```

#### Hourly^[[outputs/data/minor_group_occupation_summary_pay_hourly_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_summary_pay_hourly_across_sectors.csv)]

```{r}
#| height: 10
#| width: 10
unit_occ_summary_pay <- data %>%
  filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>%
  group_by(UnitOccupation_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_hourly_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_hourly_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(UnitOccupation_labelled) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    UnitOccupation_labelled = case_when(UnitOccupation_labelled == "NA" ~ NA, TRUE ~ UnitOccupation_labelled),
    UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled)
  ) %>%
  ungroup()

write_csv(unit_occ_summary_pay, file = "../outputs/data/minor_group_occupation_summary_pay_hourly_across_sectors.csv")

# need to identify the unit occs that have an ok n
# subset to occs with n >= 10
unit_subset_hourly <- unit_occ_summary_pay %>%
  group_by(UnitOccupation_labelled) %>%
  mutate(
    min_n = min(n, na.rm = TRUE)
  ) %>%
  filter(min_n >= 10)

unit_subset <- unit_subset_hourly

# create a df with occs where outsourced paid less so we can just list it
paid_less <- unit_subset %>%
  pivot_wider(id_cols = c(UnitOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced
  ) %>%
  filter(
    pay_penalty < 0
  )

write_csv(paid_less, file = "../outputs/data/minor_group_occupation_hourly_pay_penalty_across_sectors.csv")

#print(sector)
# subset to this sector and drop na occupations
plot_data <- unit_subset %>%
  filter(!is.na(UnitOccupation_labelled)) %>%
  droplevels() %>%
  ungroup()

# Order occs by N
# Take the distinct occupation/N pairs and reorder occupations by N
not_outsourced_levels <- plot_data %>%
  dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>%
  distinct(UnitOccupation_labelled, N) %>%
  mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))

# not_outsourced_levels <- plot_data %>%
#   filter(outsourcing_status == 'Not outsourced') %>%
#   mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))

# Then apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = levels(not_outsourced_levels$UnitOccupation_labelled))
  )

annotation_df <- plot_data %>%
  #filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(UnitOccupation_labelled, n) %>%
  group_by(UnitOccupation_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

p <- plot_data %>%
  ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  geom_label_repel(
    inherit.aes = FALSE,
    aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label = paste0("n=", n)),
    size = 3
    #force_pull = 2
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  #coord_flip() +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = UnitOccupation_labelled, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average hourly income") +
  ylab("Unit occupation") +
  labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") +
  ggtitle("All sectors")

show(p)

ggsave(here('outputs','figures','occupation_pay_plots','unit_occupation_pay_plot_hourly_all_sectors.png'), height = 8, width = 8, dpi = 800, bg = "white")
```

Across all sectors, there are many unit occupations in which outsourced workers are paid less per hour than their non-outsourced
counterparts:^[[outputs/data/minor_group_occupation_hourly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_hourly_pay_penalty_across_sectors.csv)]

```{r}
paid_less %>%
  arrange(pay_penalty) %>%
  relocate(
    pay_penalty, .after = unit_occupation_labelled
  ) %>%
  kable(caption = "Hourly pay penalty for unit occupations across all sectors") %>%
  kable_styling()
```

#### Comparing pay penalty between weekly and hourly

Note: only occupations with n >= 10 are considered.

```{r}
# add pay frame flags
unit_subset_weekly <- unit_subset_weekly %>%
  mutate(
    pay_frame = "weekly"
  )

unit_subset_hourly <- unit_subset_hourly %>%
  mutate(
    pay_frame = "hourly"
  )

# combine
unit_subset_combined <- dplyr::bind_rows(unit_subset_weekly, unit_subset_hourly)

unit_subset_combined2 <- unit_subset_combined %>%
  pivot_wider(id_cols = c(UnitOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced
  )

# pivot again to compare hourly and weekly pay differences
unit_subset_combined3 <- unit_subset_combined2 %>%
  pivot_wider(id_cols = unit_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff")

# flag occupations where the sign of the pay difference flips between weekly and hourly pay
unit_subset_combined4 <- unit_subset_combined3 %>%
  mutate(
    pattern_reverse = ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0)
  )
```

The table below shows the pay difference between outsourced and non-outsourced workers by minor sub group occupation. Negative values indicate pay penalties for outsourced workers. The 'pattern_reverse' column flags the four occupations where the direction of the difference changes depending on whether hourly or weekly pay is considered.
For example, teaching professionals who are outsourced earn on average £82 less per week than their non-outsourced counterparts, but are paid on average 16p more per hour. This suggests that outsourced rates are higher in this occupation, but that the amount of work available is not enough for outsourced people to earn more than non-outsourced people on a weekly basis. The reverse pattern is evident for the other three. For example, outsourced workers in food preparation and hospitality earn on average 40p less an hour than non-outsourced workers, but on average £17 more per week. This suggests that outsourced workers in this occupation are paid less but work more hours than their non-outsourced counterparts.

```{r}
unit_subset_combined4 %>%
  filter(!is.na(weekly_pay_diff)) %>%
  arrange(weekly_pay_diff) %>%
  kable(caption = "Weekly and hourly pay difference by minor sub group occupation") %>%
  kable_styling(full_width = FALSE)
```

## London has a disproportionate share of the UK’s outsourced workers, followed by the East and West Midlands

::: {.callout-tip title="#regions"}
- In London, around 25% of workers are outsourced – the highest proportion of any region in the UK. London is followed by the East Midlands (19%) and the West Midlands (18%). The East of England has the lowest share of outsourced workers in its employed workforce, at 13%.
- Possible addition: Should this include some comment on WHY we think this might be the case?
Should we look at sectoral splits in London, compared to everywhere else, to see whether there are significant sector differences that might explain this trend?

:::

The plot below shows the proportion of workers within each region who are outsourced.[^22]

[^22]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

```{r}
region_statistics_2 <- data %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region, outsourcing_status) %>%
  summarise(
    Frequency = sum(NatRepemployees),
    n = n()
  ) %>%
  # summarise() drops the last grouping level, so the totals below are per Region
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  rename(
    `Outsourcing status` = outsourcing_status
  ) %>%
  ungroup()

reg_levels <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Outsourced") %>%
  mutate(
    Region = forcats::fct_reorder(Region, Percentage, .desc = FALSE)
  )

annotation_df <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Not outsourced") %>%
  dplyr::select(Region, N) %>%
  mutate(
    ypos = 100
  )

region_statistics_2 %>%
  mutate(
    Region = factor(Region, levels = levels(reg_levels$Region))
  ) %>%
  ggplot(., aes(Region, Percentage, fill = `Outsourcing status`)) +
  geom_col(colour = "black") +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(Region, ypos, label = paste0("N=", N)), hjust = 1, nudge_y = -2) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  theme_minimal()

readr::write_csv(region_statistics_2, file = "../outputs/data/region_stats_2.csv")

region_statistics_2_1 <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Outsourced" & Region != "London")

london_perc <- region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"]
```

Below we map the workforce composition in each region.
The first map emphasises that London has the highest concentration of outsourced workers (`r round(region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"],0)`%).

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region.svg')
```

The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:

1. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Percentage"],0)`%)
5. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Percentage"],0)`%)

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region_excl_london.svg')
```

```{r}
region_statistics_3 <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region) %>%
  summarise(
    Frequency = sum(NatRepemployees)
  ) %>%
  # each region's share of all outsourced workers in the UK
  mutate(
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(region_statistics_3, file = "../outputs/data/region_stats_3.csv")
```

We can also explore how the UK's outsourced workforce is distributed across the country.[^23] The table and map below show the outsourced workers in each region as a percentage of all outsourced workers in the UK. They show where the UK's outsourced workforce is concentrated. The regions with the highest share of the UK's outsourced workforce are:

[^23]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)

1. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Percentage"],0)`%)
5. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Percentage"],0)`%)

```{r}
region_statistics_3 %>%
  mutate(
    Region = haven::as_factor(Region)
  ) %>%
  arrange(desc(Percentage)) %>%
  knitr::kable(., digits = 2) %>%
  kable_styling(full_width = FALSE)
```

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_distribution_across_regions.svg')
```
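
The ranked lists in this section look up each position with `which(rank(-Percentage) == k)`. One caveat is that `rank()` gives tied values an average rank (for example 2.5), so a lookup for position 2 can come back empty if two regions share the same percentage. The sketch below shows one possible alternative based on `order()`. It is illustrative only: `top_regions()` is a hypothetical helper that does not exist in this project, the values in `toy_regions` are invented for the demonstration, and it assumes `Region` is a character or factor column.

```r
# Illustrative sketch only: `top_regions()` is a hypothetical helper and the
# figures in `toy_regions` are made up, not survey results.
top_regions <- function(df, n = 5, value_col = "Percentage", label_col = "Region") {
  # order() always returns a strict ordering, even when values are tied,
  # whereas rank() gives tied values an average rank such as 2.5.
  ord <- order(-df[[value_col]])
  top <- df[head(ord, n), c(label_col, value_col), drop = FALSE]
  # Format each entry the same way as the numbered lists: "Region (xx%)"
  paste0(top[[label_col]], " (", round(top[[value_col]], 0), "%)")
}

toy_regions <- data.frame(
  Region = c("Region A", "Region B", "Region C", "Region D"),
  Percentage = c(40, 25, 25, 10)
)

# Region B and Region C are tied, so neither has rank 2 and this lookup is empty
tied_lookup <- which(rank(-toy_regions$Percentage) == 2)

# The order()-based helper still returns three entries
top_three <- top_regions(toy_regions, n = 3)
print(top_three)
```

If a helper along these lines were adopted, each numbered item could index into a single call such as `top_regions(region_statistics_3, n = 5)` rather than repeating the lookup expression. Ties would then be listed in the order they appear in the data.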