Key findings - matched to report

Authors

Jolyon Miles-Wilson

Celestin Okoroji

Published

April 15, 2025

1 Ethnicity categorisations

For reference, the table below shows how ethnicities have been grouped in this analysis.

For analyses using the disaggregated (survey) categories, the reference category is “English / Welsh / Scottish / Northern Irish / British”.

For analyses using the aggregated categories, the reference category is “White British”.

Ethnicity: Survey	Ethnicity: Aggregated	Ethnicity: Binary
English / Welsh / Scottish / Northern Irish / British	White British	White British
Irish	White other	Non-White British
Gypsy or Irish Traveller	White other	Non-White British
Roma	White other	Non-White British
Any other White background	White other	Non-White British
White and Black Caribbean	Mixed/Multiple ethnic group	Non-White British
White and Black African	Mixed/Multiple ethnic group	Non-White British
White and Asian	Mixed/Multiple ethnic group	Non-White British
Any other Mixed / Multiple ethnic background	Mixed/Multiple ethnic group	Non-White British
Indian	Asian/Asian British	Non-White British
Pakistani	Asian/Asian British	Non-White British
Bangladeshi	Asian/Asian British	Non-White British
Chinese	Asian/Asian British	Non-White British
Any other Asian background	Asian/Asian British	Non-White British
African	Black/African/Caribbean/Black British	Non-White British
Caribbean	Black/African/Caribbean/Black British	Non-White British
Any other Black, Black British, or Caribbean background	Black/African/Caribbean/Black British	Non-White British
Arab	Arab/British Arab	Non-White British
Any other ethnic group	Other ethnic group	Non-White British
Don’t think of myself as any of these	Prefer not to say	Non-White British
Prefer not to say	Prefer not to say	Non-White British
NA	Prefer not to say	Non-White British
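The grouping in the table above can be expressed as a simple lookup. The sketch below is illustrative only and is written in Python, which is an assumption rather than the language used for the analysis. The names `SURVEY_TO_AGGREGATED` and `categorise` are hypothetical, and missing answers (NA) are assumed to arrive as `None`.

```python
# Hypothetical lookup from survey category to aggregated category,
# transcribed from the table above.
SURVEY_TO_AGGREGATED = {
    "English / Welsh / Scottish / Northern Irish / British": "White British",
    "Irish": "White other",
    "Gypsy or Irish Traveller": "White other",
    "Roma": "White other",
    "Any other White background": "White other",
    "White and Black Caribbean": "Mixed/Multiple ethnic group",
    "White and Black African": "Mixed/Multiple ethnic group",
    "White and Asian": "Mixed/Multiple ethnic group",
    "Any other Mixed / Multiple ethnic background": "Mixed/Multiple ethnic group",
    "Indian": "Asian/Asian British",
    "Pakistani": "Asian/Asian British",
    "Bangladeshi": "Asian/Asian British",
    "Chinese": "Asian/Asian British",
    "Any other Asian background": "Asian/Asian British",
    "African": "Black/African/Caribbean/Black British",
    "Caribbean": "Black/African/Caribbean/Black British",
    "Any other Black, Black British, or Caribbean background":
        "Black/African/Caribbean/Black British",
    "Arab": "Arab/British Arab",
    "Any other ethnic group": "Other ethnic group",
    "Don’t think of myself as any of these": "Prefer not to say",
    "Prefer not to say": "Prefer not to say",
}


def categorise(survey_answer):
    """Return (aggregated, binary) categories for one survey answer.

    Missing answers (None) are treated like the NA row of the table.
    """
    if survey_answer is None:
        aggregated = "Prefer not to say"
    else:
        aggregated = SURVEY_TO_AGGREGATED[survey_answer]
    binary = "White British" if aggregated == "White British" else "Non-White British"
    return aggregated, binary


print(categorise("Roma"))
print(categorise(None))
```

Note that, as in the table, the binary “Non-White British” category includes the “Prefer not to say” and NA rows, so it is not a pure ethnic minority indicator.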

2 Chapter 2: How many outsourced workers are there in the UK?

2.1 How many UK workers are outsourced?

#how-many

Around 1 in 6 UK workers meet our definition of an outsourced worker
The ‘outsourced sub-group’ is the most dominant of the three sub-groups - meaning the total group is predominantly made up of people who self-identify as an outsourced worker and they say they are hired to do work that is long-term or ongoing. People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 67% (check) of our total outsourced group, or nearly 7 in 10. This group makes up X of all UK workers.

1 in 6 (17%) of UK workers are outsourced.¹

In terms of the different possible types of outsourced groups², the numbers are as follows:

Definitely outsourced: 11%
Likely agency: 3%
High indicators: 3%

People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 68% of our total outsourced group. This group makes up 11% of all UK workers.

#non-exclusive-subgroups1

The two other sub-groups – the agency and indicators sub-groups – are less dominant in comparison. Around 58% of all respondents meet the criteria for either or both of these sub-groups, but this falls to around 33% if we exclude people who are already captured in the outsourced sub-group. Excluding the first sub-group, these other two groups make up X of all UK workers.

The percentages here refer to the number of people who are outsourced (the super-ordinate group), not the total number of respondents. Below I provide percentages as a function of the outsourced super-ordinate group as well as of the total sample.

Group criteria

Outsourced: defined as responding ‘I am sure I am outsourced’ or ‘I might be outsourced’, and responding ‘I do work on a long-term basis’.
Likely agency: defined as responding ‘I am sure I am agency’ and ‘I do work on a long-term basis’, excluding people who are already defined as outsourced.
High indicators: defined as responding TRUE to 5 or 6 of the outsourcing indicators, as well as responding ‘I do work on a long-term basis’, excluding people who are already defined as outsourced or likely agency.
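Because each group excludes people already captured by the one before it, the criteria amount to an ordered set of checks. The sketch below shows one way to write that ordering in Python. The function name, the argument names and the answer codes (`"sure"`, `"might"`) are hypothetical stand-ins for the survey variables, not the coding used in the analysis.

```python
def outsourcing_group(outsourced_answer, agency_answer, long_term, n_indicators):
    """Assign one mutually exclusive outsourcing group to a respondent.

    outsourced_answer: "sure", "might" or anything else (hypothetical coding)
    agency_answer:     "sure" or anything else (hypothetical coding)
    long_term:         True if the respondent does work on a long-term basis
    n_indicators:      number of the six outsourcing indicators answered TRUE
    """
    # Every group requires long-term work.
    if not long_term:
        return "Not outsourced"
    # 1. Self-identified outsourced workers are assigned first.
    if outsourced_answer in ("sure", "might"):
        return "Outsourced"
    # 2. Agency workers who were not already classed as outsourced.
    if agency_answer == "sure":
        return "Likely agency"
    # 3. Five or six indicators, for those in neither earlier group.
    if n_indicators >= 5:
        return "High indicators"
    return "Not outsourced"


print(outsourcing_group("might", "sure", True, 6))
print(outsourcing_group("no", "sure", True, 2))
print(outsourcing_group("no", "no", True, 5))
```

The order of the checks is what makes the groups exclusive: a respondent who meets all three criteria is counted only in the outsourced group.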

Including outsourced group
agency_or_indicator	freq	n	total	perc	N
agency	342.6956	344	10155	3.374649	10155
both	106.3656	116	10155	1.047421	10155
indicator	513.2645	516	10155	5.054303	10155
neither	9192.6744	9179	10155	90.523627	10155

Excluding outsourced group
agency_or_indicator	freq	n	total	perc	N
agency	231.43068	231	8993.922	2.5731897	9032
both	35.10624	38	8993.922	0.3903329	9032
indicator	280.74106	291	8993.922	3.1214531	9032
neither	8446.64421	8472	8993.922	93.9150243	9032

9.48% of the whole sample meet the criteria for either or both of these sub-groups. This falls to 6.08% if we exclude people who are already captured in the outsourced sub-group.
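The two whole-sample percentages can be checked directly against the weighted frequencies (the `freq` column) in the tables above. This is a minimal arithmetic sketch in Python; the variable and function names are illustrative.

```python
# Weighted frequencies ('freq' column) copied from the two tables above.
including_outsourced = {
    "agency": 342.6956,
    "both": 106.3656,
    "indicator": 513.2645,
    "neither": 9192.6744,
}
excluding_outsourced = {
    "agency": 231.43068,
    "both": 35.10624,
    "indicator": 280.74106,
    "neither": 8446.64421,
}


def share_in_either(freqs):
    """Percentage of the weighted total in the agency group, the indicator group, or both."""
    in_either = sum(value for key, value in freqs.items() if key != "neither")
    return 100 * in_either / sum(freqs.values())


print(round(share_in_either(including_outsourced), 2))  # 9.48
print(round(share_in_either(excluding_outsourced), 2))  # 6.08
```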

Out of those who are in the ‘outsourced’ status (i.e., the combination of the three outsourced groups), 57.99% meet the criteria for either or both of these sub-groups, but this falls to around 33.27% if we exclude people who are already captured in the outsourced sub-group.

#non-exclusive-subgroups2

There is some overlap between these sub-groups, but they are not like for like. Just over a quarter (27%) of respondents are in more than one sub-group, while nearly three quarters (73%) of respondents are uniquely captured in just one of the three sub-groups.

Just over a quarter (26.35%) of respondents are in more than one sub-group, while nearly three quarters (73.65%) of respondents are uniquely captured in just one of the three sub-groups.³

2.2 Evaluating our total estimate

#evaluating-total-estimate To do

Around 1 in 4 “outsourced” respondents sit in more than one sub-group within our definition, but around 3 in 4 are uniquely captured in just one of the three sub-groups - predominantly in the outsourced sub-group.
As figure X shows, not all respondents in the outsourced sub-group said yes five or six of our six outsourcing

3 Chapter 3: Who are the UK’s outsourced workers?

3.1 Demographic breakdown

Demographic variables:

Categorical
- Gender
- Ethnicity
Numeric
- Age - in age section: Section 3.3

We want them broken down by

outsourcing status
- high low pay
outsourcing group
- high low pay

3.1.1 Ethnicity by outsourcing status

3.1.1.1 Collapsed ethnicity⁴

Ethnicity by outsourcing status (%)
outsourcing_status	White British	Arab/British Arab	Asian/Asian British	Black/African/Caribbean/Black British	Mixed/Multiple ethnic group	Other ethnic group	Prefer not to say	White other
Not outsourced	78.01	0.24	7.46	2.83	1.65	0.28	3.63	5.89
Outsourced	66.91	0.70	12.68	5.67	2.64	0.41	4.33	6.65

3.1.1.2 Full ethnicity⁵

Ethnicity by outsourcing status (%)
outsourcing_status	English / Welsh / Scottish / Northern Irish / British	Irish	Gypsy or Irish Traveller	Roma	Any other White background	White and Black Caribbean	White and Black African	White and Asian	Any other Mixed / Multiple ethnic background	Indian	Pakistani	Bangladeshi	Chinese	Any other Asian background	African	Caribbean	Any other Black, Black British, or Caribbean background	Arab	Any other ethnic group	Don’t think of myself as any of these	Prefer not to say	NA
Not outsourced	78.01	1.17	0.10	0.06	4.56	0.63	0.24	0.38	0.40	2.81	1.14	0.63	1.47	1.40	1.86	0.67	0.30	0.24	0.28	0.07	0.28	3.28
Outsourced	66.91	0.87	0.14	0.13	5.50	0.32	0.86	0.53	0.93	4.33	3.15	1.35	1.23	2.62	4.10	0.88	0.69	0.70	0.41	0.17	0.40	3.75

3.1.1.3 By high/low pay

3.1.1.3.1 Collapsed ethnicity⁶

Ethnicity by outsourcing status and income group (%)
outsourcing_status	income_group	White British	Arab/British Arab	Asian/Asian British	Black/African/Caribbean/Black British	Mixed/Multiple ethnic group	Other ethnic group	Prefer not to say	White other
Not outsourced	Not low	78.53	0.27	7.96	2.91	1.61	0.39	2.28	6.05
Not outsourced	Low	80.18	0.26	6.33	3.33	1.59	NA	2.37	5.94
Outsourced	Not low	64.95	0.63	15.03	6.85	2.13	0.33	2.99	7.10
Outsourced	Low	68.67	0.40	10.64	3.61	4.53	0.91	4.61	6.64

3.1.1.3.2 Full ethnicity⁷

Ethnicity by outsourcing status and income group (%)
outsourcing_status	income_group	English / Welsh / Scottish / Northern Irish / British	Irish	Gypsy or Irish Traveller	Roma	Any other White background	White and Black Caribbean	White and Black African	White and Asian	Any other Mixed / Multiple ethnic background	Indian	Pakistani	Bangladeshi	Chinese	Any other Asian background	African	Caribbean	Any other Black, Black British, or Caribbean background	Arab	Any other ethnic group	Don’t think of myself as any of these	Prefer not to say	NA
Not outsourced	Not low	78.53	0.97	0.13	0.10	4.84	0.53	0.26	0.42	0.40	3.14	1.14	0.50	1.78	1.40	1.97	0.70	0.24	0.27	0.39	0.06	0.22	1.99
Not outsourced	Low	80.18	1.38	0.04	NA	4.52	0.63	0.17	0.36	0.43	2.03	1.02	0.55	0.85	1.88	2.19	0.66	0.48	0.26	NA	0.04	0.26	2.07
Outsourced	Not low	64.95	0.79	0.26	0.24	5.81	0.22	1.03	0.24	0.64	5.77	3.12	1.37	1.60	3.17	5.04	1.02	0.80	0.63	0.33	0.22	0.23	2.54
Outsourced	Low	68.67	0.57	NA	NA	6.07	0.70	0.77	1.20	1.85	3.13	3.98	1.41	0.87	1.26	2.64	0.83	0.14	0.40	0.91	0.09	0.59	3.92

3.1.2 Ethnicity by outsourcing group

3.1.2.1 Collapsed ethnicity⁸

Ethnicity by outsourcing group (%)
outsourcing_group	White British	Arab/British Arab	Asian/Asian British	Black/African/Caribbean/Black British	Mixed/Multiple ethnic group	Other ethnic group	Prefer not to say	White other
Not outsourced	78.01	0.24	7.46	2.83	1.65	0.28	3.63	5.89
Outsourced	67.02	0.43	11.86	5.78	2.64	0.13	4.62	7.53
Likely agency	65.41	1.28	12.80	6.52	3.18	1.48	3.80	5.54
High indicators	67.88	1.29	15.99	4.42	2.17	0.57	3.64	4.04

3.1.2.2 Full ethnicity⁹

Ethnicity by outsourcing group (%)
outsourcing_group	English / Welsh / Scottish / Northern Irish / British	Irish	Gypsy or Irish Traveller	Roma	Any other White background	White and Black Caribbean	White and Black African	White and Asian	Any other Mixed / Multiple ethnic background	Indian	Pakistani	Bangladeshi	Chinese	Any other Asian background	African	Caribbean	Any other Black, Black British, or Caribbean background	Arab	Any other ethnic group	Don’t think of myself as any of these	Prefer not to say	NA
Not outsourced	78.01	1.17	0.10	0.06	4.56	0.63	0.24	0.38	0.40	2.81	1.14	0.63	1.47	1.40	1.86	0.67	0.30	0.24	0.28	0.07	0.28	3.28
Outsourced	67.02	0.97	0.21	0.13	6.22	0.33	0.95	0.50	0.85	3.79	2.82	1.55	1.10	2.61	4.07	0.90	0.81	0.43	0.13	0.22	0.14	4.25
Likely agency	65.41	0.25	NA	0.29	5.00	0.26	0.34	0.95	1.63	4.44	3.65	0.98	0.59	3.13	4.81	1.27	0.44	1.28	1.48	0.15	0.20	3.45
High indicators	67.88	1.06	NA	NA	2.98	0.35	0.93	0.28	0.61	6.48	4.07	0.88	2.39	2.17	3.54	0.47	0.41	1.29	0.57	NA	1.68	1.96

3.1.2.3 By high/low pay

3.1.2.3.1 Collapsed ethnicity¹⁰

Ethnicity by outsourcing group and income group (%)
outsourcing_group	income_group	White British	Arab/British Arab	Asian/Asian British	Black/African/Caribbean/Black British	Mixed/Multiple ethnic group	Other ethnic group	Prefer not to say	White other
Not outsourced	Not low	78.53	0.27	7.96	2.91	1.61	0.39	2.28	6.05
Not outsourced	Low	80.18	0.26	6.33	3.33	1.59	NA	2.37	5.94
Outsourced	Not low	63.23	0.52	14.55	7.26	2.06	0.25	3.40	8.73
Outsourced	Low	70.91	0.55	9.52	3.32	4.16	NA	4.52	7.02
Likely agency	Not low	67.09	0.79	12.62	8.39	2.14	NA	3.26	5.71
Likely agency	Low	61.13	NA	14.97	2.34	7.15	5.48	2.33	6.60
High indicators	Not low	68.81	0.87	18.63	4.20	2.36	0.86	1.39	2.87
High indicators	Low	64.73	NA	11.73	7.72	2.86	NA	8.96	3.99

3.1.2.3.2 Full ethnicity¹¹

Ethnicity by outsourcing group and income group (%)
outsourcing_group	income_group	English / Welsh / Scottish / Northern Irish / British	Irish	Gypsy or Irish Traveller	Roma	Any other White background	White and Black Caribbean	White and Black African	White and Asian	Any other Mixed / Multiple ethnic background	Indian	Pakistani	Bangladeshi	Chinese	Any other Asian background	African	Caribbean	Any other Black, Black British, or Caribbean background	Arab	Any other ethnic group	Don’t think of myself as any of these	Prefer not to say	NA
Not outsourced	Not low	78.53	0.97	0.13	0.10	4.84	0.53	0.26	0.42	0.40	3.14	1.14	0.50	1.78	1.40	1.97	0.70	0.24	0.27	0.39	0.06	0.22	1.99
Not outsourced	Low	80.18	1.38	0.04	NA	4.52	0.63	0.17	0.36	0.43	2.03	1.02	0.55	0.85	1.88	2.19	0.66	0.48	0.26	NA	0.04	0.26	2.07
Outsourced	Not low	63.23	1.04	0.40	0.24	7.05	0.15	1.03	0.30	0.57	5.22	3.25	1.30	1.40	3.39	5.02	1.29	0.96	0.52	0.25	0.27	0.27	2.85
Outsourced	Low	70.91	0.55	NA	NA	6.46	0.81	1.05	0.74	1.57	2.46	3.16	1.92	0.69	1.29	2.53	0.79	NA	0.55	NA	0.12	NA	4.39
Likely agency	Not low	67.09	0.42	NA	0.49	4.80	0.44	0.58	0.28	0.83	5.21	3.55	1.66	NA	2.19	7.32	0.33	0.74	0.79	NA	0.25	0.33	2.68
Likely agency	Low	61.13	NA	NA	NA	6.60	NA	NA	2.93	4.22	5.08	5.80	NA	2.21	1.89	0.85	1.49	NA	NA	5.48	NA	NA	2.33
High indicators	Not low	68.81	0.29	NA	NA	2.58	0.26	1.41	NA	0.68	8.06	2.33	1.34	3.61	3.29	3.20	0.71	0.30	0.87	0.86	NA	NA	1.39
High indicators	Low	64.73	1.63	NA	NA	2.36	1.09	NA	1.77	NA	4.79	6.93	NA	NA	NA	6.35	NA	1.37	NA	NA	NA	5.84	3.12

3.1.3 Gender by outsourcing status¹²

Gender by outsourcing status (%)
outsourcing_status	Female	Male	Other	Prefer not to say
Not outsourced	51.89	47.28	0.15	0.68
Outsourced	43.00	56.40	0.15	0.44

3.1.3.1 By high/low pay¹³

Gender by outsourcing status and income group (%)
outsourcing_status	income_group	Female	Male	Other	Prefer not to say
Not outsourced	Not low	46.31	53.32	0.13	0.24
Not outsourced	Low	71.70	27.83	0.18	0.29
Outsourced	Not low	35.85	63.83	0.14	0.18
Outsourced	Low	60.34	38.98	0.30	0.38

3.1.4 Gender by outsourcing group¹⁴

Gender by outsourcing group (%)
outsourcing_group	Female	Male	Other	Prefer not to say
Not outsourced	51.89	47.28	0.15	0.68
Outsourced	45.34	53.94	0.23	0.50
Likely agency	43.02	56.66	NA	0.32
High indicators	33.34	66.35	NA	0.32

3.1.4.1 By high/low pay¹⁵

Gender by outsourcing group and income group (%)
outsourcing_group	income_group	Female	Male	Other	Prefer not to say
Not outsourced	Not low	46.31	53.32	0.13	0.24
Not outsourced	Low	71.70	27.83	0.18	0.29
Outsourced	Not low	38.22	61.42	0.21	0.15
Outsourced	Low	62.10	36.97	0.41	0.52
Likely agency	Not low	37.89	61.57	NA	0.54
Likely agency	Low	53.35	46.65	NA	NA
High indicators	Not low	26.27	73.73	NA	NA
High indicators	Low	58.97	41.03	NA	NA

3.2 Evidence paints a racialised picture of outsourcing in the UK, with links to both ethnicity and migration

#ethnicity

More than 1 in 4 (nearly 1/3) outsourced workers are from an ethnic minority background
Workers from ethnic minority backgrounds are disproportionately over-represented in outsourced work in the UK, and typically more likely to be outsourced than White British workers.
Overall, 22% of non-outsourced workers are from an ethnic minority background, rising to 33% of outsourced workers – a more than ten percentage point difference. This means that while just over 1 in 6 non-outsourced workers in our sample were from an ethnic minority background, nearly 1 in 3 outsourced workers were.
People from an ethnic minority background are overall 1.75 times more likely to be outsourced than people from a White British background.
Workers from Arab backgrounds are 3.86 times more likely than White workers to be outsourced; (check sample size – are we confident in all of these significance tests, or should we just use some of them in these bullet points?)
Workers from Black backgrounds are 2.33 times more likely than White workers to be outsourced.
Workers from Asian backgrounds are 1.98 times more likely than White workers to be outsourced.
Workers from Mixed Ethnicity backgrounds are 1.86 times more likely than White workers to be outsourced.
White other workers are 1.32 times more likely than White British workers to be outsourced

People from an ethnic minority are 1.75 times more likely to be outsourced than people from a White British background; 33.09% of outsourced workers are from an ethnic minority, compared to 21.99% of non-outsourced workers.¹⁶
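The “1.75 times more likely” figure is consistent with an odds ratio computed from the two percentages quoted above. The Python sketch below is a consistency check on the rounded percentages, not a reproduction of the weighted model; the function name `odds_ratio` is illustrative.

```python
def odds_ratio(p_group_a, p_group_b):
    """Odds ratio comparing two proportions, each given as a fraction between 0 and 1."""
    odds_a = p_group_a / (1 - p_group_a)
    odds_b = p_group_b / (1 - p_group_b)
    return odds_a / odds_b


# Share of workers from an ethnic minority: 33.09% of outsourced workers,
# 21.99% of non-outsourced workers.
print(round(odds_ratio(0.3309, 0.2199), 2))  # 1.75
```

An odds ratio is symmetric in the two variables, so the same value describes both the odds of an outsourced worker being from an ethnic minority and the odds of an ethnic minority worker being outsourced. It is a ratio of odds rather than of probabilities, so “times more likely” is a shorthand.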

Overall, there is no interaction between ethnic minority status and outsourcing status in predicting low pay; i.e., being both from an ethnic minority and outsourced is not associated with being in the low pay group over and above the separate effects of each.¹⁷

However there is nuance in the groups. We do find evidence to suggest that among White British people, outsourced people are 1.35 times more likely to be in the low income group compared to non-outsourced people, and among Mixed ethnicity people, outsourced people are 2.8 times more likely to be in the low income group compared to non-outsourced people.¹⁸

Looking at this with disaggregated ethnicities indicates that among “English / Welsh / Scottish / Northern Irish / British” workers, outsourced people are 1.35 times more likely to be in the low income group compared to non-outsourced people. Among “White and Asian” workers, outsourced workers are 7.66 times more likely to be in the low income group compared to non-outsourced workers.¹⁹

Ethnicity (binary) by outsourcing status and income group (%)
outsourcing_status	income_group	White British	Non-White British
Not outsourced	Not low	78.53	21.47
Not outsourced	Low	80.18	19.82
Outsourced	Not low	64.95	35.05
Outsourced	Low	68.67	31.33

Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others²⁰:

Arab/British Arab workers are 3.386 times more likely than White British workers to be outsourced.
Asian/Asian British workers are 1.982 times more likely than White British workers to be outsourced.
Black/African/Caribbean/Black British workers are 2.334 times more likely than White British workers to be outsourced.
Mixed/Multiple ethnic group workers are 1.865 times more likely than White British workers to be outsourced.
Prefer not to say workers are 1.389 times more likely than White British workers to be outsourced.
White other workers are 1.315 times more likely than White British workers to be outsourced.
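These reference-category comparisons can be approximated from the collapsed ethnicity table in Section 3.1.1.1. The Python sketch below assumes the figures are odds ratios against White British and uses the rounded row percentages, so results can differ slightly from the model estimates (for example, about 3.40 rather than 3.386 for Arab/British Arab). All names are illustrative.

```python
# Row percentages from the 'Ethnicity by outsourcing status (%)' table.
not_outsourced = {
    "White British": 78.01,
    "Arab/British Arab": 0.24,
    "Asian/Asian British": 7.46,
    "Black/African/Caribbean/Black British": 2.83,
    "Mixed/Multiple ethnic group": 1.65,
    "Prefer not to say": 3.63,
    "White other": 5.89,
}
outsourced = {
    "White British": 66.91,
    "Arab/British Arab": 0.70,
    "Asian/Asian British": 12.68,
    "Black/African/Caribbean/Black British": 5.67,
    "Mixed/Multiple ethnic group": 2.64,
    "Prefer not to say": 4.33,
    "White other": 6.65,
}


def odds_ratio_vs_reference(group, reference="White British"):
    """Odds of being outsourced for `group` relative to `reference`.

    Row totals cancel, so the row percentages can be used in place of counts.
    """
    ratio_outsourced = outsourced[group] / outsourced[reference]
    ratio_not_outsourced = not_outsourced[group] / not_outsourced[reference]
    return ratio_outsourced / ratio_not_outsourced


for group in outsourced:
    if group != "White British":
        print(group, round(odds_ratio_vs_reference(group), 2))
```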

Comparison of more disaggregated ethnicities indicates more nuance²¹:

Any other White background workers are 1.41 times more likely than White British workers to be outsourced.
White and Black African workers are 4.12 times more likely than White British workers to be outsourced.
Any other Mixed / Multiple ethnic background workers are 2.73 times more likely than White British workers to be outsourced.
Indian workers are 1.79 times more likely than White British workers to be outsourced.
Pakistani workers are 3.23 times more likely than White British workers to be outsourced.
Bangladeshi workers are 2.48 times more likely than White British workers to be outsourced.
Any other Asian background workers are 2.18 times more likely than White British workers to be outsourced.
African workers are 2.57 times more likely than White British workers to be outsourced.
Any other Black, Black British, or Caribbean background workers are 2.65 times more likely than White British workers to be outsourced.
Arab workers are 3.39 times more likely than White British workers to be outsourced.

#ethnicity-sub-group

These differences in ethnicity also shift slightly depending on which outsourced “sub-group” we look at. For example, compared to White British workers, Black outsourced workers are more likely to be in the “outsourced sub-group”, meaning they have self-identified as outsourced, or the “agency sub-group”, meaning they are agency workers doing more long-term and ongoing work. Are there any other interesting points to mention here? Should we do a chart showing this difference across sub-groups? Do we need an interpretive comment in this section?

# weights:  36 (24 variable)
initial  value 14077.819237 
iter  10 value 6009.847927
iter  20 value 5984.124702
iter  30 value 5983.869764
final  value 5983.869675 
converged

Breaking down by outsourcing group helps to separate out the type of outsourced work people from the ethnicities identified above engage in.²² Compared to White British workers,

Arab people are more likely to be likely agency or high indicators
Asian people are more likely to be in any of the groups
Black people are more likely to be likely agency or outsourced
People of mixed ethnicity are more likely to be outsourced
People who selected Other ethnicity are more likely to be agency
White other people are more likely to be outsourced

# weights:  88 (63 variable)
initial  value 13604.387456 
iter  10 value 5752.921034
iter  20 value 5738.642702
iter  30 value 5738.326928
iter  40 value 5738.207808
iter  50 value 5738.195963
final  value 5738.195716 
converged

The disaggregated ethnicities add further nuance.²³ The table below shows the likelihood of workers of different ethnicities falling into each of the outsourcing groups, compared to White British workers. Note that only significant relationships are shown here. Note also that the ‘n’ for many of these statistics is very low. As such, many of these statistics are illustrative but not inferential.

Likelihood of belonging to different groups compared to White British. Note: NAs are non-sig. relationships. 'n_' is sample size, 'freq_' is weighted sample size
Ethnicity	Outsourced	Likely agency	High indicators	n_Outsourced	n_Likely agency	n_High indicators	freq_Outsourced	freq_Likely agency	freq_High indicators
Gypsy or Irish Traveller	NA	0.00	0.00	2	NA	NA	2.48	NA	NA
Any other White background	1.59	NA	NA	63	10	7	72.25	13.33	8.37
White and Black African	4.59	NA	NA	21	2	3	11.08	0.91	2.62
Any other Mixed / Multiple ethnic background	NA	4.87	NA	15	5	3	9.84	4.33	1.71
Indian	1.57	NA	2.64	32	8	15	43.96	11.83	18.18
Pakistani	2.88	3.83	4.11	29	8	12	32.69	9.74	11.43
Bangladeshi	2.84	NA	NA	15	3	3	17.95	2.61	2.48
Any other Asian background	2.17	2.66	NA	17	5	4	30.35	8.34	6.10
African	2.54	3.09	NA	74	22	15	47.20	12.82	9.93
Any other Black, Black British, or Caribbean background	3.13	NA	NA	13	1	2	9.46	1.16	1.16
Arab	NA	6.30	6.15	3	2	2	4.97	3.42	3.63
Any other ethnic group	NA	6.35	NA	1	1	1	1.52	3.93	1.60
Don’t think of myself as any of these	NA	NA	0.00	4	1	NA	2.54	0.40	NA
Prefer not to say	NA	NA	6.94	1	1	2	1.67	0.52	4.72

#ethnicity-pay-split

On the low-pay / high-pay split, you say “A person is more likely to be in the low income group if they are: Older; Female; Prefer not to say when they arrived, And less likely if they are: Asian/Asian British; Live in North West or Wales; Arrived in the UK in last 30 years”; Can I confirm this means we don’t see any other significant differences in the ethnicity breakdown if we look at high paid vs low paid workers? If so, let’s clarify what this says about how ethnicity relates to a) outsourced workers being disproportionately low paid, but b) ethnic minority workers being no more likely to be in our low pay group.

Using the new ethnicity groupings, there is no evidence indicating that any ethnicity is more or less likely to be in the low income group

Note to self: This could benefit from stepwise regression

A person is more likely to be in the low income group if they are:

Older
Female
Don’t have a degree (or don’t know if they have a degree?)
Are outsourced
Arrived in the UK in the last year

And less likely if they are:

Younger
Male
Have a degree
Live in the North West or Wales (compared to London)
Arrived in the UK in last 30 years

#migration

As you would expect, the vast majority of outsourced workers were born in the UK. However, we still see a significantly higher likelihood of outsourced workers having been born outside of the UK compared to people who aren’t outsourced. While around 14% of non-outsourced workers were born outside of the UK, this rose to just over 24% for outsourced workers – or nearly 1 in 4.
Overall, people who were born outside of the UK are 1.94 times more likely to be in outsourced work than people who were born here.

As with non-outsourced workers, the vast majority of outsourced workers were born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK: 24.13% of outsourced workers were not born in the UK, compared to 14.08% of non-outsourced workers.²⁴ This difference is statistically significant; outsourced workers are 1.94 times more likely to have been born outside the UK than non-outsourced workers.²⁵

#migration-sub-groups

This pattern broadly holds across our three outsourcing sub-groups, with nearly no difference in the likelihood of people born outside of the UK being in any one of the three groups.

# weights:  12 (6 variable)
initial  value 14077.819237 
iter  10 value 6002.136126
final  value 6002.013178 
converged

#ethnicity-migration-interaction. Some attention needed here

Among all workers who were born in the UK:

Black workers are 2.01 times more likely to be outsourced than a White worker
Asian workers are 2.02 times more likely to be outsourced than a White worker.
Workers from Other ethnic backgrounds are X times more likely to be outsourced than a White other worker

For workers born outside of the UK:

Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK.
Among workers from Mixed ethnic backgrounds, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.

For workers from other ethnicities, it doesn’t matter whether you are born in the UK or not – you are equally likely as a Black or an Asian worker to be outsourced, whether you were born in the UK or somewhere else. And compared to a White person born in the UK, Black African and South Asian workers specifically are more likely to be outsourced, whether or not they were born in the UK. Does this need any further detail or explanation?

To discuss confidence in our interpretation in this section: The evidence on ethnicity and country of birth clearly paints a racialised picture of outsourcing, and one with colonial undertones, as Black African and South Asian workers see a higher risk of being outsourced compared to White British workers, regardless of their country of birth. This obviously raises further questions about why, linked to (sector, occupation, labour market inequality and structural racism). Discuss the draft interpretation in the comment on the right.

However, workers from non-White ethnic groups are not the only workers who see a higher risk of being outsourced: Non-UK-born White workers are also more likely to be outsourced than UK-born White people. Ethnicity and country of birth interact independently for some groups, but seem to be fundamentally connected for others.

Analysis of Deviance Table

Model 1: outsourcing_status ~ Ethnicity_collapsed + BORNUK_binary
Model 2: outsourcing_status ~ Ethnicity_collapsed * BORNUK_binary
  Resid. Df Resid. Dev Df Deviance      F      Pr(>F)    
1     10146     9056.3                                   
2     10139     9018.1  7   38.205 5.4578 0.000002847 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exploring the intersection of ethnicity and country of birth shows that the likelihood of a person being outsourced depends on the combination of their ethnicity and whether they were born in the UK.²⁶ The plot below shows that:

Among workers born in the UK, a Black worker is 2.01 times more likely to be outsourced than a White British worker.
Among workers born in the UK, an Asian worker is 2.03 times more likely to be outsourced than a White British worker.
Among workers born in the UK, an Other ethnicity worker is 5.63 times more likely to be outsourced than a White other worker.
Among workers not born in the UK, a White other worker is 0.58 times as likely (i.e., less likely) to be outsourced as a White British worker.
Among workers not born in the UK, a White other worker is 0.53 times as likely (i.e., less likely) to be outsourced as a Black worker.
Among workers not born in the UK, a White other worker is 0.37 times as likely (i.e., less likely) to be outsourced as a worker of mixed ethnicity.

Similarly, the plot below shows that:²⁷

Among White British workers, someone not born in the UK is 2.48 times more likely to be outsourced than someone born in the UK.
Among Mixed workers, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
Among people who preferred not to say their ethnicity, someone not born in the UK is 1.95 times more likely to be outsourced than someone born in the UK.

For people born in the UK, if you are Pakistani you are more likely to be outsourced than if you are White.

For White people and for White and Asian people, if you’re not born in the UK you’re more likely to be outsourced.

#migration-by-pay-split

If we do a basic “born UK / not born UK” split, looking by low and high pay, what % of the low-paid workers group were born outside of the UK, vs in the high-paid group?

20.96% of outsourced workers in the low pay group were not born in the UK, compared to 26.39% of people in the not low pay group. This difference is marginally statistically significant; someone in the low income group is less likely to be born outside the UK than someone in the not low income group. This pattern is the same for non-outsourced workers, and when we consider the interaction between outsourcing status and migration status, the only factor predicting income group is outsourcing status.

3.3 Outsourced workers are on average younger than non-outsourced workers

#age

We find that outsourced workers are significantly younger than non-outsourced workers, on average. The median age of an outsourced worker is 35, compared to a median age of 43 for a non-outsourced worker.
The outsourced and indicator sub-groups – people who directly said that they were or might be outsourced, or ticked a high number of our indicators of outsourced working – see higher proportions of younger workers than the “agency” sub-group.

#age-violin

INSERT VIOLIN PLOT CHART HERE SHOWING MEDIAN AGE OF EACH SUB-GROUP, COMPARED TO NON-OUTSOURCED WORKERS. Is this necessary? We already have the density plots

Outsourced workers are on average younger than non-outsourced workers. The median age of the outsourced group is 36, compared to 43 for the not outsourced group.²⁸ This difference is statistically significant.²⁹

Outsourcing group	Mean	Median	Min	Max	Standard dev.	N
Not outsourced	42.80	43	16	80	13.08	8472
Outsourced	38.63	36	16	78	13.07	1683

The higher concentration of younger workers identified above appears to be driven primarily by the ‘outsourced’ and ‘high indicator’ groups, whilst the ‘likely agency’ group follows a similar pattern to the non-outsourced group.³⁰

Outsourcing status	Income group	Mean	Median	Min	Max	Standard dev.	N
Not outsourced	Not low	41.97	41	18	78	12.47	5280
Not outsourced	Low	42.87	43	16	80	15.09	1644
Outsourced	Not low	37.96	35	18	77	12.53	986
Outsourced	Low	39.05	37	16	78	14.06	381

Outsourcing group	Mean	Median	Min	Max	Standard dev.	N
Not outsourced	42.80	43	16	80	13.08	8472
Outsourced	38.40	35	16	78	13.09	1123
Likely agency	39.80	38	18	77	13.49	269
High indicators	38.49	35	18	72	12.55	291

Outsourcing group	Income group	Mean	Median	Min	Max	Standard dev.	N
Not outsourced	Not low	41.97	41.00	18	78.0	12.47	5280
Not outsourced	Low	42.87	43.00	16	80.0	15.09	1644
Outsourced	Not low	37.81	34.52	18	67.0	12.57	625
Outsourced	Low	39.07	37.00	16	78.0	13.89	272
Likely agency	Not low	39.33	38.00	18	77.0	12.66	168
Likely agency	Low	39.35	37.00	19	71.5	15.66	63
High indicators	Not low	37.29	35.00	18	65.0	12.25	193
High indicators	Low	38.42	34.59	19	67.0	12.82	46

#gender

The evidence also finds meaningful differences by gender between the outsourced and non-outsourced groups in our data. Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce, a nearly 10 percentage point difference.
Outsourced workers are 1.44 times more likely to be male than female.
The group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Comparison of outsourced and non-outsourced workers finds that:
Someone in the high indicators sub-group is 2.18 times more likely to be male than female.
Someone in the agency sub-group is 1.45 times more likely to be male than female.
Someone in the outsourced sub-group is 1.31 times more likely to be male than female.

#gender-sector

Possible addition: Will readers want to know more about how this intersects with the roles or sectors with higher rates of outsourcing – even if this is just an interpretive comment from us on how gender interacts with jobs and sectors more generally in the labour market?

# weights:  12 (6 variable)
initial  value 14077.819237 
iter  10 value 7610.573378
iter  20 value 7465.550476
final  value 7465.517316 
converged

The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.³¹ Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are 1.44 times more likely to be male than female.³²


Call:
glm(formula = outsourcing_status ~ Gender, family = "quasibinomial", 
    data = data, weights = NatRepemployees)

Coefficients:
                        Estimate Std. Error t value             Pr(>|t|)    
(Intercept)             -1.78606    0.03987 -44.792 < 0.0000000000000002 ***
GenderMale               0.36421    0.05365   6.788       0.000000000012 ***
GenderOther              0.20126    0.68008   0.296                0.767    
GenderPrefer not to say -0.24251    0.38958  -0.622                0.534    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be 1.000395)

    Null deviance: 9201.8  on 10154  degrees of freedom
Residual deviance: 9153.9  on 10151  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

# A tibble: 4 × 6
  variable                Estimate    or `Std. Error` `t value` `Pr(>|t|)`
  <chr>                      <dbl> <dbl>        <dbl>     <dbl>      <dbl>
1 (Intercept)               -1.79  0.168       0.0399   -44.8     0       
2 GenderMale                 0.364 1.44        0.0537     6.79    1.20e-11
3 GenderOther                0.201 1.22        0.680      0.296   7.67e- 1
4 GenderPrefer not to say   -0.243 0.785       0.390     -0.622   5.34e- 1
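
The `or` column in the tibble above is the exponential of each `Estimate`, because the model is fitted on the log-odds scale. A minimal Python sketch of that conversion follows, using the coefficients printed above. The report's own analysis was run in R, so the dictionary and variable names here are illustrative assumptions, not the authors' code.

```python
import math

# Log-odds coefficients copied from the model summary above.
estimates = {
    "(Intercept)": -1.78606,
    "GenderMale": 0.36421,
    "GenderOther": 0.20126,
    "GenderPrefer not to say": -0.24251,
}

# Exponentiating a log-odds coefficient gives the corresponding odds ratio.
odds_ratios = {name: math.exp(beta) for name, beta in estimates.items()}

for name, value in odds_ratios.items():
    print(f"{name}: {value:.3f}")
```

Because `outsourcing_status` is the outcome in the model call, the value of about 1.44 for `GenderMale` is, strictly, the ratio of the odds of being outsourced for men relative to women (the reference category).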

# weights:  20 (12 variable)
initial  value 14077.819237 
iter  10 value 7977.307669
iter  20 value 7461.899083
iter  30 value 7457.852026
iter  40 value 7457.374598
final  value 7457.362521 
converged

Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Statistically speaking, compared to a not outsourced person,

Someone in the high indicators group is 2.18 times more likely to be male than female.
Someone in the likely agency group is 1.45 times more likely to be male than female.
Someone in the outsourced group is 1.31 times more likely to be male than female.

Additionally, people identifying as ‘Other’ gender are absent from the high indicators and likely agency groups, though given the small N (14) for this group, this finding is unlikely to be meaningful.

3.4 Outsourced workers are more likely to work in some sectors than others; but seem to be spread across the labour market

#sectors

The three most common sectors for outsourced workers in our survey to be employed within – excluding those with an N size below X (50?) – were administrative and support service activities; water supply, sewerage, waste management and remediation activities; and other service activities.
Five of the twenty employment sectors have at least 1 in 5 of their workforce “outsourced”: more than the average of around 17% across the whole workforce.

Here we explore what proportion of workers in each sector are outsourced.³³

The plot below shows the proportion of outsourced and not outsourced workers within each sector, i.e. which sectors have higher and lower proportions of outsourced workers.

The three sectors with the highest proportion of outsourced workers are:

ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES

Note that an undefined sector (‘Not found’) contained one of the largest proportions of outsourced workers (31% of workers in the ‘Not found’ category were outsourced).

A key takeaway here is that whereas 17% of the total workforce is outsourced, this figure varies by sector, from 0% for Mining… and Extraterritorial organisations… all the way to 36% for Activities of households as employers, with 5 out of 20 sectors having at least 20% of their workforce outsourced.

#sectors-ogroup

Figure X also shows how the total outsourced group in each sector splits into our three outsourced “sub-groups”. We find – as you might expect, based on its dominance within the group of outsourced workers – that outsourced workers in every sector are most likely to be in the “outsourced sub-group”, i.e. those who self-identified as outsourced workers.

4 Pay

#pay

Using regression analysis, we find that outsourced workers are on average paid £2170 less than non-outsourced workers.
The “outsourced sub-group” earns £3,813 less, and the “agency sub-group” £2,603 less, than the non-outsourced group. This indicates that pay is lowest in the “outsourced sub-group” of workers, i.e. those who directly identified themselves as being outsourced. Figure X below shows the median and distribution of pay across the three outsourced sub-groups and the non-outsourced group, for comparison.

#pay-violin

Violin plot for the above

The tables and plots below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that outsourced workers are on average paid £2170 less annually than non-outsourced workers.³⁴ Per week, outsourced workers are on average paid £47 less than non-outsourced workers.

Weekly stats here³⁵

Outsourcing status	n	Mean	Median	Min	Max	Standard dev.
Not outsourced	6924	26781.29	25120.67	2000	66250	13365.63
Outsourced	1367	24611.38	23061.99	2400	66108	12998.56

Outsourcing status	n	Mean	Median	Min	Max	Standard dev.
Not outsourced	6924	575.41	539.73	42.97	1423.42	287.17
Outsourced	1367	528.79	495.50	51.57	1420.37	279.28

The tables and plots below show descriptive statistics on income and its distribution for outsourced groups. Only the full outsourced subgroup has lower income than non-outsourced people. Regression analysis shows that workers in the ‘outsourced’ group are on average paid £3100 less annually than non-outsourced workers.³⁶ Per week, workers in the ‘outsourced’ group are on average paid £67 less than non-outsourced workers.

Weekly stats here³⁷

Outsourcing group	n	Mean	Median	Min	Max	Standard dev.
Not outsourced	6924	26781.29	25120.67	2000.0	66250.00	13365.63
Outsourced	897	23680.86	22165.73	2400.0	66000.00	12783.87
Likely agency	231	25081.11	22800.00	3194.7	65846.67	13702.90
High indicators	239	27921.52	25860.36	4644.0	65000.00	12629.15

Outsourcing group	n	Mean	Median	Min	Max	Standard dev.
Not outsourced	6924	575.41	539.73	42.97	1423.42	287.17
Outsourced	897	508.80	476.24	51.57	1418.05	274.67
Likely agency	231	538.88	489.87	68.64	1414.75	294.41
High indicators	239	599.91	555.62	99.78	1396.56	271.34

This difference increases to £2951 annually (£63 per week) when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.³⁸ This analysis shows that all other variables, apart from Age, are in some way relevant to income. The estimates below are averages, controlling for each of the other variables in the model.

Annually:

Men earn £7028 more than women.
People who have a degree earn £8195 more than people without a degree.
Workers in all non-London regions earn less than workers in London
- East Midlands: -£5770
- East of England: -£4074
- North East: -£4850
- North West: -£4476
- Northern Ireland: -£6546
- Scotland: -£5466
- South East: -£3406
- Wales: -£5366
- West Midlands: -£5002
- Yorkshire and the Humber: -£5524
People who arrived in the UK within the last year earn £6136 less than people born in the UK
People who arrived in the UK within the last 3 years earn £2392 less than people born in the UK
People who arrived in the UK within the last 5 years earn £2031 less than people born in the UK
People who arrived within the last 30 years earn £3501 more than people born in the UK.

Weekly³⁹:

Men earn £151 more than women.
People who have a degree earn £176 more than people without a degree.
Workers in all non-London regions earn less than workers in London
- East Midlands: -£124
- East of England: -£88
- North East: -£104
- North West: -£96
- Northern Ireland: -£141
- Scotland: -£117
- South East: -£73
- Wales: -£115
- West Midlands: -£107
- Yorkshire and the Humber: -£119
People who arrived in the UK within the last year earn £132 less than people born in the UK
People who arrived in the UK within the last 3 years earn £51 less than people born in the UK
People who arrived in the UK within the last 5 years earn £44 less than people born in the UK
People who arrived within the last 30 years earn £75 more than people born in the UK.

4.1 Gender pay gap

#gender-pay-gap

On average within our sample, male workers earn £6400 more than female workers per year; but further exploration of how pay relates to gender for outsourced workers suggests that this gender pay gap doesn’t differ in a statistically significant way depending on whether workers are outsourced or not
For female outsourced workers, this suggests that being an outsourced worker neither exacerbates nor diminishes the gender pay gap they face compared to male workers. Check what this controls for

4.1.1 Outsourcing status

outsourcing_status	Gender	n	mean	median	min	max	stdev
Not outsourced	Female	4321	23750.33	22199.18	2234.057	66249.37	12612.378
Not outsourced	Male	3576	30197.26	28000.00	2400.000	66000.00	13346.555
Not outsourced	Other	11	28154.06	29713.06	10800.000	50000.00	15199.138
Not outsourced	Prefer not to say	57	28650.68	25260.52	13684.996	64000.00	17065.444
Outsourced	Female	666	20318.32	19600.50	2400.000	65000.00	11603.484
Outsourced	Male	863	27983.89	26000.00	3600.000	66085.73	13040.672
Outsourced	Other	2	16230.00	21011.55	8460.000	24000.00	9886.687
Outsourced	Prefer not to say	7	21245.30	28425.08	11635.714	42000.50	16565.978

outsourcing_status	Gender	n	mean	median	min	max	stdev
Not outsourced	Female	4321	510.2894	476.9621	48.00000	1423.4057	270.9842
Not outsourced	Male	3576	648.8054	601.5961	51.56538	1418.0479	286.7584
Not outsourced	Other	11	604.9061	638.4022	232.04420	1074.2787	326.5622
Not outsourced	Prefer not to say	57	615.5762	542.7368	294.03000	1375.0767	366.6609
Outsourced	Female	666	436.5508	421.1280	51.56538	1396.5623	249.3075
Outsourced	Male	863	601.2499	558.6249	77.34807	1419.8899	280.1863
Outsourced	Other	2	348.7109	451.4451	181.76796	515.6538	212.4211
Outsourced	Prefer not to say	7	456.4674	610.7292	250.00000	902.4048	355.9295

Annual⁴⁰:

Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £5800.82 less than males. For outsourced workers, females are paid £6399.5 less than males. The difference between non-outsourced and outsourced workers is not significant.

Weekly⁴¹:

Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £124.63 less than males. For outsourced workers, females are paid £137.5 less than males. The difference between non-outsourced and outsourced workers is not significant.

The gender-by-outsourcing-status interaction is also not relevant for whether a worker is low income (i.e. it has a non-significant relationship with income_group).

4.1.2 Outsourcing group

Annual data files⁴²:

Weekly⁴³:

The gender-by-outsourcing-group interaction is also not relevant for whether a worker is low income (i.e. it has a non-significant relationship with income_group).

#gender-income-group

In particular, people are more likely to be in our low-paid outsourced group if they are female or older workers.

Income group⁴⁴

A person is more likely to be in the low income group if they are:

Older
Female
Don’t have a degree (or don’t know if they have a degree?)
Are outsourced
Arrived in the UK in the last year

And less likely if they are:

Younger
Male
Have a degree
Live in the North West or Wales (compared to London)
Arrived in the UK in last 30 years

#gender-by-pay-split

Is there already a basic low / high pay split for gender? I know you talk about women being more likely to be in the low-paid group, but again not sure if there is just a basic “women make up x% of low pay group and x% of not low pay group”?

60.34% of outsourced workers in the low pay group were female, compared to 35.85% of outsourced workers in the not low pay group. This difference is statistically significant; women are more likely to be in the low income group. This pattern is the same for non-outsourced workers, and there is no interaction effect: irrespective of outsourcing status, women are more likely to be low paid, and irrespective of gender, outsourced people are more likely to be low paid.

#pay-gap-sector

Overall, we find that outsourced workers in administrative and support service activities – one of the dominant sectors for outsourced workers in this research – are more likely to be lower-paid than non-outsourced workers in the same sector. The same is true for outsourced water supply (full name; sewerage, waste etc.) workers – another prominent outsourcing sector – and for information and communication, transportation and storage, and education workers, amongst others. In contrast, we find that outsourced workers in financial and insurance activities, for example, appear to be slightly higher paid on average than their non-outsourced counterparts; however, this is one of the few sectors in which this appears to be the case (to be confirmed).

I don’t quite understand the chart below the above chart in the file, would you be able to explain it – thanks! Is this the best chart to use, above? Does this need to control for anything else to show us the most accurate analysis of pay by sector for outsourced and non outsourced, or are we confident that this is showing us something notable about sector and pay?

4.2 Sectors/occupations

4.2.1 Sector and occupation hierarchy

The data from Opinium has four variables relating to sectors/occupations. These are

SectorName
Majorgroupcode
MajorsubgroupOccupation
UnitOccupation

SOC 2020 has nine major groups, 26 sub-major groups, 104 minor groups and 412 unit groups. The variables we have appear to map in the following way:

Majorgroupcode = the 9 ‘major groups’
MajorsubgroupOccupation = the 26 ‘sub-major’ groups
UnitOccupation = the 104 ‘minor groups’

This last pairing is the point of confusion. The ‘UnitOccupation’ wording came from Opinium and these categories match the coding index where they are confusingly referred to as ‘unit groups’ even though they are the minor groups.

There is no variable in our data that relates to the most disaggregated category, the 412 ‘unit groups’.

The unique values of each variable are shown in each section below.

4.2.1.1 SectorName

SectorName_labelled
FINANCIAL AND INSURANCE ACTIVITIES
WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES
TRANSPORTATION AND STORAGE
MANUFACTURING
INFORMATION AND COMMUNICATION
ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY
CONSTRUCTION
PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY
PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
HUMAN HEALTH AND SOCIAL WORK ACTIVITIES
EDUCATION
OTHER SERVICE ACTIVITIES
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
ACCOMMODATION AND FOOD SERVICE ACTIVITIES
AGRICULTURE, FORESTRY AND FISHING
MINING AND QUARRYING
ARTS, ENTERTAINMENT AND RECREATION
REAL ESTATE ACTIVITIES
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US
ACTIVITIES OF EXTRATERRITORIAL ORGANISATIONS AND BODIES
Not found

4.2.1.2 Majorgroupcode

These are the 9 major groups according to SOC

Majorgroupcode_labelled
ADMINISTRATIVE AND SECRETARIAL OCCUPATIONS
MANAGERS, DIRECTORS AND SENIOR OFFICIALS
ELEMENTARY OCCUPATIONS
SALES AND CUSTOMER SERVICE OCCUPATIONS
ASSOCIATE PROFESSIONAL OCCUPATIONS
SKILLED TRADES OCCUPATIONS
PROFESSIONAL OCCUPATIONS
PROCESS, PLANT AND MACHINE OPERATIVES
CARING, LEISURE AND OTHER SERVICE OCCUPATIONS

4.2.1.3 MajorsubgroupOccupation

These are the 26 ‘sub-major’ groups

MajorsubgroupOccupation_labelled
ADMINISTRATIVE OCCUPATIONS
CORPORATE MANAGERS AND DIRECTORS
ELEMENTARY ADMINISTRATION AND SERVICE OCCUPATIONS
CUSTOMER SERVICE OCCUPATIONS
BUSINESS AND PUBLIC SERVICE ASSOCIATE PROFESSIONALS
SKILLED CONSTRUCTION AND BUILDING TRADES
SCIENCE, RESEARCH, ENGINEERING AND TECHNOLOGY PROFESSIONALS
BUSINESS, MEDIA AND PUBLIC SERVICE PROFESSIONALS
PROCESS, PLANT AND MACHINE OPERATIVES
CARING PERSONAL SERVICE OCCUPATIONS
TRANSPORT AND MOBILE MACHINE DRIVERS AND OPERATIVES
SKILLED METAL, ELECTRICAL AND ELECTRONIC TRADES
CULTURE, MEDIA AND SPORTS OCCUPATIONS
SALES OCCUPATIONS
LEISURE, TRAVEL AND RELATED PERSONAL SERVICE OCCUPATIONS
SECRETARIAL AND RELATED OCCUPATIONS
HEALTH AND SOCIAL CARE ASSOCIATE PROFESSIONALS
HEALTH PROFESSIONALS
SKILLED AGRICULTURAL AND RELATED TRADES
TEACHING AND OTHER EDUCATIONAL PROFESSIONALS
SCIENCE, ENGINEERING AND TECHNOLOGY ASSOCIATE PROFESSIONALS
PROTECTIVE SERVICE OCCUPATIONS
OTHER MANAGERS AND PROPRIETORS
ELEMENTARY TRADES AND RELATED OCCUPATIONS
TEXTILES, PRINTING AND OTHER SKILLED TRADES
COMMUNITY AND CIVIL ENFORCEMENT OCCUPATIONS

4.2.1.4 UnitOccupation

These are indeed the 104 ‘minor groups’.

UnitOccupation_labelled
Administrative Occupations: Finance
Managers and Directors in Retail and Wholesale
Functional Managers and Directors
Elementary Storage Occupations
Customer Service Occupations
Business Associate Professionals
Construction and Building Trades Supervisors
Information Technology Professionals
Legal Professionals
Production, Factory and Assembly Supervisors
Administrative Occupations: Government and Related Organisations
Caring Personal Services
Mobile Machine Drivers and Operatives
Quality and Regulatory Professionals
Metal Forming, Welding and Related Trades
Artistic, Literary and Media Occupations
Metal Machining, Fitting and Instrument Making Trades
Sales Assistants and Retail Cashiers
Other Administrative Occupations
Production Managers and Directors
Regulatory Associate Professionals
Public Services Associate Professionals
Animal Care and Control Services
Administrative Occupations: Office Managers and Supervisors
Engineering Professionals
Shopkeepers and Sales Supervisors
Elementary Security Occupations
Other Elementary Services Occupations
Legal Associate Professionals
Architects, Chartered Architectural Technologists, Planning Officers, Surveyors and Construction Professionals
Housekeeping and Related Services
Secretarial and Related Occupations
Welfare and Housing Associate Professionals
Sales, Marketing and Related Associate Professionals
Nursing Professionals
NA
Natural and Social Science Professionals
Agricultural and Related Trades
Elementary Administration Occupations
Process Operatives
Other Educational Professionals
Road Transport Drivers
Information Technology Technicians
Business and Financial Project Management Professionals
Finance Professionals
Therapy Professionals
Electrical and Electronic Trades
Teaching Professionals
HR, Training and Other Vocational Associate Guidance Professionals
Construction Operatives
Protective Service Occupations
Sales Related Occupations
Leisure and Travel Services
Chief Executives and Senior Officials
Teaching and Childcare Support Occupations
Managers and Proprietors in Health and Care Services
Managers and Proprietors in Other Services
Managers in Logistics, Warehousing and Transport
Other Health Professionals
Elementary Construction Occupations
Business, Research and Administrative Professionals
Veterinary nurses
Research and Development (R&D) and Other Research Professionals
Assemblers and Routine Operatives
Welfare Professionals
Science, Engineering and Production Technicians
Finance Associate Professionals
Plant and Machine Operatives
Elementary Cleaning Occupations
Food Preparation and Hospitality Trades
Construction and Building Trades
Senior Officers in Protective Services
Managers and Proprietors in Hospitality and Leisure Services
Other Drivers and Transport Operatives
Health Associate Professionals
Health and Social Services Managers and Directors
Skilled Metal, Electrical and Electronic Trades Supervisors
Administrative Occupations: Records
Sports and Fitness Occupations
Medical Practitioners
Media Professionals
Web and Multimedia Design Professionals
Transport Associate Professionals
Conservation and Environment Professionals
Vehicle Trades
Elementary Process Plant Occupations
Teaching and Childcare Associate Professionals
Cleaning and Housekeeping Managers and Supervisors
Librarians and Related Professionals
Customer Service Supervisors
Elementary Sales Occupations
Veterinarians
Hairdressers and Related Services
Printing Trades
Building Finishing Trades
Managers and Proprietors in Agriculture Related Services
Other Skilled Trades
Directors in Logistics, Warehousing and Transport
Metal Working Machine Operatives
Community and Civil Enforcement Occupations
Design Occupations
Elementary Agricultural Occupations
Textiles and Garments Trades
CAD, Drawing and Architectural Technicians

4.2.2 Sectoral pay differences

4.2.2.1 Weekly⁴⁵

4.2.2.2 Hourly⁴⁶

4.2.2.3 Comparing pay penalty between weekly and hourly

Note: only groups with n >= 10 are considered.

The table below shows the pay difference between outsourced and non-outsourced workers by sector. Negative values indicate pay penalties for outsourced workers. The ‘pattern_reverse’ column indicates the 4 sectors where the direction of the difference is different if you consider hourly versus weekly pay difference.

For example, per week, outsourced workers in PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES earn £1.78 less than their non-outsourced counterparts, but per hour they are paid on average £1.37 more than non-outsourced workers. This suggests that outsourced rates are higher in this sector, but the amount of work available is not enough for outsourced people to earn more than non-outsourced people on a weekly basis.

The reverse pattern indicates sectors where outsourced workers are paid less per hour but work more hours and earn more per week than their non-outsourced counterparts.

Weekly and hourly pay difference by sector
sector_name_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES	-143.703719	-2.4894583	0
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US	-127.012080	0.5256981	1
PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY	-108.857788	-2.8196227	0
TRANSPORTATION AND STORAGE	-106.492790	-2.4819904	0
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES	-100.690083	-3.5297510	0
EDUCATION	-99.173285	-0.8902779	0
MANUFACTURING	-86.099585	-2.2453893	0
INFORMATION AND COMMUNICATION	-82.198780	-2.9866898	0
Not found	-68.232481	0.9112035	1
HUMAN HEALTH AND SOCIAL WORK ACTIVITIES	-49.055714	-1.7786666	0
ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY	-36.646155	-0.3160751	0
REAL ESTATE ACTIVITIES	-32.495631	-1.5397384	0
ACCOMMODATION AND FOOD SERVICE ACTIVITIES	-30.203839	-0.7020112	0
CONSTRUCTION	-20.556524	-0.9727247	0
OTHER SERVICE ACTIVITIES	-9.189807	-0.5184089	0
PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES	-1.776135	1.3701111	1
FINANCIAL AND INSURANCE ACTIVITIES	11.183407	-0.7242390	1
WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES	31.192287	0.4196930	0
ARTS, ENTERTAINMENT AND RECREATION	50.109641	1.9067333	0

4.2.3 Major group occupations

4.2.3.1 Weekly⁴⁷

Here we look at major subgroup occupations within sectors. We only consider the sectors down to ‘Other services’, as the remaining sectors have a small n for the outsourced group. Note that larger images for these plots can be found in outputs/figures/occupation_pay_plots.

The figures indicate there is variation between occupations within sectors in terms of whether outsourced people are paid less or more than non-outsourced workers.

4.2.3.2 Hourly⁴⁸

4.2.3.3 Comparing pay penalty between weekly and hourly

Note: only groups with n >= 10 are considered.

The table below shows the weekly and hourly pay difference between outsourced and non-outsourced workers by major group occupation. As before, negative values indicate pay penalties for outsourced workers, and the ‘pattern_reverse’ column indicates the occupations where the direction of the difference is different if you consider hourly versus weekly pay difference.

Weekly and hourly pay difference by major group occupations within Accommodation And Food Service Activities
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Elementary Administration And Service Occupations	-43.529668	-0.7017468	0
Textiles, Printing And Other Skilled Trades	7.332828	-0.5894382	1

Weekly and hourly pay difference by major group occupations within Administrative And Support Service Activities
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Customer Service Occupations	-102.66553	-3.2164373	0
Elementary Administration And Service Occupations	15.41439	-1.0311391	1
Administrative Occupations	43.30396	-0.2282938	1

Weekly and hourly pay difference by major group occupations within Construction
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Corporate Managers And Directors	-140.83839	-3.0843931
Administrative Occupations	-66.73665	-0.3469444
Skilled Construction And Building Trades	114.15752	1.0713412

Weekly and hourly pay difference by major group occupations within Education
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Teaching And Other Educational Professionals	-63.69984	0.9922447	1
Caring Personal Service Occupations	-44.52325	-1.2978309	0
Elementary Administration And Service Occupations	62.16781	0.3060616	0

Weekly and hourly pay difference by major group occupations within Financial And Insurance Activities
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Business, Media And Public Service Professionals	-74.01367	-3.4437168
Administrative Occupations	12.10298	0.3119162
Business And Public Service Associate Professionals	38.27790	0.1876232
Corporate Managers And Directors	83.87818	0.7076445

Weekly and hourly pay difference by major group occupations within Human Health And Social Work Activities
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Corporate Managers And Directors	-121.63952	-4.2284955	0
Health Professionals	-82.89714	-3.7965198	0
Caring Personal Service Occupations	-29.46213	-0.4411910	0
Administrative Occupations	-16.11184	-0.3403748	0
Health And Social Care Associate Professionals	13.05492	-0.2742730	1

Weekly and hourly pay difference by major group occupations within Information And Communication
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Science, Engineering And Technology Associate Professionals	-172.85859	-4.858679
Science, Research, Engineering And Technology Professionals	-63.61085	-2.228024
Corporate Managers And Directors	-46.09461	-5.531615

Weekly and hourly pay difference by major group occupations within Manufacturing
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Corporate Managers And Directors	-224.324833	-3.950745	0
Process, Plant And Machine Operatives	-4.759046	-0.255132	0
Elementary Administration And Service Occupations	30.827500	-1.333051	1

Weekly and hourly pay difference by major group occupations within Other Service Activities
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Elementary Administration And Service Occupations	-34.19164	1.336877	1

Weekly and hourly pay difference by major group occupations within Professional, Scientific And Technical Activities
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Business And Public Service Associate Professionals	-79.87644	-2.665625
Business, Media And Public Service Professionals	-70.48336	-2.998460
Science, Research, Engineering And Technology Professionals	184.43019	5.567409

Weekly and hourly pay difference by major group occupations within Public Administration And Defence; Compulsory Social Security
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Administrative Occupations	-55.57135	-1.576596	0

Weekly and hourly pay difference by major group occupations within Transportation And Storage
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Transport And Mobile Machine Drivers And Operatives	-153.62305	-4.2130255	0
Elementary Administration And Service Occupations	19.59351	0.8366258	0

Weekly and hourly pay difference by major group occupations within Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Corporate Managers And Directors	-97.84726	-1.7337137
Transport And Mobile Machine Drivers And Operatives	-21.37001	-1.0275407
Elementary Administration And Service Occupations	17.09860	0.2720542
Administrative Occupations	61.76549	0.7687880
Sales Occupations	85.88476	0.6364519

4.2.4 Major group occupations across all sectors

Note: I only consider unit occupations where the minimum n is >= 10.

4.2.4.1 Weekly⁴⁹

Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:⁵⁰

Weekly pay penalty for major subgroup occupations across all sectors
majorsubgroup_occupation_labelled	pay_penalty	wtd_avg_income_not_outsourced	wtd_avg_income_outsourced	n_not_outsourced	n_outsourced
Protective Service Occupations	-186.115913	798.1466	612.0307	87	11
Science, Engineering And Technology Associate Professionals	-109.411519	694.4187	585.0072	168	38
Transport And Mobile Machine Drivers And Operatives	-107.609950	623.2422	515.6322	207	54
Leisure, Travel And Related Personal Service Occupations	-86.111295	440.7630	354.6517	109	25
Elementary Trades And Related Occupations	-81.066699	522.1490	441.0823	40	11
Health Professionals	-80.351915	701.1247	620.7728	303	62
Business, Media And Public Service Professionals	-79.019478	805.3639	726.3444	458	69
Teaching And Other Educational Professionals	-65.534711	665.9185	600.3837	364	42
Corporate Managers And Directors	-62.645479	781.4969	718.8514	600	123
Other Managers And Proprietors	-55.771413	610.1658	554.3944	162	29
Customer Service Occupations	-44.336562	506.9532	462.6166	185	35
Business And Public Service Associate Professionals	-43.246143	700.1621	656.9160	504	75
Secretarial And Related Occupations	-39.218379	477.0919	437.8735	146	17
Caring Personal Service Occupations	-23.753758	396.5326	372.7788	502	117
Process, Plant And Machine Operatives	-17.004071	546.9718	529.9677	154	34
Science, Research, Engineering And Technology Professionals	-1.418507	828.6870	827.2685	397	82

4.2.4.2 Hourly⁵¹

Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:⁵²

Hourly pay penalty for major subgroup occupations across all sectors
majorsubgroup_occupation_labelled	pay_penalty	wtd_avg_income_not_outsourced	wtd_avg_income_outsourced	n_not_outsourced	n_outsourced
Protective Service Occupations	-4.6610127	20.77082	16.10980	87	11
Science, Engineering And Technology Associate Professionals	-3.6935626	19.08387	15.39031	168	38
Health Professionals	-3.5288815	21.39635	17.86747	303	62
Transport And Mobile Machine Drivers And Operatives	-2.8911514	16.32247	13.43131	207	54
Business, Media And Public Service Professionals	-2.0749121	22.18071	20.10580	458	69
Business And Public Service Associate Professionals	-1.5629055	19.28122	17.71831	504	75
Secretarial And Related Occupations	-1.4253607	14.65362	13.22826	146	17
Leisure, Travel And Related Personal Service Occupations	-1.3425317	13.64332	12.30079	109	25
Other Managers And Proprietors	-1.2644888	17.16476	15.90027	162	29
Corporate Managers And Directors	-1.1326326	21.20963	20.07699	600	123
Customer Service Occupations	-0.7665494	15.11169	14.34514	185	35
Science, Research, Engineering And Technology Professionals	-0.6370135	22.48816	21.85114	397	82
Caring Personal Service Occupations	-0.5554487	12.79316	12.23771	502	117
Skilled Construction And Building Trades	-0.4931561	16.05214	15.55899	63	18
Elementary Administration And Service Occupations	-0.4652052	12.25335	11.78814	483	173
Elementary Trades And Related Occupations	-0.3578431	14.92769	14.56985	40	11
Process, Plant And Machine Operatives	-0.3423633	14.20301	13.86065	154	34
Textiles, Printing And Other Skilled Trades	-0.3203910	13.48948	13.16909	115	24

4.2.4.3 Comparing pay penalty between weekly and hourly

Note: only occupations where n >= 10 are considered.

Weekly and hourly pay difference by major sub group occupation
majorsubgroup_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Protective Service Occupations	-186.115913	-4.6610127	0
Science, Engineering And Technology Associate Professionals	-109.411519	-3.6935626	0
Transport And Mobile Machine Drivers And Operatives	-107.609950	-2.8911514	0
Leisure, Travel And Related Personal Service Occupations	-86.111295	-1.3425317	0
Elementary Trades And Related Occupations	-81.066699	-0.3578431	0
Health Professionals	-80.351915	-3.5288815	0
Business, Media And Public Service Professionals	-79.019478	-2.0749121	0
Teaching And Other Educational Professionals	-65.534711	0.8216280	1
Corporate Managers And Directors	-62.645479	-1.1326326	0
Other Managers And Proprietors	-55.771413	-1.2644888	0
Customer Service Occupations	-44.336562	-0.7665494	0
Business And Public Service Associate Professionals	-43.246143	-1.5629055	0
Secretarial And Related Occupations	-39.218379	-1.4253607	0
Caring Personal Service Occupations	-23.753758	-0.5554487	0
Process, Plant And Machine Operatives	-17.004071	-0.3423633	0
Science, Research, Engineering And Technology Professionals	-1.418507	-0.6370135	0
Elementary Administration And Service Occupations	1.101633	-0.4652052	1
Health And Social Care Associate Professionals	1.527012	0.7717535	0
Administrative Occupations	9.805435	0.9328750	0
Textiles, Printing And Other Skilled Trades	17.090645	-0.3203910	1
Skilled Construction And Building Trades	59.427980	-0.4931561	1
Sales Occupations	71.009348	0.4770525	0
Skilled Metal, Electrical And Electronic Trades	75.473170	2.0030587	0
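The `pattern_reverse` column in the table above appears to flag rows where the weekly and hourly differences have opposite signs. A minimal sketch of that rule, using four rows copied from the table, is shown below; the data frame name `cmp` is hypothetical, and the sign comparison is an assumption about how the flag was derived rather than the report's actual code.

```r
# Four rows copied from the comparison table above
cmp <- data.frame(
  occupation = c(
    "Protective Service Occupations",
    "Teaching And Other Educational Professionals",
    "Elementary Administration And Service Occupations",
    "Sales Occupations"
  ),
  weekly_pay_diff = c(-186.115913, -65.534711, 1.101633, 71.009348),
  hourly_pay_diff = c(-4.6610127, 0.8216280, -0.4652052, 0.4770525)
)

# Flag rows where the direction of the difference flips between weekly and hourly
cmp$pattern_reverse <- as.integer(
  sign(cmp$weekly_pay_diff) != sign(cmp$hourly_pay_diff)
)

print(cmp[, c("occupation", "pattern_reverse")])
```

Under this rule the four rows are flagged 0, 1, 1, 0, which agrees with the values shown in the table.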

4.2.5 Minor group occupations within sectors

4.2.5.1 Weekly⁵³

Note: I only consider unit occupations where the minimum n is >= 10.

There are many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:⁵⁴

Weekly pay penalty for unit occupations within sectors
sector_name_labelled	unit_occupation_labelled	pay_penalty	wtd_avg_income_not_outsourced	wtd_avg_income_outsourced	n_not_outsourced	n_outsourced
Manufacturing	Functional Managers And Directors	-293.934661	796.9414	503.0067	35	14
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Functional Managers And Directors	-214.846399	750.7923	535.9459	50	12
Information And Communication	Information Technology Technicians	-195.211801	783.5755	588.3637	16	11
Administrative And Support Service Activities	Customer Service Occupations	-120.175519	437.9813	317.8058	11	10
Administrative And Support Service Activities	Elementary Cleaning Occupations	-108.761529	324.9068	216.1453	15	16
Information And Communication	Information Technology Professionals	-94.301622	935.2414	840.9397	79	26
Education	Teaching Professionals	-83.084349	674.6332	591.5488	283	40
Information And Communication	Functional Managers And Directors	-68.796060	933.1020	864.3060	28	11
Human Health And Social Work Activities	Nursing Professionals	-61.552329	670.7051	609.1528	177	35
Accommodation And Food Service Activities	Other Elementary Services Occupations	-59.551254	336.4167	276.8655	94	32
Education	Teaching And Childcare Support Occupations	-46.708093	335.0658	288.3577	137	26
Transportation And Storage	Road Transport Drivers	-32.418643	606.6916	574.2729	71	19
Human Health And Social Work Activities	Caring Personal Services	-28.207324	424.4531	396.2457	301	81
Human Health And Social Work Activities	Other Health Professionals	-27.517406	662.8731	635.3557	50	13
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Road Transport Drivers	-23.475416	524.6526	501.1772	37	11
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Shopkeepers And Sales Supervisors	-20.296337	454.6491	434.3528	55	20
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Elementary Storage Occupations	-8.014079	453.8159	445.8018	53	14

4.2.5.2 Hourly⁵⁵

Note: I only consider unit occupations where the minimum n is >= 10.

There are many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:⁵⁶

Hourly pay penalty for unit occupations within sectors
sector_name_labelled	unit_occupation_labelled	pay_penalty	wtd_avg_income_not_outsourced	wtd_avg_income_outsourced	n_not_outsourced	n_outsourced
Information And Communication	Functional Managers And Directors	-5.7086015	27.91172	22.20312	28	11
Manufacturing	Functional Managers And Directors	-5.6508440	20.27183	14.62099	35	14
Information And Communication	Information Technology Technicians	-5.5420128	22.78647	17.24446	16	11
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Functional Managers And Directors	-4.4353102	20.37413	15.93882	50	12
Administrative And Support Service Activities	Customer Service Occupations	-3.4485038	13.55552	10.10702	11	10
Information And Communication	Information Technology Professionals	-3.2629174	25.27990	22.01698	79	26
Human Health And Social Work Activities	Nursing Professionals	-3.0311052	20.78165	17.75055	177	35
Administrative And Support Service Activities	Elementary Cleaning Occupations	-2.3970238	12.43682	10.03980	15	16
Human Health And Social Work Activities	Other Health Professionals	-1.6585664	19.26977	17.61120	50	13
Education	Teaching And Childcare Support Occupations	-1.3680396	12.22188	10.85385	137	26
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Road Transport Drivers	-1.2375271	14.20518	12.96765	37	11
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Shopkeepers And Sales Supervisors	-1.1256400	13.48155	12.35591	55	20
Accommodation And Food Service Activities	Other Elementary Services Occupations	-0.9364942	11.70754	10.77105	94	32
Transportation And Storage	Road Transport Drivers	-0.5985330	15.42085	14.82232	71	19
Accommodation And Food Service Activities	Food Preparation And Hospitality Trades	-0.5303345	12.14506	11.61473	50	15
Human Health And Social Work Activities	Caring Personal Services	-0.4395354	12.99653	12.55699	301	81
Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles	Elementary Storage Occupations	-0.1650738	12.77512	12.61005	53	14

4.2.5.3 Comparing pay penalty between weekly and hourly

Note: only occupations where n >= 10 are considered.

Weekly and hourly pay difference by major group occupations within Accommodation And Food Service Activities
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Other Elementary Services Occupations	-59.55125	-0.9364942	0
Food Preparation And Hospitality Trades	10.27509	-0.5303345	1

Weekly and hourly pay difference by major group occupations within Administrative And Support Service Activities
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Customer Service Occupations	-120.1755	-3.448504	0
Elementary Cleaning Occupations	-108.7615	-2.397024	0

Weekly and hourly pay difference by major group occupations within Construction
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Construction And Building Trades	84.97612	0.6437012	0

Weekly and hourly pay difference by major group occupations within Education
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Teaching Professionals	-83.08435	0.2693538	1
Teaching And Childcare Support Occupations	-46.70809	-1.3680396	0

Weekly and hourly pay difference by major group occupations within Financial And Insurance Activities
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Functional Managers And Directors	69.06287	0.6346015	0

Weekly and hourly pay difference by major group occupations within Human Health And Social Work Activities
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Nursing Professionals	-61.55233	-3.0311052
Caring Personal Services	-28.20732	-0.4395354
Other Health Professionals	-27.51741	-1.6585664
Other Administrative Occupations	92.16508	3.1321709

Weekly and hourly pay difference by major group occupations within Information And Communication
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Information Technology Technicians	-195.21180	-5.542013
Information Technology Professionals	-94.30162	-3.262917
Functional Managers And Directors	-68.79606	-5.708602

Weekly and hourly pay difference by major group occupations within Manufacturing
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Functional Managers And Directors	-293.9347	-5.650844	0

Error in UseMethod("filter") : 
  no applicable method for 'filter' applied to an object of class "logical"


Weekly and hourly pay difference by major group occupations within Transportation And Storage
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Road Transport Drivers	-32.41864	-0.598533	0
Elementary Storage Occupations	98.87197	2.374470	0

Weekly and hourly pay difference by major group occupations within Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff
Functional Managers And Directors	-214.846399	-4.4353102
Road Transport Drivers	-23.475416	-1.2375271
Shopkeepers And Sales Supervisors	-20.296337	-1.1256400
Elementary Storage Occupations	-8.014079	-0.1650738
Managers And Directors In Retail And Wholesale	9.050868	0.7060230
Sales Assistants And Retail Cashiers	98.384931	0.9154210

4.2.6 Minor group occupations across all sectors

Note: I only consider unit occupations where the minimum n is >= 10.

4.2.6.1 Weekly⁵⁷

Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:⁵⁸

Weekly pay penalty for unit occupations across all sectors
unit_occupation_labelled	pay_penalty	wtd_avg_income_not_outsourced	wtd_avg_income_outsourced	n_not_outsourced	n_outsourced
Protective Service Occupations	-186.11591	798.1466	612.0307	87	11
Administrative Occupations: Government And Related Organisations	-173.25254	660.8699	487.6173	150	11
Information Technology Technicians	-172.50995	748.9725	576.4626	90	27
Elementary Administration Occupations	-125.57767	477.9704	352.3927	34	11
Functional Managers And Directors	-89.04844	820.2879	731.2395	385	88
Business, Research And Administrative Professionals	-88.82911	845.0837	756.2546	101	18
Teaching Professionals	-82.42140	675.1625	592.7411	293	41
Nursing Professionals	-70.58157	673.2751	602.6935	180	36
Sales, Marketing And Related Associate Professionals	-69.20211	684.0741	614.8720	155	20
Business Associate Professionals	-65.93246	685.3387	619.4062	125	21
Finance Professionals	-64.48880	805.6680	741.1792	110	20
Information Technology Professionals	-62.39887	887.5669	825.1680	231	50
Finance Associate Professionals	-54.41873	726.6762	672.2575	55	12
Teaching And Childcare Support Occupations	-53.27575	348.2522	294.9765	156	28
Shopkeepers And Sales Supervisors	-46.32116	479.0732	432.7521	87	27
Science, Engineering And Production Technicians	-39.51294	644.4197	604.9067	76	11
Secretarial And Related Occupations	-39.21838	477.0919	437.8735	146	17
Welfare And Housing Associate Professionals	-38.62785	525.0780	486.4501	84	10
Other Elementary Services Occupations	-38.03264	307.3713	269.3387	144	39
Customer Service Occupations	-35.31753	493.9996	458.6820	163	29
Other Health Professionals	-32.72395	671.4596	638.7357	62	17
Road Transport Drivers	-26.76629	569.7437	542.9774	154	41
Caring Personal Services	-25.12189	422.0606	396.9387	332	87
Elementary Cleaning Occupations	-18.85941	282.8777	264.0183	113	60

4.2.6.2 Hourly⁵⁹

Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:⁶⁰

Hourly pay penalty for unit occupations across all sectors
unit_occupation_labelled	pay_penalty	wtd_avg_income_not_outsourced	wtd_avg_income_outsourced	n_not_outsourced	n_outsourced
Information Technology Technicians	-5.3727754	21.00403	15.63125	90	27
Administrative Occupations: Government And Related Organisations	-5.2459770	19.14590	13.89992	150	11
Protective Service Occupations	-4.6610127	20.77082	16.10980	87	11
Nursing Professionals	-3.3125387	20.83875	17.52621	180	36
Elementary Administration Occupations	-3.1391398	14.09894	10.95980	34	11
Business, Research And Administrative Professionals	-2.7378800	22.77643	20.03855	101	18
Finance Professionals	-2.6390686	22.80200	20.16293	110	20
Business Associate Professionals	-2.5706545	19.48274	16.91209	125	21
Science, Engineering And Production Technicians	-2.4845163	17.31369	14.82918	76	11
Information Technology Professionals	-2.1256053	24.07395	21.94834	231	50
Shopkeepers And Sales Supervisors	-1.9766868	13.87657	11.89988	87	27
Functional Managers And Directors	-1.8781560	22.40838	20.53022	385	88
Sales, Marketing And Related Associate Professionals	-1.8203230	18.43185	16.61152	155	20
Finance Associate Professionals	-1.8146591	20.48350	18.66884	55	12
Secretarial And Related Occupations	-1.4253607	14.65362	13.22826	146	17
Teaching And Childcare Support Occupations	-1.2699956	12.33642	11.06642	156	28
Welfare And Housing Associate Professionals	-1.2294317	17.03022	15.80078	84	10
Other Health Professionals	-1.2132578	19.56315	18.34989	62	17
Elementary Cleaning Occupations	-0.9426106	12.07398	11.13137	113	60
Other Elementary Services Occupations	-0.7139322	11.27759	10.56366	144	39
Road Transport Drivers	-0.6822714	14.77702	14.09474	154	41
Construction And Building Trades	-0.6551832	16.55799	15.90281	48	16
Hr, Training And Other Vocational Associate Guidance Professionals	-0.5313910	20.16320	19.63181	115	11
Food Preparation And Hospitality Trades	-0.4007838	13.05801	12.65722	98	23
Caring Personal Services	-0.3815696	13.03798	12.65641	332	87
Customer Service Occupations	-0.0397066	14.68619	14.64648	163	29

4.2.6.3 Comparing pay penalty between weekly and hourly

Note: only occupations where n >= 10 are considered.

The table below shows the pay difference between outsourced and non-outsourced workers by minor sub group occupation. Negative values indicate pay penalties for outsourced workers. The ‘pattern_reverse’ column flags the four occupations where the direction of the difference changes depending on whether hourly or weekly pay is considered. For example, per week, teaching professionals who are outsourced earn £82 less than their non-outsourced counterparts, but per hour they are paid on average 17p more. This suggests that outsourced rates are higher in this occupation, but that the amount of work available is not enough for outsourced workers to earn more than non-outsourced workers on a weekly basis.

The reverse pattern is evident for the other three. For example, outsourced workers in food preparation and hospitality earn on average 40p less an hour than non-outsourced workers, but earn on average £17 more per week than non-outsourced workers. This suggests that outsourced workers in this occupation are paid less but work more hours than their non-outsourced counterparts.

Weekly and hourly pay difference by minor sub group occupation
unit_occupation_labelled	weekly_pay_diff	hourly_pay_diff	pattern_reverse
Protective Service Occupations	-186.115913	-4.6610127	0
Administrative Occupations: Government And Related Organisations	-173.252537	-5.2459770	0
Information Technology Technicians	-172.509948	-5.3727754	0
Elementary Administration Occupations	-125.577667	-3.1391398	0
Functional Managers And Directors	-89.048442	-1.8781560	0
Business, Research And Administrative Professionals	-88.829106	-2.7378800	0
Teaching Professionals	-82.421399	0.1691203	1
Nursing Professionals	-70.581567	-3.3125387	0
Sales, Marketing And Related Associate Professionals	-69.202110	-1.8203230	0
Business Associate Professionals	-65.932462	-2.5706545	0
Finance Professionals	-64.488797	-2.6390686	0
Information Technology Professionals	-62.398866	-2.1256053	0
Finance Associate Professionals	-54.418727	-1.8146591	0
Teaching And Childcare Support Occupations	-53.275748	-1.2699956	0
Shopkeepers And Sales Supervisors	-46.321162	-1.9766868	0
Science, Engineering And Production Technicians	-39.512943	-2.4845163	0
Secretarial And Related Occupations	-39.218379	-1.4253607	0
Welfare And Housing Associate Professionals	-38.627851	-1.2294317	0
Other Elementary Services Occupations	-38.032638	-0.7139322	0
Customer Service Occupations	-35.317533	-0.0397066	0
Other Health Professionals	-32.723952	-1.2132578	0
Road Transport Drivers	-26.766285	-0.6822714	0
Caring Personal Services	-25.121892	-0.3815696	0
Elementary Cleaning Occupations	-18.859410	-0.9426106	0
Hr, Training And Other Vocational Associate Guidance Professionals	2.135395	-0.5313910	1
Managers And Directors In Retail And Wholesale	3.901812	1.0154738	0
Elementary Storage Occupations	6.604673	0.3415404	0
Administrative Occupations: Finance	10.603485	3.5501683	0
Food Preparation And Hospitality Trades	17.138094	-0.4007838	1
Other Administrative Occupations	22.177338	1.0835737	0
Administrative Occupations: Office Managers And Supervisors	49.688329	1.5565714	0
Construction And Building Trades	51.831109	-0.6551832	1
Process Operatives	65.088732	1.5912972	0
Administrative Occupations: Records	80.002327	1.5821197	0
Electrical And Electronic Trades	86.173451	2.4032684	0
Sales Assistants And Retail Cashiers	96.203172	1.0828781	0
Engineering Professionals	104.921934	1.1020864	0
Elementary Security Occupations	228.853164	0.9008344	0
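The reversal described above follows from weekly pay being roughly hourly pay multiplied by hours worked. The sketch below uses hypothetical hourly rates and weekly hours (illustrative numbers only, not survey estimates) to show how a lower hourly rate can still produce higher weekly pay when more hours are worked. With weighted group averages the relationship only holds approximately, so this is an illustration of the reasoning rather than a decomposition of the reported figures.

```r
# Hypothetical hourly rates (GBP) and weekly hours: illustrative only
hourly <- c(not_outsourced = 13.00, outsourced = 12.60)
hours  <- c(not_outsourced = 30.0,  outsourced = 32.5)

# Weekly pay as hourly rate times hours worked
weekly <- hourly * hours

# Differences are outsourced minus non-outsourced, as in the tables
hourly_diff <- hourly[["outsourced"]] - hourly[["not_outsourced"]]
weekly_diff <- weekly[["outsourced"]] - weekly[["not_outsourced"]]

# The signs differ, so this hypothetical occupation would be flagged
pattern_reverse <- as.integer(sign(hourly_diff) != sign(weekly_diff))

print(c(hourly_diff = hourly_diff, weekly_diff = weekly_diff,
        pattern_reverse = pattern_reverse))
```

In this hypothetical case the outsourced group earns 40p less per hour but £19.50 more per week, because it works 2.5 more hours.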

4.3 London has a disproportionate share of the UK’s outsourced workers, followed by the East and West Midlands

#regions

In London, around 25% of workers are outsourced – the highest proportion of any region in the UK. London is followed by the East Midlands (19%) and the West Midlands (18%). The East of England has the lowest share of outsourced workers in its employed workforce, at 13%.
Possible addition: Should this include some comment on WHY we think this might be the case? Should we look at sectoral splits in London, compared to everywhere else, to see whether there are significant sector differences that might explain this trend?

The plot below shows the proportion of workers within each region who are outsourced.⁶¹

Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (25%).

The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:

East Midlands (19%)
West Midlands (18%)
Wales (18%)
North West (17%)
Northern Ireland (16%)

We can also explore how the UK’s outsourced workforce is distributed across the country.⁶² The table and map below show the outsourced workers in each region as a percentage of all outsourced workers in the UK. They show where the UK’s outsourced workforce is concentrated. The regions with the highest share of the UK’s outsourced workforce are:

London (21%)
North West (11%)
South East (11%)
West Midlands (9%)
East Midlands (8%)

Region	Frequency	Sum	Percentage
London	357.35	1708.36	20.92
North West	189.39	1708.36	11.09
South East	188.47	1708.36	11.03
West Midlands	161.49	1708.36	9.45
East Midlands	140.50	1708.36	8.22
Scotland	125.82	1708.36	7.37
East of England	125.49	1708.36	7.35
South West	120.50	1708.36	7.05
Yorkshire and the Humber	119.46	1708.36	6.99
Wales	83.25	1708.36	4.87
North East	53.06	1708.36	3.11
Northern Ireland	43.56	1708.36	2.55
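The Percentage column in the table above can be reproduced from the Frequency column alone, as each region's frequency divided by the column total. The sketch below re-enters the rounded frequencies from the table; the data frame name `regions` is hypothetical, and because the inputs are already rounded to two decimal places the recomputed total differs very slightly from the printed Sum of 1708.36.

```r
# Frequencies copied from the table above (already rounded to 2 decimal places)
regions <- data.frame(
  Region = c(
    "London", "North West", "South East", "West Midlands", "East Midlands",
    "Scotland", "East of England", "South West", "Yorkshire and the Humber",
    "Wales", "North East", "Northern Ireland"
  ),
  Frequency = c(
    357.35, 189.39, 188.47, 161.49, 140.50, 125.82,
    125.49, 120.50, 119.46, 83.25, 53.06, 43.56
  )
)

# Total across regions, then each region's share of that total
regions$Sum <- sum(regions$Frequency)
regions$Percentage <- round(100 * regions$Frequency / regions$Sum, 2)

print(regions)
```

For London this gives 100 × 357.35 / 1708.34 ≈ 20.92, matching the table.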

Footnotes

--- title: "Key findings - matched to report" author: - Jolyon Miles-Wilson - Celestin Okoroji date: "`r format(Sys.time(), '%e %B %Y')`" format: html: self-contained: true code-fold: true code-tools: true code-summary: "Code for Nerds" toc: true toc-depth: 5 execute: echo: false warning: false number-sections: true --- ```{r packages} library(haven) library(poLCA) library(Hmisc) library(dplyr) library(ggplot2) library(tidyr) library(skimr) library(kableExtra) #library(MASS) library(wesanderson) library(ggrepel) library(here) library(emmeans) #library(devtools) #install_version("sjstats", version = "0.18.2") library(sjstats) library(readr) library(sjPlot) library(nnet) ``` ```{r palette} rm(list = ls()) options(scipen = 999) colours <- wes_palette("GrandBudapest2",4,"discrete") better_colours <- c('#8dd3c7','#bebada','#fb8072','#80b1d3','#fdb462') many_colours <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928','#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3','#fdb462','#b3de69','#fccde5','#d9d9d9','#bc80bd','#ccebc5','#ffed6f') ``` ```{r functions} extract_glm_coefs <- function(mod, only_sig=F, decimal_places = 3){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") %>% # specify new variable to add rownames to mutate( or = round(exp(Estimate), decimal_places), .after=Estimate ) } extract_lm_coefs <- function(mod, only_sig = F){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") # specify new variable to add rownames to } ``` ```{r data, output=FALSE} data <- readRDS("../Data/2025-04-07 - Cleaned_data.rds") # Specify data to be used in income analysis income_data <- filter(data, income_drop_all==0) ``` # Ethnicity categorisations For reference, the table below provides a disambiguation of how ethnicities have been grouped in this 
analysis. For analyses using the disaggregated (survey) categories, the reference category is "English / Welsh / Scottish / Northern Irish / British". For analyses using the aggregated categories, the reference category is "White British" ```{r} ethnicity_cat <- data %>% dplyr::select(contains("ethnicity")) %>% distinct() %>% arrange(Ethnicity) %>% dplyr::select(-c(1:2,Ethnicity_collapsed_disaggregated)) ethn_colnames = c( "Ethnicity: Survey", "Ethnicity: Aggregated", "Ethnicity: Binary" ) ethnicity_cat %>% kable(col.names = ethn_colnames) %>% kable_styling(full_width=FALSE) ``` # Chapter 2: How many outsourced workers are there in the UK? ## How many UK workers are outsourced? ::: {.callout-tip title="#how-many"} - Around 1 in 6 UK workers meet our definition of an outsourced worker - The 'outsourced sub-group' is the most dominant of the three sub-groups - meaning the total group is predominantly made up of people who self-identify as an outsourced worker and they say they are hired to do work that is long-term or ongoing. People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 67% (check) of our total outsourced group, or nearly 7 in 10. This group makes up X of all UK workers. ::: ```{r sum-outsourced} total_outsourced <- data %>% group_by(outsourcing_status) %>% summarise( Sum = sum(NatRepemployees), n = n() ) %>% mutate( Proportion = Sum / sum(Sum), Percentage = 100 * Proportion, N = sum(n) ) readr::write_csv(total_outsourced, file="../outputs/data/total_outsourced.csv") # Create function to find nearest denominator to express as a fraction. 
f <- function(x) ifelse(abs(1/floor(1/x) - x) < abs(1/ceiling(1/x) - x),floor(1/x),ceiling(1/x)) ``` **1 in `r f(total_outsourced$Proportion[which(total_outsourced$outsourcing_status=="Outsourced")])` (`r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_status=="Outsourced")], 0)`%) of UK workers are outsourced.**[^1] [^1]: [outputs/data/total_outsourced.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced.csv) ```{r sum-outsourcing-group} total_outsourced_group <- data %>% group_by(outsourcing_group) %>% summarise( Sum = sum(NatRepemployees), n = n(), ) %>% mutate( Proportion = Sum / sum(Sum), Percentage = 100 * Proportion, N = sum(n) ) readr::write_csv(total_outsourced_group, file="../outputs/data/total_outsourced_2.csv") ``` In terms of the the different possible types of outsourced groups[^2], the numbers are as follows: [^2]: [outputs/data/total_outsourced_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced_2.csv) 1. Definitely outsourced: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="Outsourced")], 0)`% 2. Likely agency: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="Likely agency")], 0)`% 3. 
High indicators: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="High indicators")], 0)`% ```{r} breakdown <- data %>% filter(outsourcing_status=="Outsourced") %>% group_by(outsourcing_group) %>% summarise( freq = sum(NatRepemployees), n = n() ) %>% mutate( total = sum(freq), percentage = 100 * (freq/total), N = sum(n) ) breakdown2 <- data %>% group_by(outsourcing_group) %>% summarise( freq = sum(NatRepemployees), n = n() ) %>% mutate( total = sum(freq), percentage = 100 * (freq/total), N = sum(n) ) ``` People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around `r round(breakdown[which(breakdown$outsourcing_group=="Outsourced"),"percentage"],0)`% of our total outsourced group. This group makes up `r round(breakdown2[which(breakdown2$outsourcing_group=="Outsourced"),"percentage"],0)`% of all UK workers. :::{.callout-tip title='#non-exclusive-subgroups1'} - The two other sub-groups – the agency and indicators sub-groups – are less dominant in comparison. Around 58% of all respondents meet the criteria for either or both of these sub-groups, but this falls to around 33% if we exclude people who are already captured in the outsourced sub-group. Excluding the first sub-group, these other two groups makes up X of all UK workers. **The percentages here refer to the number of people who are outsourced (super-ordinate group), not the total number of respondents.** Below I provide percentages as function of the outsourced super-ordinate group as well as the total sample ::: Group criteria - **Outsourced**, defined as responding 'I am sure I am outsourced' or 'I might be outsourced', and responding 'I do work on a long-term basis'. - **Likely agency**, defined as those responding 'I am sure I am agency' and 'I do work on a long-term basis', **excluding** those people who are already defined as being outsourced. 
- **High indicators**: defined as responding TRUE to 5 or 6 of the outsourcing indicators, as well as responding 'I do work on a long-term basis', **excluding** those people who are already defined as outsourced or likely agency. ```{r} # non mutually exclusive groups_non_excl <- data %>% mutate( # SURE outsourced or MIGHT BE outsourced + LONGTERM outsourced = ifelse((Q3v3a == 1 & Q2 == 1) | (Q3v3a == 2 & Q2 == 1), 1, 0), # NOT outsourced, SURE agency, and LONG-TERM likely_agency = ifelse(Q2 == 1 & (Q3v3b == 1 | Q3v3c == 1 | Q3v3d == 1), 1, 0), likely_agency = ifelse(is.na(likely_agency), 0, likely_agency), # NOT outsourced, NOT likely agency, 5 or more indicators, & LONGTERM high_indicators = ifelse((Q2 == 1 & sum_true >= 5), 1, 0) ) either <- groups_non_excl %>% mutate( agency_or_indicator = case_when((likely_agency == 1 & high_indicators == 0) ~ "agency", (likely_agency == 0 & high_indicators == 1) ~ "indicator", (likely_agency == 1 & high_indicators == 1) ~ "both", (likely_agency == 0 & high_indicators == 0) ~ "neither", TRUE ~ NA) ) %>% group_by(agency_or_indicator) %>% summarise( freq = sum(NatRepemployees), n = n () ) %>% mutate( total = sum(freq), perc = 100 * (freq/total), N = sum(n) ) either_perc <- either %>% filter(agency_or_indicator != "neither") %>% summarise( round(sum(perc),2) # perc or weighted perc? 
) %>% pull() either_excl_outsourced <- groups_non_excl %>% filter(outsourced==0) %>% mutate( agency_or_indicator = case_when((likely_agency == 1 & high_indicators == 0) ~ "agency", (likely_agency == 0 & high_indicators == 1) ~ "indicator", (likely_agency == 1 & high_indicators == 1) ~ "both", (likely_agency == 0 & high_indicators == 0) ~ "neither", TRUE ~ NA) ) %>% group_by(agency_or_indicator) %>% summarise( freq = sum(NatRepemployees), n = n () ) %>% mutate( total = sum(freq), perc = 100 * (freq/total), N = sum(n) ) either_excl_perc <- either_excl_outsourced %>% filter(agency_or_indicator != "neither") %>% summarise( round(sum(perc),2) ) %>% pull() either %>% kable(caption = "Including outsourced group") %>% kable_styling(full_width = F) either_excl_outsourced %>% kable(caption = "Exluding outsourced group") %>% kable_styling(full_width = F) ``` `r either_perc`% of the whole sample meet the criteria for either or both of these sub-groups. This falls to `r either_excl_perc`% if we exclude people who are already captured in the outsourced sub-group. 
```{r} # same as above but now only among those who are outsourced groups_non_excl <- data %>% filter(outsourcing_status=="Outsourced") %>% mutate( # SURE outsourced or MIGHT BE outsourced + LONGTERM outsourced = ifelse((Q3v3a == 1 & Q2 == 1) | (Q3v3a == 2 & Q2 == 1), 1, 0), # NOT outsourced, SURE agency, and LONG-TERM likely_agency = ifelse(Q2 == 1 & (Q3v3b == 1 | Q3v3c == 1 | Q3v3d == 1), 1, 0), likely_agency = ifelse(is.na(likely_agency), 0, likely_agency), # NOT outsourced, NOT likely agency, 5 or more indicators, & LONGTERM high_indicators = ifelse((Q2 == 1 & sum_true >= 5), 1, 0) ) either <- groups_non_excl %>% mutate( agency_or_indicator = case_when((likely_agency == 1 & high_indicators == 0) ~ "agency", (likely_agency == 0 & high_indicators == 1) ~ "indicator", (likely_agency == 1 & high_indicators == 1) ~ "both", (likely_agency == 0 & high_indicators == 0) ~ "neither", TRUE ~ NA) ) %>% group_by(agency_or_indicator) %>% summarise( freq = sum(NatRepemployees), n = n () ) %>% mutate( total = sum(freq), perc = 100 * (freq/total), N = sum(n) ) either_perc <- either %>% filter(agency_or_indicator != "neither") %>% summarise( round(sum(perc),2) ) %>% pull() either_excl_outsourced <- groups_non_excl %>% filter(outsourced==0) %>% mutate( agency_or_indicator = case_when((likely_agency == 1 & high_indicators == 0) ~ "agency", (likely_agency == 0 & high_indicators == 1) ~ "indicator", (likely_agency == 1 & high_indicators == 1) ~ "both", (likely_agency == 0 & high_indicators == 0) ~ "neither", TRUE ~ NA) ) %>% group_by(agency_or_indicator) %>% summarise( freq = sum(NatRepemployees), n = n () ) %>% mutate( total = sum(freq), perc = 100 * (freq/total), N = sum(n) ) either_excl_perc <- either_excl_outsourced %>% filter(agency_or_indicator != "neither") %>% summarise( round(sum(perc),2) ) %>% pull() n_outsourced <- total_outsourced[which(total_outsourced$outsourcing_status=="Outsourced"), "n"] %>% pull() either_incl_perc <- either %>% filter(agency_or_indicator != 
"neither") %>% summarise( round(100 * (sum(n) / n_outsourced),2) ) %>% pull() either_excl_perc <- either_excl_outsourced %>% filter(agency_or_indicator != "neither") %>% summarise( round(100 * (sum(n) / n_outsourced),2) ) %>% pull() ``` Out of those who are in the 'outsourced' status (i.e., the combination of the three outsourced groups), `r either_incl_perc`% meet the criteria for either or both of these sub-groups, but this falls to around `r either_excl_perc`% if we exclude people who are already captured in the outsourced sub-group. :::{.callout-tip title="#non-exclusive-subgroups2"} - There is some overlap between these sub-groups, but they are not like for like. Just over a quarter (27%) of respondents are in more than one sub-group, while nearly three quarters (73%) of respondents are uniquely captured in just one of the three sub-groups. ::: ```{r} groups_count <- data %>% filter(outsourcing_status=="Outsourced") %>% mutate( # SURE outsourced or MIGHT BE outsourced + LONGTERM outsourced = ifelse((Q3v3a == 1 & Q2 == 1) | (Q3v3a == 2 & Q2 == 1), 1, 0), # NOT outsourced, SURE agency, and LONG-TERM likely_agency = ifelse(Q2 == 1 & (Q3v3b == 1 | Q3v3c == 1 | Q3v3d == 1), 1, 0), likely_agency = ifelse(is.na(likely_agency), 0, likely_agency), # NOT outsourced, NOT likely agency, 5 or more indicators, & LONGTERM high_indicators = ifelse((Q2 == 1 & sum_true >= 5), 1, 0), number_of_groups = rowSums(across(c(outsourced,likely_agency,high_indicators))) ) %>% group_by(number_of_groups) %>% summarise( total = sum(NatRepemployees), n = n() ) %>% mutate( wtd_percentage = 100 * (n/sum(n)), percentage = 100 * (total / sum(total)) ) write_csv(groups_count, file="../outputs/data/number_of_groups.csv") ``` Just over a quarter (`r round(groups_count[which(groups_count$number_of_groups==2),"percentage"] + groups_count[which(groups_count$number_of_groups==3),"percentage"],2)`%) of respondents are in more than one sub-group, while nearly three quarters (`r 
round(groups_count[which(groups_count$number_of_groups==1),"percentage"],2)`%) of respondents are uniquely captured in just one of the three sub-groups.^[[outputs/data/number_of_groups.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/number_of_groups.csv)] ## Evaluating our total estimate ::: {.callout-important title="#evaluating-total-estimate To do"} - Around 1 in 4 "outsourced" respondents sit in more than one sub-group within our definition, but around 3 in 4 are uniquely captured in just one of the three sub-groups - predominantly in the outsourced sub-group. - As figure X shows, not all respondents in the outsourced sub-group said yes five or six of our six outsourcing ::: # Chapter 3: Who are the UK’s outsourced workers? ## Demographic breakdown {#sec-demographic-breakdown} Demographic variables: - Categorical - [x] Gender - [x] Ethnicity - Numeric - [x] Age - in age section: @sec-age We want them broken down by - outsourcing status - high low pay - outsourcing group - high low pay ### Ethnicity by outsourcing status ```{r} # pollster # crosstab(df = data, x = outsourcing_status, y = Ethnicity_collapsed, weight = NatRepemployees) %>% # kable() # # # base r # tab <- as.data.frame(xtabs(NatRepemployees ~ outsourcing_status + Ethnicity_collapsed, data=data)) # test <- xtabs(NatRepemployees ~ outsourcing_status + income_group + Ethnicity_collapsed, data=data) # prop.table(test) # # percent_row <- 100 * prop.table(test, margin = 1) # test2 <- as.data.frame(percent_row) # # test2 %>% # filter(outsourcing_status=="Outsourced") %>% # summarise(sum(Freq)) ``` #### Collapsed ethnicity^[[outputs/data/status_by_ethnicity.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity.csv)] ```{r} tab <- data %>% group_by(outsourcing_status, Ethnicity_collapsed) %>% summarise( n = n(), # count cases Frequency = sum(NatRepemployees) # count weighted cases ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage 
= 100 * (Frequency / Sum), Ethnicity_short = Ethnicity_collapsed ) write_csv(tab, file="../outputs/data/status_by_ethnicity.csv") tab %>% pivot_wider( id_cols = outsourcing_status, names_from = Ethnicity_collapsed, values_from = Percentage ) %>% kable(caption = "Ethnicity by outsourcing status (%)", digits = 2) %>% kable_styling(full_width = F) ``` #### Full ethnicity^[[outputs/data/status_by_ethnicity_full.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_full.csv)] ```{r} tab <- data %>% group_by(outsourcing_status, Ethnicity_labelled) %>% summarise( n = n(), # count cases Frequency = sum(NatRepemployees) # count weighted cases ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum) ) write_csv(tab, file="../outputs/data/status_by_ethnicity_full.csv") tab %>% pivot_wider( id_cols = outsourcing_status, names_from = Ethnicity_labelled, values_from = Percentage ) %>% kable(caption = "Ethnicity by outsourcing status (%)", digits = 2) %>% kable_styling(full_width = F) ``` #### By high/low pay ##### Collapsed ethnicity^[[outputs/data/status_by_ethnicity_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_income_group.csv)] ```{r} tab <- income_data %>% filter(!is.na(income_group)) %>% group_by(outsourcing_status, income_group, Ethnicity_collapsed) %>% summarise( n = n(), # count cases Frequency = sum(NatRepemployees) # count weighted cases ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum), Ethnicity_short = Ethnicity_collapsed ) write_csv(tab, file="../outputs/data/status_by_ethnicity_income_group.csv") tab %>% pivot_wider( id_cols = c(outsourcing_status,income_group), names_from = Ethnicity_collapsed, values_from = Percentage )%>% kable(caption = "Ethnicity by outsourcing status and income group(%)", digits = 2) %>% kable_styling(full_width = F) ``` ##### Full 
ethnicity^[[outputs/data/status_by_ethnicity_full_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_full_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  )

write_csv(tab, file="../outputs/data/status_by_ethnicity_full_income_group.csv")

tab %>%
  pivot_wider(
    id_cols = c(outsourcing_status,income_group),
    names_from = Ethnicity_labelled,
    values_from = Percentage
  )%>%
  kable(caption = "Ethnicity by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Ethnicity by outsourcing group

#### Collapsed ethnicity^[[outputs/data/group_by_ethnicity.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity.csv")

tab %>%
  pivot_wider(
    id_cols = outsourcing_group,
    names_from = Ethnicity_collapsed,
    values_from = Percentage
  )%>%
  kable(caption = "Ethnicity by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### Full ethnicity^[[outputs/data/group_by_ethnicity_full.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_full.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity_full.csv")

tab %>%
  pivot_wider(
    id_cols = outsourcing_group,
    names_from = Ethnicity_labelled,
    values_from = Percentage
  )%>%
  kable(caption = "Ethnicity by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay

##### Collapsed ethnicity^[[outputs/data/group_by_ethnicity_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity_income_group.csv")

tab %>%
  pivot_wider(
    id_cols = c(outsourcing_group,income_group),
    names_from = Ethnicity_collapsed,
    values_from = Percentage
  )%>%
  kable(caption = "Ethnicity by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

##### Full ethnicity^[[outputs/data/group_by_ethnicity_full_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_full_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  )

write_csv(tab, file="../outputs/data/group_by_ethnicity_full_income_group.csv")

tab %>%
  pivot_wider(
    id_cols = c(outsourcing_group,income_group),
    names_from = Ethnicity_labelled,
    values_from = Percentage
  )%>%
  kable(caption = "Ethnicity by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Gender by outsourcing status^[[outputs/data/status_by_gender.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_gender.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/status_by_gender.csv")

tab %>%
  pivot_wider(
    id_cols = outsourcing_status,
    names_from = Gender,
    values_from = Percentage
  )%>%
  kable(caption = "Gender by outsourcing status (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay^[[outputs/data/status_by_gender_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_gender_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/status_by_gender_income_group.csv")

tab %>%
  pivot_wider(
    id_cols = c(outsourcing_status,income_group),
    names_from = Gender,
    values_from = Percentage
  )%>%
  kable(caption = "Gender by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Gender by outsourcing group^[[outputs/data/group_by_gender.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_gender.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab,
file="../outputs/data/group_by_gender.csv")

tab %>%
  pivot_wider(
    id_cols = outsourcing_group,
    names_from = Gender,
    values_from = Percentage
  )%>%
  kable(caption = "Gender by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay^[[outputs/data/group_by_gender_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_gender_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file="../outputs/data/group_by_gender_income_group.csv")

tab %>%
  pivot_wider(
    id_cols = c(outsourcing_group,income_group),
    names_from = Gender,
    values_from = Percentage
  )%>%
  kable(caption = "Gender by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

## Evidence paints a racialised picture of outsourcing in the UK, with links to both ethnicity and migration

::: {.callout-tip title="#ethnicity"}
- More than 1 in 4 (nearly 1 in 3) outsourced workers are from an ethnic minority background.
- Workers from ethnic minority backgrounds are over-represented in outsourced work in the UK, and are typically more likely to be outsourced than White British workers.
- Overall, 22% of non-outsourced workers are from an ethnic minority background, rising to 33% of outsourced workers – a difference of more than ten percentage points. This means that while just over 1 in 5 non-outsourced workers in our sample were from an ethnic minority background, nearly 1 in 3 outsourced workers were.
- People from an ethnic minority background are overall 1.75 times more likely to be outsourced than people from a White British background.
- Workers from Arab backgrounds are 3.86 times more likely than White workers to be outsourced; (check sample size – are we confident in all of these significance tests, or should we just use some of them in these bullet points?) - Workers from Black backgrounds are 2.33 times more likely than White workers to be outsourced. - Workers from Asian backgrounds are 1.98 times more likely than White workers to be outsourced - Workers from Mixed Ethnicity backgrounds are 1.86 times more likely than White workers to be outsourced - White other workers are 1.32 times more likely than White British workers to be outsourced ::: ```{r ethnicity-counts} ethnicity_statistics <- data %>% group_by(outsourcing_status, Ethnicity_collapsed) %>% summarise( n = n(), # count cases Frequency = sum(NatRepemployees) # count weighted cases ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum), Ethnicity_short = Ethnicity_collapsed ) %>% separate_wider_delim(Ethnicity_short, names = c("Ethnicity_short", "Ethnicity detail"), delim = stringr::regex(" / |, "), # use multiple delims too_few = "align_start", too_many = "merge") readr::write_csv(ethnicity_statistics, file = "../outputs/data/ethnicity_stats_1.csv") ``` ```{r ethnicity_binary_inferential, output=FALSE} ethnicities <- as.vector(unique(data$Ethnicity_collapsed)) non_white_ethnicities <- ethnicities[!(ethnicities %in% "White British")] # Will throw NA warning. 
I think this OK but investigate how to avoid the problem data <- data %>% mutate( Ethnicity_binary = forcats::fct_collapse(Ethnicity_collapsed, "White British" = c("White British"), "Non-White British" = non_white_ethnicities) ) mod <- glm(outsourcing_status ~ Ethnicity_binary, data, weights = NatRepemployees, family="quasibinomial") # mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial") # summary(mod) coefs <- extract_glm_coefs(mod) write_csv(coefs, file = "../outputs/data/ethnicity_binary_o-status_inferential_tab.csv") ``` People from an ethnic minority are `r round(coefs[2, 'or'],2)` times more likely to be outsourced than people from a White British background; `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White British"), "Percentage"],2)`% of outsourced workers are from an ethnic minority, compared to `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Not outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White British"), "Percentage"],2)`% of non-outsourced workers.[^3] [^3]: [outputs/data/ethnicity_stats_1.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_stats_1.csv) & [outputs/data/ethnicity_binary_o-status_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_o-status_inferential_tab.csv) ```{r ethnicity-plot} data %>% group_by(outsourcing_status, Ethnicity_binary) %>% summarise( n = n(), # count cases Frequency = sum(NatRepemployees) # count weighted cases ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum) ) %>% ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) + geom_col(colour="black") + annotate("text", x = ethnicity_statistics$outsourcing_status, y = 99, label = paste0("N = ",ethnicity_statistics$N), hjust=1) + 
coord_flip() + scale_fill_manual(values = many_colours, name = "Ethnicity") + xlab("Outsourcing group") + theme_minimal() ``` ```{r} #| output: false #| warning: false #| message: false mod_2 <- glm(income_group ~ Ethnicity_collapsed * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees) summary(mod_2) mod_3 <- glm(income_group ~ Ethnicity_binary * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees) summary(mod_3) coefs2 <- extract_glm_coefs(mod_2) write_csv(coefs2, file = "../outputs/data/ethnicity_collapsed_income_group_inferential.csv") coefs3 <- extract_glm_coefs(mod_3) write_csv(coefs3, file = "../outputs/data/ethnicity_binary_income_group_inferential.csv") coefs <- extract_glm_coefs(mod_2, only_sig=T) ems <- emmeans(mod_2, specs = "outsourcing_status", by = "Ethnicity_collapsed") cons <- summary(contrast(ems, "pairwise",adjust="tukey")) sig_cons <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed","outsourcing_status"), dodge=0.5) + coord_flip() + xlab("") + theme_minimal() ems_2 <- emmeans(mod_2, specs = "Ethnicity_collapsed", by = "outsourcing_status") cons <- summary(contrast(ems_2, "pairwise",adjust="tukey")) sig_cons_2 <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("outsourcing_status","Ethnicity_collapsed"), dodge=0.5) + coord_flip() + xlab("") + theme_minimal() # other_coef <- extract_glm_coefs(mod_2,, only_sig = T)[6,"Estimate"] %>% # exp(.) 
%>% # round(.,2) %>% # pull()

wb <- sig_cons %>%
  filter(Ethnicity_collapsed == "White British") %>%
  pull(or) %>%
  round(2)

mix <- sig_cons %>%
  filter(Ethnicity_collapsed == "Mixed/Multiple ethnic group") %>%
  pull(or) %>%
  round(2)
```

Overall, there is no interaction between being from an ethnic minority and being outsourced in predicting low pay; that is, the combination of being from an ethnic minority and being outsourced is not associated with any additional likelihood of being in the low pay group.^[[outputs/data/ethnicity_binary_income_group_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_income_group_inferential.csv)]

However, there is nuance within the groups. We do find evidence to suggest that among White British people, outsourced people are `r wb` times more likely to be in the low income group compared to non-outsourced people, and among Mixed ethnicity people, outsourced people are `r mix` times more likely to be in the low income group compared to non-outsourced people.^[[outputs/data/ethnicity_collapsed_income_group_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_collapsed_income_group_inferential.csv)]

```{r}
sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed","outsourcing_status"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()
```

```{r}
#| output: false
#| warning: false
#| message: false
mod_2 <- glm(income_group ~ Ethnicity_collapsed_disaggregated * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)

coefs <- extract_glm_coefs(mod_2, only_sig=T)
coefs2 <- extract_glm_coefs(mod_2)
write_csv(coefs2, file = "../outputs/data/ethnicity_collapsed_disaggregated_income_group_inferential.csv")

ems <- emmeans(mod_2, specs = "outsourcing_status", by = "Ethnicity_collapsed_disaggregated")
cons <- summary(contrast(ems, "pairwise",adjust="tukey"))
sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or =
1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed_disaggregated","outsourcing_status"), dodge=0.5) + coord_flip() + xlab("") + theme_minimal() ems_2 <- emmeans(mod_2, specs = "Ethnicity_collapsed_disaggregated", by = "outsourcing_status") cons <- summary(contrast(ems_2, "pairwise",adjust="tukey")) sig_cons_2 <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("outsourcing_status","Ethnicity_collapsed_disaggregated"), dodge=0.5) + coord_flip() + xlab("") + theme_minimal() + theme( legend.position = "none" ) ew <- sig_cons %>% filter(Ethnicity_collapsed_disaggregated == "English / Welsh / Scottish / Northern Irish / British") %>% pull(or) %>% round(2) wa <- sig_cons %>% filter(Ethnicity_collapsed_disaggregated == "White and Asian") %>% pull(or) %>% round(2) ``` Looking at this with disaggregated ethnicities indicates that among “English / Welsh / Scottish / Northern Irish / British” workers, outsourced people are `r ew` times more likely to be in the low income group compared to non-outsourced people. 
Among “White and Asian” workers, outsourced workers are `r wa` times more likely to be in the low income group compared to non-outsourced workers.^[[outputs/data/ethnicity_collapsed_disaggregated_income_group_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_collapsed_disaggregated_income_group_inferential.csv)]

```{r}
sjPlot::plot_model(mod_2, type = "pred", legend.title="", terms = c("Ethnicity_collapsed_disaggregated","outsourcing_status"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  theme_minimal()
```

```{r}
tab_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

tab_split %>%
  pivot_wider(
    id_cols = c(outsourcing_status,income_group),
    names_from = Ethnicity_binary,
    values_from = Percentage
  )%>%
  kable(caption = "Ethnicity (binary) by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)

write_csv(tab_split, "../outputs/data/ethnicity_binary_income_group.csv")

tab_split %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  facet_grid(rows=vars(income_group)) +
  geom_col(colour="black") +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("") +
  theme_minimal()
```

```{r ethnicity-interential-status}
mod <- glm(outsourcing_status ~ Ethnicity_collapsed, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

coef_table <- extract_glm_coefs(mod) %>%
  mutate(across(where(is.numeric), ~round(.x,2)))
rownames(coef_table) <- coef_table$variable

# round here too, so the inline odds ratios below print to 2 decimal places
sig_coefs <- extract_glm_coefs(mod, only_sig = T) %>%
  mutate(across(where(is.numeric), ~round(.x,2)))
# set rownames so we can index
rownames(sig_coefs) <- sig_coefs$variable

# get labels for piping
ethnicity_keys <- sig_coefs$variable
ethnicity_labs <-
sub(".*collapsed","",ethnicity_keys) write_csv(coef_table, file="../outputs/data/ethnicity_model_inferential.csv") ``` Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others[^4]: [^4]: [outputs/data/ethnicity_model_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential.csv) - `r ethnicity_labs[2]` workers are `r sig_coefs[ethnicity_keys[2], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[3]` workers are `r sig_coefs[ethnicity_keys[3], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[4]` workers are `r sig_coefs[ethnicity_keys[4], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[5]` workers are `r sig_coefs[ethnicity_keys[5], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[6]` workers are `r sig_coefs[ethnicity_keys[6], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[7]` workers are `r sig_coefs[ethnicity_keys[7], "or"]` times more likely than White British workers to be outsourced. 
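The "times more likely" figures in the list above are odds ratios, obtained by exponentiating the coefficients of a logistic regression. The sketch below shows that step on made-up data using base R only; it does not use the project's `extract_glm_coefs()` helper, and the data frame `toy_glm`, its weights, and the names `mod_toy` and `toy_or` are hypothetical.

```r
# Hypothetical weighted counts: 20 of 100 "White British" respondents and
# 30 of 90 "Other" respondents are outsourced
toy_glm <- data.frame(
  outsourced = c(0, 1, 0, 1),
  ethnicity  = factor(
    c("White British", "White British", "Other", "Other"),
    levels = c("White British", "Other") # first level is the reference category
  ),
  w = c(80, 20, 60, 30)
)

mod_toy <- glm(outsourced ~ ethnicity,
               data = toy_glm,
               weights = w,
               family = "quasibinomial")

# exp(coefficient) is the odds ratio relative to the reference category:
# (30 / 60) / (20 / 80) = 2
toy_or <- exp(coef(mod_toy))[["ethnicityOther"]]
```

Strictly, an odds ratio compares odds rather than probabilities, so an odds ratio of 2 corresponds here to outsourcing rates of 33% versus 20%, not to a doubling of the rate.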
```{r} mod <- glm(outsourcing_status ~ Ethnicity_collapsed_disaggregated, data, weights = NatRepemployees, family = "quasibinomial") # summary(mod) coef_table <- extract_glm_coefs(mod) %>% mutate(across(where(is.numeric), ~round(.x,2))) rownames(coef_table) <- coef_table$variable sig_coefs <- extract_glm_coefs(mod, only_sig = T) %>% mutate(across(where(is.numeric), ~round(.x,2))) rownames(sig_coefs) <- sig_coefs$variable ethnicity_keys <- sig_coefs$variable ethnicity_labs <- sub(".*disaggregated","",ethnicity_keys) write_csv(coef_table, file="../outputs/data/ethnicity_model_inferential_2.csv") ``` Comparison of more disaggregated ethnicities indicates more nuance[^5]: [^5]: [outputs/data/ethnicity_model_inferential_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential_2.csv) - `r ethnicity_labs[2]` workers are `r sig_coefs[ethnicity_keys[2], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[3]` workers are `r sig_coefs[ethnicity_keys[3], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[4]` workers are `r sig_coefs[ethnicity_keys[4], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[5]` workers are `r sig_coefs[ethnicity_keys[5], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[6]` workers are `r sig_coefs[ethnicity_keys[6], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[7]` workers are `r sig_coefs[ethnicity_keys[7], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[8]` workers are `r sig_coefs[ethnicity_keys[8], "or"]` times more likely than White British workers to be outsourced. - `r ethnicity_labs[9]` workers are `r sig_coefs[ethnicity_keys[9], "or"]` times more likely than White British workers to be outsourced. 
- `r ethnicity_labs[10]` workers are `r sig_coefs[ethnicity_keys[10], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[11]` workers are `r sig_coefs[ethnicity_keys[11], "or"]` times more likely than White British workers to be outsourced.

```{r}
#| include: false
count <- data %>%
  group_by(Ethnicity_collapsed) %>%
  summarise(
    count = n(),
    freq = sum(NatRepemployees)
  )

cis <- confint(mod, level=.95)
```

::: {.callout-tip title="#ethnicity-sub-group"}
- These differences in ethnicity also shift slightly depending on which outsourced “sub-group” we look at. For example, compared to White British workers, Black outsourced workers are more likely to be in the “outsourced sub-group”, meaning they have self-identified as outsourced, or the “agency sub-group”, meaning they are agency workers doing more long-term and ongoing work. **Are there any other interesting points to mention here? Should we do a chart showing this difference across sub-groups? Do we need an interpretive comment in this section?**
:::

```{r ethnicity-group}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed, data, weights=NatRepemployees)
#summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep="_")
z <- coefs/summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep="_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    predicted_group = group, .before=everything() # insert predicted group so output table can be better interpreted
  )

write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab.csv")
# sig_ors
```

Breaking down by outsourcing group helps to separate out the *type* of outsourced work
people from the ethnicities identified above engage in.[^6] Compared to White British workers:

[^6]: [outputs/data/ethnicity_ogroup_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab.csv)

- Arab people are more likely to be in the likely agency or high indicators groups
- Asian people are more likely to be in any of the groups
- Black people are more likely to be in the likely agency or outsourced groups
- People of mixed ethnicity are more likely to be outsourced
- People who selected Other ethnicity are more likely to be agency
- White other people are more likely to be outsourced

```{r}
sjPlot::plot_model(mod)
```

```{r}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed_disaggregated, data, weights=NatRepemployees)
#summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep="_")
z <- coefs/summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep="_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    predicted_group = group, .before=everything() # insert predicted group so output table can be better interpreted
  )

sig_ors2 <- sig_ors[,colSums(!is.na(sig_ors)) > 0]
sig_ors2 <- t(sig_ors2)

# get the sample information
sample_count <- data %>%
  group_by(outsourcing_group,Ethnicity_collapsed_disaggregated) %>%
  summarise(
    n = n(),
    freq = sum(NatRepemployees)
  ) %>%
  filter(outsourcing_group != "Not outsourced") %>%
  pivot_wider(names_from = outsourcing_group, values_from = c(n, freq))

# combine sample info with estimates
# NAs in this table simply indicate non-sig results
sig_ors2 <- as.data.frame(sig_ors2) %>%
  tibble::rownames_to_column(var = "Ethnicity_collapsed_disaggregated")
%>% mutate( Ethnicity_collapsed_disaggregated = sub(".*disaggregated","", Ethnicity_collapsed_disaggregated) ) %>% filter(Ethnicity_collapsed_disaggregated != "(Intercept)") %>% left_join(sample_count, by = "Ethnicity_collapsed_disaggregated") write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab_2.csv") ``` More nuance from disaggregated ethnicities[^7]. The table below shows the likelihood of workers of different ethnicities falling into each of the outsourcing groups, compared to White British workers. Note that only significant relationships are shown here. *Note also that the 'n' for many of these statistics is very low. As such many of these statistics are illustrative but not inferential.* [^7]: [outputs/data/ethnicity_ogroup_inferential_tab_2.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab_2.csv) ```{r} sig_ors2 %>% rename( Ethnicity = Ethnicity_collapsed_disaggregated ) %>% kable(caption = "Likelihood of belonging to different groups compared to White British. Note: NAs are non-sig. relationships. 'n_' is sample size, 'freq_' is weighted sample size", digits = 2) %>% kable_styling(full_width = F) ``` ::: {.callout-tip title="#ethnicity-pay-split"} - On the low-pay / high-pay split, you say “*A person is more likely to be in the low income group if they are: Older; Female; Prefer not to say when they arrived, And less likely if they are: Asian/Asian British; Live in North West or Wales; Arrived in the UK in last 30 years*”; Can I confirm this means we don’t see any other significant differences in the ethnicity breakdown if we look at high paid vs low paid workers? If so, let’s clarify what this says about how ethnicity relates to a) outsourced workers being disproportionately low paid, but b) ethnic minority workers being no more likely to be in our low pay group. 

*Using the new ethnicity groupings, there is no evidence indicating that any ethnicity is more or less likely to be in the low income group*

**Note to self: This could benefit from stepwise regression**
:::

```{r income-group}
#| output: false
#| message: false

# test significance
# mod <- glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)
# summary(mod)
#
# test <- summary(mod)
#
# or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
# p <- test[["coefficients"]][2,4]

mod_2 <- glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)

test <- summary(mod_2)

# odds ratio and p-value for the outsourcing term, selected by name
# (row 2 of this multi-predictor model is not the outsourcing term)
or <- exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]

# assumption: coef_table is the full coefficient table from extract_glm_coefs()
coef_table <- extract_glm_coefs(mod_2)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_2, only_sig = T)

write_csv(coef_table, file="../outputs/data/income_group_outsourcing.csv")
```

```{r}
#| output: false
#| message: false

mod_2 <- glm(income_group ~ Ethnicity_collapsed * outsourcing_status, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)

mod_3 <- glm(income_group ~ Ethnicity_binary * outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)
summary(mod_3)

black_coef <- extract_glm_coefs(mod_2, only_sig = T)[5,"Estimate"] %>% exp(.) %>% round(.,2) %>% pull()
other_coef <- extract_glm_coefs(mod_2, only_sig = T)[6,"Estimate"] %>% exp(.) %>% round(.,2) %>% pull()
```

A person is more likely to be in the low income group if they are:

-   Older
-   Female
-   Without a degree (or don't know whether they have a degree?)
- Are outsourced - Arrived in the UK in the last year And less likely if they are: - Younger - Male - Have a degree - Live in the North West or Wales (compared to London) - Arrived in the UK in last 30 years ::: {.callout-tip title="#migration"} - As you would expect, the vast majority of outsourced workers were born in the UK. However, we still see a significantly higher likelihood of outsourced workers having been born outside of the UK compared to people who aren’t outsourced. While around 14% of non-outsourced workers were born outside of the UK, this rose to just over 24% for outsourced workers – or nearly 1 in 4. - Overall, people who were born outside of the UK are 1.94 times more likely to be in outsourced work than people who were born here. ::: ```{r} data <- data %>% mutate( BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled, "Born in UK" = "I was born in the UK", "Came to UK recently" = c("Within the last year"), "Came to UK not recently" = c("Within the last 3 years", "Within the last 5 years", "Within the last 10 years", "Within the last 15 years", "Within the last 20 years", "Within the last 30 years", "More than 30 years ago"), "Prefer not to say" = c("Prefer not to say") ) ) bornuk_statistics <- data %>% group_by(outsourcing_status, BORNUK_collapsed) %>% summarise( n = n(), Frequency = sum(NatRepemployees) ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum) ) readr::write_csv(bornuk_statistics, file="../outputs/data/arrival_in_UK_collapsed_stats.csv") bornuk_statistics %>% ggplot(., aes(BORNUK_collapsed, Percentage, fill =outsourcing_status)) + geom_col(colour="black", position = "dodge") + geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ",n)), position=position_dodge(width=1), hjust=1) + coord_flip() + scale_fill_manual(values=many_colours, name="Outsourcing status") + theme_minimal() + xlab("Arrival in UK") ``` ```{r bornuk_inferential, output=FALSE} mod <- glm(outsourcing_status ~ BORNUK_binary, 
data, weights = NatRepemployees, family="quasibinomial") # mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial") summary(mod) coefs <- extract_glm_coefs(mod) write_csv(coefs, file = "../outputs/data/bornuk_ostatus_inferential_tab.csv") ``` As for non-outsourced workers, the vast majority of outsourced workers are born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Outsourced" & bornuk_statistics$BORNUK_collapsed == "Born in UK"), "Percentage"],2)`% of outsourced workers are not born in the UK, compared to `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Not outsourced" & bornuk_statistics$BORNUK_collapsed == "Born in UK"), "Percentage"],2)`% of non-outsourced workers.[^8] This difference is statistically significant; **outsourced workers are `r round(coefs %>% filter(variable == "BORNUK_binaryNot born in UK") %>% pull(or),2)` times more likely to have been born outside the UK than non-outsourced workers.**[^9] [^8]: [outputs/data/arrival_in_UK_stats.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats.csv) [^9]: [outputs/data/bornuk_ostatus_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ostatus_inferential_tab.csv) ::: {.callout-tip title="#migration-sub-groups"} - This pattern broadly holds across our three outsourcing sub-groups, with nearly no difference in the likelihood of people born outside of the UK being in any one of the three groups. 
::: ```{r} mod <- multinom(outsourcing_group ~ BORNUK_binary, data, weights=NatRepemployees) #summary(mod) # get coefficients and calcualte p coefs <- summary(mod)$coefficients ors <- exp(coefs) colnames(ors) <- paste(colnames(ors), "or", sep="_") z <- coefs/summary(mod)$standard.errors p <- (1 - pnorm(abs(z), 0, 1)) * 2 colnames(p) <- paste(colnames(p), "p", sep="_") p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA)) sig_ors <- exp(summary(mod)$coefficients * p_2) # add to table for saving coefs <- cbind(coefs, ors, p) %>% as_tibble() write_csv(coefs, file = "../outputs/data/bornuk_ogroup_inferential_tab.csv") # sig_ors bornuk_statistics_ogroup <- data %>% group_by(outsourcing_group, BORNUK_collapsed) %>% summarise( n = n(), Frequency = sum(NatRepemployees) ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum) ) readr::write_csv(bornuk_statistics_ogroup, file="../outputs/data/arrival_in_UK_collapsed_stats_ogroup.csv") bornuk_statistics_ogroup %>% ggplot(., aes(BORNUK_collapsed, Percentage, fill =outsourcing_group)) + geom_col(colour="black", position = "dodge") + geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ",n)), position=position_dodge(width=1), hjust=1) + coord_flip() + scale_fill_manual(values=many_colours, name="Outsourcing status") + theme_minimal() + xlab("Arrival in UK") ``` ::: {.callout-warning title="#ethnicity-migration-interaction. Some attention needed here"} Among all workers who were born in the UK: - Black workers are 2.01 times more likely to be outsourced than a White worker - Asian workers are 2.02 times more likely to be outsourced than a White worker. - Workers from Other ethnic backgrounds are X times more likely to be outsourced than a White other worker For workers born outside of the UK: - Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK. 
-   Among workers from Mixed ethnic backgrounds, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.

For workers from other ethnicities, it doesn’t matter whether you are born in the UK or not – as a Black or an Asian worker, you are equally likely to be outsourced whether you were born in the UK or somewhere else. And compared to a White person born in the UK, Black African and South Asian workers specifically are more likely to be outsourced, whether or not they were born in the UK. Does this need any further detail or explanation?

**To discuss confidence in our interpretation in this section: The evidence on ethnicity and country of birth clearly paints a racialised picture of outsourcing, and one with colonial undertones, as Black African and South Asian workers see a higher risk of being outsourced compared to White British workers, regardless of their country of birth. This obviously raises further questions about why, linked to (sector, occupation, labour market inequality and structural racism). Discuss the draft interpretation in the comment on the right.**

**However, workers from non-White ethnic groups are not the only workers who see a higher risk of being outsourced: Non-UK-born White workers are also more likely to be outsourced than UK-born White people.
Ethnicity and country of birth interact independently for some groups, but seem to be fundamentally connected for others.** ::: ```{r} base_mod <- mod <- glm(outsourcing_status ~ Ethnicity_collapsed + BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial") mod <- glm(outsourcing_status ~ Ethnicity_collapsed*BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial") # summary(mod) # check that interaction imporves the model over main effects - it does anova(base_mod, mod, test = "F") coefs <- extract_glm_coefs(mod) ``` ```{r} ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_binary") cons <- summary(contrast(ems, "pairwise",adjust="tukey")) sig_cons <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) write_csv(cons, file = "../outputs/data/ethnicity_bornUK_binary_contrasts.csv") ``` Exploring the intersection of ethnicity and arrival time reveals some patterns whereby the likelihood of a person being outsourced is related to the combinations of ethnicity and whether they were born in the UK.[^10] The plot below shows that [^10]: [outputs/data/bornUK_binary_contrasts.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_binary_contrasts.csv) - Among workers born in the UK, a Black worker is `r round(sig_cons %>% filter(contrast == "White British - (Black/African/Caribbean/Black British)") %>% pull(or),2)` times more likely to be outsourced than a White British worker. - Among workers born in the UK, an Asian worker is `r round(sig_cons %>% filter(contrast == "White British - (Asian/Asian British)") %>% pull(or),2)` times more likely to be outsourced than a White British worker. 
-   Among workers born in the UK, an Other ethnicity worker is `r round(sig_cons %>% filter(contrast == "White British - Other ethnic group") %>% pull(or),2)` times more likely to be outsourced than a White British worker.
-   Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "White British - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced as a White British worker.
-   Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "(Black/African/Caribbean/Black British) - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced as a Black worker.
-   Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "(Mixed/Multiple ethnic group) - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced as a worker of mixed ethnicity.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title="", terms = c("BORNUK_binary","Ethnicity_collapsed"), dodge=0.5) +
  coord_flip() +
  xlab("") +
  ylab("Likelihood of being outsourced") +
  theme_minimal()
```

```{r}
ems_2 <- emmeans(mod, specs = "BORNUK_binary", by = "Ethnicity_collapsed")
cons <- summary(contrast(ems_2, "pairwise",adjust="tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after=estimate # 1 / or because the contrast is expressed as 'Born in UK - Not born in UK' and we want the reverse comparison
  )

write_csv(cons, file = "../outputs/data/bornUK_binary_contrasts_2.csv")
```

Similarly, the plot below shows that[^11]

[^11]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

-   Among White British workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "White British") %>% pull(or),2)` times more likely to be outsourced than someone
born in the UK. - Among Mixed workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Mixed/Multiple ethnic group") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK. - Among people who preferred not to say their ethnicity, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Prefer not to say") %>% pull(or),2)` times as likely (i.e.,`r round(100 * (1 - (sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Prefer not to say") %>% pull(or))),0)`% less likely) to be outsourced than someone born in the UK. ```{r} sjPlot::plot_model(mod, type = "pred", legend.title="", terms = c("Ethnicity_collapsed","BORNUK_binary"), dodge=0.5) + coord_flip() + xlab("") + ylab("Likelihood of being outsourced") + theme_minimal() ``` ```{r} mod <- glm(outsourcing_status ~ Ethnicity_collapsed_disaggregated*BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial") # summary(mod) coefs <- extract_glm_coefs(mod, only_sig = T) ems <- emmeans(mod, specs = "Ethnicity_collapsed_disaggregated", by = "BORNUK_binary") cons <- summary(contrast(ems, "pairwise",adjust="tukey")) sig_cons <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) sjPlot::plot_model(mod, type = "pred", legend.title="", terms = c("BORNUK_binary","Ethnicity_collapsed_disaggregated"), dodge=0.5) +# coord_flip() + xlab("") + ylab("Likelihood of being outsourced") + theme_minimal() + theme(legend.position = "none") ems_2 <- emmeans(mod, specs = "BORNUK_binary", by = "Ethnicity_collapsed_disaggregated") cons <- summary(contrast(ems_2, "pairwise",adjust="tukey")) sig_cons <- cons %>% filter(p.value < .05) %>% mutate( or = 1 / exp(estimate), .after=estimate # 1 / or 
because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison) ) sjPlot::plot_model(mod, type = "pred", legend.title="", terms = c("Ethnicity_collapsed_disaggregated","BORNUK_binary"), dodge=0.5) + coord_flip() + xlab("") + ylab("Likelihood of being outsourced") + theme_minimal() ``` For people born in UK, if you are Pakistani you are more likely to be outsourced than if you are White. For White people and for White and Asian people, if you're not born in UK you're more likely to be outsourced. ::: {.callout-tip title="#migration-by-pay-split"} If we do a basic “born UK / not born UK” split, looking by low and high pay, what % of the low-paid workers group were born outside of the UK, vs in the high-paid group? ::: ```{r} #| message: false mig_pay_split <- income_data %>% filter(!is.na(income_group)) %>% group_by(outsourcing_status, income_group, BORNUK_binary) %>% summarise( freq = sum(NatRepemployees), n = n() ) %>% mutate( total = sum(freq), percentage = 100 * (freq / total), N = sum(n) ) low_pay_perc <- mig_pay_split %>% filter(income_group == "Low" & BORNUK_binary == "Not born in UK" & outsourcing_status == "Outsourced") %>% mutate( round(percentage,2) ) %>% pull() high_pay_perc <- mig_pay_split %>% filter(income_group == "Not low" & BORNUK_binary == "Not born in UK" & outsourcing_status == "Outsourced") %>% mutate( round(percentage,2) ) %>% pull() mod <- glm(income_group ~ BORNUK_binary, income_data, weights = NatRepemployees, family ="quasibinomial") # summary(mod) mod_2 <- glm(income_group ~ BORNUK_binary * outsourcing_status, income_data, weights = NatRepemployees, family ="quasibinomial") # summary(mod_2) ``` `r low_pay_perc`% of outsourced workers in the low pay group were not born in the UK, compared to `r high_pay_perc`% of people in the not low pay group. 
This difference is marginally statistically significant; someone in the low income group is less likely to have been born outside the UK than someone in the not low income group. This pattern is the same for non-outsourced workers, and when we consider the interaction between outsourcing status and migration status, the only factor predicting income group is outsourcing status.

```{r}
mig_pay_split %>%
  ggplot(aes(income_group, percentage, fill = BORNUK_binary)) +
  facet_grid(rows = vars(outsourcing_status)) +
  geom_col(position="dodge") +
  theme_minimal()
```

## Outsourced workers are on average younger than non-outsourced workers {#sec-age}

::: {.callout-tip title="#age"}
-   We find that outsourced workers are significantly younger than non-outsourced workers, on average. The median age of an outsourced worker is 35, compared to a median age of 43 for a non-outsourced worker.
-   The outsourced and indicator sub-groups – people who directly said that they were or might be outsourced, or ticked a high number of our indicators of outsourced working – see higher proportions of younger workers than the “agency” sub-group.
:::

::: {.callout-important title="#age-violin"}
INSERT VIOLIN PLOT CHART HERE SHOWING MEDIAN AGE OF EACH SUB-GROUP, COMPARED TO NON-OUTSOURCED WORKERS.

**Is this necessary?
We already have the density plots** ::: ```{r age-by-status} age_statistics <- data %>% group_by(outsourcing_status) %>% summarise( mean = weighted.mean(Age, w = NatRepemployees, na.rm = T), median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)), N = n() ) readr::write_csv(age_statistics, file = "../outputs/data/age_stats.csv") ``` ```{r age-inferential, include=FALSE} test <- lm(Age ~ outsourcing_status, weights = NatRepemployees, data) summary(test) coefs <- extract_lm_coefs(test) readr::write_csv(coefs,file="../outputs/data/age_inferential.csv") ``` Outsourced workers are on average younger than non-outsourced workers. The median age of the outsourced group is `r age_statistics[which(age_statistics$outsourcing_status=="Outsourced"),"median"]` , compared to `r age_statistics[which(age_statistics$outsourcing_status=="Not outsourced"),"median"]` for the not outsourced group.[^12] This difference is statistically significant.[^13] [^12]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv) [^13]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv) ```{r age-by-status-plot} knitr::kable(age_statistics, digits = 2, col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>% kable_styling(full_width = F) data %>% mutate( Age = as.numeric(as.character(as_factor(Age))) ) %>% ggplot(.,aes(Age, colour = outsourcing_status, fill = outsourcing_status)) + geom_density(alpha = 0.3) + geom_vline(data =age_statistics, aes(xintercept=median, colour = outsourcing_status)) + scale_x_continuous(breaks = seq(min(age_statistics$min), max(age_statistics$max),5)) + theme_minimal() + 
scale_colour_manual(values=colours, name = "Outsourcing status") + scale_fill_manual(values=colours, name = "Outsourcing status") ``` The higher concentration of younger workers identified above appears to be driven primarily by the 'outsourced' and 'high indicator' groups, whilst the 'likely agency' group follows a similar pattern to the non-outsourced group.[^14] [^14]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv) ```{r} age_statistics_income_group <- income_data %>% filter(!is.na(income_group)) %>% group_by(outsourcing_status, income_group) %>% summarise( mean = weighted.mean(Age, w = NatRepemployees, na.rm = T), median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)), N = n() ) knitr::kable(age_statistics_income_group, digits = 2, col.names = c("Outsourcing status", "Income group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>% kable_styling(full_width = F) income_data %>% filter(!is.na(income_group)) %>% mutate( Age = as.numeric(as.character(as_factor(Age))) ) %>% ggplot(.,aes(Age, colour = outsourcing_status, fill = outsourcing_status)) + facet_grid(rows = vars(income_group)) + geom_density(alpha = 0.3) + geom_vline(data = age_statistics_income_group, aes(xintercept=median, colour = outsourcing_status)) + scale_x_continuous(breaks = seq(min(age_statistics_income_group$min), max(age_statistics_income_group$max),5)) + theme_minimal() + scale_colour_manual(values=colours, name = "Outsourcing status") + scale_fill_manual(values=colours, name = "Outsourcing status") ``` ```{r age-by-group} age_statistics_2 <- data %>% group_by(outsourcing_group) %>% summarise( mean = weighted.mean(Age, w = NatRepemployees, na.rm = T), median = wtd.quantile(Age, w = 
NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)), N = n() ) readr::write_csv(age_statistics_2, file = "../outputs/data/age_stats_2.csv") ``` ```{r age-by-group-plot} knitr::kable(age_statistics_2, digits = 2, col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>% kable_styling(full_width = F) data %>% ggplot(.,aes(Age, colour = outsourcing_group, fill = outsourcing_group)) + geom_density(alpha = 0.2) + geom_vline(data = age_statistics_2, aes(xintercept=median, colour = outsourcing_group)) + scale_x_continuous(breaks = seq(min(age_statistics_2$min), max(age_statistics_2$max),5)) + theme_minimal() + scale_colour_manual(values=better_colours, name = "Outsourcing group") + scale_fill_manual(values=better_colours, name = "Outsourcing group") ``` ```{r} age_statistics2_income_group <- income_data %>% filter(!is.na(income_group)) %>% group_by(outsourcing_group, income_group) %>% summarise( mean = weighted.mean(Age, w = NatRepemployees, na.rm = T), median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)), N = n() ) knitr::kable(age_statistics2_income_group, digits = 2, col.names = c("Outsourcing group", "Income group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>% kable_styling(full_width = F) income_data %>% filter(!is.na(income_group)) %>% mutate( Age = as.numeric(as.character(as_factor(Age))) ) %>% ggplot(.,aes(Age, colour = outsourcing_group, fill = outsourcing_group)) + facet_grid(rows = vars(income_group)) + geom_density(alpha = 0.3) + geom_vline(data = age_statistics2_income_group, aes(xintercept=median, 
colour = outsourcing_group)) + scale_x_continuous(breaks = seq(min(age_statistics2_income_group$min), max(age_statistics2_income_group$max),5)) + theme_minimal() + scale_colour_manual(values=colours, name = "Outsourcing group") + scale_fill_manual(values=colours, name = "Outsourcing group") ``` ::: {.callout-tip title="#gender"} - The evidence also finds meaningful differences by gender between the outsourced and non-outsourced groups in our data. Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce, a nearly 10 percentage point difference. - Outsourced workers are 1.44 times more likely to be male than female. - The group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Comparison of outsourced and non-outsourced workers finds that - Someone in the high indicators sub-group is 2.18 times more likely to be male than female. - Someone in the agency sub-group is 1.45 times more likely to be male than female. - Someone in the outsourced sub-group is 1.31 times more likely to be male than female. ::: ::: {.callout-important title="#gender-sector"} - Possible addition: Will readers want to know more about how this intersects with the roles or sectors with higher rates of outsourcing – even if this is just an interpretive comment from us on how gender interacts with jobs and sectors more generally in the labour market? 
::: ```{r} gender_statistics <- data %>% group_by(outsourcing_status, Gender) %>% summarise( n = n(), Frequency = sum(NatRepemployees) ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum) ) readr::write_csv(gender_statistics, file="../outputs/data/gender_statistics.csv") ``` ```{r gender-outsourcing-status} mod <- multinom(Gender ~ outsourcing_status, data, weights=NatRepemployees) #summary(mod) # get coefficients and calcualte p coefs <- summary(mod)$coefficients coef_names <- rownames(coefs) ors <- exp(coefs) colnames(ors) <- paste(colnames(ors), "or", sep="_") z <- coefs/summary(mod)$standard.errors p <- (1 - pnorm(abs(z), 0, 1)) * 2 colnames(p) <- paste(colnames(p), "p", sep="_") p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA)) sig_ors <- exp(summary(mod)$coefficients * p_2) coefs <- cbind(coefs, ors, p) %>% as_tibble() %>% mutate( gender = coef_names, .before=everything() ) write_csv(coefs, file = "../outputs/data/gender_inferential_tab.csv") ``` The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.[^15] Men make up `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the outsourced workforce compared to `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Not outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the non-outsourced workforce. 
This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are `r round(sig_ors['Male', 'outsourcing_statusOutsourced'], 2)` times more likely to be male than female.[^16] [^15]: [outputs/data/gender_statistics.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_statistics.csv) [^16]: [../outputs/data/gender_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_inferential_tab.csv) ```{r} # try just using a glm? mod <- glm(outsourcing_status ~ Gender, family = "quasibinomial", weights = NatRepemployees, data) summary(mod) ors <- extract_glm_coefs(mod) ors ``` ```{r} # gender_statistics %>% # kable() %>% # kable_styling(full_width = F) gender_statistics %>% ggplot(., aes(outsourcing_status, Percentage, fill = Gender)) + geom_col(colour="black") + # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) + coord_flip() + scale_fill_manual(values=colours) + theme_minimal() + xlab("Outsourcing group") + annotate("text", x = gender_statistics$outsourcing_status, y = 99, label = paste0("N = ", gender_statistics$N), hjust=1) ``` ```{r} gender_statistics_2 <- data %>% group_by(outsourcing_group, Gender) %>% summarise( n = n(), Frequency = sum(NatRepemployees) ) %>% mutate( N = sum(n), Sum = sum(Frequency), Percentage = 100 * (Frequency / Sum) ) readr::write_csv(gender_statistics_2, file="../outputs/data/gender_statistics_2.csv") ``` ```{r gender-outsourcing-group} mod <- multinom(Gender ~ outsourcing_group, data, weights=NatRepemployees) #summary(mod) # get coefficients and calcualte p coefs <- summary(mod)$coefficients ors <- exp(coefs) colnames(ors) <- paste(colnames(ors), "or", sep="_") z <- coefs/summary(mod)$standard.errors p <- (1 - pnorm(abs(z), 0, 1)) * 2 colnames(p) <- paste(colnames(p), "p", sep="_") p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA)) sig_ors <- 
  exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>%
  as_tibble()

write_csv(coefs, file = "../outputs/data/gender_inferential_tab_2.csv")
```

Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the 'high indicators' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="High indicators" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'likely agency' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Likely agency" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'outsourced' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Outsourced" & Gender == "Male") %>% pull(Percentage), 2)`%). Statistically speaking, compared to a non-outsourced worker,

-   Someone in the high indicators group is `r round(sig_ors['Male', 'outsourcing_groupHigh indicators'],2)` times more likely to be male than female.
-   Someone in the likely agency group is `r round(sig_ors['Male', 'outsourcing_groupLikely agency'],2)` times more likely to be male than female.
-   Someone in the outsourced group is `r round(sig_ors['Male', 'outsourcing_groupOutsourced'],2)` times more likely to be male than female.

Additionally, people identifying as 'Other' gender are absent from the high indicators and likely agency groups, though given the small N (`r sum(data$Gender=="Other")`) for this group, this finding is unlikely to be meaningful.
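
The odds ratios quoted above follow a pattern repeated throughout this document: take the log-odds coefficients from a multinomial model, compute Wald z-scores and two-sided p-values, and keep only the odds ratios that pass a threshold. A minimal sketch of that pattern is shown below. The function name `wald_odds_ratios` and the toy matrices are hypothetical and the numbers are made up for illustration; they are not survey results.

```r
# Hypothetical helper (not part of the analysis code): turns log-odds
# coefficients and standard errors into odds ratios, two-sided Wald
# p-values, and odds ratios masked at a significance threshold.
wald_odds_ratios <- function(coefs, ses, alpha = 0.01) {
  z <- coefs / ses                      # Wald z-score per coefficient
  p <- 2 * (1 - pnorm(abs(z)))          # two-sided p-value
  ors <- exp(coefs)                     # log-odds -> odds ratio
  sig_ors <- ifelse(p < alpha, ors, NA_real_)  # keeps matrix shape and names
  list(or = ors, p = p, sig_or = sig_ors)
}

# Toy inputs shaped like summary(model)$coefficients and
# summary(model)$standard.errors from a multinomial model (made-up values)
term_names <- c("(Intercept)", "outsourcing_groupHigh indicators")
toy_coefs <- matrix(c(0.10, 0.70,
                      -0.05, 0.02),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("Male", "Other"), term_names))
toy_ses <- matrix(c(0.05, 0.10,
                    0.40, 0.90),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Male", "Other"), term_names))

res <- wald_odds_ratios(toy_coefs, toy_ses)
res$sig_or
```

Masking with `NA` rather than dropping rows keeps the matrix rectangular, so cells can still be indexed by name in inline text, as the bullets above do with `sig_ors['Male', ...]`.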

```{r}
# gender_statistics_2 %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics_2 %>%
  ggplot(., aes(outsourcing_group, Percentage, fill = Gender)) +
  geom_col(colour="black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values=colours) +
  theme_minimal() +
  xlab("Outsourcing group") +
  annotate("text", x = gender_statistics_2$outsourcing_group, y = 99, label = paste0("N = ", gender_statistics_2$N), hjust=1)
```

## Outsourced workers are more likely to work in some sectors than others, but seem to be spread across the labour market

::: {.callout-tip title="#sectors"}
-   The three most common sectors for outsourced workers in our survey to be employed within – excluding those with an N size below X (50?) – were administrative and support service activities; water supply, sewerage, waste management and remediation activities; and other service activities.
-   Five of the twenty employment sectors have at least 1 in 5 of their workforce “outsourced”: more than the average of around 17% across the whole workforce.
::: Here we explore what proportion of workers in each sector are outsourced.[^17] [^17]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv) ```{r sector-summary-3} sector_summary_3 <- data %>% #filter(income_drop_all == 0) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), # avg_income = mean(income_annual_all, na.rm=T), # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), ) write_csv(sector_summary_3, file="../outputs/data/sector_summary_3.csv") ``` The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e. this is showing what sectors have higher and lower proportions of outsourced workers. 
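
The within-sector percentages are weighted shares: each cell's summed survey weight divided by the summed weight of its sector. The sketch below reproduces that arithmetic in base R on a toy data frame. The data and the column names `sector`, `status` and `weight` are hypothetical stand-ins for `SectorName`, `outsourcing_status` and `NatRepemployees`.

```r
# Toy respondent-level data (made-up values, not survey data)
toy <- data.frame(
  sector = c("A", "A", "A", "B", "B"),
  status = c("Outsourced", "Not outsourced", "Not outsourced",
             "Outsourced", "Not outsourced"),
  weight = c(1.5, 1.0, 0.5, 1.0, 3.0)
)

# Weighted frequency for each sector x status cell
freq <- aggregate(weight ~ sector + status, data = toy, FUN = sum)
names(freq)[names(freq) == "weight"] <- "Frequency"

# Weighted total per sector, repeated on every row of that sector
freq$Sum <- ave(freq$Frequency, freq$sector, FUN = sum)

# Share of each status within its sector, as a percentage
freq$perc <- 100 * freq$Frequency / freq$Sum

freq[order(freq$sector, freq$status), ]
```

Because the denominator is computed per sector, the percentages sum to 100 within each sector rather than across the whole sample, which is what allows sectors of very different sizes to be compared on one axis.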
```{r sector-plot-2}
plot_data <- sector_summary_3 %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for the 'Not outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

# Rank sectors by their percentage of outsourced workers
outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   dplyr::select(SectorName_short, outsourcing_status, perc, n
#   mutate(

# One "N = ..." label per sector
annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_status)) +
  geom_col() +
  geom_text(inherit.aes=F,data=annotation_df, aes(x=SectorName_short, y=ypos, label = paste0("N = ", N)), hjust=1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values=many_colours) +
  scale_y_continuous(breaks=seq(0,100,10))

# sector_key <- data.frame("number" = seq(1,length(unique(plot_data$SectorName_labelled)),1),
#                          "Sector" = levels(plot_data$SectorName_labelled))
#
# sector_key %>%
#   kable() %>%
#   kable_styling(full_width = F)
```

The three sectors with the highest proportion of outsourced workers are:

-   `r unique(plot_data$SectorName_labelled[plot_data$SectorName==3])` (note that N = 31)
-   `r unique(plot_data$SectorName_labelled[plot_data$SectorName==4])`
-   `r unique(plot_data$SectorName_labelled[plot_data$SectorName==22])`

Note that an undefined sector ('Not found') contained one of the largest proportions of outsourced workers (`r round(plot_data$perc[which(plot_data$SectorName==16 & plot_data$outsourcing_status=="Outsourced")],0)`% of workers in the 'Not found' category were
outsourced).

A key takeaway here is that whereas outsourced workers make up around 17% of the whole workforce, this figure varies by sector, from 0% for Mining... and Extraterritorial organisations... all the way to `r round(outsourced[which(outsourced$rank==1),'perc'],0)`% for `r outsourced[which(outsourced$rank==1),'SectorName_short']`, with 5 out of 20 sectors having at least 20% of their workforce outsourced.

::: {.callout-tip title="#sectors-ogroup"}
-   Figure X also shows how the total outsourced group in each sector splits into our three outsourced “sub-groups”. We find – as you might expect, based on its dominance within the group of outsourced workers – that outsourced workers in every sector are most likely to be in the “outsourced sub-group”, i.e. those who self-identified as outsourced workers.
:::

```{r}
sector_summary_3 <- data %>%
  #filter(income_drop_all == 0) %>%
  filter(outsourcing_group!="Not outsourced") %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

plot_data <- sector_summary_3 %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for the 'Outsourced' sub-group and reorder SectorName_short
outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   dplyr::select(SectorName_short, outsourcing_status, perc, n
#   mutate(

# One "N = ..." label per sector
annotation_df <- plot_data %>%
  filter(outsourcing_group == "Outsourced") %>%
  dplyr::select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

plot_data <- plot_data %>%
  filter(outsourcing_group!="Not outsourced")

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col() +
  geom_text(inherit.aes=F,data=annotation_df, aes(x=SectorName_short, y=ypos, label = paste0("N = ", N)), hjust=1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values=many_colours) +
  scale_y_continuous(breaks=seq(0,100,10))
```

# Pay

::: {.callout-tip title="#pay"}
-   Using regression analysis, we find that outsourced workers are on average paid £2,170 less than non-outsourced workers.
-   The “outsourced sub-group” earns £3,813 less, and the “agency sub-group” £2,603 less, than the non-outsourced group. This shows that pay is lowest in the “outsourced sub-group” of workers, i.e. those who directly identified themselves as being outsourced. Figure X below shows the median and distribution of pay across the three outsourced sub-groups and the non-outsourced group, for comparison.
:::

::: {.callout-important title="#pay-violin"}
Violin plot for the above
:::

```{r income}
# filter to just cases where income is above the fifth percentile and lower than the 95th? I.e., drop the top and bottom 5%.
income_statistics <- data %>% filter(income_drop_all == 0 & !is.na(income_annual_all)) %>% group_by(outsourcing_status) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) ) readr::write_csv(income_statistics, file="../outputs/data/income_stats_o-status.csv") mod <- lm(income_annual_all ~ outsourcing_status, income_data, weights = NatRepemployees) # summary(mod) coef_table <- extract_lm_coefs(mod) rownames(coef_table) <- coef_table$variable sig_coefs <- extract_lm_coefs(mod, only_sig = T) write_csv(coef_table, file="../outputs/data/model_income_by_o-status.csv") income_statistics_weekly <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(outsourcing_status) %>% summarise( n = n(), mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T)) ) readr::write_csv(income_statistics_weekly, file="../outputs/data/weekly_income_stats_o-status.csv") mod_weekly <- lm(income_weekly_all ~ outsourcing_status, income_data, weights = NatRepemployees) # summary(mod) coef_table_weekly <- extract_lm_coefs(mod_weekly) rownames(coef_table_weekly) <- coef_table_weekly$variable sig_coefs_weekly <- extract_lm_coefs(mod_weekly, only_sig = T) write_csv(coef_table_weekly, file="../outputs/data/model_income_by_o-status_weekly.csv") ``` The tables and 
plots below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` less annually than non-outsourced workers**.[^18] Per week, **outsourced workers are on average paid £`r abs(round(coef_table_weekly['outsourcing_statusOutsourced','Estimate'],0))` less than non-outsourced workers** [^18]: [outputs/data/income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-status.csv) & [outputs/data/model_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status.csv) Weekly stats here^[[outputs/data/weekly_income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/weekly_income_stats_o-status.csv) & [outputs/data/model_income_by_o-status_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status_weekly.csv)] ```{r income-plot} knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing status", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>% kable_styling(full_width = F) # plot the distribution of income for the two groups data %>% filter(income_drop_all == 0 & !is.na(income_annual_all)) %>% ggplot(., aes(outsourcing_status, income_annual_all)) + geom_violin() + geom_boxplot(width = 0.3) + geom_text(inherit.aes=F, data=income_statistics, aes(outsourcing_status, y = 6e+04), label=paste0("Mean = ", round(income_statistics$mean,0),"\n", "Median = ", income_statistics$median), nudge_x = 0.1, hjust=0) + coord_cartesian(xlim=c(1,2.5)) + theme_minimal() + xlab("Outsourcing status") + ylab("Annual income") + coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f = floor),plyr::round_any(max(income_statistics$max),5000, f = ceiling))) + 
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max),5000, f = ceiling), 10000))

# weekly
knitr::kable(income_statistics_weekly, digits = 2, col.names = c("Outsourcing status", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of weekly income for the two groups
# (label height set to 1300, as in the weekly outsourcing-group plot, so that it sits on the weekly scale)
data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  ggplot(., aes(outsourcing_status, income_weekly_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes=F, data=income_statistics_weekly, aes(outsourcing_status, y = 1300), label=paste0("Mean = ", round(income_statistics_weekly$mean,0),"\n", "Median = ", round(income_statistics_weekly$median,0)), nudge_x = 0.1, hjust=0) +
  coord_cartesian(xlim=c(1,2.5)) +
  theme_minimal() +
  xlab("Outsourcing status") + ylab("Weekly income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics_weekly$min), 10, f = floor),plyr::round_any(max(income_statistics_weekly$max),10, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics_weekly$min), 10, f = ceiling), plyr::round_any(max(income_statistics_weekly$max),10, f = ceiling), 100))
```

```{r income-outsourcing-group}
# filter to just cases where income is above the fifth percentile and lower than the 95th? I.e., drop the top and bottom 5%.
income_statistics <- data %>% filter(income_drop_all == 0 & !is.na(income_annual_all)) %>% group_by(outsourcing_group) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) ) readr::write_csv(income_statistics, file="../outputs/data/income_stats_o-group.csv") mod <- lm(income_annual_all ~ outsourcing_group, income_data, weights = NatRepemployees) # summary(mod) coef_table <- extract_lm_coefs(mod) rownames(coef_table) <- coef_table$variable sig_coefs <- extract_lm_coefs(mod, only_sig = T) write_csv(coef_table, file="../outputs/data/model_income_by_o-group.csv") income_statistics_weekly <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(outsourcing_group) %>% summarise( n = n(), mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T)) ) readr::write_csv(income_statistics_weekly, file="../outputs/data/weekly_income_stats_o-group.csv") mod_weekly <- lm(income_weekly_all ~ outsourcing_group, income_data, weights = NatRepemployees) # summary(mod) coef_table_weekly <- extract_lm_coefs(mod_weekly) rownames(coef_table_weekly) <- coef_table_weekly$variable sig_coefs_weekly <- extract_lm_coefs(mod_weekly, only_sig = T) write_csv(coef_table_weekly, file="../outputs/data/model_income_by_o-group_weekly.csv") ``` The tables and plots below 
show descriptive statistics on income and its distribution for the outsourced groups. Only the full outsourced subgroup has lower income than non-outsourced people. Regression analysis shows that **workers in the outsourced subgroup are on average paid £`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less annually than non-outsourced workers**.[^18-group] Per week, **workers in the outsourced subgroup are on average paid £`r abs(round(coef_table_weekly['outsourcing_groupOutsourced','Estimate'],0))` less than non-outsourced workers**.

[^18-group]: [outputs/data/income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-group.csv) & [outputs/data/model_income_by_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group.csv)

Weekly stats here^[[outputs/data/weekly_income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/weekly_income_stats_o-group.csv) & [outputs/data/model_income_by_o-group_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group_weekly.csv)]

```{r income-plot-group}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for each outsourcing group
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_group, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes=F, data=income_statistics, aes(outsourcing_group, y = 6e+04), label=paste0("Mean = ", round(income_statistics$mean,0),"\n", "Median = ", round(income_statistics$median,0)), nudge_x = 0.1, hjust=0) +
  coord_cartesian(xlim=c(1,2.5)) +
  theme_minimal() +
  xlab("Outsourcing group") + ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f =
floor),plyr::round_any(max(income_statistics$max),5000, f = ceiling))) + scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max),5000, f = ceiling), 10000)) # weekly knitr::kable(income_statistics_weekly, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>% kable_styling(full_width = F) # plot the distribution of income for the two groups data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% ggplot(., aes(outsourcing_group, income_weekly_all)) + geom_violin() + geom_boxplot(width = 0.3) + geom_text(inherit.aes=F, data=income_statistics_weekly, aes(outsourcing_group, y = 1300), label=paste0("Mean = ", round(income_statistics_weekly$mean,0),"\n", "Median = ", round(income_statistics_weekly$median,0)), nudge_x = 0.1, hjust=0) + coord_cartesian(xlim=c(1,2.5)) + theme_minimal() + xlab("Outsourcing group") + ylab("Weekly income") + coord_cartesian(ylim = c(plyr::round_any(min(income_statistics_weekly$min), 10, f = floor),plyr::round_any(max(income_statistics_weekly$max),10, f = ceiling))) + scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics_weekly$min), 10, f = ceiling), plyr::round_any(max(income_statistics_weekly$max),10, f = ceiling), 100)) ``` ```{r} #| output: false mod <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees) summary(mod) mod_2 <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees) summary(mod_2) mod_3 <- update(mod_2, ~.+ BORNUK_labelled) summary(mod_3) # anova(mod_2, mod_3) # adding BORNUK improves model fit coef_table <- extract_lm_coefs(mod_3) rownames(coef_table) <- coef_table$variable sig_coefs <- extract_glm_coefs(mod_3, only_sig = T) write_csv(coef_table, file="../outputs/data/model_2_income_by_o-status.csv") 
mod_3_weekly <- lm(income_weekly_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
# use the lm extractor here, as for the other linear models
sig_coefs <- extract_lm_coefs(mod_3_weekly, only_sig = T)
write_csv(coef_table_weekly, file="../outputs/data/model_2_income_by_o-status_weekly.csv")
```

This difference increases to £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` annually (£`r abs(round(coef_table_weekly['outsourcing_statusOutsourced','Estimate'],0))` per week) when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.[^19]

This analysis shows that all other variables, apart from Age, are in some way relevant to income. The figures below are averages, controlling for each of the other variables in the model.

[^19]: [outputs/data/model_2_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status.csv)

Annually:

-   Men earn £`r abs(round(coef_table['GenderMale','Estimate'],0))` more than women.
-   People who have a degree earn £`r abs(round(coef_table['Has_DegreeYes','Estimate'],0))` more than people without a degree.
- Workers in all non-London regions earn less than workers in London - East Midlands: -£`r abs(round(coef_table['RegionEast Midlands','Estimate'],0))` - East of England: -£`r abs(round(coef_table['RegionEast of England','Estimate'],0))` - North East: -£`r abs(round(coef_table['RegionNorth East','Estimate'],0))` - North West: -£`r abs(round(coef_table['RegionNorth West','Estimate'],0))` - Northern Ireland: -£`r abs(round(coef_table['RegionNorthern Ireland','Estimate'],0))` - Scotland: -£`r abs(round(coef_table['RegionScotland','Estimate'],0))` - South East: -£`r abs(round(coef_table['RegionSouth East','Estimate'],0))` - Wales: -£`r abs(round(coef_table['RegionWales','Estimate'],0))` - West Midlands: -£`r abs(round(coef_table['RegionWest Midlands','Estimate'],0))` - Yorkshire and the Humber: -£`r abs(round(coef_table['RegionYorkshire and the Humber','Estimate'],0))` - People who arrived in the UK within the last year earn £`r abs(round(coef_table['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK - People who arrived in the UK within the last 3 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK - People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK - People who arrived within the last 30 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK. Weekly^[[outputs/data/model_2_income_by_o-status_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status_weekly.csv)]: - Men earn £`r abs(round(coef_table_weekly['GenderMale','Estimate'],0))` more than women. - People who have a degree earn £`r abs(round(coef_table_weekly['Has_DegreeYes','Estimate'],0))` more than people without a degree. 
- Workers in all non-London regions earn less than workers in London - East Midlands: -£`r abs(round(coef_table_weekly['RegionEast Midlands','Estimate'],0))` - East of England: -£`r abs(round(coef_table_weekly['RegionEast of England','Estimate'],0))` - North East: -£`r abs(round(coef_table_weekly['RegionNorth East','Estimate'],0))` - North West: -£`r abs(round(coef_table_weekly['RegionNorth West','Estimate'],0))` - Northern Ireland: -£`r abs(round(coef_table_weekly['RegionNorthern Ireland','Estimate'],0))` - Scotland: -£`r abs(round(coef_table_weekly['RegionScotland','Estimate'],0))` - South East: -£`r abs(round(coef_table_weekly['RegionSouth East','Estimate'],0))` - Wales: -£`r abs(round(coef_table_weekly['RegionWales','Estimate'],0))` - West Midlands: -£`r abs(round(coef_table_weekly['RegionWest Midlands','Estimate'],0))` - Yorkshire and the Humber: -£`r abs(round(coef_table_weekly['RegionYorkshire and the Humber','Estimate'],0))` - People who arrived in the UK within the last year earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK - People who arrived in the UK within the last 3 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK - People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK - People who arrived within the last 30 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK. 
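
Each figure in the lists above is a coefficient on a dummy-coded category, so it is read as a difference from that variable's reference level (London for Region, people born in the UK for arrival time). The sketch below illustrates this on invented data. The data frame `toy` and its columns are hypothetical, and the model has a single predictor, so here each coefficient equals a difference in weighted means exactly.

```r
# Hypothetical toy data; the variable names are illustrative only.
toy <- data.frame(
  income = c(30000, 34000, 26000, 28000, 24000, 25000),
  region = factor(c("London", "London", "Wales", "Wales",
                    "Scotland", "Scotland")),
  weight = c(1, 3, 2, 2, 1, 1)
)

# Make London the reference level, so each region coefficient is a
# difference from London.
toy$region <- relevel(toy$region, ref = "London")

fit <- lm(income ~ region, data = toy, weights = weight)
coefs <- coef(fit)

# Weighted mean income per region, for comparison with the coefficients.
wmeans <- sapply(split(toy, toy$region),
                 function(d) weighted.mean(d$income, d$weight))

print(round(coefs))
print(round(wmeans - wmeans["London"]))
```

The inline expressions in the lists wrap each estimate in `abs()` and hard-code "less" or "more" in the sentence, so the sign of each coefficient is worth re-checking whenever the data or model change.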
## Gender pay gap

```{r}
#| output: false
#| message: false
#| warning: false
simp_mod <- lm(income_annual_all ~ outsourcing_status*Gender, income_data, weights=NatRepemployees)
summary(simp_mod)

mod <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled + Gender:outsourcing_status, income_data, weights = NatRepemployees)
summary(mod)

mod_weekly <- lm(income_weekly_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled + Gender:outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_weekly)

simp_mod_weekly <- lm(income_weekly_all ~ outsourcing_status*Gender, income_data, weights=NatRepemployees)
summary(simp_mod_weekly)
```

::: {.callout-warning title="#gender-pay-gap"}
-   On average within our sample, male workers earn £6,400 more than female workers per year; but further exploration of how pay relates to gender for outsourced workers suggests that this gender pay gap doesn’t differ in a statistically significant way depending on whether workers are outsourced or not.
-   For female outsourced workers, this suggests that being an outsourced worker neither exacerbates nor diminishes the gender pay gap they face compared to male workers.

**Check what this controls for** ::: ### Outsourcing status ```{r gender-pay-gap-1} gender_outsourced_gap <- income_data %>% group_by(outsourcing_status, Gender) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) ) not_outsourced_gap <- gender_outsourced_gap %>% filter(outsourcing_status == "Not outsourced") %>% dplyr::select(c(outsourcing_status, Gender, median)) %>% pivot_wider(names_from = "Gender", values_from = "median") %>% mutate( diff = Male - Female ) %>% pull(diff) outsourced_gap <- gender_outsourced_gap %>% filter(outsourcing_status == "Outsourced") %>% dplyr::select(c(outsourcing_status, Gender, median)) %>% pivot_wider(names_from = "Gender", values_from = "median") %>% mutate( diff = Male - Female ) %>% pull(diff) gender_outsourced_gap %>% kable() %>% kable_styling(full_width = F) gender_outsourced_gap %>% ggplot(aes(outsourcing_status, median, fill = Gender)) + geom_col(position="dodge") + ggtitle("Annual income") write_csv(gender_outsourced_gap, "../outputs/data/o-status_gender_gap.csv") # weekly gender_outsourced_gap_weekly <- income_data %>% group_by(outsourcing_status, Gender) %>% summarise( n = n(), mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T)) ) not_outsourced_gap_weekly <- gender_outsourced_gap_weekly %>% 
  filter(outsourcing_status == "Not outsourced") %>%
  dplyr::select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

outsourced_gap_weekly <- gender_outsourced_gap_weekly %>%
  filter(outsourcing_status == "Outsourced") %>%
  dplyr::select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

gender_outsourced_gap_weekly %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap_weekly %>%
  ggplot(aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position="dodge") +
  ggtitle("Weekly income")

write_csv(gender_outsourced_gap_weekly, "../outputs/data/o-status_gender_gap_weekly.csv")
```

**Annual**^[[outputs/data/o-status_gender_gap.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-status_gender_gap.csv) & [outputs/data/mod_o-status_gender.csv](outputs/data/mod_o-status_gender.csv)]: Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £`r round(not_outsourced_gap,2)` less than males. For outsourced workers, females are paid £`r round(outsourced_gap,2)` less than males. The difference between non-outsourced and outsourced workers is not significant.

**Weekly**^[[outputs/data/o-status_gender_gap_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-status_gender_gap_weekly.csv) & [outputs/data/mod_o-status_gender_weekly.csv](outputs/data/mod_o-status_gender_weekly.csv)]: Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £`r round(not_outsourced_gap_weekly,2)` less than males.
For outsourced workers, females are paid £`r round(outsourced_gap_weekly,2)` less than males. The difference between non-outsourced and outsourced workers is not significant.

```{r gender-outsourcing-int}
#| output: false
ggplot(gender_outsourced_gap, aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position="dodge") +
  geom_label(aes(label=round(median,0)), position=position_dodge(width=0.9)) +
  theme_minimal() +
  ylab("Median income") + xlab("Outsourcing status")

simp_mod <- lm(income_annual_all ~ Gender*outsourcing_status, income_data, weights = NatRepemployees)
summary(simp_mod)

# simp_mod2 <- update(simp_mod, ~. + Has_Degree)
# summary(simp_mod2)
# anova(simp_mod, simp_mod2)

mod_2 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~.+ BORNUK_labelled)
summary(mod_3)

anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# significant coefficients of the model that is written out (mod_3)
sig_coefs <- extract_lm_coefs(mod_3, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_o-status_gender.csv")

mod_3_weekly <- lm(income_weekly_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs <- extract_lm_coefs(mod_3_weekly, only_sig = T)
write_csv(coef_table_weekly, "../outputs/data/mod_o-status_gender_weekly.csv")
```

The gender by outsourcing status interaction is also not relevant for whether a worker is low income (i.e. it has a non-significant relationship with income_group).
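
The `Gender*outsourcing_status` terms in this section test whether the gender gap differs between outsourced and non-outsourced workers. As an illustration of what the interaction coefficient measures, the sketch below fits an unweighted model with no covariates to invented data. The data frame `toy` and its values are hypothetical. In this simplified setting the interaction coefficient is exactly the difference between the two within-group gaps in mean income. The reported gaps above are differences in weighted medians, so they will not match a model coefficient in the same way.

```r
# Hypothetical toy data: two respondents per gender x status cell.
toy <- data.frame(
  income = c(30000, 32000, 24000, 26000, 27000, 29000, 22000, 24000),
  gender = factor(rep(c("Male", "Male", "Female", "Female"), 2),
                  levels = c("Female", "Male")),
  status = factor(rep(c("Not outsourced", "Outsourced"), each = 4),
                  levels = c("Not outsourced", "Outsourced"))
)

fit <- lm(income ~ gender * status, data = toy)

# Mean income in each gender x status cell.
cell_means <- tapply(toy$income, list(toy$gender, toy$status), mean)

# Gender gap (male minus female) within each outsourcing status.
gap_not <- cell_means["Male", "Not outsourced"] -
  cell_means["Female", "Not outsourced"]
gap_out <- cell_means["Male", "Outsourced"] -
  cell_means["Female", "Outsourced"]

# The interaction coefficient is the difference between the two gaps.
int_coef <- coef(fit)[["genderMale:statusOutsourced"]]

print(c(gap_not = gap_not, gap_out = gap_out, interaction = int_coef))
```

A non-significant interaction term in `summary(fit)` therefore corresponds to the statement that the gap does not differ reliably between the two groups.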
```{r}
#| output: false
mod <- glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod)

# test <- summary(mod)
#
# or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
# p <- test[["coefficients"]][2,4]

# extract the coefficients of the model fitted above, so that the file
# written below does not reuse coef_table from the previous chunk
coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
# sig_coefs <- extract_glm_coefs(mod, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")
```

### Outsourcing group

```{r gender-pay-gap-group}
#| output: false
#| warning: false
#| message: false
gender_outsourced_gap <- income_data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

gender_outsourced_gap %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap %>%
  ggplot(aes(outsourcing_group, median, fill = Gender)) +
  geom_col(position="dodge") +
  ggtitle("Annual income")

write_csv(gender_outsourced_gap, "../outputs/data/o-group_gender_gap.csv")

# weekly
gender_outsourced_gap_weekly <- income_data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )
gender_outsourced_gap_weekly %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap_weekly %>%
  ggplot(aes(outsourcing_group, median, fill = Gender)) +
  geom_col(position="dodge") +
  ggtitle("Weekly income")

write_csv(gender_outsourced_gap_weekly, "../outputs/data/o-group_gender_gap_weekly.csv")

# models
## annual
mod_3 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3)

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# significant coefficients of the model that is written out (mod_3)
sig_coefs <- extract_lm_coefs(mod_3, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_o-group_gender.csv")

## weekly
mod_3_weekly <- lm(income_weekly_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs_weekly <- extract_lm_coefs(mod_3_weekly, only_sig = T)
write_csv(coef_table_weekly, "../outputs/data/mod_o-group_gender_weekly.csv")
```

**Annual data files**^[[outputs/data/o-group_gender_gap.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-group_gender_gap.csv) & [outputs/data/mod_o-group_gender.csv](outputs/data/mod_o-group_gender.csv)]

**Weekly data files**^[[outputs/data/o-group_gender_gap_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-group_gender_gap_weekly.csv) & [outputs/data/mod_o-group_gender_weekly.csv](outputs/data/mod_o-group_gender_weekly.csv)]

The gender by outsourcing group interaction is also not relevant for whether a worker is low income (i.e. it has a non-significant relationship with income_group).
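
The `income_group` models are fitted with `family = "quasibinomial"`, so their coefficients are on the log-odds scale and are exponentiated to give odds ratios. The sketch below shows that conversion on invented data. The data frame `toy`, its columns, and its weights are hypothetical, and with a single binary predictor the odds ratio can be checked by hand against the weighted counts.

```r
# Hypothetical toy data: each row is a group of identical respondents,
# with `weight` giving the (weighted) number of people it represents.
toy <- data.frame(
  low_income = c(1, 0, 1, 0),
  outsourced = factor(c("No", "No", "Yes", "Yes")),
  weight     = c(10, 40, 20, 30)
)

fit <- glm(low_income ~ outsourced, data = toy,
           family = quasibinomial, weights = weight)

# The coefficient is a log-odds difference; exponentiate for an odds ratio.
odds_ratio <- exp(coef(fit)[["outsourcedYes"]])

# The same quantity from the weighted counts directly.
odds_no  <- 10 / 40
odds_yes <- 20 / 30

print(c(model = odds_ratio, by_hand = odds_yes / odds_no))
```

An odds ratio above 1 means the group is more likely than the reference group to be low income. An interaction term in such a model is a ratio of odds ratios.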
```{r}
#| output: false
mod <- glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region +
             Gender*outsourcing_group + BORNUK_labelled,
           income_data, family="quasibinomial", weights = NatRepemployees)

summary(mod)

# test <- summary(mod)
#
# or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
# p <- test[["coefficients"]][2,4]

# Extract this model's coefficients before writing them out; otherwise the
# coef_table from the annual lm above would be written instead
coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
# sig_coefs <- extract_glm_coefs(mod, only_sig = T)

# NOTE: this is the same path used for the outsourcing_status model above,
# so that file is overwritten
write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")
```

::: {.callout-tip title="#gender-income-group"}
- In particular, people are more likely to be in our low-paid outsourced group if they are female or older workers.
:::

Income group[^21]

[^21]: [../outputs/data/income_group_outsourcing.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_group_outsourcing.csv)

```{r}
#| output: false
# test significance
mod <- glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)

summary(mod)

test <- summary(mod)

or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]][2,4]

mod_2 <- glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region +
               outsourcing_status + BORNUK_labelled,
             income_data, family="quasibinomial", weights = NatRepemployees)

summary(mod_2)

# take the p-value from mod_2 itself, indexing the row by name because
# outsourcing_status is no longer the second coefficient in this model
test <- summary(mod_2)

or <- exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]

coef_table <- extract_glm_coefs(mod_2)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_2, only_sig = T)

write_csv(coef_table, file="../outputs/data/income_group_outsourcing.csv")
```

A person is more likely to be in the low income group if they:

- Are older
- Are female
- Don't have a degree (or don't know if they have a degree?)
- Are outsourced
- Arrived in the UK in the last year

And less likely if they:

- Are younger
- Are male
- Have a degree
- Live in the North West or Wales (compared to London)
- Arrived in the UK in the last 30 years

::: {.callout-tip title="#gender-by-pay-split"}
Is there already a basic low / high pay split for gender? I know you talk about women being more likely to be in the low-paid group, but again not sure if there is just a basic “women make up x% of low pay group and x% of not low pay group”?
:::

```{r}
#| message: false
gender_pay_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  # still grouped by outsourcing_status and income_group here, so the
  # percentages are the gender shares within each of those cells
  mutate(
    total = sum(freq),
    percentage = 100 * (freq / total),
    N = sum(n)
  )

low_pay_perc <- gender_pay_split %>%
  filter(income_group == "Low" & outsourcing_status == "Outsourced" & Gender == "Female") %>%
  pull(percentage) %>%
  round(2)

high_pay_perc <- gender_pay_split %>%
  filter(income_group == "Not low" & outsourcing_status == "Outsourced" & Gender == "Female") %>%
  pull(percentage) %>%
  round(2)

mod <- glm(income_group ~ Gender, income_data, weights = NatRepemployees, family ="quasibinomial")

# summary(mod)

mod_2 <- glm(income_group ~ Gender * outsourcing_status, income_data, weights = NatRepemployees, family ="quasibinomial")

# summary(mod_2)
```

`r low_pay_perc`% of outsourced workers in the low pay group were female, compared to `r high_pay_perc`% of outsourced workers in the not low pay group. This difference is statistically significant; women are more likely to be in the low income group. This pattern is the same for non-outsourced workers, and there is no interaction effect: irrespective of outsourcing status, women are more likely to be low paid, and irrespective of gender, outsourced people are more likely to be low paid.
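To make the percentage calculation concrete, here is a minimal sketch on made-up data showing how the weighted share of women within each pay group is obtained. The data frame `toy`, its `weight` column, and the values in it are assumptions for illustration only; in the analysis the weight is `NatRepemployees`.

```r
library(dplyr)

# Toy respondents (assumed values): one row per person with a survey weight
toy <- data.frame(
  outsourcing_status = "Outsourced",
  income_group = c("Low", "Low", "Low", "Not low", "Not low", "Not low"),
  Gender = c("Female", "Female", "Male", "Female", "Male", "Male"),
  weight = c(1.5, 1.5, 1, 2, 1, 1)
)

toy_split <- toy %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  # summarise() drops the last grouping variable (Gender), so the result is
  # still grouped by outsourcing_status and income_group
  summarise(freq = sum(weight), n = n(), .groups = "drop_last") %>%
  # percentages therefore sum to 100 within each status-by-income-group cell
  mutate(percentage = 100 * freq / sum(freq)) %>%
  ungroup()

# Share of each pay group that is female
female_share <- toy_split %>%
  filter(Gender == "Female") %>%
  select(income_group, percentage)

female_share
```

In this toy example women make up 75% of the weighted low pay group and 50% of the not low pay group, which is the kind of comparison reported above.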
```{r}
gender_pay_split %>%
  ggplot(aes(income_group, percentage, fill = Gender)) +
  facet_grid(rows = vars(outsourcing_status)) +
  geom_col(position="dodge") +
  theme_minimal()
```

::: {.callout-important title="#pay-gap-sector"}
- Overall, we find that outsourced workers in administrative and support service activities, one of the dominant sectors for outsourced workers in this research, are more likely to be lower-paid than non-outsourced workers in the same sector. The same is true for outsourced workers in water supply (full name: sewerage, waste etc.), another prominent outsourcing sector, and in information and communication, transportation and storage, and education, amongst others. In contrast, we find outsourced workers in financial and insurance activities, for example, appear to be slightly higher paid on average than their non-outsourced counterparts; however, this is one of the few sectors in which this appears to be the case. **to be confirmed**

I don’t quite understand the chart below the above chart in the file, would you be able to explain it? Thanks! Is this the best chart to use, above? Does this need to control for anything else to show us the most accurate analysis of pay by sector for outsourced and non-outsourced workers, or are we confident that this is showing us something notable about sector and pay?
:::

## Sectors/occupations

### Sector and occupation hierarchy

The data from Opinium has four variables relating to sectors/occupations. These are:

- SectorName
- Majorgroupcode
- MajorsubgroupOccupation
- UnitOccupation

SOC 2020 has nine major groups, 26 sub-major groups, 104 minor groups and 412 unit groups. The variables we have appear to map in the following way:

- Majorgroupcode = the 9 'major groups'
- MajorsubgroupOccupation = the 26 'sub-major' groups
- UnitOccupation = the 104 'minor groups'

This last pairing is the point of confusion.
The 'UnitOccupation' wording came from Opinium, and these categories match the [coding index](https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2020/soc2020volume2codingrulesandconventions), where they are confusingly referred to as 'unit groups' even though they are the minor groups. There is no variable in our data that relates to the most disaggregated category, the 412 'unit groups'.

The unique values of each variable are shown in each section below.

#### SectorName

```{r}
data %>%
  distinct(SectorName_labelled) %>%
  filter(SectorName_labelled != "NA") %>%
  kable() %>%
  kable_styling(full_width = F)
```

#### Majorgroupcode

These are the 9 major groups according to SOC.

```{r}
data %>%
  distinct(Majorgroupcode_labelled) %>%
  drop_na() %>%
  filter(Majorgroupcode_labelled != "NA") %>%
  kable() %>%
  kable_styling(full_width = F)
```

#### MajorsubgroupOccupation

These are the 26 'sub-major' groups.

```{r}
data %>%
  distinct(MajorsubgroupOccupation_labelled) %>%
  drop_na() %>%
  filter(MajorsubgroupOccupation_labelled != "NA") %>%
  kable() %>%
  kable_styling(full_width = F)
```

#### UnitOccupation

These are indeed the 104 'minor groups'.
```{r} data %>% distinct(UnitOccupation_labelled) %>% drop_na %>% kable() %>% kable_styling(full_width = F) ``` ### Sectoral pay differences #### Weekly^[[outputs/data/sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_pay_weekly.csv)] ```{r sector-bubble-weekly} sector_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), ) summary_weekly <- sector_summary_pay %>% group_by(SectorName_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup() write_csv(sector_summary_pay, file="../outputs/data/sector_summary_pay_weekly.csv") plot_data <- sector_summary_pay %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup() # Filter for 'outsourced' level and reorder SectorName_short not_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, N, .desc = FALSE)) # outsourced <- plot_data %>% # filter(outsourcing_status == 'Outsourced') %>% # mutate( # rank = rank(desc(perc)) # ) 
# Apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), ) %>% arrange(desc(SectorName_short)) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(SectorName_short, n) %>% group_by(SectorName_short) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) plot_data %>% # mutate( # SectorName = as.factor(SectorName) # ) %>% ggplot(., aes(wtd_avg_income,SectorName_short, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank())+ #coord_flip() + scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 100)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=SectorName_short, label = paste0("N = ", N)), hjust=1) + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, SectorName_short, colour = outsourcing_status, label=paste0("n=",n)), size=3) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average weekly income") + ylab("Sector") + labs(caption = "Size of bubble represents the size of the respective workforce within the sector") sectors_of_interest <- unique(plot_data$SectorName_labelled) sectors_of_interest <- sectors_of_interest[1:13] %>% stringr::str_to_title() ``` #### Hourly^[[outputs/data/sector_summary_pay_hourly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_pay_hourly.csv)] ```{r sector-bubble-hourly} sector_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_hourly_all, na.rm=T), wtd_avg_income = 
weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), ) summary_hourly <- sector_summary_pay %>% group_by(SectorName_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup() write_csv(sector_summary_pay, file="../outputs/data/sector_summary_pay_hourly.csv") plot_data <- sector_summary_pay %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup() # Filter for 'outsourced' level and reorder SectorName_short not_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, N, .desc = FALSE)) # outsourced <- plot_data %>% # filter(outsourcing_status == 'Outsourced') %>% # mutate( # rank = rank(desc(perc)) # ) # Apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), ) %>% arrange(desc(SectorName_short)) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(SectorName_short, n) %>% group_by(SectorName_short) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) plot_data %>% # mutate( # SectorName = as.factor(SectorName) # ) %>% ggplot(., aes(wtd_avg_income,SectorName_short, size = perc, 
colour = outsourcing_status)) + geom_point(position = "dodge") + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank())+ #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=10)) + # seq(0,max(plot_data$wtd_avg_income, na.rm=T), 100)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=SectorName_short, label = paste0("N = ", N)), hjust=1) + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, SectorName_short, colour = outsourcing_status, label=paste0("n=",n)), size=3) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average hourly income") + ylab("Sector") + labs(caption = "Size of bubble represents the size of the respective workforce within the sector") sectors_of_interest <- unique(plot_data$SectorName_labelled) sectors_of_interest <- sectors_of_interest[1:13] %>% stringr::str_to_title() ``` #### Comparing pay penalty between weekly and hourly Note only consider n >= 10 ```{r} # add pay frame flags summary_weekly <- summary_weekly %>% mutate( pay_frame = "weekly" ) summary_hourly <- summary_hourly %>% mutate( pay_frame = "hourly" ) # combine summary_combined <- dplyr::bind_rows(summary_weekly,summary_hourly) summary_combined2 <- summary_combined %>% filter(!is.na(SectorName_labelled)) %>% pivot_wider(id_cols = c(SectorName_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) summary_combined3 <- summary_combined2 %>% pivot_wider(id_cols = sector_name_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff") %>% mutate( pattern_reverse = ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0 ) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0 ) ) ``` The table below shows the pay difference between outsourced and non-outsourced workers by sector. 
Negative values indicate pay penalties for outsourced workers. The 'pattern_reverse' column indicates the `r sum(summary_combined3$pattern_reverse, na.rm=T)` sectors where the direction of the difference changes depending on whether you consider hourly or weekly pay. For example, per week, outsourced workers in PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES earn £1.77 less than their non-outsourced counterparts, but per hour they are paid on average £1.3 more than non-outsourced workers. This suggests that outsourced rates are higher in this sector, but the amount of work available is not enough for outsourced people to earn more than non-outsourced people on a weekly basis. The reverse pattern indicates sectors where outsourced workers are paid less per hour but work more hours and earn more per week than their non-outsourced counterparts.

```{r}
summary_combined3 %>%
  filter(!is.na(weekly_pay_diff)) %>%
  arrange(weekly_pay_diff) %>%
  kable(caption = "Weekly and hourly pay difference by sector") %>%
  kable_styling(full_width = F)
```

### Major group occupations

#### Weekly^[[outputs/data/major_subgroup_occupation_in_sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_in_sector_summary_pay_weekly.csv)]

Here we look at major subgroup occupations within sectors. We only consider the sectors down to 'Other services', as the remaining sectors have a small n for the outsourced group. Note you can find larger images for these plots in [outputs/figures/occupation_pay_plots](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/figures/occupation_pay_plots).

The figures indicate there is variation between occupations within sectors in terms of whether outsourced people are paid less or more than non-outsourced workers.
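Because small cell sizes are the main caveat here, the following is a minimal sketch of the kind of minimum-n filter used for the comparison tables, run on made-up counts. The data frame `cells` and its values are assumptions for illustration. The `n_groups == 2` condition is an extra assumption that is not part of the pipeline in this document: it additionally drops occupations that have no outsourced (or no non-outsourced) respondents at all.

```r
library(dplyr)

# Toy cell counts (assumed values): one row per occupation x outsourcing status
cells <- data.frame(
  occupation = c("A", "A", "B", "B", "C"),
  outsourcing_status = c("Outsourced", "Not outsourced",
                         "Outsourced", "Not outsourced",
                         "Not outsourced"),
  n = c(12, 40, 4, 55, 30)
)

kept <- cells %>%
  group_by(occupation) %>%
  mutate(
    min_n = min(n),                              # smallest cell within the occupation
    n_groups = n_distinct(outsourcing_status)    # how many of the two groups are present
  ) %>%
  ungroup() %>%
  # keep occupations where every cell has n >= 10 and both groups are present
  filter(min_n >= 10, n_groups == 2)

kept
```

Occupation B is dropped because its outsourced cell has only 4 respondents, and occupation C is dropped because it has no outsourced respondents, so only occupation A remains for comparison.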
```{r} #| height: 10 #| width: 10 occ_in_sect_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(SectorName, SectorName_labelled, MajorsubgroupOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName_labelled, MajorsubgroupOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), MajorsubgroupOccupation_labelled = case_when(MajorsubgroupOccupation_labelled == "NA" ~ NA, TRUE ~ MajorsubgroupOccupation_labelled), MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled), SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) summary_weekly <- occ_in_sect_summary_pay %>% group_by(SectorName_labelled,MajorsubgroupOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup() %>% mutate( pay_frame = "weekly" ) write_csv(occ_in_sect_summary_pay, file="../outputs/data/major_subgroup_occupation_in_sector_summary_pay_weekly.csv") for(sector in sectors_of_interest){ #print(sector) # subset to this sector and drop na occupatoins plot_data <- occ_in_sect_summary_pay %>% filter(SectorName_labelled == sector) %>% filter(!is.na(MajorsubgroupOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>% distinct(MajorsubgroupOccupation_labelled, N) %>% mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # 
mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( MajorsubgroupOccupation_labelled = factor(MajorsubgroupOccupation_labelled, levels = levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>% group_by(MajorsubgroupOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average weekly income") + ylab("Major subgroup occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle(sector) show(p) ggsave(here('outputs','figures','occupation_pay_plots',paste0('major_subgroup_occupation_pay_plot_weekly_', sector, '.png')), height = 8, width = 8, dpi=800, bg="white") } ``` #### Hourly^[[outputs/data/major_subgroup_occupation_in_sector_summary_pay_hourly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_in_sector_summary_pay_hourly.csv)] ```{r} #| 
height: 10 #| width: 10 occ_in_sect_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>% group_by(SectorName, SectorName_labelled, MajorsubgroupOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_hourly_all, na.rm=T), wtd_avg_income = weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName_labelled, MajorsubgroupOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), MajorsubgroupOccupation_labelled = case_when(MajorsubgroupOccupation_labelled == "NA" ~ NA, TRUE ~ MajorsubgroupOccupation_labelled), MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled), SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) summary_hourly <- occ_in_sect_summary_pay %>% group_by(SectorName_labelled,MajorsubgroupOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup() %>% mutate( pay_frame = "hourly" ) write_csv(occ_in_sect_summary_pay, file="../outputs/data/major_subgroup_occupation_in_sector_summary_pay_hourly.csv") for(sector in sectors_of_interest){ #print(sector) # subset to this sector and drop na occupatoins plot_data <- occ_in_sect_summary_pay %>% filter(SectorName_labelled == sector) %>% filter(!is.na(MajorsubgroupOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>% distinct(MajorsubgroupOccupation_labelled, N) %>% mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(MajorsubgroupOccupation_labelled = 
forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( MajorsubgroupOccupation_labelled = factor(MajorsubgroupOccupation_labelled, levels = levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>% group_by(MajorsubgroupOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=10)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average hourly income") + ylab("Major subgroup occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle(sector) show(p) ggsave(here('outputs','figures','occupation_pay_plots',paste0('major_subgroup_occupation_pay_plot_hourly_', sector, '.png')), height = 8, width = 8, dpi=800, bg="white") } ``` #### Comparing pay penalty between weekly and hourly Note only consider n >= 10 The table below shows the weekly and hourly pay difference between outsourced and non-outsourced workers by major group occupation. 
As before, negative values indicate pay penalties for outsourced workers, and the 'pattern_reverse' column indicates the occupations where the direction of the difference is different if you consider hourly versus weekly pay difference. ```{r} # combine summary_combined <- dplyr::bind_rows(summary_weekly,summary_hourly) # Function for processing occupations within sectors. Makes a kable table for occupations in each sector (as long as there's both outsourced and non-outsourced entries for the occupation) comparison_table <- function(df, within_sectors = TRUE){ if(within_sectors){ caption_text = "within" } else{ caption_text = "across" } sectors <- unique(df[["SectorName_labelled"]]) sectors <- sectors[!is.na(sectors)] output_list <- vector('list', length(sectors)) for(i in 1:length(sectors)){ sector <- sectors[i] this_data <- df %>% filter(SectorName_labelled == sector) %>% filter(!is.na(MajorsubgroupOccupation_labelled)) %>% droplevels() %>% ungroup() if(length(unique(this_data$outsourcing_status)) == 2){ this_data <- this_data %>% pivot_wider(id_cols = c(MajorsubgroupOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% pivot_wider(id_cols = majorsubgroup_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff") %>% mutate( pattern_reverse = ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0 ) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0 ) ) # output_list[[i]] <- this_data } else{ output_list[[i]] <- NA next } # k <- this_data %>% # filter(!is.na(weekly_pay_diff)) %>% # arrange(weekly_pay_diff) %>% # kable(caption = paste("Weekly and hourly pay difference by major group occupations", caption_text, sector, sep = " ")) %>% # kable_styling(full_width = F) # output_list[[i]] <- k output_list[[i]] <- this_data names(output_list)[i] <- sector } 
return(output_list) } tables <- comparison_table(summary_combined, within_sectors = F) # Print the kable tables # for(i in 1:length(tables)){ # if(!is.na(tables[i])){ # tables[[i]] %>% # filter(!is.na(weekly_pay_diff)) %>% # arrange(weekly_pay_diff) %>% # kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% # kable_styling(full_width = F) %>% # print() # } # } ``` ```{r} #| warning: false #| error: false # there's got to be better way but can't find it # this prints the kable tables for the sectors i <- 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i +1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference 
by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = 
paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) ``` ### Major group occupations across all sectors Note I only consider unit occupations where the the minimum n is >= 10. #### Weekly^[[outputs/data/major_subgroup_across_sectors_occupation_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_across_sectors_occupation_summary_pay_weekly.csv)] ```{r} #| height: 20 #| width: 10 major_occ_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(MajorsubgroupOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(MajorsubgroupOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), MajorsubgroupOccupation_labelled = case_when(MajorsubgroupOccupation_labelled == "NA" ~ NA, TRUE ~ MajorsubgroupOccupation_labelled), MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled) ) %>% ungroup() write_csv(major_occ_summary_pay, file="../outputs/data/major_subgroup_across_sectors_occupation_summary_pay_weekly.csv") # need to identify the unit occs that have an ok n # subste to occs with n>=10 unit_subset_weekly <- major_occ_summary_pay %>% group_by(MajorsubgroupOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) unit_subset <- unit_subset_weekly # create a df with occs where outsourced paid less so we can just list it paid_less <- unit_subset %>% 
pivot_wider(id_cols = c(MajorsubgroupOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% filter( pay_penalty < 0 ) write_csv(paid_less, file="../outputs/data/major_subgroup_occupation_weekly_pay_penalty_across_sectors.csv") #print(sector) # subset to this sector and drop na occupatoins plot_data <- unit_subset %>% filter(!is.na(MajorsubgroupOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>% distinct(MajorsubgroupOccupation_labelled, N) %>% mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( MajorsubgroupOccupation_labelled = factor(MajorsubgroupOccupation_labelled, levels = levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>% group_by(MajorsubgroupOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + 
theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average weekly income") + ylab("Major sub group occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle("All sectors") show(p) ggsave(here('outputs','figures','occupation_pay_plots','major_subgroup_occupation_all_sectors_pay_plot.png'), height = 8, width = 8, dpi=800, bg="white") ``` Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/major_subgroup_occupation_weekly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_weekly_pay_penalty_across_sectors.csv)] ```{r} paid_less %>% arrange(pay_penalty) %>% relocate( pay_penalty, .after = majorsubgroup_occupation_labelled ) %>% kable(caption = "Weekly pay penalty for major subgroup occupations across all sectors") %>% kable_styling() ``` #### Hourly^[[outputs/data/major_subgroup_occupation_summary_pay_hourly_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_summary_pay_hourly_across_sectors.csv)] ```{r} #| height: 10 #| width: 10 major_occ_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>% group_by(MajorsubgroupOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_hourly_all, na.rm=T), wtd_avg_income = 
weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(MajorsubgroupOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), MajorsubgroupOccupation_labelled = case_when(MajorsubgroupOccupation_labelled == "NA" ~ NA, TRUE ~ MajorsubgroupOccupation_labelled), MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled) ) %>% ungroup() write_csv(major_occ_summary_pay, file="../outputs/data/major_subgroup_occupation_summary_pay_hourly_across_sectors.csv") # need to identify the unit occs that have an ok n # subste to occs with n>=10 unit_subset_hourly <- major_occ_summary_pay %>% group_by(MajorsubgroupOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) unit_subset <- unit_subset_hourly # create a df with occs where outsourced paid less so we can just list it paid_less <- unit_subset %>% pivot_wider(id_cols = c(MajorsubgroupOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% filter( pay_penalty < 0 ) write_csv(paid_less, file="../outputs/data/major_subgroup_occupation_hourly_pay_penalty_across_sectors.csv") #print(sector) # subset to this sector and drop na occupatoins plot_data <- unit_subset %>% filter(!is.na(MajorsubgroupOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>% distinct(MajorsubgroupOccupation_labelled, N) %>% mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(MajorsubgroupOccupation_labelled = 
forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( MajorsubgroupOccupation_labelled = factor(MajorsubgroupOccupation_labelled, levels = levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(MajorsubgroupOccupation_labelled, n) %>% group_by(MajorsubgroupOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average hourly income") + ylab("Major subgroup occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle("All sectors") show(p) ggsave(here('outputs','figures','occupation_pay_plots','major_occupation_pay_plot_hourly_all_sectors.png'), height = 8, width = 8, dpi=800, bg="white") ``` Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced 
counterparts:^[[outputs/data/major_subgroup_occupation_hourly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/major_subgroup_occupation_hourly_pay_penalty_across_sectors.csv)]

```{r}
paid_less %>%
  arrange(pay_penalty) %>%
  relocate(
    pay_penalty, .after = majorsubgroup_occupation_labelled
  ) %>%
  kable(caption = "Hourly pay penalty for major subgroup occupations across all sectors") %>%
  kable_styling()
```

#### Comparing pay penalty between weekly and hourly

Note: only occupations where the minimum n is >= 10 are considered.

```{r}
# add pay frame flags
unit_subset_weekly <- unit_subset_weekly %>%
  mutate(
    pay_frame = "weekly"
  )

unit_subset_hourly <- unit_subset_hourly %>%
  mutate(
    pay_frame = "hourly"
  )

# combine
unit_subset_combined <- dplyr::bind_rows(unit_subset_weekly,unit_subset_hourly)

unit_subset_combined2 <- unit_subset_combined %>%
  pivot_wider(id_cols = c(MajorsubgroupOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced
  )

unit_subset_combined3 <- unit_subset_combined2 %>%
  pivot_wider(id_cols = majorsubgroup_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff")

# pivot again to compare hourly and weekly pay differences
unit_subset_combined4 <- unit_subset_combined3 %>%
  mutate(
    pattern_reverse = ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0 ) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0 )
  )
```

The table below shows the weekly and hourly pay difference between outsourced and non-outsourced workers by major group occupation. As before, negative values indicate pay penalties for outsourced workers, and the 'pattern_reverse' column flags the occupations where the direction of the difference changes depending on whether hourly or weekly pay is considered.
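To make the 'pattern_reverse' flag concrete, the following is a minimal sketch of the same sign comparison on a toy data frame. The occupation labels and pay differences are made up for illustration and are not survey results; the sketch uses base R only and is shown in a non-executed fence so it does not affect the rendered report.

```r
# Hypothetical pay differences (outsourced minus not outsourced).
# These values are illustrative only and do not come from the survey.
toy <- data.frame(
  occupation      = c("A", "B", "C", "D"),
  weekly_pay_diff = c(-50, 20, -10, 15),
  hourly_pay_diff = c(0.30, -0.25, -0.40, 0.10)
)

# Flag rows where the direction of the difference flips between
# the weekly and hourly pay frames (same condition as in the analysis).
toy$pattern_reverse <- ifelse(
  (toy$weekly_pay_diff < 0 & toy$hourly_pay_diff >= 0) |
    (toy$weekly_pay_diff >= 0 & toy$hourly_pay_diff < 0),
  1, 0
)

print(toy)
```

In this toy example, occupations A and B are flagged because their weekly and hourly differences have opposite signs, while C and D are not.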
```{r} unit_subset_combined4 %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = "Weekly and hourly pay difference by major sub group occupation") %>% kable_styling(full_width = F) ``` ### Minor group occupations within sectors #### Weekly^[[outputs/data/minor_group_occupation_in_sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_summary_pay_weekly.csv)] Note I only consider unit occupations where the the minimum n is >= 10. ```{r} #| height: 10 #| width: 10 unit_occ_in_sect_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(SectorName, SectorName_labelled, UnitOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName_labelled, UnitOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), UnitOccupation_labelled = case_when(UnitOccupation_labelled == "NA" ~ NA, TRUE ~ UnitOccupation_labelled), UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled), SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) %>% ungroup() summary_weekly <- unit_occ_in_sect_summary_pay %>% group_by(SectorName_labelled,UnitOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% # need to identify the unit occs that have an ok n ungroup() %>% mutate( pay_frame = "weekly" ) write_csv(unit_occ_in_sect_summary_pay, file="../outputs/data/minor_group_occupation_in_sector_summary_pay_weekly.csv") unit_subset <- summary_weekly # create a df with occs where outsourced paid less so we can just list it paid_less <- unit_subset %>% pivot_wider(id_cols = c(SectorName_labelled, UnitOccupation_labelled), names_from = outsourcing_status, 
values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% filter( pay_penalty < 0 ) write_csv(paid_less, file="../outputs/data/minor_group_occupation_in_sector_weekly_pay_penalty.csv") for(sector in sectors_of_interest){ #print(sector) # subset to this sector and drop na occupatoins plot_data <- unit_subset %>% filter(SectorName_labelled == sector) %>% filter(!is.na(UnitOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>% distinct(UnitOccupation_labelled, N) %>% mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(UnitOccupation_labelled, n) %>% group_by(UnitOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=5)) + 
#breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average weekly income") + ylab("Unit occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle(sector) show(p) ggsave(here('outputs','figures','occupation_pay_plots',paste0('unit_occupation_pay_plot_weekly_', sector, '.png')), height = 8, width = 8, dpi=800, bg="white") } ``` Many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_in_sector_weekly_pay_penalty.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_weekly_pay_penalty.csv)] ```{r} paid_less %>% arrange(pay_penalty) %>% relocate( pay_penalty, .after = unit_occupation_labelled ) %>% kable(caption = "Weekly pay penalty for unit occupations within sectors") %>% kable_styling() ``` #### Hourly^[[outputs/data/minor_group_occupation_in_sector_summary_pay_hourly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_summary_pay_hourly.csv)] Note I only consider unit occupations where the the minimum n is >= 10. 
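As a rough illustration of the minimum-n rule, the sketch below keeps an occupation only if every outsourcing group within it has at least 10 respondents. The data frame and its values are hypothetical, and the sketch uses base R (`ave()`) rather than the `group_by()`/`mutate()` pipeline used in the analysis; it is shown in a non-executed fence.

```r
# Hypothetical cell counts per occupation and outsourcing status
# (illustrative values only, not taken from the survey).
toy_counts <- data.frame(
  occupation         = c("A", "A", "B", "B"),
  outsourcing_status = c("Outsourced", "Not outsourced",
                         "Outsourced", "Not outsourced"),
  n                  = c(12, 40, 4, 55)
)

# Smallest cell size within each occupation, repeated on every row of that occupation
toy_counts$min_n <- ave(toy_counts$n, toy_counts$occupation, FUN = min)

# Keep only occupations whose smallest cell has at least 10 respondents
kept <- toy_counts[toy_counts$min_n >= 10, ]

print(kept)
```

Occupation B is dropped here because its outsourced cell has only 4 respondents, even though its non-outsourced cell is large.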
```{r} #| height: 10 #| width: 10 unit_occ_in_sect_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>% group_by(SectorName, SectorName_labelled, UnitOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_hourly_all, na.rm=T), wtd_avg_income = weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName_labelled, UnitOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), UnitOccupation_labelled = case_when(UnitOccupation_labelled == "NA" ~ NA, TRUE ~ UnitOccupation_labelled), UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled), SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) %>% ungroup() summary_hourly <- unit_occ_in_sect_summary_pay %>% group_by(SectorName_labelled,UnitOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) %>% ungroup() %>% mutate( pay_frame = "hourly" ) write_csv(unit_occ_in_sect_summary_pay, file="../outputs/data/minor_group_occupation_in_sector_summary_pay_hourly.csv") # need to identify the unit occs that have an ok n unit_subset <- summary_hourly # create a df with occs where outsourced paid less so we can just list it paid_less <- unit_subset %>% pivot_wider(id_cols = c(SectorName_labelled, UnitOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% filter( pay_penalty < 0 ) write_csv(paid_less, file="../outputs/data/minor_group_occupation_in_sector_hourly_pay_penalty.csv") for(sector in sectors_of_interest){ #print(sector) # subset to this sector and drop na occupatoins plot_data <- unit_subset %>% filter(SectorName_labelled == sector) %>% filter(!is.na(UnitOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First 
filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>% distinct(UnitOccupation_labelled, N) %>% mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(UnitOccupation_labelled, n) %>% group_by(UnitOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult xlab("Weighted average hourly income") + ylab("Unit occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle(sector) show(p) 
ggsave(here('outputs','figures','occupation_pay_plots',paste0('unit_occupation_pay_plot_hourly_', sector, '.png')), height = 8, width = 8, dpi=800, bg="white") } ``` Many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_in_sector_hourly_pay_penalty.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_in_sector_hourly_pay_penalty.csv)] ```{r} paid_less %>% arrange(pay_penalty) %>% relocate( pay_penalty, .after = unit_occupation_labelled ) %>% kable(caption = "Hourly pay penalty for unit occupations within sectors") %>% kable_styling() ``` #### Comparing pay penalty between weekly and hourly Note only consider n >= 10 ```{r} # combine summary_combined <- dplyr::bind_rows(summary_weekly,summary_hourly) # Function for processing minor group occupations within sectors. Makes a kable table for occupations in each sector (as long as there's both outsourced and non-outsourced entries for the occupation) comparison_table2 <- function(df, within_sectors = TRUE){ if(within_sectors){ caption_text = "within" } else{ caption_text = "across" } sectors <- unique(df[["SectorName_labelled"]]) sectors <- sectors[!is.na(sectors)] output_list <- vector('list', length(sectors)) for(i in 1:length(sectors)){ sector <- sectors[i] this_data <- df %>% filter(SectorName_labelled == sector) %>% filter(!is.na(UnitOccupation_labelled)) %>% droplevels() %>% ungroup() if(length(unique(this_data$outsourcing_status)) == 2){ this_data <- this_data %>% pivot_wider(id_cols = c(UnitOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% pivot_wider(id_cols = unit_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff") %>% mutate( pattern_reverse = 
ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0 ) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0 ) ) # output_list[[i]] <- this_data } else{ output_list[[i]] <- NA next } # k <- this_data %>% # filter(!is.na(weekly_pay_diff)) %>% # arrange(weekly_pay_diff) %>% # kable(caption = paste("Weekly and hourly pay difference by minor group occupations", caption_text, sector, sep = " ")) %>% # kable_styling(full_width = F) # # output_list[[i]] <- k output_list[[i]] <- this_data names(output_list)[i] <- sector } return(output_list) } tables <- comparison_table2(summary_combined) # Print the kable tables # for(i in 1:length(tables)){ # if(!is.na(tables[[i]])){ # print(tables[[i]]) # } # } ``` ```{r} #| warning: false #| error: false # there's got to be better way but can't find it # this prints the kable tables for the sectors i <- 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i +1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay 
difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) i <- i + 1 try( tables[[i]] %>% filter(!is.na(weekly_pay_diff)) %>% arrange(weekly_pay_diff) %>% 
kable(caption = paste("Weekly and hourly pay difference by major group occupations within", names(tables)[i], sep = " ")) %>% kable_styling(full_width = F) ) ``` ### Minor group occupations across all sectors Note I only consider unit occupations where the the minimum n is >= 10. #### Weekly^[[outputs/data/minor_group_occupation_summary_pay_weekly_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_summary_pay_weekly_across_sectors.csv)] ```{r} #| height: 20 #| width: 10 unit_occ_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(UnitOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(UnitOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), UnitOccupation_labelled = case_when(UnitOccupation_labelled == "NA" ~ NA, TRUE ~ UnitOccupation_labelled), UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled) ) %>% ungroup() write_csv(unit_occ_summary_pay, file="../outputs/data/minor_group_occupation_summary_pay_weekly_across_sectors.csv") # need to identify the unit occs that have an ok n # subste to occs with n>=10 unit_subset_weekly <- unit_occ_summary_pay %>% group_by(UnitOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) unit_subset <- unit_subset_weekly # create a df with occs where outsourced paid less so we can just list it paid_less <- unit_subset %>% pivot_wider(id_cols = c(UnitOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% filter( pay_penalty < 0 ) write_csv(paid_less, 
file="../outputs/data/minor_group_occupation_weekly_pay_penalty_across_sectors.csv") #print(sector) # subset to this sector and drop na occupatoins plot_data <- unit_subset %>% filter(!is.na(UnitOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>% distinct(UnitOccupation_labelled, N) %>% mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(UnitOccupation_labelled, n) %>% group_by(UnitOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) + geom_point(position = "dodge") + geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) + theme_minimal() + theme(legend.position = "bottom", legend.title = element_blank()) + #coord_flip() + scale_x_continuous(breaks=scales::breaks_pretty(n=5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) + scale_colour_manual(values=colours) + geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label = paste0("N = ", N)), hjust=1) + guides(size=FALSE) + # remove size legend as gauging size is difficult 
xlab("Weighted average weekly income") + ylab("Unit occupation") + labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") + ggtitle("All sectors") show(p) ggsave(here('outputs','figures','occupation_pay_plots','unit_occupation_pay_plot_all_sectors.png'), height = 8, width = 8, dpi=800, bg="white") ``` Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_weekly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_weekly_pay_penalty_across_sectors.csv)] ```{r} paid_less %>% arrange(pay_penalty) %>% relocate( pay_penalty, .after = unit_occupation_labelled ) %>% kable(caption = "Weekly pay penalty for unit occupations across all sectors") %>% kable_styling() ``` #### Hourly^[[outputs/data/minor_group_occupation_summary_pay_hourly_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_summary_pay_hourly_across_sectors.csv)] ```{r} #| height: 10 #| width: 10 unit_occ_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_hourly_all)) %>% group_by(UnitOccupation_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_hourly_all, na.rm=T), wtd_avg_income = weighted.mean(income_hourly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(UnitOccupation_labelled) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), UnitOccupation_labelled = case_when(UnitOccupation_labelled == "NA" ~ NA, TRUE ~ UnitOccupation_labelled), UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled) ) %>% ungroup() write_csv(unit_occ_summary_pay, file="../outputs/data/minor_group_occupation_summary_pay_hourly_across_sectors.csv") # need to identify 
the unit occs that have an ok n # subste to occs with n>=10 unit_subset_hourly <- unit_occ_summary_pay %>% group_by(UnitOccupation_labelled) %>% mutate( min_n = min(n, na.rm=TRUE) ) %>% filter(min_n >= 10) unit_subset <- unit_subset_hourly # create a df with occs where outsourced paid less so we can just list it paid_less <- unit_subset %>% pivot_wider(id_cols = c(UnitOccupation_labelled), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>% janitor::clean_names() %>% mutate( pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>% filter( pay_penalty < 0 ) write_csv(paid_less, file="../outputs/data/minor_group_occupation_hourly_pay_penalty_across_sectors.csv") #print(sector) # subset to this sector and drop na occupatoins plot_data <- unit_subset %>% filter(!is.na(UnitOccupation_labelled)) %>% droplevels() %>% ungroup() # Order occs by N # First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>% dplyr::select(UnitOccupation_labelled, outsourcing_status, N) %>% distinct(UnitOccupation_labelled, N) %>% mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # not_outsourced_levels <- plot_data %>% # filter(outsourcing_status == 'Not outsourced') %>% # mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE)) # Then apply the reordered levels back to the original data plot_data <- plot_data %>% mutate( UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data %>% #filter(outsourcing_status == "Not outsourced") %>% dplyr::select(UnitOccupation_labelled, n) %>% group_by(UnitOccupation_labelled) %>% summarise( N = sum(n) ) %>% mutate( ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2 ) p <- plot_data %>% ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) + 
  geom_point(position = "dodge") +
  geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)),
                   size=3,
                   #force_pull = 2
  ) +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  #coord_flip() +
  scale_x_continuous(breaks=scales::breaks_pretty(n=5)) + #breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +
  scale_colour_manual(values=colours) +
  geom_text(inherit.aes=F, data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label = paste0("N = ", N)), hjust=1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average hourly income") +
  ylab("Unit occupation") +
  labs(caption = "Size of bubble represents the size of the respective workforce within the occupation") +
  ggtitle("All sectors")

show(p)

ggsave(here('outputs','figures','occupation_pay_plots','unit_occupation_pay_plot_hourly_all_sectors.png'), height = 8, width = 8, dpi=800, bg="white")
```

Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/minor_group_occupation_hourly_pay_penalty_across_sectors.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/minor_group_occupation_hourly_pay_penalty_across_sectors.csv)]

```{r}
paid_less %>%
  arrange(pay_penalty) %>%
  relocate(
    pay_penalty, .after = unit_occupation_labelled
  ) %>%
  kable(caption = "Hourly pay penalty for unit occupations across all sectors") %>%
  kable_styling()
```

#### Comparing pay penalty between weekly and hourly

Note: only occupations with n >= 10 in each outsourcing group are considered.

```{r}
# add pay frame flags
unit_subset_weekly <- unit_subset_weekly %>%
  mutate(
    pay_frame = "weekly"
  )

unit_subset_hourly <- unit_subset_hourly %>%
  mutate(
    pay_frame = "hourly"
  )

# combine
unit_subset_combined <- dplyr::bind_rows(unit_subset_weekly,unit_subset_hourly)

unit_subset_combined2 <- unit_subset_combined %>%
  pivot_wider(id_cols = c(UnitOccupation_labelled, pay_frame), names_from = outsourcing_status, values_from = c(wtd_avg_income, n)) %>%
  janitor::clean_names() %>%
  mutate(
    pay_diff = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced
  )

# pivot again to compare hourly and weekly pay differences
unit_subset_combined3 <- unit_subset_combined2 %>%
  pivot_wider(id_cols = unit_occupation_labelled, names_from = pay_frame, values_from = pay_diff, names_glue = "{pay_frame}_pay_diff")

# flag occupations where the weekly and hourly differences have opposite signs
unit_subset_combined4 <- unit_subset_combined3 %>%
  mutate(
    pattern_reverse = ifelse((weekly_pay_diff < 0 & hourly_pay_diff >= 0 ) | (weekly_pay_diff >= 0 & hourly_pay_diff < 0), 1, 0 )
  )
```

The table below shows the pay difference between outsourced and non-outsourced workers by minor sub group occupation. Negative values indicate pay penalties for outsourced workers. The 'pattern_reverse' column indicates the four occupations where the direction of the difference changes depending on whether hourly or weekly pay is considered.

For example, per week, teaching professionals who are outsourced earn £82 less than their non-outsourced counterparts, but per hour they are paid on average 16p more. This suggests that outsourced rates are higher in this occupation, but the amount of work available is not enough for outsourced people to earn more than non-outsourced people on a weekly basis. The reverse pattern is evident for the other three. For example, outsourced workers in food preparation and hospitality earn on average 40p less an hour than non-outsourced workers, but on average £17 more per week. This suggests that outsourced workers in this occupation are paid less but work more hours than their non-outsourced counterparts.
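The reversal described above can be sketched with a small, self-contained example. The figures are hypothetical (not survey estimates), the `toy_pay` data frame and its column names are invented for illustration, and weekly pay is assumed here to be simply the hourly rate multiplied by weekly hours, which is not how weekly income is measured in the survey. The flag applies the same sign rule as `pattern_reverse`. The chunk is marked `eval: false` so that it does not affect the rendered report.

```{r}
#| eval: false
# Hypothetical figures for illustration only; not taken from the survey data.
toy_pay <- data.frame(
  occupation = c("Occupation A", "Occupation B", "Occupation C"),
  hourly_out = c(15.00, 11.00, 12.00), # outsourced hourly rate (GBP)
  hourly_non = c(14.50, 11.50, 13.00), # non-outsourced hourly rate (GBP)
  hours_out  = c(25, 40, 35),          # outsourced weekly hours
  hours_non  = c(32, 36, 35)           # non-outsourced weekly hours
)

# Pay differences: outsourced minus non-outsourced (negative = pay penalty)
toy_pay$hourly_pay_diff <- toy_pay$hourly_out - toy_pay$hourly_non
toy_pay$weekly_pay_diff <- toy_pay$hourly_out * toy_pay$hours_out -
  toy_pay$hourly_non * toy_pay$hours_non

# 1 when the weekly and hourly differences point in opposite directions
toy_pay$pattern_reverse <- as.integer(
  (toy_pay$weekly_pay_diff < 0) != (toy_pay$hourly_pay_diff < 0)
)

toy_pay[, c("occupation", "hourly_pay_diff", "weekly_pay_diff", "pattern_reverse")]
```

In this sketch, Occupation A has a higher outsourced hourly rate but fewer hours, so the weekly difference is negative; Occupation B shows the opposite; Occupation C has a penalty on both measures and is not flagged.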
```{r}
unit_subset_combined4 %>%
  filter(!is.na(weekly_pay_diff)) %>%
  arrange(weekly_pay_diff) %>%
  kable(caption = "Weekly and hourly pay difference by minor sub group occupation") %>%
  kable_styling(full_width = F)
```

## London has a disproportionate share of the UK’s outsourced workers, followed by the East and West Midlands

::: {.callout-tip title="#regions"}
- In London, around 25% of workers are outsourced, the highest proportion of any region in the UK. London is followed by the East Midlands (19%) and the West Midlands (18%). The East of England has the lowest share of outsourced workers in its employed workforce, at 13%.

- Possible addition: Should this include some comment on WHY we think this might be the case? Should we look at sectoral splits in London, compared to everywhere else, to see whether there are significant sector differences that might explain this trend?
:::

The plot below shows the proportion of workers within each region who are outsourced.[^22]

[^22]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

```{r}
region_statistics_2 <- data %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region, outsourcing_status) %>%
  summarise(
    Frequency = sum(NatRepemployees),
    n = n(),
  ) %>%
  # still grouped by Region here, so sums are within region
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  rename(
    `Outsourcing status` = outsourcing_status
  ) %>%
  ungroup()

# order regions by the percentage of workers who are outsourced
reg_levels <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Outsourced") %>%
  mutate(
    Region = forcats::fct_reorder(Region, Percentage, .desc=FALSE)
  )

annotation_df <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Not outsourced") %>%
  dplyr::select(Region, N) %>%
  mutate(
    ypos = 100
  )

region_statistics_2 %>%
  mutate(
    Region = factor(Region, levels = levels(reg_levels$Region))
  ) %>%
  ggplot(., aes(Region, Percentage, fill = `Outsourcing status`)) +
  geom_col(colour="black") +
  geom_text(inherit.aes=F, data = annotation_df, aes(Region, ypos, label = paste0("N=",N)), hjust=1, nudge_y = -2) +
  coord_flip() +
  scale_fill_manual(values=many_colours) +
  theme_minimal()

readr::write_csv(region_statistics_2, file = "../outputs/data/region_stats_2.csv")

# outsourced rows for every region except London, used in the ranked list below
region_statistics_2_1 <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Outsourced" & Region != "London")

london_perc <- region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"]
```

Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (`r round(region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"],0)`%).

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region.svg')
```

The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:

1. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Percentage"],0)`%)
5. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Percentage"],0)`%)

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region_excl_london.svg')
```

```{r}
# each region's share of all outsourced workers in the UK
region_statistics_3 <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region) %>%
  summarise(
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(region_statistics_3, file = "../outputs/data/region_stats_3.csv")
```

We can also explore how the UK's outsourced workforce is distributed across the country.[^23] The table and map below show the outsourced workers in each region as a percentage of the UK's total outsourced workforce. They show where the UK's outsourced workforce is concentrated. The regions with the highest share of the UK's outsourced workforce are:

[^23]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)

1. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Percentage"],0)`%)
5. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Percentage"],0)`%)

```{r}
region_statistics_3 %>%
  mutate(
    Region = haven::as_factor(Region)
  ) %>%
  arrange(desc(Percentage)) %>%
  knitr::kable(.,digits = 2) %>%
  kable_styling(full_width = F)
```

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_distribution_across_regions.svg')
```
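
The two regional measures in this section use different denominators, which a minimal sketch may make clearer. The weighted counts below are hypothetical (not survey estimates), and `toy_regions` and its column names are invented for illustration. The first percentage is the share of a region's own workforce that is outsourced; the second is the region's share of all outsourced workers. The chunk is marked `eval: false` so that it does not affect the rendered report.

```{r}
#| eval: false
# Hypothetical weighted counts for illustration only; not survey estimates.
toy_regions <- data.frame(
  Region         = c("Region A", "Region B", "Region C"),
  outsourced     = c(50, 30, 20),
  not_outsourced = c(150, 270, 80)
)

# Denominator 1: everyone working in the region
toy_regions$pct_within_region <- 100 * toy_regions$outsourced /
  (toy_regions$outsourced + toy_regions$not_outsourced)

# Denominator 2: all outsourced workers across every region
toy_regions$pct_of_outsourced <- 100 * toy_regions$outsourced /
  sum(toy_regions$outsourced)

toy_regions
```

Here Region B holds a larger share of all outsourced workers than Region C, yet a smaller proportion of its own workforce is outsourced, so the two rankings need not agree.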
