An Analysis of the Impact of Union Membership on Wages Using Propensity Score
Matching
Brody Tyler Albregts
Department of Economics
Indiana University-Purdue University at Indianapolis
4/25/12
1. Introduction
Unions have been a powerful force in the labor market in the United States for well over a
century, beginning with the Knights of Labor in the late 1800s and reaching a peak in the
mid-twentieth century. Since then, economists have analyzed how unions affect workers’
earnings, and continually used new statistical techniques to uncover the benefits, or costs,
of labor unions. While it is widely accepted that union members earn more than non-members,
the size of that premium has been called into question. The goal of this paper is to use
propensity score matching to estimate the hourly wage gain, as a percentage, that members
receive from joining a union. Data were downloaded from the Panel Study of Income
Dynamics (PSID) for the year 2005. Using kernel matching, the estimated wage gain from
union membership falls from 23.8 percent before matching to 10.3 percent after matching
for the entire labor force, and from 22.4 percent to 16.4 percent for the private sector.
Estimated gains were much smaller for workers in the public sector, falling as low as
2 percent after matching; however, the post-matching results for that sector were not
significant at any meaningful level.
2. Literature Review
One of the more prominent papers on union wages using propensity score matching was
published by Alex Bryson (2002). Bryson used British data from the Workplace
Employee Relations Survey (WERS) and divided the data by occupational coverage
and workplace coverage in the private sector. Unfortunately, the PSID does not record
whether a person has occupational or workplace coverage; it records only whether the
person is a union member. Bryson found that before matching wage rates were, on average,
17-25 percent higher among union members, but after matching the premium fell to
3-6 percent. Bryson used nearest neighbor matching, while this paper relies on kernel
matching.
3. The Economic Model and Hypothesis
There may be several factors affecting pay between members and non-members.
Employers may collude with labor unions to pay lower wage rates to non-members and
then share the benefits with union members (Blakemore et al., 1986). However, as union
membership declines, it is unlikely that employers would be able to collude with unions to
pay non-members less than the market wage for their skills and experience, since
non-members increasingly have employment options elsewhere.
Another factor may be union spillovers to non-members, together with the threat that
non-members will join the workplace union if not given higher pay. Kahn and Curme (1987)
argue for spillover effects, concluding that they raise nonunion wages even after accounting
for employer-union collusion and other phenomena that lower wages. There seems to be
some agreement among economists that there is a spillover and/or threat effect which raises
nonunion wages. How far this extends outside of union-heavy industries is debatable. One
would expect very little spillover effect in an industry with almost no union presence.
However, if spillover effects exist, they would raise nonunion members’ wages,
which could affect the matching. Fortunately, the problem should be ameliorated by the
large number of non-members in the sample, especially in the private sector.
Finally, there are other benefits that union members may receive that are not wage related.
These include increased job security, pension plans, severance pay, and other social
benefits one might get from joining a union. Data on these benefits are hard to come by and
typically cannot be captured by econometric methods, but the benefits may carry monetary
value to the union member. These extra benefits may lower the gains to the wage rate; for
example, a union might bargain for better pensions in lieu of higher wages, or may use a
combination of higher wages and better pensions.
Given these factors, one would still expect the gains from union membership to decrease
after matching. As most data show, union members’ mean age and experience are
generally higher than those of non-members, which should increase their earnings regardless
of union membership. Bryson found similar results in Britain using the WERS data.
4. Econometrics Methodology
Although many econometric techniques exist, propensity score matching is especially well
suited to finding the real gains from union membership. Caliendo and Kopeinig (2008) offer
an overview of the basics of propensity score matching. The method starts from a treated
group (D=1) and an untreated group (D=0). In this case, those who are in a union are the
treated group, and those who are non-members are the untreated group. Each observation's
propensity score is its estimated probability of being treated given a set of independent
variables (age, sex, education level, etc.), and treated and untreated observations are then
matched on that score. One or both of two parameters are measured: the average treatment
effect (ATE) and the average treatment effect on the treated (ATT). The latter is the focus
of this paper. The ATT shows the gains of the treated from being in the treated group and
can be shown to equal:
$\tau_{ATT} = E(\tau \mid D=1) = E[Y(1) \mid D=1] - E[Y(0) \mid D=1]$
where Y(1) is the outcome an individual receives with the treatment and Y(0) is the outcome
the same individual receives without the treatment.
The second term, what the treated would have earned had they not been treated, is
subtracted from what the treated actually earned. Since it is impossible for an individual
to be in both the treated and untreated group, this counterfactual term cannot be observed
and is instead estimated, on average, from matched untreated observations.
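The distinction between the ATT and a simple comparison of group means can be illustrated with a small simulation. The sketch below is not based on the PSID data; every number in it (the experience range, the membership probabilities, the wage equation, and the 0.10 log-point gain) is a made-up assumption chosen only to show why the unmatched difference overstates the ATT when union members are more experienced.

```python
import random

def simulate_workers(n=5000, true_gain=0.10, seed=0):
    """Return (d, y0, y1) tuples: union status and both potential log wages.

    All parameters are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    workers = []
    for _ in range(n):
        exper = rng.uniform(0, 30)                    # years of experience
        p_union = 0.05 + 0.02 * exper                 # experienced workers join more often
        d = 1 if rng.random() < p_union else 0
        y0 = 2.5 + 0.02 * exper + rng.gauss(0, 0.3)   # log wage without union membership
        y1 = y0 + true_gain                           # log wage with union membership
        workers.append((d, y0, y1))
    return workers

def mean(values):
    return sum(values) / len(values)

workers = simulate_workers()
treated = [w for w in workers if w[0] == 1]
untreated = [w for w in workers if w[0] == 0]

# ATT: E[Y(1) | D=1] - E[Y(0) | D=1]. Computable here only because the data
# are simulated, so Y(0) is known for the treated.
att = mean([y1 for _, _, y1 in treated]) - mean([y0 for _, y0, _ in treated])

# Naive comparison: E[Y(1) | D=1] - E[Y(0) | D=0], which mixes in the experience gap.
naive = mean([y1 for _, _, y1 in treated]) - mean([y0 for _, y0, _ in untreated])

print(f"ATT = {att:.3f}, naive difference = {naive:.3f}")
```

In real data Y(0) is never observed for union members, which is why the second term has to be estimated from matched non-members.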
While there are several different matching methods that can be used, this paper will focus
on kernel matching. Kernel matching builds the counterfactual for each treated observation
from a weighted average of untreated observations, with weights that decline as the distance
in propensity score grows. Caliendo and Kopeinig analyzed kernel matching and found a
decrease in the variance associated with this type of matching, meaning a lower standard
error and a higher t-statistic. However, there is the possibility of bad matches using kernel
matching, which can influence the ATT.
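A minimal sketch of a kernel matching estimator of the ATT follows. It assumes the propensity scores have already been estimated, and it uses an Epanechnikov kernel with a bandwidth of 0.06; the paper does not report which kernel or bandwidth was used, so both are assumptions, as are the function names and the toy data.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive weight only when |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def kernel_att(treated, controls, bandwidth=0.06):
    """Estimate the ATT from (propensity_score, outcome) pairs.

    Each treated outcome is compared with a kernel-weighted average of
    control outcomes. Treated units with no control inside the bandwidth
    are skipped (treated as off support in this sketch).
    """
    gains = []
    for p_i, y_i in treated:
        weights = [epanechnikov((p_j - p_i) / bandwidth) for p_j, _ in controls]
        total = sum(weights)
        if total == 0:
            continue
        y0_hat = sum(w * y_j for w, (_, y_j) in zip(weights, controls)) / total
        gains.append(y_i - y0_hat)
    if not gains:
        raise ValueError("no treated observation has a control within the bandwidth")
    return sum(gains) / len(gains), len(gains)

# Hypothetical toy data: (propensity score, log wage)
treated = [(0.30, 3.10), (0.50, 3.20), (0.95, 3.40)]
controls = [(0.28, 3.00), (0.32, 3.02), (0.48, 3.05), (0.53, 3.10)]

att, n_matched = kernel_att(treated, controls)
print(f"ATT = {att:.4f} using {n_matched} of {len(treated)} treated observations")
```

The treated observation with a score of 0.95 has no control nearby and drops out. A wider bandwidth would keep more treated observations but would also pull in more distant, and potentially worse, matches.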
A useful paper by Markus Frolich (2004) analyzed the benefits of different types of
matching techniques. Frolich argued that matching each treated observation to a single
untreated observation, as one-to-one nearest neighbor matching does, was “inefficient”, and
tested a variety of other techniques including kernel matching, local linear matching, and
k-nearest-neighbor matching. Further, Frolich found that kernel matching seemed to work
best when there were a large number of untreated compared to treated observations, which is
the case with the data being used in this paper.
After the data were downloaded for this paper, several interaction terms were generated.
Several matching methods were then tried. Nearest neighbor matching was used first;
however, this method proved to be very erratic, and outcomes would change drastically with
the addition or removal of a single independent variable. Kernel matching was then used,
which produced more stable results. Even with slightly different variables in each
specification, results tended to be close to the outcome reported, with a consistently
significant t-statistic.
5. Data
The data were downloaded from the Panel Study of Income Dynamics for year 2005. Data
are collected from the same individuals every two years by the University of Michigan.
After pertinent variables were downloaded and cleaned there were a total of 2,960
observations in the sample, of which 516 were union members. Dummy variables were
generated where appropriate, including for race and region. Observations
were dropped for a variety of reasons, examples of which include being unemployed for all
of 2004 or working outside of the United States in 2004. As noted before, union
members were placed in the treated group (D=1) and non-members were placed in the
untreated group (D=0).
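As an illustration of how the region dummies could be constructed, the sketch below maps a state name to the three region dummies listed in Table 1, with west as the omitted category. The function and constant names are hypothetical, and the original variables were not necessarily built this way.

```python
# State groupings follow the region definitions given in Table 1.
NORTHEAST = {"Connecticut", "Maine", "Massachusetts", "New Hampshire", "New Jersey",
             "New York", "Pennsylvania", "Rhode Island", "Vermont"}
MIDWEST = {"Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota", "Missouri",
           "Nebraska", "North Dakota", "Ohio", "South Dakota", "Wisconsin"}
SOUTH = {"Alabama", "Arkansas", "Delaware", "Florida", "Georgia", "Kentucky", "Louisiana",
         "Maryland", "Mississippi", "North Carolina", "Oklahoma", "South Carolina",
         "Tennessee", "Texas", "Virginia", "Washington D.C.", "West Virginia"}
WEST = {"Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho", "Montana",
        "Nevada", "New Mexico", "Oregon", "Utah", "Washington", "Wyoming"}

def region_dummies(state):
    """Return 0/1 dummies for northeast, midwest, and south (west is the baseline)."""
    if state not in NORTHEAST | MIDWEST | SOUTH | WEST:
        raise ValueError(f"unknown state: {state}")
    return {
        "northeast": int(state in NORTHEAST),
        "midwest": int(state in MIDWEST),
        "south": int(state in SOUTH),
    }

print(region_dummies("Indiana"))
print(region_dummies("California"))
```

Leaving one category out of each dummy set (west here, and likewise rich and rother in Table 1) avoids perfect collinearity among the dummies when they enter the propensity score model.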
6. Results and Discussion
Kernel matching was first used to estimate the gain in logwage, interpreted as the percent
change in wage, for the entire labor force. The data were then split into private sector
workers and public sector workers, and kernel matching was attempted for each sector
individually.
6.1 Entire Labor Force
Table 2 shows the propensity score matching for the entire labor force. Before matching
there is a 23.8 percent difference in wages between union members and non-members.
After matching, that number falls to 10.3 percent, which means union membership offers a
10.3 percent gain to wages. This result is significant at the 5 percent significance level.
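The percentages quoted here are differences in mean log wages read directly as percent changes, which is an approximation that is close for small differences and looser for large ones. As a rough check, the sketch below converts the Table 2 differences with the standard exp(d) - 1 transformation and recomputes the matched t-statistic as the difference divided by its standard error. The variable names are illustrative only.

```python
import math

def log_gap_to_percent(log_difference):
    """Convert a difference in log wages into the implied percent difference."""
    return (math.exp(log_difference) - 1) * 100

# Values taken from Table 2
unmatched_diff = 0.238829
matched_diff = 0.103694
matched_se = 0.050470

pct_unmatched = log_gap_to_percent(unmatched_diff)
pct_matched = log_gap_to_percent(matched_diff)
t_matched = matched_diff / matched_se

print(f"unmatched: {unmatched_diff:.3f} log points ~ {pct_unmatched:.1f} percent")
print(f"matched:   {matched_diff:.3f} log points ~ {pct_matched:.1f} percent")
print(f"matched t-statistic: {t_matched:.2f}")
```

Under this conversion the matched gain is close to 11 percent rather than 10.3 percent, and the unmatched gap is close to 27 percent rather than 23.8 percent. The qualitative conclusion is unchanged.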
Table 3 shows the pstest results for several of the variables used in the matching, but
leaves out exponential and interaction terms. As one can see, any variables that were
significant before matching become insignificant after matching, and matching reduces the
bias on almost every variable. Table 4 shows the pstest of the joint significance of all
variables, but only includes the pseudo R-squared, the likelihood-ratio statistic, and the
p-value. Of interest is that the p-value is insignificant at all meaningful levels following
matching, meaning that post-match the combination of these variables no longer
significantly distinguishes union members from non-members.
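The reduction in bias can be made concrete with a short calculation. The sketch below applies a common definition of standardized percentage bias (the difference in means divided by the square root of the average of the two group variances) to the pension means in Table 3. Two assumptions are made: the variance of a 0/1 dummy is taken as p(1 - p), and the unmatched-sample variances are used in both denominators. The pstest output itself may differ slightly.

```python
import math

def standardized_bias(mean_treated, mean_control, var_treated, var_control):
    """Percent standardized bias between treated and control means."""
    return 100 * (mean_treated - mean_control) / math.sqrt((var_treated + var_control) / 2)

def dummy_variance(p):
    """Variance of a 0/1 variable with mean p (assumption: population formula)."""
    return p * (1 - p)

# pension means from Table 3
treated_unmatched, control_unmatched = 0.86047, 0.53396
treated_matched, control_matched = 0.85857, 0.86383

var_t = dummy_variance(treated_unmatched)
var_c = dummy_variance(control_unmatched)

bias_before = standardized_bias(treated_unmatched, control_unmatched, var_t, var_c)
bias_after = standardized_bias(treated_matched, control_matched, var_t, var_c)
reduction = 100 * (1 - abs(bias_after) / abs(bias_before))

print(f"bias before matching: {bias_before:.1f}%")
print(f"bias after matching:  {bias_after:.1f}%")
print(f"reduction in |bias|:  {reduction:.1f}%")
```

For pension, the large pre-match imbalance nearly disappears after matching, which is the pattern the tables show for most covariates.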
Table 5 shows the total sample size and how many observations were on the common
support. The ratio of non-members to union members for the entire sample is about 5-to-1.
Graph 1 details the distribution of the propensity scores of the observations. The top half
shows union members, labeled as treated, while the bottom half shows non-members and is
labeled untreated. Union members with propensity scores close to 1 were dropped
because they were off the support, meaning they could not be matched with any of the
nonunion members.
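One simple way to impose common support is a minimum-maximum rule: a treated observation is kept only if its propensity score lies within the range of scores observed among the untreated. The sketch below illustrates that rule with hypothetical scores. It is an assumption that this is the rule applied here, and the function name is illustrative.

```python
def split_by_common_support(treated_scores, control_scores):
    """Split treated propensity scores into on-support and off-support lists.

    A treated score is on support if it falls between the smallest and
    largest propensity score found among the controls.
    """
    lowest, highest = min(control_scores), max(control_scores)
    on_support = [p for p in treated_scores if lowest <= p <= highest]
    off_support = [p for p in treated_scores if not (lowest <= p <= highest)]
    return on_support, off_support

# Hypothetical propensity scores
treated_scores = [0.35, 0.62, 0.88, 0.97, 0.99]
control_scores = [0.02, 0.10, 0.31, 0.55, 0.90]

on_support, off_support = split_by_common_support(treated_scores, control_scores)
print("on support: ", on_support)
print("off support:", off_support)
```

Because no untreated observation has a score above 0.90 in this toy example, the two treated observations with scores near 1 are dropped, mirroring the treated observations that fall off the support in Table 5.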
6.2 Private and Public Sectors
Table 6 shows the propensity score matching results for the private sector. Before matching
there is a 22.4 percent gain in wage among private sector workers from joining a union.
However, after matching, that gain decreases to 16.4 percent and is significant at the 5
percent level. Table 7 shows many of the variables used in the matching process, but leaves
out exponential and interaction terms. As with the entire labor force, there is a
reduction in bias with almost every variable, and any variables that were significant before
matching become insignificant after matching. Table 8 shows the pstest for all variables;
as with the entire labor force, the p-value becomes insignificant at all meaningful levels.
Table 9 shows the sample size and how many observations were on the common support.
More observations are off the common support, but the ratio of non-members to union
members is higher at more than 7-to-1. Graph 2 shows the distribution of propensity scores
from the kernel matching. Again, many of the non-members have low propensity scores,
which makes matching with union members with high scores more difficult, and in the case
of union members with scores near 1, it cannot be done.
Kernel matching for the government sector was attempted; however, post-match test
statistics were not significant at any meaningful level. The post-match logwage difference
dropped to between 2 and 6 percent in most of the kernel matching models, but the standard
error on each model was too high to yield a significant test statistic. Given that
the drop in the logwage difference in the private sector was so small compared to that in
the entire labor force, it is likely that the drop in the public sector would be even larger.
6.3 Discussion
The kernel matching produced an ATT higher than what Bryson found
using data from the British WERS. There could be several reasons for the differences.
First, American unions might be more powerful than those in Britain; a more powerful
union would be able to command higher wages than a weaker one, which would push up
the post-match logwage difference. Unfortunately, data on union size or on which union each
member belongs to are not available in the PSID. Second, the matching method might play a
part in the discrepancy. As noted in the literature review, Bryson used nearest neighbor
matching, while this paper uses kernel matching. Because of the high number of non-members
in relation to union members, kernel matching, Frolich argued, should be
employed. Lastly, there may be other non-quantifiable differences between Britain and the
United States that affect wages and unions. Differences in laws, or even geographic makeup,
could account for some of these differences.
7. Self-critique
The results put forward are the culmination of much work and effort in the attempt to
develop a model that was statistically significant and provided insight into the labor market.
While more work could have gone into finding a significant test statistic for the public
sector, there seems to be little work done on the topic, perhaps because gains from union
membership are so small that any matching would produce insignificant results. Given the
number of matching attempts made for this paper, it seems very unlikely that any
type of kernel matching would show significant results for that sector.
While nearest neighbor was attempted, as noted before the results proved to be very erratic.
The distribution of propensity scores would also make nearest neighbor less efficient,
since several of the union members had propensity scores close to 1, while many of the
non-members had scores skewed closer to zero.
8. Conclusion
The goal of this paper was to determine the gains to wages from unionization by looking at
the percent change in wage rates using propensity score matching. The findings indicate
that the gain in wage rate decreases from 23.8 to 10.3 percent, post-match, in the entire
labor force and from 22.4 to 16.4 percent, post-match, in the private sector, while the
public sector gain likely falls even lower. Although the post-match percentage gain was
higher than that obtained by Alex Bryson, the estimate is significant at the 5 percent level.
It appears there are differences among matching methods which produce differing results.
There may also be differences in the concentration of union members between the United
States and Britain which would allow for a higher logwage gain in the United States.
While the amount gained from unionization decreases post-match, there are still gains in
wage rates among union members. There are likely other factors drawing individuals into
joining a labor union, some of which have been previously discussed. It would be
interesting to see a study achieve a significant test statistic with public sector employees.
9. References
Blakemore, A. E., Hunt, J. C. and Kiker, B. F. (1986). “Collective Bargaining and Union
Membership Effects on the Wages of Male Youths,” Journal of Labor Economics, 4,
April, pp. 193-211.
Bryson, Alex. (2002). “The Union Membership Wage Premium: An Analysis Using
Propensity Score Matching,” CEP Discussion Papers dp0530, Centre for Economic
Performance, LSE.
Caliendo, M. and Kopeinig, S. (2008). “Some Practical Guidance for the Implementation of
Propensity Score Matching,” Journal of Economic Surveys, 22: 31-72.
Frolich, Markus. (2004). “Finite-Sample Properties of Propensity-Score Matching and
Weighting Estimators,” The Review of Economics and Statistics, 86(1): 77-90.
Kahn, Lawrence M. and Curme, Michael. (1987). “Unions and Nonunion Wage
Dispersion,” The Review of Economics and Statistics, Vol. 69, No. 4, Nov., pp. 600-607.
10. Tables and Graphs
Table 1. Variable Table
Variable Name        Description
age                  Age of the individual surveyed
exper                Number of years of experience at main job
educ                 Number of years of education
poor                 Dummy variable for if the individual thought they grew up in a poor
                     household (monetarily)
average              Dummy variable for if the individual thought they grew up in an average
                     household (monetarily)
rich (not listed)    Dummy variable for if the individual thought they grew up in a rich
                     household (monetarily)
pension              Binary variable indicating if the individual has a pension at their
                     main job (yes=1, no=0)
northeast            Dummy variable for if individual lives in: Connecticut, Maine,
                     Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania,
                     Rhode Island, or Vermont
midwest              Dummy variable for if individual lives in: Illinois, Indiana, Iowa,
                     Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio,
                     South Dakota, or Wisconsin
south                Dummy variable for if individual lives in: Alabama, Arkansas, Delaware,
                     Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi,
                     North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia,
                     Washington D.C., or West Virginia
west (not listed)    Dummy variable for if individual lives in: Alaska, Arizona, California,
                     Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah,
                     Washington, or Wyoming
white                Dummy variable for if individual is white
black                Dummy variable for if individual is black
nativeamer           Dummy variable for if individual is Native American
asian                Dummy variable for if individual is Asian
rother (not listed)  Dummy variable for if individual is race other
govtworker           Binary variable indicating whether the individual works for the
                     government (yes=1, no=0)
married              Binary variable indicating whether the individual is married
                     (yes=1, no=0)
urban                Binary variable indicating whether the individual lives in an urban
                     setting (yes=1, no=0)
whitecollar          Binary variable indicating whether the individual works at a white
                     collar job
Table 2. Kernel Matching for the Entire Labor Force
Variable   Sample      Treated   Controls   Difference   S.E.      T-stat
Logwage    Unmatched   3.06677   2.82794    .238829      .031576   7.56
           Matched     3.06345   2.95975    .103694      .050470   2.05
Table 3. Reduction in Bias among selected non-exponential and non-interaction variables
for Entire Labor Force
Variable      Sample      Mean (Treated)   Mean (Control)   Significance level
age           Unmatched   45.062           42.556           ***
              Matched     44.9             45.591
exper         Unmatched   14.186           9.0078           ***
              Matched     13.984           14.088
educ          Unmatched   13.38            13.567
              Matched     13.357           13.342
poor          Unmatched   .27519           .23118
              Matched     .2749            .27964
average       Unmatched   .46318           .491
              Matched     .46016           .48699
pension       Unmatched   .86047           .53396           ***
              Matched     .85857           .86383
northeast     Unmatched   .24225           .13625           ***
              Matched     .24104           .21564
midwest       Unmatched   .30426           .25941           **
              Matched     .30677           .29448
south         Unmatched   .22674           .42185           ***
              Matched     .23108           .25075
white         Unmatched   .64729           .71522           ***
              Matched     .65139           .62859
black         Unmatched   .31008           .23445           ***
              Matched     .31474           .35107           **
nativeamer    Unmatched   .00581           .00327
              Matched     0                .00019
asian         Unmatched   .01163           .01596
              Matched     .01195           .0061
govtworker    Unmatched   .45736           .17471           ***
              Matched     .45418           .4811
married       Unmatched   .61822           .56997           **
              Matched     .62351           .58503
urban         Unmatched   .92636           .88216           ***
              Matched     .9243            .92807
whitecollar   Unmatched   .42636           .51432           ***
              Matched     .42231           .45533
Where * indicates significance at the 10% level, ** indicates significance at the 5% level,
and *** indicates significance at the 1% level
Table 4. Test of the Significance of All Variables for the Entire Labor Force
Sample      Pseudo R2   LR chi2   p>chi2
Unmatched   0.358       980.46    0.000
Matched     0.057       79.33     1.000
Table 5. Breakdown of Variables on the Support in the Entire Labor Force
Psmatch2: Treatment   Psmatch2: Common support
assignment            Off Support   On Support   Total
Untreated             0             2,444        2,444
Treated               14            502          516
Total                 14            2,946        2,960
Table 6. Kernel Matching for the Private Sector
Variable   Sample      Treated   Controls   Difference   S.E.      T-stat
Logwage    Unmatched   3.04204   2.81759    .224442      .043372   5.17
           Matched     3.03661   2.87256    .164048      .060330   2.72
Table 7. Reduction in Bias among selected non-exponential and non-interaction variables
for the Private Sector
Variable      Sample      Mean (Treated)   Mean (Control)   Significance level
age           Unmatched   43.939           42.397           **
              Matched     43.072           43.91
exper         Unmatched   14.457           8.4422           ***
              Matched     13.637           13.428
educ          Unmatched   12.654           13.393           ***
              Matched     12.661           12.621
male          Unmatched   .84643           .75508           ***
              Matched     .84064           .85536
poor          Unmatched   .30714           .23104           ***
              Matched     .30279           .27038
average       Unmatched   .43214           .48884           *
              Matched     .43426           .4313
pension       Unmatched   .79286           .47              ***
              Matched     .77291           .77639
northeast     Unmatched   .18214           .15171
              Matched     .18327           .21465
midwest       Unmatched   .375             .2588            ***
              Matched     .38645           .3648
south         Unmatched   .25              .39564           ***
              Matched     .251             .25549
white         Unmatched   .63571           .73178           ***
              Matched     .66534           .68255
black         Unmatched   .30357           .21071           ***
              Matched     .30677           .29429
nativeamer    Unmatched   .01071           .00397
              Matched     0                .00031
asian         Unmatched   .01429           .01785
              Matched     0                .00137
married       Unmatched   .67143           .57759
              Matched     .65339           .67259
urban         Unmatched   .91071           .89241           ***
              Matched     .92032           .92173
whitecollar   Unmatched   .15714           .47199           ***
              Matched     .17131           .16951
Where * indicates significance at the 10% level, ** indicates significance at the 5% level,
and *** indicates significance at the 1% level
Table 8. Test of the Significance of All Variables for the Private Sector
Sample      Pseudo R2   LR chi2   p>chi2
Unmatched   0.379       645.77    0.000
Matched     0.051       35.47     1.000
Table 9. Breakdown of Variables on the Support in the Private Sector
Psmatch2: Treatment   Psmatch2: Common Support
assignment            Off support   On support   Total
Untreated             0             2,017        2,017
Treated               29            251          280
Total                 29            2,268        2,297
Graph 1. Kernel Matching Among the Entire Labor Force
Graph 2. Kernel Matching Among the Private Sector