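[Editor's note]:
This article is largely a partial translation of the paper cited below. Drawing on online material and personal experience, we have also put together the accompanying Stata code for reference. The main topics are:
Common misconceptions about PSM;
Carrying out the PSM estimation by hand with `regress` + `[fweight]`, that is, as an OLS estimation;
Worked examples of `psmatch2`.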
[Source]:
Jonathan E. Shipman, Quinn T. Swanquist, and Robert L. Whited (2017) Propensity Score Matching in Accounting Research. The Accounting Review: January 2017, Vol. 92, No. 1, pp. 213-244. [Link1], [Link2]
*, **, *** indicate significance at the 0.10, 0.05, and 0.01 levels, respectively (based on two-tailed tests). Table 4 presents the first-stage model used for estimating propensity scores in each setting. Models are estimated using logistic regression with standard errors that are robust to heteroscedasticity and clustered by firm (Petersen 2009). t-statistics are presented in parentheses below the coefficients.
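A first-stage model of this kind might be estimated in Stata roughly as follows. This is only a sketch under assumptions: the sample is already in memory, the covariates are the six that appear in this article's balance table, and `firm_id` is a hypothetical name for the firm identifier (the source does not name it).

```stata
* Sketch only: firm_id is a hypothetical cluster variable, not from the source
global indepvar "LNASSET LEV ROA GROWTH BM AGE"

* Logit first stage with standard errors clustered by firm
logit BIG4 $indepvar, vce(cluster firm_id)

* Predicted probability of treatment, used as the propensity score
predict pscore, pr
```

The resulting `pscore` variable is what the `pscore(pscore)` option of `psmatch2` expects.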
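Once the `_weight` variable is available, it effectively flags how each observation fared in the matching, so we can run the post-matching regression simply by appending `[fweight = _weight]` to the `regress` command. Here `fweight` is short for "frequency weights", i.e., weights giving the number of times an observation is repeated. Under 1:2 matching with replacement, a successfully matched treated observation has `_weight` = 2 / 2, and a successfully matched control observation has `_weight` = (number of times it was used as a match) / 2; in other words, both are divided by 2 to normalize. Because frequency weights must be integers, continuing to use the `fweight` option therefore requires multiplying `_weight` by 2 to turn it back into a frequency. For details, see Propensity Score Matching in Stata using teffects and [psmatch2 and fweight option of regress]. The output of the code above is as follows: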
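The rescaling step can be sketched on a toy data set. Everything below is made up for illustration: two treated units matched 1:2 with replacement, with a `_weight` column typed in by hand to mimic what `psmatch2` would leave behind, and `fw` as a hypothetical name for the integer frequency weight.

```stata
* Toy data (made up): the last control was never used, so its weight is missing
clear
input treated y x _weight
1 2.0 1.0 1
1 3.1 2.0 1
0 1.4 1.1 1
0 2.6 1.9 .5
0 2.7 2.2 .5
0 9.0 5.0 .
end

* Weights were normalised by the number of neighbours (2 here),
* so multiplying by 2 recovers integer frequencies
generate long fw = round(2 * _weight)
assert abs(fw - 2 * _weight) < 1e-6 if !missing(_weight)

* Observations with a missing weight are excluded automatically
regress y treated x [fweight = fw]
display "weighted N = " e(N)
```

The `assert` line is a cheap guard: if the rescaled weights are not whole numbers (which may happen when some units find fewer neighbours than requested), it stops before `regress` rejects the weights.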
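The output below comes from the `psmatch2` command. The first table reports the difference between the treated and control groups before and after matching, together with its significance. Taking `ABSACC` as an example, the pre-matching difference is -.011637968 with a t-statistic of -6.02, while the post-matching difference (the ATT) is -.006573884 with a t-statistic of -2.47. The second table reports common support for the treated and control groups: all 17,726 control observations are on support, whereas 184 treated observations are off support and 1,163 are on support.
In the first table, the `Note` states that the reported standard errors ignore the fact that the propensity score is itself estimated (that is, they are derived as if the propensity score were the true value); see Propensity Score Matching in Stata using teffects for details. In practice this affects only the standard errors and significance of the estimates, not the point estimates themselves, and it does not change the matching result.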
. psmatch2 BIG4, pscore(pscore) outcome(ABSACC RESTATE) ///
    common n(1) norepl cal(0.03) // see the help file for details
----------------------------------------------------------------------------------------
        Variable     Sample |     Treated     Controls   Difference        S.E.   T-stat
----------------------------+-----------------------------------------------------------
          ABSACC  Unmatched |  .054644862    .06628283  -.011637968      .00193    -6.02
                        ATT |  .056073738   .062647622  -.006573884      .00266    -2.47
----------------------------+-----------------------------------------------------------
         RESTATE  Unmatched |  .067557535   .113110685   -.04555315     .008833    -5.16
                        ATT |  .074806535    .09544282  -.020636285     .011569    -1.78
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
 psmatch2: |   psmatch2: Common
 Treatment |        support
assignment | Off suppo  On suppor |     Total
-----------+----------------------+----------
 Untreated |         0     17,726 |    17,726
   Treated |       184      1,163 |     1,347
-----------+----------------------+----------
     Total |       184     18,889 |    19,073
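For standard errors that do account for the estimated propensity score, one option is Stata's built-in `teffects psmatch`. The line below is a sketch, not a replication of the `psmatch2` run: it assumes the same data and the `indepvar` global used in this article, it matches with replacement (there is no counterpart to `norepl`), and it leaves out the caliper because `teffects` treats observations without a match inside the caliper differently from `psmatch2`.

```stata
* Sketch only: assumes the article's data and the indepvar global are in memory.
* Matching is with replacement here, so the ATT will generally differ
* from the psmatch2 estimate reported above.
teffects psmatch (ABSACC) (BIG4 $indepvar, logit), atet nneighbor(1)
```

Comparing the two sets of standard errors gives a sense of how much the `Note` matters in a given sample.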
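The `pstest` command assesses matching quality, checking whether the "balancing assumption" holds. As the table below shows, after matching the standardized bias (%bias) of every variable is small (at most 2.9% in absolute value), and none of the t-tests rejects the null hypothesis of no systematic difference between the treated and control groups. The graph produced by the `graph` option likewise shows that the standardized bias of every variable shrinks after matching.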
. pstest $indepvar, both graph
---------------------------------------------------------------------------------------
                Unmatched |       Mean                %reduct |     t-test    |  V(T)/
Variable          Matched | Treated Control    %bias   |bias| |      t  p>|t| |  V(C)
--------------------------+-----------------------------------+---------------+--------
LNASSET                 U |  24.021  22.124    146.2          |  56.28  0.000 |  1.44*
                        M |  23.743  23.752     -0.7     99.5 |  -0.15  0.878 |  0.89*
                          |                                   |               |
LEV                     U |  .52906  .43117     51.1          |  17.16  0.000 |  0.78*
                        M |  .52314  .52429     -0.6     98.8 |  -0.15  0.884 |  0.77*
                          |                                   |               |
ROA                     U |   .0505  .04661      8.1          |   2.81  0.005 |  0.91
                        M |  .05049  .05186     -2.9     64.6 |  -0.67  0.500 |  0.92
                          |                                   |               |
GROWTH                  U |  .18181   .2246    -10.6          |  -3.53  0.000 |  0.72*
                        M |  .18313  .17163      2.9     73.1 |   0.77  0.441 |  1.30*
                          |                                   |               |
BM                      U |  .76837  .60176     71.8          |  25.11  0.000 |  0.95
                        M |  .74742  .74201      2.3     96.8 |   0.54  0.590 |  0.85*
                          |                                   |               |
AGE                     U |  16.355  15.891      8.3          |   2.96  0.003 |  1.05
                        M |  16.771  16.681      1.6     80.6 |   0.39  0.698 |  0.90
                          |                                   |               |
---------------------------------------------------------------------------------------
* if variance ratio outside [0.90; 1.11] for U and [0.89; 1.12] for M
-----------------------------------------------------------------------------------
 Sample    | Ps R2   LR chi2   p>chi2   MeanBias   MedBias      B       R     %Var
-----------+-----------------------------------------------------------------------
 Unmatched | 0.265   2578.34    0.000       49.3      30.8   150.6*   1.58      50
 Matched   | 0.001      2.17    0.904        1.8       2.0     6.1    1.11      67
-----------------------------------------------------------------------------------
* if B>25%, R outside [0.5; 2]
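To make the %bias column more concrete, the sketch below computes a standardized percentage bias by hand: the difference in group means as a percentage of the square root of the average of the two group variances. It uses Stata's shipped `auto` data purely as a stand-in, with `foreign` playing the role of the treatment and `mpg` the covariate; neither comes from this article's sample, and `sb` is a hypothetical scalar name.

```stata
* Stand-in data: foreign is the "treatment", mpg the covariate
sysuse auto, clear

* Mean and variance in the treated group
quietly summarize mpg if foreign == 1
scalar m1 = r(mean)
scalar v1 = r(Var)

* Mean and variance in the control group
quietly summarize mpg if foreign == 0
scalar m0 = r(mean)
scalar v0 = r(Var)

* Difference in means relative to the pooled standard deviation, in percent
scalar sb = 100 * (m1 - m0) / sqrt((v1 + v0) / 2)
display "standardized bias of mpg (%): " %6.1f scalar(sb)
```

Applied to the unmatched rows of the table above, the same calculation should come close to the U values that `pstest` reports; how `pstest` scales the matched rows is best checked in its help file.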