Four Stata Commands for Fixed-Effects Models
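For panel data there are many estimators to choose from, including pooled OLS, fixed effects (FE), random effects (RE), and least-squares dummy variables (LSDV). The one used most often is still fixed effects (the within estimator).
Stata's official command for fixed-effects models is xtreg, but it is not always convenient (for example, it requires the data to be declared as a panel, and it can run slowly). Other commands commonly used for fixed-effects estimation are reg, areg, and reghdfe.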
xtreg
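xtreg, fe is the official command for fixed-effects models; the coefficients it produces are the textbook fixed-effects (within) estimator.
xtreg is strict about data format: the data must be a panel, and before calling xtreg we first have to declare the panel with xtset, naming the cross-sectional (individual) dimension and the time dimension. Adding the fe option to xtreg requests the within estimator, with the individual fixed effects defined on the cross-sectional variable set by xtset. Time fixed effects have to be added as dummy variables through i.year.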
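To make "within estimator" concrete, the sketch below reproduces the slopes from xtreg, fe by hand: it subtracts each individual's mean from every variable and then runs plain OLS on the demeaned data. It is only an illustration and rests on assumptions not taken from this post: it uses Stata's example panel nlswork (loaded with webuse) and its variables idcode, year, ln_wage, tenure, and age, not the lin_1992.dta data.

```stata
* Assumption: Stata's example panel nlswork, not the data used in this post
webuse nlswork, clear
xtset idcode year

* Official within estimator
xtreg ln_wage tenure age, fe

* Manual within transformation on the same estimation sample
keep if !missing(ln_wage, tenure, age)
foreach v of varlist ln_wage tenure age {
    bysort idcode: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* Slopes match xtreg, fe; the standard errors of this naive regression do not,
* because they ignore the degrees of freedom used up by the individual means
regress dm_ln_wage dm_tenure dm_age, noconstant
```

The point estimates on tenure and age should agree between the two regressions; only xtreg applies the degrees-of-freedom correction to the standard errors.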
Below we use lin_1992.dta, the data set from Professor Justin Yifu Lin's (1992) AER paper "Rural Reforms and Agricultural Growth in China", to demonstrate the command and its output.
. xtset province year
panel variable: province (strongly balanced)
time variable: year, 70 to 87
delta: 1 unit
. xtreg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, fe vce(cluster province)
Fixed-effects (within) regression Number of obs = 476
Group variable: province Number of groups = 28
R-sq: Obs per group:
within = 0.8932 min = 17
between = 0.6596 avg = 17.0
overall = 0.7156 max = 17
F(23,27) = 949.82
corr(u_i, Xb) = -0.3425 Prob > F = 0.0000
(Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
| Robust
ltvfo | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ltlan | .5833594 .1745834 3.34 0.002 .2251439 .9415749
ltwlab | .1514909 .0585107 2.59 0.015 .0314368 .271545
ltpow | .0971114 .090911 1.07 0.295 -.0894225 .2836453
ltfer | .1693346 .0438098 3.87 0.001 .0794444 .2592248
hrs | .1503752 .0587581 2.56 0.016 .0298136 .2709368
mci | .1978373 .0810587 2.44 0.022 .0315186 .364156
ngca | .7784081 .4016301 1.94 0.063 -.0456688 1.602485
year |
71 | -.0240404 .023366 -1.03 0.313 -.0719836 .0239027
72 | -.1323624 .0404832 -3.27 0.003 -.2154272 -.0492977
73 | -.0377336 .0357883 -1.05 0.301 -.111165 .0356979
74 | .0058554 .0500774 0.12 0.908 -.096895 .1086058
75 | .0096731 .0566898 0.17 0.866 -.1066448 .1259911
76 | -.0476465 .061423 -0.78 0.445 -.1736761 .0783832
77 | -.0869336 .0680579 -1.28 0.212 -.2265767 .0527096
78 | -.0325205 .0766428 -0.42 0.675 -.1897785 .1247376
79 | -.0076332 .0833462 -0.09 0.928 -.1786454 .163379
81 | -.093479 .1093614 -0.85 0.400 -.3178701 .1309121
82 | -.0447862 .1207405 -0.37 0.714 -.2925251 .2029528
83 | -.0309435 .1377207 -0.22 0.824 -.313523 .2516361
84 | .0442535 .1428764 0.31 0.759 -.2489048 .3374117
85 | -.0033372 .1561209 -0.02 0.983 -.3236709 .3169965
86 | .00484 .157992 0.03 0.976 -.3193329 .3290129
87 | .0386475 .1639608 0.24 0.815 -.2977723 .3750674
_cons | 2.651286 .7738994 3.43 0.002 1.063376 4.239196
-------------+----------------------------------------------------------------
sigma_u | .29344594
sigma_e | .09930555
rho | .89724523 (fraction of variance due to u_i)
------------------------------------------------------------------------------
reg
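Adding one dummy variable per individual to the regression yields exactly the same coefficient estimates as the fixed-effects within estimator (FE); this equivalence is a proven result. The approach is called the least-squares dummy variable (LSDV) method, and some textbooks and papers also refer to it as the fixed-effects estimator. Its advantage is that it delivers estimates of the individual heterogeneity u_i (FE removes u_i through the within transformation). Its drawback is that when the number of individuals n is large, a large number of dummies has to be included, which uses up many degrees of freedom and may exceed the number of regressors Stata allows.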
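The equivalence can be written out as follows. This is a sketch in standard textbook notation; apart from u_i and n, the symbols (x_{it}, \beta, \varepsilon_{it}, and the dummies D^{j}_{it}) are assumptions introduced here, and any time dummies are taken to be part of x_{it}.

```latex
% Fixed-effects model with individual heterogeneity u_i
y_{it} = x_{it}'\beta + u_i + \varepsilon_{it}

% LSDV: estimate each u_j explicitly with a dummy D^{j}_{it} = 1[i = j]
y_{it} = x_{it}'\beta + \sum_{j=1}^{n} u_j D^{j}_{it} + \varepsilon_{it}

% FE (within): subtract individual means, which eliminates u_i
y_{it} - \bar{y}_{i} = (x_{it} - \bar{x}_{i})'\beta + (\varepsilon_{it} - \bar{\varepsilon}_{i})
```

By the Frisch-Waugh-Lovell theorem, partialling the individual dummies out of y and x is the same as demeaning within each individual, so OLS on the second and third equations gives the same estimate of \beta.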
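The Stata command for LSDV is reg y x i.id i.year, where y and x stand for the dependent variable and the regressors, id is the individual identifier, and year is the time variable. reg places no requirements on the data format, so it is more flexible to use, but it prints a long list of dummy-variable estimates.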
. reg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.province i.year, vce(cluster province)
Linear regression Number of obs = 476
F(22, 27) = .
Prob > F = .
R-squared = 0.9695
Root MSE = .09931
(Std. Err. adjusted for 28 clusters in province)
-------------------------------------------------------------------------------
| Robust
ltvfo | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
ltlan | .5833594 .1800436 3.24 0.003 .2139404 .9527783
ltwlab | .1514909 .0603407 2.51 0.018 .027682 .2752998
ltpow | .0971114 .0937543 1.04 0.309 -.0952565 .2894792
ltfer | .1693346 .0451799 3.75 0.001 .0766331 .2620362
hrs | .1503752 .0605958 2.48 0.020 .026043 .2747075
mci | .1978373 .0835939 2.37 0.025 .0263169 .3693578
ngca | .7784081 .4141914 1.88 0.071 -.0714423 1.628259
province |
beijing | -.1865095 .1172887 -1.59 0.123 -.427166 .054147
fujian | .0434646 .0473107 0.92 0.366 -.0536089 .1405381
gansu | -.7945197 .1228202 -6.47 0.000 -1.046526 -.5425134
guangdong | -.0278664 .0609608 -0.46 0.651 -.1529476 .0972149
guangxi | -.2539549 .0614801 -4.13 0.000 -.3801015 -.1278082
guizhou | -.2526439 .0598147 -4.22 0.000 -.3753736 -.1299142
hebei | -.270106 .0948694 -2.85 0.008 -.4647619 -.07545
heilongjiang | -.0926732 .26542 -0.35 0.730 -.63727 .4519237
henan | -.0920743 .0396983 -2.32 0.028 -.1735284 -.0106201
hubei | .1024438 .0368811 2.78 0.010 .0267701 .1781176
hunan | -.0434275 .0581142 -0.75 0.461 -.1626679 .0758129
jiangsu | .1153335 .0352061 3.28 0.003 .0430965 .1875705
jiangxi | -.1401737 .0596644 -2.35 0.026 -.2625949 -.0177525
jilin | -.1783839 .2109985 -0.85 0.405 -.6113171 .2545493
liaoning | -.2517315 .1563399 -1.61 0.119 -.5725145 .0690515
neimong | -.8860432 .2325209 -3.81 0.001 -1.363137 -.4089498
ningxia | -.8489859 .1732579 -4.90 0.000 -1.204482 -.49349
qinghai | -.6982553 .1268849 -5.50 0.000 -.9586017 -.4379089
shaanxi | -.320607 .0887091 -3.61 0.001 -.502623 -.1385911
shangdong | .0040812 .0547494 0.07 0.941 -.1082554 .1164177
shanghai | .0864336 .0982642 0.88 0.387 -.1151878 .288055
shanxi | -.5005347 .1388718 -3.60 0.001 -.785476 -.2155934
sichuan | .0335563 .0392453 0.86 0.400 -.0469685 .1140811
tianjin | -.3011 .1049208 -2.87 0.008 -.5163796 -.0858203
xinjiang | -.3740561 .2053926 -1.82 0.080 -.7954869 .0473746
yunnan | -.2854833 .0590488 -4.83 0.000 -.4066415 -.1643251
zhejiang | .1615248 .0760427 2.12 0.043 .0054981 .3175515
year |
71 | -.0240404 .0240968 -1.00 0.327 -.073483 .0254022
72 | -.1323624 .0417494 -3.17 0.004 -.2180251 -.0466998
73 | -.0377336 .0369076 -1.02 0.316 -.1134616 .0379945
74 | .0058554 .0516436 0.11 0.911 -.1001086 .1118193
75 | .0096731 .0584628 0.17 0.870 -.1102827 .129629
76 | -.0476465 .0633441 -0.75 0.458 -.1776178 .0823249
77 | -.0869336 .0701864 -1.24 0.226 -.2309442 .057077
78 | -.0325205 .0790398 -0.41 0.684 -.1946968 .1296559
79 | -.0076332 .0859529 -0.09 0.930 -.1839939 .1687275
81 | -.093479 .1127818 -0.83 0.414 -.324888 .1379301
82 | -.0447862 .1245167 -0.36 0.722 -.3002733 .210701
83 | -.0309435 .142028 -0.22 0.829 -.3223608 .2604739
84 | .0442535 .147345 0.30 0.766 -.2580735 .3465804
85 | -.0033372 .1610037 -0.02 0.984 -.3336895 .3270151
86 | .00484 .1629333 0.03 0.977 -.3294716 .3391516
87 | .0386475 .1690888 0.23 0.821 -.3082941 .3855891
_cons | 2.874582 .7510459 3.83 0.001 1.333563 4.415601
-------------------------------------------------------------------------------
areg
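areg is a refinement of reg and likewise places no requirements on the data structure. Sometimes we want to control for a large set of dummies (the i.id kind) without generating them and without reporting their coefficients. In that case we can use areg and simply put the categorical variable to be controlled for inside the absorb() option. areg can therefore also be used for fixed-effects estimation, because the within estimator and LSDV are equivalent.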
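However, in the version of Stata used here absorb() accepts only one variable (newer releases, from Stata 18 onward, accept several). To estimate two-way or higher-dimensional fixed effects, the remaining dimensions still have to be added as dummies with i.var.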
. areg ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca i.year, absorb(province) vce(cluster province)
Linear regression, absorbing indicators Number of obs = 476
Absorbed variable: province No. of categories = 28
F( 23, 27) = 893.08
Prob > F = 0.0000
R-squared = 0.9695
Adj R-squared = 0.9659
Root MSE = 0.0993
(Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
| Robust
ltvfo | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ltlan | .5833594 .1800436 3.24 0.003 .2139404 .9527783
ltwlab | .1514909 .0603407 2.51 0.018 .027682 .2752998
ltpow | .0971114 .0937543 1.04 0.309 -.0952565 .2894792
ltfer | .1693346 .0451799 3.75 0.001 .0766331 .2620362
hrs | .1503752 .0605958 2.48 0.020 .026043 .2747075
mci | .1978373 .0835939 2.37 0.025 .0263169 .3693578
ngca | .7784081 .4141914 1.88 0.071 -.0714423 1.628259
year |
71 | -.0240404 .0240968 -1.00 0.327 -.073483 .0254022
72 | -.1323624 .0417494 -3.17 0.004 -.2180251 -.0466998
73 | -.0377336 .0369076 -1.02 0.316 -.1134616 .0379945
74 | .0058554 .0516436 0.11 0.911 -.1001086 .1118193
75 | .0096731 .0584628 0.17 0.870 -.1102827 .129629
76 | -.0476465 .0633441 -0.75 0.458 -.1776178 .0823249
77 | -.0869336 .0701864 -1.24 0.226 -.2309442 .057077
78 | -.0325205 .0790398 -0.41 0.684 -.1946968 .1296559
79 | -.0076332 .0859529 -0.09 0.930 -.1839939 .1687275
81 | -.093479 .1127818 -0.83 0.414 -.324888 .1379301
82 | -.0447862 .1245167 -0.36 0.722 -.3002733 .210701
83 | -.0309435 .142028 -0.22 0.829 -.3223608 .2604739
84 | .0442535 .147345 0.30 0.766 -.2580735 .3465804
85 | -.0033372 .1610037 -0.02 0.984 -.3336895 .3270151
86 | .00484 .1629333 0.03 0.977 -.3294716 .3391516
87 | .0386475 .1690888 0.23 0.821 -.3082941 .3855891
_cons | 2.651286 .7981036 3.32 0.003 1.013713 4.288859
------------------------------------------------------------------------------
reghdfe
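reghdfe is designed for linear regression with multi-way (high-dimensional) fixed effects. Sometimes we need to control for fixed effects along several dimensions at once (for example city, industry, and year). xtreg and the other commands can do this, but they become very slow. reghdfe addresses exactly this pain point and runs far faster than xtreg and the like.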
reghdfe is a user-written command by Sergio Correia; see the author's GitHub page (https://github.com/sergiocorreia/reghdfe) for more details. It has to be installed before use (ssc install reghdfe).
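reghdfe handles multi-way fixed effects through absorb(var1 var2 var3 ...), with the variables separated by spaces. There is no need to add dummies with i.var, which is far more convenient than xtreg and the like, and no long list of dummy-variable coefficients is reported.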
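As a hedged illustration of the absorb() syntax with more than two dimensions, the sketch below runs reghdfe on Stata's example panel nlswork. The data set and its variables (idcode, year, ind_code, ln_wage, tenure, age) are assumptions chosen only so the example runs; they are not the data used in this post.

```stata
* One-time installation; reghdfe relies on the ftools package
ssc install ftools
ssc install reghdfe

* Assumption: Stata's example panel nlswork, not the data used in this post
webuse nlswork, clear

* Three sets of fixed effects: individual, year, and industry
reghdfe ln_wage tenure age, absorb(idcode year ind_code) vce(cluster idcode)

* Interacted fixed effects use factor-variable notation: industry-by-year
reghdfe ln_wage tenure age, absorb(idcode ind_code#year) vce(cluster idcode)
```

With reg or areg, the same specifications would need i.year and i.ind_code (or their interaction) spelled out as regressors, and their coefficients would fill the output table.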
. reghdfe ltvfo ltlan ltwlab ltpow ltfer hrs mci ngca, absorb(year province) vce(cluster province)
(MWFE estimator converged in 2 iterations)
HDFE Linear regression Number of obs = 476
Absorbing 2 HDFE groups F( 7, 27) = 229.56
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9695
Adj R-squared = 0.9658
Within R-sq. = 0.6751
Number of clusters (province) = 28 Root MSE = 0.0994
(Std. Err. adjusted for 28 clusters in province)
------------------------------------------------------------------------------
| Robust
ltvfo | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ltlan | .5833594 .1745834 3.34 0.002 .2251439 .9415749
ltwlab | .1514909 .0585107 2.59 0.015 .0314368 .271545
ltpow | .0971114 .090911 1.07 0.295 -.0894225 .2836453
ltfer | .1693346 .0438098 3.87 0.001 .0794444 .2592248
hrs | .1503752 .0587581 2.56 0.016 .0298136 .2709368
mci | .1978373 .0810587 2.44 0.022 .0315186 .364156
ngca | .7784081 .4016301 1.94 0.063 -.0456688 1.602485
_cons | 2.625513 .7307092 3.59 0.001 1.126221 4.124804
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
year | 17 0 17 |
province | 28 28 0 *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
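Summary
Finally, let us compare the estimates from xtreg, reg, areg, and reghdfe. The coefficients are identical across the four commands. The clustered standard errors differ slightly between xtreg and reghdfe on the one hand and reg and areg on the other, because the commands apply different degrees-of-freedom adjustments for the absorbed fixed effects.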
. esttab FE_xtreg FE_reg FE_areg FE_reghdfe, b(%6.3f) se scalars(N r2) star(* 0.1 ** 0.05 *** 0.01) keep(ltlan ltwlab ltpow ltfer hrs mci ngca) nogaps mtitles("FE_xtreg" "FE_reg" "FE_areg" "FE_reghdfe")
----------------------------------------------------------------------------
(1) (2) (3) (4)
FE_xtreg FE_reg FE_areg FE_reghdfe
----------------------------------------------------------------------------
ltlan 0.583*** 0.583*** 0.583*** 0.583***
(0.175) (0.180) (0.180) (0.175)
ltwlab 0.151** 0.151** 0.151** 0.151**
(0.059) (0.060) (0.060) (0.059)
ltpow 0.097 0.097 0.097 0.097
(0.091) (0.094) (0.094) (0.091)
ltfer 0.169*** 0.169*** 0.169*** 0.169***
(0.044) (0.045) (0.045) (0.044)
hrs 0.150** 0.150** 0.150** 0.150**
(0.059) (0.061) (0.061) (0.059)
mci 0.198** 0.198** 0.198** 0.198**
(0.081) (0.084) (0.084) (0.081)
ngca 0.778* 0.778* 0.778* 0.778*
(0.402) (0.414) (0.414) (0.402)
----------------------------------------------------------------------------
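The esttab call above refers to four stored results (FE_xtreg, FE_reg, FE_areg, FE_reghdfe), so each regression has to be saved under that name first. The post does not show that step; one plausible way to do it, assuming lin_1992.dta is in the working directory and that the estout and reghdfe packages are installed, is sketched below. The locals y and x are introduced here only to shorten the commands.

```stata
* Assumptions: lin_1992.dta is in the working directory;
* estout and reghdfe are installed (ssc install estout / ssc install reghdfe)
use lin_1992.dta, clear
xtset province year

local y ltvfo
local x ltlan ltwlab ltpow ltfer hrs mci ngca

* Run each command quietly and store its results under the name esttab expects
quietly xtreg `y' `x' i.year, fe vce(cluster province)
estimates store FE_xtreg

quietly reg `y' `x' i.province i.year, vce(cluster province)
estimates store FE_reg

quietly areg `y' `x' i.year, absorb(province) vce(cluster province)
estimates store FE_areg

quietly reghdfe `y' `x', absorb(year province) vce(cluster province)
estimates store FE_reghdfe

* Side-by-side table; /// continues a command across lines in a do-file
esttab FE_xtreg FE_reg FE_areg FE_reghdfe, b(%6.3f) se scalars(N r2) ///
    star(* 0.1 ** 0.05 *** 0.01) keep(`x') nogaps ///
    mtitles("FE_xtreg" "FE_reg" "FE_areg" "FE_reghdfe")
```

Because the code uses local macros and /// line continuation, it is meant to be run as a do-file in one go, not typed line by line in the Command window.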