Learning Stata: How to Build a Double/Debiased Machine Learning Model with ddml


Source

Double/debiased machine learning

Installing the packages

Run in Stata:

ssc install ddml, replace
ssc install pystacked, replace

Then run in a command prompt (cmd):

pip install scikit-learn

With Stata 17, the following error may appear:

Cross-fitting fold 1 unrecognized command

No fix for this problem is known at present.
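Although no fix is known, a few standard checks can help narrow down where the failure occurs. The following is only a diagnostic sketch. It assumes the error is related to the package installation or to Stata's Python link, which the original post does not confirm.

```stata
* Confirm that both packages are installed and visible to Stata
which ddml
which pystacked

* Show which Python installation Stata is linked to (Stata 16 or later)
python query

* List the Python installations Stata can find on this machine
python search

* Check that scikit-learn can be imported from the linked Python
python which sklearn
```

If `python which sklearn` reports that the module is not found, the `pip install` step above probably targeted a different Python installation from the one Stata uses.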

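Example 1

Source

  1. Zhang Tao, Li Junchao. Network infrastructure, inclusive green growth and regional disparities: causal inference based on double machine learning [J]. 数量经济技术经济研究, 2023, 40(04): 113-135.
    1. CNKI (中国知网) [data + Stata code]

Example code

Environment: Stata 18 MP

The sample is split 1:4 (kfolds(5)): in each round, one fold is held out for prediction and the other four are used for training. A random forest (rf) is used to predict both the main regression and the auxiliary regression: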

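* Set the working directory and load the replication data
cd "C:\Download\数据"
use data, clear
* Outcome (Y), controls plus year and id dummies (X), and treatment (D)
gl Y PR
gl X Edu Constru Urban Pass Fre Inv Inter Fis Unemp Size Consump Sci Cap Edu2 Constru2 Urban2 Pass2 Fre2 Inv2 Inter2 Fis2 Unemp2 Size2 Consump2 Sci2 Cap2 i.year i.id
gl D Broadband
* Fix the seed so the random fold split is reproducible
set seed 42
* Partially linear model with 5-fold cross-fitting
ddml init partial, kfolds(5)
* Random-forest learners for E[D|X] and E[Y|X]
ddml E[D|X]: pystacked $D $X, type(reg) method(rf)
ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
* Cross-fit, then estimate with robust standard errors
ddml crossfit
ddml estimate, robust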
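For orientation, the specification set up by `ddml init partial` is the partially linear model. The sketch below uses standard double machine learning notation. The symbols $\theta$, $g$, $m$ and $\ell$ are introduced here for illustration and do not appear in the original post. In this example $Y$ is PR, $D$ is Broadband, and $X$ is the control set.

```latex
\begin{aligned}
Y &= \theta D + g(X) + U, &\qquad \mathbb{E}[U \mid X, D] &= 0,\\
D &= m(X) + V,            &\qquad \mathbb{E}[V \mid X] &= 0,\\[4pt]
\hat{\theta} &= \frac{\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)\bigl(Y_i - \hat{\ell}(X_i)\bigr)}
                     {\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)^{2}},
&\qquad \ell(X) &= \mathbb{E}[Y \mid X].
\end{aligned}
```

Here $\hat{m}$ and $\hat{\ell}$ are the cross-fitted predictions: the prediction for each observation comes from a learner trained on the other four folds. The estimator is written without an intercept for brevity, whereas the final regression that ddml reports also includes a constant (`_cons`).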

Results

Run time is about 3 minutes:

Cross-fitting E[y|X] equation: PR
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: Broadband
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y1_pystacked  D1_pystacked     0.049  ( 0.057)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y1_pystacked_1                         Number of obs   =      2820
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
             |               Robust
          PR | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   Broadband |    .049493   .0567812     0.87   0.383    -.0617961     .160782
       _cons |   .0100916   .0126023     0.80   0.423    -.0146085    .0347917
------------------------------------------------------------------------------
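As a quick reading aid, the z statistic and the 95% confidence interval in the table can be recomputed from the coefficient and its standard error. This is a minimal sketch: the scalar names `b_hat`, `se_hat`, `ci_lo` and `ci_hi` are chosen here for illustration, and the two input values are copied from the Broadband row above.

```stata
* Coefficient and robust standard error copied from the Broadband row
scalar b_hat  = .049493
scalar se_hat = .0567812

* z statistic: coefficient divided by its standard error
display "z = " %5.2f b_hat/se_hat

* 95% confidence interval based on the standard normal distribution
scalar ci_lo = b_hat - invnormal(0.975)*se_hat
scalar ci_hi = b_hat + invnormal(0.975)*se_hat
display "95% CI: [" %9.7f ci_lo ", " %9.7f ci_hi "]"
```

Because the interval contains zero, the Broadband coefficient in this run is not statistically significant at the 5% level, which matches the reported p-value of 0.383.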

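As typeset in the journal

The code above demonstrates the result in column 8:

Zhang and Li (2023)

For other variants, see the original paper and the code in its appendix.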

Example 2

Source

  1. Wang, W., et al. (2024). The impact of energy-consuming rights trading on green total factor productivity in the context of digital economy: Evidence from listed firms in China
    1. Appendix B. Supplementary data [data + Stata code]

Example code

* Load the replication data
clear
use data
* Controls plus year and id dummies
gl X SIZE AGE LEV TOBINQ ROA CAPITAL GROWTH TOP1 INDEP EPI FDI GDP IND i.year i.id
* Learner 1: random forest
set seed 42
ddml init partial, kfolds(5)
ddml E[D|X]: pystacked ECRT  $X, type(reg) method(rf)
ddml E[Y|X]: pystacked FGTFP $X, type(reg) method(rf)
ddml crossfit
ddml estimate, robust
* Learner 2: cross-validated lasso
set seed 44
ddml init partial, kfolds(5)
ddml E[D|X]: pystacked ECRT  $X, type(reg) method(lassocv)
ddml E[Y|X]: pystacked FGTFP $X, type(reg) method(lassocv)
ddml crossfit
ddml estimate, robust

Results

Cross-fitting E[y|X] equation: FGTFP
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: ECRT
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y1_pystacked  D1_pystacked     0.617  ( 0.284)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y1_pystacked_1                         Number of obs   =      7420
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
             |               Robust
       FGTFP | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ECRT |   .6165052   .2842139     2.17   0.030     .0594562    1.173554
       _cons |   .0383372   .0096879     3.96   0.000     .0193492    .0573252
------------------------------------------------------------------------------
Cross-fitting E[y|X] equation: FGTFP
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: ECRT
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y1_pystacked  D1_pystacked     0.422  ( 0.054)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y1_pystacked_1                         Number of obs   =       7420
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
             |               Robust