Learning Stata: How to Build a Double/Debiased Machine Learning (DDML) Model with ddml
Reference
Double/debiased machine learning
Installing the packages
Run in Stata:
```stata
ssc install ddml, replace
ssc install pystacked, replace
```
Then run in a command prompt (cmd):
```shell
pip install scikit-learn
```
With Stata 17 you may get the error `Cross-fitting fold 1 unrecognized command`.
There is currently no known fix for this problem.
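pystacked runs scikit-learn through Stata's Python integration. scikit-learn therefore has to be installed in the same Python installation that Stata uses; Stata's `python query` command reports which installation that is.

The sketch below checks that scikit-learn can be imported, assuming you run it with that interpreter. It only confirms the installation. It is not a fix for the Stata 17 error above.

```python
import importlib.util


def sklearn_available() -> bool:
    """Return True if scikit-learn (import name: sklearn) can be found by this Python."""
    return importlib.util.find_spec("sklearn") is not None


if __name__ == "__main__":
    if sklearn_available():
        print("scikit-learn found")
    else:
        print("scikit-learn missing: run pip install scikit-learn")
```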
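Example 1
Reference
- Zhang Tao, Li Junchao. Network Infrastructure, Inclusive Green Growth and Regional Disparities: Causal Inference Based on Double Machine Learning [J]. 数量经济技术经济研究 (The Journal of Quantitative & Technical Economics), 2023, 40(04): 113-135.
- CNKI 【data + Stata】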
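Example code
Environment: Stata 18 MP
The sample is split in a 1:4 ratio (`kfolds(5)`). A random forest (`rf`) fits both regressions: the main regression (outcome on controls) and the auxiliary regression (treatment on controls).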
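```stata
cd "C:\Download\数据"
use data, clear
* Outcome variable
gl Y PR
* Control variables plus year and id fixed effects
gl X Edu Constru Urban Pass Fre Inv Inter Fis Unemp Size Consump Sci Cap Edu2 Constru2 Urban2 Pass2 Fre2 Inv2 Inter2 Fis2 Unemp2 Size2 Consump2 Sci2 Cap2 i.year i.id
* Treatment variable
gl D Broadband
* Fix the random fold split so results are reproducible
set seed 42
* Partially linear model with 5-fold cross-fitting
ddml init partial, kfolds(5)
* Auxiliary regression: treatment on controls, random forest learner
ddml E[D|X]: pystacked $D $X, type(reg) method(rf)
* Main regression: outcome on controls, random forest learner
ddml E[Y|X]: pystacked $Y $X, type(reg) method(rf)
ddml crossfit
ddml estimate, robust
```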
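With `kfolds(5)`, the sample is randomly split into five folds. Each fold in turn is held out while the learner is trained on the other four, which is where the 1:4 ratio comes from. As a result, every observation gets an out-of-sample prediction of E[Y|X] and E[D|X].

The plain-Python sketch below illustrates that cross-fitting mechanism only. It is not ddml's implementation:
- The fold assignment uses Python's `random` module, so `seed=42` does not reproduce Stata's `set seed 42` split.
- The learner is a stand-in: a single-regressor OLS fit (the helper `fit_ols_1d` is hypothetical) takes the place of the random forest.

```python
import random
from typing import List, Sequence, Tuple


def make_folds(n: int, k: int, seed: int) -> List[int]:
    """Randomly assign n observations to k folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_of = [0] * n
    for position, obs in enumerate(order):
        fold_of[obs] = position % k
    return fold_of


def fit_ols_1d(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical stand-in learner: OLS of y on one regressor x, returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_fit(x: Sequence[float], y: Sequence[float],
              fold_of: Sequence[int], k: int) -> List[float]:
    """Out-of-fold predictions: fold j is predicted by a model trained on the other k - 1 folds."""
    preds = [0.0] * len(y)
    for j in range(k):
        train = [i for i, f in enumerate(fold_of) if f != j]
        test = [i for i, f in enumerate(fold_of) if f == j]
        intercept, slope = fit_ols_1d([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = intercept + slope * x[i]
    return preds


if __name__ == "__main__":
    folds = make_folds(n=10, k=5, seed=42)
    # Each held-out fold is 1/5 of the sample; the model for it is trained on the other 4/5
    print([folds.count(j) for j in range(5)])  # [2, 2, 2, 2, 2]
```

The residuals that enter the final regression are the actual values minus these out-of-fold predictions. No observation is ever predicted by a model that saw it during training.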
Results
The run takes about 3 minutes:
```
Cross-fitting E[y|X] equation: PR
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: Broadband
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y1_pystacked  D1_pystacked     0.049   (0.057)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y1_pystacked_1                         Number of obs   =      2820
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
             |               Robust
          PR | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   Broadband |    .049493   .0567812     0.87   0.383    -.0617961     .160782
       _cons |   .0100916   .0126023     0.80   0.423    -.0146085    .0347917
------------------------------------------------------------------------------
```
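`ddml init partial` estimates the partially linear model Y = θD + g(X) + ε. The final step regresses the cross-fitted outcome residual Y − Ê[Y|X] on the treatment residual D − Ê[D|X]; these are the `y-E[y|X]` and `D-E[D|X,Z]` lines in the output. The slope is the reported Broadband coefficient.

The interval is normal-based, θ̂ ± 1.96 × SE. For example, .049493 ± 1.96 × .0567812 ≈ [−.0618, .1608], matching the table.

The sketch below reproduces only this last stage, starting from given residuals. Two points are assumptions:
- `robust` is taken to mean Stata's usual HC1 sandwich estimator, i.e. HC0 scaled by n/(n − 2).
- The function name `partialling_out` is hypothetical.

```python
import math
from typing import Dict, Sequence


def partialling_out(y_res: Sequence[float], d_res: Sequence[float]) -> Dict[str, float]:
    """Regress the outcome residual on the treatment residual (with a constant); HC1 robust SE."""
    n = len(y_res)
    y_bar = sum(y_res) / n
    d_bar = sum(d_res) / n
    s_dd = sum((d - d_bar) ** 2 for d in d_res)
    theta = sum((d - d_bar) * (y - y_bar) for d, y in zip(d_res, y_res)) / s_dd
    const = y_bar - theta * d_bar
    e = [y - const - theta * d for y, d in zip(y_res, d_res)]
    # HC0 sandwich variance of the slope, times the HC1 factor n / (n - k) with k = 2 coefficients
    meat = sum(((d - d_bar) * ei) ** 2 for d, ei in zip(d_res, e))
    se = math.sqrt(meat / s_dd ** 2 * n / (n - 2))
    z = theta / se
    return {
        "theta": theta,
        "se": se,
        "z": z,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided p-value from the standard normal
        "ci_low": theta - 1.959964 * se,
        "ci_high": theta + 1.959964 * se,
    }


if __name__ == "__main__":
    # Made-up residuals, for illustration only
    print(partialling_out([-1.0, 0.0, 1.0, 1.0, 3.0], [-2.0, -1.0, 0.0, 1.0, 2.0]))
```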
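Table as published in the journal
The example above demonstrates the result in column 8.
For other variants, see the original paper and its appendix code.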
Example 2
Reference
- Wang, W., et al. (2024). The impact of energy-consuming rights trading on green total factor productivity in the context of digital economy: Evidence from listed firms in China.
- Appendix B. Supplementary data 【data + Stata】
Example code
```stata
clear
use data
* Control variables plus year and id fixed effects
gl X SIZE AGE LEV TOBINQ ROA CAPITAL GROWTH TOP1 INDEP EPI FDI GDP IND i.year i.id
* Model 1: random forest learners (treatment ECRT, outcome FGTFP)
set seed 42
ddml init partial, kfolds(5)
ddml E[D|X]: pystacked ECRT $X, type(reg) method(rf)
ddml E[Y|X]: pystacked FGTFP $X, type(reg) method(rf)
ddml crossfit
ddml estimate, robust
* Model 2: a new model with cross-validated lasso learners (lassocv)
set seed 44
ddml init partial, kfolds(5)
ddml E[D|X]: pystacked ECRT $X, type(reg) method(lassocv)
ddml E[Y|X]: pystacked FGTFP $X, type(reg) method(lassocv)
ddml crossfit
ddml estimate, robust
```
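Example 2 fits the random forest and `lassocv` as two separate models. Each run therefore has one learner per equation, and the `opt` row in its output is simply that learner. When several learners are registered for the same equation within one model, ddml's `opt` specification picks, for each equation, the learner with the smallest cross-fitted (out-of-sample) mean squared error.

The sketch below shows that selection rule on made-up out-of-fold predictions. The learner names, numbers, and helper functions are illustrative only.

```python
from typing import Dict, Sequence


def oos_mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error of out-of-fold (cross-fitted) predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pick_min_mse(actual: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Name of the learner whose cross-fitted predictions have the smallest MSE."""
    return min(candidates, key=lambda name: oos_mse(actual, candidates[name]))


# Made-up outcome values and out-of-fold predictions from two hypothetical learners
y = [1.0, 2.0, 3.0, 4.0]
preds = {
    "rf": [1.5, 2.5, 2.5, 3.5],       # every prediction off by 0.5 -> MSE 0.25
    "lassocv": [1.1, 2.0, 3.0, 4.2],  # small errors -> MSE 0.0125
}

if __name__ == "__main__":
    print(pick_min_mse(y, preds))  # lassocv
```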
Results
Random forest (`method(rf)`):
```
Cross-fitting E[y|X] equation: FGTFP
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: ECRT
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec  r     Y learner     D learner         b        SE
 opt  1  Y1_pystacked  D1_pystacked     0.617   (0.284)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X]  = Y1_pystacked_1                         Number of obs   =      7420
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
             |               Robust
       FGTFP | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ECRT |   .6165052   .2842139     2.17   0.030     .0594562    1.173554
       _cons |   .0383372   .0096879     3.96   0.000     .0193492    .0573252
------------------------------------------------------------------------------
```
Cross-validated lasso (`method(lassocv)`):
Cross-fitting E[y|X] equation: FGTFP
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
Cross-fitting E[D|X] equation: ECRT
Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting
. ddml estimate, robust
DDML estimation results:
spec r Y learner D learner b SE
opt 1 Y1_pystacked D1_pystacked 0.422 ( 0.054)
opt = minimum MSE specification for that resample.
Min MSE DDML model
y-E[y|X] = Y1_pystacked_1 Number of obs = 7420
D-E[D|X,Z]= D1_pystacked_1
------------------------------------------------------------------------------
| Robust