This paper presents a novel approach to simulating electronic health records (EHRs) using diffusion probabilistic models (DPMs). Specifically, we demonstrate the effectiveness of DPMs in synthesising longitudinal EHRs that capture mixed-type variables, including numeric, binary, and categorical variables. To our knowledge, this is the first use of DPMs for this purpose. We compare our DPM-simulated datasets with previous state-of-the-art results based on generative adversarial networks (GANs) for two clinical applications: acute hypotension and antiretroviral therapy (ART) for human immunodeficiency virus (HIV). Given the lack of comparable prior studies on DPMs, a core component of our work is an exploration of the advantages and caveats of employing DPMs across a wide range of aspects. In addition to assessing the realism of the synthetic datasets, we train reinforcement learning (RL) agents on the synthetic data to evaluate the datasets' utility for supporting the development of downstream machine learning models. Finally, we estimate that our DPM-simulated datasets are secure and pose a low patient exposure risk if released for public access.