• Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
  • Weatherhead PET Imaging Center, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas
  • Abstract

    The Kolmogorov–Smirnov (KS) statistic is a non-parametric statistic based on the empirical distribution function. In the one-sample case, it is the supremum distance between an empirical distribution function (EDF) and a pre-specified cumulative distribution function (CDF). In the two-sample case, it is the maximum distance between two EDFs. The KS test, along with other EDF-based tests such as the Anderson-Darling (AD) test and the Cramér-von Mises (CvM) test, has been widely used in statistical analysis. To assess and compare the performance of these test statistics, we conducted a simulation study comparing the type I error and power of the KS test, the CvM test, the AD test, and the Chi-squared test. Our study covers both one-sample and two-sample tests, for both independent and correlated samples. Our results show that EDF-based tests perform better when no prior information about the tested distributions is available. However, when prior information about the tested distributions is available, the densities of the two distributions are bell-shaped, and differences in variance/sparseness are expected, the Chi-squared test may be preferable. When the tested samples are correlated, adjusting for the informative sample size is important and necessary.
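
The two KS distances described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code: the function names `ks_one_sample`, `ks_two_sample`, and `normal_cdf` are hypothetical helpers introduced here. The sketch computes only the statistics, not p-values, and assumes the pre-specified CDF is continuous.

```python
import math
import random
from bisect import bisect_right


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution (illustrative choice of pre-specified CDF)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_one_sample(sample, cdf):
    """One-sample KS statistic: sup over x of |EDF(x) - cdf(x)|.

    The EDF is a step function, so the supremum is attained at a jump;
    it suffices to compare cdf(x_i) with the EDF just after (i/n) and
    just before ((i-1)/n) each sorted observation.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_two_sample(a, b):
    """Two-sample KS statistic: max over pooled points of |EDF_a - EDF_b|."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    for t in xs + ys:
        # bisect_right counts observations <= t, i.e. n * EDF(t).
        fa = bisect_right(xs, t) / n
        fb = bisect_right(ys, t) / m
        d = max(d, abs(fa - fb))
    return d


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for the demo
    x = [rng.gauss(0.0, 1.0) for _ in range(200)]
    y = [rng.gauss(0.0, 2.0) for _ in range(200)]  # same mean, larger variance
    print("one-sample D vs N(0,1):", ks_one_sample(x, normal_cdf))
    print("two-sample D:", ks_two_sample(x, y))
```

Both statistics lie in [0, 1], and larger values indicate a larger discrepancy between the distributions being compared. In practice, a library routine such as SciPy's `scipy.stats.kstest` or `scipy.stats.ks_2samp` would normally be used, since these also return p-values.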