First published in 技术闲谈

Computing the mean, variance, and standard deviation of data in C++

Mean, variance, and standard deviation

These are mathematical concepts, so here is a brief explanation.

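  • Mean: the average value, i.e. the sum of all the data divided by the number of data points.
  • Variance: measures how far the data deviate from the mean (the expected value); it is the average of the squared deviations.
  • Standard deviation: the square root of the variance.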

For detailed definitions, please look them up on Baidu.
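For reference, the three definitions above can be written as formulas. This is only a sketch in notation I am assuming here (x_i for the data points, N for their count, \mu for the mean, \sigma for the standard deviation); it uses the population form that divides by N, which is what the code in this post computes.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```

The sample variance, used when the data are a sample drawn from a larger population, divides by N - 1 instead of N.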

Direct computation

For a given array, the values can be computed directly from the formulas:

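#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

// Compute the mean, population variance, and standard deviation directly.
void nake_mode(const std::vector<double>& data) {
    double sum = std::accumulate(std::begin(data), std::end(data), 0.0);
    double mean = sum / data.size();
    double variance = 0.0;
    std::for_each(std::begin(data), std::end(data), [&](const double d) {
        variance += std::pow(d - mean, 2);  // accumulate squared deviations
    });
    variance /= data.size();  // population variance: divide by N
    double std_deviation = std::sqrt(variance);
    std::cout << "mean,variance,std_deviation\n";
    std::cout << mean << "," << variance << "," << std_deviation << std::endl;
}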
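To make the result reusable and easy to check, the same computation can return its values instead of printing them. The following is a minimal sketch rather than part of the original code: the `Stats` struct and the `compute_stats` function are hypothetical names introduced here, and it assumes the caller passes at least one element.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical result type (not from the original post).
struct Stats {
    double mean;
    double variance;       // population variance (divides by N)
    double std_deviation;
};

// Hypothetical helper: same formulas as nake_mode, but returns the values.
Stats compute_stats(const std::vector<double>& data) {
    assert(!data.empty());  // assumption: at least one element is required
    const double n = static_cast<double>(data.size());
    const double mean = std::accumulate(data.begin(), data.end(), 0.0) / n;
    // Accumulate the squared deviations from the mean.
    const double sq_sum = std::accumulate(
        data.begin(), data.end(), 0.0,
        [mean](double acc, double d) { return acc + (d - mean) * (d - mean); });
    const double variance = sq_sum / n;
    return {mean, variance, std::sqrt(variance)};
}
```

As a worked check, for the data 2, 4, 4, 4, 5, 5, 7, 9 the sum is 40, so the mean is 5; the squared deviations add up to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.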

Using Eigen

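The Eigen library can also be used for the computation. I have not studied it in depth, so I only guarantee that the code runs, not that it performs well: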

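#include <cmath>
#include <iostream>
#include <vector>
#include <Eigen/Dense>

// Compute the same statistics with Eigen vector operations.
void eigen_mode(const std::vector<double>& data) {
    // Map the existing buffer read-only, so no copy of the data is needed.
    Eigen::Map<const Eigen::VectorXd> v(data.data(), data.size());
    double mean = v.mean();
    // Subtract the mean from every element to get the deviations.
    Eigen::VectorXd v1 = (v.array() - mean).matrix();
    // The dot product of the deviations with themselves is the sum of squares.
    double variance = v1.dot(v1) / v.size();
    double std_deviation = std::sqrt(variance);
    std::cout << "mean,variance,std_deviation\n";
    std::cout << mean << "," << variance << "," << std_deviation << std::endl;
}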
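The Eigen version expresses the sum of squared deviations as the dot product of the deviation vector with itself. As a cross-check that does not depend on Eigen, the same idea can be sketched with `std::inner_product` from the standard library. The function name `variance_by_dot` is hypothetical, and the code assumes non-empty input.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the original post): population variance
// computed as dot(dev, dev) / N, mirroring the structure of eigen_mode.
double variance_by_dot(const std::vector<double>& data) {
    assert(!data.empty());  // assumption: at least one element is required
    const double n = static_cast<double>(data.size());
    const double mean = std::accumulate(data.begin(), data.end(), 0.0) / n;
    // Build the deviation vector: each element minus the mean.
    std::vector<double> dev(data.size());
    for (std::size_t i = 0; i < data.size(); ++i) {
        dev[i] = data[i] - mean;
    }
    // Dot product of the deviation vector with itself = sum of squares.
    const double dot =
        std::inner_product(dev.begin(), dev.end(), dev.begin(), 0.0);
    return dot / n;
}
```

For the data 1, 2, 3, 4 the mean is 2.5 and the squared deviations add up to 5, so this returns 5 / 4 = 1.25, which should match the variance printed by the Eigen version for the same input.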

Testing

Generate the data elements randomly, with the number of elements given on the command line:

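#include <chrono>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// Return one random value drawn uniformly from [-1, 1).
double generate_random() {
    // Seed the engine once and reuse it, instead of rebuilding it on every call.
    static std::random_device rd;
    static std::default_random_engine eng(rd());
    static std::uniform_real_distribution<double> distr(-1, 1);
    return distr(eng);
}

int main(int argc, char *argv[]) {
    // Number of elements: defaults to 10, or taken from the first argument.
    int num = 10;
    if (argc >= 2) {
        num = std::stoi(argv[1]);
    }
    std::vector<double> data;
    data.reserve(num);
    for (int i = 0; i < num; ++i) {
        data.emplace_back(generate_random());
    }
    // Time the direct computation in milliseconds (this includes the printing).
    std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
    nake_mode(data);
    std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> span = t2 - t1;
    std::cout << "nake_mode took:" << span.count() << std::endl;
    std::cout << "--------------\n";
    // Time the Eigen version the same way.
    t1 = std::chrono::high_resolution_clock::now();
    eigen_mode(data);
    t2 = std::chrono::high_resolution_clock::now();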