Computing the Mean, Variance, and Standard Deviation of Data in C++
Mean, Variance, and Standard Deviation
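These are mathematical concepts, so a brief explanation first.
- Mean: the average, i.e. the sum of all data values divided by the number of values.
- Variance: a measure of how far the data deviate from the mean (the expected value).
- Standard deviation: the square root of the variance.
For detailed definitions, see Baidu.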
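Written out for N data values x_1, ..., x_N, the three definitions above take the following form. The symbols N, x_i, \mu and \sigma are notation introduced here for illustration. The variance shown is the population form, which divides by N, matching the division by `data.size()` in the code in this article; the sample form would divide by N - 1 instead.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}
```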
Direct Computation
For a given array, all three values can be computed directly from the formulas:
#include <algorithm>
#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;
void nake_mode(const vector<double>& data)
{
    double sum = std::accumulate(std::begin(data), std::end(data), 0.0);
    double mean = sum / data.size();
    double variance = 0.0;
    // Accumulate the squared deviations from the mean.
    std::for_each(std::begin(data), std::end(data), [&](const double d) {
        variance += pow(d - mean, 2);
    });
    variance /= data.size();  // population variance (divides by N)
    double std_deviation = sqrt(variance);
    cout << "mean,variance,std_deviation\n";
    cout << mean << "," << variance << "," << std_deviation << endl;
}
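As a quick way to check the arithmetic, here is a minimal sketch of the same calculation that returns its results instead of printing them. The `Stats` struct and the `compute_stats` function are hypothetical names introduced for this example, and the sketch assumes the input is non-empty.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper type, not part of the original code.
struct Stats {
    double mean;
    double variance;       // population variance (divides by N)
    double std_deviation;
};

// Same arithmetic as nake_mode, but returns the values instead of printing.
// Assumes data is non-empty; an empty input would divide by zero.
Stats compute_stats(const std::vector<double>& data)
{
    const double n = static_cast<double>(data.size());
    const double mean = std::accumulate(data.begin(), data.end(), 0.0) / n;
    double sq_sum = 0.0;
    for (double d : data) {
        sq_sum += (d - mean) * (d - mean);  // squared deviation from the mean
    }
    const double variance = sq_sum / n;
    return {mean, variance, std::sqrt(variance)};
}
```

For the input {2, 4, 4, 4, 5, 5, 7, 9}, the sum is 40 and the squared deviations add up to 32, so this returns a mean of 5, a variance of 4 and a standard deviation of 2.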
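Using Eigen
The Eigen library can also do the computation. I have not studied it in depth, so the code below is only guaranteed to run; no claims are made about its performance: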
#include <Eigen/Dense>
void eigen_mode(const vector<double>& data)
{
    // View the existing buffer as a read-only Eigen vector, without copying.
    Eigen::Map<const Eigen::VectorXd> v(data.data(), data.size());
    double mean = v.sum() / v.size();
    // Subtract the mean from every element.
    Eigen::VectorXd v1 = (v.array() - mean).matrix();
    // The dot product of v1 with itself is the sum of squared deviations.
    double variance = v1.dot(v1) / v.size();
    double std_deviation = std::sqrt(variance);
    cout << "mean,variance,std_deviation\n";
    cout << mean << "," << variance << "," << std_deviation << endl;
}
Testing
Generate random data elements, with the number of elements given on the command line:
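#include <random>
double generate_random()
{
    // Seed the engine once and reuse it, rather than rebuilding it on every call.
    static std::default_random_engine eng(std::random_device{}());
    static std::uniform_real_distribution<double> distr(-1, 1);
    return distr(eng);
}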
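When comparing the two implementations it can help to feed them identical input on every run. The following is a minimal sketch of a seeded generator under that assumption; `generate_data` and its `seed` parameter are hypothetical names, and `std::mt19937` is swapped in here for the default engine used above.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills a vector with n values drawn uniformly
// from the range -1 to 1, using a caller-supplied seed.
std::vector<double> generate_data(std::size_t n, unsigned int seed)
{
    std::mt19937 eng(seed);  // fixed seed, so the same engine output each time
    std::uniform_real_distribution<double> distr(-1.0, 1.0);
    std::vector<double> data;
    data.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        data.push_back(distr(eng));
    }
    return data;
}
```

Calling `generate_data(num, 42)` twice in the same program yields two identical vectors, so both functions can be timed on the same data.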
#include <chrono>
#include <string>
int main(int argc, char *argv[])
{
    // Default to 10 elements; a single argument overrides the count.
    int num = 10;
    if (argc == 2) {
        num = stoi(argv[1]);
    }
    vector<double> data;
    data.reserve(num);
    for (int i = 0; i < num; ++i) {
        data.emplace_back(generate_random());
    }
    // Time the direct computation in milliseconds.
    std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
    nake_mode(data);
    std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> span = t2 - t1;
    cout << "nake_mode took (ms): " << span.count() << endl;
    cout << "--------------\n";
    // Time the Eigen version the same way.
    t1 = std::chrono::high_resolution_clock::now();
    eigen_mode(data);
    t2 = std::chrono::high_resolution_clock::now();