数据库统计数据量时常常用的是count(1)或者count(*),这两者区别不大,在postgres里面统计数据总量时,要注意count(column_name)与count(*)的区别,有些场景下会造成数据统计不准确的误解。
官方文档上有一段话:
For example, count(*) yields the total number of input rows;
count(f1) yields the number of input rows in which f1 is non-null, since count ignores nulls;
and count(distinct f1) yields the number of distinct non-null values of f1.
[postgres@localhost ~]$ psql
psql (9.2.3)
Type "help" for help.
postgres=# create table t_kenyon(id int);
CREATE TABLE
postgres=# insert into t_kenyon values(1),(2),(2),(3),(null),(null);
INSERT 0 6
postgres=# select * from t_kenyon;
(6 rows)
postgres=# select count(1),count(*),count(id),count(distinct id) from t_kenyon;
count | count | count | count
-------+-------+-------+-------
6 | 6 | 4 | 3
(1 row)
postgres=# drop table t_kenyon;
DROP TABLE
总结:
count(1)或者count(*)会统计所有行数,包括Null值;
count(字段)会对Null值去掉统计;
count(distinct column)则会对非Null字段进行去重统计。