大家好,我是考100分的小小码 ,祝大家学习进步,加薪顺利呀。今天说一说您所不了解的Postgres 10功能:CREATE STATISTICS(译),希望您对编程的造诣更进一步.
https://www.citusdata.com/blog/2018/03/06/postgres-planner-and-its-usage-of-statistics/,本文统一将原文中的“planner”译做“优化器”
CREATE TABLE tbl ( col1 int, col2 int ); INSERT INTO tbl SELECT i/10000, i/100000 FROM generate_series (1,10000000) s(i); ANALYZE tbl; select * from pg_stats where tablename = "tbl" and attname = "col1"; -[ RECORD 1 ]----------+-------------------------------- schemaname | public tablename | tbl attname | col1 inherited | f null_frac | 0 avg_width | 4 n_distinct | 1000 most_common_vals | {318,564,596,...} most_common_freqs | {0.00173333,0.0017,0.00166667,0.00156667,...} histogram_bounds | {0,8,20,30,39,...} correlation | 1 most_common_elems | most_common_elem_freqs | elem_count_histogram |
代码100分
代码100分EXPLAIN ANALYZE SELECT * FROM tbl where col1 = 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on tbl (cost=0.00..169247.80 rows=9584 width=8) (actual time=0.641..622.851 rows=10000 loops=1) Filter: (col1 = 1) Rows Removed by Filter: 9990000 Planning time: 0.051 ms Execution time: 623.185 ms (5 rows)
EXPLAIN ANALYZE SELECT * FROM tbl where col1 = 1 and col2 = 0; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Seq Scan on tbl (cost=0.00..194248.69 rows=100 width=8) (actual time=0.640..630.130 rows=10000 loops=1) Filter: ((col1 = 1) AND (col2 = 0)) Rows Removed by Filter: 9990000 Planning time: 0.072 ms Execution time: 630.467 ms (5 rows)
将其乘以表中的行数,即10000000,我们得到100。这就是计划者估计的100的来源。
如果这些列不是独立的(有多个列之间存在依赖关系),我们如何告诉优化器(planner)呢?
早些年曾经执着地研究过SQLServer对非相关列预估的算法,
类似于pg,SQLServer从预估行数从2012版的p
0*p
1*p
2*p
3……*RowCount,演变为P
0*P
1
1/2 * P
2
1/4 * P
3
1/8……* RowCount,
https://www.cnblogs.com/wy123/p/5790855.html
代码100分CREATE STATISTICS s1 (dependencies) on col1, col2 from tbl; ANALYZE tbl;
EXPLAIN ANALYZE SELECT * FROM tbl where col1 = 1 and col2 = 0; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on tbl (cost=0.00..194247.76 rows=9584 width=8) (actual time=0.638..629.741 rows=10000 loops=1) Filter: ((col1 = 1) AND (col2 = 0)) Rows Removed by Filter: 9990000 Planning time: 0.115 ms Execution time: 630.076 ms (5 rows)
SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext WHERE stxname = "s1"; stxname | stxkeys | stxdependencies ---------+---------+---------------------- s1 | 1 2 | {"1 => 2": 1.000000} (1 row)
ndistinct statistics)
EXPLAIN ANALYZE SELECT col1,col2,count(*) from tbl group by col1, col2; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=1990523.20..2091523.04 rows=100000 width=16) (actual time=2697.246..4470.789 rows=1001 loops=1) Group Key: col1, col2 -> Sort (cost=1990523.20..2015523.16 rows=9999984 width=8) (actual time=2695.498..3440.880 rows=10000000 loops=1) Sort Key: col1, col2 Sort Method: external sort Disk: 176128kB -> Seq Scan on tbl (cost=0.00..144247.84 rows=9999984 width=8) (actual time=0.008..665.689 rows=10000000 loops=1) Planning time: 0.072 ms Execution time: 4494.583 ms
优化器(planner)估计的数量(等于col1和col2的不同值的数量)将为100000。
优化器(planner)捕获n_distinct统计信息,然后重新运行查询并找出答案。
CREATE STATISTICS s2 (ndistinct) on col1, col2 from tbl; ANALYZE tbl; EXPLAIN ANALYZE SELECT col1,col2,count(*) from tbl group by col1, col2; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=219247.63..219257.63 rows=1000 width=16) (actual time=2431.767..2431.928 rows=1001 loops=1) Group Key: col1, col2 -> Seq Scan on tbl (cost=0.00..144247.79 rows=9999979 width=8) (actual time=0.008..643.488 rows=10000000 loops=1) Planning time: 0.129 ms Execution time: 2432.010 ms (5 rows)
优化器(planner)学到了什么。
SELECT stxkeys AS k, stxndistinct AS nd FROM pg_statistic_ext WHERE stxname = "s2"; k | nd -----+---------------- 1 2 | {"1, 2": 1000} Real-world implications
- 由于要在报表中显示按所有人分组的统计信息,因此具有月,季度和年的列。
- 地理层次结构之间的关系,例如:具有国家,州和城市列,并按它们进行过滤/分组。
因为Citus是纯粹的扩展,而不是分支,所以使用Citus时可以利用每个发行版中的所有出色新功能。
如果您有兴趣阅读我们团队的更多帖子,请注册我们的每月时事通讯,并将最新内容直接发送到您的收件箱。
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。
转载请注明出处: https://daima100.com/7396.html