Dead rows in a materialized view

栏目: IT技术 · 发布时间: 3年前

内容简介:In the application that I work on, we have a materialized view. It was created from several joined tables. It’s used to speed up searching for data without joining seven or eight tables in every query. At least it should make searching faster but in practi

In the application that I work on, we have a materialized view. It was created from several joined tables. It’s used to speed up searching for data without joining seven or eight tables in every query. At least it should make searching faster but in practice, it didn’t and I will describe to you why.

Queries to that view were really slow, but only on the production server, on staging everything was fine. That view was exactly the same as on staging server, but queries were ~50 times slower on production.

The first step was to check execution plans for any query on both servers.

EXPLAIN ANALYZE SELECT COUNT(*) FROM 'table_name';

On staging execution plan involved parallel scan, on production index scan.

When I discovered that, the first thing that came to my mind was the possibility of different configurations that disabled parallel queries on production. I checked that and it turned out to be false. My next attempts were to check various Postgres views that kept some data about tables. After looking through many numbers that didn’t help I finally discovered that our table (materialized view actually, but in this context, it makes no difference) on the production server it took 7GB of disc space and on staging only 300MB.

SELECT pg_size_pretty( pg_total_relation_size('table_name') );

I expected that row count had to be totally different but unexpectedly it was ~320k on production and ~310k on staging.

After a really long investigation, I found the reason why those numbers were so different, the table on production contained an enormous number of dead rows.

SELECT n_dead_tup 
FROM pg_stat_all_tables
WHERE relname = 'table_name'

At this time I was almost sure that’s the reason for our whole problem (or at least I hoped so). I started researching those dead rows, what they are, and why they appear. The major defect of that approach is that after every single update on one of the records from joined tables, the whole materialized view is refreshed. Refreshed i.e. whole view is dropped and then it’s building query is run. We have two types of a refresh in PostgreSQL:

  • non concurrently

This refresh type does not produce any dead rows, but for the time of refreshing view is locked, and no data can be read from it. This is an unacceptable solution for my project.

  • concurrently

It allows reading data during the process. It’s possible thanks to duplicating all data before deleting it. During refresh, all SELECT queries see that duplicated data, and after the process, all queries have access to newly created view, and duplicates remain as dead rows. It’s the way how the view is bloated with tons of unnecessary data.

And here comes VACUUM mechanism that is used to remove all dead rows from the table or materialized view.

VACUUM table_name

This command removes all dead queries from the given table, but it has to be run ‘manually’, or by some application code. There is also a mechanism called autovacuum . PostgreSQL has some workers (quantity set in configuration) that all the time search our database and run VACUUM on tables that are in need. What do we know now? We have many dead rows in the materialized view, we also have a mechanism that should clean them. Why isn’t it? In order to check that I added query that logs last auto and vacuum time with dead rows count.

SELECT n_dead_tup, last_autovacuum, last_vacuum 
FROM pg_stat_all_tables
WHERE relname = 'table_name'

After a few days, I checked that logs. The first thing to notice was that the Autovacuum process is working but it’s triggered only in evenings. All dead rows are cleaned and the next day from something like 7 a.m. they appear and it’s count is growing. I checked our search in the morning and indeed it was as fast as it should be, but within a day it was getting much slower. Following really long research why is that happening I found that Autovacuum cannot be run on the table when it’s locked by SHARE UPDATE EXCLUSIVE lock. What locks a table that way? Obviously it’s REFRESH MATERIALIZED VIEW CONCURRENTLY. When that view is refreshed in our application? Other logs added and the answer is: the view is refreshed almost whole time (during a workday from morning to evening). Refresh is as I mentioned triggered by every data update on each of the tables that problematic view is made of. And here comes our final answer. Dead rows aren’t cleaned because Autovacuum cannot be run during refresh and refresh is running continuously from morning to evening.

Possible solutions:

  1. Architectural changes that would prevent an application from refreshing whole materialized view during every data update. This is definitely possible to just update those rows that really changed without dropping and building from the scratch whole table.

    This is definitely the best solution and I would choose that if I had much more time to spend on fixing that problem, but it would be too long to refactor every place in the application that can update one of the included tables.

  2. The second possible fix is to append some breaks between refreshes that AUTOVACUUM process could be fired on our view. Theoretically, it is a fine idea, it would reduce database overload, it should be quite fast to implement, but I think there is too much room for unforeseen consequences and finally, I decided to give up that idea. Even though adding breaks would not take much time to implement, testing it carefully would and as I noticed above there were no more time after that really long investigation

  3. The third and final solution I anticipated was the simplest one. In the job used to refresh the materialized view, I added code that ‘manually’ vacuums view from dead rows. During that whole research on I run vacuum manually many times and I knew it worked. Going this way assures us that there is no possibility that dead rows stay in view because they are vacuumed as soon as refresh (that creates them) ends.

Weighing up the pros and cons I finally chose the third solution. Time was a problem and it was the only one that could be tested immediately. It worked, but I think refactoring the application to not refresh the whole view would be the best option.

We create dedicated software for companies, bring ideas to life and enjoy what we do.

Looking for professional team?


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

颠覆者:周鸿祎自传

颠覆者:周鸿祎自传

周鸿祎、范海涛 / 北京联合出版公司 / 2017-11 / 49.80元

周鸿祎,一个在中国互联网历史上举足轻重的名字。他被认为是奠定当今中国互联网格局的人之一。 作为第一代互联网人,中国互联网行业最好的产品经理、创业者,他每时每刻都以自己的实践,为互联网的发展贡献自己的力量。 在很长一段时间内,他没有在公共场合发声,甚至有粉丝对当前死水一潭的互联网现状不满意,发出了“人民想念周鸿祎”的呼声。 但周鸿祎在小时候,却是一个踢天弄井,动不动就大闹天宫的超级......一起来看看 《颠覆者:周鸿祎自传》 这本书的介绍吧!

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

MD5 加密
MD5 加密

MD5 加密工具

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换