Compression of PostgreSQL WAL Archives Becoming More Important

栏目: IT技术 · 发布时间: 4年前

内容简介:There were days when CPU and Memory was a limitation. More than a decade back, servers with 4 cores were “High End” and as a DBA, my biggest worry was managing the available resources. And for an old DBA like me, Oracle’s attempt to pool CPU and Memory fro

Compression of PostgreSQL WAL Archives Becoming More Important As hardware and software evolve, the bottlenecks in a database system also shift. Many old problems might disappear and new types of problems pop-up.

Old Limitations

There were days when CPU and Memory was a limitation. More than a decade back, servers with 4 cores were “High End” and as a DBA, my biggest worry was managing the available resources. And for an old DBA like me, Oracle’s attempt to pool CPU and Memory from multiple host machines for a single database using RAC architecture was a great attempt to solve it.

Then came the days of storage speed limitations. It was triggered by the emergence of multi-core with multi-thread processors becoming common, as well as memory size and bus speed increasing. Enterprises tried to solve it with sophisticated SAN drives, Specialized Storages with cache, etc. But it has remained for many years, even now as enterprises started increasingly shifting to NVMe drives.

Recently we started observing a new bottleneck which is becoming a pain point for many database users. As the capability of the single-host server increased, it started processing a huge number of transactions. There are systems that produce thousands of WAL files in a couple of minutes, and there were a few cases reported where WAL archiving to a cheaper, slower disk system was not able to catch up with WAL generation. To add more complexity, many organizations prefer to store WAL archives over a low bandwidth network. (There is an inherent problem in Postgres Archiving that if it lags behind, it tends to lag more because the archive process needs to search among .ready files. which won’t be discussed here.)

In this blog post, I would like to bring to your attention the fact that compressing WALs can be easily achieved if you are not already doing it, as well as a query to monitor the archiving gap.

Compressing PostgreSQL WALs

The demands and requirements for compressing WALs before archiving are increasing day by day. Luckily, most of the PostgreSQL backup tools like pgbackrest/wal-g etc already take care of it. The archive_command invokes these tools, silently archiving for users.

For example, in pg_backrest, we can specify archive_command, which uses the gzip behind the scene.

ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=mystanza archive-push %p';

Or in WAL-G, we can specify:

ALTER SYSTEM SET archive_command = 'WALG_FILE_PREFIX=/path/to/archive /usr/local/bin/wal-g wal-push  %p';

This does the lz4 compression of WAL files.

But what if we are not using any specific backup tool for WAL compression for archiving? We can still compress the WALs using Linux tools like gzip or bzip, etc. Gzip will be available in most of the Linux installations by default, so configuring it will be an easy task.

alter system set archive_command = '/usr/bin/gzip -c %p > /home/postgres/archived/%f.gz';

However, 7za is the most interesting among all the compression options for WAL, which gives the highest compression as fast as possible, which is the major criterion in a system with high WAL generation. You may have to explicitly install the 7za, which is part of the 7zip package from an extra repo.

On CentOS 7 it is:

sudo yum install epel-release
sudo yum install p7zip

On Ubuntu it is:

sudo apt install p7zip-full

Now we should be able to specify the archive_command like this:

postgres=# alter system set archive_command = '7za a -bd -mx2 -bsp0 -bso0 /home/postgres/archived/%f.7z %p';
ALTER SYSTEM

In my test system, I could see archived WAL files of less than 200kb. Size can vary according to the content of the WALs, which depends on the type of transaction on the database.

-rw-------. 1 postgres postgres <strong>197K</strong> Feb  6 12:13 0000000100000000000000AA.7z
-rw-------. 1 postgres postgres <strong>197K</strong> Feb  6 12:13 0000000100000000000000AB.7z
-rw-------. 1 postgres postgres <strong>198K</strong> Feb  6 12:13 0000000100000000000000AC.7z
-rw-------. 1 postgres postgres <strong>196K</strong> Feb  6 12:13 0000000100000000000000AD.7z
-rw-------. 1 postgres postgres <strong>197K</strong> Feb  6 12:13 0000000100000000000000AE.7z

Compressing 16MB files to kilobyte rages is definitely going to save network bandwidth and storage while addressing the problem of archiving falling behind.

Restoring the WALs

Archiving and getting the highest compression is just one part, but we should also be able to restore them when required. The backup tools provide their own restore command options. For example, pgbackrest can use archive - get :

restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'

Wal-g provides wal - fetch for the same purpose.

In case you are opting for manual archive compression using gzip, we can use the gunzip utility in restore_command as follows:

gunzip -c /home/postgres/archived/%f.gz > %p

If you are already started using PostgreSQL 12, this parameter can be set using ALTER SYSTEM:

postgres=# alter system set restore_command = 'gunzip -c /home/postgres/archived/%f.gz > %p';
ALTER SYSTEM

OR

For 7za as shown above, you may use the following:

postgres=# alter system set restore_command = '7za x -so /home/postgres/archived/%f.7z > %p';
ALTER SYSTEM

However, unlike archive_command changes, restore_command changes require you to restart the standby database.

Monitoring Archive Progress

The current WAL archive is available from pg_stat_archiver status, but finding out the gap using the WAL file names is a bit tricky. A sample query which I used to find out the WAL archive lagging is this:

select pg_walfile_name(pg_current_wal_lsn()),last_archived_wal,last_failed_wal, 
  ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),9,8))::bit(32)::int*256 + 
  ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),17))::bit(32)::int  -
  ('x'||substring(last_archived_wal,9,8))::bit(32)::int*256 -
  ('x'||substring(last_archived_wal,17))::bit(32)::int
  as diff from pg_stat_archiver;

The caveat here is that both current WAL and the WAL to be archived are of the same timeline in order for this query to work, which is the common case. Very rarely we may encounter a different case than that in production. So this query could be of good help when monitoring the WAL archiving of a PostgreSQL server.

Learn more about the Percona Distribution for PostgreSQL .

Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.

Download PDF


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

GUI设计禁忌2.0

GUI设计禁忌2.0

Jeff Johnson / 盛海艳 等 / 机械工业出版社 / 2008 / 49.00元

本书描述软件开发人员在设计图形用户界面(GUI)时经常犯的“禁忌”,并提出避免这些错误的基本原则和理论依据。本书将GUI禁忌分为7种类型:GUI控件禁忌、导航禁忌、文字禁忌、图形设计和布局禁忌、交互禁忌、响应性禁忌以及管理禁忌,并分别进行详述。 本书编排独特,条理清晰,针对性极强,是不可多得的GUI设计优秀资源。本书适合软件开发人员、web站点设计人员、开发经理、用户界面设计人员等阅读。一起来看看 《GUI设计禁忌2.0》 这本书的介绍吧!

RGB转16进制工具
RGB转16进制工具

RGB HEX 互转工具

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具