postgresql 主备及切换-恢复方案

内容简介：前期的gitlab 已经开始推广测试，最近对postgresql 做了主备，这里说下方案及在实施过程中遇到的坑。postgresql 的具安装不在此介绍。涉及修改的配置文件有：

前言

前期的gitlab 已经开始推广测试，最近对postgresql 做了主备，这里说下方案及在实施过程中遇到的坑。

postgresql 的具安装不在此介绍。

基础信息

primary_ip： 192.168.10.2，
    standby_ip： 192.168.10.3，
    PGDATA： /opt/gitlab/postgresql/data，
    postgresql_version：(PostgreSQL) 9.6.8，
    PGCONF_DIR: $PGDATA，

涉及修改的配置文件有：

postgresql.conf --------- postgresql 主配置文件
pg_hba.conf ------------- postgresql 访问规则文件
recovery.conf ----------- postgresql 备库访问主库配置文件

注意事项！

1. 主备postgresql 版本需保持一致！
    2. postgresql.conf 配置文件需保持一致！
    3. 备库提权为主库后，切记不要直接启动原主库！

准备操作

在primary 192.168.10.2 主机操作

1.为备库准备主库，修改配置文件

cat postgresql.conf

wal_level = hot_standby         # minimal, replica, or logical
    max_wal_senders = 2     # max number of walsender processes
    hot_standby = on            # "on" allows queries during recovery
    max_connections = 300           # (change requires restart)
    archive_mode = on
    restore_command = ''

cat pg_hba.conf

host    all             all             127.0.0.1/32            trust
    host    all             all             ::1/128                     trust
    host    replication     gitlab_replicator    192.168.10.3/32    trust

cat recovery.done

restore_command = ''
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.10.3 port=5432 user=gitlab_replicator'

2.创建用于复制的帐号，并赋予replication 权限

postgres=#CREATE USER gitlab_replicator REPLICATION LOGIN;

3.基本备份为备库准备引导数据

postgres=#SELECT pg_start_backup(back_20180929);
    cd  /opt/gitlab/postgresql && tar zcf base_data.tar.gz data
    postgres=#SELECT pg_start_stop();

在 standby 192.168.10.3 主机操作

1.解压基本数据

将主库上创建的 base_data.tar.gz 上传到备库主机，并解压到数据目录

tar zxf base_data.tar.gz -C /opt/gitlab/postgresql/

2.修改配置文件

注：postgresql.conf 文件内此部分一定要与主库的配置保持一致，否则可能会在主从切换恢复时产生错误

cat postgresql.conf

wal_level = hot_standby         # minimal, replica, or logical
    max_wal_senders = 2     # max number of walsender processes
    hot_standby = on            # "on" allows queries during recovery
    max_connections = 300           # (change requires restart)
    archive_mode = on
    restore_command = ''

cat pg_hba.conf

host    all             all             127.0.0.1/32            trust
    host    all             all             ::1/128                     trust
    host    replication     gitlab_replicator    192.168.10.2/32    trust

cat recovery.conf

restore_command = ''
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.10.2 port=5432 user=gitlab_replicator'

3.启动备库，在主库执行sql，并在备库验证

主从切换

主备库的判断是根据当前是否存在recovery.conf文件

在将备库提升为主库时，会自动重命名recovery.conf文件为recovery.done。同时要将主库降为备库，降备方式为重命名recovery.done文件

mv recover.done recovery.conf

这样在处理完主库故障后，才会将提升到主库的更新数据同步过来

这里提供个简单的思路及脚本，前提是假设主备之间不存在网络故障,且不存在同时为主或备的情况

判断主库的状态

1.为 shut down 时

判断备库是否为 in archive recovery 并执行将主库降为备库，将备库升为主库，其余状态发送报警

2.为 in production 时

判断备库是否为 in archive recovery ，其余状态发送报警

3.为 in archive recovery 时

判断备库是否为 in production ，其余状态发送报警

4.为 shut down in recovery 时

发送报警

shell script

#!/bin/bash
    PRIMARY_IP="192.168.10.2"
    STANDBY_IP="192.168.10.3"
    PGDATA="/DATA/postgresql/data"
    SYS_USER="root"
    PG_USER="postgresql"
    PGPREFIX="/opt/pgsql"

    pg_status()
    {
            ssh ${SYS_USER}@$1 /
            "su - ${PG_USER} -c '${PGPREFIX}/bin/pg_controldata -D ${PGDATA} /
            | grep cluster' | awk -F : '{print \$2}' | sed 's/^[ \t]*\|[ \t]*$//'"
    }

    # recover to primary
    recovery_primary()
    {
            ssh ${SYS_USER}@$1 /
            "su - ${PG_USER} -c '${PGPREFIX}/bin/pg_ctl promote -D ${PGDATA}'"
    }

    # primary to recovery
    primary_recovery()
    {
            ssh ${SYS_USER}@$1 /
            "su - ${PG_USER} -c 'cd ${PGDATA} && mv recovery.done recovery.conf'"
    }

    send_mail()
    {
            echo "send SNS"
    }

    case "`pg_status ${PRIMARY_IP}`" in
            "shut down")
                    case "`pg_status ${STANDBY_IP}`" in
                            "in archive recovery")
                                    primary_recovery ${PRIMARY_IP}
                                    recovery_primary ${STANDBY_IP}
                                    ;;
                            "shut down in recovery"|"in production")
                                    send_mail
                                    ;;
                    esac
                    ;;
            "in production")
                    case "`pg_status ${STANDBY_IP}`" in
                            "shut down in recovery"|"shut down"|"in production")
                                    send_mail
                                    ;;
                    esac
                    echo "primary"
                    ;;
            "in archive recovery")
                    case "`pg_status ${STANDBY_IP}`" in
                            "shut down")
                                    primary_recovery ${STANDBY_IP}
                                    recovery_primary ${PRIMARY_IP}
                                    ;;
                            "shut down in recovery"|"in archive recovery")
                                    send_mail
                                    ;;
                    esac
                    echo "recovery"
                    ;;
            "shut down in recovery")
                    case "`pg_status ${STANDBY_IP}`" in
                            "shut down in recovery"|"shut down"|"in archive recovery")
                                    send_mail
                                    ;;
                    esac
                    echo "recovery down"
                    ;;
    esac

报错处理

error 1

FATAL:  no pg_hba.conf entry for replication connection from host "192.168.1.2", user "standby", SSL off

需要将用户加入到 192.168.1.2 的 pg_hba.conf 文件内，并配置好认证方式及口令

error 2

FATAL:  database system identifier differs between the primary and standby
DETAIL:  The primary's identifier is 6589099331306617531, the standby's identifier is 6605061381709180314

这是因为在将备库提升为主库后，将原先的主库恢复为主库时没有完全将缺少的数据同步过来导致的

error 3

FATAL:  number of requested standby connections exceeds max_wal_senders (currently 0)

FATAL:  hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 200)

FATAL:  hot standby is not possible because max_locks_per_transaction = 64 is a lower setting than on the master server (its value was 128)

这是因为备库的数量超过主库配置的允许备库最大连接数量了

这里配置的为0

此问提出现在将备库升为主库后，将原主库降为备库同步数据时，因此需要注意这部分的配置主备要一致

后记

postgresql 主主同步需要使用三方中间件实现，有需要的可查询相关资料

本文参考资料为postgresql 官方文档

以上就是本文的全部内容，希望本文的内容对大家的学习或者工作能带来一定的帮助，也希望大家多多支持码农网

查看所有标签

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

实战移动互联网营销

江礼坤 / 机械工业出版社 / 2016-1 / 79.00

移动互联网的兴起，又为企业带来了新的挑战与机遇！越来越多的人，看到了移动互联网的价值与前景，但是在具体操作时，移动互联网具体如何玩？企业如何向移动互联网转型？如何通过移动互联网做营销？等等一系列问题，接踵而至。虽然目前相关的资料和文章很多，但是都过于零散，让人看完后，还是无从下手。而本书旨在成为移动互联网营销领域标准的工具书、参考书，为大家呈现一个系统、全面的移动互联网营销体系。让大家从思维模式到......一起来看看《实战移动互联网营销》这本书的介绍吧!

码农工具

JS 压缩/解压工具

在线压缩/解压 JS 代码