Install acme.sh

$ curl https://get.acme.sh | sh
$ source ~/.bashrc

The installer adds a cron job automatically.

Create an API token

acme.sh supports name.com. Go to https://www.name.com/account/settings/api, give the token any name (for example acme_sh_dns), and generate it.
Edit ~/.acme.sh/account.conf and add the following two lines:

export Namecom_Username="your_name_com_username"
export Namecom_Token="*********"

Namecom_Username is your name.com account username, not the token name.
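
A quick way to confirm that both lines made it into the file is a small check like the one below. This is only a sketch: check_namecom_conf is a hypothetical helper name, and it tests only that the two variables are assigned, not that the credentials are valid.

```shell
# Hypothetical helper: succeeds only if both name.com variables
# are assigned in the given config file (with or without "export").
check_namecom_conf() {
    conf="$1"
    grep -Eq '^(export[[:space:]]+)?Namecom_Username=' "$conf" &&
    grep -Eq '^(export[[:space:]]+)?Namecom_Token=' "$conf"
}

# Usage (path as given in the text above):
#   check_namecom_conf ~/.acme.sh/account.conf && echo "name.com credentials present"
```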

Request a certificate

$ acme.sh --issue --dns dns_namecom -d g.openwares.net

You may see a message like this:

g.openwares.net:Verify error:DNS problem: SERVFAIL looking up CAA for openwares.net - the domain's nameservers may be malfunctioning

It can be ignored.

Install the certificate

$ acme.sh --install-cert -d g.openwares.net \
--cert-file /path/to/certfile/cert.pem \
--key-file /path/to/keyfile/key.pem \
--fullchain-file /path/to/fullchain/certfile/fullchain.pem

Automatic renewal
acme.sh installs a crontab entry and records the commands used to issue and install the certificate, so the certificate is renewed automatically on schedule.
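
To verify that the renewal job is really there, the crontab can be inspected. The sketch below assumes the installed entry invokes acme.sh with the --cron option, which is how the installer normally writes it; has_acme_cron is a hypothetical helper name.

```shell
# Hypothetical helper: reads a crontab listing on stdin and succeeds
# if it contains an entry that runs acme.sh with --cron.
has_acme_cron() {
    grep -Eq 'acme\.sh.*--cron'
}

# Usage:
#   crontab -l | has_acme_cron && echo "renewal job installed"
```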

After deleting a few menu items with alacarte, the GNOME Main Menu editor would no longer open. Running alacarte from a terminal printed this error:

$ alacarte
/usr/share/alacarte/Alacarte/MainWindow.py:22: PyGIWarning: GMenu was imported without specifying a version first. Use gi.require_version('GMenu', '3.0') before import to ensure that the right version gets loaded.
  from gi.repository import Gtk, GdkPixbuf, Gdk, GMenu

(alacarte:1718): Gtk-CRITICAL **: 16:54:24.699: gtk_accel_label_set_accel_closure: assertion 'gtk_accel_group_from_accel_closure (accel_closure) != NULL' failed

(alacarte:1718): Gtk-CRITICAL **: 16:54:24.699: gtk_accel_label_set_accel_closure: assertion 'gtk_accel_group_from_accel_closure (accel_closure) != NULL' failed
Traceback (most recent call last):
  File "/usr/bin/alacarte", line 26, in <module>
    main()
  File "/usr/share/alacarte/Alacarte/MainWindow.py", line 464, in main
    app.setMenuBasename(basename)
  File "/usr/share/alacarte/Alacarte/MainWindow.py", line 62, in setMenuBasename
    self.editor = MenuEditor(menu_basename)
  File "/usr/share/alacarte/Alacarte/MenuEditor.py", line 36, in __init__
    self.load()
  File "/usr/share/alacarte/Alacarte/MenuEditor.py", line 49, in load
    if not self.tree.load_sync():
gi.repository.GLib.Error: g-markup-error-quark: Error on line 1 char 1: Document was empty or contained only whitespace (1)

It turned out that an empty file, gnome-applications.menu, had appeared in ~/.config/menus. Deleting it fixed the problem.
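
If the same error shows up again, empty menu files can be located before deleting anything. A minimal sketch, assuming the offending file is zero bytes long (a file containing only whitespace would not be caught); find_empty_menus is a hypothetical helper name.

```shell
# Hypothetical helper: lists zero-length *.menu files directly
# inside the given directory, without descending into subdirectories.
find_empty_menus() {
    find "$1" -maxdepth 1 -type f -name '*.menu' -size 0
}

# Usage:
#   find_empty_menus ~/.config/menus
# Review the output, then remove the listed files by hand.
```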

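I. Determining the primary and standby roles
There are several ways to tell them apart:
1. Look at the WAL processes

$ ps aux | grep wal

If a process named "postgres: 11/main: walwriter" is present, this is the primary. walwriter flushes WAL to disk and does not run on a standby that is in recovery; the process that streams WAL to a standby is walsender.
If a process named "postgres: 11/main: walreceiver streaming 2A/ACAA1088" is present, this is the standby. walreceiver is the receiving end of WAL streaming.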
2. The pg_is_in_recovery function

$ sudo -u postgres psql
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

f means this is the primary; t means it is a standby.
3. Look at the database control information

$ sudo -u postgres /usr/lib/postgresql/11/bin/pg_controldata -D /var/lib/postgresql/11/main/
pg_control version number: 1100
Catalog version number: 201809051
Database system identifier: 6745356148899875633
Database cluster state: in production
pg_control last modified: Thu 09 Jul 2020 02:40:49 PM CST
Latest checkpoint location: 825/245660D8
Latest checkpoint's REDO location: 825/245660A0
...

If the Database cluster state line reads in production, this is the primary; if it reads in archive recovery, this is a standby.
4. Check for recovery.conf
Normally only a standby has a recovery.conf file (PostgreSQL 11 and earlier); on a primary it is usually absent or has been renamed to recovery.done.
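
The pg_controldata check above is easy to script. The following is a sketch only: role_from_controldata is a hypothetical helper name, and it recognizes just the two states mentioned in the text, reporting anything else (for example a cleanly shut down cluster) as unknown.

```shell
# Hypothetical helper: reads pg_controldata output on stdin and
# prints "primary", "standby", or "unknown".
role_from_controldata() {
    # Extract the value of the "Database cluster state" line.
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "$state" in
        "in production")       echo primary ;;
        "in archive recovery") echo standby ;;
        *)                     echo unknown ;;
    esac
}

# Usage (paths as in the text above):
#   sudo -u postgres /usr/lib/postgresql/11/bin/pg_controldata \
#       -D /var/lib/postgresql/11/main/ | role_from_controldata
```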

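II. Switching primary and standby
1. Switching with a trigger file

a. Before the standby starts, add the path of a trigger file to recovery.conf (if the setting is newly added, the standby must be restarted)

trigger_file='/var/lib/postgresql/11/main/.postgresql.trigger'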

b. Stop the primary:

$ sudo systemctl stop postgresql@11-main.service

Alternatively, first look at the PostgreSQL cluster information

$ pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
11  main    5432 online postgres /var/lib/postgresql/11/main /var/log/postgresql/postgresql-11-main.log

and then run:

$ sudo pg_ctlcluster 11 main stop

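c. Create the trigger file on the standby

$ sudo touch /var/lib/postgresql/11/main/.postgresql.trigger

The recovery.conf file on the standby is now renamed to recovery.done. The standby has been promoted to primary and accepts reads and writes directly.
Create a replication slot on the new primary.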

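d. Rebuild the old primary as the new standby
Prepare a recovery.conf file whose primary_conninfo points to the new primary and which uses a suitable replication slot, then restart the database.

recovery_target_timeline='latest'
standby_mode = 'on'
primary_conninfo = 'host=59.206.31.149 port=5432 user=xxxx password=xxxx'
primary_slot_name = 'repl_slot_2'

recovery_target_timeline='latest' is required here because the timeline changes during the switchover.
Also check the replication permissions in pg_hba.conf.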
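
Steps c and d can be sketched as follows. This is an illustration under assumptions: make_recovery_conf is a hypothetical helper, the host 192.0.2.10 and the user replicator are placeholders, and the password is left out of primary_conninfo on the assumption that it is supplied another way (for example a ~/.pgpass file for the postgres user).

```shell
# On the new primary, the slot named in the text could be created with:
#   sudo -u postgres psql -c "select pg_create_physical_replication_slot('repl_slot_2');"

# Hypothetical helper: prints a recovery.conf for the old primary.
# Arguments: host and port of the new primary, replication user, slot name.
make_recovery_conf() {
    host="$1"; port="$2"; user="$3"; slot="$4"
    cat <<EOF
recovery_target_timeline = 'latest'
standby_mode = 'on'
primary_conninfo = 'host=$host port=$port user=$user'
primary_slot_name = '$slot'
EOF
}

# Usage on the old primary (placeholder values):
#   make_recovery_conf 192.0.2.10 5432 replicator repl_slot_2 |
#       sudo -u postgres tee /var/lib/postgresql/11/main/recovery.conf
```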

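2. Switching with the pg_ctlcluster command
a. Stop the application and shut down the primary
b. Promote the standby to primary
On the standby, run:

$ sudo pg_ctlcluster 11 main promote

c. Configure recovery.conf on the old primary and start it as the new standby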


NFS normally uses dynamic ports, which makes firewall configuration awkward.
It can be set to use a few fixed ports instead.
Edit the following configuration files:
/etc/default/nfs-common:

STATDOPTS="--port 3000 --outgoing-port 3001"

/etc/default/nfs-kernel-server:

RPCMOUNTDOPTS="--manage-gids --port 3002"

Add a new configuration file:
/etc/sysctl.d/nfs-static-ports.conf:

fs.nfs.nlm_tcpport = 3003
fs.nfs.nlm_udpport = 3003

Then:

$ sudo sysctl -p /etc/sysctl.d/nfs-static-ports.conf
$ sudo systemctl restart nfs-utils.service
$ sudo systemctl restart nfs-kernel-server.service

Finally, open TCP and UDP ports 111, 2049, and 3000-3003 in the firewall.
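
With iptables, opening those ports might look like the sketch below. The choice of iptables, the INPUT chain, and the absence of any source-address restriction are all assumptions, and nfs_firewall_rules is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: prints one iptables command per protocol
# for the fixed NFS ports configured above (111, 2049, 3000-3003).
nfs_firewall_rules() {
    for proto in tcp udp; do
        echo "iptables -A INPUT -p $proto -m multiport --dports 111,2049,3000:3003 -j ACCEPT"
    done
}

# Usage: review the output first, then apply it as root:
#   nfs_firewall_rules
#   nfs_firewall_rules | sudo sh
```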

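If you get the error:

mount.nfs: access denied by server while mounting ...

check that the client network configured in /etc/exports is correct. If clients connect through a NAT firewall, their source port will be 1024 or higher, so the export needs the insecure option, for example (insecure,rw).

After editing /etc/exports, run

$ sudo exportfs -a

to re-export the file systems.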
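
The reason for the insecure option is that exports use the secure option by default, which only accepts requests from source ports below 1024. A tiny sketch of that rule, with needs_insecure as a hypothetical helper name and the export path and network in the comment as placeholders:

```shell
# Hypothetical helper: succeeds if a client source port would be
# rejected by an export that uses the default "secure" option.
needs_insecure() {
    [ "$1" -ge 1024 ]
}

# Usage:
#   needs_insecure 40000 && echo "add insecure to the export options"
# A placeholder /etc/exports line with the option set:
#   /srv/nfs 192.168.1.0/24(insecure,rw)
```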


1. danted
Install

$ sudo apt install dante-server

Configure /etc/danted.conf

logoutput: syslog /var/log/sockd.log stdout
internal: br0 port = 1080
external: 10.100.0.32
clientmethod: none
socksmethod: none
user.privileged: proxy
user.unprivileged: nobody
user.libwrap: nobody
client pass {
    from: 0.0.0.0/0 port 1-65535 to: 0.0.0.0/0
}
socks pass {
    from: 0.0.0.0/8 to: 0.0.0.0/0
    command: bind connect udpassociate
    log: error
}


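sshfs mounts a remote file system locally and securely over SSH, using the SFTP protocol. Its advantage over NFS is that no firewall changes are needed: being able to reach the remote host over ssh is enough.

sshfs uses FUSE to mount the remote file system in user space. On Debian, just install the sshfs package. To make mounting convenient, set up key-based authentication to the remote ssh host, particularly because fstab entries cannot use ssh password authentication.

Mount a remote file system:

$ sshfs user@host:/mnt/data/reis_dump/ /mnt/hwy06_reisdb_bak/ -o reconnect

An ssh host alias can be used directly as well:

$ sshfs hwy-reisdb-3:/mnt/data/reis_dump/ /mnt/hwy06_reisdb_bak/ -o reconnect

See the man page for the full list of sshfs options.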
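
Since fstab is mentioned above, here is a sketch of what an entry could look like. The helper name sshfs_fstab_line, the key path, and the chosen options (_netdev, reconnect, IdentityFile) are assumptions for illustration; adjust them to the actual setup.

```shell
# Hypothetical helper: prints an /etc/fstab line for an sshfs mount.
# Arguments: remote spec, local mount point, path to the private key.
sshfs_fstab_line() {
    remote="$1"; mountpoint="$2"; key="$3"
    printf '%s %s fuse.sshfs _netdev,reconnect,IdentityFile=%s 0 0\n' \
        "$remote" "$mountpoint" "$key"
}

# Usage (the key path is a placeholder):
#   sshfs_fstab_line user@host:/mnt/data/reis_dump/ /mnt/hwy06_reisdb_bak/ /home/user/.ssh/id_ed25519
# A manually mounted sshfs file system can be unmounted with:
#   fusermount -u /mnt/hwy06_reisdb_bak/
```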

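Note:
A long-lived ssh connection can time out, producing an error like this:

client_loop: send disconnect: Broken pipe

To avoid it, enable client keepalive probes in /etc/ssh/sshd_config on the ssh server:

ClientAliveInterval 30
ClientAliveCountMax 3

A probe is sent every 30 seconds, and the connection is dropped after 3 consecutive probes go unanswered.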
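
Put together, the two settings mean an unresponsive client is dropped after roughly interval times count seconds. A minimal sketch of that arithmetic, with client_alive_timeout as a hypothetical helper name:

```shell
# Hypothetical helper: approximate number of seconds before sshd drops
# an unresponsive client, given ClientAliveInterval and ClientAliveCountMax.
client_alive_timeout() {
    echo $(( $1 * $2 ))
}

# Usage with the values above:
#   client_alive_timeout 30 3
# A similar keepalive can be requested from the client side, for example by
# adding "-o ServerAliveInterval=30" to the sshfs command line.
```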


Client connections failed with ORA-19815, and alert.log contained the following:
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_arc0_2734.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 2147483648 bytes is 100.00% used, and has 0 remaining bytes available.
Wed Jun 24 09:02:21 2020


You have following choices to free up space from flash recovery area:

  1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
    then consider changing RMAN ARCHIVELOG DELETION POLICY.
  2. Back up files to tertiary device such as tape using RMAN
    BACKUP RECOVERY AREA command.
  3. Add disk space and increase db_recovery_file_dest_size parameter to
    reflect the new space.
  4. Delete unnecessary files using RMAN DELETE command. If an operating
    system command was used to delete files, then use RMAN CROSSCHECK and
    DELETE EXPIRED commands.

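The disk itself still had plenty of free space. The cause is that the default archive destination is USE_DB_RECOVERY_FILE_DEST and the maximum size of the flash recovery area is set to 2 GB (db_recovery_file_dest_size = 2147483648). There are several ways to fix this: set an RMAN retention policy so that obsolete log files are deleted automatically, or delete log files with RMAN DELETE. If the archive files are deleted directly with operating system commands, the space is not actually released until you also run:

$ rman target /
RMAN> crosscheck archivelog all;
RMAN> delete expired archivelog all;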
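
The same two commands can be run non-interactively, for example from a maintenance script. This is a sketch under assumptions: rman_cleanup_script is a hypothetical helper, and NOPROMPT is added so that DELETE does not wait for confirmation.

```shell
# Hypothetical helper: prints the RMAN commands that reconcile the
# repository with archive logs already removed at the OS level.
rman_cleanup_script() {
    cat <<'EOF'
crosscheck archivelog all;
delete noprompt expired archivelog all;
exit;
EOF
}

# Usage, as the oracle user on the database host:
#   rman_cleanup_script | rman target /
```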