# grep -Ei "err|war" /var/log/mariadb/mariadb.log
WSREP_SST: [ERROR] Parent mysqld process (PID:379774) terminated unexpectedly. (20221009 11:01:54.254)
2022-10-09 11:01:59 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2022-10-09 11:02:00 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-10-09 11:02:00 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ab886bd7-46d6-11ed-8a83-fe4004c311ab): 1 (Operation not permitted)
2022-10-09 11:02:01 0 [Warning] WSREP: 0.0 (node-1): State transfer to 1.0 (node-2) failed: -255 (Unknown error 255)
2022-10-09 11:02:01 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():780: Will never receive state. Need to abort.
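I searched Google and Baidu exhaustively and found nothing that solved it.
a. Deleted galera.cache, grastate.dat and gvwstate.dat (no effect). I even removed all Galera-related configuration and files and recreated or reinstalled everything; still no luck.
b. Changed TimeoutSec in mariadb.service (no effect).
c. Changed the order of the addresses in wsrep_cluster_address (no effect). This one never looked promising; it was a last-ditch attempt.
d. Checked the firewall, SELinux and so on (no effect).
I also tried a few more outlandish suggestions, none of which helped.
Then, by chance, I noticed an rsync error in /var/log/messages: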
rsyncd[380389]: rsyncd version 3.1.3 starting, listening on port 4444
rsyncd[380409]: connect from node1 (192.168.0.1)
rsyncd[380409]: rsync to rsync_sst/ from node1 (192.168.0.1)
rsyncd[380409]: rsync: on remote machine: --sparse-block=1024: unknown option
rsyncd[380409]: rsync error: requested action not supported (code 4) at clientserver.c(971) [Receiver=3.1.3]
rsyncd[380389]: sent 0 bytes received 0 bytes total size 0
rsyncd[380605]: rsyncd version 3.1.3 starting, listening on port 4444
Put this together with the rsync-related entries in mariadb.log:
2022-10-09 3:27:39 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2022-10-09 3:27:39 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.0.2' --datadir '/var/lib/mysql/' --parent '3693645' --mysqld-args --basedir=/usr'
2022-10-09 3:27:40 2 [Note] WSREP: Prepared SST request: rsync|192.168.0.2:4444/rsync_sst
2022-10-09 3:27:40 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2022-10-09 3:27:40 2 [Note] WSREP: Assign initial position for certification: 237433, protocol version: 4
2022-10-09 3:27:40 0 [Note] WSREP: Service thread queue flushed.
2022-10-09 3:27:40 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (ab886bd7-46d6-11ed-8a83-fe4004c311ab): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():467. IST will be unavailable.
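The lines that matter in the two excerpts above are easy to miss among routine messages. A minimal sketch of a filter that keeps only the rsync/SST-related lines follows; sst_rsync_lines is a hypothetical helper name, and the patterns are taken only from the log lines shown in this post, so other failure modes may need additional patterns.

```shell
#!/bin/sh
# Sketch: keep only log lines that point at an rsync-based SST problem.
# sst_rsync_lines is a hypothetical helper; it reads log text on stdin.
# Patterns are derived from the excerpts in this post and are not exhaustive.
sst_rsync_lines() {
    # - rsyncd "rsync:" / "rsync error:" diagnostics from the system log
    # - the wsrep_sst_rsync invocation and WSREP_SST messages from mariadb.log
    grep -E 'rsyncd\[[0-9]+\]: rsync( error)?:|wsrep_sst_rsync|WSREP_SST'
}

# Usage (log paths as used in this post; adjust for your system):
#   sst_rsync_lines < /var/log/messages
#   sst_rsync_lines < /var/log/mariadb/mariadb.log
```

Run against the system log excerpt above, this would keep the "unknown option" and "rsync error" lines and drop the connection and byte-count lines.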
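I suspected rsync itself: the installed version was probably too old to recognize the --sparse-block=1024 option, so the state transfer failed and mariadb could not start.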
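The "unknown option" message suggests that the sending side passed an option the receiving rsync did not know, which would be consistent with the nodes running different rsync builds. That is an inference, not something the logs state outright. A minimal sketch for spotting such a mismatch is below; report_rsync_mismatch is a hypothetical helper name, and the "node package" input format is an assumption (for example, collected by running rpm -q rsync on each node).

```shell
#!/bin/sh
# Sketch: flag nodes whose rsync package differs from the first node listed.
# report_rsync_mismatch is a hypothetical helper, not part of Galera or rsync.
# Input on stdin: one "node package" pair per line (format is an assumption),
# e.g. gathered by running `rpm -q rsync` on every cluster node.
report_rsync_mismatch() {
    awk '
        BEGIN     { bad = 0 }
        NR == 1   { ref = $2 }        # the first node serves as the reference
        $2 != ref { print $1 " has " $2 " (expected " ref ")"; bad = 1 }
        END       { exit bad }        # non-zero exit if any node differs
    '
}
```

The function prints one line per node that differs and returns a non-zero status, so it can be used directly in an if statement.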
So I went ahead and updated rsync:
# yum update rsync
Then I started mariadb again:
# systemctl restart mariadb
To my great relief, it started successfully.
Old version: rsync-3.1.3-14.el8.2.x86_64
New version: rsync-3.1.3-19.el8.x86_64
# rpm -qa |grep rsync
rsync-3.1.3-14.el8.2.x86_64
# rsync --help |grep sparse
-S, --sparse turn sequences of nulls into sparse blocks
# rpm -qa |grep rsync
rsync-3.1.3-19.el8.x86_64
# rsync --help |grep sparse
-S, --sparse turn sequences of nulls into sparse blocks
--sparse-block=SIZE set block size used to handle sparse files
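The before/after comparison above can be turned into a quick check to run on each node. This is a minimal sketch, assuming the option is listed in the rsync --help output as it is in the listings above; supports_option is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: test whether an rsync help listing advertises a given long option.
# supports_option is a hypothetical helper, not part of rsync or Galera.
supports_option() {
    # $1 = option name, e.g. --sparse-block; help text is read from stdin.
    # The option must match as a whole word, so asking for --sparse does not
    # accidentally match --sparse-block.
    grep -Eq -e "(^|[[:space:],])$1([=[:space:]]|\$)"
}

# Usage on a node:
#   if rsync --help 2>&1 | supports_option --sparse-block; then
#       echo "rsync supports --sparse-block"
#   else
#       echo "rsync lacks --sparse-block; consider updating the rsync package"
#   fi
```

With the old package shown above the check would fail, and with the updated package it would succeed.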