MySQL + DRBD + Heartbeat High-Availability Configuration Guide
Date: 2022-03-14 02:32
Environment:
OS version: Red Hat Enterprise Linux Server release 5.5 (Tikanga) x86_64 2.6.18-164.el5
MySQL version: mysql-5.1.49.tar.gz
DRBD version: drbd83-8.3.15-2.el5.centos.rpm
Heartbeat version: heartbeat.x86_64 0:2.1.3-3.el5.centos.rpm
Hostname | eth0 | eth1 | Role |
ln-master | 10.10.206.193 | 192.168.1.10 | Primary |
ln-slave | 10.10.206.194 | 192.168.1.11 | Standby |
VIP: 10.10.206.211
Preparation (on both nodes):
Set the hostname:
[root@ln-master ~]# cat /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=ln-master
Update the hosts file (on both nodes)
[root@ln-master ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.10.206.193 ln-master
192.168.1.10 ln-master
10.10.206.194 ln-slave
192.168.1.11 ln-slave
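Make sure each node can ping the other by hostname, for example: ping ln-slave
Trim the services started at boot; only the four basic ones need to stay enabled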
[root@ln-master ~]#
[root@ln-master ~]# for A in `chkconfig --list |grep 3:on|awk '{print $1}'`;do chkconfig $A off;done
[root@ln-master ~]# for B in sshd crond syslog network;do chkconfig $B on;done
[root@ln-master ~]# chkconfig --list |grep 3:on
crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
sshd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
syslog 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Reboot:
reboot
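Because the first loop above switches off every service enabled at runlevel 3, it can be worth previewing the effect first. The following is a minimal sketch, not part of the original procedure: the helper name `services_to_disable` is hypothetical, and it only reads `chkconfig --list` style text on standard input without changing anything.

```shell
#!/bin/sh
# List services that are on at runlevel 3 and NOT in the keep list.
# Input: `chkconfig --list` style output on stdin. Nothing is modified.
services_to_disable() {
    keep=" $* "
    awk '/3:on/ {print $1}' | while read -r svc; do
        case "$keep" in
            *" $svc "*) ;;           # in the keep list: stays enabled
            *) echo "$svc" ;;        # candidate for `chkconfig <svc> off`
        esac
    done
}

# Preview only (run as root on the node):
#   chkconfig --list | services_to_disable sshd crond syslog network
```

Each printed name is a service that the loop would turn off and that the second loop would not turn back on.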
Install and configure DRBD:
Note: configure the YUM repository beforehand
7. Test
On master:
[root@ln-master /]# cd /data/
[root@ln-master data]# ll
total 16
drwx------ 2 root root 16384 Aug 6 10:39 lost+found
[root@ln-master data]# touch test #create a test file
[root@ln-master data]# ll
total 16
drwx------ 2 root root 16384 Aug 6 10:39 lost+found
-rw-r--r-- 1 root root 0 Aug 6 12:35 test
[root@ln-master data]# cd ..
[root@ln-master /]# umount /data/
[root@ln-master /]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
45G 14G 30G 32% /
/dev/sda1 99M 24M 70M 26% /boot
tmpfs 3.9G 0 3.9G 0% /dev/shm
[root@ln-master /]# drbdadm secondary r0 #demote this node to Secondary
[root@ln-master /]# drbd-overview #verify
0:r0 Connected Secondary/Secondary UpToDate/UpToDate C r-----
On slave:
[root@ln-slave /]# drbd-overview
0:r0 Connected Secondary/Secondary UpToDate/UpToDate C r-----
[root@ln-slave /]# drbdadm primary r0 #promote this node to Primary
[root@ln-slave /]# drbd-overview
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r-----
[root@ln-slave /]# mount /dev/drbd0 /data/
[root@ln-slave /]# cd /data/
[root@ln-slave data]# ll
total 16
drwx------ 2 root root 16384 Aug 6 2014 lost+found
-rw-r--r-- 1 root root 0 Aug 6 2014 test
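If the test file is visible on the slave as shown above, the data has been replicated.
Note: switch the master back to Primary before continuing with the steps below.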
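When switching roles back and forth, it helps to confirm the state from a script rather than by eye. The sketch below assumes the `drbd-overview` line layout shown in the transcripts above (resource id, connection state, roles, disk states); the function names `drbd_local_role` and `drbd_disk_state` are hypothetical helpers, not DRBD commands.

```shell
#!/bin/sh
# Print the local role (Primary or Secondary) of a resource,
# given `drbd-overview` output on stdin.
drbd_local_role() {
    awk -v res="$1" '{
        split($1, id, ":")             # "0:r0" -> id[2] = "r0"
        if (id[2] == res) {
            split($3, role, "/")       # "Primary/Secondary" -> local/peer
            print role[1]
        }
    }'
}

# Print the local/peer disk states, e.g. "UpToDate/UpToDate".
drbd_disk_state() {
    awk -v res="$1" '{ split($1, id, ":"); if (id[2] == res) print $4 }'
}

# Example (on either node):
#   drbd-overview | drbd_local_role r0
```

To restore the master, mirror the test in reverse: on the slave unmount /data and run `drbdadm secondary r0`, then on the master run `drbdadm primary r0` and mount /dev/drbd0 on /data.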
Install and configure MySQL (on both nodes)
Install MySQL:
# create the mysql user
useradd mysql -s /sbin/nologin -M
id mysql
# unpack and build the source
tar zxf mysql-5.1.49.tar.gz
cd mysql-5.1.49
# configure
./configure --prefix=/usr/local/mysql-5.1.49 --with-unix-socket-path=/usr/local/mysql-5.1.49/tmp/mysql.sock --localstatedir=/data --enable-assembler --with-charset=utf8 --with-collation=utf8_general_ci --enable-thread-safe-client --with-mysqld-user=mysql --with-big-tables --without-debug --with-pthread --with-extra-charsets=complex --with-readline --with-ssl --with-embedded-server --enable-local-infile --with-plugins=partition,innobase --with-mysqld-ldflags=-all-static --with-client-ldflags=-all-static
# compile and install
make && make install
# copy the init script and the config file
/bin/cp support-files/mysql.server /etc/init.d/mysqld
/bin/cp support-files/my-small.cnf /etc/my.cnf
chmod 700 /etc/init.d/mysqld
# create a symlink
ln -s /usr/local/mysql-5.1.49/ /usr/local/mysql
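# initialize the database on the primary
/application/tools/mysql-5.1.49/scripts/mysql_install_db --basedir=/usr/local/mysql --datadir=/data/mysql/data --user=mysql # not needed on the standby, because the data directory lives on the replicated DRBD device
MySQL program path: /usr/local/mysql
MySQL data directory: /data/mysql/data
my.cnf (identical on both nodes). Tune it for your hardware; the settings below are for reference only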
[root@ln-master mysql]# cat /etc/my.cnf
[client]
port = 3306
[mysqld]
basedir=/usr/local/mysql
datadir=/data/mysql/data
socket=/tmp/mysql.sock
sync_binlog = 0
binlog_format = ROW
skip-locking
skip-name-resolve
skip-host-cache
default-character-set=utf8
default-collation=utf8_general_ci
skip-character-set-client-handshake
max_allowed_packet = 16M
table_cache = 128
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 2M
default-storage-engine=INNODB
log-bin=mysql-bin
max_connections=5000
max_connect_errors=100000
log_slow_queries=slow.log
long_query_time=2
log_queries_not_using_indexes=0
#skip-federated
server-id= 10
table_lock_wait_timeout=180
innodb_lock_wait_timeout=180
innodb_data_file_path = ibdata1:1000M:autoextend
#innodb_buffer_pool_size = 1G
innodb_additional_mem_pool_size = 8M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 2
innodb_thread_concurrency=0
transaction-isolation=READ-COMMITTED
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
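Failover only carries the data across if the MySQL data directory sits below the DRBD mount point (/data in this setup). The following is a small sketch for checking that; the helper names `mycnf_datadir` and `datadir_on_mount` are hypothetical, and the parser assumes a plain `datadir=...` line like the one in the listing above.

```shell
#!/bin/sh
# Print the datadir configured in a my.cnf-style file.
mycnf_datadir() {
    awk -F= '/^[ \t]*datadir[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1"
}

# Succeed only if the configured datadir is below the given mount point.
datadir_on_mount() {
    case "$(mycnf_datadir "$1")" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example:
#   datadir_on_mount /etc/my.cnf /data && echo "datadir is on the DRBD mount"
```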
Install and configure Heartbeat
1. Add host routes
master:
route add -host 192.168.1.11 dev eth1
slave:
route add -host 192.168.1.10 dev eth1
Add each command to /etc/rc.local on its node so the route survives a reboot.
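Appending to /etc/rc.local by hand tends to produce duplicate lines when the procedure is repeated. A minimal sketch of an idempotent append follows; the helper name `add_line_once` is hypothetical.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
add_line_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Example on master (run as root):
#   add_line_once 'route add -host 192.168.1.11 dev eth1' /etc/rc.local
# Example on slave (run as root):
#   add_line_once 'route add -host 192.168.1.10 dev eth1' /etc/rc.local
```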
2. Install Heartbeat
yum install -y heartbeat
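3. Configure Heartbeat
Heartbeat uses three configuration files: ha.cf, haresources, and authkeys (in this setup all three must be identical on the primary and the standby)
ha.cf, the main configuration file: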
[root@ln-master ha.d]# cat ha.cf
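logfile /var/log/ha-log #log file name and location
keepalive 2 #send a heartbeat every 2 seconds
deadtime 15 #dead time of 15 seconds: if the standby sees no heartbeat from the primary for 15 seconds, it declares the peer failed
warntime 10 #log a "late heartbeat" warning after 10 seconds without a heartbeat
initdead 30 #after the daemon starts, wait 30 seconds before starting service resources
udpport 694 #UDP port used for ucast or bcast communication, default 694
bcast eth1 #interface used for broadcast heartbeats; ucast would require the peer's IP address
auto_failback off #when the failed primary comes back after a switch to the standby, do not fail back, because each switch is costly for services such as NFS and MySQL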
#watchdog /dev/watchdog
node ln-master
node ln-slave
ping 10.10.206.1
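The timers in ha.cf depend on each other: a warning should fire before a node is declared dead, and the Heartbeat sample configuration suggests making initdead at least twice deadtime. The sketch below checks those two rules of thumb. The function names are hypothetical, and it assumes the values are plain seconds (no `ms` suffix) as in the file above.

```shell
#!/bin/sh
# Read the value of one directive (e.g. deadtime) from an ha.cf-style file.
ha_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Check that warntime < deadtime and initdead >= 2 * deadtime.
check_ha_timers() {
    cf=$1
    warn=$(ha_value warntime "$cf")
    dead=$(ha_value deadtime "$cf")
    init=$(ha_value initdead "$cf")
    [ "$warn" -lt "$dead" ] || { echo "warntime should be below deadtime"; return 1; }
    [ "$init" -ge $((dead * 2)) ] || { echo "initdead should be at least 2x deadtime"; return 1; }
    echo "timers look consistent"
}

# Example:
#   check_ha_timers /etc/ha.d/ha.cf
```

With the values used here (warntime 10, deadtime 15, initdead 30) both checks pass.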
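authkeys, the authentication file:
[root@ln-master ha.d]# cat authkeys
auth 1 #use crc, which needs no key and is therefore the cheapest; the other two methods (md5 and sha1) offer progressively stronger security at progressively higher cost
1 crc
[root@ln-master ha.d]# ll authkeys #permissions must be 600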
-rw------- 1 root root 31 08-12 17:49 authkeys
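crc only guards against corruption, so it suits a dedicated, trusted heartbeat link. If the link is shared, sha1 with a secret key is the stronger option. The following is a sketch of generating such a file; the function name `write_authkeys` is hypothetical, and the random key source (/dev/urandom via `od`) is one choice among several.

```shell
#!/bin/sh
# Write an authkeys file that uses sha1 with a random shared key.
write_authkeys() {
    file=$1
    # 32 random bytes rendered as 64 hex characters
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    (
        umask 077                      # never create the file world-readable
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$file"
    )
    chmod 600 "$file"                  # Heartbeat requires mode 600
}

# Example (run as root on one node):
#   write_authkeys /etc/ha.d/authkeys
```

Generate the file on one node only and copy it to the other, since both nodes must hold the same key.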
haresources, the resource file:
[root@ln-master ha.d]# cat haresources
ln-master IPaddr::10.10.206.211 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 mysqld
Explanation:
IPaddr::10.10.206.211 #the VIP
drbddisk::r0 #manages DRBD resource r0; equivalent to running /etc/ha.d/resource.d/drbddisk r0 start/stop
Filesystem::/dev/drbd0::/data::ext3 #mounts the DRBD device on /data; equivalent to running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start/stop, which amounts to running mount /dev/drbd0 /data on the system
mysqld #the MySQL init script; equivalent to /etc/init.d/mysqld start/stop
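Heartbeat starts the resources on a haresources line from left to right when a node takes over, and stops them from right to left when it releases them. The sketch below only prints the manual equivalent of each direction for the line above, which can help when debugging a single step by hand. The function name `ha_manual_steps` is hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Print the manual equivalent of a takeover ("start") or a release ("stop")
# for the haresources line used in this setup. Commands are echoed only.
ha_manual_steps() {
    action=$1
    rd=/etc/ha.d/resource.d
    s1="$rd/IPaddr 10.10.206.211 $action"
    s2="$rd/drbddisk r0 $action"
    s3="$rd/Filesystem /dev/drbd0 /data ext3 $action"
    s4="/etc/init.d/mysqld $action"
    if [ "$action" = start ]; then
        printf '%s\n' "$s1" "$s2" "$s3" "$s4"    # takeover: left to right
    else
        printf '%s\n' "$s4" "$s3" "$s2" "$s1"    # release: right to left
    fi
}

# Example:
#   ha_manual_steps start
#   ha_manual_steps stop
```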
4. Start Heartbeat (on both nodes)
[root@ln-master ~]# /etc/init.d/heartbeat start
Starting High-Availability services:
2014/08/13_10:25:01 INFO: Resource is stopped
[  OK  ]
[root@ln-master ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:50:56:97:39:7c brd ff:ff:ff:ff:ff:ff
inet 10.10.206.193/24 brd 10.10.206.255 scope global eth0
inet 10.10.206.211/24 brd 10.10.206.255 scope global secondary eth0:0 #the VIP is up as well
inet6 fe80::250:56ff:fe97:397c/64 scope link
valid_lft forever preferred_lft forever
Test (after Heartbeat has started)
master:
DRBD has become Primary automatically, and MySQL is running.
[root@ln-master ~]# ip add|grep "10.10"
inet 10.10.206.193/24 brd 10.10.206.255 scope global eth0
inet 10.10.206.211/24 brd 10.10.206.255 scope global secondary eth0:0
[root@ln-master ~]# drbd-overview
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r----- /data ext3 30G 1.4G 27G 5%
[root@ln-master ~]# lsof -i :3306
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
mysqld 13275 mysql 22u IPv4 28732 TCP *:mysql (LISTEN)
slave:
[root@ln-slave ha.d]# ip add |grep "10.10"
inet 10.10.206.194/24 brd 10.10.206.255 scope global eth0 #no VIP
[root@ln-slave ha.d]# drbd-overview
0:r0 Connected Secondary/Primary UpToDate/UpToDate C r----- #DRBD is Secondary
[root@ln-slave ha.d]# lsof -i :3306 #MySQL is not running either
[root@ln-slave ha.d]#
Simulate a failure by stopping the Heartbeat service on the master
master:
[root@ln-master ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services:
[  OK  ]
[root@ln-master ~]#
[root@ln-master ~]# ip add|grep "10.10" #the VIP is gone
inet 10.10.206.193/24 brd 10.10.206.255 scope global eth0
[root@ln-master ~]# drbd-overview #DRBD switched to Secondary automatically
0:r0 Connected Secondary/Primary UpToDate/UpToDate C r-----
[root@ln-master ~]# lsof -i :3306 #the database has stopped too
[root@ln-master ~]#
slave:
[root@ln-slave ha.d]# ip add |grep "10.10" #the slave has taken over automatically
inet 10.10.206.194/24 brd 10.10.206.255 scope global eth0
inet 10.10.206.211/24 brd 10.10.206.255 scope global secondary eth0:0
[root@ln-slave ha.d]# drbd-overview
0:r0 Connected Primary/Secondary UpToDate/UpToDate C r----- /data ext3 30G 1.4G 27G 5%
[root@ln-slave ha.d]# lsof -i :3306
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
mysqld 17556 mysql 22u IPv4 278001 TCP *:mysql (LISTEN)
When the master recovers, it rejoins as the standby. It takes over again only if the peer fails.
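The manual checks in this test boil down to one question per node: does it currently hold the VIP? The following is a small sketch for scripting that check; the function name `has_vip` is hypothetical, and it reads `ip addr` output on standard input.

```shell
#!/bin/sh
# Succeed if the given address appears as an inet address in `ip addr`
# output read from stdin. -F treats the dots in the address literally.
has_vip() {
    grep -qF "inet $1/"
}

# Example (on either node):
#   ip addr | has_vip 10.10.206.211 && echo "this node holds the VIP"
```

Run on both nodes after a failover, exactly one of them should report the VIP.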
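Split-brain in high-availability setups: causes and countermeasures
(1) Causes of split-brain
1. The heartbeat link between the HA servers fails, so the nodes cannot check each other's heartbeat
2. A firewall enabled on the HA servers blocks heartbeat traffic
3. Incorrect NIC address or related settings on the HA servers cause heartbeats to fail
4. Other misconfiguration, such as mismatched heartbeat methods, heartbeat broadcast conflicts, or software bugs
(2) Ways to prevent split-brain
1. Add redundant heartbeat links
2. When split-brain is detected, forcibly take one side out of service by fencing (remotely powering off the primary node through a power-control circuit)
3. Monitor for split-brain and raise an alert when it occurs
4. After an alert, have the standby wait a fairly long time before taking over, so that operators have enough time to intervene manually
5. Enable disk locking: the node currently serving locks the disk, so that during split-brain the other side cannot seize the "shared disk resource"
Problems with disk locking:
Disk locking can deadlock. If the node holding the shared disk does not actively release the lock, the other node can never obtain the disk. If the serving node suddenly hangs or crashes, it cannot run the unlock command, so the standby cannot take over resources and services. Some HA designs therefore use a "smart lock": the serving node takes the disk lock only when it finds that all heartbeat links are down, and leaves the disk unlocked otherwise.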
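For the monitoring countermeasure, one simple signal in this setup is the DRBD state reported by each node: two nodes that both claim Primary, or a connection state other than Connected, deserve an alert. The sketch below compares one `drbd-overview` line from each node. The function name `check_split_brain` is hypothetical, and collecting the two lines (for example over ssh from a monitoring host) is left out as an assumption.

```shell
#!/bin/sh
# Args: the drbd-overview line from node A and the one from node B.
# Prints a verdict; returns 1 when both nodes claim Primary or when
# either node is not in the Connected state.
check_split_brain() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { split($3, role, "/") }                 # local/peer roles
        $2 != "Connected"    { disconnected++ }
        role[1] == "Primary" { primaries++ }
        END {
            if (primaries > 1) { print "ALERT: both nodes are Primary"; exit 1 }
            if (disconnected)  { print "WARN: replication link not Connected"; exit 1 }
            print "OK"
        }'
}

# Example, with line_a and line_b collected from the two nodes:
#   check_split_brain "$line_a" "$line_b" || echo "send an alert here"
```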