Redis Sentinel

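1. Redis Clustering Overview

Like MySQL master-slave replication, a Redis master-slave architecture cannot switch the master and slave roles automatically: when the master fails, no slave is promoted to become the new master. In other words, replication alone provides no automatic failover. To fail over, you have to edit the configuration by hand to promote a slave to master. In addition, only the single master accepts writes, so under heavy load it becomes the performance bottleneck of the Redis service.

Master-slave replication therefore leaves two problems to solve:

  • Switching the master and slave roles automatically, without affecting the business
  • Raising the overall performance of the Redis service to support higher concurrency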

2. How Sentinel Works

Sentinel was introduced in Redis 2.6 and became stable in Redis 2.8. For production use, run Redis 2.8 or later.

2.1 Sentinel Architecture and Failover

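A dedicated Sentinel process monitors the state of the master in a Redis replication group. When the master fails, Sentinel switches the master and slave roles automatically, which makes the system highly available.
Sentinel is a distributed system: a sentinel process runs on each of several nodes at the same time. The Sentinel processes use a gossip protocol to exchange information about whether the master is down, and an agreement (voting) protocol to decide whether to perform an automatic failover and which slave to promote as the new master.
Each Sentinel process periodically sends messages to the other Sentinels, the master, and the slaves to check that they are alive. If a node does not respond within the configured time, the Sentinel considers it offline. This is a subjective down, abbreviated SDOWN.
If enough Sentinels in the group (at least the configured quorum) consider the master SDOWN, which they confirm by querying each other with the SENTINEL is-master-down-by-addr command, the master is considered objectively down, abbreviated ODOWN.
The Sentinels then vote to elect a leader, which picks a suitable slave, promotes it to be the new master, and reconfigures the remaining slaves to replicate from the new master. This completes the failover.
A Redis Sentinel deployment should have at least 3 Sentinel nodes, preferably an odd number.
At initialization, clients connect to the set of Sentinel nodes rather than to a specific Redis node, ask for the current master address, and then talk to Redis directly. Sentinel is a configuration provider, not a proxy.
The Redis data nodes in a Sentinel deployment are no different from ordinary Redis instances; read/write splitting still has to be implemented in the client program.
Sentinel plays a role similar to MHA for MySQL: it only solves automatic failover between master and slave. The performance bottleneck of a single master remains.
Before Redis 3.0, production environments mostly relied on Sentinel. Redis 3.0 introduced Redis Cluster, which supports larger, higher-concurrency workloads.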
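
The difference between SDOWN, ODOWN, and the vote that authorizes a failover is easy to mix up, so here is a minimal Python sketch of the three checks. It is a simplified model, not Sentinel's actual code: the `SentinelView` class and the function names are invented for this illustration, and the timings are made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """One Sentinel's local opinion about the master (hypothetical model)."""
    name: str
    ms_since_last_valid_reply: int  # time since the master last answered


def is_sdown(view, down_after_ms):
    # Subjectively down: this Sentinel alone saw no valid reply for too long.
    return view.ms_since_last_valid_reply > down_after_ms


def is_odown(views, down_after_ms, quorum):
    # Objectively down: at least `quorum` Sentinels report SDOWN.
    return sum(is_sdown(v, down_after_ms) for v in views) >= quorum


def can_authorize_failover(reachable_sentinels, total_sentinels):
    # Starting the failover also needs votes from a majority of ALL Sentinels.
    return reachable_sentinels >= total_sentinels // 2 + 1


# Sample values (assumed): two of three Sentinels have lost the master.
views = [
    SentinelView("s1", 31_000),
    SentinelView("s2", 45_000),
    SentinelView("s3", 2_000),
]
print(is_odown(views, down_after_ms=30_000, quorum=2))
print(can_authorize_failover(reachable_sentinels=2, total_sentinels=3))
```

The two thresholds are separate on purpose: the quorum only decides when the master is flagged ODOWN, while the failover itself still needs a majority of all Sentinels to agree on a leader.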

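2.2 The Three Periodic Tasks in Sentinel

  • Every 10 seconds, each Sentinel runs INFO against the master and the slaves
    • to discover slave nodes
    • to confirm the master-slave relationships
  • Every 2 seconds, each Sentinel exchanges information through a pub/sub channel on the master
    • The exchange uses the __sentinel__:hello channel
    • Each Sentinel shares its "view" of the nodes and its own information
  • Every 1 second, each Sentinel sends PING to the other Sentinels and to the Redis nodes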
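
To get a feel for how the three periods interleave, the following Python sketch lists which tasks would fire at each second. It assumes an idealized clock where every task starts at second 0; the real Sentinel does not run on such a neat schedule, and `TASK_PERIODS_S` and `tasks_due` are names made up for this example.

```python
# Periods taken from the list above, in seconds.
TASK_PERIODS_S = {
    "INFO to master and slaves": 10,
    "publish on __sentinel__:hello": 2,
    "PING to all known instances": 1,
}


def tasks_due(second):
    """Return the tasks whose period divides the given second (simplified clock)."""
    return [name for name, period in TASK_PERIODS_S.items() if second % period == 0]


# Walk through the first ten seconds and show what runs when.
for t in range(1, 11):
    print(t, tasks_due(t))
```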

3. Building a Sentinel Deployment

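3.1 Sentinel Requires Master-Slave Replication First

Sentinel assumes that Redis master-slave replication is already working.

Note: masterauth must be set on the master as well as on the slaves, with the same value, because the old master rejoins as a slave after a failover.

# Prepare the master-slave environment

# Run on all master and slave nodes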
[root@redis01 ~]# vim /apps/redis/etc/redis.conf
bind 0.0.0.0
masterauth "123456"
requirepass "123456"

# Run on all slave nodes
[root@redis02 ~]# echo "replicaof 192.168.1.41 6379" >> /apps/redis/etc/redis.conf
[root@redis02 ~]# systemctl restart redis
# Status on the master
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.42,port=6379,state=online,offset=0,lag=0
slave1:ip=192.168.1.43,port=6379,state=online,offset=0,lag=0
master_failover_state:no-failover
master_replid:991ca2ded657c0e742b8cfe5dec5734e452bbf9f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:0
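
If you want to check this output from a script instead of by eye, a small parser is enough. The Python sketch below turns INFO text into a dictionary and pulls out the slaveN entries. The sample text is copied from the output above; `parse_info` and `list_replicas` are helper names invented for this example, not part of any Redis library.

```python
def parse_info(text):
    """Parse INFO output (key:value lines, '#' section headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def list_replicas(fields):
    """Extract the slaveN entries as dicts such as {'ip': ..., 'port': ...}."""
    replicas = []
    for key, value in fields.items():
        # Match slave0, slave1, ... but not keys like slave_repl_offset.
        if key.startswith("slave") and key[5:].isdigit():
            replicas.append(dict(item.split("=", 1) for item in value.split(",")))
    return replicas


sample = """\
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.42,port=6379,state=online,offset=0,lag=0
slave1:ip=192.168.1.43,port=6379,state=online,offset=0,lag=0
"""
info = parse_info(sample)
replicas = list_replicas(info)
print(info["role"], [r["ip"] for r in replicas])
```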

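3.2 Editing the Sentinel Configuration

Sentinel is really a special kind of Redis server: it supports some Redis commands, but many are not available. By default it listens on port 26379/tcp.
Sentinel can be deployed on hosts separate from the Redis servers, but to save cost the two are usually deployed together.

dir "/tmp"  # working directory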

sentinel monitor mymaster 10.0.0.8 6379 2
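# mymaster is the name of the monitored group; this line gives the address and port of its current master
# 2 is the quorum: the number of Sentinels that must agree the master is down before it is marked ODOWN and a failover can be attempted. It is normally set to more than half of the Sentinels (the total is usually an odd number >= 3, such as 3, 5, or 7). For example, with 3 Sentinels, 3/2 = 1.5, rounded up to 2

sentinel auth-pass mymaster 123456
# Password of the master in the mymaster group; this line must come after the sentinel monitor line above

sentinel down-after-milliseconds mymaster 30000
# Time without a valid reply after which any node in the mymaster group is marked subjectively down (SDOWN), in milliseconds; 3000 is a suggested value

sentinel parallel-syncs mymaster 1
# Number of slaves that may resynchronize with the new master at the same time after a failover. A smaller number makes the total synchronization take longer but reduces the load on the new master

sentinel failover-timeout mymaster 180000
# Timeout for pointing all slaves at the new master, in milliseconds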

sentinel deny-scripts-reconfig yes
# Disallow changing the notification and client-reconfig scripts at runtime with SENTINEL SET
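
The quorum arithmetic and the ordering rule for auth-pass can be captured in a few lines. The Python sketch below computes a majority quorum from the number of Sentinels and renders the core directives in a valid order. The helper names `majority_quorum` and `render_sentinel_conf` are invented for this example, the defaults mirror the sample values above, and the output is only a fragment, not a complete sentinel.conf.

```python
def majority_quorum(sentinel_count):
    """Smallest integer strictly greater than half of the Sentinels."""
    return sentinel_count // 2 + 1


def render_sentinel_conf(name, host, port, sentinel_count, password=None,
                         down_after_ms=30000):
    """Render the core Sentinel directives for one monitored master."""
    quorum = majority_quorum(sentinel_count)
    lines = [f"sentinel monitor {name} {host} {port} {quorum}"]
    if password is not None:
        # auth-pass refers to the master name, so it follows "sentinel monitor".
        lines.append(f"sentinel auth-pass {name} {password}")
    lines.append(f"sentinel down-after-milliseconds {name} {down_after_ms}")
    lines.append(f"sentinel parallel-syncs {name} 1")
    lines.append(f"sentinel failover-timeout {name} 180000")
    return "\n".join(lines) + "\n"


conf = render_sentinel_conf("mymaster", "10.0.0.8", 6379,
                            sentinel_count=3, password="123456")
print(conf)
```

Setting the quorum to the majority is a common choice rather than a requirement; a lower value only makes the ODOWN flag easier to reach, because the failover itself still needs a majority vote.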

3.3 Starting the Sentinel Service

[root@redis01 ~]# vim /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 192.168.1.41 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes

# Use the same sentinel.conf on the other two nodes
[root@redis02 ~]# vim /apps/redis/etc/sentinel.conf

[root@redis03 ~]# vim /apps/redis/etc/sentinel.conf
# Start Sentinel
[root@redis01 ~]# redis-sentinel /apps/redis/etc/sentinel.conf
# Or start it through a service file

3.4 Verifying the Sentinel Service

[root@redis01 ~]# ss -ntl
State       Recv-Q Send-Q                    Local Address:Port                                   Peer Address:Port
LISTEN      0      511                                   *:26379                                             *:*
LISTEN      0      511                                   *:6379                                              *:*
LISTEN      0      128                                   *:111                                               *:*
LISTEN      0      128                                   *:22                                                *:*
LISTEN      0      100                           127.0.0.1:25                                                *:*
LISTEN      0      511                               [::1]:6379                                           [::]:*
LISTEN      0      128                                [::]:111                                            [::]:*
LISTEN      0      128                                [::]:22                                             [::]:*
LISTEN      0      100                               [::1]:25                                             [::]:*
[root@redis01 ~]# tail -f /apps/redis/data/sentinel_26379.log
1602:X 27 Nov 2022 19:52:43.664 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1602:X 27 Nov 2022 19:52:43.664 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1602, just started
1602:X 27 Nov 2022 19:52:43.664 # Configuration loaded
1602:X 27 Nov 2022 19:52:43.664 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1602:X 27 Nov 2022 19:52:43.664 * monotonic clock: POSIX clock_gettime
1602:X 27 Nov 2022 19:52:43.665 * Running mode=sentinel, port=26379.
1602:X 27 Nov 2022 19:52:43.665 # Sentinel ID is 2493f625d0e25f70bdf55056cdb985974b41ff67
1602:X 27 Nov 2022 19:52:43.665 # +monitor master mymaster 192.168.1.41 6379 quorum 2
1602:X 27 Nov 2022 19:53:48.991 * +sentinel sentinel 1ed8aba7d18f3c61c2e234e418883d62c56806bf 192.168.1.42 26379 @ mymaster 192.168.1.41 6379
1602:X 27 Nov 2022 19:53:51.529 * +sentinel sentinel b4f9aeb446e6825124385e8ff7dff1d5267762b1 192.168.1.43 26379 @ mymaster 192.168.1.41 6379
[root@redis02 ~]# tail -15f /apps/redis/data/sentinel_26379.log
1590:X 27 Nov 2022 19:53:46.965 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1590:X 27 Nov 2022 19:53:46.965 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1590, just started
1590:X 27 Nov 2022 19:53:46.965 # Configuration loaded
1590:X 27 Nov 2022 19:53:46.965 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1590:X 27 Nov 2022 19:53:46.965 * monotonic clock: POSIX clock_gettime
1590:X 27 Nov 2022 19:53:46.966 * Running mode=sentinel, port=26379.
1590:X 27 Nov 2022 19:53:46.967 # Sentinel ID is 1ed8aba7d18f3c61c2e234e418883d62c56806bf
1590:X 27 Nov 2022 19:53:46.967 # +monitor master mymaster 192.168.1.41 6379 quorum 2
1590:X 27 Nov 2022 19:53:46.968 * +slave slave 192.168.1.42:6379 192.168.1.42 6379 @ mymaster 192.168.1.41 6379
1590:X 27 Nov 2022 19:53:46.968 * +slave slave 192.168.1.43:6379 192.168.1.43 6379 @ mymaster 192.168.1.41 6379
1590:X 27 Nov 2022 19:53:47.008 * +sentinel sentinel 2493f625d0e25f70bdf55056cdb985974b41ff67 192.168.1.41 26379 @ mymaster 192.168.1.41 6379
1590:X 27 Nov 2022 19:53:51.529 * +sentinel sentinel b4f9aeb446e6825124385e8ff7dff1d5267762b1 192.168.1.43 26379 @ mymaster 192.168.1.41 6379
[root@redis03 ~]# tail -15f /apps/redis/data/sentinel_26379.log
1605:X 27 Nov 2022 19:53:49.453 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1605:X 27 Nov 2022 19:53:49.453 # Redis version=6.2.5, bits=64, commit=00000000, modified=0, pid=1605, just started
1605:X 27 Nov 2022 19:53:49.453 # Configuration loaded
1605:X 27 Nov 2022 19:53:49.453 * Increased maximum number of open files to 10032 (it was originally set to 1024).
1605:X 27 Nov 2022 19:53:49.453 * monotonic clock: POSIX clock_gettime
1605:X 27 Nov 2022 19:53:49.453 * Running mode=sentinel, port=26379.
1605:X 27 Nov 2022 19:53:49.455 # Sentinel ID is b4f9aeb446e6825124385e8ff7dff1d5267762b1
1605:X 27 Nov 2022 19:53:49.455 # +monitor master mymaster 192.168.1.41 6379 quorum 2
1605:X 27 Nov 2022 19:53:49.456 * +slave slave 192.168.1.42:6379 192.168.1.42 6379 @ mymaster 192.168.1.41 6379
1605:X 27 Nov 2022 19:53:49.456 * +slave slave 192.168.1.43:6379 192.168.1.43 6379 @ mymaster 192.168.1.41 6379
1605:X 27 Nov 2022 19:53:51.021 * +sentinel sentinel 1ed8aba7d18f3c61c2e234e418883d62c56806bf 192.168.1.42 26379 @ mymaster 192.168.1.41 6379
1605:X 27 Nov 2022 19:53:51.044 * +sentinel sentinel 2493f625d0e25f70bdf55056cdb985974b41ff67 192.168.1.41 26379 @ mymaster 192.168.1.41 6379
# In the Sentinel status, pay particular attention to the last line: the master IP, the number of slaves, and the number of Sentinels must all match the actual servers
[root@redis01 ~]# redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.1.41:6379,slaves=2,sentinels=3

# Two slaves and three Sentinels. If the sentinels value is wrong, the myid values may conflict; change the myid in the configuration file to fix it
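
The check on that last line is easy to automate. The Python sketch below parses a master0 line from INFO sentinel and compares the counts with what you expect. The sample line is copied from the output above; `parse_master_line` and `topology_matches` are names made up for this example.

```python
def parse_master_line(line):
    """Parse 'master0:name=...,status=...,address=...,slaves=N,sentinels=M'."""
    # Split on the first colon only; the address value contains a colon too.
    _, _, payload = line.partition(":")
    fields = dict(item.split("=", 1) for item in payload.split(","))
    fields["slaves"] = int(fields["slaves"])
    fields["sentinels"] = int(fields["sentinels"])
    return fields


def topology_matches(fields, expected_slaves, expected_sentinels):
    """True when the master is ok and both counts match the expected values."""
    return (fields["status"] == "ok"
            and fields["slaves"] == expected_slaves
            and fields["sentinels"] == expected_sentinels)


line = "master0:name=mymaster,status=ok,address=192.168.1.41:6379,slaves=2,sentinels=3"
master = parse_master_line(line)
print(master["address"], topology_matches(master, expected_slaves=2, expected_sentinels=3))
```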

3.5 Stopping the Master to Trigger a Failover

[root@redis01 ~]# ps -ef | grep redis-server
redis       962      1  0 19:42 ?        00:00:02 /apps/redis/bin/redis-server 0.0.0.0:6379
root      30191   2843  0 20:22 pts/2    00:00:00 grep --color=auto redis-server
[root@redis01 ~]# kill -9 962
# Check the Sentinel information on each node
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.1.42:6379,slaves=2,sentinels=3
# Sentinel log messages during the failover
[root@redis01 ~]# tail -f /apps/redis/data/sentinel_26379.log
29942:X 27 Nov 2022 20:22:57.017 # +sdown master mymaster 192.168.1.41 6379
29942:X 27 Nov 2022 20:22:57.058 # +new-epoch 1
29942:X 27 Nov 2022 20:22:57.058 # +vote-for-leader 1ed8aba7d18f3c61c2e234e418883d62c56806bf 1
29942:X 27 Nov 2022 20:22:57.095 # +odown master mymaster 192.168.1.41 6379 #quorum 3/2
29942:X 27 Nov 2022 20:22:57.095 # Next failover delay: I will not start a failover before Sun Nov 27 20:28:57 2022
29942:X 27 Nov 2022 20:22:58.243 # +config-update-from sentinel 1ed8aba7d18f3c61c2e234e418883d62c56806bf 192.168.1.42 26379 @ mymaster 192.168.1.41 6379
29942:X 27 Nov 2022 20:22:58.243 # +switch-master mymaster 192.168.1.41 6379 192.168.1.42 6379
29942:X 27 Nov 2022 20:22:58.243 * +slave slave 192.168.1.43:6379 192.168.1.43 6379 @ mymaster 192.168.1.42 6379
29942:X 27 Nov 2022 20:22:58.243 * +slave slave 192.168.1.41:6379 192.168.1.41 6379 @ mymaster 192.168.1.42 6379
29942:X 27 Nov 2022 20:23:13.294 # +sdown slave 192.168.1.41:6379 192.168.1.41 6379 @ mymaster 192.168.1.42 6379
# Verify the failover
# After the failover, the master IP on the replicaof line in redis.conf is rewritten
[root@redis03 ~]# grep ^replicaof /apps/redis/etc/redis.conf
replicaof 192.168.1.42 6379
# The IP on the sentinel monitor line in sentinel.conf is rewritten as well
[root@redis01 ~]# grep "^[a-zA-Z]" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 192.168.1.42 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel deny-scripts-reconfig yes
protected-mode no
user default on nopass sanitize-payload ~* &* +@all
sentinel myid 2493f625d0e25f70bdf55056cdb985974b41ff67
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel current-epoch 1
sentinel known-replica mymaster 192.168.1.43 6379
sentinel known-replica mymaster 192.168.1.41 6379
sentinel known-sentinel mymaster 192.168.1.42 26379 1ed8aba7d18f3c61c2e234e418883d62c56806bf
sentinel known-sentinel mymaster 192.168.1.43 26379 b4f9aeb446e6825124385e8ff7dff1d5267762b1

[root@redis02 ~]# grep "^[a-zA-Z]" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 192.168.1.42 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel deny-scripts-reconfig yes
protected-mode no
user default on nopass sanitize-payload ~* &* +@all
sentinel myid 1ed8aba7d18f3c61c2e234e418883d62c56806bf
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel current-epoch 1
sentinel known-replica mymaster 192.168.1.43 6379
sentinel known-replica mymaster 192.168.1.41 6379
sentinel known-sentinel mymaster 192.168.1.43 26379 b4f9aeb446e6825124385e8ff7dff1d5267762b1
sentinel known-sentinel mymaster 192.168.1.41 26379 2493f625d0e25f70bdf55056cdb985974b41ff67

[root@redis03 ~]# grep "^[a-zA-Z]" /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 192.168.1.42 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel deny-scripts-reconfig yes
protected-mode no
user default on nopass ~* &* +@all
sentinel myid b4f9aeb446e6825124385e8ff7dff1d5267762b1
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel current-epoch 1
sentinel known-replica mymaster 192.168.1.41 6379
sentinel known-replica mymaster 192.168.1.43 6379
sentinel known-sentinel mymaster 192.168.1.41 26379 2493f625d0e25f70bdf55056cdb985974b41ff67
sentinel known-sentinel mymaster 192.168.1.42 26379 1ed8aba7d18f3c61c2e234e418883d62c56806bf
# Verify the state of each Redis node
[root@redis02 ~]# redis-cli -a 123456
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.43,port=6379,state=online,offset=360415,lag=1
master_failover_state:no-failover
master_replid:c685b1d06504b059e9403db6b56e796aa6cd61ec
master_replid2:a1e656ace9d72c53bcd9a09b4edaad920628d7e0
master_repl_offset:360415
second_repl_offset:269010
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:360415

[root@redis03 ~]# redis-cli -a 123456
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:192.168.1.42
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:372592
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:c685b1d06504b059e9403db6b56e796aa6cd61ec
master_replid2:a1e656ace9d72c53bcd9a09b4edaad920628d7e0
master_repl_offset:372592
second_repl_offset:269010
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:372578
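
The key line in the failover log earlier in this section is +switch-master, which records the old and new master addresses. The Python sketch below pulls those events out of Sentinel log text with a regular expression. The sample lines are copied from the log above; `SWITCH_RE` and `find_switches` are names invented for this example, and the pattern assumes the log format shown in this article.

```python
import re

# +switch-master <name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) (?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def find_switches(log_text):
    """Return (name, old_addr, new_addr) for each +switch-master event."""
    events = []
    for m in SWITCH_RE.finditer(log_text):
        old_addr = f"{m['old_ip']}:{m['old_port']}"
        new_addr = f"{m['new_ip']}:{m['new_port']}"
        events.append((m["name"], old_addr, new_addr))
    return events


log = """\
29942:X 27 Nov 2022 20:22:57.095 # +odown master mymaster 192.168.1.41 6379 #quorum 3/2
29942:X 27 Nov 2022 20:22:58.243 # +switch-master mymaster 192.168.1.41 6379 192.168.1.42 6379
29942:X 27 Nov 2022 20:22:58.243 * +slave slave 192.168.1.43:6379 192.168.1.43 6379 @ mymaster 192.168.1.42 6379
"""
print(find_switches(log))
```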

3.6 Rejoining the Old Master

[root@redis01 ~]# systemctl start redis.service
[root@redis01 ~]# grep ^replicaof /apps/redis/etc/redis.conf
# Sentinel automatically rewrites the following line to point at the new master
replicaof 192.168.1.42 6379
# Check the status on the old master
[root@redis01 ~]# redis-cli -a 123456
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:192.168.1.42
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:401028
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:c685b1d06504b059e9403db6b56e796aa6cd61ec
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:401028
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:383505
repl_backlog_histlen:17524
# Check the status and the log on the new master
[root@redis02 ~]# redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.43,port=6379,state=online,offset=414595,lag=1
slave1:ip=192.168.1.41,port=6379,state=online,offset=414873,lag=0
master_failover_state:no-failover
master_replid:c685b1d06504b059e9403db6b56e796aa6cd61ec
master_replid2:a1e656ace9d72c53bcd9a09b4edaad920628d7e0
master_repl_offset:415012
second_repl_offset:269010
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:415012

[root@redis02 ~]# tail -3f /apps/redis/data/sentinel_26379.log
1678:X 27 Nov 2022 20:23:13.502 # +sdown slave 192.168.1.41:6379 192.168.1.41 6379 @ mymaster 192.168.1.42 6379
1678:X 27 Nov 2022 20:32:01.950 # -sdown slave 192.168.1.41:6379 192.168.1.41 6379 @ mymaster 192.168.1.42 6379
1678:X 27 Nov 2022 20:32:11.940 * +convert-to-slave slave 192.168.1.41:6379 192.168.1.41 6379 @ mymaster 192.168.1.42 6379
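
Because Sentinel is a configuration provider rather than a proxy, a client has to ask it where the master currently is, and ask again after a failover like the one above. The Python sketch below shows roughly what that request looks like on the wire, using only the standard library. SENTINEL get-master-addr-by-name is a real Sentinel command; the helper names `encode_command`, `decode_addr_reply`, and `ask_sentinel` are made up for this illustration, the decoder handles only the simple reply shapes expected here, and a real application would normally rely on a maintained client library instead.

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(out)


def decode_addr_reply(reply):
    """Decode a two-element reply [ip, port]; return None for a null reply."""
    parts = reply.split(b"\r\n")
    if parts[0] in (b"*-1", b"$-1"):
        return None
    if parts[0] != b"*2" or len(parts) < 5:
        raise ValueError(f"unexpected reply: {reply!r}")
    # parts: *2, $len, ip, $len, port, (trailing empty piece)
    return parts[2].decode(), int(parts[4])


def ask_sentinel(host, port, master_name, timeout=1.0):
    """Ask one Sentinel for the current master address (needs a live Sentinel)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        return decode_addr_reply(conn.recv(4096))


# Show the bytes that would be sent; no network connection is made here.
print(encode_command("SENTINEL", "get-master-addr-by-name", "mymaster"))
```

Against the lab in this article, a call such as ask_sentinel("192.168.1.41", 26379, "mymaster") would be the natural way to try it; a client should fall back to the next Sentinel in its list if one does not answer.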

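4. Sentinel Operations

# Manually take the master offline by forcing a failover
127.0.0.1:26379> sentinel failover <masterName>

# Manual failover: first adjust the priority of the node you want promoted
[root@redis01 ~]# vim /apps/redis/etc/redis.conf
replica-priority 10
# Sets the promotion priority: Sentinel prefers the replica with the lowest value when choosing a new master. The default is 100; 0 means the replica is never promoted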

[root@redis01 ~]# redis-cli -a 123456
127.0.0.1:6379> CONFIG GET replica-priority
1) "replica-priority"
2) "100"
127.0.0.1:6379> CONFIG SET replica-priority 99
OK
127.0.0.1:6379> CONFIG GET replica-priority
1) "replica-priority"
2) "99"

[root@redis01 ~]# redis-cli   -p 26379
127.0.0.1:26379> sentinel failover mymaster
OK
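
To see why lowering replica-priority steers the failover, it helps to write down the ordering Sentinel applies to candidate replicas: lower priority first (0 is excluded), then the larger replication offset, then the lexicographically smaller run ID. The Python sketch below models only that ordering and leaves out the health checks Sentinel also applies. The `Replica` class, `pick_promotion_candidate`, the offsets, and the run IDs are all invented sample values.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    addr: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # how much of the master's stream was processed
    run_id: str        # hypothetical short IDs, real ones are 40 hex chars


def pick_promotion_candidate(replicas):
    """Pick the replica that sorts first, or None if none is eligible."""
    eligible = [r for r in replicas if r.priority != 0]
    if not eligible:
        return None
    # Lower priority first, then larger offset, then smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("192.168.1.41:6379", priority=10, repl_offset=400_000, run_id="c1"),
    Replica("192.168.1.43:6379", priority=100, repl_offset=415_000, run_id="a7"),
]
print(pick_promotion_candidate(replicas).addr)
```

In this sample the node with priority 10 wins even though the other replica has a larger offset, which is exactly the effect the configuration change above relies on.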
