Solution to a bug: ZooKeeper cluster denial of service


January 6, 2022

Preface

ZooKeeper is the registry for our Dubbo services, which makes it critical infrastructure: any disturbance in the online ZK cluster sets our nerves on edge. Recently I ran into a case where, after the ZK leader machine went down, the remaining nodes failed to elect a new leader and the whole ZK cluster denied service. I am writing the case up here to share it (based on ZooKeeper 3.4.5).

The scene of the bug

One morning I got a call out of the blue: a ZooKeeper physical machine had gone down, and on every remaining machine

sh zkServer.sh status
it is probably not running

I checked the monitoring: the machine that went down hosted the ZK leader (node 3). With the leader down, the other two nodes never managed to elect a new leader, and even after we urgently pulled the downed machine back up, the election still failed.
The result: the ZK cluster was denying service entirely!

Business impact

If Dubbo cannot connect to ZK, it keeps using its cached call metadata, so request invocation is not actually affected. The trouble is that while ZK is denying service, applications cannot be restarted or released; if an emergency forced a restart or release, the impact would be significant.
Fortunately, for high availability we had built a peer data center, so we calmly switched the traffic over to data center B.
Dual data centers pay off: one-click switchover!
After the switch we had plenty of time to recover the cluster in data center A. With the tension relieved, I started the analysis.

What the logs showed

First, I checked the logs. There were a lot of client connection errors, so I filtered those out first to avoid the noise.

cat zookeeper.out | grep -v 'client xxx' > /tmp/1.txt

The first thing that stood out was the following log:

ZK-A Machine log

ZK-A machine:
2021-06-16 03:32:35 ... New election. My id=3
2021-06-16 03:32:46 ... QuorumPeer] LEADING // note: the election succeeded here
2021-06-16 03:32:46 ... QuorumPeer] LEADING - LEADER ELECTION TOOK - 7878
2021-06-16 03:32:48 ... QuorumPeer] Reading snapshot /xxx/snapshot.xxx
2021-06-16 03:32:54 ... QuorumPeer] Snapshotting xxx to /xxx/snapshot.xxx
2021-06-16 03:33:08 ... Follower sid ZK-B.IP
2021-06-16 03:33:08 ... Unexpected exception causing shutdown while sock still open
java.io.EOFException
at java.io.DataInputStream.readInt
......
at quorum.LearnerHandler.run
2021-06-16 03:33:08 ******* GOODBYE ZK-B.IP *******
2021-06-16 03:33:27 Shutting down

This log suggests the election itself succeeded, but something then went wrong in the communication with the other machines, causing a shutdown followed by a new election.

ZK-B Machine log

2021-06-16 03:32:48 New election. My id=2
2021-06-16 03:32:48 QuorumPeer] FOLLOWING
2021-06-16 03:32:48 QuorumPeer] FOLLOWING - LEADER ELECTION TOOK - 222
2021-06-16 03:33:08.833 QuorumPeer] Exception when following the leader
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0
......
at org.apache.zookeeper.server.quorum.Follower.followLeader
2021-06-16 03:33:08.380 Shutting down

This log also shows that the election succeeded and the node entered the FOLLOWING state; the leader simply never came back to it, so the read timed out and the node shut down.

Sequence diagram

I drew the logs above as a sequence diagram to aid the analysis:
[sequence diagram]
From the ZK-B log we can see that after it became a follower it just kept waiting for the leader, until the read timed out.
From the ZK-A log we can see that after it entered LEADING it only got to the packet the follower (ZK-B) had sent at 33:08.803; by then ZK-B had already hit its read timeout at 33:08.301.

Analyzing the follower (ZK-B) first

We know it became a follower at 03:32:48 and then reported the read timeout at 03:33:08, exactly 20s later. So I went into the ZooKeeper source to find out how long this read timeout is configured to be.

Learner
protected void connectToLeader(InetSocketAddress addr) {
    ......
    sock = new Socket();
    // self.tickTime = 2000, self.initLimit = 10
    sock.setSoTimeout(self.tickTime * self.initLimit);
    ......
}

Its read timeout therefore comes from these zoo.cfg configuration items:

tickTime=2000    # self.tickTime
initLimit=10     # self.initLimit
syncLimit=5
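
To make the arithmetic explicit, here is a tiny illustrative sketch (my own example, not ZooKeeper code; it only mirrors the setSoTimeout call shown above):

ReadTimeoutSketch.java (illustration only)
import java.net.Socket;

public class ReadTimeoutSketch {
    public static void main(String[] args) throws Exception {
        int tickTime = 2000;   // ms, from zoo.cfg
        int initLimit = 10;    // ticks, from zoo.cfg
        Socket sock = new Socket();
        // mirrors sock.setSoTimeout(self.tickTime * self.initLimit) in Learner.connectToLeader
        sock.setSoTimeout(tickTime * initLimit);
        System.out.println("follower read timeout = " + sock.getSoTimeout() + " ms"); // 20000 ms = 20s
    }
}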

tickTime * initLimit = 2000 ms * 10 = 20s. Obviously, after ZK-B became a follower, the leader for some reason failed to respond to it within those 20s. So the next step is to analyze the leader.

Analyzing the leader (ZK-A)

First, let's look at the leader's initialization logic:

quorumPeer
 |-> print LEADING
 |-> makeLeader
      |-> new ServerSocket: listen and bind
 |-> leader.lead()
      |-> print LEADER ELECTION TOOK
      |-> loadData
           |-> loadDataBase
                |-> restore        -> prints "Reading snapshot"
           |-> takeSnapshot
                |-> save           -> prints "Snapshotting"
      |-> cnxAcceptor              -> starts accepting and processing requests

You can see that between ZK starting to listen on the port and actually processing requests, there are also a Reading snapshot step and a Snapshotting (write) step. From the log, one took a bit over 6s and the other a bit over 14s, which together make up the 20s processing gap. As shown below:
[timeline figure]
Because data is only processed some 20s after the socket started listening, the connection ZK-B had successfully established was still sitting in the kernel's TCP accept queue (backlog). From the kernel's point of view the three-way handshake had completed, so the follower data ZK-B sent could still be received. Only after those 20s, when ZK-A finally got around to processing it, did it pull ZK-B's 20-second-old data out of the buffer; when it tried to write the response back, it found that ZK-B had already dropped the connection.
Likewise, the other follower (by this time we had pulled the downed machine back up, so there were three nodes) died for the same reason. The leader, getting no response from any other machine, concluded that it did not hold votes from more than half of the ensemble, shut itself down, and triggered yet another election.
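
The kernel behavior described above is easy to demonstrate in isolation. Below is a minimal, self-contained sketch (plain JDK sockets, not ZooKeeper code; the port and timings are made up): the "leader" binds and listens but delays accept() for 20s, the "follower" connects instantly because the kernel finishes the handshake and parks the connection in the backlog, and it then hits its own read timeout long before the server ever looks at it.

BacklogDemo.java (illustration only)
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BacklogDemo {
    public static void main(String[] args) throws Exception {
        // "Leader": bind and listen, but stay busy for 20s before calling accept()
        ServerSocket server = new ServerSocket(12345, 50);
        new Thread(() -> {
            try {
                Thread.sleep(20_000);               // pretend to read/write a huge snapshot
                Socket peer = server.accept();      // the queued connection is only taken out now
                peer.getOutputStream().write(1);    // but the other side has long since given up
            } catch (Exception e) {
                System.out.println("server side: " + e);
            }
        }).start();

        // "Follower": connect() returns immediately because the kernel completes the
        // three-way handshake and queues the connection in the accept (backlog) queue
        Socket client = new Socket("127.0.0.1", 12345);
        client.setSoTimeout(5_000);                 // a deliberately short read timeout
        try {
            client.getInputStream().read();         // waits for a reply that never comes in time
        } catch (SocketTimeoutException e) {
            System.out.println("client side: read timed out, giving up"); // ZK-B's fate
            client.close();
        }
    }
}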

Why the snapshot takes so long

So what makes reading and writing the snapshot so time-consuming? I checked the snapshot file size: it was close to 1 GB.

Increasing initLimit

For this case, we only need to increase initLimit to get past this hurdle.

zoo.cfg
# don't touch tickTime; it is tied to ZK's heartbeat mechanism
tickTime=2000
# raise initLimit straight to 100, i.e. 100 * 2000 ms = 200s
initLimit=100

Is it really just a coincidence that it is 20s?

Is it plausible that every single election just happened to get stuck at exactly 20s? Elections happened repeatedly; at least one of them should have come in under 20s and succeeded, unless we were spectacularly unlucky. If the snapshot handling took, say, 30s every time, that would settle it; but 20s is right on the edge of the limit, so is there some other factor that keeps the leader election from succeeding?

The second case

So I combed through the logs again, and sure enough there was another pattern! This time the leader's snapshot was fast, but the follower still failed. The logs are as follows:

Leader (ZK-A), the second case

2021-06-16 03:38:03 New election. My id=3
2021-06-16 03:38:22 QuorumPeer] LEADING
2021-06-16 03:38:22 QuorumPeer] LEADING - LEADER ELECTION TOOK 25703
2021-06-16 03:38:22 QuorumPeer] Reading snapshot
2021-06-16 03:38:29 QuorumPeer] Snapshotting
2021-06-16 03:38:42 LearnerHandler] Follower sid 1
2021-06-16 03:38:42 LearnerHandler] Follower sid 3
2021-06-16 03:38:42 LearnerHandler] Sending DIFF
2021-06-16 03:38:42 LearnerHandler] Sending DIFF
2021-06-16 03:38:54 LearnerHandler] Have quorum of supporters
2021-06-16 03:38:55 client attempting to establish new session   // at this point it even started accepting client requests
......
2021-06-16 03:38:58 Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 1 followers,need 1
at org.apache.zookeeper.server.quorum.Leader.shutdown

From the log we can see that this election succeeded; after all, processing the snapshot only took about 13s (possibly faster because of the page cache). The other two followers connected smoothly and were sent DIFF packets, but shortly afterwards the leader shut down complaining it did not have enough followers, and the error message is quite confusing.
Let’s look at the code :

Leader.lead
void lead() {
    while (true) {
        Thread.sleep(self.tickTime / 2);
        ......
        syncedSet.add(self.getId());
        for (LearnerHandler f : getLearners()) {
            if (f.synced() && f.getLearnerType() == LearnerType.PARTICIPANT) {
                syncedSet.add(f.getSid());
            }
            f.ping();
        }
        // syncedSet only contains the leader itself, which does not satisfy the >1/2 rule,
        // so the leader reports the error and bails out
        if (!tickSkip && !self.getQuorumVerifier().containsQuorum(syncedSet)) {
            shutdown("Only " + syncedSet.size() + " followers, need " + (self.getVotingView().size() / 2));
            return;
        }
    }
}
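
To see the quorum check concretely, here is a small standalone sketch of the majority rule (my own illustration, not ZooKeeper's real QuorumVerifier implementation): with three voting members, a syncedSet containing only the leader itself fails the check, while the leader plus one synced follower passes.

QuorumSketch.java (illustration only)
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class QuorumSketch {
    static boolean containsQuorum(Set<Long> synced, int votingMembers) {
        return synced.size() > votingMembers / 2;   // strictly more than half
    }

    public static void main(String[] args) {
        int votingMembers = 3;                                              // our 3-node ensemble
        Set<Long> onlyLeader = new HashSet<>(Arrays.asList(3L));            // leader alone
        Set<Long> leaderPlusOne = new HashSet<>(Arrays.asList(3L, 2L));     // leader + one synced follower
        System.out.println(containsQuorum(onlyLeader, votingMembers));     // false -> "Only 1 followers, need 1", shutdown
        System.out.println(containsQuorum(leaderPlusOne, votingMembers));  // true  -> the leader keeps leading
    }
}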

The root cause of the error is that syncedSet, the set of nodes in sync with the leader, fell below the required half of the cluster, so the leader shut itself down. We can also see in the code that membership in syncedSet is determined by LearnerHandler.synced(). Let's keep reading the code:

LearnerHandler
public boolean synced() {
    // isAlive() here is the thread's isAlive
    return isAlive() && tickOfLastAck >= leader.self.tick - leader.self.syncLimit;
}

Obviously, a follower is considered out of sync once it has gone longer than leader.self.syncLimit ticks without acking, i.e. 5 * 2s = 10s.

zoo.cfg
tickTime = 2000
syncLimit = 5
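
Put concretely, here is a tiny sketch of that staleness check with our configured values plugged in (my own illustration; self.tick is advanced by the lead() loop roughly once per tickTime, so syncLimit ticks correspond to about 10s):

SyncedSketch.java (illustration only)
public class SyncedSketch {
    public static void main(String[] args) {
        int tickTime = 2000;       // ms, from zoo.cfg
        int syncLimit = 5;         // ticks, from zoo.cfg
        int leaderTick = 20;       // leader's current tick counter
        int tickOfLastAck = 14;    // follower last acked 6 ticks (~12s) ago

        boolean synced = tickOfLastAck >= leaderTick - syncLimit;          // 14 >= 15 -> false
        System.out.println("tolerance = " + (syncLimit * tickTime) + " ms"); // 10000 ms
        System.out.println("synced = " + synced);                           // false -> dropped from syncedSet
    }
}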

So how does this tick get updated? The answer: it is updated when the follower responds to the UPTODATE packet, i.e. only once the follower has finished syncing with the leader; from then on every packet from the follower updates it, but before that point it is not updated.
Reasoning one step further: if our follower takes more than 10s to process the leader's packets, the tick is not updated in time, syncedSet falls below quorum, and the leader shuts down.

Follower (ZK-B), the second case

With this conclusion in mind, I went back through the follower (ZK-B) log (note: ZK-C looked the same):

2021-06-16 03:38:24 New election. My id = 3
2021-06-16 03:38:24 FOLLOWING
2021-06-16 03:38:24 FOLLOWING - LEADER ELECTION TOOK - 8004
2021-06-16 03:38:42 Getting a diff from the leader
2021-06-16 03:38:42 Snapshotting
2021-06-16 03:38:57 Snapshotting
2021-06-16 03:39:12 Got zxid xxx
2021-06-16 03:39:12 Exception when following the leader
java.net.SocketException: Broken pipe

Snapshotting again! And this time we can see that each snapshot takes about 15s, far more than the syncLimit of 10s.
From the source we can see that right after each snapshot the follower immediately calls writePacket (i.e. sends its response), but the first packet it returns is not the handling of the UPTODATE packet, so it does not update the corresponding tick on the leader's side:

Learner
protected void syncWithLeader(...) {
    ......
    outerLoop:
    while (self.isRunning()) {
        readPacket(qp);
        switch (qp.getType()) {
        case Leader.UPTODATE:
            if (!snapshotTaken) {
                zk.takeSnapshot();
                ......
            }
            break outerLoop;
        case Leader.NEWLEADER:
            zk.takeSnapshot();
            ......
            writePacket(......);   // the leader updates its tick when it receives this
            break;
        }
    }
    ......
    writePacket(ack, true);        // the leader updates its tick when it receives this
}

Note that ZK-B's log shows Snapshotting twice. Why it happens twice should be a subtle bug (the comments in 3.4.5 claim it was fixed, yet it still snapshots twice); I did not dig into it further. OK, the full sequence diagram looks like this:
[sequence diagram]
So the second situation also fails. And this time the timing is not merely on the edge: it needs close to 30s to succeed, while syncedSet only tolerates 10s (2s * 5). In both cases the ZK cluster kept re-electing over and over until humans intervened.

Increasing syncLimit

For this case, we likewise only need to increase syncLimit to get past the hurdle.

zoo.cfg
# don't touch tickTime; it is tied to ZK's heartbeat mechanism
tickTime=2000
# raise syncLimit straight to 50, i.e. 50 * 2000 ms = 100s
syncLimit=50

Offline reproduction

Of course, analysis alone is not enough; we also had to reproduce the problem and verify the conclusion. We built an offline ZooKeeper with a snapshot of roughly 1 GB to test against, and with initLimit=10 and syncLimit=5 exactly the same two phenomena appeared. After I adjusted the parameters:

zoo.cfg
tickTime=2000
initLimit=100
syncLimit=50
# initLimit 100 -> 200s, syncLimit 50 -> 100s

the ZooKeeper cluster finally behaved normally.
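
For reference, a snapshot of that size can be fabricated in a test environment with the standard ZooKeeper Java client. The sketch below is only an illustration (the connect string, paths and sizes are made up, and it is not the script we actually used):

SnapshotPadder.java (illustration only)
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.util.concurrent.CountDownLatch;

public class SnapshotPadder {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30_000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();                                   // wait for the session to come up

        byte[] payload = new byte[100 * 1024];               // ~100 KB per znode
        zk.create("/loadtest", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        for (int i = 0; i < 10_000; i++) {                   // 10,000 * 100 KB ≈ 1 GB of data
            zk.create("/loadtest/node-" + i, payload,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        zk.close();
    }
}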

Trying to reproduce offline with the newer 3.4.13

We also tried to reproduce the problem offline with the newer version 3.4.13, and found that without adjusting any parameters, ZooKeeper quickly elected a leader and served normally. I went through the source: it simply removes the takeSnapshot call from the Leader.lead() phase and from the syncWithLeader phase (when a DIFF is used), which avoids becoming unable to serve because snapshot handling takes too long.

Zookeeper 3.4.13

ZooKeeperServer.java
public void loadData() {
    ...
    // the takeSnapshot() that used to be the last line has been removed
}

Learner.java
protected void syncWithLeader(...) {
    boolean snapshotNeeded = true;
    if (qp.getType() == Leader.DIFF) {
        ......
        snapshotNeeded = false;
    }
    ......
    if (snapshotNeeded) {
        zk.takeSnapshot();
    }
    ......
}

Upgrading to a newer version is the reliable fix; that version also rewrites the confusing log message!

Why does the Dubbo ZK hold so much data?

The last question: why does the ZK that Dubbo depends on hold so much data? I used the ZK tool

org.apache.zookeeper.server.SnapshotFormatter

to dump the snapshot, crunched the output with a bit of shell (awk | uniq), and found that Dubbo accounted for only about 1/4 of the data.
About 1/2 was Solar's ZooKeeper data (the service had already been migrated off, but its data was left behind), and another 1/4 came from a bug in one system's distributed lock that kept writing entries without ever deleting them (they have been asked to fix it). So it really matters to keep dubbo-zk separate from other ZK data; misuse can lead to major incidents!
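
If you prefer Java to an awk pipeline, the per-prefix counting can be sketched roughly as below. Note the assumption (mine, not documented behaviour) that every line beginning with "/" in the SnapshotFormatter dump is a znode path:

DumpCounter.java (illustration only)
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class DumpCounter {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> byRoot = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (!line.startsWith("/")) continue;              // assume non-path lines are metadata
            String[] parts = line.split("/");
            String root = parts.length > 1 ? "/" + parts[1] : "/";
            byRoot.merge(root, 1, Integer::sum);              // count znodes per top-level prefix
        }
        byRoot.forEach((root, count) -> System.out.println(root + " " + count)); // e.g. /dubbo 1234
    }
}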

Summary

ZooKeeper is an important metadata management system, and its failure to provide service could have had an incalculable impact. Thanks to the dual data center setup, we had enough time and a calm state of mind to deal with the problem. Besides, although ZK's election process is complicated, if you settle down and analyze it step by step, you can always find clues and then a breakthrough!
