Categories

  • Programming

Tags

  • Network

A exception raises with the rate around 10% in all execution in Mac OS:

libc++abi.dylib: terminating with uncaught exception of type boost::wrapexcept<boost::system::system_error>: cancel: Bad file descriptor
Process 14336 stopped

Found that the acceptor fd is closed twice:

   181 	    void event(int, const message::ConnectionEvent& e) override {
   182 	        switch (e) {
   183 	            case message::ConnectionEvent::timeout:
-> 184 	                stop();
   185 	                break;
   186 	            case message::ConnectionEvent::close:
   187 	                break;

The stop will call the server to stop and close the opening fd of accept(). This stop will be called whenever a timeout event occurs.

By the logs, seems that the asio timer cancel is correctly been called, but it is not canceled immediatly:

  [0] = "2020-09-07 07:44:03.920797 DEBUG tid:0x7000085d4000 in server.cpp:35 Accepting connection from 127.0.0.1"
  [1] = "2020-09-07 07:44:03.920797 DEBUG tid:0x700008657000 in client.cpp:27 Successfully connect to 127.0.0.1"
  [2] = "2020-09-07 07:44:03.928531 DEBUG tid:0x700008657000 in connection.cpp:314 Send heartbeat to 127.0.0.1:50265."
  [3] = "2020-09-07 07:44:03.928914 DEBUG tid:0x7000085d4000 in connection.cpp:249 Receive heartbeat from 127.0.0.1:50266"
  [4] = "2020-09-07 07:44:03.934058 DEBUG tid:0x700008657000 in connection.cpp:314 Send heartbeat to 127.0.0.1:50265."
  [5] = "2020-09-07 07:44:03.935789 WARN tid:0x7000085d4000 in connection.cpp:268 Peer socket: 127.0.0.1:50266 timeout"
  [6] = "2020-09-07 07:44:03.935929 DEBUG tid:0x7000085d4000 in heartbeat.cpp:42 The heartbeat timer is cancelled."
  [7] = "2020-09-07 07:44:03.936252 DEBUG tid:0x7000085d4000 in connection.cpp:249 Receive heartbeat from 127.0.0.1:50266"
  [8] = "2020-09-07 07:44:03.936356 DEBUG tid:0x7000085d4000 in server.cpp:39 Acceptor stopped"
  [9] = "2020-09-07 07:44:03.937584 ERROR tid:0x700008657000 in connection.cpp:291 Server: 127.0.0.1:50265 closes connection."
  [10] = "2020-09-07 07:44:03.937829 DEBUG tid:0x700008657000 in connection.cpp:307 Timer of socket id 127.0.0.1:50265: Operation canceled"
  [11] = "2020-09-07 07:44:03.942797 WARN tid:0x7000085d4000 in connection.cpp:268 Peer socket: 127.0.0.1:50266 timeout"
  [12] = "2020-09-07 07:44:03.942868 DEBUG tid:0x7000085d4000 in heartbeat.cpp:42 The heartbeat timer is cancelled."

The connection will be closed immediatly when timer expires:

void ServerConnectionWithHeartbeat::when_timeout(const std::error_code &ec) {
    if (ec) {
        if (ec.value() == asio::error::operation_aborted)
            // could be the timer reset or stop the heartbeat
            return;
        else {
            TI_ERROR(format("Error when {} timer expires: {}",
                            socket_to_string(), ec.message()));
        }
    } else {
        TI_WARN(format("Peer socket: {} timeout", socket_to_string()));
        connection_close();
        upper_layer_event_callback(message::ConnectionEvent::timeout);
    }
}

But seems the connection is not “closed” immediately: Log 6 shows that the server received the heartbeat very soon after the timer is canceled, and invoke a callback to restart the timer. Then we see the timer timeout again, trigger a timeout event and the server stops again.

No time to examine whether there could still be a reading event in kqueue after close is called in MacOS, but just keep that case down here.

Update: OK… will fix that later https://stackoverflow.com/questions/20222079/epoll-wait-returning-events-on-closed-file-descriptor https://stackoverflow.com/questions/20222079/epoll-wait-returning-events-on-closed-file-descriptor