~mcepl/m2crypto#278: 
Dead lock due to ssl close and socket close happening at different time

Migrated from: https://gitlab.com/m2crypto/m2crypto/-/issues/278
Created by: chaen (@chaen)
Created at: 2020-06-17T08:28:28.853Z

This is kind of related to https://gitlab.com/m2crypto/m2crypto/-/issues/201

Calling SSL.Connection.close does not actually close the socket. This closure only happens when the garbage collector kicks in (https://gitlab.com/m2crypto/m2crypto/-/blob/master/M2Crypto/SSL/Connection.py#L88)

However, a connection may hang until the socket is closed, leading obviously to very undesirable results.

Since we do not want that to happen, and we have no way to force the GC to do the work, we need to close the socket anyway.

  

However, ideally we should close the underlying socket only after SSL was shutdown properly. This is because OpenSSL ssl3_shutdown (see callstack below) may still read some data (see https://github.com/openssl/openssl/blob/master/ssl/s3_lib.c#L4509)::

  
      1  0x00007fffe9d48fc0 in sock_read () from /lib/libcrypto.so.1.0.0
      2  0x00007fffe9d46e83 in BIO_read () from /lib/libcrypto.so.1.0.0
      3  0x00007fffe9eab9dd in ssl3_read_n () from /lib/libssl.so.1.0.0
      4  0x00007fffe9ead216 in ssl3_read_bytes () from /lib/libssl.so.1.0.0
      5  0x00007fffe9ea999c in ssl3_shutdown () from /lib/libssl.so.1.0.0
      6  0x00007fffe9ed4f93 in ssl_free () from /lib/libssl.so.1.0.0
      7  0x00007fffe9d46d5b in BIO_free () from /lib/libcrypto.so.1.0.0
      8  0x00007fffe9f30a96 in bio_free (bio=0x5555556f3200) at SWIG/_m2crypto_wrap.c:5008
      9  0x00007fffe9f30b1e in _wrap_bio_free (self=<optimized out>, args=<optimized out>) at SWIG/_m2crypto_wrap.c
      

So our code looks something like that

      myConnection.close()
      underlyingSocket = myConnection.socket
      # This encourages the GC to run
      myConnection = None
      underlyingSocket.close()

We unfortunately have no way to force that order, and there is a risk of deadlock when running in a multi threaded environment:

  
      Thread A opens socket, gets FD = 111
      Thread A works on it
      Thread A closes FD 111 (underlying socket.close())
      Thread B opens socket, gets FD = 111
      Thread A calls read on FD=111 from ssl3_shutdown

This is visibile on the real strace below::

  
      26461 14:25:15.266692 write(111<TCPv6:[[<srcAddressV6>]:42688->[<dstAddressV6>]:9140]>,
                                 "blabla", 37 <unfinished ...>
      26464 14:25:15.266857 <... connect resumed>) = 0 <0.000195>
      26464 14:25:15.267023 getsockname(120<UDP:[<srcAddress>:44252-><dstAddress>:9140]>, <unfinished ...>
      26461 14:25:15.267176 <... write resumed>) = 37 <0.000453>
      26464 14:25:15.267425 <... getsockname resumed>{sa_family=AF_INET, sin_port=htons(44252),
                             sin_addr=inet_addr("<srcAddress>")}, [28->16]) = 0 <0.000292>
      26461 14:25:15.267466 close(111<TCPv6:[[<srcAddressV6>]:42688->[<dstAddressV6>]:9140]> <unfinished ...>
      26464 14:25:15.267637 close(120<UDP:[<srcAddress>:44252-><dstAddress>:9140]> <unfinished ...>
      26464 14:25:15.267738 <... close resumed>) = 0 <0.000086>
      26461 14:25:15.267768 <... close resumed>) = 0 <0.000285>
      26464 14:25:15.267827 socket(AF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP <unfinished ...>
      26461 14:25:15.267888 futex(0x21f8620, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
      26464 14:25:15.267976 <... socket resumed>) = 111<UDPv6:[1207802822]> <0.000138>
      26461 14:25:15.268092 <... futex resumed>) = 1 <0.000196>
      26464 14:25:15.268195 connect(111<UDPv6:[1207802822]>,
                           {sa_family=AF_INET6, sin6_port=htons(9140),
                            inet_pton(AF_INET6, "<dstAddressV6>", &sin6_addr),
                           sin6_flowinfo=htonl(0), sin6_scope_id=0
                           }, 28 <unfinished ...>
      26461 14:25:15.268294 read(111<UDPv6:[[<srcAddressV6>]:42480->[<dstAddressV6>]:9140]>, <unfinished ...>
      26464 14:25:15.268503 <... connect resumed>) = 0 <0.000217>
      26464 14:25:15.268673 getsockname(111<UDPv6:[[<srcAddressV6>]:42480->[<dstAddressV6>]:9140]>, <unfinished ...>
      26464 14:25:15.268862 <... getsockname resumed>{sa_family=AF_INET6, sin6_port=htons(42480),
                             inet_pton(AF_INET6, "<srcAddressV6>", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=
      0}, [28]) = 0 <0.000168>
      26464 14:25:15.269048
      close(111<UDPv6:[[<srcAddressV6>]:42480->[<dstAddressV6>]:9140]>
      <unfinished ...>

It would be better if Connection.close would finish the work

Status
REPORTED
Submitter
~mcepl
Assigned to
No-one
Submitted
7 months ago
Updated
7 months ago
Labels
No labels applied.

~mcepl referenced this from #201 7 months ago

~mcepl 7 months ago

Changed on 2020-06-17T10:24:04.089Z by chaen:

mentioned in merge request !247

(Last edited at 2020-06-17T10:24:04.091Z.)

~mcepl 7 months ago

Changed on 2020-06-24T14:01:13.974Z by chaen:

changed the description

(Last edited at 2020-06-24T14:01:13.976Z.)

~mcepl 7 months ago

On 2020-10-15T14:16:18.337Z, Matěj Cepl wrote:

Thorough review and check of !188 would probably help a lot.

Register here or Log in to comment, or comment via email.